AI Glossary: What Is Model Poisoning (MP)? Definition & Meaning

Model poisoning is a type of adversarial attack on machine learning systems in which an attacker intentionally manipulates the training data used to build a model. This manipulation can cause the model to learn incorrect patterns or make biased predictions, ultimately undermining its reliability and effectiveness. The attacker typically introduces harmful data points into the dataset that are designed to mislead the model during the training phase.
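To make the idea concrete, here is a minimal sketch assuming a toy one-dimensional nearest-centroid classifier. The helper names (`train_centroids`, `predict`, `accuracy`) and all data values are hypothetical and chosen only for illustration. The attacker adds a few mislabeled points to the training set, which drags one class centroid toward the other class and lowers accuracy on an untouched test set.

```python
from statistics import mean

# Toy nearest-centroid classifier (hypothetical helpers for illustration).
def train_centroids(data):
    """Return the mean feature value for each class label."""
    labels = {y for _, y in data}
    return {c: mean(x for x, y in data if y == c) for c in labels}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, data):
    """Fraction of (x, label) pairs classified correctly."""
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

# Clean training set: class 0 clusters around 2.0, class 1 around 10.0.
clean_train = [(float(x), 0) for x in range(0, 5)] + [(float(x), 1) for x in range(8, 13)]
test_set = [(1.0, 0), (3.0, 0), (5.0, 0), (7.0, 1), (9.0, 1), (11.0, 1)]

# The attacker injects three points far from the class-0 cluster but labeled 0,
# pulling the class-0 centroid from 2.0 up to 8.75.
poison = [(20.0, 0)] * 3
poisoned_train = clean_train + poison

clean_acc = accuracy(train_centroids(clean_train), test_set)
poisoned_acc = accuracy(train_centroids(poisoned_train), test_set)
print(f"clean accuracy:    {clean_acc:.2f}")     # clean accuracy:    1.00
print(f"poisoned accuracy: {poisoned_acc:.2f}")  # poisoned accuracy: 0.67
```

Only three extra points were needed here because the centroid is a plain mean; models whose training objective is more sensitive to extreme values are similarly exposed.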

In practice, model poisoning can occur in various scenarios, especially in collaborative learning environments where multiple participants contribute to a shared model. For instance, in federated learning, where many devices train a model collectively without sharing their raw data, an attacker may alter their local dataset so that the update they send degrades the global model's performance.
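The federated case can be sketched with a deliberately simplified, one-parameter model, assuming each client fits a slope `w` for `y = w * x` on its local data and the server combines the results with federated averaging (each client's model weighted by its sample count). Real systems exchange full weight vectors or gradients; the function names and data below are hypothetical.

```python
def fit_local_slope(xs, ys):
    """Least-squares slope for y = w * x (no intercept) on one client's data."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def federated_average(updates):
    """Average client models, weighting each by its number of training samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

xs = [1.0, 2.0, 3.0, 4.0]
honest_ys = [2.0 * x for x in xs]      # true relationship: y = 2x
poisoned_ys = [-10.0 * x for x in xs]  # the attacker relabels its local data

# Three honest clients each learn w = 2.0; the malicious client learns w = -10.0.
honest_updates = [(fit_local_slope(xs, honest_ys), len(xs)) for _ in range(3)]
malicious_update = (fit_local_slope(xs, poisoned_ys), len(xs))

print(federated_average(honest_updates))                       # 2.0
print(federated_average(honest_updates + [malicious_update]))  # -1.0
```

One participant out of four is enough to flip the sign of the shared parameter, because a plain weighted mean gives every client's update direct, unbounded influence.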

Attackers can use several techniques in a model poisoning attack. For example, they might inject data that misrepresents the true data distribution, craft outliers that distort what the model learns, or introduce targeted examples that push the model toward incorrect predictions on critical tasks. The impact of model poisoning can range from subtle performance degradation to catastrophic failures once the model is deployed in real-world applications.
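The outlier technique can be illustrated with ordinary least-squares line fitting, which is known to be sensitive to extreme points. This is a minimal sketch: the `fit_line` helper and the data are hypothetical. Clean data follows `y = 2x + 1`; the attacker adds two crafted points at the edge of the input range.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

clean = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # y = 2x + 1
outliers = [(9.0, -40.0), (9.0, -40.0)]                 # crafted by the attacker

clean_slope, clean_intercept = fit_line(clean)
poisoned_slope, poisoned_intercept = fit_line(clean + outliers)
print(f"clean:    slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"poisoned: slope={poisoned_slope:.2f}, intercept={poisoned_intercept:.2f}")
```

Two points out of twelve turn the learned upward trend into a downward one, because squared error lets points far from the line dominate the fit.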

To defend against model poisoning, researchers and practitioners use strategies such as anomaly detection to flag suspicious data, robust learning algorithms that are less sensitive to outliers, and regular audits of the training data to verify its integrity. Understanding model poisoning is crucial for developing resilient AI systems that maintain their performance and ethical standards in the face of potential attacks.
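Two of these defenses can be sketched together: anomaly detection using the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD), and robust aggregation using the median instead of the mean. The `flag_anomalies` helper, the 3.5 threshold (a common rule of thumb for the modified z-score), and the sample values are assumptions for illustration, not a prescribed method.

```python
from statistics import mean, median

def flag_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score, 0.6745 * |v - median| / MAD,
    exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: most values are identical, so flag any deviation.
        return [v != med for v in values]
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

# Hypothetical contributions (e.g. per-client model parameters); the last is poisoned.
updates = [2.0, 2.1, 1.9, 2.0, -10.0]

flags = flag_anomalies(updates)
kept = [u for u, bad in zip(updates, flags) if not bad]

print("flags:         ", flags)
print("plain mean:    ", mean(updates))  # pulled negative by the poisoned value
print("median:        ", median(updates))
print("filtered mean: ", mean(kept))
```

The median and the filtered mean both stay near 2.0, while the plain mean is dragged below zero by a single poisoned value. Such filters raise the cost of an attack but do not stop poisoned points that are crafted to look statistically normal.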