AI Glossary: What Is Model Poisoning (MP)? Definition & Meaning

Model poisoning is a type of adversarial attack on machine learning systems in which an attacker intentionally manipulates the training data used to build a model. This manipulation can lead the model to learn incorrect patterns or make biased predictions, ultimately undermining its reliability and effectiveness. The attacker typically introduces harmful data points into the dataset that are designed to mislead the model during the training phase.

In practice, model poisoning can occur in many scenarios, especially in collaborative learning environments where multiple participants contribute to a shared model. For instance, in federated learning, where multiple devices train a model collectively without sharing their data, an attacker may alter their local dataset to degrade the overall model's performance.
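To make the federated scenario concrete, here is a minimal sketch in plain Python. It assumes a deliberately simplified setup that is not taken from any real framework: each client's "model" is just the mean of its local numbers, and the server averages those client updates without seeing the raw data. The function names and the sample values are hypothetical, chosen only for illustration.

```python
from statistics import mean

def local_update(local_data):
    """Toy local training step: the client's 'model' is the mean of its own data."""
    return mean(local_data)

def federated_average(client_datasets):
    """Toy server step: average the client updates; raw data never leaves the clients."""
    return mean([local_update(data) for data in client_datasets])

# Three honest clients whose local data all centre on roughly 1.0 (made-up values).
honest_clients = [[1.0, 1.2, 0.8], [0.9, 1.1, 1.0], [1.1, 0.9, 1.0]]
clean_model = federated_average(honest_clients)

# The third client is replaced by an attacker who has altered its local dataset.
poisoned_clients = honest_clients[:2] + [[50.0, 55.0, 45.0]]
poisoned_model = federated_average(poisoned_clients)

print("global model without attacker:", clean_model)
print("global model with one poisoned client:", poisoned_model)
```

In this toy setting a single poisoned client out of three is enough to pull the shared model far from the value the honest clients agree on, because plain averaging gives every update equal weight.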

Attackers can employ several techniques during a model poisoning attack. For example, they may insert data that misrepresents the true data distribution, create outliers that distort the model's learning, or introduce specific examples that cause the model to make incorrect predictions on critical tasks. The impact of model poisoning can range from a subtle degradation in performance to catastrophic failures once the model is deployed in real-world applications.
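One of these techniques, injecting mislabeled points that misrepresent the true distribution, can be sketched with a toy classifier. The example below assumes a one-dimensional nearest-centroid model written from scratch; the labels, data values, and helper names are all hypothetical and stand in for whatever model and data a real system would use.

```python
from statistics import mean

def fit_centroids(samples):
    """Compute one centroid (the mean feature value) per class label."""
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Clean training set: class "a" clusters near 1, class "b" near 9 (made-up values).
clean_data = [(0.0, "a"), (1.0, "a"), (2.0, "a"),
              (8.0, "b"), (9.0, "b"), (10.0, "b")]

# Poison: points that look like class "b" but carry the label "a".
poison = [(9.0, "a"), (10.0, "a"), (11.0, "a")]

clean_model = fit_centroids(clean_data)
poisoned_model = fit_centroids(clean_data + poison)

query = 6.0  # lies closer to the "b" cluster in the clean data
print("clean prediction:", predict(clean_model, query))
print("poisoned prediction:", predict(poisoned_model, query))
```

The injected points drag the class "a" centroid from 1.0 to 5.5, so the query at 6.0 changes class even though none of the original training points were modified.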

To defend against model poisoning, researchers and practitioners employ various strategies, such as anomaly detection to identify suspicious data, robust learning algorithms that are less sensitive to outliers, and regular audits of the training data to ensure its integrity. Understanding model poisoning is crucial for developing resilient AI systems that maintain their performance and ethical standards in the face of potential attacks.
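Two of these defenses can be illustrated with a short sketch. It assumes the values being checked are scalar updates or data points, flags anomalies with a modified z-score based on the median absolute deviation (the constants 0.6745 and 3.5 are conventional choices, not requirements), and uses the median as one simple example of an aggregate that is less sensitive to outliers than the mean. The function name and sample values are hypothetical.

```python
from statistics import mean, median

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or most of them) are identical; nothing can be scored.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

# Four plausible values and one suspicious one (made-up numbers).
updates = [1.0, 1.1, 0.9, 1.05, 50.0]

flags = flag_outliers(updates)
naive_aggregate = mean(updates)     # pulled toward the outlier
robust_aggregate = median(updates)  # stays near the honest values

print("flagged as suspicious:", flags)
print("mean of all values:", naive_aggregate)
print("median of all values:", robust_aggregate)
```

A filter like this only catches poison that stands out from the rest of the data; carefully crafted points that stay close to the honest distribution would pass, which is one reason such checks are usually combined with the data audits mentioned above.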