AI Glossary: What Is Adversarial Training (AT)? Definition & Meaning

Gegenspielertraining

Adversariales Training is a method im maschinellen Lernen, particularly in the Bereich der künstlichen Intelligenz verwendet wird, to enhance the robustness of models against adversarial attacks. Adversarial attacks involve intentionally crafting inputs that are designed to deceive or mislead the model, often leading to incorrect predictions or classifications.

In adversarial training, the model is exposed to both normal data and adversarial examples during the training process. These adversarial examples are generated using specific algorithms that manipulate the original inputs in subtle ways, often imperceptible to humans, but capable of causing the model to make mistakes. By including these challenging examples in the Trainingsdaten, the model learns to recognize and resist such manipulations.

Der Prozess umfasst typischerweise die folgenden Schritte:

Erzeuge adversariale Beispiele: Techniques like the Schnelle Gradient-Sign-Methode (FGSM) or Projected Gradient Descent (PGD) are used to create adversarial inputs from the training data.
Trainiere das Modell: The model is trained on a combined dataset that includes both regular and adversarial examples, allowing it to adapt and learn to handle these deceptive inputs.
Bewerte die Robustheit: After training, the model is tested on a separate set of adversarial examples to assess its Fähigkeit, die Leistung angesichts von Angriffen aufrechtzuerhalten.

Adversarial training has been shown to improve the resilience of machine learning models, making them less susceptible to attacks. However, it is not a panacea; while it can significantly enhance robustness, it may also lead to a decrease in performance on standard data if not properly balanced. As KI-Systemen become increasingly integrated into critical applications, the importance of techniques like adversarial training becomes paramount for ensuring their reliability and safety.