AI Glossary: What Is Gradient Masking (GM)? Definition & Meaning

Gradient masking is a defensive technique used in machine learning to reduce a model’s vulnerability to adversarial attacks. Adversarial attacks make small, often imperceptible perturbations to input data that can drastically alter the model’s predictions. Many of these attacks exploit the gradients of the loss function with respect to the input, which indicate how sensitive the loss, and therefore the prediction, is to changes in that input.
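To make the role of gradients concrete, here is a minimal sketch of a gradient-based attack in the style of the fast gradient sign method (FGSM). The logistic-regression model, the weights, the input, and the helper names (predict, loss_grad_wrt_input, fgsm) are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of class 1 under a toy logistic-regression model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(w, b, x, y):
    # For binary cross-entropy with a sigmoid output, dL/dx = (p - y) * w.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # Move each input coordinate by eps in the direction that increases the loss.
    g = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical model parameters and input (assumptions for illustration).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.2, -0.1, 0.1], 1

x_adv = fgsm(w, b, x, y, eps=0.2)
p_clean = predict(w, b, x)      # above 0.5: classified as class 1
p_adv = predict(w, b, x_adv)    # below 0.5: the prediction flips
```

In this toy setting, a perturbation of at most 0.2 per coordinate is enough to push the predicted probability from roughly 0.69 to roughly 0.40. The attack needs nothing but the sign of the input gradient, which is the signal gradient masking tries to hide.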

In gradient masking, the model is designed so that its gradients become less informative or even misleading to potential attackers. This can be achieved through several methods, including:

Adding noise: Introducing random noise to the gradients can obscure the true direction and magnitude of the updates that an adversary would use to generate adversarial examples.
Using non-differentiable functions: Implementing components that are not differentiable can make it difficult for attackers to compute gradients accurately.
Obfuscating gradients: Modifying the loss function in a way that hides the true gradients makes gradient-based attacks harder to mount, although the resulting protection can amount to a false sense of security.
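The second method in the list above can be sketched as follows. This is a minimal illustration, not a recommended defense: a hypothetical quantization step (quantize) is placed in front of a toy logistic-regression model, which makes the numerically estimated input gradient vanish. The sketch also shows one way an attacker might work around the mask, by treating the quantizer as the identity function when computing the gradient. All names, weights, and the step size are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def quantize(x, step=0.1):
    # Non-differentiable preprocessing: snap each feature to a grid.
    return [round(xi / step) * step for xi in x]

def masked_predict(w, b, x):
    xq = quantize(x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xq)) + b)

def numerical_input_grad(f, x, h=1e-4):
    # Central finite differences, as an attacker without analytic gradients might use.
    grads = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grads.append((f(xp) - f(xm)) / (2 * h))
    return grads

def sign(v):
    return (v > 0) - (v < 0)

# Hypothetical model parameters and input (assumptions for illustration).
w, b = [2.0, -3.0, 1.0], 0.0
x, y, eps = [0.2, -0.1, 0.1], 1, 0.2

def loss(xin):
    # Cross-entropy loss for the true label y = 1.
    return -math.log(masked_predict(w, b, xin))

# The quantizer is piecewise constant, so the estimated gradient is zero.
masked_grad = numerical_input_grad(loss, x)

# Workaround: pretend quantize is the identity and use the smooth model's gradient.
p = masked_predict(w, b, x)
surrogate_grad = [(p - y) * wi for wi in w]
x_adv = [xi + eps * sign(gi) for xi, gi in zip(x, surrogate_grad)]
p_adv = masked_predict(w, b, x_adv)   # the masked model is still fooled
```

The zero gradient stops a naive gradient-sign attack, yet the decision boundary of the model has not moved, so a perturbation computed from a surrogate gradient still flips the prediction.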

While gradient masking can provide a temporary shield against certain types of attacks, it does not offer a foolproof solution. Skilled adversaries may still circumvent masked gradients, for example by crafting adversarial examples on a substitute model and transferring them, or by using attacks that do not rely on gradients at all. For this reason, gradient masking is often contrasted with ‘adversarial training’, where models are trained on both clean and adversarial examples to improve their robustness.
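As a rough illustration of adversarial training, the sketch below trains a toy logistic-regression model with full-batch gradient descent on each clean example together with an FGSM-perturbed copy of it. The dataset, the hyperparameters, and the function names (fgsm, adversarial_train) are hypothetical; real pipelines typically use a deep-learning framework and stronger attacks.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # Input gradient of the cross-entropy loss is (p - y) * w.
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_train(X, Y, eps, lr=0.5, epochs=50):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        # Each batch holds the clean examples plus adversarial copies
        # generated against the current parameters.
        batch = [(x, y) for x, y in zip(X, Y)]
        batch += [(fgsm(w, b, x, y, eps), y) for x, y in zip(X, Y)]
        gw, gb = [0.0] * len(w), 0.0
        for x, y in batch:
            err = predict(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        n = len(batch)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Hypothetical, linearly separable toy data (assumption for illustration).
X = [[1.0, 0.5], [-1.0, -0.5], [0.8, 1.0], [-0.8, -1.0]]
Y = [1, 0, 1, 0]
eps = 0.3

w, b = adversarial_train(X, Y, eps)

# Count examples still classified correctly after an FGSM attack on the final model.
robust_correct = sum(
    (predict(w, b, fgsm(w, b, x, y, eps)) > 0.5) == (y == 1)
    for x, y in zip(X, Y)
)
```

Unlike masking, this approach leaves the gradients intact and instead moves the decision boundary away from the training points, so the perturbed inputs remain correctly classified in this toy case.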

Overall, while gradient masking is a valuable tool in the arsenal of defenses against adversarial attacks, it should be used in conjunction with other strategies to ensure a more comprehensive approach to model security.