AI Glossary: Interpretability Terms & Definitions

Concept Activation Vector

CAV

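A Concept Activation Vector (CAV) is a direction in the activation space of a neural network layer that represents a human-defined concept. It is typically obtained by training a linear classifier to separate the activations of concept examples from those of random examples and taking the vector normal to the decision boundary. The sensitivity of a prediction to movement along that direction quantifies how much the concept matters to the model.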
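
As a rough illustration of how a concept direction can be computed and used, the sketch below builds a CAV from toy activations. It rests on several assumptions: the activations are synthetic 3-D vectors, the direction is the normalized difference of class means (a simplification standing in for the normal of a trained linear classifier), and `logit_grad` is a hypothetical gradient of a class logit with respect to the layer activations.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def concept_activation_vector(concept_acts, random_acts):
    """Simplified CAV: unit vector from the random mean to the concept mean."""
    mc = mean_vector(concept_acts)
    mr = mean_vector(random_acts)
    return unit([c - r for c, r in zip(mc, mr)])


def concept_sensitivity(grad, cav):
    """Directional derivative of the output along the concept direction."""
    return sum(g * c for g, c in zip(grad, cav))


rng = random.Random(0)
# Toy activations (assumption): concept examples are shifted along axis 0.
concept_acts = [[2.0 + rng.gauss(0, 0.1), rng.gauss(0, 0.1), rng.gauss(0, 0.1)]
                for _ in range(50)]
random_acts = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1), rng.gauss(0, 0.1)]
               for _ in range(50)]

cav = concept_activation_vector(concept_acts, random_acts)

# Hypothetical gradient of a class logit w.r.t. the layer activations.
logit_grad = [0.8, -0.3, 0.1]
sensitivity = concept_sensitivity(logit_grad, cav)
print("CAV:", cav)
print("Concept sensitivity:", sensitivity)
```

A positive sensitivity suggests that moving the activations toward the concept would increase the class score; in a real network the gradient would come from automatic differentiation rather than a hand-written vector.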

FA

Feature attribution (FA) assigns each input feature a score reflecting how much it contributed to a model's prediction, for example by using gradients, input perturbation, or Shapley values.
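
One simple way to make this concrete is perturbation-based (occlusion) attribution: replace one feature at a time with a baseline value and record how much the prediction changes. The sketch below is a minimal example under stated assumptions; `toy_model`, the input, and the all-zero baseline are hypothetical and chosen only for illustration.

```python
def occlusion_attribution(model, x, baseline):
    """Score each feature by the prediction drop when it is set to its baseline."""
    full = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]  # "remove" feature i
        scores.append(full - model(perturbed))
    return scores


def toy_model(x):
    """Hypothetical model: the third feature has no effect on the output."""
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]


attributions = occlusion_attribution(toy_model, [1.0, 2.0, 5.0], [0.0, 0.0, 0.0])
print(attributions)  # [3.0, -4.0, 0.0]
```

The choice of baseline matters: attributions are always relative to the reference input, so a different baseline generally gives different scores.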

Interpretability in AI focuses on making the behavior of AI models understandable to humans, with the aim of improving trust and transparency.

An Interpretability Score is a metric that quantifies how easily humans can understand a model's predictions. No single standard definition exists, so the value depends on the metric chosen.

IML

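Interpretable Machine Learning (IML) covers models and methods whose predictions humans can understand, either because the model is simple by design (for example, a linear model or a shallow decision tree) or through post-hoc explanation techniques.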

LIME (Local Interpretable Model-agnostic Explanations) is a technique for interpreting machine learning models by explaining individual predictions.

LIME

Local Interpretable Models help explain AI predictions by approximating a complex model, in the neighborhood of a single input, with a simpler, interpretable one such as a sparse linear model or a shallow decision tree.
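
The local-surrogate idea can be sketched as follows: sample points near the input, weight them by proximity, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified illustration, not the `lime` package's implementation; the Gaussian sampling, the kernel form, the parameter defaults, and the `black_box` function are all assumptions made for the example.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_linear_surrogate(model, x0, num_samples=500, scale=0.5,
                           kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model to `model` around the point x0."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(num_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]  # perturbed neighbor
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(model(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer = heavier
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = len(x0) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(k)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]


def black_box(x):
    """Hypothetical nonlinear model to be explained."""
    return x[0] ** 2 + 3.0 * x[1]


intercept, coefs = local_linear_surrogate(black_box, [2.0, 1.0])
print("Local coefficients:", coefs)
```

Near the point (2.0, 1.0) the fitted coefficients should land close to the local slopes of `black_box`, roughly 4 for the first feature and 3 for the second, although the exact values depend on the random samples and the kernel width.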

MI

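Mechanistic Interpretability (MI) is the study of how AI models compute their outputs, carried out by examining internal components such as neurons, attention heads, and circuits and reverse-engineering them into human-understandable algorithms.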

SHAP

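SHAP (SHapley Additive exPlanations) Values explain how much each feature contributes to a model's prediction. They are based on Shapley values from cooperative game theory, so the per-feature contributions add up to the difference between the prediction and a baseline (the expected model output).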
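
To show where such values come from, the sketch below computes exact Shapley values by enumerating every feature subset. It is an illustration under simplifying assumptions, not the `shap` library's algorithm: "absent" features are replaced by a single baseline input rather than averaged over a background dataset, and `toy_model`, `x`, and `baseline` are hypothetical. Exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values, with absent features set to their baseline value."""
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(x):
    """Hypothetical model with an interaction between features 1 and 2."""
    return 2.0 * x[0] + x[1] * x[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # approximately [2.0, 3.0, 3.0]
```

The contributions sum to `toy_model(x) - toy_model(baseline)`, which is 8 here, and the interaction term `x[1] * x[2]` is split evenly between the two features involved.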