AI Glossary: What Is Interpretability? Definition & Meaning

解釈性

解釈性 in the context of 人工知能 (AI) refers to the degree to which a human can comprehend the reasons, mechanisms, and processes that an AI model uses to arrive at its predictions or decisions. As AIシステム become increasingly complex, especially with the rise of 深層学習, understanding how they make choices is crucial for trust, accountability, and transparency.

解釈性には主に二つの側面があります：

モデルの解釈性: This pertains to the design of the AI model itself. Some models, such as linear regression or decision trees, are inherently interpretable because their structure allows for straightforward insights into how input features influence output predictions. In contrast, deep neural networks are often considered ‘black boxes’ due to their intricate architectures, making it difficult to trace how inputs are transformed into outputs.
ポストホック解釈性： This involves techniques applied after a model has been trained to help users understand its behavior. Methods such as feature importance scores, LIME (ローカル解釈可能モデル非依存の説明), and SHAP (SHapley Additive exPlanations) provide insights into which features are most influential in a model’s predictions.

解釈性は、医療、金融、刑事司法などの healthcare, finance, and criminal justice, where understanding the basis of decisions can significantly impact individuals’ lives. As AI systems are deployed more widely, ensuring they are interpretable helps foster trust among users and encourages more ethical applications of technology.