Explore 25 AI terms in AI Safety
Agent Collapse refers to a failure in AI systems where agents cease to function effectively, often due to alignment issues.
AI risk refers to potential negative consequences arising from the development and deployment of artificial intelligence systems.
A framework categorizing AI systems based on their alignment with human values and intentions.
Anthropic is an AI safety and research company that develops the Claude family of AI models.
Corrigibility refers to an AI system's willingness to accept correction, modification, or shutdown from its operators without resisting or undermining them.
Capabilities of AI that pose risks to safety, privacy, or ethical standards.
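Dark Knowledge refers to the information carried in a trained model's full output distribution, such as the relative probabilities it assigns to incorrect classes, which knowledge distillation transfers to a smaller model.
Deceptive Alignment refers to a situation where an AI behaves as if aligned during training or evaluation while actually pursuing a different objective, so the misalignment surfaces only once oversight is reduced.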
A failure mode is a specific way in which a system or component can fail, affecting its functionality or performance.
A false alarm (false positive) in AI refers to a system flagging a threat or event that is not actually present.
Goal misgeneralization occurs when an AI system learns a goal that fits its training data but differs from the intended one, so it competently pursues the wrong objective in new situations.
AI hallucination refers to instances where a model confidently generates false or fabricated information.
Hallucination Cascade refers to a compounding effect in AI where initial inaccuracies lead to further erroneous outputs.
The Helpfulness-Harmlessness Tradeoff is the tension between an AI providing maximally useful assistance and avoiding outputs that could cause harm.
Human Oversight refers to the involvement of people in monitoring and guiding AI systems to ensure ethical and accurate decision-making.
Inner Alignment refers to whether the objective a trained model actually learns matches the objective it was trained on.
An intelligence explosion refers to a hypothesized rapid, self-reinforcing increase in AI capabilities, in which systems improve themselves and may reach superintelligence.
Jailbreak Prompting refers to crafting prompts that cause an AI model to bypass its intended safeguards.
Mesa-optimization occurs when a trained model is itself an optimizer, pursuing a learned objective that may differ from the one its training process optimized for.
Model alignment is the practice of making AI systems operate in ways consistent with human values and intentions.
Model robustness refers to the ability of a machine learning model to maintain performance despite changes in input data or environment.
Model Safety refers to ensuring the reliability and security of AI models during development and deployment.
OpenAI is an AI research organization focused on developing safe and beneficial artificial intelligence.
An out-of-distribution sample is a data point that does not conform to the training distribution of a model.
Outer Alignment refers to ensuring that the objective an AI is trained on (its reward or loss function) accurately captures what its designers intend.
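
To make the "out-of-distribution sample" entry above more concrete, here is a minimal sketch in Python. It assumes one-dimensional, roughly Gaussian training data and flags a point as out-of-distribution when it lies more than a chosen number of standard deviations from the training mean. The function names and the threshold of 3.0 are hypothetical choices for illustration; practical detectors usually rely on model confidence scores, density estimates, or learned embeddings.

```python
import statistics

def fit_gaussian(train):
    """Estimate the mean and sample standard deviation of 1-D training data."""
    return statistics.fmean(train), statistics.stdev(train)

def is_out_of_distribution(x, mean, std, threshold=3.0):
    """Return True if x lies more than `threshold` standard deviations from the mean.

    The threshold is an illustrative assumption, not a standard value.
    """
    return abs(x - mean) / std > threshold

# Toy training data clustered around 10.0
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mean, std = fit_gaussian(train)

print(is_out_of_distribution(10.1, mean, std))  # close to the training data
print(is_out_of_distribution(15.0, mean, std))  # far from the training data
```

The first call is not flagged and the second is, because 15.0 lies far outside the range the training data covers.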