AI Glossary: What Is Scene Understanding (SU)? Definition & Meaning

Scene understanding refers to the ability of artificial intelligence (AI) systems to interpret and analyze visual information from the world around them. This involves not just identifying objects within an image or video, but also understanding their spatial relationships, actions, and context within a scene.

At its core, scene understanding combines techniques from computer vision, natural language processing, and machine learning. For example, when a self-driving car navigates through a city, it must recognize pedestrians, other vehicles, traffic signs, and obstacles while also understanding their movements and interactions. This requires a level of perception that goes beyond simple recognition.

Common tasks associated with scene understanding include:

Object detection: Identifying and localizing objects within an image.
Semantic segmentation: Assigning a label to every pixel in an image, effectively categorizing different regions based on the objects present.
Instance segmentation: Distinguishing between separate instances of the same object class within a scene.
Action recognition: Understanding which actions are taking place and who is performing them.
Scene classification: Categorizing an entire image into a specific label or class, such as ‘beach’, ‘forest’, or ‘urban area’.
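
To make the difference between some of these tasks concrete, here is a minimal Python sketch on a toy example. It is an illustration only, not how production systems work: real scene understanding relies on learned models, whereas this sketch starts from a hand-written per-pixel label grid (standing in for a semantic segmentation output), splits each class region into separate instances with a simple 4-connected flood fill, and derives a bounding box per instance as a stand-in for object detection output. The mask, the class labels (1 = "car", 2 = "person"), and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical toy semantic mask: each cell is the class label of one pixel.
# 0 = background, 1 = "car", 2 = "person" (labels are illustrative assumptions).
SEMANTIC_MASK = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [1, 1, 0, 0, 0, 0],
]


def find_instances(mask, background=0):
    """Split each class region into instances using 4-connected flood fill.

    Returns a list of (class_label, pixels), where pixels is a list of
    (row, col) tuples. Touching objects of the same class would be merged,
    which is one reason real instance segmentation uses learned models.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    instances = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or mask[r][c] == background:
                continue
            label = mask[r][c]
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit the four direct neighbours with the same class label.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            instances.append((label, pixels))
    return instances


def bounding_box(pixels):
    """Return (min_row, min_col, max_row, max_col) enclosing the pixels."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


INSTANCES = find_instances(SEMANTIC_MASK)
BOXES = [(label, bounding_box(pixels)) for label, pixels in INSTANCES]
```

In this toy mask, the semantic labels alone say only that some pixels are "car" and some are "person". The flood fill finds three instances (two separate cars and one person), and each bounding box gives the kind of coarse localization that object detection provides.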

Scene understanding has numerous applications, including autonomous vehicles, robotics, augmented reality, and surveillance systems. As AI technologies continue to evolve, improving scene understanding capabilities will enhance how machines interact with and respond to their environments.