AI Glossary: What Is Mask R-CNN? Definition & Meaning

Mask R-CNN

Mask R-CNN is a deep learning model used primarily for object detection and instance segmentation in images. Developed by Facebook AI Research in 2017, it builds on the Faster R-CNN object detection framework. Its key innovation is that, in addition to detecting objects in an image, it delineates the precise boundary of each one by generating a high-quality segmentation mask per instance.

The architecture of Mask R-CNN consists of two main stages. In the first stage, a Region Proposal Network (RPN) proposes regions of the image that are likely to contain objects. In the second stage, the model classifies each proposed region, refines its bounding box, and generates a binary mask for each detected object, segmenting it from the background.
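The two-stage flow described above can be sketched as plain control flow. This is a minimal illustration, not the actual Mask R-CNN implementation: the names `Proposal`, `Detection`, `two_stage_detect`, `fake_propose`, and `fake_refine` are hypothetical, the neural networks are replaced by stand-in functions, and steps such as non-maximum suppression are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Proposal:
    box: Box
    objectness: float  # stage-1 score: how likely the region holds any object


@dataclass
class Detection:
    label: str
    score: float
    box: Box                # refined bounding box
    mask: List[List[int]]   # binary mask for this instance


def two_stage_detect(
    image: object,
    propose: Callable[[object], Iterable[Proposal]],
    refine: Callable[[object, Proposal], Detection],
    top_k: int = 100,
    min_score: float = 0.5,
) -> List[Detection]:
    # Stage 1: get candidate regions and keep the top_k by objectness.
    ranked = sorted(propose(image), key=lambda p: p.objectness, reverse=True)
    kept = ranked[:top_k]
    # Stage 2: classify, refine the box, and predict a mask per region.
    detections = [refine(image, p) for p in kept]
    # Drop low-confidence detections.
    return [d for d in detections if d.score >= min_score]


# Stand-in stages (hypothetical; a real model would use trained networks).
def fake_propose(image: object) -> List[Proposal]:
    return [
        Proposal((0, 0, 4, 4), 0.9),
        Proposal((1, 1, 2, 2), 0.2),
        Proposal((2, 2, 6, 6), 0.6),
    ]


def fake_refine(image: object, proposal: Proposal) -> Detection:
    x1, y1, x2, y2 = proposal.box
    mask = [[1] * (x2 - x1) for _ in range(y2 - y1)]  # all-ones placeholder
    return Detection("object", proposal.objectness, proposal.box, mask)


results = two_stage_detect(None, fake_propose, fake_refine, top_k=2)
```

Passing the two stages in as functions keeps the sketch independent of any particular framework; in a real model both stages share a convolutional backbone rather than running as separate programs.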

Mask R-CNN uses a small fully convolutional network (FCN) as its mask branch, which outputs a pixel-wise mask for each detected object. This capability makes it particularly useful for applications that require detailed image analysis, such as autonomous driving, medical imaging, and video surveillance.
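To make the pixel-wise output concrete, the sketch below turns a small grid of per-pixel mask probabilities into a binary mask placed inside the object's bounding box. It is a simplified illustration under stated assumptions: the function name `paste_mask` is hypothetical, the resize uses nearest-neighbour lookup to stay dependency-free (real implementations typically interpolate), and the 0.5 threshold is the binarization value commonly used with Mask R-CNN.

```python
from typing import List, Tuple


def paste_mask(
    soft_mask: List[List[float]],
    box: Tuple[int, int, int, int],
    image_size: Tuple[int, int],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Resize a low-resolution soft mask to its box and binarize it.

    box is (x1, y1, x2, y2) with exclusive x2/y2; image_size is (width, height).
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    m_h, m_w = len(soft_mask), len(soft_mask[0])
    box_w, box_h = x2 - x1, y2 - y1

    # Start from an all-background canvas the size of the image.
    canvas = [[0] * width for _ in range(height)]

    # Visit only the pixels of the box that fall inside the image.
    for y in range(max(y1, 0), min(y2, height)):
        for x in range(max(x1, 0), min(x2, width)):
            # Nearest-neighbour lookup into the low-resolution mask.
            src_y = min((y - y1) * m_h // box_h, m_h - 1)
            src_x = min((x - x1) * m_w // box_w, m_w - 1)
            if soft_mask[src_y][src_x] >= threshold:
                canvas[y][x] = 1
    return canvas


soft = [[0.9, 0.2],
        [0.8, 0.7]]
full_mask = paste_mask(soft, box=(1, 1, 5, 3), image_size=(6, 4))
for row in full_mask:
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 1, 0, 0, 0]
# [0, 1, 1, 1, 1, 0]
# [0, 0, 0, 0, 0, 0]
```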

One of the advantages of Mask R-CNN is its flexibility: it can be trained to recognize multiple object classes and can be adapted to tasks beyond traditional object detection, including human pose estimation, where each keypoint is predicted as its own mask, and, as a component of larger pipelines, tasks such as image captioning. Overall, Mask R-CNN represents a significant advance in computer vision, enabling machines to interpret visual data at the level of individual object instances.