RoI Align (Region of Interest Align) is a technique used in deep learning for object detection, particularly in models such as Mask R-CNN. It addresses a problem in earlier object detection frameworks, where each region of interest (RoI) is mapped onto the feature map with quantized (rounded) coordinates. This rounding misaligns the extracted features with the actual object in the input image, which lowers accuracy, especially for tasks that need precise localization.
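To make the quantization error concrete, the sketch below assumes a backbone with a total stride of 16 (a common choice, not a value stated above) and a layer that rounds each feature-map coordinate to the nearest integer, as RoI Pooling does. The helper `quantization_error_px` is hypothetical and only illustrates the arithmetic.

```python
# Hypothetical illustration of RoI Pooling-style coordinate quantization.
# STRIDE = 16 is an assumed backbone stride, not a value from the text.
STRIDE = 16


def quantization_error_px(coord_px: float, stride: int = STRIDE) -> float:
    """Image-space shift (in pixels) caused by rounding a box coordinate
    to the nearest feature-map cell."""
    exact = coord_px / stride          # continuous feature-map coordinate
    rounded = round(exact)             # what a quantizing RoI layer would use
    return (rounded - exact) * stride  # error mapped back to input pixels


for x in (250.0, 100.0, 23.0):
    print(f"x = {x:5.1f} px -> feature {x / STRIDE:.4f}, shift {quantization_error_px(x):+.1f} px")
```

For example, a box edge at x = 250 px maps to 15.625 on the feature map; rounding it to 16 moves it by 0.375 cells, which is 6 px in the input image.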
The primary goal of RoI Align is to make the features extracted for each RoI correspond more closely to that RoI's true location in the input image. Earlier methods such as RoI Pooling round the RoI coordinates, and the boundaries of the bins the RoI is divided into, to the integer grid of the feature map. RoI Align performs no rounding; instead, it uses bilinear interpolation to compute feature values at non-integer coordinates, which gives a more precise mapping of each RoI onto the underlying feature map.
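The interpolation step can be sketched as follows for a single-channel feature map stored as nested Python lists. This is a minimal illustration, not the code of any particular library; the border rule (clamping coordinates into the map) is an assumption, and real implementations differ on this detail.

```python
from typing import List

FeatureMap = List[List[float]]  # feature[y][x]: one channel, values at integer grid points


def bilinear_sample(feature: FeatureMap, y: float, x: float) -> float:
    """Interpolate the feature map at a non-integer location (y, x).

    Coordinates are clamped into the map (an assumed border rule); the result
    is a weighted average of the four surrounding grid values."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)

    y0, x0 = int(y), int(x)                     # top-left neighbor (floor, since y, x >= 0)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0                     # fractional offsets in [0, 1)

    # Interpolate along x on the two neighboring rows, then along y.
    top = (1 - dx) * feature[y0][x0] + dx * feature[y0][x1]
    bottom = (1 - dx) * feature[y1][x0] + dx * feature[y1][x1]
    return (1 - dy) * top + dy * bottom


feature = [[0.0, 10.0], [20.0, 30.0]]
print(bilinear_sample(feature, 0.5, 0.5))  # 15.0: the mean of all four values
```

Each neighbor's weight grows as the sample point gets closer to it, and at integer coordinates the function returns the grid value unchanged, so no information is lost to rounding.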
In a typical pipeline, RoI Align works as follows. The convolutional neural network (CNN) backbone generates feature maps from the input image, and the RoIs are defined by candidate bounding boxes (in Mask R-CNN, the proposals from the region proposal network). RoI Align scales each RoI to feature-map coordinates, divides it into a fixed grid of bins, samples the feature map at regularly spaced points in each bin using bilinear interpolation, and aggregates the samples (by averaging or taking the maximum) into one value per bin. The result is a fixed-size feature map for every RoI, regardless of the RoI's size, which can then be fed into subsequent layers of the model for classification, bounding-box regression, or segmentation.
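The workflow above can be sketched end to end for one RoI on a single-channel feature map. The function names, the `(x1, y1, x2, y2)` box layout, the fixed `sampling_ratio`, the choice of averaging, and the border clamping are assumptions made for clarity; treat this as an illustration rather than a reference implementation.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # feature[y][x]: one channel, values at integer grid points


def bilinear_sample(feature: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate at (y, x), clamping to the map border (assumed rule)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0][x0] + dx * feature[y0][x1]
    bottom = (1 - dx) * feature[y1][x0] + dx * feature[y1][x1]
    return (1 - dy) * top + dy * bottom


def roi_align(
    feature: FeatureMap,
    box: Tuple[float, float, float, float],  # (x1, y1, x2, y2) in input-image pixels
    output_size: Tuple[int, int] = (2, 2),   # (out_h, out_w) bins
    spatial_scale: float = 1.0,              # e.g. 1/16 for an assumed stride-16 backbone
    sampling_ratio: int = 2,                 # samples per bin along each axis
) -> List[List[float]]:
    """Return an out_h x out_w grid of pooled values for one RoI.

    No coordinate is rounded: the box is scaled to feature-map coordinates,
    split into equal (possibly fractional) bins, and each bin value is the
    average of sampling_ratio**2 bilinear samples."""
    x1, y1, x2, y2 = (c * spatial_scale for c in box)
    out_h, out_w = output_size
    roi_h = max(y2 - y1, 1.0)  # guard against degenerate boxes
    roi_w = max(x2 - x1, 1.0)
    bin_h, bin_w = roi_h / out_h, roi_w / out_w

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for sy in range(sampling_ratio):
                # Sample points are evenly spaced inside the bin, never on its edges.
                y = y1 + i * bin_h + (sy + 0.5) * bin_h / sampling_ratio
                for sx in range(sampling_ratio):
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / sampling_ratio
                    total += bilinear_sample(feature, y, x)
            row.append(total / sampling_ratio ** 2)
        output.append(row)
    return output


feature = [[10.0 * y + x for x in range(4)] for y in range(4)]
print(roi_align(feature, (0.0, 0.0, 2.0, 2.0)))  # [[5.5, 6.5], [15.5, 16.5]]
```

Libraries provide batched, multi-channel versions of this operation; for instance, `torchvision.ops.roi_align` takes `output_size`, `spatial_scale`, `sampling_ratio`, and an `aligned` flag that shifts box coordinates by half a pixel. The sketch follows the unshifted convention and handles samples near the border more simply than such libraries do.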
By improving the precision of feature extraction, RoI Align improves the performance of downstream tasks, most noticeably pixel-level tasks such as instance segmentation, and it has become a standard component of modern object detection and segmentation models.