RoI Align, or Region of Interest Align, is a technique used in deep learning for object detection and instance segmentation, particularly in models such as Mask R-CNN, where it was introduced. It addresses a problem in earlier detectors that use RoI Pooling: when regions of interest (RoIs) are mapped onto the feature map, their coordinates are quantized (rounded) to the discrete feature-map grid. This quantization misaligns the extracted features with the actual RoI in the input image, which lowers accuracy, especially for outputs that must be pixel-accurate.
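To make the size of this error concrete, here is a small arithmetic sketch. It assumes a backbone stride of 16 (one feature-map cell covers 16 input pixels, chosen purely for illustration) and RoI Pooling-style rounding to the nearest grid index; the function name `quantization_error_px` is hypothetical.

```python
STRIDE = 16  # assumed backbone stride: one feature-map cell covers 16 input pixels


def quantization_error_px(x_img: float, stride: int = STRIDE) -> float:
    """Offset, in input-image pixels, caused by rounding an RoI coordinate to the feature grid."""
    x_feat = x_img / stride              # exact, continuous feature-map coordinate
    x_quant = round(x_feat)              # RoI Pooling-style rounding to the nearest grid index
    return (x_quant - x_feat) * stride   # map the error back to input-image pixels


for x in (100.0, 107.0, 118.0):
    print(f"{x:6.1f} px -> {x / STRIDE:.4f} -> {round(x / STRIDE)} "
          f"(offset {quantization_error_px(x):+.1f} px)")
```

For example, an RoI edge at 107 px lands at 6.6875 on the feature map and is rounded to 7, which shifts it by 5 px in the input image. Offsets of this size are small for a coarse classification decision but large relative to an object boundary.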
The primary goal of RoI Align is to make the features extracted for each RoI correspond more closely to the RoI's location in the original input image. Unlike RoI Pooling, which rounds the RoI coordinates (and the boundaries of the bins it pools over) to the feature-map grid, RoI Align keeps the coordinates as real numbers and uses bilinear interpolation to compute feature values at non-integer locations. This gives a more precise mapping from each RoI to the underlying feature map.
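The difference between rounding and bilinear interpolation can be seen on a tiny single-channel feature map. This is a minimal sketch, assuming the value `feat[i][j]` sits at integer coordinate `(i, j)` and that the query point lies inside the map; the helper name `bilinear` is illustrative, not a library function.

```python
import math
from typing import List


def bilinear(feat: List[List[float]], y: float, x: float) -> float:
    """Bilinearly interpolate feat at real-valued (y, x), assuming 0 <= y <= h-1 and 0 <= x <= w-1."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)        # top-left neighbour
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # bottom-right neighbour, kept in bounds
    dy, dx = y - y0, x - x0                      # fractional offsets act as weights
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


feat = [[0.0, 1.0],
        [2.0, 3.0]]
y, x = 0.75, 0.25                    # a non-integer sampling location
nearest = feat[round(y)][round(x)]   # rounding, as RoI Pooling does: picks feat[1][0] = 2.0
interp = bilinear(feat, y, x)        # weighted blend of the four neighbours: 1.75
print(nearest, interp)
```

Rounding snaps the query to a single cell and discards the sub-cell position, while interpolation weights all four neighbouring cells by their distance to the exact point.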
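In a typical workflow, RoI Align works as follows. A convolutional neural network (CNN) backbone first computes feature maps from the input image. The RoIs are candidate bounding boxes for objects, typically region proposals (in Mask R-CNN, produced by a region proposal network), expressed in input-image coordinates. RoI Align scales each RoI to feature-map coordinates without rounding, divides it into a fixed grid of bins, samples a few regularly spaced points in each bin using bilinear interpolation, and aggregates those samples (by max or average). The result is a fixed-size feature map for each RoI, whatever the RoI's size, which subsequent layers of the model use for classification, bounding-box regression, or mask prediction.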
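The steps above can be sketched in plain Python for a single channel. This is an illustrative sketch, not a production implementation: it assumes `feat[i][j]` sits at integer coordinate `(i, j)`, clamps samples to the map, averages the samples in each bin, and uses hypothetical names (`roi_align`, `bilinear`). The parameter names `output_size`, `spatial_scale`, and `sampling_ratio` echo those of `torchvision.ops.roi_align`, but this code is not that function, and real implementations differ in boundary handling and pixel-centre conventions.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # one channel, indexed feat[y][x]


def bilinear(feat: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate feat at real-valued (y, x); feat[i][j] is assumed to sit at (i, j)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the map (simplified boundary handling).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(
    feat: FeatureMap,
    roi: Sequence[float],
    output_size: Tuple[int, int] = (2, 2),
    spatial_scale: float = 1.0,
    sampling_ratio: int = 2,
) -> FeatureMap:
    """RoI Align on one channel; roi = (x1, y1, x2, y2) in input-image coordinates."""
    # Scale to feature-map coordinates WITHOUT rounding (the key difference from RoI Pooling).
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)
    out_h, out_w = output_size
    # Bin sizes stay fractional; tiny boxes are widened to 1 cell (a simplification).
    bin_h = max(y2 - y1, 1.0) / out_h
    bin_w = max(x2 - x1, 1.0) / out_w
    n = sampling_ratio
    out: FeatureMap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # n x n regularly spaced sample points inside bin (i, j).
            for sy in range(n):
                for sx in range(n):
                    y = y1 + (i + (sy + 0.5) / n) * bin_h
                    x = x1 + (j + (sx + 0.5) / n) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / (n * n))  # average the samples in this bin
        out.append(row)
    return out


# Usage: a 4x4 map with value 10*y + x, and a 32x32-pixel RoI at an assumed stride of 16.
feat = [[10.0 * y + x for x in range(4)] for y in range(4)]
out = roi_align(feat, (0, 0, 32, 32), output_size=(2, 2), spatial_scale=1 / 16)
print(out)  # approximately [[5.5, 6.5], [15.5, 16.5]]
```

Because this example map is linear, the average of the samples in each bin equals the map's value at the bin centre, which makes the output easy to check by hand. With `sampling_ratio=2`, each bin is sampled at four points, and any RoI, whatever its size, yields the same `output_size` grid.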
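By improving the precision of feature extraction, RoI Align improves the performance of downstream tasks, with the largest gains on tasks that need precise spatial alignment, such as predicting instance segmentation masks. This makes it a widely used component of modern object detection and segmentation models.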