AI Glossary: What Is the Bag-of-Words (BoW) Model in Computer Vision? Definition & Meaning

Bag-of-Words Model in Computer Vision

The bag-of-words (BoW) model in computer vision is a technique for representing images as collections of visual features. Inspired by the traditional bag-of-words model in natural language processing, it treats local visual elements of an image as ‘words’ that can be counted, analyzed, and classified.

In the BoW model, images are first processed to extract local visual features that capture properties such as edges, colors, or textures. These features are typically computed as descriptors of small regions, either around detected ‘keypoints’ or over densely sampled ‘patches.’ Each descriptor is then quantized against a visual vocabulary, which is essentially a dictionary of visual words. The vocabulary is built by clustering descriptors with an algorithm such as K-means, so that similar features are grouped together and each cluster center serves as one visual word.
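The vocabulary-building step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the 2-D toy descriptors, the function names (`build_vocabulary`, `nearest`), and the evenly spaced initialization are all assumptions made for the example. Real descriptors usually have many more dimensions, and practical K-means implementations typically use random or k-means++ initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, descriptor):
    """Index of the centroid (visual word) closest to the descriptor."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], descriptor))


def build_vocabulary(descriptors, k, iterations=20):
    """Cluster descriptors into k visual words with plain K-means.

    Assumption for this sketch: centroids start at evenly spaced
    descriptors so the result is deterministic.
    """
    step = len(descriptors) // k
    centroids = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each descriptor to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical toy descriptors pooled from several images:
# two well-separated groups, so two visual words are enough.
descriptors = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
               (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
vocabulary = build_vocabulary(descriptors, k=2)
print(vocabulary)
```

Each entry of `vocabulary` is one cluster center, that is, one visual word. Quantizing a new descriptor then amounts to calling `nearest(vocabulary, descriptor)`.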

Once a visual vocabulary is established, each image can be represented as a histogram that counts how often each visual word occurs in the image. This histogram serves as a compact representation of the image, allowing for easier comparison and classification of images based on their content. For instance, two images might be judged similar if they contain many of the same visual words, even if the images themselves look different at a glance.
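As a rough illustration of the histogram step, the sketch below assigns each descriptor of an image to its nearest visual word and compares the resulting histograms. The three-word vocabulary, the toy descriptors, and the function names are hypothetical. Cosine similarity is only one possible choice of comparison measure, and normalizing the counts is an assumption made here so that images with different numbers of descriptors remain comparable.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, vocabulary):
    """Normalized count of how often each visual word occurs in an image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Quantize: pick the index of the closest visual word.
        word = min(range(len(vocabulary)),
                   key=lambda i: squared_distance(vocabulary[i], d))
        counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0


# Hypothetical vocabulary of three visual words (cluster centers).
vocabulary = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

# Hypothetical descriptors extracted from three images.
image_a = [(0.1, 0.2), (0.3, -0.1), (4.8, 5.1), (5.2, 4.9)]
image_b = [(-0.2, 0.1), (5.1, 5.3), (4.9, 4.7), (0.2, 0.0)]
image_c = [(0.1, 4.9), (-0.1, 5.2), (0.2, 5.1), (0.0, 0.1)]

hist_a = bow_histogram(image_a, vocabulary)
hist_b = bow_histogram(image_b, vocabulary)
hist_c = bow_histogram(image_c, vocabulary)

sim_ab = cosine_similarity(hist_a, hist_b)
sim_ac = cosine_similarity(hist_a, hist_c)
print(hist_a, hist_b, hist_c)
print(sim_ab, sim_ac)
```

Images A and B use the same visual words in the same proportions, so their histograms match even though no individual descriptor is identical and the order of the descriptors differs. Image C is dominated by a different word and scores lower against A.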

The BoW model is widely used in a variety of computer vision applications, including image classification, object recognition, and scene understanding. Although it simplifies the analysis by ignoring spatial relationships between features, it can still provide effective results in many scenarios. Advances in deep learning have led to more sophisticated models, but the BoW approach remains a foundational concept in computer vision.