AI Glossary: What Is Cross-Modal Retrieval (CMR)? Definition & Meaning

Cross-Modal Retrieval refers to the ability of an artificial intelligence system to retrieve information across different modalities, or types of data. For example, it lets users search for images with text queries, or find relevant text documents from a visual input. This technique is particularly useful when information is stored in several formats, such as multimedia databases that contain both images and descriptive text.

The underlying technology usually relies on machine learning algorithms, particularly deep learning, which can learn to associate features from different modalities. For instance, a neural network might be trained to recognize visual features in images that correspond to specific keywords or phrases in text. Once images and text are both mapped into a shared semantic space, the system can compare items directly and use a query in one modality to retrieve results in another.
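The shared-space idea can be illustrated with a minimal sketch. The example makes several assumptions. The projection matrices `IMAGE_PROJ` and `TEXT_PROJ`, the feature vectors, and the file and caption names are all hypothetical and hand-picked for readability. In a real system the features would come from trained image and text encoders and the projections would be learned. Ranking by cosine similarity is one common choice rather than the only option. Each modality is projected into the same 2-dimensional space, and then a query from either modality is compared against items from the other.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical projections (a real system would learn these during training).
# Image features are 3-dimensional, text features are 4-dimensional, and both
# are mapped into a shared 2-dimensional semantic space.
IMAGE_PROJ: Matrix = [[1.0, 0.0, 0.5],
                      [0.0, 1.0, 0.5]]
TEXT_PROJ: Matrix = [[1.0, 0.0, 0.0, 0.2],
                     [0.0, 1.0, 0.0, 0.2]]


def project(features: Sequence[float], weights: Matrix) -> Vector:
    """Map a modality-specific feature vector into the shared space (y = W x)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Vector, gallery: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Rank gallery items (already in the shared space) by similarity to the query."""
    scored = [(item_id, cosine(query, emb)) for item_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical raw features, as if produced by separate image and text encoders.
IMAGE_FEATURES: Dict[str, Vector] = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.8, 0.2],
    "city.jpg": [0.5, 0.5, 0.0],
}
CAPTION_FEATURES: Dict[str, Vector] = {
    "a sandy beach at noon": [0.9, 0.0, 0.1, 0.1],
    "dense green forest": [0.0, 0.9, 0.0, 0.1],
}

# Index both modalities in the same shared space.
image_index = {name: project(f, IMAGE_PROJ) for name, f in IMAGE_FEATURES.items()}
caption_index = {text: project(f, TEXT_PROJ) for text, f in CAPTION_FEATURES.items()}

if __name__ == "__main__":
    # Text -> image: hypothetical features for the query "sunny beach".
    text_query = project([0.8, 0.0, 0.3, 0.0], TEXT_PROJ)
    print(retrieve(text_query, image_index))
    # Image -> text: use forest.jpg itself as the visual query.
    print(retrieve(image_index["forest.jpg"], caption_index, k=1))
```

With these toy numbers, the text query ranks `beach.jpg` first, then `city.jpg`, then `forest.jpg`. The image query `forest.jpg` retrieves the caption "dense green forest". Cosine similarity is used so that ranking depends on direction in the shared space rather than vector length. Systems that L2-normalize their embeddings often use a plain dot product instead, which gives the same ranking.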

Cross-Modal Retrieval has a wide range of applications. In e-commerce, users can search for products by submitting an image, and in digital libraries, researchers can find articles related to particular figures or diagrams. As AI models continue to improve, cross-modal retrieval systems are expected to become more accurate, enabling more intuitive and efficient ways to access and discover information across diverse sources.