AI Glossary: Computer Vision Terms & Definitions

3D Vision

3DV

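3D vision is the area of computer vision concerned with recovering the three-dimensional structure of a scene, such as depth, shape, and camera pose, from visual data, for example through stereo matching, structure from motion, or depth sensors.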
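One basic 3D vision computation is recovering depth from a rectified stereo pair. There, depth Z is inversely proportional to disparity d: Z = f * B / d, where f is the focal length in pixels and B is the baseline between the two cameras. The sketch below assumes a pinhole camera model, and the camera values are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth (metres) of a point seen in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, 12 cm baseline between the two cameras.
for d in (84.0, 42.0, 21.0):
    print(d, depth_from_disparity(d, focal_px=700.0, baseline_m=0.12))
```

Halving the disparity doubles the estimated depth. For this reason, a one-pixel matching error matters much more for distant objects than for near ones.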

Action Recognition

AR

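Action recognition is the task of identifying which human actions (for example, walking, jumping, or waving) occur in video, typically with models that capture both the appearance of individual frames and the motion across frames.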
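Many action recognition models are spatiotemporal networks, such as 3D CNNs or video transformers. A simpler baseline classifies individual frames and fuses the results. The sketch below shows only this late-fusion step. The per-frame scores are hypothetical and would normally come from an image classifier.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_scores, labels):
    """Late fusion: average per-frame class probabilities, then pick the top class."""
    probs = [softmax(s) for s in frame_scores]
    n = len(probs)
    mean = [sum(p[c] for p in probs) / n for c in range(len(labels))]
    best = max(range(len(labels)), key=lambda c: mean[c])
    return labels[best], mean


labels = ["walking", "jumping", "waving"]
# Hypothetical per-frame logits for a 4-frame clip.
frame_scores = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [0.3, 2.5, 0.1], [2.2, 0.4, 0.3]]
label, probs = classify_clip(frame_scores, labels)
print(label, [round(p, 3) for p in probs])
```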

Albumentations

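Albumentations is an open-source Python library for image augmentation in deep learning. It applies randomized transformations such as flips, crops, rotations, and color changes to training images (and to their masks, bounding boxes, or keypoints), which increases data diversity and helps models generalize.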

AlphaPose

AP

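AlphaPose is an open-source framework for real-time, multi-person human pose estimation. It first detects each person and then uses deep neural networks to estimate the positions of that person's body keypoints.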

Anchor Box

AB

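An anchor box is one of a set of predefined bounding boxes with fixed scales and aspect ratios, tiled across the image. For each anchor, an object detector predicts whether it contains an object and how to adjust the anchor to fit that object.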
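A common way to build anchors is to enumerate a few scales and aspect ratios at each position of a feature map. This sketch makes two assumptions: aspect ratio means height divided by width, and each anchor keeps an area of scale squared. The stride, scales, and ratios are illustrative.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).
    ratio = height / width; each anchor has area scale**2."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def grid_anchors(feat_h, feat_w, stride, scales, ratios):
    """Tile anchors over every cell of a feat_h x feat_w feature map."""
    out = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell center in pixels
            out.extend(make_anchors(cx, cy, scales, ratios))
    return out


anchors = make_anchors(16, 16, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 6
print(anchors[1])    # (0.0, 0.0, 32.0, 32.0)
```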

Anchor Box Regression

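Anchor box regression is the step in anchor-based object detection where the model predicts offsets that turn each anchor box into a tighter bounding box around the detected object. The offsets shift the box center and change its width and height.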
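A widely used parameterization from the R-CNN family of detectors has two parts. The center shift is scaled by the anchor size, and the width and height changes are expressed as log-ratios. The sketch below encodes and decodes boxes in (center x, center y, width, height) form. Real detectors often also normalize these deltas by fixed means or variances, which is omitted here.

```python
import math


def encode(box, anchor):
    """Regression targets (tx, ty, tw, th) that map an anchor onto a box.
    Both are (cx, cy, w, h)."""
    cx, cy, w, h = box
    acx, acy, aw, ah = anchor
    return ((cx - acx) / aw, (cy - acy) / ah, math.log(w / aw), math.log(h / ah))


def decode(deltas, anchor):
    """Apply predicted deltas to an anchor to get a refined (cx, cy, w, h) box."""
    tx, ty, tw, th = deltas
    acx, acy, aw, ah = anchor
    return (acx + tx * aw, acy + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (50.0, 50.0, 32.0, 32.0)
gt_box = (54.0, 46.0, 40.0, 24.0)
t = encode(gt_box, anchor)
print(t)
print(decode(t, anchor))  # recovers gt_box up to floating-point rounding
```

Using log-ratios for width and height keeps the targets symmetric for growing and shrinking a box. It also ensures that decoded sizes are always positive.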

ArcFace

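ArcFace (Additive Angular Margin Loss) is a loss function for face recognition. It normalizes face embeddings and class weights onto a hypersphere, then adds an angular margin to the angle between each embedding and its correct class, which makes the features of different identities more separable.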
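ArcFace can be viewed as a change to the logits that feed a standard softmax cross-entropy loss. Embeddings and class weights are L2-normalized, so each logit is the cosine of an angle. For the correct class only, a margin m is added to that angle, and every logit is then scaled by s. The values s = 64 and m = 0.5 are commonly used settings and are assumptions here. Practical implementations also handle the case where the angle plus margin exceeds pi; this sketch omits that.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def arcface_logits(embedding, class_weights, target, s=64.0, m=0.5):
    """Scaled cosine logits with an additive angular margin on the target class."""
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding error
        if j == target:
            cos_t = math.cos(math.acos(cos_t) + m)  # add margin m in angle space
        logits.append(s * cos_t)
    return logits


# Without the margin, a perfectly aligned embedding would get a logit of s = 64.
logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
print(logits)
```

Because the target logit is penalized, the network must pull embeddings closer to their class center than plain softmax would require. This is the source of the tighter clusters per identity.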

Atrous Convolution

AC

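Atrous (dilated) convolution spaces out the filter taps by a dilation rate. This enlarges the receptive field without adding parameters or reducing resolution. Applying several rates in parallel, as in DeepLab's atrous spatial pyramid pooling, captures multi-scale features.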
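The effect of the dilation rate is easiest to see in one dimension. With rate r, a kernel of size k reads inputs r positions apart and covers (k - 1) * r + 1 inputs, but it still has only k weights. The sketch below is a "valid" 1D dilated convolution, implemented as cross-correlation as deep learning frameworks do. The input and kernel are illustrative.

```python
def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1D dilated convolution (cross-correlation, as in deep learning)."""
    span = (len(kernel) - 1) * rate + 1  # effective kernel size
    out = []
    for i in range(len(signal) - span + 1):
        out.append(sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel))))
    return out


x = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(x, [1, 0, -1], rate=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(x, [1, 0, -1], rate=2))  # [-4, -4, -4]
```

The same three weights compare neighbors two apart at rate 1 and four apart at rate 2. The output is shorter here only because no padding is used; with suitable padding the resolution is preserved.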

Attention Mechanism

AM

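An attention mechanism lets a model weight parts of its input by relevance. It scores each input element against a query, converts the scores into weights, and returns a weighted combination. This improves performance in tasks such as machine translation and image recognition.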
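The form used in Transformers is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. Each query is compared with every key, the scores become weights through a softmax, and the output is the weighted sum of the values. This sketch works on small lists of vectors in pure Python. The keys, values, and query are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V for lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)  # one weight per key, summing to 1
        out = [sum(w[i] * values[i][j] for i in range(len(values)))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, weights = scaled_dot_product_attention([[4.0, -2.0]], keys, values)
print([round(w, 3) for w in weights[0]], [round(v, 2) for v in out[0]])
```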

Attention Pooling

AP

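Attention pooling summarizes a set of feature vectors (for example, the spatial positions of a feature map) into one vector by taking a weighted average. The weights are computed from the features themselves, so more relevant features contribute more.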
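One simple form of attention pooling scores each feature vector with a learned vector w, normalizes the scores with a softmax, and returns the weighted average. The sketch below assumes that form; other variants use a learned query with multi-head attention instead. The scoring vector and features are hypothetical.

```python
import math


def attention_pool(features, w):
    """Pool N feature vectors into one using weights softmax(w . h_i).
    `w` stands in for a learned scoring vector (hypothetical values here)."""
    scores = [sum(a * b for a, b in zip(w, h)) for h in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # attention weights, summing to 1
    dim = len(features[0])
    pooled = [sum(alphas[i] * features[i][j] for i in range(len(features)))
              for j in range(dim)]
    return pooled, alphas


# Four spatial positions of a feature map, each a 3-dim feature vector.
features = [[0.1, 0.0, 0.2], [2.0, 1.5, 0.1], [0.0, 0.1, 0.0], [0.2, 0.3, 0.1]]
pooled, alphas = attention_pool(features, w=[1.0, 1.0, 0.0])
print([round(a, 3) for a in alphas], [round(p, 3) for p in pooled])
```

Setting w to all zeros gives every position the same weight, which recovers plain average pooling. Learning w lets the pooling favor informative positions.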

Average Pooling

Avg Pool

Average pooling downsamples a feature map by replacing each window (for example, 2x2 with stride 2) with the mean of the values inside it.
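The sketch below applies 2D average pooling to a single-channel feature map. It assumes a square window and no padding. The 4x4 input is illustrative.

```python
def avg_pool2d(fmap, k=2, stride=2):
    """Average pooling over k x k windows of a 2D list, without padding."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            window = [fmap[i + a][j + b] for a in range(k) for b in range(k)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [0, 2, 1, 1],
        [2, 4, 3, 3]]
print(avg_pool2d(fmap))  # [[4.0, 5.0], [2.0, 2.0]]
```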

Bag-of-words model in computer vision

BoW

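The bag-of-visual-words model represents an image as a histogram of "visual words". These words are cluster centers (typically from k-means) of local descriptors such as SIFT, and the model ignores where in the image the features occur. The histogram can then be used for classification or retrieval.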
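Once a codebook of visual words exists, encoding an image means assigning each local descriptor to its nearest codeword and counting the assignments. This sketch assumes the codebook has already been learned, for example with k-means. The 2D descriptors are hypothetical stand-ins for real descriptors such as 128-dimensional SIFT vectors.

```python
def nearest(desc, codebook):
    """Index of the codeword closest to `desc` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    n = len(descriptors)
    return [c / n for c in hist]


codebook = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]  # hypothetical 3-word codebook
descriptors = [[0.5, 0.2], [4.8, 5.1], [5.2, 4.9], [9.7, 0.3]]
print(bow_histogram(descriptors, codebook))  # [0.25, 0.5, 0.25]
```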

BLIP

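BLIP (Bootstrapping Language-Image Pre-training) is a vision-language model from Salesforce Research. It is pre-trained on image-text pairs and supports both understanding and generation tasks, such as image-text retrieval, image captioning, and visual question answering.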

Blob Detection

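Blob detection identifies image regions that differ from their surroundings in properties such as brightness or color. Classic methods include the Laplacian of Gaussian (LoG), the Difference of Gaussians (DoG), and the determinant of the Hessian.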
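LoG and DoG detectors find blobs at multiple scales. A much simpler approach, shown here only for illustration, thresholds the image and groups the bright pixels into connected regions. The sketch runs on a small hypothetical grayscale image and assumes 4-connectivity.

```python
from collections import deque


def find_blobs(image, threshold):
    """Label 4-connected regions of pixels brighter than `threshold`.
    Returns a list of blobs, each a list of (row, col) pixels."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if image[r][c] > threshold and not seen[r][c]:
                blob, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0, 0],
    [0, 7, 9, 0, 0, 6],
    [0, 0, 0, 0, 7, 8],
    [0, 0, 0, 0, 0, 0],
]
for blob in find_blobs(image, threshold=5):
    ys = [p[0] for p in blob]
    xs = [p[1] for p in blob]
    print(len(blob), "pixels, centroid", (sum(ys) / len(ys), sum(xs) / len(xs)))
```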

Boundary Detection

BD

Boundary detection locates the edges or contours that separate objects or regions in an image. It is a key step in object recognition and image segmentation.
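Classic boundary detectors start from image gradients, because pixels where intensity changes sharply are boundary candidates. The sketch below uses the Sobel operator to compute gradient magnitude on a small hypothetical image. Full edge detectors such as Canny add smoothing, non-maximum suppression, and hysteresis thresholding. Learned boundary detectors replace the hand-designed filters altogether.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude at interior pixels (border pixels are left at 0)."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            mag[r][c] = math.hypot(gx, gy)
    return mag


# Dark region on the left, bright region on the right: one vertical boundary.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
for row in sobel_magnitude(image):
    print(row)
```

The magnitude is nonzero only in the two columns that straddle the step from 0 to 10. In a full detector, thinning and thresholding would reduce that band to a single-pixel boundary.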

Bounding Box Coordinates

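Bounding box coordinates specify the location and size of a box enclosing an object. In 2D they are commonly given as corner coordinates (x_min, y_min, x_max, y_max) or as a center point with a width and height. 3D bounding boxes add depth, a third size dimension, and often an orientation angle.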
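Two operations come up constantly with 2D boxes: converting between the corner and center formats, and computing intersection over union (IoU), the standard measure of overlap between boxes. The sketch below assumes x_min < x_max and y_min < y_max. The boxes are illustrative, and 3D boxes are not covered.

```python
def xyxy_to_xywh(box):
    """(x_min, y_min, x_max, y_max) -> (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def xywh_to_xyxy(box):
    """(x_center, y_center, width, height) -> (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two boxes in (x_min, y_min, x_max, y_max) form."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


box = (10, 20, 50, 80)
print(xyxy_to_xywh(box))           # (30.0, 50.0, 40, 60)
print(iou(box, (30, 20, 70, 80)))  # 0.3333333333333333
```

Some datasets use yet another convention. COCO annotations, for example, store boxes as the top-left corner plus width and height, so check the format before converting.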

Capsule Network Routing

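Capsule network routing is the procedure that decides how strongly each lower-level capsule sends its output to each higher-level capsule. In dynamic routing by agreement, this coupling is increased iteratively whenever a lower capsule's prediction agrees with the higher capsule's output.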
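The routing-by-agreement procedure from the original CapsNet paper has four steps per iteration:

1. Coupling coefficients are computed as a softmax over routing logits.
2. Each higher capsule's input is the coupling-weighted sum of the prediction vectors.
3. That input is "squashed" to a length below 1.
4. Each logit grows by the dot product between the prediction and the output.

In a real network the prediction vectors u_hat come from learned transformation matrices; here they are hypothetical inputs. Three iterations is a commonly used setting.

```python
import math


def squash(s, eps=1e-9):
    """Shrink vector length into [0, 1) while keeping its direction."""
    norm2 = sum(x * x for x in s)
    scale = norm2 / (1.0 + norm2) / math.sqrt(norm2 + eps)
    return [scale * x for x in s]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's prediction for higher capsule j.
    Returns higher-capsule outputs v[j] and coupling coefficients c[i][j]."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for _ in range(iterations):
        c = []
        for i in range(n_in):  # softmax over higher capsules, per lower capsule
            m = max(b[i])
            e = [math.exp(x - m) for x in b[i]]
            t = sum(e)
            c.append([x / t for x in e])
        s = [[sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
             for j in range(n_out)]
        v = [squash(s_j) for s_j in s]
        for i in range(n_in):  # raise logits where the prediction agrees with the output
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c


# Predictions agree for higher capsule 0 but disagree for higher capsule 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
v, c = dynamic_routing(u_hat)
print([round(math.hypot(*vj), 3) for vj in v])
```

The capsule whose inputs agree ends up with the longer output vector, which signals a more confident detection. Lower capsules also shift their coupling toward it.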

Capsule neural network

CapsNet

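A capsule neural network is an architecture in which groups of neurons, called capsules, output vectors rather than scalars. The length of a capsule's vector represents the probability that an entity is present, and its orientation encodes properties such as pose. This helps the network model part-whole spatial hierarchies.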

Cascade R-CNN

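Cascade R-CNN is a multi-stage object detection framework in which a sequence of detection heads, each trained with a higher IoU threshold than the last, progressively refines the bounding boxes. This produces more accurate localization than a single detection head.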

CenterNet

CT

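CenterNet is an anchor-free object detection framework that represents each object as the center point of its bounding box. It predicts a heatmap of center points and regresses the box size at each peak, which removes the need for anchor boxes and non-maximum suppression.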
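At inference time, CenterNet-style decoding finds local peaks in the predicted center heatmap and reads the regressed box size at each peak. The 3x3 peak check below stands in for the 3x3 max pooling used for this purpose. This is a simplified single-class sketch with hypothetical heatmap and size values, and boxes are in heatmap coordinates. Real models also predict a sub-pixel offset and rescale from the downsampled output resolution to image coordinates; both steps are omitted.

```python
def decode_centers(heatmap, sizes, threshold=0.3):
    """Turn a center-point heatmap into (x1, y1, x2, y2, score) boxes.
    A cell is a detection if it reaches `threshold` and is the maximum of its
    3x3 neighborhood. sizes[r][c] holds the predicted (width, height)."""
    h, w = len(heatmap), len(heatmap[0])
    boxes = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbors = [heatmap[y][x]
                         for y in range(max(0, r - 1), min(h, r + 2))
                         for x in range(max(0, c - 1), min(w, c + 2))]
            if score == max(neighbors):  # local peak, so no separate NMS step
                bw, bh = sizes[r][c]
                boxes.append((c - bw / 2, r - bh / 2, c + bw / 2, r + bh / 2, score))
    return boxes


heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.6, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
sizes = [[(2.0, 2.0)] * 5 for _ in range(5)]
print(decode_centers(heatmap, sizes))
```

The cell scoring 0.6 passes the threshold but is suppressed because its neighbor scores 0.9. This is how the peak check replaces non-maximum suppression.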

CIFAR

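CIFAR refers to the CIFAR-10 and CIFAR-100 datasets of small 32x32 color images. They are named after the Canadian Institute for Advanced Research and are widely used as benchmarks for image classification.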

CIFAR-100 Dataset

The CIFAR-100 dataset contains 60,000 32x32 color images in 100 classes (600 per class), split into 50,000 training and 10,000 test images. The 100 classes are grouped into 20 superclasses.

Cityscapes Dataset

CS

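Cityscapes is a large-scale dataset of urban street scenes recorded in 50 cities. It provides pixel-level annotations for semantic and instance segmentation, with 5,000 finely annotated and 20,000 coarsely annotated images.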

Class Activation Map

CAM

Class Activation Maps (CAMs) are heatmaps that show which regions of an image contributed most to a CNN's prediction for a given class.

Class Activation Mapping

CAM

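Class Activation Mapping is the technique for producing CAMs. It applies to a CNN that ends with global average pooling and a linear classifier: the map for a class is the sum of the final convolutional feature maps, weighted by that class's classifier weights.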
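In the original formulation, the network ends with global average pooling followed by one linear layer. Computing a CAM for a class is then a single weighted sum of the last convolutional feature maps, using that class's classifier weights. The feature maps and weights below are hypothetical. In practice the map is then upsampled to the input resolution and normalized for display.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM for one class: sum over k of w_k * F_k, for K feature maps of equal size."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for wk, fmap in zip(class_weights, feature_maps):
        for r in range(h):
            for c in range(w):
                cam[r][c] += wk * fmap[r][c]
    return cam


# Two hypothetical 3x3 feature maps from the last convolutional layer.
F = [
    [[0, 0, 0], [0, 5, 1], [0, 1, 0]],  # channel that fires on the object
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # channel that responds everywhere
]
w_cat = [0.8, 0.1]  # hypothetical classifier weights for the class "cat"
cam = class_activation_map(F, w_cat)
for row in cam:
    print([round(x, 2) for x in row])
```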

CLIP

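CLIP (Contrastive Language-Image Pre-training) is an OpenAI model that trains an image encoder and a text encoder contrastively on image-caption pairs. As a result, matching images and texts lie close together in a shared embedding space, which enables tasks such as zero-shot image classification and image-text retrieval.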
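Zero-shot classification with CLIP-style encoders compares one image embedding with the embeddings of text prompts such as "a photo of a dog", then applies a softmax over the scaled cosine similarities. This sketch assumes the two encoders have already produced the embeddings. The 3-dimensional vectors are hypothetical, since real CLIP embeddings have hundreds of dimensions. The scale factor of 100.0 is also an assumption, because CLIP learns this temperature during training.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(image_emb, text_embs, labels, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt."""
    logits = [scale * cosine(image_emb, t) for t in text_embs]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical
image_emb = [0.9, 0.1, 0.0]                                      # hypothetical
label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label, [round(p, 4) for p in probs])
```

New classes can be added without retraining, simply by writing new prompts and embedding them with the text encoder.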

Co-Attention Mechanism

Co-Attention

A co-attention mechanism attends over two inputs jointly, for example image regions and question words in visual question answering, so that the attention over each input is guided by the other.

COCO

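COCO (Common Objects in Context) is a large-scale dataset from Microsoft for object detection, instance segmentation, keypoint detection, and image captioning. It contains objects from 80 categories shown in complex everyday scenes.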