Explore 300 AI terms in Computer Vision
3D Vision refers to the ability to perceive depth and distance in a three-dimensional space using visual information.
Action Recognition is the process of identifying specific actions in video data using AI techniques.
Albumentations is a Python library for image augmentation in deep learning, enhancing model training with diverse image transformations.
AlphaPose is a real-time human pose estimation framework using deep learning techniques.
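An anchor box is one of a set of predefined bounding boxes, with fixed sizes and aspect ratios and tiled across the image, that an object detector uses as reference shapes when locating objects.
Anchor Box Regression is the object detection step in which the model predicts offsets (center shifts and width/height scaling) that refine an anchor box so it fits the object more tightly.
ArcFace is a loss function for face recognition (Additive Angular Margin Loss) that adds an angular margin penalty between a face embedding and its class center, making the learned features more discriminative.
Atrous (dilated) convolution spaces a kernel's elements apart by a dilation rate, enlarging the receptive field without adding parameters or reducing resolution, which helps capture multi-scale context.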
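During training, anchors are typically matched to ground-truth objects by intersection over union (IoU). The sketch below assumes boxes are `(x_min, y_min, x_max, y_max)` tuples; `iou`, `match_anchors`, and the 0.5 threshold are illustrative choices, not taken from any specific detector.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Overlap width/height clamp to 0 when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, threshold=0.5):
    """For each anchor, return the index of its best-overlapping ground-truth
    box, or -1 if that best IoU is below the (hypothetical) threshold."""
    matches = []
    for anchor in anchors:
        best_idx, best_iou = -1, 0.0
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        matches.append(best_idx if best_iou >= threshold else -1)
    return matches


print(match_anchors([(0, 0, 10, 10), (20, 20, 30, 30)], [(1, 1, 11, 11)]))  # [0, -1]
```

Real detectors usually add further rules (for example, guaranteeing every ground-truth box at least one anchor), which this sketch omits.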
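A common parameterization, used in the R-CNN family of detectors, expresses the regression target as center offsets scaled by the anchor size plus log-ratios of width and height. This sketch assumes boxes in `(cx, cy, w, h)` form; `encode` and `decode` are illustrative names.

```python
import math


def encode(anchor, box):
    """Regression targets (tx, ty, tw, th) for `box` relative to `anchor`.
    Both are (cx, cy, w, h) tuples."""
    ax, ay, aw, ah = anchor
    bx, by, bw, bh = box
    return ((bx - ax) / aw, (by - ay) / ah, math.log(bw / aw), math.log(bh / ah))


def decode(anchor, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to an anchor to get a refined box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = deltas
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (50.0, 50.0, 20.0, 40.0)
target = (54.0, 46.0, 30.0, 40.0)
deltas = encode(anchor, target)   # what the network would be trained to predict
refined = decode(anchor, deltas)  # recovers the target box up to rounding
```

Predicting log-scale width/height changes keeps the decoded sizes positive and makes the targets roughly scale-invariant.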
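A minimal sketch of how ArcFace-style logits can be computed: the cosine between the normalized embedding and each normalized class weight gives an angle, the margin `m` is added to the angle of the true class only, and everything is scaled by `s`. The defaults `s=64.0` and `m=0.5` are typical values, assumed here; the helper names are hypothetical, and the edge case where the angle plus margin exceeds pi is not handled.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """Cosine logits with an additive angular margin on the target class."""
    e = normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        theta = math.acos(cos_theta)
        if j == label:
            theta += m  # penalize the target angle, pushing features closer to their center
        logits.append(s * math.cos(theta))
    return logits


def softmax_cross_entropy(logits, label):
    """Standard cross-entropy over the (margin-adjusted) logits."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    return -math.log(exps[label] / sum(exps))
```

Because the margin lowers the target logit, the loss stays higher until embeddings sit at a clearly smaller angle to their own class center than to others.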
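A one-dimensional sketch makes the effect of the dilation rate visible: the same three weights cover three samples at dilation 1 and five samples at dilation 2. Like most deep-learning libraries, it computes cross-correlation with no padding; `dilated_conv1d` is a hypothetical helper.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D atrous convolution: kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + i * dilation] for i, k in enumerate(kernel)))
    return out


x = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(x, [1, 0, -1], dilation=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(x, [1, 0, -1], dilation=2))  # [-4, -4, -4]
```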
An attention mechanism helps AI models focus on relevant parts of input data, improving performance in tasks like translation and image recognition.
Attention Pooling summarizes a set of input features as a weighted average, with the weights computed by an attention mechanism so that more relevant features contribute more.
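A minimal sketch, assuming dot-product scoring against a single query vector (in a trained model the query would typically be a learned parameter; here it is passed in). `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(features, query):
    """Pool feature vectors into one vector, weighting each by the softmax
    of its dot-product score against the query."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    mx = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights
```

With an all-zero query every score is equal, so the result reduces to plain average pooling; a query aligned with one feature shifts almost all the weight onto it.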
Average pooling reduces the size of feature maps by taking the average value of sub-regions.
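For example, a 2x2 window with stride 2 halves each spatial dimension. A sketch on a feature map stored as a list of rows (`avg_pool2d` is a hypothetical helper, without padding):

```python
def avg_pool2d(fmap, size=2, stride=2):
    """Average pooling over a 2-D feature map given as a list of rows."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(0, h - size + 1, stride):
        row = []
        for c in range(0, w - size + 1, stride):
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(avg_pool2d(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
```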
Bag of Visual Words (BoVW) is a model that represents an image as an unordered collection (a histogram) of quantized local visual features for analysis and classification.
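The quantization step can be sketched as follows. The visual vocabulary (codebook) is commonly learned by k-means clustering of local descriptors such as SIFT; here it is simply given, and the 2-D descriptors are toy values. `bovw_histogram` is a hypothetical name.

```python
def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared Euclidean
    distance) and return the normalized word-count histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

The resulting fixed-length histogram can be fed to any standard classifier, regardless of how many descriptors each image produced.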
BLIP is a model that combines vision and language processing for tasks like image captioning and visual question answering.
Blob detection identifies regions in images that differ in properties like intensity or color from surrounding areas.
Boundary detection identifies edges or transitions between regions in an image, such as object outlines, and is a key step in object recognition and image analysis.
Bounding box coordinates define the location and size of an object in an image or 3D space.
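For 2-D boxes, two conventions are common: corner format and center format. Datasets and libraries differ (COCO annotation files, for instance, store `[x_min, y_min, width, height]`), so conversions like these hypothetical helpers come up often. This sketch covers 2-D boxes only.

```python
def xyxy_to_xywh(box):
    """(x_min, y_min, x_max, y_max) -> (x_min, y_min, width, height)."""
    x0, y0, x1, y1 = box
    return (x0, y0, x1 - x0, y1 - y0)


def xyxy_to_cxcywh(box):
    """(x_min, y_min, x_max, y_max) -> (center_x, center_y, width, height)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def cxcywh_to_xyxy(box):
    """(center_x, center_y, width, height) -> (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


box = (10, 20, 50, 80)
print(xyxy_to_xywh(box))    # (10, 20, 40, 60)
print(xyxy_to_cxcywh(box))  # (30.0, 50.0, 40, 60)
```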
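Capsule Network Routing is the procedure in a capsule network that decides how strongly each lower-level capsule's output is sent to each higher-level capsule, typically by iteratively strengthening connections whose predictions agree (routing-by-agreement); it helps the network model part-whole spatial hierarchies.
A capsule neural network (CapsNet) is a neural network architecture that groups neurons into capsules whose output vectors encode both the presence and the pose (such as position and orientation) of an entity, helping it recognize patterns and spatial hierarchies.
Cascade R-CNN is a multi-stage object detection framework in which a sequence of detection heads, trained with progressively higher IoU thresholds, each refines the boxes from the previous stage to improve localization accuracy.
CenterNet is an object detection framework that represents each object as the center point of its bounding box: it predicts a heatmap of object centers and regresses the box size at each center, avoiding anchor boxes and non-maximum suppression.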
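The dynamic routing-by-agreement procedure described in the original CapsNet paper can be sketched in pure Python. `u_hat[i][j]` is assumed to be lower capsule i's prediction vector for higher capsule j (in a real network these come from learned transformation matrices); the function names and three iterations are illustrative.

```python
import math


def squash(s):
    """Capsule nonlinearity: keeps the direction, maps the length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0:
        return [0.0] * len(s)
    scale = sq / (1 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def dynamic_routing(u_hat, iterations=3):
    """Return higher-level capsule outputs v[j] and coupling coefficients c[i][j]."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start uniform
    for _ in range(iterations):
        # Coupling coefficients: softmax of each lower capsule's logits.
        c = []
        for i in range(n_in):
            mx = max(b[i])
            exps = [math.exp(x - mx) for x in b[i]]
            tot = sum(exps)
            c.append([e / tot for e in exps])
        # Each higher capsule squashes the coupling-weighted sum of its predictions.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees (dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v, c
```

When two lower capsules make matching predictions for one parent and conflicting ones for another, routing shifts their output toward the parent they agree on.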
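The peak-extraction step can be sketched as below: a heatmap cell counts as a detection if it is at least as large as all its neighbours, which mimics the 3x3 max-pooling used in place of box-level suppression. The size and offset regression heads are omitted, and `k` and `threshold` are assumed values.

```python
def heatmap_peaks(heatmap, k=2, threshold=0.3):
    """Return up to k (score, row, col) local maxima of a 2-D center heatmap."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            # 3x3 neighbourhood (including the cell itself), clipped at the borders.
            neighbours = [heatmap[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))]
            if v >= threshold and v == max(neighbours):
                peaks.append((v, r, c))
    peaks.sort(reverse=True)  # highest scores first
    return peaks[:k]
```

Each returned (row, col) would then index the size-regression output to form a box around that center.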
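CIFAR refers to the CIFAR-10 and CIFAR-100 datasets of small labeled color images, widely used for training and benchmarking computer vision models; CIFAR-10 contains 60,000 32x32 images in 10 classes.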
The CIFAR-100 dataset is a collection of 60,000 32x32 color images in 100 classes for machine learning research.
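Cityscapes is a large-scale dataset of urban street scenes used to train and evaluate models that segment objects in city environments.
Class Activation Maps (CAMs) visualize which image regions a CNN relies on when predicting a particular class.
Class Activation Mapping is the technique that produces CAMs: in a network that ends with global average pooling, it weights the final convolutional feature maps by the classifier weights for a target class to highlight the regions that drove the prediction.
CLIP (Contrastive Language-Image Pre-training) is a model trained on image-text pairs to embed images and text in a shared space, enabling tasks such as zero-shot image classification and image-text retrieval.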
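The map for class c is the sum over channels k of the classifier weight w_k^c times feature map f_k(x, y). A pure-Python sketch with toy values (`class_activation_map` is a hypothetical name); in practice the map is then upsampled to the input size and normalized for display.

```python
def class_activation_map(feature_maps, fc_weights, class_idx):
    """CAM for one class: weighted sum of the last conv layer's K feature maps,
    using that class's weights from the linear layer after global average pooling."""
    weights = fc_weights[class_idx]
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fmap[r][c] for wk, fmap in zip(weights, feature_maps))
             for c in range(w)]
            for r in range(h)]


feature_maps = [[[1, 0], [0, 0]],  # channel 0 fires top-left
                [[0, 0], [0, 1]]]  # channel 1 fires bottom-right
fc_weights = [[2, 0], [0, 3]]      # class 0 uses channel 0, class 1 uses channel 1
print(class_activation_map(feature_maps, fc_weights, 1))  # [[0, 0], [0, 3]]
```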
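Zero-shot classification with a CLIP-style model reduces to comparing one image embedding against one text embedding per candidate label (for example, from prompts like "a photo of a dog"). The sketch assumes the embeddings are already computed; the toy vectors and the temperature of 0.01 are assumptions, not values from a real model.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities between one image
    embedding and the text embedding of each candidate label."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because labels enter only through their text embeddings, new classes can be added at inference time without retraining.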
A co-attention mechanism allows models to focus on two sets of inputs simultaneously, enhancing their understanding and representation.
COCO (Common Objects in Context) is a large-scale dataset for object detection, segmentation, keypoint detection, and image captioning.