Big Data

Explore 20 AI terms in Big Data

Anonymization

Anonymization is the process of removing personal identifiers from data to protect individual privacy.
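As a rough illustration of the definition above, the sketch below drops direct identifiers from a record and coarsens the age into a ten-year band. The field names and the choice of which fields count as identifiers are assumptions made for this example; real anonymization also has to consider indirect identifiers and re-identification risk.

```python
# Hypothetical field names; which fields are identifiers depends on the dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def anonymize(record):
    """Return a copy of record without direct identifiers and with age coarsened."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        # Generalize the exact age into a ten-year band, e.g. 36 -> "30-39".
        low = (cleaned["age"] // 10) * 10
        cleaned["age"] = f"{low}-{low + 9}"
    return cleaned

record = {"name": "Ada", "email": "ada@example.com", "age": 36, "city": "London"}
anonymized = anonymize(record)
print(anonymized)
```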

Apache Arrow

Apache Arrow is an open-source, language-independent columnar in-memory data format, with libraries for high-performance data processing and analytics.

Apache Kafka

Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and applications.

Dark Data

Dark data refers to information that organizations collect but do not use for analysis or decision-making.

Data Integration

DI

Data integration is the process of combining data from different sources into a unified view.
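A minimal sketch of the idea, assuming two hypothetical sources (`crm` and `billing`) that share a `customer_id` key. Rows with the same key are merged into one record, and keys present in only one source are kept with whatever fields are available.

```python
# Two hypothetical sources describing the same customers.
crm = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Linus"}]
billing = [{"customer_id": 1, "balance": 120.0}, {"customer_id": 3, "balance": 15.5}]

def integrate(*sources, key="customer_id"):
    """Merge rows from all sources into one record per key value."""
    unified = {}
    for source in sources:
        for row in source:
            # Create the record on first sight, then fold in this row's fields.
            unified.setdefault(row[key], {}).update(row)
    return [unified[k] for k in sorted(unified)]

unified_view = integrate(crm, billing)
for row in unified_view:
    print(row)
```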

Data Lake

DL

A data lake is a centralized repository that stores large amounts of raw data in its native format.

Data Lakehouse

DLH

A data lakehouse is an architecture that combines the low-cost, flexible storage of a data lake with the data management and query capabilities of a data warehouse.

Data Pipeline

A data pipeline is a series of processes that move and transform data from one system to another.
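The definition above can be sketched as three chained stages, here written as Python generators. The stage names (`extract`, `transform`, `load`), the comma-separated input, and the in-memory list standing in for a destination system are all assumptions for illustration.

```python
def extract(lines):
    """Read raw lines from the source system."""
    for line in lines:
        yield line.strip()

def transform(rows):
    """Skip blank rows and parse each remaining row into a typed record."""
    for row in rows:
        if not row:
            continue
        name, amount = row.split(",")
        yield {"name": name.strip().title(), "amount": float(amount)}

def load(records, sink):
    """Append each record to the destination (a list stands in for a database)."""
    for record in records:
        sink.append(record)
    return sink

raw = ["ada, 10.5\n", "\n", "linus,3\n"]
warehouse = load(transform(extract(raw)), [])
print(warehouse)
```

Because each stage is a generator, records flow through one at a time rather than being materialized between steps.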

Data Slicing

Data slicing is the process of extracting specific subsets of data from a larger dataset for analysis.
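As a small illustration, the sketch below slices a dataset in two directions: it keeps only rows that match given conditions and only the requested columns. The `slice_data` helper and the sample sales rows are hypothetical.

```python
rows = [
    {"region": "EU", "year": 2023, "sales": 100},
    {"region": "US", "year": 2023, "sales": 250},
    {"region": "EU", "year": 2024, "sales": 130},
]

def slice_data(rows, columns, **conditions):
    """Keep rows where every condition matches, projected onto the given columns."""
    return [
        {c: row[c] for c in columns}
        for row in rows
        if all(row[k] == v for k, v in conditions.items())
    ]

eu_sales = slice_data(rows, ["year", "sales"], region="EU")
print(eu_sales)
```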

Data Stream

A data stream is a continuous flow of data generated in real time and typically processed or analyzed as it arrives.
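A minimal sketch of processing a stream incrementally: each value is handled as it arrives, and only a small window of recent values is kept in memory. The `moving_average` function, the window size, and the sample readings are assumptions for illustration; an iterator stands in for a live source.

```python
from collections import deque

def moving_average(stream, window=3):
    """Yield the average of the most recent `window` values after each arrival."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)

readings = iter([10, 20, 30, 40])  # stands in for a live sensor feed
averages = list(moving_average(readings))
print(averages)
```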

Data Velocity

Data velocity refers to the speed at which data is generated, processed, and analyzed; it is a key factor in real-time decision-making.

Databricks ML

DB ML

Databricks ML is a machine learning platform integrated with Apache Spark for collaborative data science and model deployment.

Delta Lake

DL

Delta Lake is an open-source storage layer that adds ACID transactions and scalable metadata handling to data lakes, improving their reliability and performance.

Distributed Computing

Distributed Computing involves multiple interconnected computers working together to solve complex tasks efficiently.

Hadoop Framework

Hadoop is an open-source framework for distributed storage and processing of big data using a cluster of computers.

Large Scale Data

Large Scale Data refers to vast datasets that require advanced processing and storage techniques due to their size and complexity.

Online Data

Online data refers to information that is accessible via the internet, including user-generated content and real-time data streams.

Out-of-Core Algorithm

An out-of-core algorithm processes data that exceeds memory capacity by using external storage.
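One classic example is external merge sort, sketched below under simplifying assumptions: the input is an iterable of integers, `chunk_size` stands in for the amount of data that fits in memory, and temporary files play the role of external storage. The helper names are hypothetical.

```python
import heapq
import os
import tempfile

def _write_run(sorted_values):
    """Write one sorted chunk to a temporary file and return its path."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    with f:
        f.writelines(f"{v}\n" for v in sorted_values)
    return f.name

def _read_run(path):
    """Stream integers back from a run file one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)

def external_sort(values, chunk_size):
    """Sort values while holding at most chunk_size of them in memory at once."""
    run_paths, chunk = [], []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            run_paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_paths.append(_write_run(sorted(chunk)))
    try:
        # heapq.merge lazily merges the sorted runs, reading one item per run at a time.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)

sorted_numbers = list(external_sort([5, 2, 9, 1, 7, 3], chunk_size=2))
print(sorted_numbers)
```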

Out-of-Core Processing

Out-of-core processing is a technique for handling data that doesn't fit into a computer's memory by utilizing disk storage.
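A minimal sketch of the technique: the file is read from disk in fixed-size chunks, so memory use stays bounded regardless of file size. The `count_lines` helper and the 1 KiB chunk size are assumptions chosen for illustration, and a small temporary file stands in for a dataset too large to load at once.

```python
import os
import tempfile

def count_lines(path, chunk_bytes=1024):
    """Count newline characters by reading the file in fixed-size chunks."""
    total = 0
    with open(path, "rb") as f:
        # Only one chunk is held in memory at any time.
        while chunk := f.read(chunk_bytes):
            total += chunk.count(b"\n")
    return total

# Create a sample file on disk, process it chunk by chunk, then clean up.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"row\n" * 5000)
line_count = count_lines(tmp.name)
os.remove(tmp.name)
print(line_count)
```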

SingleStore

SingleStore is a distributed SQL database designed for real-time analytics and transactional workloads.
