AI Glossary: What Is Gradient Descent (GD)? Definition & Meaning

Gradient Descent

Gradient descent is a widely used optimization algorithm in machine learning and statistics, particularly for training models. The core idea is to minimize a function by iteratively adjusting its parameters in the direction of steepest descent, which is the direction opposite to the function's gradient.

Specifically, gradient descent starts with an initial set of parameters (or weights) and calculates the gradient, a vector that points in the direction of the steepest increase of the function. To minimize the function, the parameters are updated by taking a small step in the opposite direction of the gradient. The size of this step is controlled by a value known as the learning rate, which scales the gradient.
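As a compact way to write the update described above, one common notation is the following. The symbols are assumptions introduced here for illustration: \theta_t for the parameters at iteration t, \eta for the learning rate, and f for the function being minimized.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla f(\theta_t)
```

For example, with f(\theta) = \theta^2, a starting value \theta_0 = 3, and \eta = 0.1, the gradient is 2\theta_0 = 6, so one step gives \theta_1 = 3 - 0.1 \cdot 6 = 2.4, which is closer to the minimum at 0.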

The process can be summarized in the following steps:

1. Choose an initial set of parameters.
2. Calculate the gradient of the loss function with respect to the parameters.
3. Update the parameters by moving in the opposite direction of the gradient, scaled by the learning rate.
4. Repeat steps 2 and 3 until convergence, typically judged by the change in the parameters (or in the loss) falling below a predefined threshold, or until a maximum number of iterations is reached.
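The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name gradient_descent, its parameters (learning_rate, tol, max_iters), and the example function being minimized are all assumptions made for this sketch.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Minimize a function given its gradient `grad`, starting from `theta0`."""
    theta = list(theta0)  # step 1: initial parameters
    for _ in range(max_iters):
        g = grad(theta)  # step 2: gradient at the current parameters
        # step 3: move against the gradient, scaled by the learning rate
        new_theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
        change = max(abs(n - t) for n, t in zip(new_theta, theta))
        theta = new_theta
        if change < tol:  # step 4: stop once the updates become tiny
            break
    return theta


# Hypothetical example: f(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1).
def example_grad(theta):
    x, y = theta
    return [2 * (x - 3), 2 * (y + 1)]


minimum = gradient_descent(example_grad, [0.0, 0.0])
print(minimum)  # values close to [3, -1]
```

The max_iters cap is a safeguard assumed here so that the loop still ends if the threshold is never reached, for instance when the learning rate is too large.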

There are several variations of gradient descent:

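Batch Gradient Descent: Uses the entire dataset to compute the gradient for every update, which gives an exact gradient but can be slow for large datasets.
Stochastic Gradient Descent (SGD): Uses a single randomly chosen sample for each update. The resulting gradient estimates are noisy, but each update is much cheaper, and the noise can help the search escape shallow local minima.
Mini-batch Gradient Descent: Uses a small random batch of samples for each update, trading off the stable gradients of batch gradient descent against the cheap updates of SGD.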
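The three variants differ mainly in how many samples feed each gradient estimate, so a single sketch with a batch_size parameter can cover all of them. The code below is an illustrative assumption, not a standard API: the function minibatch_gd, the toy dataset generated from y = 2x + 1, and the chosen learning rate and epoch count are all hypothetical.

```python
import random


def minibatch_gd(xs, ys, batch_size, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by minimizing mean squared error.

    batch_size == len(xs) behaves like batch gradient descent,
    batch_size == 1 behaves like stochastic gradient descent,
    and anything in between is mini-batch gradient descent.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2 * error * xs[i]
                grad_b += 2 * error
            # average the gradient over the samples in this batch
            grad_w /= len(batch)
            grad_b /= len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Hypothetical noiseless data drawn from y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]

w_batch, b_batch = minibatch_gd(xs, ys, batch_size=len(xs))  # batch
w_sgd, b_sgd = minibatch_gd(xs, ys, batch_size=1)            # stochastic
w_mini, b_mini = minibatch_gd(xs, ys, batch_size=4)          # mini-batch
```

On this small noiseless dataset all three settings should recover values close to w = 2 and b = 1. With noisy or much larger data, the trade-off between update cost and gradient noise becomes more visible.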

Gradient descent and its variants are used to train many kinds of models, including linear regression, neural networks, and support vector machines, making it a fundamental concept in the field of artificial intelligence.