Earth Mover’s Distance (EMD) is a measure used in fields such as computer vision, machine learning, and statistics to quantify the difference between two probability distributions. The concept is based on the idea of ‘moving’ distribution mass (or probability) from one distribution to another, akin to moving earth from one location to another in a physical landscape.
Mathematically, EMD is defined as the minimum cost of transforming one distribution into another. The cost of each move is the amount of ‘earth’ (mass) moved multiplied by the distance it is moved. More formally, if two distributions are each represented by a set of weighted points (often called ‘bins’ or ‘features’), EMD finds the optimal plan for converting one distribution into the other, taking into account both the mass at each point and the distance between points.
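The definition above can be written as an optimization problem. The notation below is one common way to state it and is an assumption of this sketch, not notation taken from the text: the first distribution has points p_i with weights w_{p_i}, the second has points q_j with weights w_{q_j}, d_{ij} is the ground distance between p_i and q_j, and f_{ij} is the mass moved from p_i to q_j.

```latex
\begin{aligned}
\min_{f}\quad & \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{subject to}\quad
  & f_{ij} \ge 0, \\
  & \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad
    \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \\
  & \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}
    = \min\!\Bigl(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j}\Bigr), \\[6pt]
\mathrm{EMD}(P, Q) \;=\;
  & \frac{\sum_{i,j} f^{*}_{ij}\, d_{ij}}{\sum_{i,j} f^{*}_{ij}}
\end{aligned}
```

Here f^{*} denotes an optimal flow. Dividing by the total flow normalizes the cost, so the result does not depend on the total amount of mass being moved.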
EMD has several advantages, including its robustness to noise and its ability to handle distributions of different shapes and sizes. It’s particularly useful in applications such as image retrieval and classification, where one needs to compare histograms or feature distributions. One notable property of EMD is that, when the underlying distance between points is itself a metric and the two distributions have equal total mass, it satisfies the triangle inequality and is therefore a true metric.
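To illustrate why this is useful for comparing histograms, here is a minimal sketch that contrasts EMD with a plain bin-by-bin L1 difference. It assumes the simplest possible setting: one-dimensional histograms over the same unit-spaced bins with equal total mass, where EMD reduces to the sum of absolute differences between the cumulative sums. The function names `emd_1d` and `l1_distance` are illustrative, not part of any library.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1D histograms on the same unit-spaced bins.

    Assumes equal total mass; in that case the EMD is the sum of
    absolute differences between the two cumulative histograms.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Running surplus of mass that still has to be carried to the next bin.
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


def l1_distance(p, q):
    """Bin-by-bin L1 difference, which ignores how far apart the bins are."""
    return sum(abs(a - b) for a, b in zip(p, q))


base = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # all mass shifted by one bin
far = [0.0, 0.0, 0.0, 0.0, 1.0]   # all mass shifted by four bins

print(l1_distance(base, near), l1_distance(base, far))  # same value for both
print(emd_1d(base, near), emd_1d(base, far))            # grows with the shift
```

The L1 difference is identical for the small and the large shift, while the EMD grows with how far the mass has to travel, which is why small perturbations such as a one-bin shift are penalized only lightly.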
Despite its strengths, EMD can be computationally expensive, especially for large datasets, because computing it requires solving a transportation problem (a linear program). Various approximations and optimizations have been developed to make it more efficient in practice.
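One case where no transportation solver is needed at all is the one-dimensional setting. As a hedged sketch, assume two samples of equal size, each point carrying the same weight, with the absolute difference |x - y| as the distance between points. Under those assumptions the optimal plan simply matches the points in sorted order, so the cost is dominated by sorting. This is an exact special case rather than one of the general approximations, and the function name `emd_1d_samples` is hypothetical.

```python
def emd_1d_samples(xs, ys):
    """EMD between two equal-size 1D samples with uniform weights.

    Assumes the ground distance is |x - y|. The optimal transport plan
    then pairs the i-th smallest value of xs with the i-th smallest of ys.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    # Each point carries mass 1/n, so the total cost is the mean pair distance.
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Shifting every point by 5 gives an EMD of 5, regardless of input order.
print(emd_1d_samples([0, 1, 3], [8, 5, 6]))
```

For higher-dimensional data or unequal weights this shortcut does not apply, and a general solver or an approximation is needed.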