AI Glossary: What Are Mel Frequency Cepstral Coefficients (MFCCs)? Definition & Meaning

Mel Frequency Cepstral Coefficients (MFCCs) are a representation of the short-term power spectrum of sound, commonly used in audio processing and speech recognition. They are derived from the Fourier transform of a signal and capture its frequency content in a way that mimics human perception of sound.

Obtaining MFCCs involves several steps. First, the audio signal is divided into overlapping frames, and each frame is windowed to reduce spectral leakage. Next, the Fourier transform is applied to each frame to produce a power spectrum. This spectrum is then mapped onto the Mel scale, a perceptual scale of pitch. The spacing of the Mel scale is designed to reflect how humans perceive sound, emphasizing low frequencies while compressing high frequencies.
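The steps above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the text names no specific formula, so the Hz-to-Mel conversion (2595 * log10(1 + f / 700)), the Hamming window, the triangular filter shape, and all function names and parameter values here are choices made for the example. A practical implementation would normally use an FFT routine instead of the naive DFT shown.

```python
import cmath
import math


def hz_to_mel(hz):
    # One common convention for the Mel scale (assumed; the text gives no formula).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_signal(signal, frame_len, hop_len):
    # Overlapping frames; trailing samples that do not fill a frame are dropped.
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop_len)]


def hamming(n):
    # Hamming window (assumed choice) to reduce spectral leakage.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, window):
    # Naive DFT of the windowed frame, keeping the non-negative frequency bins.
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are evenly spaced on the Mel scale,
    # so they are narrow at low frequencies and wide at high frequencies.
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    hz_points = [mel_to_hz(max_mel * i / (n_filters + 1))
                 for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, centre, hi = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft
            if lo < f <= centre:
                row.append((f - lo) / (centre - lo))
            elif centre < f < hi:
                row.append((hi - f) / (hi - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_energies(spectrum, bank):
    # Weighted sum of the power spectrum under each triangular filter.
    return [sum(w * p for w, p in zip(row, spectrum)) for row in bank]


# Example: a 440 Hz tone sampled at 8 kHz (all values are illustrative).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
frames = frame_signal(signal, frame_len=200, hop_len=80)
window = hamming(200)
bank = mel_filterbank(n_filters=10, n_fft=200, sample_rate=sample_rate)
energies = [mel_energies(power_spectrum(f, window), bank) for f in frames]
print(len(frames), len(energies[0]))  # 8 frames, 10 Mel bands each
```

Because the filter edges are evenly spaced in Mel rather than in Hz, most of the filters sit in the lower part of the spectrum, which is how the emphasis on low frequencies arises.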

After mapping to the Mel scale, the logarithm of the Mel-scaled power spectrum is taken, followed by a discrete cosine transform (DCT). The resulting coefficients represent the short-term power spectrum in a compact form, with the first few coefficients typically containing the most relevant information for tasks such as speaker recognition or phoneme classification.
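As a rough sketch of this final stage, the snippet below takes a list of Mel band energies for one frame, applies the logarithm, and runs a DCT. The function names, the small floor applied before the logarithm, the orthonormal DCT-II variant, and the default of 13 retained coefficients are all assumptions made for illustration; the text does not specify them.

```python
import math


def log_energies(mel_energies, floor=1e-10):
    # Floor the values (assumed safeguard) so silent bands do not produce log(0).
    return [math.log(max(e, floor)) for e in mel_energies]


def dct_ii(values):
    # Orthonormal DCT-II, written out directly for clarity.
    n = len(values)
    out = []
    for k in range(n):
        acc = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                  for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def mfcc_from_mel(mel_energies, n_coeffs=13):
    # Keep only the first n_coeffs coefficients as the compact representation.
    return dct_ii(log_energies(mel_energies))[:n_coeffs]


# Example with made-up Mel band energies for a single frame.
example_energies = [0.5, 1.2, 3.4, 2.1, 0.9, 0.4, 0.2, 0.1]
coefficients = mfcc_from_mel(example_energies, n_coeffs=4)
print(len(coefficients))  # 4
```

Truncating the DCT output is what makes the representation compact: the low-order coefficients describe the broad shape of the log spectrum, while the discarded high-order ones describe fine detail.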

MFCCs have become a standard feature set in various audio and speech processing applications due to their effectiveness in capturing the characteristics of the human voice and other sounds. They are widely used in machine learning models for tasks related to speech recognition, speaker identification, and even music genre classification.