AI Glossary: What Are Mel Frequency Cepstral Coefficients (MFCCs)? Definition & Meaning

Mel Frequency Cepstral Coefficients (MFCCs) are a representation of the short-term power spectrum of a sound, commonly used in speech processing and speech recognition. They are derived from the Fourier transform of a signal and capture its frequency content in a way that mimics human perception of sound.

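The process of obtaining MFCCs consists of several steps. First, the audio signal is divided into overlapping frames, and a window function is applied to each frame to reduce spectral leakage. Next, a Fourier transform is applied to each frame to produce a power spectrum. This spectrum is then mapped onto the Mel scale, a perceptual scale of sound. The spacing of the Mel scale is designed to reflect how humans perceive sound: it emphasizes lower frequencies and compresses higher frequencies.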
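The steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the Hamming window, the 2595 * log10(1 + f / 700) form of the Mel formula (one of several variants in use), the triangular filter shape, and the default frame length, hop size, and filter count are all choices made for this example. The function names are hypothetical.

```python
import cmath
import math

def hz_to_mel(hz):
    # One common variant of the Mel formula (assumed here; others exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frame_signal(signal, frame_len, hop):
    # Overlapping frames; trailing samples that do not fill a frame are dropped.
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    # Hamming window, used to reduce spectral leakage at the frame edges.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    # Naive O(N^2) DFT kept for clarity; practical code would use an FFT.
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, frame_len, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale,
    # expressed as fractional positions on the DFT bin axis.
    n_bins = frame_len // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_points = [mel_to_hz(m) * frame_len / sample_rate for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        row = []
        for k in range(n_bins):
            if left < k <= center:
                row.append((k - left) / (center - left))    # rising slope
            elif center < k < right:
                row.append((right - k) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_energies(signal, sample_rate, frame_len=64, hop=32, n_filters=8):
    # Frame -> window -> power spectrum -> Mel filterbank energies.
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    result = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        spec = power_spectrum(windowed)
        result.append([sum(w * p for w, p in zip(row, spec)) for row in bank])
    return result
```

Calling mel_energies(signal, sample_rate) on a list of samples returns one list of filterbank energies per frame. Because the filters are spaced evenly in Mel rather than in Hz, they are narrow at low frequencies and wide at high frequencies, which is how the emphasis on lower frequencies arises.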

After mapping to the Mel scale, the logarithm of the Mel-scaled power spectrum is taken, followed by a discrete cosine transform (DCT). The resulting coefficients represent the short-term power spectrum in a compact form, with the first few coefficients typically containing the most relevant information for tasks such as speaker recognition or phoneme classification.
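The log and DCT step can be sketched as follows. This is an illustrative example under assumptions: it uses an unnormalized DCT-II (the variant most often used for MFCCs, though libraries differ in scaling), a small floor value to avoid taking the logarithm of zero, and a default of 13 retained coefficients, which is a common but not universal choice. The function names are hypothetical.

```python
import math

def dct_ii(values):
    # Unnormalized DCT-II: c[k] = sum_i x[i] * cos(pi * k * (i + 0.5) / N)
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc_from_mel_energies(mel_energies, n_coeffs=13, floor=1e-10):
    # Take the log of each Mel band energy (floored to avoid log(0)),
    # apply the DCT, and keep only the first n_coeffs coefficients.
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_energies)[:n_coeffs]
```

Here mel_energies is the list of Mel filterbank energies for a single frame. Truncating the DCT output keeps the slowly varying shape of the log spectrum and discards fine detail, which is what makes the representation compact.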

MFCCs have become a standard feature set in audio and speech processing applications because they capture the characteristics of the human voice and other sounds effectively. They are widely used in machine learning models for tasks such as speech recognition, speaker identification, and even music genre classification.