AI Glossary: What Is an N-gram Model? Definition & Meaning

N-gram Model

An N-gram model is a statistical language model used in natural language processing (NLP) and computational linguistics. It predicts the next item in a sequence (such as a word or character) from the previous ‘n-1’ items. An N-gram is a contiguous sequence of n items, and ‘n’ is the number of items it contains. For example, a bigram model (n = 2) works with pairs of words, while a trigram model (n = 3) works with triplets of words.
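To make the definition concrete, here is a minimal Python sketch that extracts the contiguous n-grams from a tokenized sentence. The helper name `ngrams` and the simple whitespace tokenization are illustrative assumptions. They are not taken from any particular library.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) from a list of tokens."""
    # A sequence of length L contains L - n + 1 n-grams (none if n > L).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()  # naive whitespace tokenization
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(ngrams(tokens, 3))
# [('the', 'cat', 'sat'), ('cat', 'sat', 'on'), ('sat', 'on', 'the'), ('on', 'the', 'mat')]
```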

The N-gram model is based on conditional probability. It computes the probability of a word given the words that precede it in the sequence, expressed mathematically as:

$$P(w_n \mid w_1, w_2, \ldots, w_{n-1})$$

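where ‘w_n’ is the current word and ‘w_1, w_2, …, w_{n-1}’ are the n-1 words that precede it. The model is built by counting how often each N-gram occurs in a large text corpus and converting those counts into probabilities. The simplest method is maximum likelihood estimation, which divides the count of an N-gram by the count of its (n-1)-word prefix.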
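The counting procedure can be sketched for the bigram case (n = 2). This is a minimal illustration under several assumptions: a toy three-sentence corpus, whitespace tokenization, and hypothetical `<s>`/`</s>` sentence-boundary markers. The helper names (`train_bigram`, `prob`, `predict_next`) are made up for this example. Probabilities are maximum likelihood estimates, P(w_n | w_{n-1}) = C(w_{n-1}, w_n) / C(w_{n-1}).

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every history word, how often each next word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Hypothetical start/end markers give the first and last words a context.
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def prob(counts, prev, curr):
    """MLE estimate: P(curr | prev) = C(prev, curr) / C(prev)."""
    history = counts.get(prev, Counter())
    total = sum(history.values())
    return history[curr] / total if total else 0.0


def predict_next(counts, prev):
    """Return the word most often observed after `prev`, or None if `prev` is unseen."""
    history = counts.get(prev, Counter())
    return history.most_common(1)[0][0] if history else None


corpus = [s.split() for s in ["the cat sat", "the cat ran", "the dog sat"]]
model = train_bigram(corpus)
print(prob(model, "the", "cat"))   # 0.6666666666666666
print(prob(model, "dog", "ran"))   # 0.0 (bigram never seen)
print(predict_next(model, "the"))  # cat
```

Because these are raw counts, any bigram absent from the corpus (such as "dog ran" here) receives probability zero. Smoothing is meant to address this problem.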

N-gram models are widely used in applications such as text prediction, speech recognition, and machine translation. They are simple to implement and can perform reasonably well, especially when combined with smoothing techniques that assign nonzero probability to N-grams never seen during training. However, they have two main limitations. First, they cannot capture long-range dependencies (context beyond the previous n-1 words). Second, the number of possible N-grams grows exponentially with n (|V|^n for a vocabulary of size |V|), so most of them never appear in the training data. This problem is known as data sparsity.
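Add-one (Laplace) smoothing is one of the simplest smoothing techniques. The sketch below applies it to the same kind of toy bigram model. It assumes the vocabulary consists of every word seen in training plus a hypothetical `</s>` end marker, and the helper names are illustrative. Each bigram count is incremented by one, giving P(w_n | w_{n-1}) = (C(w_{n-1}, w_n) + 1) / (C(w_{n-1}) + |V|).

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Return bigram counts per history word and the set of predictable tokens."""
    counts = defaultdict(Counter)
    vocab = {"</s>"}  # hypothetical end marker is a possible "next word"
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(sentence)
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts, vocab


def laplace_prob(counts, vocab, prev, curr):
    """Add-one smoothing: (C(prev, curr) + 1) / (C(prev) + |V|)."""
    history = counts.get(prev, Counter())
    return (history[curr] + 1) / (sum(history.values()) + len(vocab))


corpus = [s.split() for s in ["the cat sat", "the cat ran", "the dog sat"]]
model, vocab = train_bigram(corpus)
print(laplace_prob(model, vocab, "the", "cat"))  # 0.3333333333333333
print(laplace_prob(model, vocab, "dog", "ran"))  # 0.14285714285714285 (unseen, but nonzero)
```

Laplace smoothing is easy to reason about, but it tends to move too much probability mass to unseen events. In practice, methods such as Kneser-Ney smoothing are generally preferred.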