N-gram

An N-gram is a contiguous sequence of n items from a given sample of text or speech, used in natural language processing.

The items in an N-gram are usually words or characters. In natural language processing (NLP) and computational linguistics, N-grams are used to analyze and model the structure of language. Statistical language models built on N-gram counts estimate how likely a sequence of words is to occur in a given context.

The value of n determines the number of items in the sequence:

  • Unigrams (n=1): single words, e.g., “the”, “cat”.
  • Bigrams (n=2): pairs of consecutive words, e.g., “the cat”, “cat sat”.
  • Trigrams (n=3): sequences of three consecutive words, e.g., “the cat sat”.
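The extraction of unigrams, bigrams, and trigrams can be sketched in a few lines of Python. This is a minimal illustration, assuming the text is tokenized by splitting on whitespace; the function name `ngrams` is hypothetical and not taken from any particular library.

```python
def ngrams(tokens, n):
    """Return all contiguous n-item tuples from a list of tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the tokens; yields nothing if
    # the input is shorter than n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: simple whitespace tokenization, for illustration only.
tokens = "the cat sat".split()
unigrams = ngrams(tokens, 1)  # [("the",), ("cat",), ("sat",)]
bigrams = ngrams(tokens, 2)   # [("the", "cat"), ("cat", "sat")]
trigrams = ngrams(tokens, 3)  # [("the", "cat", "sat")]
```

The same function works for character N-grams if it is given a list of characters instead of a list of words.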

By analyzing N-grams, models can capture local context and dependencies between words, which is essential for NLP tasks such as text classification, language modeling, and machine translation. For instance, a bigram model predicts the next word from the previous word alone, using how often each word pair occurred in the training text.
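The bigram prediction described above can be sketched as follows. This is a minimal example, assuming unsmoothed relative-frequency estimates and a tiny made-up corpus; the function names `train_bigram_model` and `next_word_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probabilities(model, prev):
    """Estimate P(next | prev) as count(prev, next) / count(prev, *)."""
    followers = model.get(prev)
    if not followers:
        return {}  # the previous word was never seen with a follower
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}


# Assumption: a toy corpus, far too small for a realistic model.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
# "the" is followed by "cat" twice and "mat" once in this corpus,
# so probs is {"cat": 2/3, "mat": 1/3}.
```

A word pair that never occurred in the training text gets no probability at all under this estimate, which is one reason practical models need far more data than this sketch uses.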

N-grams are widely used in applications such as search engines, speech recognition systems, and predictive text input. They have limitations, however: they need large amounts of data to be effective, and they cannot capture long-range dependencies, because only the n-1 preceding items are taken into account. Even so, N-grams are foundational in NLP and serve as a building block for more complex models.
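The data requirement can be made concrete with a rough sketch. It compares the number of distinct N-grams actually observed in a text with the number that could be formed from its vocabulary, which grows as the vocabulary size raised to the power n. The helper name `ngram_coverage` and the toy corpus are assumptions for illustration.

```python
def ngram_coverage(tokens, n):
    """Return (distinct n-grams observed, n-grams possible over the vocabulary)."""
    vocab_size = len(set(tokens))
    observed = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return len(observed), vocab_size ** n


# Assumption: a toy corpus; real vocabularies have tens of thousands of words.
corpus = "the cat sat on the mat and the cat slept".split()
for n in (1, 2, 3):
    seen, possible = ngram_coverage(corpus, n)
    print(f"n={n}: {seen} of {possible} possible n-grams observed")
```

With the 7-word vocabulary of this corpus, all 7 unigrams are observed, but only 8 of 49 possible bigrams and 8 of 343 possible trigrams. The unobserved fraction grows quickly with n, so larger n requires much more training text.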
