Language Identification (LID) is the process of determining the language in which a given text is written. It is a key task in Natural Language Processing, commonly used for pre-classification or document selection.
Traditionally, LID was performed with Markov chain-based methods, such as the N-gram models used in lang-2.0. The new lang-4.0 API is built on a deep neural network capable of detecting more than 180 languages. It offers high precision on both long and short texts without sacrificing performance.
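
To make the traditional N-gram approach concrete, here is a minimal Python sketch of character-trigram language identification. It is an illustration only, not the lang-2.0 implementation: the function names (`char_ngrams`, `train`, `score`, `identify`), the two toy training sentences, and the add-one smoothing are all assumptions chosen for brevity. It also scores texts by smoothed trigram frequency, which is a simplification of a full Markov-chain model with conditional transition probabilities.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count table per language from {language: text}."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def score(text, counts, vocab_size, n=3):
    """Log-likelihood of text under one language's counts (add-one smoothing)."""
    total = sum(counts.values())
    return sum(
        math.log((counts[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )


def identify(text, models, n=3):
    """Return the language whose n-gram table gives text the highest score."""
    # Shared vocabulary size across all languages, plus one slot for unseen n-grams.
    vocab_size = len(set().union(*models.values())) + 1
    return max(models, key=lambda lang: score(text, models[lang], vocab_size, n))


# Toy training data (hypothetical); a real system would use large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "es": "el zorro salta sobre el perro perezoso y el gato duerme en la alfombra",
}
MODELS = train(SAMPLES)

if __name__ == "__main__":
    print(identify("the dog and the cat", MODELS))
    print(identify("el perro y el gato", MODELS))
```

With training data this small the sketch only separates texts that share vocabulary with the samples. Short or ambiguous inputs are where frequency-based methods of this kind tend to struggle, which is the gap a neural model is meant to close.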