Title: Google's Compact Language Detector 2
Description: Bindings to Google's C++ library Compact Language Detector 2 (see <> for more information). Probabilistically detects over 80 languages in plain text or HTML. For mixed-language input it returns the top three detected languages and their approximate proportions of the total classified text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3' package on CRAN which uses a neural network model instead.
Authors: Jeroen Ooms [aut, cre], Dick Sites [cph] (Author of CLD2 C++ library)
Maintainer: Jeroen Ooms <[email protected]>
License: Apache License 2.0
Version: 1.2.5
The function detect_language() is vectorised: it guesses the language of each string in text, or returns NA if the language could not be reliably determined. The function detect_language_mixed() is not vectorised and analyses the entire character vector as a whole. Its output includes the top three detected languages with their relative proportions, plus the total number of text bytes that were reliably classified.
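A minimal sketch contrasting the two functions, assuming the cld2 package is installed from CRAN. The input vector mirrors the package's own example; no specific language codes are claimed because very short strings may come back as NA.

```r
# Assumes the cld2 package is installed from CRAN.
library(cld2)

text <- c("To be or not to be?", "Ce n'est pas grave.", "Nou breekt mijn klomp!")

# Vectorised: one guess (or NA) per element of `text`.
per_element <- detect_language(text)
print(per_element)

# Not vectorised: the whole vector is analysed together and a single
# classification (top three languages, proportions, classified bytes) is returned.
mixed <- detect_language_mixed(text)
str(mixed)
```

Use detect_language() when each string should be labelled independently, and detect_language_mixed() when the question is what mix of languages a whole document contains.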

Usage:

detect_language(text, plain_text = TRUE, lang_code = TRUE)

detect_language_mixed(text, plain_text = TRUE)
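
Arguments:

text: a string with text to classify, or a connection to read from

plain_text: if FALSE, the input is treated as HTML: tags are skipped and HTML entities are expanded before classification

lang_code: if TRUE, return a language code instead of the language name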
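The plain_text and lang_code arguments can also be tried on an in-memory HTML string instead of a connection. A minimal sketch, assuming cld2 is installed; the `html` snippet is made up for illustration.

```r
# Assumes the cld2 package is installed from CRAN.
library(cld2)

# Hypothetical HTML snippet; with plain_text = FALSE the tags are skipped
# and entities such as &egrave; are expanded before classification.
html <- "<html><body><p>La vie est tr&egrave;s belle et le soleil brille sur la ville.</p></body></html>"

code <- detect_language(html, plain_text = FALSE)                     # language code (or NA)
name <- detect_language(html, plain_text = FALSE, lang_code = FALSE)  # language name (or NA)
print(c(code = code, name = name))
```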

Examples:

# Vectorized function
text <- c("To be or not to be?", "Ce n'est pas grave.", "Nou breekt mijn klomp!")
detect_language(text)

# Read HTML from connection
detect_language(url(''), plain_text = FALSE)

# More detailed classification output
detect_language_mixed(url(''), plain_text = FALSE)

detect_language_mixed(url(''), plain_text = FALSE)

