Package 'cld3' reference manual

Package 'cld3'

Title:	Google's Compact Language Detector 3
Description:	Google's Compact Language Detector 3 is a neural network model for language identification and the successor of 'cld2' (available from CRAN). The algorithm is still experimental and takes a novel approach to language detection with different properties and outcomes. It can be useful to combine this with the Bayesian classifier results from 'cld2'. See <https://github.com/google/cld3#readme> for more information.
Authors:	Jeroen Ooms [aut, cre] (ORCID: <https://orcid.org/0000-0002-4035-0289>), Google Inc [cph] (CLD3 C++ library)
Maintainer:	Jeroen Ooms <[email protected]>
License:	Apache License 2.0
Version:	1.6.1
Built:	2026-07-01 08:17:30 UTC
Source:	https://github.com/ropensci/cld3


Help Index

Compact Language Detector 3

Description

The function detect_language() is vectorised: it guesses the language of each string in text, returning NA where the language could not be reliably determined. The function detect_language_mixed() is not vectorised: it treats the entire character vector as a single text and detects all languages found in it.
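The difference between the two functions can be seen by running both on the same input. This is a minimal sketch that uses only the two exported functions documented here. It assumes that detect_language_mixed() returns one row per detected language (a data frame in current releases), so the check below only bounds the number of rows by size and does not rely on specific column names.

```r
library(cld3)

# Three short sentences, presumably in English, French and Dutch.
text <- c("To be or not to be?", "Ce n'est pas grave.",
  "Hij heeft de klok horen luiden maar weet niet waar de klepel hangt.")

# Vectorised: one guess (or NA) per input string, so the result
# has the same length as the input.
per_string <- detect_language(text)
print(per_string)

# Not vectorised: the whole vector is analysed as one text, and at most
# `size` languages are reported (assumed: one row per language).
whole <- detect_language_mixed(text, size = 3)
print(whole)
```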

Usage

detect_language(text)

detect_language_mixed(text, size = 3)

Arguments

text

a character vector with the text to classify, or a connection to read from

size

number of languages to detect

Examples

# Vectorized best guess
text <- c("To be or not to be?", "Ce n'est pas grave.",
  "Hij heeft de klok horen luiden maar weet niet waar de klepel hangt.")
detect_language(text)

# Multiple languages in one text (doesn't seem to work well)
detect_language_mixed(text)
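The package description suggests combining these results with the Bayesian classifier from 'cld2'. One simple way to do so, sketched below, keeps a label only where both detectors agree. This assumes the 'cld2' package is installed and that both packages return comparable ISO 639-1 style codes. The agreement rule is an illustrative choice, not a method provided by either package, and for some languages the two packages may use different codes.

```r
library(cld3)

text <- c("To be or not to be?", "Ce n'est pas grave.",
  "Hij heeft de klok horen luiden maar weet niet waar de klepel hangt.")

# Best guess per string from each detector (NA when undetermined).
guess3 <- cld3::detect_language(text)
guess2 <- cld2::detect_language(text)  # assumes 'cld2' is installed

# Illustrative rule: keep a label only when both detectors return the
# same non-missing code; otherwise return NA.
consensus <- ifelse(!is.na(guess3) & !is.na(guess2) & guess3 == guess2,
  guess3, NA_character_)

print(data.frame(text, cld3 = guess3, cld2 = guess2, consensus))
```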