# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------

cff-version: 1.2.0
message: 'To cite package "tokenizers" in publications use:'
type: software
license: MIT
title: 'tokenizers: Fast, Consistent Tokenization of Natural Language Text'
version: 0.3.1
doi: 10.21105/joss.00655
identifiers:
- type: doi
  value: 10.32614/CRAN.package.tokenizers
abstract: Convert natural language text into tokens. Includes tokenizers for shingled
  n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled
  characters, lines, Penn Treebank, regular expressions, as well as functions for
  counting characters, words, and sentences, and a function for splitting longer texts
  into separate documents, each with the same number of words. The tokenizers have
  a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages
  for fast yet correct tokenization in 'UTF-8'.
authors:
- family-names: Charlon
  given-names: Thomas
  email: charlon@protonmail.com
  orcid: https://orcid.org/0000-0001-7497-0470
- family-names: Mullen
  given-names: Lincoln
  email: lincoln@lincolnmullen.com
  orcid: https://orcid.org/0000-0001-5103-6917
preferred-citation:
  type: article
  title: Fast, Consistent Tokenization of Natural Language Text
  authors:
  - family-names: Mullen
    given-names: Lincoln A.
  - family-names: Benoit
    given-names: Kenneth
  - family-names: Keyes
    given-names: Os
  - family-names: Selivanov
    given-names: Dmitry
  - family-names: Arnold
    given-names: Jeffrey
  journal: Journal of Open Source Software
  year: '2018'
  volume: '3'
  issue: '23'
  url: https://doi.org/10.21105/joss.00655
  doi: 10.21105/joss.00655
  start: '655'
repository: https://ropensci.r-universe.dev
repository-code: https://github.com/ropensci/tokenizers
commit: b80863d088d4b39695b602ca11e061ac34770ec7
url: https://docs.ropensci.org/tokenizers/
date-released: '2024-03-27'
contact:
- family-names: Charlon
  given-names: Thomas
  email: charlon@protonmail.com
  orcid: https://orcid.org/0000-0001-7497-0470