Package: pangoling 1.0.1

Bruno Nicenboim

pangoling: Access to Large Language Model Predictions

Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2'; Radford et al., 2019) and masked/bidirectional LLMs (e.g., 'BERT'; Devlin et al., 2019, <doi:10.48550/arXiv.1810.04805>) to compute the probability of words, phrases, or tokens given their linguistic context. By enabling a straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).
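To illustrate the interface, here is a minimal sketch using two of the exported functions listed further down (`causal_words_pred` and `masked_targets_pred`). The argument names shown are assumptions based on the export names; consult the package documentation for the actual signatures. Running this requires a configured Python backend and downloads a pre-trained model on first use.

```r
library(pangoling)

# Causal (GPT-2-style) predictability: log-probability of each word
# given the words that precede it.
sentence <- c("The", "apple", "doesn't", "fall", "far", "from", "the", "tree.")
causal_words_pred(x = sentence, model = "gpt2")

# Masked (BERT-style) predictability of a target word given its
# bidirectional context.
masked_targets_pred(
  prev_contexts  = "The apple doesn't fall far from the",
  targets        = "tree",
  after_contexts = ".",
  model          = "bert-base-uncased"
)
```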

Authors: Bruno Nicenboim [aut, cre], Chris Emmerly [ctb], Giovanni Cassani [ctb], Lisa Levinson [rev], Utku Turk [rev]

Downloads:
  • Source: pangoling_1.0.1.tar.gz
  • Windows binaries: pangoling_1.0.1.zip (r-4.5, r-4.4, r-4.3)
  • macOS binaries: pangoling_1.0.1.tgz (r-4.5-any, r-4.4-any, r-4.3-any)
  • Linux binaries: pangoling_1.0.1.tar.gz (r-4.5-noble, r-4.4-noble)
  • WebAssembly: pangoling_1.0.1.tgz (r-4.4-emscripten, r-4.3-emscripten)

Documentation: pangoling.pdf | pangoling.html
API: pangoling/json
NEWS

# Install 'pangoling' in R:
install.packages('pangoling', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org'))
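Because the package interfaces with the 'Hugging Face' ecosystem through Python, a Python environment with the required libraries is also needed. The exports list below includes an `install_py_pangoling()` helper, which presumably sets this up; the call shown here is a sketch, and any arguments it accepts are documented in the package itself.

```r
# One-time setup of the Python dependencies used by pangoling
# (see ?install_py_pangoling for the actual options).
library(pangoling)
install_py_pangoling()

# installed_py_pangoling() is also exported, presumably to check
# whether the Python backend is already available.
installed_py_pangoling()
```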

Reviews: rOpenSci Software Review #575

Bug tracker: https://github.com/ropensci/pangoling/issues

Pkgdown site: https://docs.ropensci.org

Datasets:
  • df_jaeger14 - Self-Paced Reading Dataset on Chinese Relative Clauses
  • df_sent - Example dataset: Two word-by-word sentences

Keywords: nlp, psycholinguistics, transformers

4.90 score | 8 stars | 24 exports | 26 dependencies

Last updated 4 hours ago from 967d98b74e (on main). Checks: 4 OK, 5 NOTE. Indexed: yes.

Target           Result  Latest binary
Doc / Vignettes  OK      Mar 11 2025
R-4.5-win        OK      Mar 11 2025
R-4.5-mac        OK      Mar 11 2025
R-4.5-linux      OK      Mar 11 2025
R-4.4-win        NOTE    Mar 11 2025
R-4.4-mac        NOTE    Mar 11 2025
R-4.4-linux      NOTE    Mar 11 2025
R-4.3-win        NOTE    Mar 11 2025
R-4.3-mac        NOTE    Mar 11 2025

Exports: causal_config, causal_lp, causal_lp_mats, causal_next_tokens_pred_tbl, causal_next_tokens_tbl, causal_pred_mats, causal_preload, causal_targets_pred, causal_tokens_lp_tbl, causal_tokens_pred_lst, causal_words_pred, install_py_pangoling, installed_py_pangoling, masked_config, masked_lp, masked_preload, masked_targets_pred, masked_tokens_pred_tbl, masked_tokens_tbl, ntokens, perplexity_calc, set_cache_folder, tokenize_lst, transformer_vocab

Dependencies: cachem, cli, data.table, fastmap, glue, here, jsonlite, lattice, lifecycle, magrittr, Matrix, memoise, pillar, png, rappdirs, Rcpp, RcppTOML, reticulate, rlang, rprojroot, rstudioapi, tidyselect, tidytable, utf8, vctrs, withr

Troubleshooting the use of Python in R

Rendered from troubleshooting.Rmd using knitr::rmarkdown on Mar 11 2025.

Last update: 2025-03-11
Started: 2025-03-11

Using a BERT model to get the predictability of words in their context

Rendered from intro-bert.Rmd using knitr::rmarkdown on Mar 11 2025.

Last update: 2025-03-11
Started: 2025-03-11

Using a GPT-2 transformer model to get word predictability

Rendered from intro-gpt2.Rmd using knitr::rmarkdown on Mar 11 2025.

Last update: 2025-03-11
Started: 2025-03-11

Worked-out example: Surprisal from a causal (GPT) model as a cognitive processing bottleneck in reading

Rendered from example.Rmd using knitr::rmarkdown on Mar 11 2025.

Last update: 2025-03-11
Started: 2025-03-11