---
title: "Data caching and updating"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data caching and updating}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set (
    collapse = TRUE,
    comment = "#>"
)
```

The "pkgmatch" package package relies on pre-generated [Language Model (LM)
embeddings](https://en.wikipedia.org/wiki/Word_embedding). Inputs of text,
code, or entire packages are converted into embeddings, and the results
compared with the pre-generated embeddings to discern the best-matching result.
The pre-generated embeddings are calculated for the entire package suites of
both [rOpenSci](https://ropensci.org/packages) and
[CRAN](https://cran.r-project.org).

## Local caching and updating for users

The pre-generated embeddings are downloaded whenever needed in initial package
calls. The download location is determined by [the `rappdirs`
package](https://rappdirs.r-lib.org/) as `fs::path(rappdirs::user_cache_dir(),
"R", "pkgmatch)"`. Users should generally not need to worry about managing these
data files themselves, although the data can be safely deleted at any time, as
can the entire directory in which are stored.

The remote data are regularly updated, and so locally-cached data also require
regular updating. By default, if any one of the locally-cached embeddings files
needed for functionality is more than 30 days old, a newer version will be
automatically downloaded. This update frequency can also be over-ridden by
setting a value like 100 days with: ```{r op, eval = FALSE} options
("pkgmatch.update_frequency" = 100L) ``` If you want to ensure your data are
always up to date, set an update frequency of 1, and they'll be updated every
day. Alternatively, you can set an enduring environment variable, typically in
your `~/.Renviron` file, to specify a fixed update frequency:
```{bash, eval = FALSE}
PKGMATCH_UPDATE_FREQUENCY=100
```
If you wish to prevent any updating, set that environment variable to a really
high value, such as `1e6`.


## Data updating for developers

These package suites are constantly changing, and therefore the embeddings also
need to be regularly updated. The "pkgmatch" package includes several files in
the `/R` directory prefixed with "data-update" containing functions which
implement this updating. These functions are intended to be used only by the
developers. They are ultimately used in [this GitHub workflow
file](https://github.com/ropensci-review-tools/pkgmatch/blob/main/.github/workflows/update.yaml)
which is automatically run every day to update all embedding data for both CRAN
and rOpenSci. The embeddings data thus always reflect the current daily state
of both repositories.