--- title: "Why local language models (LMs)?" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Why local language models (LMs)?} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set ( collapse = TRUE, comment = "#>" ) ``` The "pkgmatch" package uses Language Models (LMs, or equivalently, "LLMs" for large large models) to assess relationships between R packages. Software which relies on LMs commonly accesses them through Application Programming Interfaces (APIs) provided by external organisations such as [mistral.ai](https://mistral.ai), [jina.ai](https://jina.ai), or a host of alternative providers. Inputs, generally in text form, are sent to the external service which then responds in some specified form, such as text completions. Accessing LMs through APIs has the two key advantages of: - Being easier to develop, as external APIs generally take care of much of the processing that might otherwise have to be written and executed locally; and - Being able to access the latest and biggest and fastest models which are generally only available in the form of external APIs. In spite of those advantages, building software around external APIs entails several drawbacks, notably including: - There is no guarantee that the API will continue to be available in the future, or that processes used to generate responses will remain stable and reproducible. - Most APIs cost money. These costs must generally be borne by the users of software. - Data submitted to such APIs is generally used by the organizations providing them to train and refine models, so privacy-protecting use is generally not possible. We, the developers of this package, believe that the disadvantages of external APIs far outweigh these advantages, and so have developed this package to interface with LMs exclusively through a local server. The server used here is provided by [the "ollama" software](https://ollama.com), which is used to run and serve results from two openly published LM models, both from [jina.ai](https://jina.ai): - [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en) - [jina-embeddings-v2-base-code](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) The package relies on comparing user inputs with equivalent results from two main corpora of R packages. For the package to work, results must be guaranteed to be directly comparable. The use of stable, openly published models guarantees this ability in ways that external APIs can not.