---
title: "Why local LLMs?"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Why local LLMs?}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set (
    collapse = TRUE,
    comment = "#>"
)
```

The "pkgmatch" package uses Large Language Models (LLMs) to assess relationships
between R packages. Software which relies on LLMs commonly accesses them
through Application Programming Interfaces (APIs) provided by external
organisations such as [mistral.ai](https://mistral.ai),
[jina.ai](https://jina.ai), or a host of alternative providers.
Inputs, generally in text form, are sent to the external service which then
responds in some specified form, such as text completions.

Accessing LLMs through APIs has the two key advantages of:

- Being easier to develop, as external APIs generally take care of much of the
processing that might otherwise have to be written and executed locally; and
- Being able to access the latest and biggest and fastest models which are
generally only available in the form of external APIs.

In spite of those advantages, building software around external APIs entails
several drawbacks, notably including:

- There is no guarantee that the API will continue to be available in the
future, or that processes used to generate responses will remain stable and
reproducible.
- Most APIs cost money. These costs must generally be borne by the users of
software.
- Data submitted to such APIs is generally used by the organizations providing
them to train and refine models, so privacy-protecting use is generally not
possible.

We, the developers of this package, believe that the disadvantages of external
APIs far outweigh these advantages, and so have developed this package to
interface with LLMs exclusively through a local server. The server used here is
provided by [the "ollama" software](https://ollama.com), which is used to run
and serve results from two openly published LLM models, both from
[jina.ai](https://jina.ai):

- [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)
- [jina-embeddings-v2-base-code](https://huggingface.co/jinaai/jina-embeddings-v2-base-code)

The package relies on comparing user inputs with equivalent results from two
main corpora of R packages. For the package to work, results must be guaranteed
to be directly comparable. The use of stable, openly published models
guarantees this ability in ways that external APIs can not.