--- title: "Why local language models (LMs)?" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Why local language models (LMs)?} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set ( collapse = TRUE, comment = "#>" ) ``` The "pkgmatch" package uses Language Models (LMs, or equivalently, "LLMs" for large large models) to assess relationships between R packages. Software which relies on LMs commonly accesses them through Application Programming Interfaces (APIs) provided by external organisations such as [mistral.ai](https://mistral.ai), [jina.ai](https://jina.ai), or a host of alternative providers. Inputs, generally in text form, are sent to the external service which then responds in some specified form, such as text completions. Accessing LMs through APIs has the two key advantages of: - Being easier to develop, as external APIs generally take care of much of the processing that might otherwise have to be written and executed locally; and - Being able to access the latest and biggest and fastest models which are generally only available in the form of external APIs. In spite of those advantages, building software around external APIs entails several drawbacks, notably including: - There is no guarantee that the API will continue to be available in the future, or that processes used to generate responses will remain stable and reproducible. - Most APIs cost money. These costs must generally be borne by the users of software. - Data submitted to such APIs is generally used by the organizations providing them to train and refine models, so privacy-protecting use is generally not possible. We, the developers of this package, believe that the disadvantages of external APIs far outweigh these advantages, and so have developed this package to interface with LMs exclusively through a local server. The server used here is provided by [the "ollama" software](https://ollama.com), which is used to run and serve results from two openly published LM models, both from [jina.ai](https://jina.ai): - [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en) - [jina-embeddings-v2-base-code](https://huggingface.co/jinaai/jina-embeddings-v2-base-code) The package relies on comparing user inputs with equivalent results from two main corpora of R packages. For the package to work, results must be guaranteed to be directly comparable. The use of stable, openly published models guarantees this ability in ways that external APIs can not.