Good-enough practices for language model packages
This document suggests minimal “Good-enough Practices” for
software packages that rely on language model (LM, or “LLM” for large
language model) outputs.
- Prefer local LMs, for reasons described in this
separate vignette, and generally avoid relying on closed or
commercial APIs.
- Provide direct links to all models used, such as links to Hugging
Face model pages, “model cards” hosted elsewhere, or original published
research. Include explicit statements about the long-term availability
and stability of all models.
- Summarise the training data used in all models, including an estimate
of the proportion of data drawn from the public domain, and the extent
to which use of such data in model training may violate licensing
conditions.
- Combine LM output with equivalent output from alternative
algorithms. Reasons for this are exemplified in this blog post from
anthropic.ai.
- Use or implement efficient algorithms to combine ranks from these
multiple outputs, including separate pre-processing of the most
computationally intensive stages.
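One simple and efficient way to combine ranks from multiple outputs is reciprocal rank fusion (RRF). The following is a minimal sketch; the function name, parameters, and example package identifiers are illustrative, not taken from any particular implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists of items into one fused ranking.

    rankings: list of lists, each an ordering of item identifiers with
              best first (e.g. one from an LM, one from a non-LM
              algorithm).
    k: damping constant from the RRF formulation; larger values
       reduce the dominance of top-ranked items.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically as one
    # simple, documentable tie-breaking procedure.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Example: fuse a hypothetical LM-derived ranking with a non-LM one:
lm_ranks = ["pkgA", "pkgB", "pkgC"]
alt_ranks = ["pkgB", "pkgA", "pkgD"]
print(reciprocal_rank_fusion([lm_ranks, alt_ranks]))
# → ['pkgA', 'pkgB', 'pkgC', 'pkgD']
```

Because each list contributes scores independently, the most computationally intensive rankings can be pre-processed and cached, then fused with cheaper rankings on demand.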
- Provide a “Summary” of how the software generates results. This
should include the following sections where applicable:
- Input chunking, describing chunking methods used, and possible user
control
- LM sizes, including input or context size, and output or embedding
sizes
- Similarity algorithms, including metrics applied to LM outputs, and
metrics for alternative, non-LM algorithms
- Final ranking, including description of how different components are
combined, such as outputs from different LM chunks and from alternative,
non-LM algorithms. Tie-breaking procedures may also be described.
- Reproducibility statement, including descriptions of long-term
stability of model results, along with any components relying on random
numbers, and how seeding can be used to generate reproducible
outputs.
- Software should also provide extended, non-technical descriptions of
all aspects presented in the preceding summary.
- LM packages should include routines to update all data used, and
demonstrate that such updates are automated and performed with
sufficient regularity. See this blog post on the difficulty and
importance of updating LM data, and this package’s vignette on how its
data are updated.
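An automated update routine might begin with a simple data-freshness check such as the sketch below. The file path and maximum age are hypothetical; a real package would wire this into its own update and scheduling machinery (e.g. a CI cron job).

```python
import os
import time

def data_needs_update(path, max_age_days=30):
    """Return True if the data file at `path` is missing or older
    than max_age_days, signalling that the update routine should run.

    max_age_days encodes the package's chosen update regularity and
    is an illustrative default, not a recommendation.
    """
    if not os.path.exists(path):
        return True
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_days * 86400
```

Running such a check on a regular schedule, and logging its outcome, provides the demonstrable evidence of automated, regular updates that this practice calls for.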