---
title: "How to use autotest"
author:
- "Mark Padgham"
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_float: true
number_sections: false
theme: flatly
always_allow_html: true
vignette: >
%\VignetteIndexEntry{How to use autotest}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = TRUE,
message = TRUE,
width = 120,
comment = "#>",
fig.retina = 2,
fig.path = "README-"
)
```
This vignette demonstrates the easiest way to use `autotest`, which is to apply
it continuously through the entire process of package development. The best way
to understand the process is to obtain a local copy of the vignette itself from
[this
link](https://github.com/ropensci-review-tools/autotest/blob/master/vignettes/autotest.Rmd),
and step through the code. We begin by constructing a simple package in the
local
[`tempdir()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/tempfile.html).
Package Construction
To create a package in one simple line, we use
[`usethis::create_package()`](https://usethis.r-lib.org/reference/create_package.html),
and name our package `"demo"`.
```{r create_package}
path <- file.path (tempdir (), "demo")
usethis::create_package (path, check_name = FALSE, open = FALSE)
```
The structure looks like this:
```{r dir_tree}
fs::dir_tree (path)
```
Having constructed a minimal package structure, we can then insert some code in
the `R/` directory, including initial [`roxygen2`](https://roxygen2.r-lib.org)
documentation lines, and use the [`roxygenise()`
function](https://roxygen2.r-lib.org/reference/roxygenize.html) to create the
corresponding `man` files.
`autotest` works by parsing and running "example" code from function
documentation, so our code needs to include at least one example line.
```{r first-fn}
code <- c ("#' my_function",
"#'",
"#' @param x An input",
"#' @return Something else",
"#' @examples",
"#' y <- my_function (x = 1)",
"#' @export",
"my_function <- function (x) {",
" return (x + 1)",
"}")
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
```
Our package now looks like this:
```{r dir_tree2}
fs::dir_tree (path)
```
We can already apply `autotest` to that package to see what happens, first
ensuring that we've loaded the package ready to use.
```{r autotest1-fakey, eval = FALSE, echo = TRUE}
library (autotest)
x0 <- autotest_package (path)
```
```{r autotest1, eval = TRUE, echo = FALSE}
devtools::load_all (".", export_all = FALSE)
x0 <- autotest_package (path)
```
We use the [`DT` package](https://rstudio.github.io/DT) to display the results
here.
```{r}
DT::datatable (x0, options = list (dom = "t")) # display table only
```
The first thing to notice is the first column, which has `test_type = "dummy"`
for all rows. The [`autotest_package()`
function](https://docs.ropensci.org/autotest/reference/autotest_package.html)
has a parameter `test` with a default value of `FALSE`, so that the default
call demonstrated above does not actually implement the tests, rather it
returns an object listing all tests that would be performed with actually doing
so. Applying the tests by setting `test = TRUE` gives the following result.
```{r autotest-TRUE}
x1 <- autotest_package (path, test = TRUE)
DT::datatable (x1, options = list (dom = "t"))
```
Of the `r nrow(x0)` tests which were performed, only `r nrow(x1)` yielded
unexpected behaviour. The first indicates that the parameter `x` has only been
used as an integer, yet was not specified as such. The second states that the
parameter `x` is "assumed to be a single numeric". `autotest` does its best to
figure out what types of inputs are expected for each parameter, and with the
example only demonstrating `x = 1`, assumes that `x` is always expected to be
a single value. We can resolve the first of these by replacing `x = 1` with `x
= 1.` to clearly indicate that it is not an integer, and the second by
asserting that `length(x) == 1`, as follows:
```{r assert-length}
code <- c ("#' my_function",
"#'",
"#' @param x An input",
"#' @return Something else",
"#' @examples",
"#' y <- my_function (x = 1.)",
"#' @export",
"my_function <- function (x) {",
" if (length(x) > 1) {",
" warning(\"only the first value of x will be used\")",
" x <- x [1]",
" }",
" return (x + 1)",
"}")
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
```
This is then sufficient to pass all `autotest` tests and so return `NULL`.
```{r autotest-TRUE2}
autotest_package (path, test = TRUE)
```
## Integer input
Note that `autotest` distinguishes integer and non-integer types by their
[`storage.mode`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/mode.html)
of `"integer"` and `"double"`, and not by their respective classes of
`"integer"` and `"numeric"`, because `"numeric"` is ambiguous in R, and
`is.numeric(1L)` is `TRUE`, even though `storage.mode(1L)` is `"integer"`, and
not `"numeric"`. Replacing `x = 1` with `x = 1.` explicitly identifies that
parameter as a `"double"` parameter, and allowed the preceding tests to pass.
Note what happens if we instead specify that parameter as an integer (`x =
1L`).
```{r int-input}
code [6] <- gsub ("1\\.", "1L", code [6])
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
x2 <- autotest_package (path, test = TRUE)
DT::datatable (x2, options = list (dom = "t"))
```
That then generates two additional messages, the second of which reflects an
expectation that parameters assumed to be integer-valued should assert that,
for example by converting with `as.integer()`. The following suffices to remove
that message.
```{r use-as-int}
code <- c (code [1:12],
" if (is.numeric (x))",
" x <- as.integer (x)",
code [13:length (code)])
```
The remaining message concerns integer ranges. For any parameters which
`autotest` identifies as single integers, routines will try a full range of
values between `+/- .Machine$integer.max`, to ensure that all values are
appropriately handled. Many routines may sensibly allow unrestricted ranges,
while many others may not implement explicit control over permissible ranges,
yet may error on, for example, unexpectedly large positive or negative values.
The content of the diagnostic message indicates one way to resolve this issue,
which is simply by describing the input as `"unrestricted"`.
```{r unrestricted}
code [3] <- gsub ("An input", "An unrestricted input", code [3])
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
autotest_package (path, test = TRUE)
```
An alternative, and frequently better way, is to ensure and document specific
control over permissible ranges, as in the following revision of our function.
```{r input-range}
code <- c ("#' my_function",
"#'",
"#' @param x An input between 0 and 10",
"#' @return Something else",
"#' @examples",
"#' y <- my_function (x = 1L)",
"#' @export",
"my_function <- function (x) {",
" if (length(x) > 1) {",
" warning(\"only the first value of x will be used\")",
" x <- x [1]",
" }",
" if (is.numeric (x))",
" x <- as.integer (x)",
" if (x < 0 | x > 10) {",
" stop (\"x must be between 0 and 10\")",
" }",
" return (x + 1L)",
"}")
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
autotest_package (path, test = TRUE)
```
Respective limits of ranges may be specified with any of the following words:
- Lower limits: "more", "greater", "larger than", "lower limit of", "above"
- Upper limits: "less", "lower", "smaller than", "upper limit of", "below"
## Vector input
The initial test results above suggested that the input was *assumed* to be of
length one. Let us now revert our function to its original format which
accepted vectors of length > 1, and include an example demonstrating such
input.
```{r vector-input}
code <- c ("#' my_function",
"#'",
"#' @param x An input",
"#' @return Something else",
"#' @examples",
"#' y <- my_function (x = 1)",
"#' y <- my_function (x = 1:2)",
"#' @export",
"my_function <- function (x) {",
" if (is.numeric (x)) {",
" x <- as.integer (x)",
" }",
" return (x + 1L)",
"}")
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
```
Note that the first example no longer has `x = 1L`. This is because vector
inputs are identified as `integer` by examining all individual values, and
presuming `integer` representations for any parameters for which all values are
whole numbers, regardless of `storage.mode`.
```{r autotest-TRUE3}
x3 <- autotest_package (path, test = TRUE)
DT::datatable (x3, options = list (dom = "t"))
```
### List-column conversion
The above result reflects one of the standard tests, which is to determine
whether list-column formats are appropriately processed. List-columns commonly
arise when using (either directly or indirectly), the [`tidyr::nest()`
function](https://tidyr.tidyverse.org/reference/nest.html), or equivalently in
base R with the [`I` or `AsIs`
function](https://stat.ethz.ch/R-manual/R-devel/library/base/html/AsIs.html).
They look like this:
```{r list-col-demo}
dat <- data.frame (x = 1:3, y = 4:6)
dat$x <- I (as.list (dat$x)) # base R
dat <- tidyr::nest (dat, y = y)
print (dat)
```
The use of packages like [`tidyr`](https://tidyr.tidyverse.org) and
[`purrr`](https://purrr.tidyverse.org) quite often leads to
[`tibble`](https://tibble.tidyverse.org)-class inputs which contain
list-columns. Any functions which fail to identify and appropriately respond to
such inputs may generate unexpected errors, and this `autotest` is intended to
enforce appropriate handling of these kinds of inputs. The following lines
demonstrate the kinds of results that can arise without such checks.
```{r mtcars-error, error = TRUE}
m <- mtcars
head (m, n = 2L)
m$mpg <- I (as.list (m$mpg))
head (m, n = 2L) # looks exaxtly the same
cor (m)
```
In contrast, many functions either assume inputs to be lists, and convert when
not, or implicitly `unlist`. Either way, such functions may respond entirely
consistently regardless of the presence of list-columns, like this:
```{r mtcars-okay}
m$mpg <- paste0 ("a", m$mpg)
class (m$mpg)
```
The list-column `autotest` is intended to enforce consistent behaviour in
response to list-column inputs. One way to identify list-column formats is to
check the value of `class(unclass(.))` of each column. The `unclass` function
is necessary to first remove any additional class attributes, such as `I` in
`dat$x` above. A modified version of our function which identifies and responds
to list-column inputs might look like this:
```{r list-col-input}
code <- c ("#' my_function",
"#'",
"#' @param x An input",
"#' @return Something else",
"#' @examples",
"#' y <- my_function (x = 1)",
"#' y <- my_function (x = 1:2)",
"#' @export",
"my_function <- function (x) {",
" if (methods::is (unclass (x), \"list\")) {",
" x <- unlist (x)",
" }",
" if (is.numeric (x)) {",
" x <- as.integer (x)",
" }",
" return (x + 1L)",
"}")
writeLines (code, file.path (path, "R", "myfn.R"))
roxygen2::roxygenise (path)
```
That change once again leads to clean `autotest` results:
```{r autotest-TRUE4}
autotest_package (path, test = TRUE)
```
Of course simply attempting to `unlist` a complex list-column may be dangerous,
and it may be preferable to issue some kind of message or warning, or even
either simply remove any list-columns entirely or generate an error. Replacing
the above, potentially dangerous, line, `x <- unlist (x)` with a simple
`stop("list-columns are not allowed")` will also produce clean `autotest`
results.
## Return results and documentation
Functions which return complicated results, such as objects with specific
classes, need to document those class types, and `autotest` compares return
objects with documentation to ensure that this is done. The following code
constructs a new function to demonstrate some of the ways `autotest` inspects
return objects, demonstrating a vector input (`length(x) > 1`) in the example
to avoid messages regarding length checks an integer ranges.
```{r return-val}
code <- c ("#' my_function3",
"#'",
"#' @param x An input",
"#' @examples",
"#' y <- my_function3 (x = 1:2)",
"#' @export",
"my_function3 <- function (x) {",
" return (datasets::iris)",
"}")
writeLines (code, file.path (path, "R", "myfn3.R"))
roxygen2::roxygenise (path) # need to update docs with seed param
x4 <- autotest_package (path, test = TRUE)
DT::datatable (x4, options = list (dom = "t"))
```
Several new diagnostic messages are then issued regarding the description of
the returned value. Let's insert a description to see the effect.
```{r return-val-2}
code <- c (code [1:3],
"#' @return The iris data set as dataframe",
code [4:length (code)])
writeLines (code, file.path (path, "R", "myfn3.R"))
roxygen2::roxygenise (path) # need to update docs with seed param
x5 <- autotest_package (path, test = TRUE)
DT::datatable (x5, options = list (dom = "t"))
```
That result still contains a couple of diagnostic messages, but it is now
pretty clear what we need to do, which is to be precise with our specification
of the class of return object. The following then suffices to once again
generate clean `autotest` results.
```{r iris-update}
code [4] <- "#' @return The iris data set as data.frame"
writeLines (code, file.path (path, "R", "myfn3.R"))
roxygen2::roxygenise (path) # need to update docs with seed param
autotest_package (path, test = TRUE)
```
### Documentation of input parameters
Similar checks are performed on the documentation of input parameters, as
demonstrated by the following modified version of the preceding function.
```{r input-checks}
code <- c ("#' my_function3",
"#'",
"#' @param x An input",
"#' @return The iris data set as data.frame",
"#' @examples",
"#' y <- my_function3 (x = datasets::iris)",
"#' @export",
"my_function3 <- function (x) {",
" return (x)",
"}")
writeLines (code, file.path (path, "R", "myfn3.R"))
roxygen2::roxygenise (path) # need to update docs with seed param
x6 <- autotest_package (path, test = TRUE)
DT::datatable (x6, options = list (dom = "t"))
```
This warning again indicates precisely how it can be rectified, for example by
replacing the third line with
```{r input-fix, eval = FALSE}
code [3] <- "#' @param x An input which can be a data.frame"
```
## General Procedure
The demonstrations above hopefully suffice to indicate the general procedure
which `autotest` attempts to make as simple as possible. This procedure
consists of the following single point:
- From the moment you develop your first function, and every single time you
modify your code, do whatever steps are necessary to ensure
`autotest_package()` returns `NULL`.
This vignette has only demonstrated a few of the tests included in the package,
but as long as you use `autotest` throughout the entire process of package
development, any additional diagnostic messages should include sufficient
information for you to be able to restructure your code to avoid them.