Adhering to sound versioning practices is crucial for ensuring the reproducibility of software. Despite the expertise in software engineering, the ever-growing complexity and continuous development of new, potentially disruptive features present significant challenges in maintaining code functionality over time. This pertains not only to backward compatibility but also to future-proofing. When code handles critical production loads and relies on numerous external software libraries, it’s likely that these dependencies will evolve. Infrastructure-as-code and other DevOps principles shine in addressing these challenges. However, they may appear less approachable and more labor-intensive to set up for the average R developer.
Are you ready to test your custom R functions and system commands in a a different environment with isolated software builds that are both pure at build and at runtime, without leaving the R console?
Let’s introduce with_nix()
. with_nix()
will
evaluate custom R code or shell commands with command line interfaces
provided by Nixpkgs in a Nix environment, and thereby bring the
read-eval-print-loop feeling. Not only can you evaluate custom R
functions or shell commands in Nix environments, but you can also bring
the results back to your current R session as R objects.
We aim to accommodate various use cases, considering a gradient of declarativity in individual or sets of software environments based on personal preferences. There are two main modes for defining and comparing code running through R and system commands (command line interfaces; CLIs)
with_nix()
from,
too. You are probably on the way of getting a passionate Nix user.Carefully curated software improves over time, so does R. We pick an example from the R changelog, the following literal entry in R 4.2.0:
as.vector()
gains a data.frame
method
which returns a simple named list, also clearing a long standing ‘FIXME’
to enable as.vector(<data.frame>, mode ="list")
. This
breaks code relying on as.vector(<data.frame>)
to
return the unchanged data frame.”The goal is to illustrate this change in behavior from R versions 4.1.3 and before to R versions 4.2.0 and later.
First, we write a default.nix
file containing Nix
expressions that pin R version 4.1.3 from Nixpkgs.
library("rix")
path_env_1 <- file.path(".", "_env_1_R-4-1-3")
rix(
r_ver = "4.1.3",
overwrite = TRUE,
project_path = path_env_1
)
The following expression is written to default.nix in the subfolder
./_env_1_R-4-1-3/
.
let
pkgs = import (fetchTarball "https://github.com/rstats-on-nix/nixpkgs/archive/2022-04-19.tar.gz") {};
system_packages = builtins.attrValues {
inherit (pkgs)
glibcLocales
nix
R;
};
in
pkgs.mkShell {
LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
LANG = "en_US.UTF-8";
LC_ALL = "en_US.UTF-8";
LC_TIME = "en_US.UTF-8";
LC_MONETARY = "en_US.UTF-8";
LC_PAPER = "en_US.UTF-8";
LC_MEASUREMENT = "en_US.UTF-8";
buildInputs = [ system_packages ];
}
This also includes a custom .Rprofile
file that ensure
that this subshell will not load any packages installed to the user’s
library of packages.
We now have set up the configuration for R 4.1.3 set up in a
default.nix
file in the folder
./_env_1_R-4-1-3
. Since you are sure you are using an R
version higher 4.2.0 available on your system, you can check what that
as.vector.data.frame()
S3 method returns a list.
df <- data.frame(a = 1:3, b = 4:6)
as.vector(x = df, mode = "list")
#> $a
#> [1] 1 2 3
#>
#> $b
#> [1] 4 5 6
This is different for R versions 4.1.3 and below, where you should get an identical data frame back.
To formally validate in a ‘System-to-Nix’ approach that the object
returned from as.vector.data.frame()
is before
R
< 4.2.0, we define a function that runs the
computation above.
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_system_1 <- df_as_vector(x = df))
#> $a
#> [1] 1 2 3
#>
#> $b
#> [1] 4 5 6
Then, we will evaluate this test code through a
nix-shell
R session. This adds both build-time and run-time
purity with the declarative Nix software configuration we have made
earlier. with_nix()
leverages the following principles
under the hood:
Computing on the Language: Manipulating language objects using code.
Static Code Analysis: Detecting global objects and package environments in the function call stack of ‘expr’. This involves utilizing essential functionality from the ‘codetools’ package, which is recursively iterated.
Serialization of Dependent R objects: Saving
them to disk and deserializing them back into the R session’s RAM via a
temporary folder. This process establishes isolation between two
distinct computational environments, accommodating both ‘System-to-Nix’
and ‘Nix-to-Nix’ computational modes. Simultaneously, it facilitates the
transfer of input arguments, dependencies across the call stack, and
outputs of expr
between the Nix-R and the system’s R
sessions.
This approach guarantees reproducible side effects and effectively
streams messages and errors into the R session. Thereby, the {sys}
package facilitates capturing standard outputs and errors as text output
messages. Please be aware that with_nix()
will invoke
nix-shell
, which will itself run nix-build
in
case the Nix derivation (package) for R version 4.1.3 is not yet in your
Nix store. This will take a bit of time to get the cache. You will see
in your current R console the specific Nix paths that will be downloaded
and copied into your Nix store automatically.
# now run it in `nix-shell`; `with_nix()` takes care
# of exporting global objects of `df_as_vector` recursively
out_nix_1 <- with_nix(
expr = function() df_as_vector(x = df), # wrap to avoid evaluation
program = "R",
project_path = path_env_1,
message_type = "simple" # you can do `"verbose"`, too
)
# compare results of custom codebase with indentical
# inputs and different software environments
identical(out_system_1, out_nix_1)
# should return `FALSE` if your system's R versions in
# current interactive R session is R >= 4.2.0
expr
argument
of with_nix()
In the previous code snippet we wrapped the top-level
expr
function with function()
or
function(){}
. As an alternative, you can also provide
default arguments when assigning the function used as expr
input like this:
Then, you just supply the name of the function to evaluate with default arguments.
out_nix_1_b <- with_nix(
expr = df_as_vector, # provide name of function
program = "R",
project_path = path_env_1,
message_type = "simple" # you can do `"verbose"`, too
)
It yields the same results.
as.vector.data.frame()
for both R versions
4.1.3 and 4.2.0 from NixpkgsHere follows an example a Nix-to-Nix
solution, with two
subshells to track the evolution of base R in this specific case. We can
verify the breaking changes in case study 1 in more declarative manner
when we use both R 4.1.3 and R 4.2.0 from Nixpkgs. Since we already have
defined R 4.1.3 in the env
_1_R-4-1-3
subshell, we can use it as a source environment where with_nix() is
launched from. Accordingly, we define the R 4.2.0 environment in a
env
_1_2_R-4-2-0
using Nix via
rix::rix()
. The latter environment will be the target
environment where df_as_vector()
will be evaluated in.
library("rix")
path_env_1_2 <- file.path(".", "_env_1_2_R-4-2-0")
rix(
r_ver = "4.2.0",
overwrite = TRUE,
project_path = path_env_1_2,
shell_hook = "R"
)
list.files(path_env_1_2)
"default.nix"
Now, initiate a new R session as development environment using
nix-shell
. Open a new terminal at the current working
directory of your R session. The provided expression
default.nix
. defines R 4.1.3 in a “subfolder per subshell”
approach. nix-shell
will use the expression by
default.nix
and prefer it over any other .nix
files, except when you put a shell.nix
file in that folder,
which takes precedence.
After some time downloading caches and doing builds, you will enter
an R console session with R 4.1.3. You did not need to type in R first,
because we set up a R shell hook via rix::rix()
. Next, we
define again the target function to test in R 4.2.0, too.
# current Nix-R session with R 4.1.3
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_nix_1 <- df_as_vector(x = df))
out_nix_1_2 <- with_nix(
expr = function() df_as_vector(x = df),
program = "R",
project_path = path_env_1_2,
message_type = "simple" # you can do `"verbose"`, too
)
You can now formally compare the outputs of the computation of the same code in R 4.1.3 vs. R 4.2.0 environments controlled by Nix.
We add one more layer to the reproducibility of the R ecosystem. User libraries from CRAN or GitHub, one thing that makes R shine is the huge collection of software packages available from the community.
There was a change introduce in {stringr} 1.5.0; in earlier versions, this line of code:
would return the character "a"
. However, this behaviour
is unexpected: it really should return an error. This was addressed in
versions after 1.5.0:
out_system_stringr <- tryCatch(
expr = {
stringr::str_subset(c("", "a"), "")
},
error = function(e) NULL
)
Since the code returns an error, we wrap it inside
tryCatch()
and return NULL
instead of an error
(if we wouldn’t do that, this vignette could not compile!).
Let’s build a subshell with the latest version of R, but an older
version of {stringr}
:
library("rix")
path_env_stringr <- file.path(".", "_env_stringr_1.4.1")
rix(
r_ver = "4.3.1",
r_pkgs = "[email protected]",
overwrite = TRUE,
project_path = path_env_stringr
)
We can now run the code in the subshell
out_nix_stringr <- with_nix(
expr = function() stringr::str_subset(c("", "a"), ""),
program = "R",
project_path = path_env_stringr,
message_type = "simple"
)
Here are the last few lines printed on screen:
==> `expr` succeeded!
### Finished code evaluation in `nix-shell` ###
* Evaluating `expr` in `nix-shell` returns:
[1] "a"
Not only do we see the result of evaluating the code in the subshell,
we also have access to it: out_nix_stringr
holds this
result.
We can now compare the two: the result of the code running in our
main session with the latest version of {stringr}
and the
result of the code running in the subshell with the old version of
{stringr}
:
As expected, the result is FALSE
.
Nix subshells are quite useful in cases where you need to use a
package that might be difficult to install, such as
{arrow}
, or other packages that must be compiled. Depending
on your operating system you need to compile {arrow}
from
source, which can be a frustrating experience, especially if you only
need it to load data and bring it down to a manageable size (using
select()
and filter()
for instance). This use
cases illustrates how to achieve this.
Let’s start by building a subshell that is based on a distinct
revision of nixpkgs
, for which we know that arrow compiles
on both linux and macOS (darwin).
library("rix")
path_env_arrow <- file.path("env_arrow")
rix(
r_ver = "4.1.1",
r_pkgs = c("dplyr", "arrow"),
overwrite = TRUE,
project_path = path_env_arrow
)
This specific revision of R contains {arrow}
13. Let’s
now suppose that you already have a script with some code to load and
transform some data using {arrow}
. It may look something
like this:
library(arrow)
library(dplyr)
arrow_cars <- arrow_table(cars)
arrow_cars %>%
filter(speed > 10) %>%
as.data.frame()
To run this code in a subshell, we recommend wrapping it inside a function:
arrow_script <- function() {
library(arrow)
library(dplyr)
arrow_cars <- arrow_table(cars)
arrow_cars %>%
filter(speed > 10) %>%
as.data.frame()
}
Which we can then run in the subshell:
out_nix_arrow <- with_nix(
expr = function() arrow_script(),
program = "R",
project_path = path_env_arrow,
message_type = "simple"
)
This will run the function in the subshell, and its output will be
saved in the out_nix_arrow
variable, for further
manipulation in your main shell/session.