This vignette is a high-level
overview of targets
and its educational materials. The goal
is to summarize the major features of targets
and direct
users to the appropriate resources. It explains how to get started, and
then it briefly describes each chapter of the user manual.
The targets R package is a Make-like pipeline toolkit
for statistics and data science in R. targets accelerates
analysis with easy-to-configure parallel computing, enhances
reproducibility, and reduces the burdens of repeated computation and
manual data micromanagement. A fully up-to-date targets
pipeline is tangible evidence that the output aligns with the code and
data, which substantiates trust in the results.
The top of the reference
website links to a number of materials to help new users start
learning targets. It lists online talks, tutorials, books,
and workshops in the order that a new user should consume them. The rest
of the main page outlines a more comprehensive list of resources.
The user manual
starts with a walkthrough
chapter, a short tutorial to quickly get started with targets
using a simple example project. That project also has a repository with
the source code and an RStudio Cloud workspace
that lets you try out the workflow in a web browser. Sign up for a free
RStudio Cloud account, click on the link, and try out the functions
tar_make() and tar_read() in the R console.
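As a rough illustration of the kind of workflow the walkthrough covers, a minimal _targets.R file might look like the sketch below. The helper functions read_raw_data() and summarize_data(), the target names, and the use of the built-in airquality dataset are assumptions made for this example; they are not taken from the walkthrough project itself.

```r
# _targets.R: a minimal sketch, not the walkthrough project's actual code.
library(targets)

# Hypothetical helper: returns a small data frame from a built-in dataset.
read_raw_data <- function() {
  datasets::airquality[, c("Ozone", "Temp")]
}

# Hypothetical helper: summarizes the data frame produced above.
summarize_data <- function(data) {
  data.frame(
    mean_ozone = mean(data$Ozone, na.rm = TRUE),
    mean_temp = mean(data$Temp)
  )
}

# The script must end with a list of target objects.
list(
  tar_target(raw_data, read_raw_data()),
  tar_target(data_summary, summarize_data(raw_data))
)
```

With a file like this in the project root, tar_make() runs the pipeline and tar_read(data_summary) returns the stored summary from the data store.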
The help guide explains how best to get help with targets,
including how to write reproducible examples and where to post questions.
The debugging chapter describes two alternative built-in systems for troubleshooting errors. The first system uses workspaces, which let you load a target’s dependencies into your R session. This approach is usually preferred, especially with large pipelines on computing clusters, but it may still require some manual work. The second system launches an interactive debugger while the pipeline is actually running. That is not feasible in every situation, but it can often help you reach the problem more quickly.
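The workspace-based system can be sketched as follows. The function check_value() and the target names value and result are hypothetical and exist only to force an error; the pipeline is an assumption for illustration, not an example from the debugging chapter.

```r
# _targets.R: a sketch of the workspace-based approach.
library(targets)

# Save a workspace file for any target that stops with an error.
tar_option_set(workspace_on_error = TRUE)

# Hypothetical function that fails on negative input.
check_value <- function(x) {
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}

list(
  tar_target(value, -1),
  tar_target(result, check_value(value))
)
```

After tar_make() fails on result, calling tar_workspace(result) in an interactive session loads the dependency value into the session so that check_value(value) can be rerun and inspected by hand. For the second system, one option is to set tar_option_set(debug = "result") in _targets.R and call tar_make(callr_function = NULL), which runs the pipeline in the current session and pauses in an interactive debugger when it reaches that target.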
targets
expects users to adopt a function-oriented style
of programming. User-defined R functions are essential to express the
complexities of data generation, analysis, and reporting. The user manual has a whole
chapter dedicated to user-defined functions for data science, and it
explains why they are important and how to use them in
targets-powered pipelines.
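To make the function-oriented style concrete, here is a small sketch of the kind of user-defined functions a pipeline might rely on. The file name R/functions.R, the function names, and the model are assumptions chosen for illustration.

```r
# R/functions.R: hypothetical user-defined functions for a pipeline.

# Fit a linear model of ozone on temperature.
fit_model <- function(data) {
  lm(Ozone ~ Temp, data = data)
}

# Return the model coefficients as a small data frame.
tabulate_coefficients <- function(model) {
  coefs <- coef(model)
  data.frame(term = names(coefs), estimate = unname(coefs))
}

# Outside a pipeline, the functions compose like ordinary R code.
model <- fit_model(datasets::airquality)
coefficient_table <- tabulate_coefficients(model)
```

In a pipeline, each function call would typically become the command of its own target, for example tar_target(model, fit_model(raw_data)), where raw_data is a hypothetical upstream target. Keeping each step in a function lets targets detect changes to the function and rerun only the affected targets.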
The target construction chapter explains best practices for creating targets: what a good target should do, how much work a target should do, and guidelines for thinking about side effects and upstream dependencies (i.e. other targets and global objects).
The packages
chapter explains best practices for working with packages in
targets: how to load them, how to work with packages as
projects, target factories inside packages, and automatically
invalidating targets based on changes inside one or more packages.
The projects
chapter explains best practices for working with
targets-powered projects: the recommended file structure,
recommended third-party tools, multi-project repositories, and
interdependent projects.
The chapter at https://books.ropensci.org/targets/data.html describes
how the targets package stores data, manages memory, and allows you to
customize the data processing model. When a target finishes running
during tar_make(), it returns an R object. Those return
values, along with descriptive metadata, are saved to persistent storage
so your pipeline stays up to date even after you exit R. By default,
this persistent storage is a special _targets/ folder
created in your working directory by tar_make(). However,
you can also interact with files outside the data store and send target
data to the cloud.
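The difference between an ordinary target and a target that tracks a file outside the data store can be sketched as below. The function write_summary(), the target names, and the output path summary.csv are hypothetical choices for this example.

```r
# _targets.R: a sketch contrasting stored objects with tracked files.
library(targets)

# Hypothetical function: writes a CSV file and returns its path.
write_summary <- function(data, path) {
  write.csv(data, path, row.names = FALSE)
  path # a format = "file" target must return the path(s) it created
}

list(
  # Ordinary target: the return value is saved inside the _targets/ data store.
  tar_target(cars_data, datasets::mtcars[, c("mpg", "wt")]),
  # File target: targets tracks the file at the returned path instead.
  tar_target(
    summary_file,
    write_summary(cars_data, "summary.csv"),
    format = "file"
  )
)
```

After tar_make(), tar_read(cars_data) returns the stored data frame, tar_read(summary_file) returns the file path, and tar_meta() shows the metadata recorded for both targets.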
The chapter at https://books.ropensci.org/targets/literate-programming.html
covers literate programming: how to render an R Markdown or Quarto
report as part of a targets
pipeline. A report can depend
on other targets and take advantage of long computation already
completed upstream.
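A report target might be declared along the following lines. This is a sketch that assumes the tarchetypes package is installed and that a hypothetical file report.Rmd already exists next to _targets.R; the target names data and fit are also made up for the example.

```r
# _targets.R: a sketch of a pipeline that ends in a rendered report.
library(targets)
library(tarchetypes) # provides tar_render() for R Markdown reports

list(
  tar_target(data, datasets::airquality),
  tar_target(fit, lm(Ozone ~ Temp, data = data)),
  # report.Rmd is a hypothetical R Markdown file that must already exist.
  # It is assumed to call tar_read(fit) or tar_load(fit) in a code chunk,
  # which is how the report picks up its upstream dependencies.
  tar_render(report, "report.Rmd")
)
```

Because the report reads fit from the data store, rerendering it does not repeat the model fit unless the data or the code upstream has changed.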
targets
is capable of distributing the computation in a
pipeline across multiple cores of a laptop or multiple jobs on a
computing cluster. The orchestration and scaling mechanisms are
automatic, and only high-level configuration is required. Visit https://books.ropensci.org/targets/crew.html to learn
more. Configuration happens through the crew
package: https://wlandau.github.io/crew/. The appendix at https://books.ropensci.org/targets/hpc.html describes
how to use targets with the legacy backends clustermq and future.
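As an illustration of how little configuration is involved, the sketch below sets up two local worker processes. It assumes the crew package is installed; the function slow_square() and the target names are hypothetical.

```r
# _targets.R: a sketch of local parallel computing with crew.
library(targets)

# Run targets on two local R worker processes.
tar_option_set(
  controller = crew::crew_controller_local(workers = 2)
)

# Hypothetical slow task, so that parallelism has something to speed up.
slow_square <- function(x) {
  Sys.sleep(1)
  x^2
}

list(
  tar_target(a, slow_square(2)), # a and b do not depend on each other,
  tar_target(b, slow_square(3)), # so they can run on different workers
  tar_target(total, a + b)       # runs after both a and b finish
)
```

The pipeline definition itself does not change when the controller changes, so the same targets can later be sent to a cluster by swapping in a different crew controller.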
The chapter at https://books.ropensci.org/targets/performance.html
explains how to monitor the progress of a running pipeline and optimize
your pipeline for performance. targets has
easy-to-configure efficiency settings at the level of
tar_target() and tar_option_set().
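The two levels of configuration could look like the sketch below. The particular settings are examples only, not recommendations from the performance chapter, and the target names are hypothetical.

```r
# _targets.R: a sketch of pipeline-wide and per-target efficiency settings.
library(targets)

# Pipeline-wide defaults, set once with tar_option_set().
tar_option_set(
  memory = "transient", # unload target values between targets to save memory
  storage = "worker",   # with parallel workers, each worker saves its own result
  retrieval = "worker"  # with parallel workers, each worker loads its own inputs
)

list(
  tar_target(big_data, rnorm(1e6)),
  # Per-target override: keep this small value in memory for the whole run.
  tar_target(big_mean, mean(big_data), memory = "persistent")
)
```

While such a pipeline runs, tar_progress() called from another R session reports which targets have started, finished, or failed.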
Sometimes, a pipeline contains more targets than a user can comfortably type by hand. For projects with hundreds of targets, branching can make the _targets.R file more concise and easier to read and maintain. Dynamic branching is a way to create new targets while the pipeline is running, and it is best suited to iterating over a large number of very similar tasks. The dynamic branching chapter outlines this functionality, including how to create branching patterns, different ways to iterate over data, and recommendations for batching large numbers of small tasks into a comfortably small number of dynamic branches.
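A dynamic branching pattern with batching might be sketched as follows. The function simulate_batch(), the number of batches, and the number of replications per batch are assumptions made for this example.

```r
# _targets.R: a sketch of dynamic branching over batches of simulations.
library(targets)

# Hypothetical task: run many small replications inside a single branch.
simulate_batch <- function(batch_id, reps = 100) {
  estimates <- vapply(
    seq_len(reps),
    function(i) mean(rnorm(20)),
    numeric(1)
  )
  data.frame(batch = batch_id, estimate = mean(estimates))
}

list(
  tar_target(batch_ids, seq_len(4)),
  # One branch per element of batch_ids, created while the pipeline runs.
  tar_target(batch, simulate_batch(batch_ids), pattern = map(batch_ids)),
  # Without a pattern, the downstream target sees all branches combined
  # into one data frame.
  tar_target(overall, mean(batch$estimate))
)
```

Here 400 replications are grouped into 4 branches of 100 each, which keeps the number of targets small while still letting branches run independently.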
Static
branching is the act of defining a group of targets in bulk before
the pipeline starts. Whereas dynamic branching uses last-minute
dependency data to define the branches, static branching uses
metaprogramming to modify the code of the pipeline up front. Whereas
dynamic branching excels at creating a large number of very similar
targets, static branching is most useful for a smaller number of
heterogeneous targets. Some users find it more convenient because they
can use tar_manifest() and tar_visnetwork() to
check the correctness of static branching before launching the pipeline.
Read more about it in the static branching
chapter.
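One way to express static branching is tar_map() from the tarchetypes package, as in the sketch below. It assumes tarchetypes is installed; the function correlate_by_method() and the target names are hypothetical.

```r
# _targets.R: a sketch of static branching with tarchetypes::tar_map().
library(targets)
library(tarchetypes)

# Hypothetical task: correlation between two columns using a given method.
correlate_by_method <- function(data, method) {
  cor(data$mpg, data$wt, method = method)
}

# One row per target to generate; the column name is used in the command.
values <- data.frame(method = c("pearson", "spearman"))

list(
  tar_target(cars_data, datasets::mtcars),
  # Defines one correlation target per row of values before the pipeline
  # starts, substituting each method value into the command.
  tar_map(
    values = values,
    tar_target(correlation, correlate_by_method(cars_data, method))
  )
)
```

Because these targets are defined up front, tar_manifest() lists each generated target and its command before anything runs.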