Computationally intensive raster operations that work in pixel-wise
manner may be handled well with dynamic
branching over tiled subsets of the raster.
tar_terra_tiles()
is a target factory that enables creating
these dynamic branches so downstream targets can iterate over them. This
is useful when, for example, loading an entire raster into memory and
doing computations on it results in out of memory errors.
In order to use tar_terra_tiles()
, we need to break a
raster into smaller pieces. We can do that by providing
extents used by the raster. The concept of extent is
important, so let’s unpack that a bit more.
The extent describes the four points that cover the
area of a raster. The extent of a raster, r
, is printed in
the summary:
# example SpatRaster
f <- system.file("ex/elev.tif", package = "terra")
r <- rast(f)
r
#> class : SpatRaster
#> dimensions : 90, 95, 1 (nrow, ncol, nlyr)
#> resolution : 0.008333333, 0.008333333 (x, y)
#> extent : 5.741667, 6.533333, 49.44167, 50.19167 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
#> source : elev.tif
#> name : elevation
#> min value : 141
#> max value : 547
But we can get the extent with ext
(extent):
r_ext <- ext(r)
r_ext
#> SpatExtent : 5.74166666666667, 6.53333333333333, 49.4416666666667, 50.1916666666667 (xmin, xmax, ymin, ymax)
Which maps onto the four corners of the raster here:
rect_extent <- function(x, ...) {
rect(x[1], x[3], x[2], x[4], ...)
}
plot_extents <- function(x, ...) {
invisible(lapply(x, rect_extent, border = "hotpink", lwd = 2))
}
extend(r, 5) |> plot()
lines(r_ext, col = "hotpink", lty = 2)
points(r_ext, col = "hotpink", pch = 16)
Some geo-computational operations can be done independently of one another—we want to take advantage of that, and we can facilitate this by breaking the raster into smaller pieces, by creating new extents that describe new subsets of the raster.
We can use this extent information downstream in the analysis to describe how to break up a raster. This is similar to how we might want to chunk up a data frame into groups to distribute to different CPU cores. To help with this, we’ve got some helper functions.
geotargets
provides three helper functions that take a
SpatRaster
and output the extents for tiles:
tile_n()
,tile_grid()
, andtile_blocksize()
We will demonstrate these now.
tile_n()
We can use tile_n()
, which is the simplest of the three.
It produces about n
tiles in a grid.
r_tile_4 <- tile_n(r, 4)
#> creating 2 * 2 = 4 tile extents
r_tile_4
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.141667 49.816667 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 6.141667 6.533333 49.816667 50.191667
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.141667 49.441667 49.816667
#>
#> [[4]]
#> xmin xmax ymin ymax
#> 6.141667 6.533333 49.441667 49.816667
tile_grid()
For more control, use tile_grid()
, which allows
specification of the number of rows and columns to split the raster
into. Here we are specify that we want three columns and 1 row:
r_grid_3x1 <- tile_grid(r, ncol = 3, nrow = 1)
r_grid_3x1
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.008333 49.441667 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 6.008333 6.266667 49.441667 50.191667
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 6.266667 6.533333 49.441667 50.191667
plot(r)
plot_extents(r_grid_3x1)
tile_blocksize()
The third included helper is tile_blocksize()
, which
tiles by file block size. The block
size is a property of raster files, and is the number of pixels
(in the x and y direction) that is read into memory at a time. Tiling by
multiples of block size may therefore be more efficient because only one
block should need to be loaded to create each tile target. You can find
the blocksize with fileBlocksize
:
This tells us that it reads in the raster in 43x95 pixel sizes.
The tile_blocksize
function is similar to
tile_grid
, except instead of saying how many rows and
columns, we specify in units of blocksize.
If we just run tile_blocksize()
on r
we get
the extents of the specified blocksize:
tile_blocksize(r)
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.833333 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.475000 49.833333
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.441667 49.475000
Which is the same as specifying blocksize for row and column at unit 1:
r_block_size_1x1 <- tile_blocksize(r, n_blocks_row = 1, n_blocks_col = 1)
r_block_size_1x1
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.833333 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.475000 49.833333
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.441667 49.475000
plot(r)
plot_extents(r_block_size_1x1)
Here the block size is the same size for the first two blocks, and then a much more narrow block. This is different to the two other tile methods.
Here the column block size is the full width of the raster.
So we could instead have the blocksize extent be written out to 2 blocks in a row, and 1 block size for the columns:
r_block_size_2x1 <- tile_blocksize(r, n_blocks_row = 2, n_blocks_col = 1)
r_block_size_2x1
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.475000 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.441667 49.475000
plot(r)
plot_extents(r_block_size_2x1)
This only works when the SpatRaster
points to a
file—in-memory rasters have no inherent block size.
The way targets typically works is you write a file named
_targets.R
, which describes the pipeline. See for example,
the _targets.R
file in the demo-geotargets repo.
However, in order to demonstrate many of the features with
targets
and geotargets
, we don’t want to have
to create many _targets.R
files. So instead we use a
targets function targets::tar_script()
.
This allows you to write the code you would have put in a
_targets.R
file.
What this means for you is you can essentially just “copy and paste”
the examples we provide in this vignette. When running the
tar_script
code, it will ask you each time if you want to
overwrite the _targets.R
file. This means if you are
exploring these examples and copying the entire examples, it is
worthwhile doing this in a separate repository to avoid overwriting your
own _targets.R
file.
When developing a targets
pipeline using
tar_terra_tiles()
with tile_blocksize()
, it’s
a good idea to figure out how many tiles tile_blocksize()
will create before implementing tar_terra_tiles()
. We’ll
start by making a bigger raster to experiment with using
terra::disagg()
, (which makes a higher resolution raster by
breaking the pixels into smaller pixels), and making multiple
layers.
targets::tar_script({
# contents of _targets.R
library(targets)
library(geotargets)
library(terra)
geotargets_option_set(gdal_raster_driver = "COG")
list(
tar_target(
raster_file,
system.file("ex/elev.tif", package = "terra"),
format = "file"
),
tar_terra_rast(
r,
disagg(rast(raster_file), fact = 10)
),
# add more layers
tar_terra_rast(
r_big,
c(r, r + 100, r * 10, r / 2),
memory = "transient"
)
)
})
tar_make()
#> terra 1.8.29
#> [34m▶[39m dispatched target raster_file
#> [32m●[39m completed target raster_file [0.001 seconds, 7.994 kilobytes]
#> [34m▶[39m dispatched target r
#> [32m●[39m completed target r [0.01 seconds, 822.781 kilobytes]
#> [34m▶[39m dispatched target r_big
#> [32m●[39m completed target r_big [0.025 seconds, 4.055 megabytes]
#> [34m▶[39m ended pipeline [0.439 seconds]
tar_load(r_big)
tile_blocksize(r_big)
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.168333 49.765000 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 6.168333 6.533333 49.765000 50.191667
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.168333 49.441667 49.765000
#>
#> [[4]]
#> xmin xmax ymin ymax
#> 6.168333 6.533333 49.441667 49.765000
Four tiles is reasonable, so we’ll go with that. Note that we have to
ensure the r_big
target is not in-memory for
tar_terra_tiles()
, so we set the targets option
memory = "transient"
. See the targets
documentation on memory for details.
The process that happens from here can be thought of as
split-apply-combine
.
tar_terra_tiles()
target factory
tile_n()
,
tile_grid()
, or tile_blocksize()
), supplying
this to tile_fun
.app
function from terra
,
which applies some function to the cells of a raster.tar_terra_rast()
and then supply the
pattern = map(tiles)
, where tiles
is the name
of the target created with tar_terra_tiles()
. You can think
of pattern = map(tiles)
as saying: “Do the task for each of
the tiles we have specified and return them as a list”tar_terra_rast()
and use
merge()
on the tiles.targets::tar_script({
# contents of _targets.R
library(targets)
library(geotargets)
library(terra)
geotargets_option_set(gdal_raster_driver = "COG")
tar_option_set(memory = "transient")
list(
tar_target(
raster_file,
system.file("ex/elev.tif", package = "terra"),
format = "file"
),
tar_terra_rast(
r,
disagg(rast(raster_file), fact = 10)
),
tar_terra_rast(
r_big,
c(r, r + 100, r * 10, r / 2),
memory = "transient"
),
# split
tar_terra_tiles(
tiles,
raster = r_big,
tile_fun = tile_blocksize,
description = "split raster into tiles"
),
# apply
tar_terra_rast(
tiles_mean,
app(tiles, \(x) mean(x, na.rm = TRUE)),
pattern = map(tiles),
description = "some computationaly intensive task performed on each tile"
),
# combine
tar_terra_rast(
merged_mean,
merge(sprc(tiles_mean)),
description = "merge tiles into a single SpatRaster"
)
)
})
tar_make()
#> terra 1.8.29
#> [32m✔[39m skipping targets (1 so far)...
#> [34m▶[39m dispatched target tiles_exts
#> [32m●[39m completed target tiles_exts [0.005 seconds, 153 bytes]
#> [34m▶[39m dispatched branch tiles_11882e184aa27102
#> [32m●[39m completed branch tiles_11882e184aa27102 [0.004 seconds, 982.55 kilobytes]
#> [34m▶[39m dispatched branch tiles_e39b1bcab4d45ba0
#> [32m●[39m completed branch tiles_e39b1bcab4d45ba0 [0.037 seconds, 249.778 kilobytes]
#> [34m▶[39m dispatched branch tiles_b14b762418a51f63
#> [32m●[39m completed branch tiles_b14b762418a51f63 [0.003 seconds, 599.615 kilobytes]
#> [34m▶[39m dispatched branch tiles_bfb55de5c880456f
#> [32m●[39m completed branch tiles_bfb55de5c880456f [0.004 seconds, 483.702 kilobytes]
#> [32m●[39m completed pattern tiles
#> [34m▶[39m dispatched branch tiles_mean_1bcefdb80d0879e0
#> [32m●[39m completed branch tiles_mean_1bcefdb80d0879e0 [2.192 seconds, 159.194 kilobytes]
#> [34m▶[39m dispatched branch tiles_mean_4ff32c943e05dc36
#> [32m●[39m completed branch tiles_mean_4ff32c943e05dc36 [1.914 seconds, 49.028 kilobytes]
#> [34m▶[39m dispatched branch tiles_mean_17723520c935f474
#> [32m●[39m completed branch tiles_mean_17723520c935f474 [1.645 seconds, 100.948 kilobytes]
#> [34m▶[39m dispatched branch tiles_mean_984998bf0804c2cf
#> [32m●[39m completed branch tiles_mean_984998bf0804c2cf [1.454 seconds, 87.272 kilobytes]
#> [32m●[39m completed pattern tiles_mean
#> [34m▶[39m dispatched target merged_mean
#> [32m●[39m completed target merged_mean [0.016 seconds, 879.912 kilobytes]
#> [34m▶[39m ended pipeline [7.772 seconds]
We can see from tar_make()
output above and the plots
below that tiles
and tiles_mean
are both
patterns with four branches each.
library(terra)
tar_load(tiles_mean)
op <- par(mfrow = c(2, 2))
for (i in seq_along(tiles_mean)) {
plot(tiles_mean[[i]])
}
And combined, they make the full plot again.