---
title: "Converting Meta (Facebook) Mobility QuadKey-identified Datasets into Raster Files"
author: "Florencia D'Andrea"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    number_sections: true
    toc_depth: 3
urlcolor: blue
pkgdown:
  as_is: true
vignette: >
  %\VignetteIndexEntry{Converting Meta (Facebook) Mobility QuadKey-identified Datasets into Raster Files}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>"
)
```

```{r setup, echo = FALSE, message = FALSE, warning = FALSE}
library(quadkeyr)
library(sf)
library(dplyr)
library(tidyr)
library(ggplot2)
library(rnaturalearth)
library(stars)
```

> Please visit the [README](https://docs.ropensci.org/quadkeyr/)
> for general information about this package.

This vignette focuses on creating a raster from QuadKey data formatted as
provided by
[Meta (formerly known as Facebook) mobility data](https://dataforgood.facebook.com/).
If you want to convert other QuadKey-identified datasets, you can read
[the previous vignette](https://docs.ropensci.org/quadkeyr/articles/quadkey_identified_data_to_raster.html).

Meta (Facebook) mobility data for each location and zoom level is stored in a
separate `csv` file for each day and time.

# Basic workflow

```{r w3, echo = FALSE, out.width = "80%", fig.align = 'center'}
knitr::include_graphics("workflow_facebook.png")
```

## Reading and formatting multiple `csv` files

- All these files are for the same area and zoom level, but dates and hours
  will change. This information is evident in the filenames:
  `cityA_2020_04_15_0800.csv`.
- `read_fb_mobility_files()` prints a message with the days or hours missing
  among the provided files.
- The columns `day` and `hour` are created internally.

```{r read}
files <- quadkeyr::read_fb_mobility_files(
  path_to_csvs = paste0(system.file("extdata", package = "quadkeyr"), "/"),
  colnames = c( # The columns not listed here will be omitted
    "lat",
    "lon",
    "quadkey",
    "date_time",
    "n_crisis",
    "percent_change",
    "day",
    "hour"
  ),
  coltypes = list(
    lat = 'd',
    lon = 'd',
    quadkey = 'c',
    date_time = 'T',
    n_crisis = 'c',
    percent_change = 'c',
    day = 'D',
    hour = 'i'
  )
)

head(files)
```

> Note: The Meta (Facebook) mobility data used in `quadkeyr` has been altered
> and doesn't represent real values.
> The examples in this vignette are only meant to demonstrate how the
> functions work.
> Please contact [Data for Good](https://dataforgood.facebook.com/)
> to get datasets.

## Create a QuadKey polygon grid for your area of analysis

This function generates a `sf POLYGON data.frame` with a `quadkey` and
`geometry` column.

You might be wondering why we're not using the function we've already created,
`add_regular_polygon_grid()`, which adds a column of QuadKey polygons to an
existing data.frame, creating a regular grid.
There are two reasons why we're using a different approach:

1 - The `read_fb_mobility_files()` output contains multiple datasets for the
same area with almost the same QuadKeys reported. Using a function that
calculates each QuadKey by row would unnecessarily duplicate calculations.

2 - When you receive Meta (Facebook) mobility data, you might not always get
exactly the same QuadKeys in all the files, even if they all report the same
area. This is especially important considering that you may receive new files
in the future with QuadKeys that haven't been reported yet.

That's why we create an `sf` POLYGON data.frame retaining all the QuadKey
polygons within the area of analysis and then proceed to join the results.

If creating the regular polygon grid from the bounding box of the provided
QuadKeys doesn't work for your case, you can directly create the grid for the
full area of analysis with the `create_qk_grid()` function, as sketched below.
Read [the previous vignette](https://docs.ropensci.org/quadkeyr/articles/quadkey_identified_data_to_raster.html)
to learn more about this function.
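As an illustration only, here is a minimal, not-run sketch of that
alternative. The bounding box and zoom level are placeholders for your own
area of analysis, and the argument names follow my reading of the package
documentation; check `?create_qk_grid` before using it.

```{r gridsketch, eval = FALSE}
# Not run: build the QuadKey grid for a full area of analysis instead of
# relying on the bounding box of the QuadKeys present in the files.
# The coordinates and zoom level below are placeholders, not values used
# elsewhere in this vignette.
full_grid <- quadkeyr::create_qk_grid(
  xmin = -58.65,
  xmax = -58.45,
  ymin = -34.65,
  ymax = -34.45,
  zoom = 13
)
```

In this vignette, however, we continue with `get_regular_polygon_grid()`,
which builds the grid from the QuadKeys reported in `files`.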
```{r grid}
regular_grid <- quadkeyr::get_regular_polygon_grid(data = files)

head(regular_grid$data)
```

```{r reg_grid, out.width = "70%", fig.align = 'center'}
ggplot() +
  geom_sf(data = regular_grid$data,
          color = 'red',
          linewidth = 1.5,
          fill = NA) +
  theme_minimal()
```

Now, we can merge this grid with the Meta (Facebook) mobility data in the
`files` data.frame.
We've chosen to use a `dplyr::inner_join()` that will retain only the
polygons reported in the files.

```{r joinsrg}
files_polygons <- files |>
  dplyr::inner_join(regular_grid$data, by = c("quadkey"))

head(files_polygons)
```

If you want to modify any of the variables you intend to use, it should be
done before this point.
For example, `apply_weekly_lag()` applies a 7-day lag to `n_crisis` and
`percent_change`, creating new variables.
You could apply this function to `files` and then select the resulting
variable `percent_change_7` as the argument `var` in `polygon_to_raster()`.
We are not demonstrating it in this example.

Now that we have the polygons, let's create the raster files.

```{r rasterfiles}
# Generate the raster files
quadkeyr::polygon_to_raster(data = files_polygons,
                            nx = regular_grid$num_cols,
                            ny = regular_grid$num_rows,
                            template = files_polygons,
                            var = 'percent_change',
                            filename = 'cityA',
                            path = paste0(
                              system.file("extdata", package = "quadkeyr"),
                              "/"))
```

```{r qkgrid, out.width = "70%", fig.align = 'center'}
raster <- stars::read_stars(paste0(system.file("extdata",
                                               package = "quadkeyr"),
                                   "/cityA_2020-04-15_16.tif"))

# More about plotting:
# https://r-spatial.github.io/stars/reference/geom_stars.html
ggplot() +
  geom_stars(data = raster) +
  coord_equal() +
  theme_void() +
  scale_fill_viridis_c(na.value = "transparent") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0))
```

# Advanced use: intermediate functions

The function `get_regular_polygon_grid()` is a wrapper for
`quadkey_to_latlong()`, `regular_qk_grid()`, and `grid_to_polygon()`.
Let's explore how these functions operate in isolation.
We will work with the output of `read_fb_mobility_files()`, the same one we
already used in the basic workflow:

```{r}
head(files)
```

There are two functions wrapped inside `read_fb_mobility_files()` that you
may want to use individually:

* If you wish to correct the Facebook-provided format only, you can use the
  `format_fb_data()` function.
* If you need to retrieve the missing combinations of days and hours in a
  temporal sequence of files, you can use `missing_combinations()`
  (see the sketch below).

Please refer to the examples in the functions' documentation to better
understand how they work.
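As an illustration only, here is a minimal, not-run sketch of how you might
call them. I am assuming both helpers accept a mobility data.frame as input;
check `?format_fb_data` and `?missing_combinations` for the exact arguments.

```{r helperssketch, eval = FALSE}
# Not run: sketch of the two helpers wrapped by read_fb_mobility_files().
# Assumption: both accept a mobility data.frame; see their documentation
# for the exact arguments.

# Report the day/hour combinations absent from the temporal sequence of files
quadkeyr::missing_combinations(files)

# Standardize the raw Facebook-provided format of a single file
# (`raw_file` is a hypothetical data.frame read directly from one csv)
quadkeyr::format_fb_data(raw_file)
```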
## Convert the QuadKeys to latitude/longitude coordinates

Even if these files correspond to the same location, the number of QuadKeys
reported could vary.
To start, we will select from all the files the QuadKeys that appear at least
once and convert them to an `sf` POINT data.frame using
`quadkey_to_latlong()`.

```{r qtll1}
quadkey_vector <- unique(files$quadkey)

qtll <- quadkey_to_latlong(quadkey_data = quadkey_vector)
head(qtll)
```

Let's plot the QuadKey grid.

```{r plotqtll, echo = FALSE, fig.align = 'center'}
ggplot() +
  geom_sf(data = ne_countries(returnclass = 'sf'),
          fill = 'beige') +
  geom_sf(data = qtll,
          alpha = 0.5,
          size = 0.5) +
  coord_sf(xlim = c(-58.65, -58.45),
           ylim = c(-34.65, -34.45),
           expand = FALSE) +
  theme_minimal() +
  theme(panel.background = element_rect(fill = "lightblue"),
        panel.ontop = FALSE,
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  ylab("Latitude") +
  xlab("Longitude")
```

## Complete the grid

Some of the QuadKeys inside the bounding box are missing, so we can't
consider this a regular grid.
In order to create the raster images, we need to obtain a regular grid.
We can do that with the function `regular_qk_grid()`.
Note that the output is a list with three elements:

- `data`
- `num_rows`
- `num_cols`

```{r rgc}
regular_grid <- regular_qk_grid(data = qtll)
head(regular_grid)
```

The outputs `num_cols` and `num_rows` refer to the number of columns and
rows, information that we will use to create the raster.
The original 150-point grid now has one point for each row and column
combination, resulting in a complete grid of 369 points, as depicted in the
plot. The additional points are highlighted in orange:

```{r echo = FALSE, fig.align = 'center'}
ggplot() +
  geom_sf(data = ne_countries(returnclass = 'sf'),
          fill = 'beige') +
  geom_sf(data = regular_grid$data,
          alpha = 0.5,
          size = 0.5,
          color = 'orange') +
  geom_sf(data = qtll,
          alpha = 0.5,
          size = 0.5) +
  coord_sf(xlim = c(-58.65, -58.45),
           ylim = c(-34.65, -34.45),
           expand = FALSE) +
  theme_minimal() +
  theme(panel.background = element_rect(fill = "lightblue"),
        panel.ontop = FALSE,
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  ylab("Latitude") +
  xlab("Longitude")
```

## Create the polygons

Now we can transform the QuadKeys into polygons.

```{r polygrid}
polygrid <- quadkeyr::grid_to_polygon(data = regular_grid$data)
head(polygrid)
```

```{r plotpolygrid, echo = FALSE, fig.align = 'center'}
ggplot() +
  geom_sf(data = ne_countries(returnclass = 'sf'),
          fill = 'beige') +
  geom_sf(data = polygrid,
          size = 0.1,
          color = 'red',
          alpha = 0.5) +
  coord_sf(xlim = c(-58.65, -58.45),
           ylim = c(-34.65, -34.45),
           expand = TRUE) +
  theme_minimal() +
  theme(panel.background = element_rect(fill = "lightblue"),
        panel.ontop = FALSE,
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()) +
  ylab("Latitude") +
  xlab("Longitude")
```

Once we have the complete grid of QuadKey polygons, we combine this
information with the Facebook data in `files`.
In this step, we can select the variables that will be part of the analysis
and also create new variables if needed.

```{r joinfiles}
polyvar <- files |>
  dplyr::inner_join(polygrid, by = 'quadkey')

head(polyvar)
```

## Create the rasters for the variables and data involved

The rasters are going to be created automatically for each day and time
reported.
Each raster will be created as `<filename>_<date>_<hour>.tif`, as in
`cityA_2020-04-15_16.tif` above.
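To see how the advanced workflow closes, here is a sketch (not run, since the
rasters were already generated in the basic workflow) that mirrors the
`polygon_to_raster()` call shown earlier, now using the objects built with
the intermediate functions:

```{r rasteradvanced, eval = FALSE}
# Not run: same call as in the basic workflow, but using the polygons and
# grid dimensions produced by the intermediate functions.
quadkeyr::polygon_to_raster(
  data = polyvar,
  nx = regular_grid$num_cols,
  ny = regular_grid$num_rows,
  template = polyvar,
  var = 'percent_change',
  filename = 'cityA',
  path = paste0(system.file("extdata", package = "quadkeyr"), "/")
)
```

The only changes with respect to the basic workflow are the `data` and
`template` arguments, which now use `polyvar`.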