--- title: "Accessing the OData API" description: > Accessing submission data via the OData pathway. output: rmarkdown::html_vignette: self_contained: false vignette: > %\VignetteIndexEntry{Accessing the OData API} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} resource_files: - media/1701678074209.jpg - media/1701678107041.jpg - media/1701678512568.jpg - media/1701678535420.jpg - media/1701678592627.jpg --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` ```{r pkgs} # This vignette requires the following packages: # library(DT) library(leaflet) # library(listviewer) library(magrittr) library(tibble) library(tidyr) library(lubridate) library(knitr) library(dplyr) library(ggplot2) library(ruODK) ``` This vignette demonstrates `ruODK`'s workflow to extract data from ODK Central's OData service endpoint, and to prepare the data and the media attachments for further analysis and visualisation. The demonstrated workflow is roughly equivalent to ODK Central's "Export all data", which downloads all submissions and all repeating subgroups as CSV spreadsheets, and all media attachments in a local subfolder "attachments". An alternative pathway to getting data out of ODK Central is to use the REST API as documented (with live examples in multiple programming languages) at the [ODK Central API docs](https://docs.getodk.org/central-api/). # Configure ruODK The OData service URL is shown in the form's "Submissions" tab > "Analyse via OData" on ODK Central. It contains base URL, project ID, and form ID and is used by `ruODK::ru_setup()`. ```{r ru_setup_demo, eval=FALSE} # ODK Central's OData URL contains base URL, project ID, and form ID # ODK Central credentials can live in .Renviron # See vignette("setup") for setup and authentication options. ruODK::ru_setup( svc = Sys.getenv("ODKC_TEST_SVC"), un = Sys.getenv("ODKC_TEST_UN"), pw = Sys.getenv("ODKC_TEST_PW"), tz = "Australia/Perth", verbose = TRUE ) # File attachment download location loc <- fs::path("media") ``` This vignette shows how to access data, but under the bonnet uses the included package data. This allows to rebuild the vignette offline and without credentials to the originating ODK Central server. ```{r use_pkg_data} # Canned data data("fq_svc") data("fq_form_schema") data("fq_meta") data("fq_raw") data("fq_raw_strata") data("fq_raw_taxa") data("fq_data") data("fq_data_strata") data("fq_data_taxa") ``` To extract data from the OData API endpoints, we have to: * discover data endpoints from the OData service document, * inspect the metadata schema to infer data types, * download submissions from the data endpoints, * download media attachments and adjust their file paths to the downloaded files. ## OData service document Let's start off with the service document. ```{r load_odata_service, eval=F} fq_svc <- ruODK::odata_service_get() ``` The same data is included as example data `fq_svc`. ```{r view_service} fq_svc %>% knitr::kable(.) ``` `ruODK` provides the names and urls of the service endpoints as `tibble`. We see the main data available under the url `Submissions`, and repeating form groups called `taxon_encounter` and `vegetation_stratum` in the ODK form under the url `Submissions.taxon_encounter` and `Submissions.vegetation_stratum`, respectively. The main value we get out of the service document are these names of the form groups, which can differ between forms. ## OData metadata document Let's inspect the form metadata to review our data schema. While we can download the submission data without it, the metadata document contains information about field data types and attachment names. ```{r load_metadata, eval=FALSE} fq_meta <- ruODK::odata_metadata_get() ``` ```{r view_metadata, fig.width=7} if (requireNamespace("listviewer")) { listviewer::jsonedit(fq_meta) } else { ru_msg_info("Please install package listviewer!") } ``` As an alternative to the OData metadata document, ODK Central also offers form metadata as a much cleaner JSON document, which `ruODK` can read and parse into a clean `tibble` of field type, name, and path. `ruODK` uses this introspection to parse submission data. ```{r form_schema, eval=FALSE} fq_form_schema <- ruODK::form_schema() ``` ```{r form_schema_show} fq_form_schema %>% knitr::kable(.) ``` ## OData submission data documents Now let's download the form submissions and, separately, repeating form groups. `ruODK::odata_submission_get()` defaults to download the submission data, parse it into a tidy tibble, parses dates and datetimes, downloads and links file attachments, and handles spatial datatypes. This vignette is built with canned data, so the verbose messages are not shown. With `wkt=TRUE`, we'll receive spatial types as Well Known Text, which `ruODK` parses as plain text. With `wkt=FALSE` (the default), we'll receive spatial types as GeoJSON, which `ruODK` parses into a nested list. `ruODK` retains the original spatial field, and annotates the data with extracted longitude, latitude, altitude, and (where given) accuracy. These additional fields are prefixed with the original field name to prevent name collisions between possibly multiple location fields. ```{r load_odata, eval=FALSE} fq_data <- ruODK::odata_submission_get( table = fq_svc$name[1], local_dir = loc, wkt = TRUE ) fq_data_strata <- ruODK::odata_submission_get( table = fq_svc$name[2], local_dir = loc ) fq_data_taxa <- ruODK::odata_submission_get( table = fq_svc$name[3], local_dir = loc, wkt = TRUE ) ``` # Detour: Data rectangling The function `ruODK::odata_submission_get()` received the original XML response as a nested list of lists. To analyse and visualise the data, this nested list of lists must be transformed into a rectangular shape. The function `ruODK::odata_submission_rectangle()` is used internally to recursively un-nest list columns using `tidyr::unnest_wider()`. Unnamed columns, notably the anonymous lat/lon/alt coordinates, are named automatically to become unique (a feature of `tidyr::unnest_*()`), and then sanitised using the helper `janitor::clean_names()`. By default, form group names are used as prefix to the field names. This behaviour can be disabled by handing the argument `names_sep=NULL` to `tidyr::unnest_wider()` through running `ruODK::odata_submission_get() %>% ruODK::odata_submission_rectangle(names_sep = NULL)`. The vectorised function `ruODK::attachment_get()` is then used internally to download and link attachments like photos and other media to a local, relative path. This will take some time during the first run. Once the files exist locally, the download will be skipped. When used through `ruODK::odata_submission_get()`, `ruODK` will introspect the form schema to detect and then parse media attachment fields automatically. When used manually, field names of media attachment fields can be (partially or fully) specified, see `??ruODK::attachment_get()`. The date formats are parsed from ISO8601 timestamps into POSIXct objects with `ruODK::handle_ru_datetimes()`. We use our local timezone (GMT+08) in this example. `ruODK` introspects the form schema to detect and then parse date and datetime fields automatically. The repeated subgroup `taxon_encounter` is left joined to the main submission data to receive a (repeated) copy of the main submission data (such as location, time and habitat description). We will do the same to the other repeated subgroup `vegetation_stratum`. For clarity, we enable verbose messages from `ruODK::odata_submission_get()` and preserve the message output in the code chunk options with `message=TRUE`. In real-world use cases, messages can be disabled through the chunk option `message=FALSE`. We use a custom local path for attachments (`loc`). This results in a smaller installed package size for `ruODK`, as it shares the attachment files with the other vignettes. The default is a local folder `media`. The raw and unparsed example data is provided as data objects `fq_raw` (main submissions of form Flora Quadrat 0.4), `fq_raw_taxa` (repeated group "Taxon Encounter" within a Flora Quadrat), and `fq_raw_strata` (repeated group "Vegetation Stratum" within a Flora Quadrat). The parsed versions are included as data objects `fq_data`, `fq_data_strata`, and `fq_data_taxa`. To enable users without ODK Central credentials to build this vignette (e.g. on package installation with `build_vignettes=TRUE`), we show the real functions (such as `ruODK::odata_submission_get()`), but do not evaluate them. Instead, we use "canned data". The `ruODK` test suite ensures that canned data are equivalent to live data. The result of this code chunk should be exactly the same as the compact version with `odata_submission_get(parse=TRUE)`. ```{r} # Candidates for ruODK::handle_ru_datetimes() fq_form_schema %>% dplyr::filter(type %in% c("dateTime", "date")) %>% knitr::kable(.) # Candidates for ruODK::handle_ru_attachments() fq_form_schema %>% dplyr::filter(type == "binary") %>% knitr::kable(.) # Candidates for ruODK::handle_ru_geopoints() fq_form_schema %>% dplyr::filter(type == "geopoint") %>% knitr::kable(.) ``` ```{r, eval=FALSE} # The raw submission data fq_raw <- ruODK::odata_submission_get(table = fq_svc$name[1], parse = FALSE) fq_strata <- ruODK::odata_submission_get(table = fq_svc$name[2], parse = FALSE) fq_taxa <- ruODK::odata_submission_get(table = fq_svc$name[3], parse = FALSE) # Parse main data fq_data <- fq_raw %>% ruODK::odata_submission_rectangle() %>% ruODK::handle_ru_datetimes(fq_form_schema) %>% ruODK::handle_ru_geopoints(fq_form_schema) %>% ruODK::handle_ru_geotraces(fq_form_schema) %>% ruODK::handle_ru_geoshapes(fq_form_schema) %>% ruODK::handle_ru_attachments(fq_form_schema, local_dir = t) # Parse nested group "taxa" fq_data_taxa <- fq_taxa %>% ruODK::odata_submission_rectangle() %>% ruODK::handle_ru_datetimes(fq_form_schema) %>% ruODK::handle_ru_geopoints(fq_form_schema) %>% ruODK::handle_ru_geotraces(fq_form_schema) %>% ruODK::handle_ru_geoshapes(fq_form_schema) %>% ruODK::handle_ru_attachments(fq_form_schema, local_dir = t) %>% dplyr::left_join(fq_data, by = c("submissions_id" = "id")) # Parse nested group "strata" fq_data_strata <- fq_strata %>% ruODK::odata_submission_rectangle() %>% ruODK::handle_ru_datetimes(fq_form_schema) %>% ruODK::handle_ru_geopoints(fq_form_schema) %>% ruODK::handle_ru_geotraces(fq_form_schema) %>% ruODK::handle_ru_geoshapes(fq_form_schema) %>% ruODK::handle_ru_attachments(fq_form_schema, local_dir = t) %>% dplyr::left_join(fq_data, by = c("submissions_id" = "id")) ``` Note: A manually resized version of the original photos in this example live in the package source under [`articles/media` ](https://github.com/ropensci/ruODK/tree/main/vignettes/media). To minimise package size, they were resized with imagemagick: ```{sh, eval=FALSE} find vignettes/media -type f -exec mogrify -resize 200x150 {} \\; ``` ## DIY rectangling For those wishing to go one step further, this section demonstrates the inner workings of `ruODK`, the recursive use of `tidyr::unnest_wider()`. The unnesting could also be done manually by building up a pipeline, which stepwise unnests each list column. This requires knowledge of the data structure, which can either be looked up from the metadata, or by inspecting the raw data, `fq_raw`. The following command has been built by stepwise adding `tidyr::unnest_wider()` expressions to the pipe until all list columns were eliminated. The trailing `invisible()` allows us to toggle parts of the pipe by catching the dangling `%>%`. ```{r rectangle_diy} fq_data_diy <- tibble::tibble(value = fq_raw$value) %>% tidyr::unnest_wider(value) %>% # 1. find list columns: tidyr::unnest_wider(`__system`) %>% tidyr::unnest_wider(meta) %>% # add more lines here to unnest other form groups # # 2. rename column names dplyr::rename( uuid = `__id` # add more columns, e.g. # longitude=`...1`, latitude=`...2`, altitude=`...3` ) %>% # 3. handle media attachments # dplyr::mutate(photo_1 = attachment_get(data_url, uuid, photo_1)) %>% invisible() ``` # Visualise data This section provides some examples of standard data visualisations. ## Datatable The package `DT` provides an interactive (and searchable) datatable. ```{r vis_data, eval = requireNamespace("DT")} DT::datatable(fq_data) DT::datatable(fq_data_taxa) DT::datatable(fq_data_strata) # DT::datatable(head(fq_data_diy)) ``` ## Map The R package `leaflet` provides interactive maps. Constructing label and popup requires knowledge of the dataset structure. ```{r map_data} leaflet::leaflet(width = 800, height = 600) %>% leaflet::addProviderTiles("OpenStreetMap.Mapnik", group = "Place names") %>% leaflet::addProviderTiles("Esri.WorldImagery", group = "Aerial") %>% leaflet::clearBounds() %>% leaflet::addAwesomeMarkers( data = fq_data, lng = ~location_corner1_longitude, lat = ~location_corner1_latitude, icon = leaflet::makeAwesomeIcon(text = "Q", markerColor = "red"), label = ~ glue::glue("{location_area_name} {encounter_start_datetime}"), popup = ~ glue::glue( "