--- title: "comtradr" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{comtradr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, echo=FALSE} library(comtradr) ``` ## Data availability See [here for an overview](https://uncomtrade.org/docs/why-are-some-converted-datasets-not-accessible-in-the-ui/) of available commodity classifications. ## Package information API wrapper for the [UN Comtrade Database](https://comtradeplus.un.org/). UN Comtrade provides historical data on the weights and value of specific goods shipped between countries, more info can be found [here](https://uncomtrade.org/docs/welcome-to-un-comtrade/). Full API documentation can be found [here](https://comtradedeveloper.un.org/). ## Install and load comtradr Install the development version from GitHub: ```{r eval = FALSE} install.packages("comtradr") ``` Load comtradr ```{r} library(comtradr) ``` ## Authentication 🔐 **Do not be discouraged by the complicated access to the token - you can do it! 💪** As stated above, you need an API token, see the FAQ of Comtrade for details on how to obtain it: ➡️ https://uncomtrade.org/docs/api-subscription-keys/ You need to follow the detailed explanations, which include screenshots, in the Wiki of Comtrade to the letter. ☝️ I am not writing them out here, because they might be updated regularly. However, once you are signed up, select the `comtrade - v1` product, which is the free API. ### Storing the API key If you are in an interactive session, you can call the following function to save your API token to the environment file for the current session. ```{r, eval = F} library(comtradr) set_primary_comtrade_key() ``` If you are not in an interactive session, you can register the token once in your session using the following base-r function. ```{r, eval = F} Sys.setenv('COMTRADE_PRIMARY' = 'xxxxxxxxxxxxxxxxx') ``` If you would like to set the comtrade key permanently, we recommend editing the project `.Renviron` file, where you need to add a line with `COMTRADE_PRIMARY = xxxx-your-key-xxxx`. ℹ️ Do not forget the line break after the last entry. This is the easiest by taking advantage of the great `usethis` package. ```{r, eval = F} usethis::edit_r_environ(scope = 'project') ``` ## Making API calls Lets say we want to get data on the total imports into the United States from Germany, France, Japan, and Mexico, for the last five years. ```{r, echo = FALSE} v_data_1 <- system.file("extdata", "vignette_data_1.rda", package = "comtradr") if (!file.exists(v_data_1)) { stop("internal vignette data set '~/extdata/vignette_data_1.rda' not found", call. = FALSE) } load(v_data_1) ``` ```{r, eval = FALSE} example_1 <- ct_get_data( reporter = 'USA', partner = c('DEU', 'FRA','JPN','MEX'), commodity_code = 'TOTAL', start_date = 2018, end_date = 2023, flow_direction = 'import' ) ``` API calls return a tidy data frame. ```{r} str(example_1) ``` Here are a few more examples to show the different parameter options: By default, the return data is in yearly amounts. We can pass `"monthly"` to arg `freq` to return data in monthly amounts, however the API limits each "monthly" query to a single year. ```{r, eval = FALSE} # all monthly data for a single year (API max of 12 months per call). q <- ct_search(reporters = "USA", partners = c("Germany", "France", "Japan", "Mexico"), flow_direction = "import", start_date = 2012, end_date = 2012, freq = "monthly") # monthly data for specific span of months (API max of twelve months per call). q <- ct_search(reporters = "USA", partners = c("Germany", "France", "Japan", "Mexico"), flow_direction = "import", start_date = "2012-03", end_date = "2012-07", freq = "monthly") ``` Countries passed to parameters `reporters` and `partners` must be spelled as they appear in the official ISO 3 character code convention. Search trade related to specific commodities (say, tomatoes). We can query the Comtrade commodity reference table to see all of the different commodity descriptions available for tomatoes. ```{r} ct_commodity_lookup("tomato") ``` If we want to search for shipment data on all of the commodity descriptions listed, then we can simply adjust the parameters for `ct_commodity_lookup` so that it will return only the codes, which can then be passed along to `ct_search`. ```{r, eval = FALSE} tomato_codes <- ct_commodity_lookup("tomato", return_code = TRUE, return_char = TRUE) q <- ct_get_data( reporter = 'USA', partner = c('DEU', 'FRA','JPN','MEX'), commodity_code = tomato_codes, start_date = "2012", end_date = "2013", flow_direction = 'import' ) ``` On the other hand, if we wanted to exclude juices and sauces from our search, we can pass a vector of the relevant codes to the API call. ```{r, eval = FALSE} q <- ct_get_data( reporter = 'USA', partner = c('DEU', 'FRA','JPN','MEX'), commodity_code = c("0702", "070200", "2002", "200210", "200290"), start_date = "2012", end_date = "2013", flow_direction = 'import' ) ``` ## API search metadata In addition to the trade data, each API return object contains metadata as attributes. ```{r} # The url of the API call. attributes(q)$url # The date-time of the API call. attributes(q)$time ``` ## More on the lookup functions Functions `ct_commodity_lookup` is able to take multiple search terms as input. ```{r} ct_commodity_lookup(c("tomato", "trout"), return_char = TRUE) ``` `ct_commodity_lookup` can return a vector (as seen above) or a named list, using parameter `return_char` ```{r} ct_commodity_lookup(c("tomato", "trout"), return_char = FALSE) ``` For `ct_commodity_lookup`, if any of the input search terms return zero results and parameter `verbose` is set to `TRUE`, a warning will be printed to console (set `verbose` to `FALSE` to turn off this feature). ```{r} ct_commodity_lookup(c("tomato", "sldfkjkfdsklsd"), verbose = TRUE) ``` ## API rate limits The Comtrade API imposes rate limits on users. `comtradr` features automated throttling of API calls to ensure the user stays within the limits defined by Comtrade. Below is a breakdown of those limits, API docs on these details can be found [here](https://uncomtrade.org/docs/subscriptions/). * Without user token: unlimited calls/day, up to 500 records per call (registration and API subscription key not required) -- this end-point is not implemented here. * With valid user token: 500 calls/day, up to 100,000 records per call (free registration and API subscription key required). The API also limits the amount of times it can be queried per minute, but we could not find documentation on this. Hence the function automatically responds to the parameters returned by each request to adjust to the changing wait times. In addition to these rate limits, the API imposes some limits on parameter combinations. * The arguments `reporters`, `partners` do not have an `All` value specified natively anymore, we have implemented it in R for convenience reasons on our side. * For date range the `start_date` and `end_date` must not span more than twelve months or twelve years. There is no more parameter to specify `All` years. * For arg `commodity_codes`, the maximum number of input values is dependent on the maximum length of the request. Hence, if specifying `reporters` or `partners`, this value might be shorter. ## Package Data `comtradr` ships with a few different package data objects, and functions for interacting with and using the package data. **Country/Commodity Reference Tables** As explained previously, making API calls with `comtradr` often requires the user to query the commodity reference table (this is done using functions `ct_commodity_lookup`). These reference tables are generated by the UN Comtrade, and are updated roughly once a year. Since they're updated infrequently, the tables are saved as cached data objects within the `comtradr` package, and are referenced by the package functions when needed. The function features an `update` argument, that checks for updates, downloads the new tables if necessary and makes them available during the current R session. It will also print a message indicating whether updates were found, like so: ```{r, eval = F} ct_commodity_lookup('tomato',update = T) ``` If any updates are found, the message will state which reference table(s) were updated. Additionally, the Comtrade API features a number of different commodity reference tables, based on different trade data classification schemes (for more details, see [this](https://uncomtrade.org/docs/list-of-references-parameter-codes/) page from the API docs). `comtradr` ships with all available commodity reference tables. The user may return and access any of the available commodity tables by specifying arg `commodity_type` within function `ct_get_ref_table` (e.g., `ct_get_ref_table(dataset_id = "S1")` will return the commodity table that follows the "S1" scheme). The `dataset_id`´s are listed in the help page of the function `ct_get_ref_table()`. They are as follows: * Datasets that contain codes for the `commodity_code` argument. The name is the same as you would provide under `commodity_classification`. * 'HS' This is probably the most common classification for goods. * 'B4' * 'B5' * 'EB02' * 'EB10' * 'EB10S' * 'EB' * 'S1' * 'S2' * 'S3' * 'S4' * 'SS' * Datasets that are related to other arguments, can be queried directly with the name of the argument in the `ct_get_data()`-function. * 'reporter' * 'partner' * 'mode_of_transport' * 'customs_code' Furthermore, there is a dataset readily available, with the iso3c-codes for the respective partner and reporter countries `country_codes$iso_3`, but I would recommend using the `ct_get_ref_table()` function, as it allows to update to the latest values on the fly. ## Visualize Once the data is collected, we can use it to create some basic visualizations. **Plot 1**: Plot total value (USD) of Chinese exports to Mexico, South Korea and the United States, by year. ```{r, echo = FALSE} v_data_2 <- system.file("extdata", "vignette_data_2.rda", package = "comtradr") if (!file.exists(v_data_2)) { stop("internal vignette data set '~/extdata/vignette_data_2.rda' not found", call. = FALSE) } load(v_data_2) ``` ```{r, eval = FALSE} # Comtrade api query. example_2 <- ct_get_data( reporter = 'CHN', partner = c('KOR', 'USA','MEX'), commodity_code = 'TOTAL', start_date = 2012, end_date = 2023, flow_direction = 'export' ) ``` ```{r, warning = FALSE, message = FALSE} library(ggplot2) # Apply polished col headers. # Create plot. ggplot(example_2, aes(period, primary_value/1000000000, color = partner_desc, group = partner_desc)) + geom_point(size = 2) + geom_line(size = 1) + scale_color_manual( values = c("darkgreen","red","grey30"), name = "Destination\nCountry") + ylab('Export Value in billions') + xlab('Year') + labs(title = "Total Value (USD) of Chinese Exports", subtitle = 'by year') + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) + theme_minimal() ``` **Plot 2**: Plot the top eight destination countries/areas of Thai shrimp exports, by weight (KG), for 2007 - 2011. ```{r, echo = FALSE} v_data_3 <- system.file("extdata", "vignette_data_3.rda", package = "comtradr") if (!file.exists(v_data_3)) { stop("internal vignette data set '~/extdata/vignette_data_3.rda' not found", call. = FALSE) } load(v_data_3) ``` ```{r, eval = FALSE} # First, collect commodity codes related to shrimp. shrimp_codes <- ct_commodity_lookup("shrimp", return_code = TRUE, return_char = TRUE) # Comtrade api query. example_3 <- ct_get_data(reporter = "THA", partner = "all", trade_direction = "exports", start_date = 2007, end_date = 2011, commodity_code = shrimp_codes) ``` ```{r, warning = FALSE, message = FALSE} library(ggplot2) library(dplyr) # Create country specific "total weight per year" dataframe for plotting. plotdf <- example_3 %>% group_by(partner_desc, period) %>% summarise(kg = as.numeric(sum(net_wgt, na.rm = TRUE))) # Get vector of the top 8 destination countries/areas by total weight shipped # across all years, then subset plotdf to only include observations related # to those countries/areas. top8 <- plotdf |> group_by(partner_desc) |> summarise(kg = as.numeric(sum(kg, na.rm = TRUE))) |> slice_max(n = 8, order_by = kg) |> arrange(desc(kg)) |> pull(partner_desc) plotdf <- plotdf %>% filter(partner_desc %in% top8) # Create plots (y-axis is NOT fixed across panels, this will allow us to ID # trends over time within each country/area individually). ggplot(plotdf,aes(period,kg/1000, group = partner_desc))+ geom_line() + geom_point() + facet_wrap(.~partner_desc, nrow = 2, ncol = 4,scales = 'free_y')+ labs(title = "Weight (KG in tons) of Thai Shrimp Exports", subtitle ="by Destination Area, 2007 - 2011")+ theme_minimal()+ theme(axis.text.x = element_text(angle = 45,hjust = 1, vjust = 1)) ``` ## Handling large amounts of Parameters In the `comtradr` package, several function parameters can accept `everything` as a valid input. Using `everything` for these parameters has specific meanings and can be a powerful tool for querying data. Internally, these values are set to `NULL` and the parameter is omitted entirely in the request to the API, the API then by default returns all possible values. Here's a breakdown of how `everything` is handled for different parameters: ### `commodity_code` Setting `commodity_code` to `everything` will query all possible commodity values. This can be useful if you want to retrieve data for all commodities without specifying individual codes. ### `flow_direction` If `flow_direction` is set to `everything`, all possible values for trade flow directions are queried. This includes imports, exports, re-imports, re-exports and some more specified in `ct_get_ref_table('flow_direction')`. ### `reporter` and `partner` Using `everything` for `reporter` or `partner` will query all possible values for reporter and partner countries, but also includes aggregates like `World` or some miscellaneous like `ASEAN`. Be careful when aggregating these values, so as to not count trade values multiple times in different aggregates. Alternatively, specifically for these values, you can also use `all_countries`, which allows you to query all countries which are not aggregates of some kind of grouped parameters like `ASEAN`. These values can usually be safely aggregated. This allows you to retrieve trade data for all countries without specifying individual ISO3 codes. ### `mode_of_transport`, `partner_2`, and `customs_code` Setting these parameters to `everything` will query all possible values related to the mode of transport, secondary partner, and customs procedures. This provides a comprehensive view of the data across different transportation modes and customs categories. ### Example Usage Here's an example of how you might use `everything` parameters to query comprehensive data: ```{r, warning = FALSE, message = FALSE, eval = F} # Querying all commodities and flow directions for USA and Germany from ## 2010 to 2011 data <- ct_get_data( reporter = c('USA', 'DEU'), commodity_code = 'everything', flow_direction = 'everything', start_date = '2010', end_date = '2011' ) ``` Using `everything` parameters can lead to large datasets, as they often remove specific filters on the data. It's essential to be mindful of the size of the data being queried, especially when using multiple `everything` parameters simultaneously.