---
title: "Finding data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Finding data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```r
library(rnaturalearth)
library(sf)
```
## Available data
Many datasets can be downloaded from [Natural Earth](https://www.naturalearthdata.com/) with `ne_download()`. They fall into two main categories: *physical* and *cultural* vector data. The `df_layers_physical` and `df_layers_cultural` data frames included in the `rnaturalearth` package show which layers can be downloaded and at which scales.
### Physical vector data
```r
data(df_layers_physical)
knitr::kable(
  df_layers_physical,
  caption = "physical vector data available via ne_download()"
)
```
Table: physical vector data available via ne_download()
|layer | scale10| scale50| scale110|
|:----------------------------------|-------:|-------:|--------:|
|antarctic_ice_shelves_lines | 1| 1| 0|
|antarctic_ice_shelves_polys | 1| 1| 0|
|coastline | 1| 1| 1|
|geographic_lines | 1| 1| 1|
|geography_marine_polys | 1| 1| 1|
|geography_regions_elevation_points | 1| 1| 1|
|geography_regions_points | 1| 1| 1|
|geography_regions_polys | 1| 1| 1|
|glaciated_areas | 1| 1| 1|
|lakes | 1| 1| 1|
|lakes_europe | 1| 0| 0|
|lakes_historic | 1| 1| 0|
|lakes_north_america | 1| 0| 0|
|lakes_pluvial | 1| 0| 0|
|land | 1| 1| 1|
|land_ocean_label_points | 1| 0| 0|
|land_ocean_seams | 1| 0| 0|
|land_scale_rank | 1| 0| 0|
|minor_islands | 1| 0| 0|
|minor_islands_coastline | 1| 0| 0|
|minor_islands_label_points | 1| 0| 0|
|ocean | 1| 1| 1|
|ocean_scale_rank | 1| 0| 0|
|playas | 1| 1| 0|
|reefs | 1| 0| 0|
|rivers_europe | 1| 0| 0|
|rivers_lake_centerlines | 1| 1| 1|
|rivers_lake_centerlines_scale_rank | 1| 1| 0|
|rivers_north_america | 1| 0| 0|
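Based on the previous table, where `1` marks a layer as available at a given scale and `0` as unavailable, we know that the `ocean` layer can be downloaded at small scale (110). The `scale` argument accepts either a number (`110`, `50`, `10`) or the corresponding name (`"small"`, `"medium"`, `"large"`).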
```r
plot(
  ne_download(type = "ocean", category = "physical", scale = "small")["geometry"],
  col = "lightblue"
)
#> Reading layer `ne_110m_ocean' from data source
#> `/tmp/RtmpapHyoT/ne_110m_ocean.shp' using driver `ESRI Shapefile'
#> Warning in CPL_read_ogr(dsn, layer, query,
#> as.character(options), quiet, : GDAL Message 1:
#> /tmp/RtmpapHyoT/ne_110m_ocean.shp contains polygon(s) with
#> rings with invalid winding order. Autocorrecting them, but
#> that shapefile should be corrected using ogr2ogr for
#> example.
#> Simple feature collection with 2 features and 3 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -180 ymin: -85.60904 xmax: 180 ymax: 90
#> Geodetic CRS: WGS 84
```
![](finding-data.Rmd-3-1.png)
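The table lookup above can also be done in code before calling `ne_download()`. The following is a minimal sketch, assuming a small stand-in data frame whose rows are copied from the table above so that it runs without the package; `layer_available()` is a hypothetical helper and not part of `rnaturalearth`.

```r
# Stand-in for a few rows of df_layers_physical (values copied from the table)
layers <- data.frame(
  layer = c("coastline", "lakes_europe", "playas", "ocean"),
  scale10 = c(1, 1, 1, 1),
  scale50 = c(1, 0, 1, 1),
  scale110 = c(1, 0, 0, 1)
)

# Hypothetical helper: TRUE when `layer` is flagged with 1 at the numeric `scale`
layer_available <- function(layers, layer, scale) {
  col <- paste0("scale", scale)
  stopifnot(col %in% names(layers))
  row <- layers[layers$layer == layer, , drop = FALSE]
  # An unknown layer yields zero rows and therefore FALSE
  nrow(row) == 1 && row[[col]] == 1
}

layer_available(layers, "ocean", 110)
#> [1] TRUE
layer_available(layers, "playas", 110)
#> [1] FALSE
```

With the package loaded, the same helper should work on `df_layers_physical` itself, since it has the same four columns.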
### Cultural vector data
```r
data(df_layers_cultural)
knitr::kable(
  df_layers_cultural,
  caption = "cultural vector data available via ne_download()"
)
```
Table: cultural vector data available via ne_download()
|layer | scale10| scale50| scale110|
|:-----------------------------------------------|-------:|-------:|--------:|
|admin_0_antarctic_claim_limit_lines | 1| 0| 0|
|admin_0_antarctic_claims | 1| 0| 0|
|admin_0_boundary_lines_disputed_areas | 1| 1| 0|
|admin_0_boundary_lines_land | 1| 1| 1|
|admin_0_boundary_lines_map_units | 1| 0| 0|
|admin_0_boundary_lines_maritime_indicator | 1| 1| 0|
|admin_0_boundary_map_units | 0| 1| 0|
|admin_0_breakaway_disputed_areas | 0| 1| 0|
|admin_0_countries | 1| 1| 1|
|admin_0_countries_lakes | 1| 1| 1|
|admin_0_disputed_areas | 1| 0| 0|
|admin_0_disputed_areas_scale_rank_minor_islands | 1| 0| 0|
|admin_0_label_points | 1| 0| 0|
|admin_0_map_subunits | 1| 1| 0|
|admin_0_map_units | 1| 1| 1|
|admin_0_pacific_groupings | 1| 1| 1|
|admin_0_scale_rank | 1| 1| 1|
|admin_0_scale_rank_minor_islands | 1| 0| 0|
|admin_0_seams | 1| 0| 0|
|admin_0_sovereignty | 1| 1| 1|
|admin_0_tiny_countries | 0| 1| 1|
|admin_0_tiny_countries_scale_rank | 0| 1| 0|
|admin_1_label_points | 1| 0| 0|
|admin_1_seams | 1| 0| 0|
|admin_1_states_provinces | 1| 1| 1|
|admin_1_states_provinces_lakes | 1| 1| 1|
|admin_1_states_provinces_lines | 1| 1| 1|
|admin_1_states_provinces_scale_rank | 1| 1| 1|
|airports | 1| 1| 0|
|parks_and_protected_lands_area | 1| 0| 0|
|parks_and_protected_lands_line | 1| 0| 0|
|parks_and_protected_lands_point | 1| 0| 0|
|parks_and_protected_lands_scale_rank | 1| 0| 0|
|populated_places | 1| 1| 1|
|populated_places_simple | 1| 1| 1|
|ports | 1| 1| 0|
|railroads | 1| 0| 0|
|railroads_north_america | 1| 0| 0|
|roads | 1| 0| 0|
|roads_north_america | 1| 0| 0|
|time_zones | 1| 0| 0|
|urban_areas | 1| 1| 0|
|urban_areas_landscan | 1| 0| 0|
```r
plot(
  ne_download(
    type = "airports",
    category = "cultural",
    scale = 10
  )["geometry"],
  pch = 21,
  bg = "grey"
)
#> Reading layer `ne_10m_airports' from data source
#> `/tmp/RtmpapHyoT/ne_10m_airports.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 893 features and 40 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -175.1356 ymin: -53.78147 xmax: 179.1954 ymax: 78.24672
#> Geodetic CRS: WGS 84
```
![](finding-data.Rmd-5-1.png)
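To see at which scales a given cultural layer exists, we can read its row of the table programmatically. The sketch below assumes a stand-in data frame with four rows copied from the cultural table above; `scales_for()` is a hypothetical helper introduced only for illustration.

```r
# Stand-in for a few rows of df_layers_cultural (values copied from the table)
cultural <- data.frame(
  layer = c("airports", "time_zones", "admin_0_countries", "admin_0_tiny_countries"),
  scale10 = c(1, 1, 1, 0),
  scale50 = c(1, 0, 1, 1),
  scale110 = c(0, 0, 1, 1)
)

# Hypothetical helper: numeric scales at which `layer` is flagged with 1
scales_for <- function(layers, layer) {
  row <- layers[layers$layer == layer, , drop = FALSE]
  if (nrow(row) == 0) {
    return(integer(0)) # unknown layer: no scales
  }
  scale_cols <- grep("^scale", names(row), value = TRUE)
  available <- scale_cols[unlist(row[1, scale_cols]) == 1]
  as.integer(sub("^scale", "", available))
}

scales_for(cultural, "airports")
#> [1] 10 50
scales_for(cultural, "admin_0_tiny_countries")
#> [1]  50 110
```

The result for `airports` agrees with the download above, which requested `scale = 10`.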
## Searching for countries and continents
This section shows how to search for the countries and continents that `rnaturalearth` can return. Let's begin by loading country data with the `read_sf()` function from the `sf` package. The following snippet reads the Natural Earth sovereignty dataset at the 10m scale directly from its remote zip archive.
```r
df <- read_sf("/vsizip/vsicurl/https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_sovereignty.zip")
head(df)
#> Simple feature collection with 6 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -109.4537 ymin: -55.9185 xmax: 140.9776 ymax: 7.35578
#> Geodetic CRS: WGS 84
#> # A tibble: 6 × 169
#>   featurecla   scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF
#> 1 Admin-0 sov…         5         2 Indonesia  IDN           0
#> 2 Admin-0 sov…         5         3 Malaysia   MYS           0
#> 3 Admin-0 sov…         0         2 Chile      CHL           0
#> 4 Admin-0 sov…         0         3 Bolivia    BOL           0
#> 5 Admin-0 sov…         0         2 Peru       PER           0
#> 6 Admin-0 sov…         0         2 Argentina  ARG           0
#> # ℹ 163 more variables: LEVEL, TYPE, TLC,
#> #   ADMIN, ADM0_A3, GEOU_DIF,
#> #   GEOUNIT, GU_A3, SU_DIF, SUBUNIT,
#> #   SU_A3, BRK_DIFF, NAME,
#> #   NAME_LONG, BRK_A3, BRK_NAME,
#> #   BRK_GROUP, ABBREV, POSTAL,
#> #   FORMAL_EN, FORMAL_FR, NAME_CIAWF, …
```
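The long path passed to `read_sf()` above is built from a few parts: the scale, the category, and the layer name. The following sketch assembles it from those parts. It assumes that other layers follow the same URL pattern as the sovereignty archive, which the source does not confirm; `ne_vsi_path()` is a hypothetical helper and not part of `rnaturalearth` or `sf`.

```r
# Base download URL, copied from the read_sf() call above
base_url <- "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download"

# Hypothetical helper: build a GDAL virtual path to a zipped layer.
# Assumption: every layer is stored as <scale>m/<category>/ne_<scale>m_<layer>.zip
ne_vsi_path <- function(layer, category, scale) {
  file <- sprintf("ne_%sm_%s.zip", scale, layer)
  paste0("/vsizip/vsicurl/", base_url, "/", scale, "m/", category, "/", file)
}

# Reproduces the path used above for the sovereignty dataset
path <- ne_vsi_path("admin_0_sovereignty", category = "cultural", scale = 10)
path
```

The resulting string could then be passed to `read_sf(path)`, which needs network access and a GDAL build that supports these virtual file systems.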
### Finding countries
One way to search for a country is to look in the `ADMIN` column. Let's start by plotting the first six countries.
```r
lapply(df$ADMIN[1:6], \(x) {
  plot(ne_countries(country = x)["geometry"], main = x)
})
```
Suppose we want the polygons for the US. How should we spell its name?
```r
ne_countries(country = "USA")
ne_countries(country = "United States")
ne_countries(country = "United States Of America")
ne_countries(country = "United States of America")
```
One option is to search the `ADMIN` column with a regular expression for every name that contains the word *states*.
```r
df$ADMIN[grepl("states", df$ADMIN, ignore.case = TRUE)]
#> [1] "United States of America"
#> [2] "Federated States of Micronesia"
```
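The same search can be wrapped in a small function so that other patterns are easy to try. This is a minimal sketch, assuming a stand-in character vector with a handful of names taken from the output above rather than the full `df$ADMIN` column; `find_country()` is a hypothetical helper.

```r
# Stand-in for df$ADMIN, using names that appear in the output above
admin <- c(
  "Indonesia", "Malaysia", "Chile",
  "United States of America", "Federated States of Micronesia"
)

# Hypothetical helper: case-insensitive regular expression search,
# returning the matching names themselves rather than their positions
find_country <- function(names, pattern) {
  grep(pattern, names, ignore.case = TRUE, value = TRUE)
}

find_country(admin, "states")
#> [1] "United States of America"       "Federated States of Micronesia"
find_country(admin, "^united")
#> [1] "United States of America"
```

Anchoring the pattern with `^` narrows the result to names that start with the search term, which helps when a word such as *states* appears in several country names.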
Now that we know the exact spelling, we can get the data and plot it.
```r
plot(ne_countries(country = "United States of America")["geometry"])
```
![](finding-data.Rmd-10-1.png)
### Continents
Finally, let's plot each continent using `ne_countries()` with its `continent` argument.
```r
unique(df$CONTINENT)
#> [1] "Asia" "South America"
#> [3] "Europe" "Africa"
#> [5] "North America" "Oceania"
#> [7] "Antarctica" "Seven seas (open ocean)"
```
```r
lapply(unique(df$CONTINENT), \(x) {
  plot(
    ne_countries(
      continent = x,
      scale = "medium"
    )["geometry"],
    main = x
  )
})
```