The default location can be set with the environment variable GBIF_HOME; otherwise, the default provided by tools::R_user_dir() is used.
Usage:

gbif_dir()
Value:

Path to the GBIF home directory.
Examples:

gbif_dir()
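A brief sketch of overriding the default location via the environment variable; the path shown is hypothetical:

# Point gbif_dir() at a custom location for this session
Sys.setenv(GBIF_HOME = "/tmp/gbif")
gbif_dir()  # now returns "/tmp/gbif"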
Sync a local directory with a selected release of the AWS copy of GBIF.
Usage:

gbif_download(
  version = gbif_version(),
  dir = gbif_dir(),
  bucket = gbif_default_bucket(),
  region = ""
)
Arguments:

version: Release date (YYYY-MM-DD) of the snapshot to be synced. The latest version is detected by default.

dir: Path to the local directory where parquet files should be stored. Fine to leave at the default; see gbif_dir().

bucket: Name of the regional S3 bucket desired. Default is "gbif-open-data-us-east-1". Select a bucket closer to your compute location for improved performance; e.g. European researchers may prefer "gbif-open-data-eu-central-1".

region: Bucket region. Usually unnecessary; setting the bucket appropriately is sufficient.
Details:

Sync parquet files from the GBIF public data catalog, https://registry.opendata.aws/gbif/. Note that the data can also be found on the Microsoft Cloud, https://planetarycomputer.microsoft.com/dataset/gbif. Some users may prefer to download the data using an alternative interface, or to work on a cloud-hosted machine where the data is already available. Note that these data include all CC0- and CC-BY-licensed records in GBIF that have coordinates and passed automated quality checks; see https://github.com/gbif/occurrence/blob/master/aws-public-data.md.
Value:

Logical indicating success or failure.
Examples:

gbif_download()
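A sketch of a non-default download, assuming network access and enough disk space for the full snapshot (which is large):

# Sync the latest snapshot from the EU bucket into the default directory
gbif_download(
  version = gbif_version(),
  bucket = "gbif-open-data-eu-central-1"
)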
Return a path to the directory containing the GBIF example parquet data.
Usage:

gbif_example_data()
Details:

The example data is taken from the first 1000 rows of the 2021-11-01 release of the parquet data.
Value:

Path to the example occurrence data installed with the package.
Examples:

gbif_example_data()
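A quick sketch for confirming what ships with the package, using base R only:

# Inspect the example parquet files installed with the package
list.files(gbif_example_data(), recursive = TRUE)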
Local connection to a downloaded GBIF Parquet database
Usage:

gbif_local(
  dir = gbif_parquet_dir(version = gbif_version(local = TRUE)),
  tblname = "gbif",
  backend = c("arrow", "duckdb"),
  safe = TRUE
)
Arguments:

dir: Location of the downloaded GBIF parquet files.

tblname: Name for the database table.

backend: Choose "duckdb" or "arrow".
safe: Logical. Should the columns mediatype and issue be excluded? (Default TRUE; these large text columns can slow queries considerably.)
Details:

A summary of this GBIF data, along with column meanings, can be found at https://github.com/gbif/occurrence/blob/master/aws-public-data.md.
Value:

A remote tibble (tbl_sql class object).
Examples:

gbif <- gbif_local(gbif_example_data())
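A minimal query sketch against the example data, assuming the dplyr package is installed and that the snapshot's lowercase Darwin Core column names (e.g. species) are present:

library(dplyr)
gbif <- gbif_local(gbif_example_data())
gbif %>%
  filter(!is.na(species)) %>%      # drop records with no species name
  count(species, sort = TRUE) %>%  # tally occurrences per species
  collect() %>%                    # materialize the lazy query
  head()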
Connect to the GBIF remote directly. This can be much faster than downloading for one-off use, or when using the package from a server in the same region as the data. See Details.
Usage:

gbif_remote(
  version = gbif_version(),
  bucket = gbif_default_bucket(),
  safe = TRUE,
  unset_aws = getOption("gbif_unset_aws", TRUE),
  endpoint_override = Sys.getenv("AWS_S3_ENDPOINT", "s3.amazonaws.com"),
  backend = c("arrow", "duckdb"),
  ...
)
Arguments:

version: GBIF snapshot date.

bucket: GBIF bucket name (including region). A default can also be set using the option gbif_default_bucket.
safe: Logical, default TRUE. Should the columns mediatype and issue be excluded? The large text values in these columns can substantially slow queries.
unset_aws: Unset AWS credentials? GBIF is provided in a public bucket, so credentials are not needed, but having AWS_ACCESS_KEY_ID or other AWS environment variables set can cause the connection to fail. By default, any such environment variables are unset for the duration of the R session. This behavior can also be turned off globally by setting the option gbif_unset_aws to FALSE.
endpoint_override: Optional parameter passed to arrow::s3_bucket().
backend: Choose "duckdb" or "arrow".
...: Additional parameters passed to arrow::s3_bucket().
Details:

Query performance is dramatically improved in queries that return only a subset of columns. Consider using explicit select() commands to return only the columns you need.

A summary of this GBIF data, along with column meanings, can be found at https://github.com/gbif/occurrence/blob/master/aws-public-data.md.
Value:

A remote tibble (tbl_sql class object).
Examples:

gbif <- gbif_remote()
gbif
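A sketch following the performance note above: select only a few columns before collecting. The column names assume the snapshot's lowercase Darwin Core schema, and network access to the public bucket is required:

library(dplyr)
gbif <- gbif_remote()
gbif %>%
  select(species, countrycode, year) %>%  # narrow columns early for speed
  filter(year == 2020) %>%
  head() %>%
  collect()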
Return the latest available GBIF version. Can also return the latest locally downloaded version, or list all versions.
Usage:

gbif_version(
  local = FALSE,
  dir = gbif_dir(),
  bucket = gbif_default_bucket(),
  all = FALSE,
  ...
)
Arguments:

local: Search only local versions? Logical, default FALSE.

dir: Local directory (gbif_dir() by default).

bucket: Which remote bucket (region) should be checked.

all: Show all versions? Logical, default FALSE.

...: Additional arguments passed to arrow::s3_bucket().
Details:

A default version can be set using the option gbif_default_version.

Value:

Latest available GBIF version, as a string.
Examples:

## Latest local version available:
gbif_version(local = TRUE)

## Default version:
options(gbif_default_version = "2021-01-01")
gbif_version()

## Latest online version available:
gbif_version()

## All online versions:
gbif_version(all = TRUE)