Package 'rgbif'

Title: Interface to the Global Biodiversity Information Facility API
Description: A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility (GBIF; <https://www.gbif.org/developer/summary>). GBIF is a database of species occurrence records from sources all over the globe. rgbif includes functions for searching for taxonomic names, retrieving information on data providers, getting species occurrence records, getting counts of occurrence records, and using the GBIF tile map service to make rasters summarizing huge amounts of data.
Authors: Scott Chamberlain [aut], Damiano Oldoni [aut], Vijay Barve [ctb], Peter Desmet [ctb], Laurens Geffert [ctb], Dan Mcglinn [ctb], Karthik Ram [ctb], rOpenSci [fnd] (https://ropensci.org/), John Waller [aut, cre]
Maintainer: John Waller <[email protected]>
License: MIT + file LICENSE
Version: 3.8.1.1
Built: 2025-01-14 03:25:43 UTC
Source: https://github.com/ropensci/rgbif

Interface to the Global Biodiversity Information Facility API.

Description

rgbif: A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility.

About

This package gives you access to data from GBIF https://www.gbif.org/ via their API.

Documentation for the GBIF API

Author(s)

Scott Chamberlain

Karthik Ram

Dan Mcglinn

Vijay Barve

John Waller


Check input WKT

Description

Check input WKT

Usage

check_wkt(wkt = NULL, skip_validate = FALSE)

Arguments

wkt

(character) one or more Well Known Text objects

skip_validate

(logical) whether to skip wk::wk_problems call or not. Default: FALSE

Examples

## Not run: 
check_wkt('POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))')
check_wkt('POINT(30.1 10.1)')
check_wkt('LINESTRING(3 4,10 50,20 25)')

# check many passed in at once
check_wkt(c('POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))',
  'POINT(30.1 10.1)'))

# bad WKT
# wkt <- 'POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 a))'
# check_wkt(wkt)

# many wkt's, semi-colon separated, for many repeated "geometry" args
wkt <- "POLYGON((-102.2 46.0,-93.9 46.0,-93.9 43.7,-102.2 43.7,-102.2 46.0))
;POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))"
check_wkt(gsub("\n", '', wkt))

## End(Not run)

Facetted count occurrence search.

Description

Facetted count occurrence search.

Usage

count_facet(keys = NULL, by = "country", countries = 10, removezeros = FALSE)

Arguments

keys

(numeric) GBIF keys, a vector. optional

by

(character) One of georeferenced, basisOfRecord, country, or publishingCountry. default: country

countries

(numeric) Number of countries to facet on, or a vector of country names. default: 10

removezeros

(logical) remove zeros or not? default: FALSE

Examples

## Not run: 
# Select number of countries to facet on
count_facet(by='country', countries=3, removezeros = TRUE)
# Or, pass in country names
count_facet(by='country', countries='AR', removezeros = TRUE)

spplist <- c('Geothlypis trichas','Tiaris olivacea','Pterodroma axillaris',
             'Calidris ferruginea','Pterodroma macroptera',
             'Gallirallus australis',
             'Falco cenchroides','Telespiza cantans','Oreomystis bairdi',
             'Cistothorus palustris')
keys <- sapply(spplist,
  function(x) name_backbone(x, rank="species")$usageKey)
count_facet(keys, by='country', countries=3, removezeros = TRUE)
count_facet(keys, by='country', countries=3, removezeros = FALSE)
count_facet(by='country', countries=20, removezeros = TRUE)
count_facet(keys, by='basisOfRecord', countries=5, removezeros = TRUE)

# get occurrences by georeferenced state
## across all records
count_facet(by='georeferenced')

## by keys
count_facet(keys, by='georeferenced')

# by basisOfRecord
count_facet(by="basisOfRecord")

## End(Not run)

Search for more obscure dataset metadata.

Description

Search for more obscure dataset metadata.

Usage

dataset(
  country = NULL,
  type = NULL,
  identifierType = NULL,
  identifier = NULL,
  machineTagNamespace = NULL,
  machineTagName = NULL,
  machineTagValue = NULL,
  modified = NULL,
  query = NULL,
  deleted = FALSE,
  limit = NULL,
  start = NULL,
  curlopts = list()
)

Arguments

country

The 2-letter country code (as per ISO-3166-1) of the country publishing the dataset.

type

The primary type of the dataset. Available values : OCCURRENCE, CHECKLIST, METADATA, SAMPLING_EVENT, MATERIAL_ENTITY.

identifierType

An identifier type for the identifier parameter. Available values : URL, LSID, HANDLER, DOI, UUID, FTP, URI, UNKNOWN, GBIF_PORTAL, GBIF_NODE, GBIF_PARTICIPANT, GRSCICOLL_ID, GRSCICOLL_URI, IH_IRN, ROR, GRID, CITES, SYMBIOTA_UUID, WIKIDATA, NCBI_BIOCOLLECTION.

identifier

An identifier of the type given by the identifierType parameter.

machineTagNamespace

Filters for entities with a machine tag in the specified namespace.

machineTagName

Filters for entities with a machine tag with the specified name (use in combination with the machineTagNamespace parameter).

machineTagValue

Filters for entities with a machine tag with the specified value (use in combination with the machineTagNamespace and machineTagName parameters).

modified

The modified date of the dataset. Accepts ranges, and a '*' can be used as a wildcard, e.g. modified=2023-04-01,*

query

Simple full text search parameter. The value for this parameter can be a simple word or a phrase. Wildcards are not supported.

deleted

Logical specifying whether to return only deleted datasets.

limit

Controls the number of results in the page.

start

Determines the start for the search results.

curlopts

options passed on to crul::HttpClient.

Details

This function allows you to search for some more obscure dataset metadata that might not be possible with dataset_search(). For example, searching through registry machinetags.

Value

A list.

Examples

## Not run: 
dataset(limit=3)
dataset(country="US",limit=3)
dataset(type="CHECKLIST",limit=3)
dataset(identifierType = "URL",limit=3)
dataset(identifier = 168,limit=3)
dataset(machineTagNamespace = "metasync.gbif.org",limit=3)
dataset(machineTagName = "datasetTitle",limit=3)
dataset(machineTagValue = "Borkhart",limit=3)
dataset(modified = "2023-04-01", limit=3) 
dataset(query = "dog", limit=3)
dataset(deleted=TRUE,limit=3)

## End(Not run)

Get a GBIF dataset from a doi

Description

Get a GBIF dataset from a doi

Usage

dataset_doi(doi = NULL, limit = 20, start = NULL, curlopts = list())

Arguments

doi

The DOI of the dataset you wish to look up.

limit

Controls the number of results in the page.

start

Determines the offset for the search results.

curlopts

options passed on to crul::HttpClient.

Details

This function allows for dataset lookup using a DOI. Be aware that some DOIs have more than one dataset associated with them.

Value

A list.

Examples

## Not run: 
dataset_doi('10.15468/igasai')

## End(Not run)

Check if a dataset is gridded

Description

Check if a dataset is gridded

Usage

dataset_gridded(
  uuid = NULL,
  min_dis = 0.05,
  min_per = 50,
  min_dis_count = 30,
  return = "logical",
  warn = TRUE
)

Arguments

uuid

(vector) A character vector of GBIF datasetkey uuids.

min_dis

(numeric) (default 0.05) Minimum distance in degrees to accept as gridded.

min_per

(integer)(default 50%) Minimum percentage of points having same nearest neighbor distance to be considered gridded.

min_dis_count

(default 30) Minimum number of unique points to accept an assessment of 'griddyness'.

return

(character) (default "logical"). Choice of "data" will return a data.frame of more information, or "logical" will return just TRUE or FALSE indicating whether a dataset is considered 'gridded'.

warn

(logical) indicates whether to warn about missing values or bad values.

Details

Gridded datasets are a known problem at GBIF. Many datasets have equally-spaced points in a regular pattern. These datasets are usually systematic national surveys or data taken from some atlas (so-called "rasterized collection designs"). This function uses the percentage of unique lat-long points with the most common nearest neighbor distance to identify gridded datasets.

https://data-blog.gbif.org/post/finding-gridded-datasets/

I recommend keeping the default values for the parameters.
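
The heuristic described above can be sketched in base R. This is only an illustration of the idea, not the code dataset_gridded() runs: the function name is_gridded_sketch, the rounding to 4 decimal places, and the use of plain Euclidean distance in degrees are all assumptions made for the example.

```r
# Illustrative sketch of the nearest-neighbour heuristic (NOT rgbif's code).
# A dataset is flagged when enough unique points share the same
# nearest-neighbour distance and that distance is at least min_dis degrees.
is_gridded_sketch <- function(lon, lat, min_dis = 0.05, min_per = 50,
                              min_dis_count = 30, digits = 4) {
  pts <- unique(data.frame(lon = lon, lat = lat))
  if (nrow(pts) < min_dis_count) return(NA)  # too few unique points to assess
  d <- as.matrix(dist(pts))                  # pairwise distances in degrees
  diag(d) <- Inf                             # ignore each point's distance to itself
  nn <- round(apply(d, 1, min), digits)      # nearest-neighbour distance per point
  tab <- table(nn)
  most_common <- as.numeric(names(tab)[which.max(tab)])
  percent <- 100 * max(tab) / length(nn)     # share of points with that distance
  most_common >= min_dis && percent >= min_per
}

# A regular 0.1-degree lattice versus randomly scattered points
grid <- expand.grid(lon = seq(0, 1, by = 0.1), lat = seq(50, 51, by = 0.1))
set.seed(42)
scattered <- data.frame(lon = runif(200), lat = 50 + runif(200))

grid_result <- is_gridded_sketch(grid$lon, grid$lat)
scattered_result <- is_gridded_sketch(scattered$lon, scattered$lat)
```

On the lattice every point has the same nearest-neighbour distance (0.1 degrees), so the sketch flags it; the scattered points share almost no distances and are not flagged.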

Value

A logical vector indicating whether a dataset is considered gridded. Or if return="data", a data.frame of more information.

Examples

## Not run: 

dataset_gridded("9070a460-0c6e-11dd-84d2-b8a03c50a862")
dataset_gridded(c("9070a460-0c6e-11dd-84d2-b8a03c50a862",
               "13b70480-bd69-11dd-b15f-b8a03c50a862"))



## End(Not run)

List datasets that are deleted or have no endpoint.

Description

List datasets that are deleted or have no endpoint.

Usage

dataset_duplicate(limit = 20, start = NULL, curlopts = list())

dataset_noendpoint(limit = 20, start = NULL, curlopts = list())

Arguments

limit

Controls the number of results in the page.

start

Determines the start for the search results.

curlopts

options passed on to crul::HttpClient.

Details

Get a list of deleted datasets or datasets with no endpoint. You get the full list; no parameters aside from limit and start are accepted.

Value

A list.

Examples

## Not run: 
dataset_noendpoint(limit=3)

## End(Not run)

Get dataset metadata using a datasetkey

Description

Get dataset metadata using a datasetkey

Usage

dataset_get(uuid = NULL, curlopts = list())

dataset_process(uuid = NULL, limit = 20, start = NULL, curlopts = list())

dataset_networks(uuid = NULL, limit = 20, start = NULL, curlopts = list())

dataset_constituents(uuid = NULL, limit = 20, start = NULL, curlopts = list())

dataset_comment(uuid = NULL, curlopts = list())

dataset_contact(uuid = NULL, curlopts = list())

dataset_endpoint(uuid = NULL, curlopts = list())

dataset_identifier(uuid = NULL, curlopts = list())

dataset_machinetag(uuid = NULL, curlopts = list())

dataset_tag(uuid = NULL, curlopts = list())

dataset_metrics(uuid = NULL, curlopts = list())

Arguments

uuid

A GBIF datasetkey uuid.

curlopts

options passed on to crul::HttpClient.

limit

Number of records to return.

start

Record number to start at.

Details

dataset_metrics() can only be used with checklist type datasets.

Value

A tibble or a list.

References

https://techdocs.gbif.org/en/openapi/v1/registry

Examples

## Not run: 
dataset_get("38b4c89f-584c-41bb-bd8f-cd1def33e92f")
dataset_process("38b4c89f-584c-41bb-bd8f-cd1def33e92f",limit=3)
dataset_networks("3dab037f-a520-4bc3-b888-508755c2eb52")
dataset_constituents("7ddf754f-d193-4cc9-b351-99906754a03b",limit=3)
dataset_comment("2e4cc37b-302e-4f1b-bbbb-1f674ff90e14")
dataset_contact("7ddf754f-d193-4cc9-b351-99906754a03b")
dataset_endpoint("7ddf754f-d193-4cc9-b351-99906754a03b")
dataset_identifier("7ddf754f-d193-4cc9-b351-99906754a03b")
dataset_machinetag("7ddf754f-d193-4cc9-b351-99906754a03b")
dataset_tag("c47f13c1-7427-45a0-9f12-237aad351040")
dataset_metrics("7ddf754f-d193-4cc9-b351-99906754a03b")

## End(Not run)

Search for datasets and dataset metadata.

Description

Search for datasets and dataset metadata.

Usage

datasets(
  data = "all",
  type = NULL,
  uuid = NULL,
  query = NULL,
  id = NULL,
  limit = 100,
  start = NULL,
  curlopts = list()
)

Arguments

data

The type of data to get. One or more of: 'organization', 'contact', 'endpoint', 'identifier', 'tag', 'machinetag', 'comment', 'constituents', 'document', 'metadata', 'deleted', 'duplicate', 'subDataset', 'withNoEndpoint', or the special 'all'. Default: all

type

Type of dataset. Options include: occurrence, checklist, metadata, or sampling_event.

uuid

UUID of the data node provider. This must be specified if data is anything other than all

query

Query term(s). Only used when data=all

id

A metadata document id.

limit

Number of records to return. Default: 100. Maximum: 1000.

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options
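
The limit and start arguments described above can be combined to page through a long result set. The helper below is a hypothetical base R sketch (page_starts is not an rgbif function) that computes the zero-based start offsets for a given total, assuming the documented maximum limit of 1000.

```r
# Hypothetical helper: zero-based `start` offsets needed to fetch `total`
# records in pages of `limit` records each.
page_starts <- function(total, limit = 100) {
  stopifnot(total >= 0, limit >= 1, limit <= 1000)
  if (total == 0) return(numeric(0))
  seq(0, total - 1, by = limit)
}

starts <- page_starts(250, limit = 100)
starts

# Each offset would then be used in one request, for example:
# pages <- lapply(starts, function(s) datasets(limit = 100, start = s))
```

The commented-out lapply() call needs network access; how many records exist in total has to come from a first request, which is not shown here.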

Value

A list.

References

https://www.gbif.org/developer/registry#datasets

Examples

## Not run: 
datasets(limit=5)
datasets(type="occurrence", limit=10)
datasets(uuid="a6998220-7e3a-485d-9cd6-73076bd85657")
datasets(data='contact', uuid="a6998220-7e3a-485d-9cd6-73076bd85657")
datasets(data='metadata', uuid="a6998220-7e3a-485d-9cd6-73076bd85657")
datasets(data='metadata', uuid="a6998220-7e3a-485d-9cd6-73076bd85657",
  id=598)
datasets(data=c('deleted','duplicate'))
datasets(data=c('deleted','duplicate'), limit=1)

# curl options
datasets(data=c('deleted','duplicate'), curlopts = list(verbose=TRUE))

## End(Not run)

Register a derived dataset for citation.

Description

Register a derived dataset for citation.

Usage

derived_dataset(
  citation_data = NULL,
  title = NULL,
  description = NULL,
  source_url = NULL,
  gbif_download_doi = NULL,
  user = NULL,
  pwd = NULL,
  curlopts = list()
)

derived_dataset_prep(
  citation_data = NULL,
  title = NULL,
  description = NULL,
  source_url = NULL,
  gbif_download_doi = NULL,
  user = NULL,
  pwd = NULL,
  curlopts = list()
)

Arguments

citation_data

(required) A data.frame with two columns. The first column should be GBIF datasetkey uuids and the second column should be occurrence counts from each of your datasets, representing the contribution of each dataset to your final derived dataset.

title

(required) The title for your derived dataset.

description

(required) A description of the dataset. Perhaps describing how it was created.

source_url

(required) A link to where the dataset is stored.

gbif_download_doi

(optional) A DOI from an original GBIF download.

user

(required) Your GBIF username.

pwd

(required) Your GBIF password.

curlopts

a list of arguments to pass to curl.

Value

A list.

Usage

Create a citable DOI for a dataset derived from GBIF mediated occurrences.

Use-case (1) your dataset was obtained with occ_search() and never returned a citable DOI, but you want to cite the data in a research paper.

Use-case (2) your dataset was obtained using occ_download() and you got a DOI, but the data underwent extensive filtering using CoordinateCleaner or some other cleaning pipeline. In this case be sure to fill in your original gbif_download_doi.

Use-case (3) your dataset was generated using a GBIF cloud export but you want a DOI to cite in your research paper.

Use derived_dataset to create a custom citable meta-data description and most importantly a DOI link between an external archive (e.g. Zenodo) and the datasets involved in your research or analysis.

All fields (except gbif_download_doi) are required for the registration to work.

We recommend that you run derived_dataset_prep() to check registration details before making it final with derived_dataset().
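
As a minimal sketch of preparing the citation_data argument without extra packages, the two-column table of dataset keys and counts can be built with base R. The occ data.frame here is a made-up stand-in for your own filtered occurrence records; only the datasetKey column matters for the count.

```r
# Toy stand-in for filtered occurrence records (values are made up)
occ <- data.frame(
  gbifID = 1:6,
  datasetKey = c(
    "3ea36590-9b79-46a8-9300-c9ef0bfed7b8",
    "3ea36590-9b79-46a8-9300-c9ef0bfed7b8",
    "630eb55d-5169-4473-99d6-a93396aeae38",
    "3ea36590-9b79-46a8-9300-c9ef0bfed7b8",
    "630eb55d-5169-4473-99d6-a93396aeae38",
    "3ea36590-9b79-46a8-9300-c9ef0bfed7b8"
  ),
  stringsAsFactors = FALSE
)

# First column: dataset keys; second column: occurrences per dataset
citation_data <- as.data.frame(table(occ$datasetKey), stringsAsFactors = FALSE)
names(citation_data) <- c("datasetKey", "count")
citation_data
```

The resulting data.frame has the shape described for citation_data and could be passed to derived_dataset_prep() for a dry run.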

Authentication

Some rgbif functions require your GBIF credentials.

For the user and pwd parameters, you can set them in one of three ways:

  1. Set them in your .Renviron/.bash_profile (or similar) file with the names GBIF_USER, GBIF_PWD, and GBIF_EMAIL

  2. Set them in your .Rprofile file with the names gbif_user and gbif_pwd.

  3. Simply pass strings to each of the parameters in the function call.

We strongly recommend the first option - storing your details as environment variables - as it's the most widely used way to store secrets.

You can edit your .Renviron with usethis::edit_r_environ().

After editing, your .Renviron file should look something like this...

GBIF_USER="jwaller"
GBIF_PWD="fakepassword123"
GBIF_EMAIL="[email protected]"

See ?Startup for help.
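
To make the three options concrete, here is a hypothetical lookup helper in base R. It is not rgbif's internal code, and the order of precedence (explicit argument, then environment variable, then R option) is an assumption chosen for the sketch.

```r
# Hypothetical helper (not part of rgbif): resolve a credential from an
# explicit value, an environment variable, or an R option, in that order.
gbif_cred_sketch <- function(value = NULL, env_name, opt_name) {
  if (!is.null(value)) return(value)
  env <- Sys.getenv(env_name, unset = "")
  if (nzchar(env)) return(env)
  opt <- getOption(opt_name)
  if (!is.null(opt)) return(opt)
  stop("No credential found; set ", env_name,
       " or options(", opt_name, " = ...)")
}

# Normally GBIF_USER would be set in .Renviron rather than in the session
Sys.setenv(GBIF_USER = "jwaller")
user <- gbif_cred_sketch(env_name = "GBIF_USER", opt_name = "gbif_user")
user
```

Keeping the password out of scripts entirely (options 1 or 2) means the function call itself never has to contain it.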

References

https://data-blog.gbif.org/post/derived-datasets/ https://www.gbif.org/derived-dataset/about

Examples

## Not run: 
data <- data.frame(
 datasetKey = c(
 "3ea36590-9b79-46a8-9300-c9ef0bfed7b8",
 "630eb55d-5169-4473-99d6-a93396aeae38",
 "806bf7d4-f762-11e1-a439-00145eb45e9a"),
 count = c(3, 1, 2781)
 )

## If output looks ok, run derived_dataset to register the dataset
 derived_dataset_prep(
 citation_data = data,
 title = "Test for derived dataset",
 description = "This data was filtered using a fake protocol",
 source_url = "https://zenodo.org/record/4246090#.YPGS2OgzZPY"
 )

#  derived_dataset(
#  citation_data = data,
#  title = "Test for derived dataset",
#  description = "This data was filtered using a fake protocol",
#  source_url = "https://zenodo.org/record/4246090#.YPGS2OgzZPY"
#  )

## Example with occ_search and dplyr
# library(dplyr)

# citation_data <- occ_search(taxonKey=212, limit=20)$data %>%
#   group_by(datasetKey) %>% 
#   count()

# # You would still need to upload your data to Zenodo or something similar 
# derived_dataset_prep(
#   citation_data = citation_data,
#   title="Bird data downloaded for test",
#   description="This data was downloaded using rgbif::occ_search and was 
#   later uploaded to Zenodo.",
#   source_url="https://zenodo.org/record/4246090#.YPGS2OgzZPY",
#   gbif_download_doi = NULL
# )

## End(Not run)

Download predicate DSL (domain specific language)

Description

Download predicate DSL (domain specific language)

Usage

pred(key, value)

pred_gt(key, value)

pred_gte(key, value)

pred_lt(key, value)

pred_lte(key, value)

pred_not(...)

pred_like(key, value)

pred_within(value)

pred_isnull(key)

pred_notnull(key)

pred_or(..., .list = list())

pred_and(..., .list = list())

pred_in(key, value)

pred_default()

Arguments

key

(character) the key for the predicate. See "Keys" below

value

(various) the value for the predicate

..., .list

For pred_or() or pred_and(), one or more objects of class occ_predicate, created by any pred* function

predicate methods and their equivalent types

pred* functions are named for the 'type' of operation they perform, following the terminology used by GBIF; see https://www.gbif.org/developer/occurrence#predicates

Function names are given, with the equivalent GBIF type value (e.g., pred_gt and greaterThan)

The following functions take one key and one value:

  • pred: equals

  • pred_lt: lessThan

  • pred_lte: lessThanOrEquals

  • pred_gt: greaterThan

  • pred_gte: greaterThanOrEquals

  • pred_like: like

The following function is only for geospatial queries, and only accepts a WKT string:

  • pred_within: within

The following function is only for stating that you don't want a key to be null, so it accepts only one key:

  • pred_notnull: isNotNull

The following function is only for stating that you want a key to be null, so it also accepts only one key:

  • pred_isnull: isNull

The following two functions accept multiple individual predicates, separating them by either "and" or "or":

  • pred_and: and

  • pred_or: or

The not predicate accepts one predicate; that is, this negates whatever predicate is passed in, e.g., not the taxonKey of 12345:

  • pred_not: not

The following function is special in that it accepts a single key but many values; stating that you want to search for all the values:

  • pred_in: in

The following function will apply commonly used defaults.

  • pred_default

Using pred_default() is equivalent to running:

  pred_and(
   pred("HAS_GEOSPATIAL_ISSUE",FALSE),
   pred("HAS_COORDINATE",TRUE),
   pred("OCCURRENCE_STATUS","PRESENT"),
   pred_not(pred_in("BASIS_OF_RECORD",
    c("FOSSIL_SPECIMEN","LIVING_SPECIMEN")))
  )

What happens internally

Internally, the input to pred* functions is converted to JSON to be sent to GBIF. For example:

pred_in("taxonKey", c(2480946, 5229208)) gives:

{
   "type": "in",
   "key": "TAXON_KEY",
   "values": ["2480946", "5229208"]
 }

pred_gt("elevation", 5000) gives:

{
   "type": "greaterThan",
   "key": "ELEVATION",
   "value": "5000"
}

pred_or(pred("taxonKey", 2977832), pred("taxonKey", 2977901)) gives:

{
  "type": "or",
  "predicates": [
     {
       "type": "equals",
       "key": "TAXON_KEY",
       "value": "2977832"
     },
     {
       "type": "equals",
       "key": "TAXON_KEY",
       "value": "2977901"
     }
  ]
}

Keys

Acceptable arguments to the key parameter are listed below; the version of each key that must be sent if you pass the query via the body parameter is given in parentheses (see below for examples). You can also use the 'ALL_CAPS' version of a key if you prefer. Open an issue in the GitHub repository for this package if you know of a key that should be supported but is not yet.

  • taxonKey (TAXON_KEY)

  • acceptedTaxonKey (ACCEPTED_TAXON_KEY)

  • kingdomKey (KINGDOM_KEY)

  • phylumKey (PHYLUM_KEY)

  • classKey (CLASS_KEY)

  • orderKey (ORDER_KEY)

  • familyKey (FAMILY_KEY)

  • genusKey (GENUS_KEY)

  • subgenusKey (SUBGENUS_KEY)

  • speciesKey (SPECIES_KEY)

  • scientificName (SCIENTIFIC_NAME)

  • country (COUNTRY)

  • publishingCountry (PUBLISHING_COUNTRY)

  • hasCoordinate (HAS_COORDINATE)

  • hasGeospatialIssue (HAS_GEOSPATIAL_ISSUE)

  • typeStatus (TYPE_STATUS)

  • recordNumber (RECORD_NUMBER)

  • lastInterpreted (LAST_INTERPRETED)

  • modified (MODIFIED)

  • continent (CONTINENT)

  • geometry (GEOMETRY)

  • basisOfRecord (BASIS_OF_RECORD)

  • datasetKey (DATASET_KEY)

  • datasetID/datasetId (DATASET_ID)

  • eventDate (EVENT_DATE)

  • catalogNumber (CATALOG_NUMBER)

  • otherCatalogNumbers (OTHER_CATALOG_NUMBERS)

  • year (YEAR)

  • month (MONTH)

  • decimalLatitude (DECIMAL_LATITUDE)

  • decimalLongitude (DECIMAL_LONGITUDE)

  • elevation (ELEVATION)

  • depth (DEPTH)

  • institutionCode (INSTITUTION_CODE)

  • collectionCode (COLLECTION_CODE)

  • issue (ISSUE)

  • mediatype (MEDIA_TYPE)

  • recordedBy (RECORDED_BY)

  • recordedById/recordedByID (RECORDED_BY_ID)

  • establishmentMeans (ESTABLISHMENT_MEANS)

  • coordinateUncertaintyInMeters (COORDINATE_UNCERTAINTY_IN_METERS)

  • gadm (GADM_GID) (for the Database of Global Administrative Areas)

  • level0Gid (GADM_LEVEL_0_GID)

  • level1Gid (GADM_LEVEL_1_GID)

  • level2Gid (GADM_LEVEL_2_GID)

  • level3Gid (GADM_LEVEL_3_GID)

  • stateProvince (STATE_PROVINCE)

  • occurrenceStatus (OCCURRENCE_STATUS)

  • publishingOrg (PUBLISHING_ORG)

  • occurrenceId/occurrenceID (OCCURRENCE_ID)

  • eventId/eventID (EVENT_ID)

  • parentEventId/parentEventID (PARENT_EVENT_ID)

  • identifiedBy (IDENTIFIED_BY)

  • identifiedById/identifiedByID (IDENTIFIED_BY_ID)

  • license (LICENSE)

  • locality (LOCALITY)

  • pathway (PATHWAY)

  • preparations (PREPARATIONS)

  • networkKey (NETWORK_KEY)

  • organismId/organismID (ORGANISM_ID)

  • organismQuantity (ORGANISM_QUANTITY)

  • organismQuantityType (ORGANISM_QUANTITY_TYPE)

  • protocol (PROTOCOL)

  • relativeOrganismQuantity (RELATIVE_ORGANISM_QUANTITY)

  • repatriated (REPATRIATED)

  • sampleSizeUnit (SAMPLE_SIZE_UNIT)

  • sampleSizeValue (SAMPLE_SIZE_VALUE)

  • samplingProtocol (SAMPLING_PROTOCOL)

  • verbatimScientificName (VERBATIM_SCIENTIFIC_NAME)

  • taxonID/taxonId (TAXON_ID)

  • taxonomicStatus (TAXONOMIC_STATUS)

  • waterBody (WATER_BODY)

  • iucnRedListCategory (IUCN_RED_LIST_CATEGORY)

  • degreeOfEstablishment (DEGREE_OF_ESTABLISHMENT)

  • isInCluster (IS_IN_CLUSTER)

  • lifeStage (LIFE_STAGE)

  • distanceFromCentroidInMeters (DISTANCE_FROM_CENTROID_IN_METERS)

  • gbifId (GBIF_ID)
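
Most pairs in the list above follow a simple camelCase to UPPER_SNAKE_CASE pattern. The base R sketch below illustrates that pattern only; to_gbif_key_sketch is a hypothetical name, not an rgbif function, and the list above remains authoritative because some keys do not follow the rule (for example gadm maps to GADM_GID, level0Gid to GADM_LEVEL_0_GID, and mediatype to MEDIA_TYPE).

```r
# Hypothetical helper: insert "_" before each capital letter that follows a
# lowercase letter or digit, then upper-case the result.
to_gbif_key_sketch <- function(key) {
  toupper(gsub("([a-z0-9])([A-Z])", "\\1_\\2", key, perl = TRUE))
}

converted <- to_gbif_key_sketch(
  c("taxonKey", "hasGeospatialIssue", "coordinateUncertaintyInMeters", "datasetID")
)
converted
```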

References

Download predicates docs: https://www.gbif.org/developer/occurrence#predicates

See Also

Other downloads: occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

pred("taxonKey", 3119195)
pred_gt("elevation", 5000)
pred_gte("elevation", 5000)
pred_lt("elevation", 1000)
pred_lte("elevation", 1000)
pred_within("POLYGON((-14 42, 9 38, -7 26, -14 42))")
pred_and(pred_within("POLYGON((-14 42, 9 38, -7 26, -14 42))"),
  pred_gte("elevation", 5000))
pred_or(pred_lte("year", 1989), pred("year", 2000))
pred_and(pred_lte("year", 1989), pred("year", 2000))
pred_in("taxonKey", c(2977832, 2977901, 2977966, 2977835))
pred_in("basisOfRecord", c("MACHINE_OBSERVATION", "HUMAN_OBSERVATION"))
pred_not(pred("taxonKey", 729))
pred_like("catalogNumber", "PAPS5-560%")
pred_notnull("issue")
pred("basisOfRecord", "LITERATURE")
pred("hasCoordinate", TRUE)
pred("stateProvince", "California")
pred("hasGeospatialIssue", FALSE)
pred_within("POLYGON((-14 42, 9 38, -7 26, -14 42))")
pred_or(pred("taxonKey", 2977832), pred("taxonKey", 2977901),
  pred("taxonKey", 2977966))
pred_in("taxonKey", c(2977832, 2977901, 2977966, 2977835))

Downloads interface

Description

GBIF provides two ways to get occurrence data: through the /occurrence/search route (see occ_search()), or via the /occurrence/download route (many functions, see below). occ_search() is more appropriate for smaller data requests, while the occ_download*() functions are more appropriate for larger ones.

Settings

You'll use occ_download() to kick off a download. You'll need to give that function settings from your GBIF profile: your user name, your password, and your email. These three settings are required to use the function. You can specify them in one of three ways:

  • Pass them to occ_download as parameters

  • Use R options: As options either in the current R session using the options() function, or by setting them in your .Rprofile file, after which point they'll be read in automatically

  • Use environment variables: As env vars either in the current R session using the Sys.setenv() function, or by setting them in your .Renviron/.bash_profile or similar files, after which point they'll be read in automatically

BEWARE

You cannot run many downloads at the same time, so plan wisely. See Rate limiting below.

Rate limiting

If you try to launch too many downloads, you will receive a 420 "Enhance Your Calm" response. If fewer than 100 downloads are running in total across all GBIF users, you can have 3 running at a time. If there are more than that, each user is limited to 1. These numbers are subject to change.

Functions

Download query composer methods:

See download_predicate_dsl

Query length

GBIF has a limit of 12,000 characters for a download query. This means that you can have a pretty long query, but at some point it may lead to an error on GBIF's side and you'll have to split your query into a few.
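
One way to split an over-long query, sketched in base R, is to break a long vector of keys into chunks and submit one download per chunk. The keys and the chunk size of 1000 below are made up for illustration; the right chunk size depends on how long the rest of your query is.

```r
# Made-up taxon keys standing in for a long list of real ones
taxon_keys <- 1000001:1002500

# Assumed chunk size; pick it so each resulting query stays under the limit
chunk_size <- 1000

# Assign each key to a chunk number, then split into a list of vectors
chunks <- split(taxon_keys, ceiling(seq_along(taxon_keys) / chunk_size))
lengths(chunks)

# Each chunk could then feed one predicate, for example:
# pred_in("taxonKey", chunks[[1]])
```

Since every chunk becomes its own download, the rate limits on simultaneous downloads still apply.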

Download status

The following statuses can be found with any download:

  • PREPARING: just submitted by user and awaiting processing (typically only a few seconds)

  • RUNNING: being created (takes typically 1-15 minutes)

  • FAILED: something unexpected went wrong

  • KILLED: user decided to abort the job while it was in PREPARING or RUNNING phase

  • SUCCEEDED: The download was created and the user was informed

  • FILE_ERASED: The download was deleted according to the retention policy, see https://www.gbif.org/faq?question=for-how-long-will-does-gbif-store-downloads
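
The statuses above split into two groups: PREPARING and RUNNING mean the job is still in progress, and the rest are final. A polling loop built on that distinction might look like the base R sketch below. The function wait_for_download_sketch and the get_status argument are hypothetical; in practice occ_download_wait() is the package function intended for waiting on a download.

```r
# Hypothetical polling loop: call get_status() until the status is final.
wait_for_download_sketch <- function(get_status, interval = 5, max_tries = 60) {
  in_progress <- c("PREPARING", "RUNNING")
  for (i in seq_len(max_tries)) {
    status <- get_status()
    if (!status %in% in_progress) return(status)  # SUCCEEDED, FAILED, KILLED, ...
    Sys.sleep(interval)                           # wait before asking again
  }
  stop("Download still in progress after ", max_tries, " checks")
}

# Fake status source for demonstration: PREPARING, RUNNING, then SUCCEEDED
fake_status <- local({
  states <- c("PREPARING", "RUNNING", "SUCCEEDED")
  i <- 0
  function() {
    i <<- min(i + 1, length(states))
    states[i]
  }
})

final_status <- wait_for_download_sketch(fake_status, interval = 0)
final_status
```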


Get elevation for lat/long points from a data.frame or list of points.

Description

Uses the GeoNames web service

Usage

elevation(
  input = NULL,
  latitude = NULL,
  longitude = NULL,
  latlong = NULL,
  elevation_model = "srtm3",
  username = Sys.getenv("GEONAMES_USER"),
  key,
  curlopts,
  ...
)

Arguments

input

A data.frame of lat/long data. There must be columns decimalLatitude and decimalLongitude.

latitude

A vector of latitudes. Must be the same length as the longitude vector.

longitude

A vector of longitudes. Must be the same length as the latitude vector.

latlong

A vector of lat/long pairs. See examples.

elevation_model

(character) one of srtm3 (default), srtm1, astergdem, or gtopo30. See "Elevation models" below for more

username

(character) Required. A GeoNames user name. See "GeoNames user name" below.

key, curlopts

defunct. see docs

...

curl options passed on to crul::verb-GET; see curl::curl_options() for curl options

Value

A new column named elevation_geonames in the supplied data.frame or a vector with elevation of each location in meters. Note that data from GBIF can already have a column named elevation, thus the column we add is named differently.

GeoNames user name

To get a GeoNames user name, register for an account at http://www.geonames.org/login - then you can enable your account for the GeoNames webservice on your account page (http://www.geonames.org/manageaccount). Once you are enabled to use the webservice, you can pass in your username to the username parameter. Better yet, store your username in your .Renviron file, or similar (e.g., .zshrc or .bash_profile files) and read it in via Sys.getenv() as in the examples below. By default we do Sys.getenv("GEONAMES_USER") for the username parameter.

Elevation models

  • srtm3:

    • sample area: ca 90m x 90m

    • result: a single number giving the elevation in meters according to srtm3, ocean areas have been masked as "no data" and have been assigned a value of -32768

  • srtm1:

    • sample area: ca 30m x 30m

    • result: a single number giving the elevation in meters according to srtm1, ocean areas have been masked as "no data" and have been assigned a value of -32768

  • astergdem (Aster Global Digital Elevation Model V2 2011):

    • sample area: ca 30m x 30m, between 83N and 65S latitude

    • result: a single number giving the elevation in meters according to aster gdem, ocean areas have been masked as "no data" and have been assigned a value of -32768

  • gtopo30:

    • sample area: ca 1km x 1km

    • result: a single number giving the elevation in meters according to gtopo30, ocean areas have been masked as "no data" and have been assigned a value of -9999
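
Because the models above report "no data" as the sentinel values -32768 or -9999 rather than NA, it is usually worth converting them before computing summaries. A minimal base R sketch, with made-up elevation values:

```r
# Made-up elevations, including both "no data" sentinels described above
elev <- c(1523, -32768, 87, -9999, 0)

# Sentinel values used by srtm3/srtm1/astergdem (-32768) and gtopo30 (-9999)
nodata <- c(-32768, -9999)

# Replace sentinels with NA so they do not distort means, ranges, etc.
elev_clean <- replace(elev, elev %in% nodata, NA)
elev_clean
mean(elev_clean, na.rm = TRUE)
```

The same replacement could be applied to the elevation_geonames column of a returned data.frame.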

References

GeoNames http://www.geonames.org/export/web-services.html

Examples

## Not run: 
user <- Sys.getenv("GEONAMES_USER")

occ_key <- name_suggest('Puma concolor')$data$key[1]
dat <- occ_search(taxonKey = occ_key, limit = 300, hasCoordinate = TRUE)
head( elevation(dat$data, username = user) )

# Pass in a vector of lat's and a vector of long's
elevation(latitude = dat$data$decimalLatitude[1:10],
  longitude = dat$data$decimalLongitude[1:10],
  username = user, verbose = TRUE)

# Pass in lat/long pairs in a single vector
pairs <- list(c(31.8496,-110.576060), c(29.15503,-103.59828))
elevation(latlong=pairs, username = user)

# Pass on curl options
pairs <- list(c(31.8496,-110.576060), c(29.15503,-103.59828))
elevation(latlong=pairs, username = user, verbose = TRUE)

# different elevation models
lats <- dat$data$decimalLatitude[1:5]
lons <- dat$data$decimalLongitude[1:5]
elevation(latitude = lats, longitude = lons, elevation_model = "srtm3")
elevation(latitude = lats, longitude = lons, elevation_model = "srtm1")
elevation(latitude = lats, longitude = lons, elevation_model = "astergdem")
elevation(latitude = lats, longitude = lons, elevation_model = "gtopo30")

## End(Not run)

Enumerations.

Description

Many parts of the GBIF API make use of enumerations, i.e. controlled vocabularies for specific topics. These enumerations are available via these functions.

Usage

enumeration(x = NULL, curlopts = list())

enumeration_country(curlopts = list())

Arguments

x

A given enumeration.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

enumeration returns a character vector, while enumeration_country returns a data.frame.

Examples

## Not run: 
# basic enumeration
enumeration()
enumeration("NameType")
enumeration("MetadataType")
enumeration("TypeStatus")

# country enumeration
enumeration_country()

# curl options
enumeration(curlopts = list(verbose=TRUE))

## End(Not run)

Convert a bounding box to a Well Known Text polygon, and a WKT to a bounding box

Description

Convert a bounding box to a Well Known Text polygon, and a WKT to a bounding box

Usage

gbif_bbox2wkt(minx = NA, miny = NA, maxx = NA, maxy = NA, bbox = NULL)

gbif_wkt2bbox(wkt = NULL)

Arguments

minx

(numeric) Minimum x value, or the most western longitude

miny

(numeric) Minimum y value, or the most southern latitude

maxx

(numeric) Maximum x value, or the most eastern longitude

maxy

(numeric) Maximum y value, or the most northern latitude

bbox

(numeric) A vector of length 4, with the elements: minx, miny, maxx, maxy

wkt

(character) A Well Known Text object.

Value

gbif_bbox2wkt returns an object of class character, a Well Known Text string of the form 'POLYGON((minx miny, maxx miny, maxx maxy, minx maxy, minx miny))'.

gbif_wkt2bbox returns a numeric vector of length 4, like c(minx, miny, maxx, maxy).

Examples

## Not run: 
# Convert a bounding box to a WKT
## Pass in a vector of length 4 with all values
gbif_bbox2wkt(bbox=c(-125.0,38.4,-121.8,40.9))

## Or pass in each value separately
gbif_bbox2wkt(minx=-125.0, miny=38.4, maxx=-121.8, maxy=40.9)

# Convert a WKT object to a bounding box
wkt <- "POLYGON((-125 38.4,-125 40.9,-121.8 40.9,-121.8 38.4,-125 38.4))"
gbif_wkt2bbox(wkt)

## End(Not run)
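
The two conversions described under Value can be illustrated without calling rgbif. The following is a minimal base-R sketch, not the package's implementation: the helper names bbox_to_wkt_sketch and wkt_to_bbox_sketch are hypothetical, and the number parsing assumes plain decimal coordinates (no exponent notation).

```r
# Hypothetical helpers for illustration only; not part of rgbif.
bbox_to_wkt_sketch <- function(bbox) {
  stopifnot(is.numeric(bbox), length(bbox) == 4)
  minx <- bbox[1]; miny <- bbox[2]; maxx <- bbox[3]; maxy <- bbox[4]
  # Corner order follows the documented form; the ring closes on the first point
  xs <- c(minx, maxx, maxx, minx, minx)
  ys <- c(miny, miny, maxy, maxy, miny)
  paste0("POLYGON((", paste(xs, ys, collapse = ", "), "))")
}

wkt_to_bbox_sketch <- function(wkt) {
  # Pull every signed decimal number out of the string, in order
  nums <- as.numeric(regmatches(wkt, gregexpr("-?[0-9]+\\.?[0-9]*", wkt))[[1]])
  xs <- nums[c(TRUE, FALSE)]  # odd positions are x (longitude)
  ys <- nums[c(FALSE, TRUE)]  # even positions are y (latitude)
  c(min(xs), min(ys), max(xs), max(ys))
}

wkt <- bbox_to_wkt_sketch(c(-125.0, 38.4, -121.8, 40.9))
wkt
wkt_to_bbox_sketch(wkt)
```

Taking the minimum and maximum of the coordinates means the sketch does not depend on the order in which the polygon corners are listed.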

Get citation for datasets used

Description

Get citation for datasets used

Usage

gbif_citation(x)

Arguments

x

(character) Result of a call to occ_download_get() or occ_download_meta().

Details

The function is deprecated for use with occ_search() and occ_data() results, and for use with datasetKeys and gbifids. We encourage you to use derived_dataset() instead.

occ_download_get() and occ_download_meta() results are still supported.

Value

A list with an S3 class assigned, used by a print method to pretty print citation information. You can unclass the output or index to the named items as needed.

Examples

## Not run: 
# Downloads
## occ_download_get()
# d1 <- occ_download(pred("country", "BG"), pred_gte("year", 2020))
# occ_download_meta(d1) # wait until status = succeeded
# d1 <- occ_download_get(d1, overwrite = TRUE)
# gbif_citation(d1)

## occ_download_meta()
# key <- "0000122-171020152545675"
# res <- occ_download_meta(key)
# gbif_citation(res)

## End(Not run)

Geocode lat-lon point(s) with GBIF's set of geo-polygons (experimental)

Description

Geocode lat-lon point(s) with GBIF's set of geo-polygons (experimental)

Usage

gbif_geocode(latitude = NULL, longitude = NULL)

Arguments

latitude

a vector of numeric latitude values between -90 and 90.

longitude

a vector of numeric longitude values between -180 and 180.

Value

A data.frame of results from the GBIF geocoding service.

  • latitude : The input latitude

  • longitude : The input longitude

  • index : The original input rownumber

  • id : The id of the polygon from which the geocode comes

  • type : One of the following: "Political" (country codes), "IHO" (marine regions), "SeaVox" (marine regions), "WGSRPD" (TDWG regions), "EEZ" (in national waters), or "GADM0", "GADM1", "GADM2" (http://gadm.org/)

  • title : The name of the source polygon

  • distance : distance to the polygon border

This function uses the GBIF geocoder API which is not guaranteed to be stable and is undocumented. As such, this may return different data over time, may be rate-limited or may stop working if GBIF change the service. Use this function with caution.

References

http://gadm.org/ http://marineregions.org/ http://www.tdwg.org/standards/ http://api.gbif.org/v1/geocode/reverse?lat=0&lng=0

Examples

## Not run: 
# one pair 
gbif_geocode(0,0)
# or multiple pairs of points
gbif_geocode(c(0,50),c(0,20))


## End(Not run)
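
Because the service expects latitudes between -90 and 90 and longitudes between -180 and 180, it can be useful to screen coordinates before sending them. The following is a minimal base-R sketch; valid_coords_sketch is a hypothetical helper, not an rgbif function, and the coordinates are invented for illustration.

```r
# Hypothetical pre-check for illustration only; not part of rgbif.
valid_coords_sketch <- function(latitude, longitude) {
  stopifnot(is.numeric(latitude), is.numeric(longitude),
            length(latitude) == length(longitude))
  # TRUE where both values are present and inside the documented ranges
  !is.na(latitude) & !is.na(longitude) &
    latitude >= -90 & latitude <= 90 &
    longitude >= -180 & longitude <= 180
}

lat <- c(0, 50, 95)   # the third latitude is out of range
lon <- c(0, 20, 10)
ok <- valid_coords_sketch(lat, lon)
ok

# Only the valid pairs would then be passed on, e.g.
# gbif_geocode(lat[ok], lon[ok])
```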

List all GBIF issues and their codes.

Description

Returns a data.frame of all GBIF issues with the following columns:

  • code: issue short code, e.g. gass84

  • issue: issue full name, e.g. GEODETIC_DATUM_ASSUMED_WGS84

  • description: issue description

  • type: issue type, either related to occurrence or name

Usage

gbif_issues()

Source

https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/OccurrenceIssue.html https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/NameUsageIssue.html


Lookup issue definitions and short codes

Description

Lookup issue definitions and short codes

Usage

gbif_issues_lookup(issue = NULL, code = NULL)

Arguments

issue

Full name of issue, e.g. CONTINENT_COUNTRY_MISMATCH

code

An issue short code, e.g. 'ccm'

Examples

gbif_issues_lookup(issue = 'CONTINENT_COUNTRY_MISMATCH')
gbif_issues_lookup(code = 'ccm')
gbif_issues_lookup(issue = 'COORDINATE_INVALID')
gbif_issues_lookup(code = 'cdiv')

View highlighted terms in name results from GBIF.

Description

View highlighted terms in name results from GBIF.

Usage

gbif_names(input, output = NULL, browse = TRUE)

Arguments

input

Output from a call to name_lookup() (see Examples)

output

Output folder path. If not given uses temporary folder.

browse

(logical) Browse output (default: TRUE)

Examples

## Not run: 
# browse=FALSE returns path to file
gbif_names(name_lookup(query='snake', hl=TRUE), browse=FALSE)

(out <- name_lookup(query='canada', hl=TRUE, limit=5))
gbif_names(out)
gbif_names(name_lookup(query='snake', hl=TRUE))
gbif_names(name_lookup(query='bird', hl=TRUE))

# or not highlight
gbif_names(name_lookup(query='bird', limit=200))

## End(Not run)

GBIF registry data via OAI-PMH

Description

GBIF registry data via OAI-PMH

Usage

gbif_oai_identify(...)

gbif_oai_list_identifiers(
  prefix = "oai_dc",
  from = NULL,
  until = NULL,
  set = NULL,
  token = NULL,
  as = "df",
  ...
)

gbif_oai_list_records(
  prefix = "oai_dc",
  from = NULL,
  until = NULL,
  set = NULL,
  token = NULL,
  as = "df",
  ...
)

gbif_oai_list_metadataformats(id = NULL, ...)

gbif_oai_list_sets(token = NULL, as = "df", ...)

gbif_oai_get_records(ids, prefix = "oai_dc", as = "parsed", ...)

Arguments

...

Curl options passed on to httr::GET

prefix

(character) A string to specify the metadata format in OAI-PMH requests issued to the repository. The default ("oai_dc") corresponds to the mandatory OAI unqualified Dublin Core metadata schema.

from

(character) string giving datestamp to be used as lower bound for datestamp-based selective harvesting (i.e., only harvest records with datestamps in the given range). Dates and times must be encoded using ISO 8601. The trailing Z must be used when including time. OAI-PMH implies UTC for date/time specifications.

until

(character) Datestamp to be used as an upper bound, for datestamp-based selective harvesting (i.e., only harvest records with datestamps in the given range).

set

(character) A set to be used for selective harvesting (i.e., only harvest records in the given set).

token

(character) a token previously provided by the server to resume a request where it last left off. 50 is max number of records returned. We will loop for you internally to get all the records you asked for.

as

(character) What to return. One of "df" (for data.frame; default), "list" (get a list), or "raw" (raw text). For gbif_oai_get_records, one of "parsed" or "raw"

id, ids

(character) The OAI-PMH identifier for the record. Optional.

Details

These functions only work with GBIF registry data, and do so via the OAI-PMH protocol (https://www.openarchives.org/OAI/openarchivesprotocol.html)

Value

raw text, list or data.frame, depending on requested output via as parameter

Examples

## Not run: 
gbif_oai_identify()

today <- format(Sys.Date(), "%Y-%m-%d")
gbif_oai_list_identifiers(from = today)
gbif_oai_list_identifiers(set = "country:NL")

gbif_oai_list_records(from = today)
gbif_oai_list_records(set = "country:NL")

gbif_oai_list_metadataformats()
gbif_oai_list_metadataformats(id = "9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb")

gbif_oai_list_sets()
gbif_oai_list_sets(as = "list")

gbif_oai_get_records("9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb")
ids <- c("9c4e36c1-d3f9-49ce-8ec1-8c434fa9e6eb",
         "e0f1bb8a-2d81-4b2a-9194-d92848d3b82e")
gbif_oai_get_records(ids)

## End(Not run)
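
The from and until arguments take ISO 8601 datestamps, with a trailing Z when a time is included. The following base-R sketch shows one way to build such strings. The dates and the time zone are invented for illustration, and whether the repository accepts second-level granularity is an assumption that should be checked against the output of gbif_oai_identify().

```r
# Day-level granularity, as used in the examples above
from_day <- format(as.Date("2024-03-01"), "%Y-%m-%d")

# Second-level granularity: convert a local time to UTC and append the Z
stamp <- as.POSIXct("2024-03-01 14:30:00", tz = "Europe/Copenhagen")
from_time <- format(stamp, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

from_day
from_time

# These strings would then be passed on, e.g.
# gbif_oai_list_identifiers(from = from_day)
# gbif_oai_list_identifiers(from = from_time)  # assumes this granularity is supported
```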

View photos from GBIF.

Description

View photos from GBIF.

Usage

gbif_photos(input, output = NULL, which = "table", browse = TRUE)

Arguments

input

Output from a call to occ_search()

output

Output folder path. If not given uses temporary folder.

which

One of map or table (default).

browse

(logical) Browse output (default: TRUE)

Details

The max number of photos you can see when which="map" is ~160, so cycle through if you have more than that.

BEWARE

The maps in the table view may not show up correctly if you are using RStudio.

Examples

## Not run: 
res <- occ_search(mediaType = 'StillImage', limit = 100)
gbif_photos(res)
gbif_photos(res, which='map')

res <- occ_search(scientificName = "Aves", mediaType = 'StillImage',
  limit=150)
gbif_photos(res)
gbif_photos(res, output = '~/barfoo')

## End(Not run)

Installations metadata.

Description

Installations metadata.

Usage

installations(
  data = "all",
  uuid = NULL,
  query = NULL,
  identifier = NULL,
  identifierType = NULL,
  limit = 100,
  start = NULL,
  curlopts = list()
)

Arguments

data

The type of data to get. One or more of: 'contact', 'endpoint', 'dataset', 'comment', 'deleted', 'nonPublishing', or the special 'all'. Default: 'all'

uuid

UUID of the data node provider. This must be specified if data is anything other than 'all'.

query

Query nodes. Only used when data='all'. Ignored otherwise.

identifier

The value for this parameter can be a simple string or integer, e.g. identifier=120. This parameter doesn't seem to work right now.

identifierType

Used in combination with the identifier parameter to filter identifiers by identifier type. See details. This parameter doesn't seem to work right now.

limit

Number of records to return. Default: 100. Maximum: 1000.

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

identifierType options:

  • DOI No description.

  • FTP No description.

  • GBIF_NODE Identifies the node (e.g: DK for Denmark, sp2000 for Species 2000).

  • GBIF_PARTICIPANT Participant identifier from the GBIF IMS Filemaker system.

  • GBIF_PORTAL Indicates the identifier originated from an auto_increment column in the portal.data_provider or portal.data_resource table respectively.

  • HANDLER No description.

  • LSID Reference controlled by a separate system, used for example by DOI.

  • SOURCE_ID No description.

  • UNKNOWN No description.

  • URI No description.

  • URL No description.

  • UUID No description.

References

https://www.gbif.org/developer/registry#installations

Examples

## Not run: 
installations(limit=5)
installations(query="france", limit = 25)
installations(uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
installations(data='contact', uuid="2e029a0c-87af-42e6-87d7-f38a50b78201")
installations(data='endpoint', uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
installations(data='dataset', uuid="b77901f9-d9b0-47fa-94e0-dd96450aa2b4")
installations(data='deleted', limit = 25)
installations(data='deleted', limit=2)
installations(data=c('deleted','nonPublishing'), limit=2)
installations(identifierType='DOI', limit=2)

# Pass on curl options
installations(data='deleted', curlopts = list(verbose=TRUE))

## End(Not run)
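
The limit and start arguments can be combined to page through results. The following base-R sketch computes the zero-based offsets for each page. The total of 250 records is a made-up figure for illustration, and the commented-out loop is only one possible way to use the offsets.

```r
# Hypothetical total; in practice this would come from the API response
total <- 250
limit <- 100

# Zero-based offsets for each page: 0, 100, 200
starts <- seq(0, total - 1, by = limit)
starts

# Each page could then be requested in turn, e.g.
# pages <- lapply(starts, function(s) installations(limit = limit, start = s))
```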

Fetch maps of GBIF occurrences

Description

This function is a wrapper for the GBIF mapping api version 2.0. The mapping API is a web map tile service making it straightforward to visualize GBIF content on interactive maps, and overlay content from other sources. It returns tile maps with number of GBIF records per area unit that can be used in a variety of ways, for example in interactive leaflet web maps. Map details are specified by a number of query parameters, some of them optional. Full documentation of the GBIF mapping api can be found at https://www.gbif.org/developer/maps

Usage

map_fetch(
  source = "density",
  x = 0:1,
  y = 0,
  z = 0,
  format = "@1x.png",
  srs = "EPSG:4326",
  bin = NULL,
  hexPerTile = NULL,
  squareSize = NULL,
  style = NULL,
  taxonKey = NULL,
  datasetKey = NULL,
  country = NULL,
  publishingOrg = NULL,
  publishingCountry = NULL,
  year = NULL,
  basisOfRecord = NULL,
  return = "png",
  base_style = NULL,
  plot_terra = TRUE,
  curlopts = list(http_version = 2),
  ...
)

Arguments

source

(character) Either density for fast, precalculated tiles, or adhoc for any search. Default: density

x

(integer sequence) the column. Default: 0:1

y

(integer sequence) the row. Default: 0

z

(integer) the zoom. Default: 0

format

(character) The data format, one of:

  • @Hx.png for a 256px raster tile

  • @1x.png for a 512px raster tile (the default)

  • @2x.png for a 1024px raster tile

  • @3x.png for a 2048px raster tile

  • @4x.png for a 4096px raster tile

srs

(character) Spatial reference system. One of:

  • EPSG:3857 (Web Mercator)

  • EPSG:4326 (WGS84 plate carrée)

  • EPSG:3575 (Arctic LAEA on 10 degrees E)

  • EPSG:3031 (Antarctic stereographic)

bin

(character) square or hex to aggregate occurrence counts into squares or hexagons. Points by default.

hexPerTile

(integer) sets the size of the hexagons (the number horizontally across a tile).

squareSize

(integer) sets the size of the squares. Choose a factor of 4096 so they tessellate correctly: probably from 8, 16, 32, 64, 128, 256, 512.

style

(character) for raster tiles, choose from the available styles. Defaults to "classic.point" for source="density" and "scaled.circles" for source="adhoc".

taxonKey

(integer/numeric/character) search by taxon key, can only supply 1.

datasetKey

(character) search by dataset key, can only supply 1.

country

(character) search by country code (e.g. "GB"), can only supply 1.

publishingOrg

(character) search by publishing organization key, can only supply 1.

publishingCountry

(character) search by publishing country code, can only supply 1.

year

(integer) integer that limits the search to a certain year or, if passing a vector of integers, multiple years, for example 1984 or c(2016, 2017, 2018) or 2010:2015 (years 2010 to 2015). optional

basisOfRecord

(character) one or more basis of record states to include records with that basis of record. The full list is: c("OBSERVATION", "HUMAN_OBSERVATION", "MACHINE_OBSERVATION", "MATERIAL_SAMPLE", "PRESERVED_SPECIMEN", "FOSSIL_SPECIMEN", "LIVING_SPECIMEN", "LITERATURE", "UNKNOWN").

return

(character) Either "png" or "terra".

base_style

(character) The style of the base map.

plot_terra

(logical) Whether the terra map should be plotted by default.

curlopts

options passed on to crul::HttpClient

...

additional arguments passed to the adhoc interface.

Details

The default setting, return='png', will return a magick-image png. This image will be a composite of the occurrence tiles fetched and a base map. This map is primarily useful as a high quality image of occurrence records.

The args x and y can both be integer sequences, for example x=0:3 or y=0:1. Note that the tile index starts at 0. Higher values of z will produce more tiles that can be fetched and stitched together. Selecting too high a value for x or y will produce a blank image.

Setting return='terra' will return a terra::SpatRaster object. This is primarily useful if you are interested in the underlying aggregated occurrence density data.

See the article

Value

a magick-image or terra::SpatRaster object.

Author(s)

John Waller and Laurens Geffert [email protected]

References

https://www.gbif.org/developer/maps

https://api.gbif.org/v2/map/demo.html

https://api.gbif.org/v2/map/demo13.html

See Also

mvt_fetch()

Examples

## Not run: 

# all occurrences
map_fetch()
# get Antarctic map
map_fetch(srs='EPSG:3031') 
# only preserved specimens
map_fetch(basisOfRecord="PRESERVED_SPECIMEN")

# Map of occ in Great Britain
map_fetch(z=3,y=1,x=7:8,country="GB")
# Penguins with Antarctic projection
map_fetch(srs='EPSG:3031',taxonKey=2481660,style='glacier.point', 
base_style="gbif-dark")

# occ from a long time ago
map_fetch(year=1600) 
# polygon style 
map_fetch(style="iNaturalist.poly",bin="hex")
# iNaturalist dataset plotted 
map_fetch(datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7",
  style="iNaturalist.poly")
 
# use source="adhoc" for more filters
map_fetch(z=1,
  source="adhoc",
  iucn_red_list_category="CR",
  style="scaled.circles",
  base_style='gbif-light')

# cropped map of Hawaii
map_fetch(z=5,x=3:4,y=12,source="adhoc",gadmGid="USA.12_1")



## End(Not run)
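
To choose x, y and z for a region of interest, it helps to know which area a tile covers. The following base-R sketch computes the extent of a tile in degrees. It assumes, based on the default x = 0:1, y = 0 at z = 0, that EPSG:4326 tiles form a grid of 2^(z + 1) columns by 2^z rows with tile (0, 0) in the north-west corner. The helper tile_extent_4326_sketch is hypothetical and not part of rgbif, and the layout differs for the other projections.

```r
# Hypothetical helper for illustration only; not part of rgbif.
# Assumes EPSG:4326 tiling: 2^(z + 1) columns, 2^z rows, origin north-west.
tile_extent_4326_sketch <- function(x, y, z) {
  stopifnot(x >= 0, x < 2^(z + 1), y >= 0, y < 2^z)
  size <- 180 / 2^z  # tile width and height in degrees
  c(xmin = -180 + x * size,
    xmax = -180 + (x + 1) * size,
    ymin = 90 - (y + 1) * size,
    ymax = 90 - y * size)
}

# Tiles used in the Great Britain example (z = 3, y = 1, x = 7:8)
tile_extent_4326_sketch(x = 7, y = 1, z = 3)
tile_extent_4326_sketch(x = 8, y = 1, z = 3)
```

Under these assumptions the two tiles together span longitudes -22.5 to 22.5 and latitudes 45 to 67.5, which contains Great Britain.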

Fetch Map Vector Tiles (MVT)

Description

This function is a wrapper for the GBIF mapping api version 2.0. The mapping API is a web map tile service making it straightforward to visualize GBIF content on interactive maps, and overlay content from other sources. It returns map vector tiles with number of GBIF records per area unit that can be used in a variety of ways, for example in interactive leaflet web maps. Map details are specified by a number of query parameters, some of them optional. Full documentation of the GBIF mapping api can be found at https://www.gbif.org/developer/maps

Usage

mvt_fetch(
  source = "density",
  x = 0,
  y = 0,
  z = 0,
  srs = "EPSG:4326",
  bin = NULL,
  hexPerTile = NULL,
  squareSize = NULL,
  style = "classic.point",
  taxonKey = NULL,
  datasetKey = NULL,
  country = NULL,
  publishingOrg = NULL,
  publishingCountry = NULL,
  year = NULL,
  basisOfRecord = NULL,
  ...
)

Arguments

source

(character) Either density for fast, precalculated tiles, or adhoc for any search. Default: density

x

(integer) the column. Default: 0

y

(integer) the row. Default: 0

z

(integer) the zoom. Default: 0

srs

(character) Spatial reference system for the output (input srs for mvt from GBIF is always EPSG:3857). One of:

  • EPSG:3857 (Web Mercator)

  • EPSG:4326 (WGS84 plate carrée)

  • EPSG:3575 (Arctic LAEA on 10 degrees E)

  • EPSG:3031 (Antarctic stereographic)

bin

(character) square or hex to aggregate occurrence counts into squares or hexagons. Points by default. optional

hexPerTile

(integer) sets the size of the hexagons (the number horizontally across a tile). optional

squareSize

(integer) sets the size of the squares. Choose a factor of 4096 so they tessellate correctly: probably from 8, 16, 32, 64, 128, 256, 512. optional

style

(character) for raster tiles, choose from the available styles. Defaults to classic.point. optional. THESE DON'T WORK YET.

taxonKey

(integer/numeric/character) search by taxon key, can only supply 1. optional

datasetKey

(character) search by dataset key, can only supply 1. optional

country

(character) search by country code (e.g. "GB"), can only supply 1. optional

publishingOrg

(character) search by publishing organization key, can only supply 1. optional

publishingCountry

(character) search by publishing country code, can only supply 1. optional

year

(integer) integer that limits the search to a certain year or, if passing a vector of integers, multiple years, for example 1984 or c(2016, 2017, 2018) or 2010:2015 (years 2010 to 2015). optional

basisOfRecord

(character) one or more basis of record states to include records with that basis of record. The full list is: c("OBSERVATION", "HUMAN_OBSERVATION", "MACHINE_OBSERVATION", "MATERIAL_SAMPLE", "PRESERVED_SPECIMEN", "FOSSIL_SPECIMEN", "LIVING_SPECIMEN", "LITERATURE", "UNKNOWN"). optional

...

curl options passed on to crul::HttpClient

Details

This function uses the arguments passed on to generate a query to the GBIF web map API. The API returns a web tile object as png that is read and converted into an R raster object. The break values or nbreaks generate a custom colour palette for the web tile, with each bin corresponding to one grey value. After retrieval, the raster is reclassified to the actual break values. This is a somewhat hacky but nonetheless functional solution in the absence of a GBIF raster API implementation.

We add extent and set the projection for the output. You can reproject after retrieving the output.

Value

an sf object

References

https://www.gbif.org/developer/maps

See Also

map_fetch()

Examples

## Not run: 
if (
 requireNamespace("sf", quietly = TRUE) &&
 requireNamespace("protolite", quietly = TRUE)
) {
  x <- mvt_fetch(taxonKey = 2480498, year = 2007:2011)
  x
  
  # gives an sf object
  class(x)
  
  # different srs
  ## 3857
  y <- mvt_fetch(taxonKey = 2480498, year = 2010, srs = "EPSG:3857")
  y
  ## 3031
  z <- mvt_fetch(taxonKey = 2480498, year = 2010, srs = "EPSG:3031", verbose = TRUE)
  z
  ## 3575
  z <- mvt_fetch(taxonKey = 2480498, year = 2010, srs = "EPSG:3575")
  z

  # bin
  x <- mvt_fetch(taxonKey = 212, year = 1998, bin = "hex",
     hexPerTile = 30, style = "classic-noborder.poly")
  x

  # query with basisOfRecord
  mvt_fetch(taxonKey = 2480498, year = 2010,
    basisOfRecord = "HUMAN_OBSERVATION")
  mvt_fetch(taxonKey = 2480498, year = 2010,
    basisOfRecord = c("HUMAN_OBSERVATION", "LIVING_SPECIMEN"))
 }

## End(Not run)
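
The squareSize argument should be a factor of 4096 so that the squares tile evenly. The following base-R sketch lists the suggested values and checks arbitrary candidates. The helper divides_tile_sketch is hypothetical and not part of rgbif.

```r
# Powers of two from 8 to 512, matching the documented suggestions
candidates <- 2^(3:9)
candidates

# Hypothetical check: TRUE when a value divides 4096 evenly
divides_tile_sketch <- function(squareSize) 4096 %% squareSize == 0
divides_tile_sketch(c(64, 100, 512))
```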

Lookup names in the GBIF backbone taxonomy.

Description

Lookup names in the GBIF backbone taxonomy.

Usage

name_backbone(
  name,
  rank = NULL,
  kingdom = NULL,
  phylum = NULL,
  class = NULL,
  order = NULL,
  family = NULL,
  genus = NULL,
  strict = FALSE,
  verbose = FALSE,
  start = NULL,
  limit = 100,
  curlopts = list()
)

name_backbone_verbose(
  name,
  rank = NULL,
  kingdom = NULL,
  phylum = NULL,
  class = NULL,
  order = NULL,
  family = NULL,
  genus = NULL,
  strict = FALSE,
  start = NULL,
  limit = 100,
  curlopts = list()
)

Arguments

name

(character) Full scientific name potentially with authorship (required)

rank

(character) The rank given as our rank enum. (optional)

kingdom

(character) If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)

phylum

(character) If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)

class

(character) If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)

order

(character) If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)

family

(character) If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)

genus

(character) If provided default matching will also try to match against this if no direct match is found for the name alone. (optional)

strict

(logical) If TRUE it (fuzzy) matches only the given name, but never a taxon in the upper classification (optional)

verbose

(logical) should the function give back more (less reliable) results. See function name_backbone_verbose()

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

limit

Number of records to return. Default: 100. Maximum: 1000.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

If you don't get a match, GBIF gives back a data.frame with columns synonym, confidence, and matchType='NONE'.

Value

For name_backbone, a data.frame for a single taxon with many columns. For name_backbone_verbose, a data.frame with a larger number of results, including alternatives resulting from fuzzy matching. You will also get back your input name, rank, kingdom, phylum, etc. as columns input_name, input_rank, input_kingdom, etc., so you can check the results.

References

https://www.gbif.org/developer/species#searching

Examples

## Not run: 
name_backbone(name='Helianthus annuus', kingdom='plants')
name_backbone(name='Helianthus', rank='genus', kingdom='plants')
name_backbone(name='Poa', rank='genus', family='Poaceae')

# Verbose - gives back alternatives
## Strictness
name_backbone_verbose(name='Poa', kingdom='plants',
  strict=FALSE)
name_backbone_verbose(name='Helianthus annuus', kingdom='plants',
  strict=TRUE)

# Non-existent name - returns list of length 3 stating no match
name_backbone(name='Aso')
name_backbone(name='Oenante')

# Pass on curl options
name_backbone(name='Oenante', curlopts = list(verbose=TRUE))

## End(Not run)

Lookup names in the GBIF backbone taxonomy in a checklist.

Description

Lookup names in the GBIF backbone taxonomy in a checklist.

Usage

name_backbone_checklist(
  name_data = NULL,
  rank = NULL,
  kingdom = NULL,
  phylum = NULL,
  class = NULL,
  order = NULL,
  family = NULL,
  genus = NULL,
  strict = FALSE,
  verbose = FALSE,
  curlopts = list()
)

Arguments

name_data

(data.frame or vector) see details.

rank

(character) default value (optional).

kingdom

(character) default value (optional).

phylum

(character) default value (optional).

class

(character) default value (optional).

order

(character) default value (optional).

family

(character) default value (optional).

genus

(character) default value (optional).

strict

(logical) strict=TRUE will not attempt to fuzzy match or return higher-rank matches.

verbose

(logical) If true it shows alternative matches which were considered but then rejected.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

This function is an alternative for name_backbone(), which will work with a list of names (a vector or a data.frame). The data.frame should have the following column names, but only the 'name' column is required. If only one column is present, then that column is assumed to be the 'name' column.

  • name : (required)

  • rank : (optional)

  • kingdom : (optional)

  • phylum : (optional)

  • class : (optional)

  • order : (optional)

  • family : (optional)

  • genus : (optional)

The input columns will be returned as "verbatim_name", "verbatim_rank", "verbatim_phylum", etc. A column of "verbatim_index" will also be returned, giving the index of the input.

The following aliases for the 'name' column will work (any case or with '_' will work) :

  • "scientificName", "ScientificName", "scientific_name" ...

  • "sci_name", "sciname", "SCI_NAME" ...

  • "names", "NAMES" ...

  • "species", "SPECIES" ...

  • "species_name", "speciesname" ...

  • "sp_name", "SP_NAME", "spname" ...

  • "taxon_name", "taxonname", "TAXON NAME" ...

If more than one alias is present and no column is named 'name', then the left-most column with an acceptable aliased name above is used.

If verbose=TRUE, a column called is_alternative will be returned, which specifies whether a name was originally a first choice or not. is_alternative=TRUE means the name was not considered to be the best match by GBIF.

Default values for rank, kingdom, phylum, class, order, family, and genus can be supplied. If a default value is supplied, the values for these fields are ignored in name_data, and the default value is used instead. This is most useful if you have a list of names and you know they are all plants, insects, birds, etc. You can also input multiple values, if they are the same length as the list of names you are trying to match.

This function can also be used with a character vector of names. In that case no column names are needed of course.

This function is very similar to the GBIF species-lookup tool. https://www.gbif.org/tools/species-lookup.

If you have 1000s of names to match, it can take some minutes to get back all of the matches. I have tested it with 60K names. Scientific names with author details usually get better matches.

See also article Working With Taxonomic Names.

Value

A data.frame of matched names.

Examples

## Not run: 

library(rgbif)

name_data <- data.frame(
 scientificName = c(
   "Cirsium arvense (L.) Scop.", # a plant
   "Calopteryx splendens (Harris, 1780)", # an insect
   "Puma concolor (Linnaeus, 1771)", # a big cat
   "Ceylonosticta alwisi (Priyadarshana & Wijewardhane, 2016)", # newly discovered insect 
   "Puma concuolor (Linnaeus, 1771)", # a mis-spelled big cat
   "Fake species (John Waller 2021)", # a fake species
   "Calopteryx" # Just a Genus   
 ), description = c(
   "a plant",
   "an insect",
   "a big cat",
   "newly discovered insect",
   "a mis-spelled big cat",
   "a fake species",
   "just a GENUS"
 ), 
 kingdom = c(
   "Plantae",
   "Animalia",
   "Animalia",
   "Animalia",
   "Animalia",
   "Johnlia",
   "Animalia"
 ))

name_backbone_checklist(name_data)

# return more than 1 result per name
name_backbone_checklist(name_data,verbose=TRUE) 

# works with just vectors too 
name_list <- c(
"Cirsium arvense (L.) Scop.", 
"Calopteryx splendens (Harris, 1780)", 
"Puma concolor (Linnaeus, 1771)", 
"Ceylonosticta alwisi (Priyadarshana & Wijewardhane, 2016)", 
"Puma concuolor", 
"Fake species (John Waller 2021)", 
"Calopteryx")

name_backbone_checklist(name_list)
name_backbone_checklist(name_list,verbose=TRUE)
name_backbone_checklist(name_list,strict=TRUE) 

# default values
name_backbone_checklist(c("Aloe arborecens Mill.",
"Cirsium arvense (L.) Scop."),kingdom="Plantae")
name_backbone_checklist(c("Aloe arborecens Mill.",
"Calopteryx splendens (Harris, 1780)"),kingdom=c("Plantae","Animalia"))


## End(Not run)
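
The rules for picking the 'name' column described in Details can be sketched in base R. This is a minimal illustration under stated assumptions, not rgbif's actual matching code: find_name_column_sketch is a hypothetical helper, and normalising by lower-casing and dropping underscores and spaces is one plausible reading of "any case or with '_' will work".

```r
# Hypothetical helper for illustration only; rgbif's own matching may differ.
find_name_column_sketch <- function(df) {
  aliases <- c("scientificname", "sciname", "names", "species",
               "speciesname", "spname", "taxonname")
  cols <- names(df)
  # A single column is assumed to be the name column
  if (length(cols) == 1) return(cols)
  # An exact 'name' column takes priority over any alias
  if ("name" %in% cols) return("name")
  # Normalise: lower case, drop underscores and spaces
  norm <- gsub("[_ ]", "", tolower(cols))
  hits <- which(norm %in% aliases)
  if (length(hits) == 0) stop("no name column found")
  cols[hits[1]]  # left-most matching column
}

df <- data.frame(kingdom = "Plantae",
                 Scientific_Name = "Cirsium arvense (L.) Scop.",
                 species = "Cirsium arvense")
find_name_column_sketch(df)
```

Here both Scientific_Name and species are acceptable aliases, so the left-most of the two is chosen.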

Parse and examine further GBIF name issues on a dataset.

Description

Parse and examine further GBIF name issues on a dataset.

Usage

name_issues(.data, ..., mutate = NULL)

Arguments

.data

Output from a call to name_usage()

...

Named parameters to only get back (e.g. bbmn), or to remove (e.g. -bbmn).

mutate

(character) One of:

  • split Split issues into new columns.

  • expand Expand issue abbreviated codes into descriptive names. For download datasets, this is not very useful since the issues come to you already expanded.

  • split_expand Split into new columns, and expand issue names.

For split and split_expand, values in cells become y ("yes") or n ("no")

References

https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/NameUsageIssue.html

Examples

## Not run: 
# what do issues mean, can print whole table
head(gbif_issues())
# or just name related issues
gbif_issues()[which(gbif_issues()$type %in% c("name")),]
# or search for matches
gbif_issues()[gbif_issues()$code %in% c('bbmn','clasna','scina'),]
# compare our data before and after using name_issues
(aa <- name_usage(name = "Lupus"))
aa %>% name_issues("clasna")

## or parse issues in various ways
### remove data rows with certain issue classes
aa %>% name_issues(-clasna, -scina)

### expand issues to more descriptive names
aa %>% name_issues(mutate = "expand")

### split and expand
aa %>% name_issues(mutate = "split_expand")

### split, expand, and remove an issue class
aa %>% name_issues(-bbmn, mutate = "split_expand")

## Or you can use name_issues without %>%
name_issues(aa, -bbmn, mutate = "split_expand")

## End(Not run)
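
The effect of mutate = "split", where each issue becomes its own column holding y or n, can be sketched in base R on a toy table. This is an illustration only, not the package's implementation. The data.frame below is made up, and it assumes issues are stored as comma-separated short codes in a single column.

```r
# Toy input; the keys and issue strings are invented for illustration
toy <- data.frame(key = 1:3,
                  issues = c("bbmn,clasna", "", "clasna"),
                  stringsAsFactors = FALSE)

# Split each row's issue string into a vector of codes
per_row <- strsplit(toy$issues, ",")
codes <- sort(unique(unlist(per_row)))

# One column per issue code: "y" where the row carries that code, else "n"
for (code in codes) {
  has <- vapply(per_row, function(x) code %in% x, logical(1))
  toy[[code]] <- ifelse(has, "y", "n")
}
toy
```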

Lookup names in all taxonomies in GBIF.

Description

This service uses fuzzy lookup so that you can put in partial names and you should get back those things that match. See examples below.

Faceting: If facet=FALSE or left to the default (NULL), no faceting is done, and all parameters with facet in their name are ignored (facetOnly, facetMincount, facetMultiselect).

Usage

name_lookup(
  query = NULL,
  rank = NULL,
  higherTaxonKey = NULL,
  status = NULL,
  isExtinct = NULL,
  habitat = NULL,
  nameType = NULL,
  datasetKey = NULL,
  origin = NULL,
  nomenclaturalStatus = NULL,
  limit = 100,
  start = 0,
  facet = NULL,
  facetMincount = NULL,
  facetMultiselect = NULL,
  type = NULL,
  hl = NULL,
  issue = NULL,
  constituentKey = NULL,
  verbose = FALSE,
  return = NULL,
  curlopts = list()
)

Arguments

query

Query term(s) for full text search.

rank

CLASS, CULTIVAR, CULTIVAR_GROUP, DOMAIN, FAMILY, FORM, GENUS, INFORMAL, INFRAGENERIC_NAME, INFRAORDER, INFRASPECIFIC_NAME, INFRASUBSPECIFIC_NAME, KINGDOM, ORDER, PHYLUM, SECTION, SERIES, SPECIES, STRAIN, SUBCLASS, SUBFAMILY, SUBFORM, SUBGENUS, SUBKINGDOM, SUBORDER, SUBPHYLUM, SUBSECTION, SUBSERIES, SUBSPECIES, SUBTRIBE, SUBVARIETY, SUPERCLASS, SUPERFAMILY, SUPERORDER, SUPERPHYLUM, SUPRAGENERIC_NAME, TRIBE, UNRANKED, VARIETY

higherTaxonKey

Filters by any of the higher Linnean rank keys. Note this is within the respective checklist and not searching nub keys across all checklists. This parameter accepts many inputs in a vector (passed in the same request).

status

Filters by the taxonomic status as one of:

  • ACCEPTED

  • DETERMINATION_SYNONYM Used for unknown child taxa referred to via spec, ssp, ...

  • DOUBTFUL Treated as accepted, but doubtful whether this is correct.

  • HETEROTYPIC_SYNONYM More specific subclass of SYNONYM.

  • HOMOTYPIC_SYNONYM More specific subclass of SYNONYM.

  • INTERMEDIATE_RANK_SYNONYM Used in nub only.

  • MISAPPLIED More specific subclass of SYNONYM.

  • PROPARTE_SYNONYM More specific subclass of SYNONYM.

  • SYNONYM A general synonym, the exact type is unknown.

isExtinct

(logical) Filters by extinction status (e.g. isExtinct=TRUE)

habitat

(character) Filters by habitat. One of: marine, freshwater, or terrestrial

nameType

Filters by the name type as one of:

  • BLACKLISTED surely not a scientific name.

  • CANDIDATUS Candidatus is a component of the taxonomic name for a bacterium that cannot be maintained in a Bacteriology Culture Collection.

  • CULTIVAR a cultivated plant name.

  • DOUBTFUL doubtful whether this is a scientific name at all.

  • HYBRID a hybrid formula (not a hybrid name).

  • INFORMAL a scientific name with some informal addition like "cf." or an indeterminate name like Abies spec.

  • SCINAME a scientific name which is not well formed.

  • VIRUS a virus name.

  • WELLFORMED a well formed scientific name according to present nomenclatural rules.

datasetKey

Filters by the dataset's key (a uuid)

origin

(character) Filters by origin. One of:

  • SOURCE

  • DENORMED_CLASSIFICATION

  • VERBATIM_ACCEPTED

  • EX_AUTHOR_SYNONYM

  • AUTONYM

  • BASIONYM_PLACEHOLDER

  • MISSING_ACCEPTED

  • IMPLICIT_NAME

  • PROPARTE

  • VERBATIM_BASIONYM

nomenclaturalStatus

Not yet implemented, but will eventually allow for filtering by a nomenclatural status enum.

limit

Number of records to return. Hard maximum limit set by GBIF API: 99999.

start

Record number to start at. Default: 0.

facet

A vector/list of facet names used to retrieve the 100 most frequent values for a field. Allowed facets are: datasetKey, higherTaxonKey, rank, status, isExtinct, habitat, and nameType. Additionally threat and nomenclaturalStatus are legal values but not yet implemented, so data will not yet be returned for them.

facetMincount

Used in combination with the facet parameter. Set facetMincount to exclude facets with a count less than x, e.g. http://bit.ly/2osAUQB only shows the type values 'CHECKLIST' and 'OCCURRENCE' because the other types have counts less than 10000

facetMultiselect

(logical) Used in combination with the facet parameter. Set facetMultiselect=TRUE to still return counts for values that are not currently filtered, e.g. http://bit.ly/2JAymaC still shows all type values even though type is being filtered by type=CHECKLIST.

type

Type of name. One of occurrence, checklist, or metadata.

hl

(logical) Set hl=TRUE to highlight terms matching the query when in fulltext search fields. The highlight will be an emphasis tag of class gbifH1 e.g. query='plant', hl=TRUE. Fulltext search fields include: title, keyword, country, publishing country, publishing organization title, hosting organization title, and description. One additional full text field is searched which includes information from metadata documents, but the text of this field is not returned in the response.

issue

Filters by issue. Issue has to be related to names. Type gbif_issues() to get complete list of issues.

constituentKey

Filters by the dataset's constituent key (a uuid).

verbose

(logical) If TRUE, all data is returned as a list for each element. If FALSE (default) a subset of the data that is thought to be most essential is organized into a data.frame.

return

Defunct. All components are returned; index to the one(s) you want

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

An object of class gbif, which is an S3 class list, with slots for metadata (meta), the data itself (data), the taxonomic hierarchy data (hierarchies), and vernacular names (names). In addition, the object has attributes listing the user-supplied arguments and the type of search, which, unlike for occurrence data, always equals 'single' even if multiple values are given for some parameters. meta is a list of length four with offset, limit, endOfRecords and count fields. data is a tibble (aka data.frame) containing all information about the found taxa. hierarchies is a list of data.frames, one per GBIF key (taxon), containing its taxonomic classification. Each data.frame contains two columns: rankkey and name. names is a list of data.frames, one per GBIF key (taxon), containing all vernacular names. Each data.frame contains two columns: vernacularName and language.

A list of length five:

  • metadata

  • data: either a data.frame (verbose=FALSE, default) or a list (verbose=TRUE).

  • facets

  • hierarchies

  • names
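
The meta fields listed above (offset, limit, endOfRecords) can be used to decide whether another page of results is available. The following is a minimal sketch under that assumption only; has_more_records() and next_start() are hypothetical helpers, not part of rgbif, and the mock object stands in for a real name_lookup() result.

```r
# Hypothetical helpers (not part of rgbif) that read the documented
# `meta` fields of a name_lookup() result.
has_more_records <- function(res) {
  # endOfRecords is TRUE once the last page has been reached
  !isTRUE(res$meta$endOfRecords)
}

next_start <- function(res) {
  # the next page begins right after the records already returned
  res$meta$offset + res$meta$limit
}

# Mock object with the documented `meta` layout, used instead of a live call
mock <- list(meta = list(offset = 0, limit = 20, endOfRecords = FALSE, count = 57))
has_more_records(mock)
next_start(mock)

# Not run (needs network access and the rgbif package):
# res <- rgbif::name_lookup(query = "mammalia", limit = 20)
# if (has_more_records(res)) {
#   res2 <- rgbif::name_lookup(query = "mammalia", limit = 20,
#     start = next_start(res))
# }
```

For large requests this is usually unnecessary, since paging is handled internally when limit is raised.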

Repeat parameter inputs

Some parameters can take many inputs, which are treated as 'OR' (e.g., a or b or c). The following take many inputs:

  • rank

  • higherTaxonKey

  • status

  • habitat

  • nameType

  • datasetKey

  • origin

References

https://www.gbif.org/developer/species#searching

Examples

## Not run: 
# Look up names like mammalia
name_lookup(query='mammalia', limit = 20)

# Start with an offset
name_lookup(query='mammalia', limit=1)
name_lookup(query='mammalia', limit=1, start=2)

# large requests (paging is internally implemented).
# hard maximum limit set by GBIF API: 99999
# name_lookup(query = "Carnivora", limit = 10000)

# Get all data and parse it, removing descriptions which can be quite long
out <- name_lookup('Helianthus annuus', rank="species", verbose=TRUE)
lapply(out$data, function(x) {
  x[!names(x) %in% c("descriptions","descriptionsSerialized")]
})

# Search for a genus
name_lookup(query="Cnaemidophorus", rank="genus")
# Limit records to certain number
name_lookup('Helianthus annuus', rank="species", limit=2)

# Query by habitat
name_lookup(habitat = "terrestrial", limit=2)
name_lookup(habitat = "marine", limit=2)
name_lookup(habitat = "freshwater", limit=2)

# Using faceting
name_lookup(facet='status', limit=0, facetMincount='70000')
name_lookup(facet=c('status','higherTaxonKey'), limit=0,
  facetMincount='700000')

name_lookup(facet='nameType', limit=0)
name_lookup(facet='habitat', limit=0)
name_lookup(facet='datasetKey', limit=0)
name_lookup(facet='rank', limit=0)
name_lookup(facet='isExtinct', limit=0)

name_lookup(isExtinct=TRUE, limit=0)

# text highlighting
## turn on highlighting
res <- name_lookup(query='canada', hl=TRUE, limit=5)
res$data
name_lookup(query='canada', hl=TRUE, limit=45)
## and you can pass the output to gbif_names() function
res <- name_lookup(query='canada', hl=TRUE, limit=5)
gbif_names(res)

# Lookup by datasetKey (set a sufficiently high limit, API maximum: 99999)
# name_lookup(datasetKey='3f8a1297-3259-4700-91fc-acc4170b27ce',
#   limit = 50000)

# Some parameters accept many inputs, treated as OR
name_lookup(rank = c("family", "genus"))
name_lookup(higherTaxonKey = c("119", "120", "121", "204"))
name_lookup(status = c("misapplied", "synonym"))$data
name_lookup(habitat = c("marine", "terrestrial"))
name_lookup(nameType = c("cultivar", "doubtful"))
name_lookup(datasetKey = c("73605f3a-af85-4ade-bbc5-522bfb90d847",
  "d7c60346-44b6-400d-ba27-8d3fbeffc8a5"))
name_lookup(datasetKey = "289244ee-e1c1-49aa-b2d7-d379391ce265",
  origin = c("SOURCE", "DENORMED_CLASSIFICATION"))

# Pass on curl options
name_lookup(query='Cnaemidophorus', rank="genus",
  curlopts = list(verbose = TRUE))

## End(Not run)

Parse taxon names using the GBIF name parser.

Description

Parse taxon names using the GBIF name parser.

Usage

name_parse(scientificname, curlopts = list())

Arguments

scientificname

A character vector of scientific names.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

A data.frame containing fields extracted from parsed taxon names. Fields returned are the union of fields extracted from all species names in scientificname.
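
To illustrate what "union of fields" means here, the sketch below combines records that carry different fields into one data.frame, filling gaps with NA. It is an illustration only and not the implementation used by name_parse(); bind_union() is a hypothetical helper, and the records and their field names are made up.

```r
# Hypothetical illustration (not rgbif's implementation) of combining
# records with differing fields into one data.frame holding their union.
bind_union <- function(records) {
  all_fields <- unique(unlist(lapply(records, names)))
  rows <- lapply(records, function(rec) {
    rec[setdiff(all_fields, names(rec))] <- NA  # fill fields this record lacks
    as.data.frame(rec[all_fields], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up records with made-up field names, standing in for parser output
parsed <- list(
  list(name = "Ajuga pyramidata", genus = "Ajuga"),
  list(name = "Vanessa atalanta", genus = "Vanessa", author = "Linnaeus")
)
out <- bind_union(parsed)
out
```

The first record has no author field, so its author cell is NA in the combined result.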

Author(s)

John Baumgartner ([email protected])

References

https://www.gbif.org/developer/species#parser

Examples

## Not run: 
name_parse(scientificname='x Agropogon littoralis')
name_parse(c('Arrhenatherum elatius var. elatius',
             'Secale cereale subsp. cereale', 'Secale cereale ssp. cereale',
             'Vanessa atalanta (Linnaeus, 1758)'))
name_parse("Ajuga pyramidata")
name_parse("Ajuga pyramidata x reptans")

# Pass on curl options
# res <- name_parse(c('Arrhenatherum elatius var. elatius',
#          'Secale cereale subsp. cereale', 'Secale cereale ssp. cereale',
#          'Vanessa atalanta (Linnaeus, 1758)'), curlopts=list(verbose=TRUE))

## End(Not run)

Suggest up to 20 name usages.

Description

A quick and simple autocomplete service that returns up to 20 name usages by doing prefix matching against the scientific name. Results are ordered by relevance.

Usage

name_suggest(
  q = NULL,
  datasetKey = NULL,
  rank = NULL,
  fields = NULL,
  start = NULL,
  limit = 100,
  curlopts = list()
)

Arguments

q

(character, required) Simple search parameter. The value for this parameter can be a simple word or a phrase. Wildcards can be added to the simple word parameters only, e.g. q=*puma*

datasetKey

(character) Filters by the checklist dataset key (a uuid, see examples)

rank

(character) A taxonomic rank. One of class, cultivar, cultivar_group, domain, family, form, genus, informal, infrageneric_name, infraorder, infraspecific_name, infrasubspecific_name, kingdom, order, phylum, section, series, species, strain, subclass, subfamily, subform, subgenus, subkingdom, suborder, subphylum, subsection, subseries, subspecies, subtribe, subvariety, superclass, superfamily, superorder, superphylum, suprageneric_name, tribe, unranked, or variety.

fields

(character) Fields to return in output data.frame (simply prunes columns off)

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

limit

Number of records to return. Default: 100. Maximum: 1000.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

A list, with two elements data (tibble) and hierarchy (list of data.frame's). If 'higherClassificationMap' is one of the fields requested, then hierarchy is a list of data.frame's; if not included, hierarchy is an empty list.

Repeat parameter inputs

Some parameters can take many inputs, which are treated as 'OR' (e.g., a or b or c). The following take many inputs:

  • rank

  • datasetKey

References

https://www.gbif.org/developer/species#searching

Examples

## Not run: 
name_suggest(q='Puma concolor')
name_suggest(q='Puma')
name_suggest(q='Puma', rank="genus")
name_suggest(q='Puma', rank="subspecies")
name_suggest(q='Puma', rank="species")
name_suggest(q='Puma', rank="infraspecific_name")

name_suggest(q='Puma', limit=2)
name_suggest(q='Puma', fields=c('key','canonicalName'))
name_suggest(q='Puma', fields=c('key','canonicalName',
  'higherClassificationMap'))

# Some parameters accept many inputs, treated as OR
name_suggest(rank = c("family", "genus"))
name_suggest(datasetKey = c("73605f3a-af85-4ade-bbc5-522bfb90d847",
  "d7c60346-44b6-400d-ba27-8d3fbeffc8a5"))

# If 'higherClassificationMap' in fields, a list is returned
name_suggest(q='Puma', fields=c('key','higherClassificationMap'))

# Pass on curl options
name_suggest(q='Puma', limit=200, curlopts = list(verbose=TRUE))

## End(Not run)

Lookup details for specific names in all taxonomies in GBIF.

Description

Lookup details for specific names in all taxonomies in GBIF.

Usage

name_usage(
  key = NULL,
  name = NULL,
  data = "all",
  language = NULL,
  datasetKey = NULL,
  uuid = NULL,
  rank = NULL,
  shortname = NULL,
  start = 0,
  limit = 100,
  return = NULL,
  curlopts = list()
)

Arguments

key

(numeric or character) A GBIF key for a taxon

name

(character) Filters by a case insensitive, canonical namestring, e.g. 'Puma concolor'

data

(character) Specify an option to select what data is returned. See Details below.

language

(character) Language. Default is English.

datasetKey

(character) Filters by the dataset's key (a uuid). Must be length=1

uuid

(character) A dataset key

rank

(character) Taxonomic rank. Filters by taxonomic rank as one of: CLASS, CULTIVAR, CULTIVAR_GROUP, DOMAIN, FAMILY, FORM, GENUS, INFORMAL, INFRAGENERIC_NAME, INFRAORDER, INFRASPECIFIC_NAME, INFRASUBSPECIFIC_NAME, KINGDOM, ORDER, PHYLUM, SECTION, SERIES, SPECIES, STRAIN, SUBCLASS, SUBFAMILY, SUBFORM, SUBGENUS, SUBKINGDOM, SUBORDER, SUBPHYLUM, SUBSECTION, SUBSERIES, SUBSPECIES, SUBTRIBE, SUBVARIETY, SUPERCLASS, SUPERFAMILY, SUPERORDER, SUPERPHYLUM, SUPRAGENERIC_NAME, TRIBE, UNRANKED, VARIETY

shortname

(character) A short name for a dataset - it may not do anything

start

Record number to start at. Default: 0.

limit

Number of records to return. Default: 100.

return

Defunct. All components are returned; index to the one(s) you want

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

This service uses fuzzy lookup so that you can put in partial names and you should get back those things that match. See examples below.

This function differs from name_lookup(), which searches for names. name_usage() wraps several API endpoints, most of which require that you already have a taxon key, although one endpoint allows name searches (see examples below).

Note that data="verbatim" hasn't been working.

Options for the data parameter are: 'all', 'verbatim', 'name', 'parents', 'children', 'related', 'synonyms', 'descriptions', 'distributions', 'media', 'references', 'speciesProfiles', 'vernacularNames', 'typeSpecimens', 'root', 'iucnRedListCategory'

This function used to be vectorized with respect to the data parameter, where you could pass in multiple values and the function internally loops over each option making separate requests. This has been removed. You can still loop over many options for the data parameter, just use an lapply family function, or a for loop, etc.
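
Looping over several data options, as suggested above, can be sketched with lapply. usage_by_part() is a hypothetical helper, not part of rgbif; its fetch argument defaults to name_usage() and is replaced by an offline stub here so that the sketch runs without network access.

```r
# Hypothetical helper (not part of rgbif): one request per `data` option.
# `fetch` defaults to rgbif::name_usage but can be swapped for a stub.
usage_by_part <- function(key, parts, fetch = rgbif::name_usage) {
  out <- lapply(parts, function(part) fetch(key = key, data = part))
  names(out) <- parts
  out
}

# Offline stub standing in for name_usage(); it only echoes its inputs
stub <- function(key, data) {
  list(meta = list(), data = data.frame(key = key, part = data))
}
res <- usage_by_part(3119195, c("vernacularNames", "references"), fetch = stub)
names(res)

# Not run (needs network access and the rgbif package):
# res <- usage_by_part(3119195, c("vernacularNames", "references"))
# res$vernacularNames$data
```

Naming the list by data option keeps each result addressable, e.g. res$references.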

See name_issues() for more information about issues in issues column.

Value

An object of class gbif, which is an S3 class list, with slots for metadata (meta) and the data itself (data). In addition, the object has attributes listing the user-supplied arguments and the type of search, which, unlike for occurrence data, always equals 'single' even if multiple values are given for some parameters. meta is a list of length four with offset, limit, endOfRecords and count fields. data is a tibble (aka data.frame) containing all information about the found taxa.

Repeat parameter inputs

These parameters used to accept many inputs, but no longer do:

  • rank

  • name

  • language

  • datasetKey

References

https://www.gbif.org/developer/species#nameUsages

Examples

## Not run: 
# A single name usage
name_usage(key=1)

# Name usage for a taxonomic name
name_usage(name='Puma', rank="GENUS")

# Name usage for all taxa in a dataset
# (set a sufficiently high limit, but less than 100000)
# name_usage(datasetKey = "9ff7d317-609b-4c08-bd86-3bc404b77c42", 
#  limit = 10000)
# All name usages
name_usage()

# References for a name usage
name_usage(key=2435099, data='references')

# Species profiles, descriptions
name_usage(key=3119195, data='speciesProfiles')
name_usage(key=3119195, data='descriptions')
name_usage(key=2435099, data='children')

# Vernacular names for a name usage
name_usage(key=3119195, data='vernacularNames')

# Limit number of results returned
name_usage(key=3119195, data='vernacularNames', limit=3)

# Search for names by dataset with datasetKey parameter
name_usage(datasetKey="d7dddbf4-2cf0-4f39-9b2a-bb099caae36c")

# Search for a particular language
name_usage(key=3119195, language="FRENCH", data='vernacularNames')

# get root usage with a uuid
name_usage(data = "root", uuid = "73605f3a-af85-4ade-bbc5-522bfb90d847")

# search by language
name_usage(language = "spanish")

# Pass on curl options
name_usage(name='Puma concolor', limit=300, curlopts = list(verbose=TRUE))

# look up iucn red list category 
name_usage(key = 7707728, data = 'iucnRedListCategory') 

## End(Not run)

Get data about GBIF networks

Description

Get data about GBIF networks

Usage

network(
  data = "all",
  uuid = NULL,
  query = NULL,
  identifier = NULL,
  identifierType = NULL,
  limit = 100,
  start = NULL,
  curlopts = list()
)

network_constituents(uuid = NULL, limit = 100, start = 0)

Arguments

data

The type of data to get. One or more of: 'contact', 'endpoint', 'identifier', 'tag', 'machineTag', 'comment', 'constituents', or the special 'all'. Default: 'all'

uuid

UUID of the data network provider. This must be specified if data is anything other than 'all'. Only 1 can be passed in

query

Query nodes. Only used when data='all'. Ignored otherwise.

identifier

The value for this parameter can be a simple string or integer, e.g. identifier=120. This parameter doesn't seem to work right now.

identifierType

Used in combination with the identifier parameter to filter identifiers by identifier type. See details. This parameter doesn't seem to work right now.

limit

Number of records to return. Default: 100. Maximum: 1000.

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

identifierType options:

  • DOI No description.

  • FTP No description.

  • GBIF_NODE Identifies the node (e.g: DK for Denmark, sp2000 for Species 2000).

  • GBIF_PARTICIPANT Participant identifier from the GBIF IMS Filemaker system.

  • GBIF_PORTAL Indicates the identifier originated from an auto_increment column in the portal.data_provider or portal.data_resource table respectively.

  • HANDLER No description.

  • LSID Reference controlled by a separate system, used for example by DOI.

  • SOURCE_ID No description.

  • UNKNOWN No description.

  • URI No description.

  • URL No description.

  • UUID No description.

Get various information about GBIF networks. network_constituents() is a convenience function that allows you to get all the datasets in a network.

Value

  • network() returns a list

  • network_constituents() returns a data.frame of datasets in the network

References

https://www.gbif.org/developer/registry#networks

Examples

## Not run: 
network()
network(uuid='2b7c7b4f-4d4f-40d3-94de-c28b6fa054a6')

network_constituents('2b7c7b4f-4d4f-40d3-94de-c28b6fa054a6')

# curl options
network(curlopts = list(verbose=TRUE))

## End(Not run)

Networks metadata.

Description

Networks metadata.

Usage

networks(
  data = "all",
  uuid = NULL,
  query = NULL,
  identifier = NULL,
  identifierType = NULL,
  limit = 100,
  start = NULL,
  curlopts = list()
)

Arguments

data

The type of data to get. One or more of: 'contact', 'endpoint', 'identifier', 'tag', 'machineTag', 'comment', 'constituents', or the special 'all'. Default: 'all'

uuid

UUID of the data network provider. This must be specified if data is anything other than 'all'. Only 1 can be passed in

query

Query nodes. Only used when data='all'. Ignored otherwise.

identifier

The value for this parameter can be a simple string or integer, e.g. identifier=120. This parameter doesn't seem to work right now.

identifierType

Used in combination with the identifier parameter to filter identifiers by identifier type. See details. This parameter doesn't seem to work right now.

limit

Number of records to return. Default: 100. Maximum: 1000.

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

identifierType options:

  • DOI No description.

  • FTP No description.

  • GBIF_NODE Identifies the node (e.g: DK for Denmark, sp2000 for Species 2000).

  • GBIF_PARTICIPANT Participant identifier from the GBIF IMS Filemaker system.

  • GBIF_PORTAL Indicates the identifier originated from an auto_increment column in the portal.data_provider or portal.data_resource table respectively.

  • HANDLER No description.

  • LSID Reference controlled by a separate system, used for example by DOI.

  • SOURCE_ID No description.

  • UNKNOWN No description.

  • URI No description.

  • URL No description.

  • UUID No description.

References

https://www.gbif.org/developer/registry#networks

Examples

## Not run: 
networks()
networks(uuid='2b7c7b4f-4d4f-40d3-94de-c28b6fa054a6')

# curl options
networks(curlopts = list(verbose=TRUE))

## End(Not run)

Nodes metadata.

Description

Nodes metadata.

Usage

nodes(
  data = "all",
  uuid = NULL,
  query = NULL,
  identifier = NULL,
  identifierType = NULL,
  limit = 100,
  start = NULL,
  isocode = NULL,
  curlopts = list()
)

Arguments

data

The type of data to get. One or more of: 'organization', 'endpoint', 'identifier', 'tag', 'machineTag', 'comment', 'pendingEndorsement', 'country', 'dataset', 'installation', or the special 'all'. Default: 'all'

uuid

UUID of the data node provider. This must be specified if data is anything other than 'all'.

query

Query nodes. Only used when data='all'

identifier

The value for this parameter can be a simple string or integer, e.g. identifier=120. This parameter doesn't seem to work right now.

identifierType

Used in combination with the identifier parameter to filter identifiers by identifier type. See details. This parameter doesn't seem to work right now.

limit

Number of records to return. Default: 100. Maximum: 1000.

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

isocode

A 2 letter country code. Only used if data='country'.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Details

identifierType options:

  • DOI No description.

  • FTP No description.

  • GBIF_NODE Identifies the node (e.g: DK for Denmark, sp2000 for Species 2000).

  • GBIF_PARTICIPANT Participant identifier from the GBIF IMS Filemaker system.

  • GBIF_PORTAL Indicates the identifier originated from an auto_increment column in the portal.data_provider or portal.data_resource table respectively.

  • HANDLER No description.

  • LSID Reference controlled by a separate system, used for example by DOI.

  • SOURCE_ID No description.

  • UNKNOWN No description.

  • URI No description.

  • URL No description.

  • UUID No description.

References

https://www.gbif.org/developer/registry#nodes

Examples

## Not run: 
nodes(limit=5)
nodes(uuid="1193638d-32d1-43f0-a855-8727c94299d8")
nodes(data='identifier', uuid="03e816b3-8f58-49ae-bc12-4e18b358d6d9")
nodes(data=c('identifier','organization','comment'),
  uuid="03e816b3-8f58-49ae-bc12-4e18b358d6d9")

uuids = c("8cb55387-7802-40e8-86d6-d357a583c596",
  "02c40d2a-1cba-4633-90b7-e36e5e97aba8",
  "7a17efec-0a6a-424c-b743-f715852c3c1f",
  "b797ce0f-47e6-4231-b048-6b62ca3b0f55",
  "1193638d-32d1-43f0-a855-8727c94299d8",
  "d3499f89-5bc0-4454-8cdb-60bead228a6d",
  "cdc9736d-5ff7-4ece-9959-3c744360cdb3",
  "a8b16421-d80b-4ef3-8f22-098b01a89255",
  "8df8d012-8e64-4c8a-886e-521a3bdfa623",
  "b35cf8f1-748d-467a-adca-4f9170f20a4e",
  "03e816b3-8f58-49ae-bc12-4e18b358d6d9",
  "073d1223-70b1-4433-bb21-dd70afe3053b",
  "07dfe2f9-5116-4922-9a8a-3e0912276a72",
  "086f5148-c0a8-469b-84cc-cce5342f9242",
  "0909d601-bda2-42df-9e63-a6d51847ebce",
  "0e0181bf-9c78-4676-bdc3-54765e661bb8",
  "109aea14-c252-4a85-96e2-f5f4d5d088f4",
  "169eb292-376b-4cc6-8e31-9c2c432de0ad",
  "1e789bc9-79fc-4e60-a49e-89dfc45a7188",
  "1f94b3ca-9345-4d65-afe2-4bace93aa0fe")

# one request per node; keep only nodes that returned identifier data
res <- lapply(uuids, function(x) nodes(uuid = x, data = 'identifier')$data)
res <- res[sapply(res, NROW) != 0]
res[1]

# Pass on curl options
nodes(limit=20, curlopts=list(verbose=TRUE))

## End(Not run)

Get number of occurrence records.

Description

Get number of occurrence records.

Usage

occ_count(..., occurrenceStatus = "PRESENT", curlopts = list())

Arguments

...

parameters passed to occ_search().

occurrenceStatus

(character) Default is "PRESENT". Specify whether search should return "PRESENT" or "ABSENT" data.

curlopts

(list) curl options.

Details

occ_count() is a short convenience wrapper for occ_search(limit=0)$meta$count.
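
The equivalence stated above can be written out as a small function. This is a sketch of the idea, not the package source; count_via_search() is a hypothetical name, and its search argument is replaced by an offline stub here so that the sketch runs without network access.

```r
# Hypothetical re-statement of the documented equivalence, not rgbif's source.
# `search` defaults to rgbif::occ_search but can be replaced by a stub.
count_via_search <- function(..., search = rgbif::occ_search) {
  search(..., limit = 0)$meta$count
}

# Offline stub that mimics only the `meta$count` part of a search result
stub_search <- function(..., limit) list(meta = list(count = 42L))
n <- count_via_search(country = "DK", search = stub_search)
n

# Not run (needs network access and the rgbif package):
# count_via_search(country = "DK")
# rgbif::occ_count(country = "DK")
```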

The current version (since rgbif 3.7.6) of occ_count() uses a different GBIF API endpoint from previous versions. This change greatly improves the usability of occ_count(). Legacy parameters georeferenced, type, date, to, from are no longer supported and not guaranteed to work correctly.

Multiple values of the type c("a","b") will give an error, but "a;b" will work.
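
A vector of values can be collapsed into the semicolon-separated form before it is passed on. multi_value() below is a hypothetical convenience helper, not part of rgbif.

```r
# Hypothetical helper (not part of rgbif): collapse a vector into the
# semicolon-separated form described above.
multi_value <- function(x) {
  paste(x, collapse = ";")
}

multi_value(c("DK", "US"))  # "DK;US"
multi_value(c(212, 216))    # "212;216"

# Not run (needs network access and the rgbif package):
# rgbif::occ_count(country = multi_value(c("DK", "US")))
```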

Value

The occurrence count of the occ_search() query.

See Also

occ_count_year(), occ_count_country(), occ_count_pub_country(), occ_count_basis_of_record()

Examples

## Not run: 
# total occurrences mediated by GBIF
occ_count() # should be > 2 billion! 

# number of plant occurrences
occ_count(kingdomKey=name_backbone("Plantae")$usageKey) 
occ_count(scientificName = 'Ursus americanus')

occ_count(country="DK") # found in Denmark 
occ_count(country="DK;US") # found in Denmark or the United States
occ_count(publishingCountry="US") # published by the United States
# number of repatriated records in India
occ_count(repatriated = TRUE,country="IN") 
 
occ_count(taxonKey=212) # number of bird occurrences
# between years 1800-1900
occ_count(basisOfRecord="PRESERVED_SPECIMEN", year="1800,1900") 
occ_count(recordedBy="John Waller") # recorded by John Waller
occ_count(decimalLatitude=0, decimalLongitude=0) # exactly on 0,0

# close to a known iso2 centroid
occ_count(distanceFromCentroidInMeters="0,2000") 
# close to a known iso2 centroid in Sweden
occ_count(distanceFromCentroidInMeters="0,2000",country="SE") 

occ_count(hasCoordinate=TRUE) # with coordinates
occ_count(protocol = "DIGIR") # published using DIGIR format
occ_count(mediaType = 'StillImage') # with images

# number of occurrences with IUCN status "critically endangered"
occ_count(iucnRedListCategory="CR") 
occ_count(verbatimScientificName="Calopteryx splendens;Calopteryx virgo")
# build the WKT with paste0() so the string contains no line break
occ_count(
  geometry = paste0("POLYGON((24.70938 48.9221,24.71056 48.92175,",
    "24.71107 48.92296,24.71002 48.92318,24.70938 48.9221))"))

# getting a table of counts using the facets interface
# occurrence counts by year
occ_count(facet="year")
occ_count(facet="year",facetLimit=400)

# top scientificNames from Japan
occ_count(facet="scientificName",country="JP")
# top scientificNames among specimen bird records between 1850 and 1880
occ_count(facet="scientificName",taxonKey=212,
  basisOfRecord="PRESERVED_SPECIMEN",year="1850,1880")

# Number of presence or absence records of elephants
occ_count(facet="occurrenceStatus",scientificName="Elephantidae")

# top 100 datasets publishing occurrences to GBIF
occ_count(facet="datasetKey",facetLimit=100)
# top datasets publishing country centroids on GBIF
occ_count(facet="datasetKey",distanceFromCentroidInMeters="0")

# common values for coordinateUncertaintyInMeters for museum specimens
occ_count(facet="coordinateUncertaintyInMeters",basisOfRecord="PRESERVED_SPECIMEN")

# number of iucn listed bird and insect occurrences in Mexico
occ_count(facet="iucnRedListCategory",taxonKey="212;216",country="MX")

# most common latitude values mediated by GBIF
occ_count(facet="decimalLatitude")

# top iNaturalist users publishing research-grade obs to GBIF
occ_count(facet="recordedBy",datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7")
# top 100 iNaturalist users from Ukraine
occ_count(facet="recordedBy",datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7",
  country="UA",facetLimit=100)

# top institutions publishing specimen occurrences to GBIF
occ_count(facet="institutionCode",basisOfRecord="PRESERVED_SPECIMEN")


## End(Not run)

Get quick pre-computed occurrence counts of a limited number of dimensions.

Description

Get quick pre-computed occurrence counts of a limited number of dimensions.

Usage

occ_count_country(publishingCountry = NULL)

occ_count_pub_country(country = NULL)

occ_count_year(year = NULL)

occ_count_basis_of_record(curlopts = list())

Arguments

publishingCountry

The 2-letter country code (as per ISO-3166-1) of the country from which the occurrence was published.

country

(character) The 2-letter country code (ISO-3166-1) in which the occurrence was recorded.

year

The 4 digit year. Supports range queries, 'smaller,larger' (e.g., '1990,1991', whereas '1991,1990' wouldn't work).

curlopts

(list) curl options.

Details

Get quick pre-computed counts of a limited number of dimensions.

occ_count_country() will return a data.frame with occurrence counts by country. Using occ_count_country(publishingCountry="DK") will return the occurrence contributions Denmark has made to each country.

occ_count_pub_country() will return a data.frame with occurrence counts by publishing country. Using occ_count_pub_country(country="DK") will return the occurrence contributions each country has made to the focal country (here country="DK").

occ_count_year() will return a data.frame with the total occurrences mediated by GBIF for each year. Using occ_count_year(year="1800,1900") will return counts only for that range.

occ_count_basis_of_record() will return a data.frame with total occurrences mediated by GBIF for each basis of record.
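
Because year ranges must be given as 'smaller,larger', it can help to build the range string with a small check. year_range() below is a hypothetical helper, not part of rgbif.

```r
# Hypothetical helper (not part of rgbif): build a 'smaller,larger' range
# string and refuse ranges given in the wrong order.
year_range <- function(from, to) {
  stopifnot(length(from) == 1, length(to) == 1)
  if (from > to) {
    stop("range must be given as 'smaller,larger'; got ", from, ",", to)
  }
  paste(from, to, sep = ",")
}

year_range(1800, 1900)  # "1800,1900"

# Not run (needs network access and the rgbif package):
# rgbif::occ_count_year(year = year_range(1800, 1900))
```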

Value

A data.frame of counts.

See Also

occ_count()

Examples

## Not run: 
# total occurrence counts for all countries and iso2 places
occ_count_country()  
# the occurrences Mexico has published in other countries 
occ_count_country("MX") 
# the occurrences Denmark has published in other countries 
occ_count_country("DK")

# the occurrences other countries have published in Denmark
occ_count_pub_country("DK")
# the occurrences other countries have published in Mexico
occ_count_pub_country("MX")

# total occurrence counts for each year that an occurrence was 
# recorded or collected.
occ_count_year()
# supports ranges
occ_count_year("1800,1900")

# table of occurrence counts by basis of record
occ_count_basis_of_record()


## End(Not run)

Legacy alternative to occ_search

Description

Legacy alternative to occ_search

Usage

occ_data(
  taxonKey = NULL,
  scientificName = NULL,
  country = NULL,
  publishingCountry = NULL,
  hasCoordinate = NULL,
  typeStatus = NULL,
  recordNumber = NULL,
  lastInterpreted = NULL,
  continent = NULL,
  geometry = NULL,
  geom_big = "asis",
  geom_size = 40,
  geom_n = 10,
  recordedBy = NULL,
  recordedByID = NULL,
  identifiedByID = NULL,
  basisOfRecord = NULL,
  datasetKey = NULL,
  eventDate = NULL,
  catalogNumber = NULL,
  year = NULL,
  month = NULL,
  decimalLatitude = NULL,
  decimalLongitude = NULL,
  elevation = NULL,
  depth = NULL,
  institutionCode = NULL,
  collectionCode = NULL,
  hasGeospatialIssue = NULL,
  issue = NULL,
  search = NULL,
  mediaType = NULL,
  subgenusKey = NULL,
  repatriated = NULL,
  phylumKey = NULL,
  kingdomKey = NULL,
  classKey = NULL,
  orderKey = NULL,
  familyKey = NULL,
  genusKey = NULL,
  speciesKey = NULL,
  establishmentMeans = NULL,
  degreeOfEstablishment = NULL,
  protocol = NULL,
  license = NULL,
  organismId = NULL,
  publishingOrg = NULL,
  stateProvince = NULL,
  waterBody = NULL,
  locality = NULL,
  occurrenceStatus = "PRESENT",
  gadmGid = NULL,
  coordinateUncertaintyInMeters = NULL,
  verbatimScientificName = NULL,
  eventId = NULL,
  identifiedBy = NULL,
  networkKey = NULL,
  verbatimTaxonId = NULL,
  occurrenceId = NULL,
  organismQuantity = NULL,
  organismQuantityType = NULL,
  relativeOrganismQuantity = NULL,
  iucnRedListCategory = NULL,
  lifeStage = NULL,
  isInCluster = NULL,
  distanceFromCentroidInMeters = NULL,
  skip_validate = TRUE,
  limit = 500,
  start = 0,
  curlopts = list(http_version = 2)
)

Arguments

taxonKey

(numeric) A taxon key from the GBIF backbone. All included and synonym taxa are included in the search, so a search for aves with taxonKey=212 will match all birds, no matter which species. You can pass many keys to occ_search(taxonKey=c(1,212)).

scientificName

A scientific name from the GBIF backbone. All included and synonym taxa are included in the search.

country

(character) The 2-letter country code (ISO-3166-1) of the country in which the occurrence was recorded. See enumeration_country().

publishingCountry

The 2-letter country code (as per ISO-3166-1) of the country of the organization that published the occurrence. See enumeration_country().

hasCoordinate

(logical) Return only occurrence records with lat/long data (TRUE) or all records (FALSE, default).

typeStatus

Type status of the specimen. One of many options.

recordNumber

Number recorded by collector of the data, different from GBIF record number.

lastInterpreted

Date the record was last modified in GBIF, in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries, 'smaller,larger' (e.g., '1990,1991', whereas '1991,1990' wouldn't work).

continent

The source supplied continent.

  • "africa"

  • "antarctica"

  • "asia"

  • "europe"

  • "north_america"

  • "oceania"

  • "south_america"

Continent is not inferred but only populated if provided by the dataset publisher. Applying this filter may exclude many relevant records.

geometry

(character) Searches for occurrences inside a polygon in Well Known Text (WKT) format. A WKT shape written as either

  • "POINT"

  • "LINESTRING"

  • "LINEARRING"

  • "POLYGON"

  • "MULTIPOLYGON"

For example, "POLYGON((37.08 46.86,38.06 46.86,38.06 47.28,37.08 47.28,37.08 46.86))". Note that the last coordinate pair repeats the first so that the ring is closed. See also the section WKT below.

geom_big

(character) One of "bbox" or "asis" (default).

geom_size

(integer) An integer indicating size of the cell. Default: 40.

geom_n

(integer) An integer indicating number of cells in each dimension. Default: 10.

recordedBy

(character) The person who recorded the occurrence.

recordedByID

(character) Identifier (e.g. ORCID) for the person who recorded the occurrence.

identifiedByID

(character) Identifier (e.g. ORCID) for the person who provided the taxonomic identification of the occurrence.

basisOfRecord

(character) The specific nature of the data record. See here.

  • "FOSSIL_SPECIMEN"

  • "HUMAN_OBSERVATION"

  • "MATERIAL_CITATION"

  • "MATERIAL_SAMPLE"

  • "LIVING_SPECIMEN"

  • "MACHINE_OBSERVATION"

  • "OBSERVATION"

  • "PRESERVED_SPECIMEN"

  • "OCCURRENCE"

datasetKey

(character) The occurrence dataset UUID key, which can be found in the dataset page URL. For example, "7e380070-f762-11e1-a439-00145eb45e9a" is the key for Natural History Museum (London) Collection Specimens.

eventDate

(character) Occurrence date in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd. Supports range queries, 'smaller,larger' ('1990,1991', whereas '1991,1990' wouldn't work).

catalogNumber

(character) An identifier of any form assigned by the source within a physical collection or digital dataset for the record, which may not be unique, but should be fairly unique in combination with the institution and collection code.

year

The 4 digit year. A year of 98 will be interpreted as AD 98. Supports range queries, 'smaller,larger' (e.g., '1990,1991', whereas '1991,1990' wouldn't work).

month

The month of the year, starting with 1 for January. Supports range queries, 'smaller,larger' (e.g., '1,2', whereas '2,1' wouldn't work).

decimalLatitude

Latitude in decimals between -90 and 90 based on WGS84. Supports range queries, 'smaller,larger' (e.g., '25,30', whereas '30,25' wouldn't work).

decimalLongitude

Longitude in decimals between -180 and 180 based on WGS84. Supports range queries (e.g., '-0.4,-0.2', whereas '-0.2,-0.4' wouldn't work).

elevation

Elevation in meters above sea level. Supports range queries, 'smaller,larger' (e.g., '5,30', whereas '30,5' wouldn't work).

depth

Depth in meters relative to elevation. For example 10 meters below a lake surface with given elevation. Supports range queries, 'smaller,larger' (e.g., '5,30', whereas '30,5' wouldn't work).

institutionCode

An identifier of any form assigned by the source to identify the institution the record belongs to.

collectionCode

(character) An identifier of any form assigned by the source to identify the physical collection or digital dataset uniquely within the context of an institution.

hasGeospatialIssue

(logical) Includes/excludes occurrence records which contain spatial issues (as determined in our record interpretation), i.e. hasGeospatialIssue=TRUE returns only those records with spatial issues while hasGeospatialIssue=FALSE includes only records without spatial issues. The absence of this parameter returns any record with or without spatial issues.

issue

(character) One or more of many possible issues with each occurrence record. Issues passed to this parameter filter results by the issue. One of many options. See here for definitions.

search

(character) Query terms. The value for this parameter can be a simple word or a phrase. For example, search="puma"

mediaType

(character) Media type of "MovingImage", "Sound", or "StillImage".

subgenusKey

(numeric) Subgenus classification key.

repatriated

(character) Searches for records whose publishing country is different from the country in which the record was recorded.

phylumKey

(numeric) Phylum classification key.

kingdomKey

(numeric) Kingdom classification key.

classKey

(numeric) Class classification key.

orderKey

(numeric) Order classification key.

familyKey

(numeric) Family classification key.

genusKey

(numeric) Genus classification key.

speciesKey

(numeric) Species classification key.

establishmentMeans

(character) provides information about whether an organism or organisms have been introduced to a given place and time through the direct or indirect activity of modern humans.

  • "Introduced"

  • "Native"

  • "NativeReintroduced"

  • "Vagrant"

  • "Uncertain"

  • "IntroducedAssistedColonisation"

degreeOfEstablishment

(character) Provides information about degree to which an Organism survives, reproduces, and expands its range at the given place and time. One of many options.

protocol

(character) Protocol or mechanism used to provide the occurrence record. One of many options.

license

(character) The type of license applied to the dataset or record.

  • "CC0_1_0"

  • "CC_BY_4_0"

  • "CC_BY_NC_4_0"

organismId

(numeric) An identifier for the Organism instance (as opposed to a particular digital record of the Organism). May be a globally unique identifier or an identifier specific to the data set.

publishingOrg

(character) The publishing organization key (a UUID).

stateProvince

(character) The name of the next smaller administrative region than country (state, province, canton, department, region, etc.) in which the Location occurs.

waterBody

(character) The name of the water body in which the locations occur.

locality

(character) The specific description of the place.

occurrenceStatus

(character) Default is "PRESENT". Specify whether search should return "PRESENT" or "ABSENT" data.

gadmGid

(character) The gadm id of the area occurrences are desired from. https://gadm.org/.

coordinateUncertaintyInMeters

A number or range between 0-1,000,000 which specifies the desired coordinate uncertainty. coordinateUncertaintyInMeters=1000 will be interpreted as all records with exactly 1000 m. Supports range queries, 'smaller,larger' (e.g., '1000,10000', whereas '10000,1000' wouldn't work).

verbatimScientificName

(character) Scientific name as provided by the source.

eventId

(character) identifier(s) for a sampling event.

identifiedBy

(character) names of people, groups, or organizations who assigned the Taxon to the subject.

networkKey

(character) The occurrence network key (a UUID).

verbatimTaxonId

(character) The taxon identifier provided to GBIF by the data publisher.

occurrenceId

(character) occurrence id from source.

organismQuantity

A number or range which specifies the desired organism quantity. organismQuantity=5 will be interpreted as all records with exactly 5. Supports range queries, 'smaller,larger' (e.g., '5,20', whereas '20,5' wouldn't work).

organismQuantityType

(character) The type of quantification system used for the quantity of organisms. For example, "individuals" or "biomass".

relativeOrganismQuantity

(numeric) The relative measurement of the quantity of the organism (a number between 0-1). relativeOrganismQuantity=0.1 will be interpreted as all records with exactly 0.1. Supports range queries, 'smaller,larger' (e.g., '0.1,0.5', whereas '0.5,0.1' wouldn't work).

iucnRedListCategory

(character) The IUCN threat status category.

  • "NE" (Not Evaluated)

  • "DD" (Data Deficient)

  • "LC" (Least Concern)

  • "NT" (Near Threatened)

  • "VU" (Vulnerable)

  • "EN" (Endangered)

  • "CR" (Critically Endangered)

  • "EX" (Extinct)

  • "EW" (Extinct in the Wild)

lifeStage

(character) the life stage of the occurrence. One of many options.

isInCluster

(logical) identify potentially related records on GBIF.

distanceFromCentroidInMeters

A number or range. A value of "2000,*" means at least 2km from known centroids. A value of "0" would mean occurrences exactly on known centroids. A value of "0,2000" would mean within 2km of centroids. Max value is 5000.

skip_validate

(logical) whether to skip the wk::wk_problems call or not. Passed down to check_wkt(). Default: TRUE

limit

Number of records to return. Default: 500. Note that the per request maximum is 300, but since we set it at 500 for the function, we do two requests to get you the 500 records (if there are that many). Note that there is a hard maximum of 100,000, which is calculated as limit + start, so start=99000 and limit=2000 won't work.

start

Record number to start at. Use in combination with limit to page through results. Note that we do the paging internally for you, but you can manually set the start parameter.

curlopts

(list) Named curl options passed on to HttpClient. See curl::curl_options for curl options.

Details

This function is a legacy alternative to occ_search(). It is not recommended to use occ_data() as it is not as flexible as occ_search(). New search terms will not be added to this function and it is only supported for legacy reasons.
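Several of the arguments above accept range queries of the form 'smaller,larger'. The following is a minimal sketch of building such strings so that a reversed range fails early; gbif_range() is a hypothetical helper written for illustration and is not part of rgbif.

```r
# Hypothetical helper (not part of rgbif): build a "smaller,larger" range
# string and fail early if the bounds are given in the wrong order.
gbif_range <- function(lower, upper) {
  stopifnot(length(lower) == 1, length(upper) == 1)
  if (lower > upper) {
    stop("range must be 'smaller,larger': ", lower, " > ", upper)
  }
  paste(lower, upper, sep = ",")
}

gbif_range(1990, 1991)   # "1990,1991"
gbif_range(-0.4, -0.2)   # "-0.4,-0.2"
```

The resulting string could then be supplied to a range-aware argument, for example year = gbif_range(1990, 1991).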

Value

An object of class gbif_data, which is an S3 class list, with slots for metadata (meta) and the occurrence data itself (data), and with attributes listing the user-supplied arguments and whether it was a "single" or "many" search; that is, if you supply two values of the datasetKey parameter, two searches are done, and it's a "many". meta is a list of length four with offset, limit, endOfRecords and count fields. data is a tibble (aka data.frame).
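The interaction of limit, start, the per-request maximum of 300 and the hard maximum of 100,000 can be sketched as follows. plan_pages() is a hypothetical helper for illustration only; it is an assumption about how the paging arithmetic could look, not rgbif's internal code.

```r
# Hypothetical helper (not rgbif's internal code): split a requested
# `limit` into per-request pages of at most `page_max` records.
plan_pages <- function(limit = 500, start = 0, page_max = 300,
                       hard_max = 100000) {
  if (limit + start > hard_max) {
    stop("limit + start must not exceed ", hard_max)
  }
  offsets <- seq(from = start, by = page_max,
                 length.out = ceiling(limit / page_max))
  # the last page only asks for whatever is left
  sizes <- pmin(page_max, start + limit - offsets)
  data.frame(offset = offsets, limit = sizes)
}

plan_pages(500)   # two requests: 300 records at offset 0, 200 at offset 300
```

Under these assumptions, plan_pages(limit = 2000, start = 99000) stops with an error, mirroring the case described for the limit argument.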


Spin up a download request for GBIF occurrence data.

Description

Spin up a download request for GBIF occurrence data.

Usage

occ_download(
  ...,
  body = NULL,
  type = "and",
  format = "DWCA",
  user = NULL,
  pwd = NULL,
  email = NULL,
  curlopts = list()
)

occ_download_prep(
  ...,
  body = NULL,
  type = "and",
  format = "DWCA",
  user = NULL,
  pwd = NULL,
  email = NULL,
  curlopts = list()
)

Arguments

...

For occ_download() and occ_download_prep(), one or more objects of class occ_predicate or occ_predicate_list, created by pred* functions (see download_predicate_dsl). If you use this, don't use body parameter.

body

if you prefer to pass in the payload yourself, use this parameter. If you use this, don't pass anything to the dots. Accepts either an R list, or JSON. JSON is likely easier, since the JSON library jsonlite requires that you unbox strings that shouldn't be auto-converted to arrays, which is a bit tedious for large queries. optional

type

(character) One of equals (=), and (&), or (|), lessThan (<), lessThanOrEquals (<=), greaterThan (>), greaterThanOrEquals (>=), in, within, not (!), like, isNotNull

format

(character) The download format. One of 'DWCA' (default), 'SIMPLE_CSV', or 'SPECIES_LIST'

user

(character) User name within GBIF's website. Required. See "Authentication" below

pwd

(character) User password within GBIF's website. Required. See "Authentication" below

email

(character) Email address to which the notice that the download is done is sent. Required. See "Authentication" below

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

geometry

When using the geometry parameter, make sure that your well known text (WKT) is formatted as GBIF expects it. They expect WKT to have a counter-clockwise winding order. For example, the following is clockwise POLYGON((-19.5 34.1, -25.3 68.1, 35.9 68.1, 27.8 34.1, -19.5 34.1)), whereas they expect the other order: POLYGON((-19.5 34.1, 27.8 34.1, 35.9 68.1, -25.3 68.1, -19.5 34.1))

Note that coordinate pairs are longitude latitude: longitude first, then latitude.

You should not get any results if you supply WKT that has clockwise winding order.

Also note that occ_search()/occ_data() behave differently with respect to WKT: you can supply clockwise WKT to those functions, but they treat it as an exclusion and return all data not inside the WKT area.
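Winding order can be checked before a request is sent. The sketch below computes the signed area of a simple, single-ring POLYGON with the shoelace formula in base R; a positive value means counter-clockwise. wkt_signed_area() is a hypothetical helper, not part of rgbif, and it deliberately ignores holes, multipolygons and spherical geometry.

```r
# Hypothetical helper (not part of rgbif): signed area of a single-ring
# WKT POLYGON via the shoelace formula. Positive means counter-clockwise.
wkt_signed_area <- function(wkt) {
  # keep only the coordinate text between the double parentheses
  coords <- sub("^\\s*POLYGON\\s*\\(\\((.*)\\)\\)\\s*$", "\\1", wkt)
  pairs <- strsplit(trimws(strsplit(coords, ",")[[1]]), "\\s+")
  xy <- do.call(rbind, lapply(pairs, as.numeric))
  x <- xy[, 1]  # longitude
  y <- xy[, 2]  # latitude
  n <- nrow(xy)
  # the ring is closed, so consecutive rows already form every edge
  sum(x[-n] * y[-1] - x[-1] * y[-n]) / 2
}

cw  <- "POLYGON((-19.5 34.1, -25.3 68.1, 35.9 68.1, 27.8 34.1, -19.5 34.1))"
ccw <- "POLYGON((-19.5 34.1, 27.8 34.1, 35.9 68.1, -25.3 68.1, -19.5 34.1))"

wkt_signed_area(cw) < 0    # TRUE: clockwise
wkt_signed_area(ccw) > 0   # TRUE: counter-clockwise, the order GBIF expects
```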

Methods

  • occ_download_prep: prepares a download request, but DOES NOT execute it. meant for use with occ_download_queue()

  • occ_download: prepares a download request and DOES execute it

Authentication

For user, pwd, and email parameters, you can set them in one of three ways:

  • Set them in your .Rprofile file with the names gbif_user, gbif_pwd, and gbif_email

  • Set them in your .Renviron/.bash_profile (or similar) file with the names GBIF_USER, GBIF_PWD, and GBIF_EMAIL

  • Simply pass strings to each of the parameters in the function call

We strongly recommend the second option - storing your details as environment variables as it's the most widely used way to store secrets.

See ?Startup for help.
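As a minimal sketch of the environment variable option: the variable names GBIF_USER, GBIF_PWD and GBIF_EMAIL come from the list above, while gbif_credentials() and the placeholder values are hypothetical and only serve the illustration.

```r
# For illustration only: set the variables for the current session.
# In practice they would live in .Renviron as lines such as GBIF_USER=...
Sys.setenv(GBIF_USER = "jdoe", GBIF_PWD = "not-a-real-password",
           GBIF_EMAIL = "jdoe@example.org")

# Hypothetical helper (not part of rgbif): read the three variables and
# stop with a clear message if any of them is unset.
gbif_credentials <- function() {
  vars <- c(user = "GBIF_USER", pwd = "GBIF_PWD", email = "GBIF_EMAIL")
  vals <- Sys.getenv(vars, unset = NA)
  if (anyNA(vals)) {
    stop("unset environment variable(s): ",
         paste(vars[is.na(vals)], collapse = ", "))
  }
  as.list(setNames(vals, names(vars)))
}

creds <- gbif_credentials()
creds$user   # "jdoe"
```

Keeping the values out of scripts in this way means a script can be shared without exposing the password.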

Query length

GBIF has a limit of 12,000 characters for a download query. This means that you can have a pretty long query, but at some point it may lead to an error on GBIF's side and you'll have to split your query into a few.
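One way to split a long query is to divide a large vector of keys into groups and submit one request per group. The sketch below is an assumption-laden illustration: chunk_keys() is a hypothetical helper, and the default budget of 10,000 characters is an arbitrary safety margin below the 12,000 limit, since the rest of the query also counts towards it.

```r
# Hypothetical helper (not part of rgbif): split `keys` into groups whose
# comma-separated text stays within `budget` characters.
chunk_keys <- function(keys, budget = 10000) {
  # characters each key contributes, including one separator
  cost <- nchar(as.character(keys)) + 1
  stopifnot(all(cost <= budget))
  group <- integer(length(keys))
  current <- 1L
  used <- 0
  for (i in seq_along(keys)) {
    if (used + cost[i] > budget) {
      # start a new group once the budget would be exceeded
      current <- current + 1L
      used <- 0
    }
    used <- used + cost[i]
    group[i] <- current
  }
  unname(split(keys, group))
}

# 5000 made-up integer keys of 7 digits each
chunks <- chunk_keys(2000000L + 1:5000)
length(chunks)   # 4
```

Each element of chunks could then be passed to pred_in("taxonKey", ...) in its own download request.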

Note

see downloads for an overview of GBIF downloads methods

References

See the API docs https://www.gbif.org/developer/occurrence#download for more info, and the predicates docs https://www.gbif.org/developer/occurrence#predicates

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait()

Examples

## Not run: 
# occ_download(pred("basisOfRecord", "LITERATURE"))
# occ_download(pred("taxonKey", 3119195), pred_gt("elevation", 5000))
# occ_download(pred_gt("decimalLatitude", 50))
# occ_download(pred_gte("elevation", 9000))
# occ_download(pred_gte("decimalLatitude", 65))
# occ_download(pred("country", "US"))
# occ_download(pred("institutionCode", "TLMF"))
# occ_download(pred("catalogNumber", 217880))
# occ_download(pred("gbifId", 142317604)) 

# download format
# z <- occ_download(pred_gte("decimalLatitude", 75),
#  format = "SPECIES_LIST")

# res <- occ_download(pred("taxonKey", 7264332), pred("hasCoordinate", TRUE))

# pass output directly, or later, to occ_download_meta for more information
# occ_download(pred_gt('decimalLatitude', 75)) %>% occ_download_meta

# Multiple queries
# occ_download(pred_gte("decimalLatitude", 65),
#  pred_lte("decimalLatitude", -65), type="or")
# gg <- occ_download(pred("depth", 80), pred("taxonKey", 2343454),
#  type="or")
# x <- occ_download(pred_and(pred_within("POLYGON((-14 42, 9 38, -7 26, -14 42))"),
#  pred_gte("elevation", 5000)))

# complex example with many predicates
# shows example of how to do date ranges for both year and month
# res <- occ_download(
#  pred_gt("elevation", 5000),
#  pred_in("basisOfRecord", c('HUMAN_OBSERVATION','OBSERVATION','MACHINE_OBSERVATION')),
#  pred("country", "US"),
#  pred("hasCoordinate", TRUE),
#  pred("hasGeospatialIssue", FALSE),
#  pred_gte("year", 1999),
#  pred_lte("year", 2011),
#  pred_gte("month", 3),
#  pred_lte("month", 8)
# )

# Using body parameter - pass in your own complete query
## as JSON
query1 <- '{"creator":"sckott",
  "notification_address":["[email protected]"],
  "predicate":{"type":"and","predicates":[
    {"type":"equals","key":"TAXON_KEY","value":"7264332"},
    {"type":"equals","key":"HAS_COORDINATE","value":"TRUE"}]}
 }'
# res <- occ_download(body = query1, curlopts=list(verbose=TRUE))

## as a list
library(jsonlite)
query <- list(
  creator = unbox("sckott"),
  notification_address = "[email protected]",
  predicate = list(
    type = unbox("and"),
    predicates = list(
      list(type = unbox("equals"), key = unbox("TAXON_KEY"),
        value = unbox("7264332")),
      list(type = unbox("equals"), key = unbox("HAS_COORDINATE"),
        value = unbox("TRUE"))
    )
  )
)
# res <- occ_download(body = query, curlopts = list(verbose = TRUE))

# Prepared query
occ_download_prep(pred("basisOfRecord", "LITERATURE"))
occ_download_prep(pred("basisOfRecord", "LITERATURE"), format = "SIMPLE_CSV")
occ_download_prep(pred("basisOfRecord", "LITERATURE"), format = "SPECIES_LIST")
occ_download_prep(pred_in("taxonKey", c(2977832, 2977901, 2977966, 2977835)))
occ_download_prep(pred_within("POLYGON((-14 42, 9 38, -7 26, -14 42))"))

## a complicated example
occ_download_prep(
  pred_in("basisOfRecord", c("MACHINE_OBSERVATION", "HUMAN_OBSERVATION")),
  pred_in("taxonKey", c(2498343, 2481776, 2481890)),
  pred_in("country", c("GB", "IE")),
  pred_or(pred_lte("year", 1989), pred("year", 2000))
)

# x = occ_download(
#   pred_in("basisOfRecord", c("MACHINE_OBSERVATION", "HUMAN_OBSERVATION")),
#   pred_in("taxonKey", c(9206251, 3112648)),
#   pred_in("country", c("US", "MX")),
#   pred_and(pred_gte("year", 1989), pred_lte("year", 1991))
# )
# occ_download_meta(x)
# z <- occ_download_get(x)
# df <- occ_download_import(z)
# str(df)
# library(dplyr)
# unique(df$basisOfRecord)
# unique(df$taxonKey)
# unique(df$countryCode)
# sort(unique(df$year))

## End(Not run)

Check for downloads already in your GBIF account

Description

Check for downloads already in your GBIF account

Usage

occ_download_cached(
  ...,
  body = NULL,
  type = "and",
  format = "DWCA",
  user = NULL,
  pwd = NULL,
  email = NULL,
  refresh = FALSE,
  age = 30,
  curlopts = list()
)

Arguments

...

For occ_download() and occ_download_prep(), one or more objects of class occ_predicate or occ_predicate_list, created by ⁠pred*⁠ functions (see download_predicate_dsl). If you use this, don't use body parameter.

body

if you prefer to pass in the payload yourself, use this parameter. If you use this, don't pass anything to the dots. Accepts either an R list, or JSON. JSON is likely easier, since the JSON library jsonlite requires that you unbox strings that shouldn't be auto-converted to arrays, which is a bit tedious for large queries. optional

type

(character) One of equals (=), and (&), or (|), lessThan (<), lessThanOrEquals (<=), greaterThan (>), greaterThanOrEquals (>=), in, within, not (!), like, isNotNull

format

(character) The download format. One of 'DWCA' (default), 'SIMPLE_CSV', or 'SPECIES_LIST'

user

(character) User name within GBIF's website. Required. See "Authentication" below

pwd

(character) User password within GBIF's website. Required. See "Authentication" below

email

(character) Email address to receive download notice done email. Required. See "Authentication" below

refresh

(logical) refresh your list of downloads. on the first request of each R session we'll cache your stored GBIF occurrence downloads locally. you can refresh this list by setting refresh=TRUE; if you're in the same R session, and you've done many download requests, then refreshing may be a good idea if you're using this function

age

(integer) number of days after which you want a new download. default: 30

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
# these are examples from the package maintainer's account;
# outcomes will vary by user
occ_download_cached(pred_gte("elevation", 12000L))
occ_download_cached(pred("catalogNumber", 217880))
occ_download_cached(pred_gte("decimalLatitude", 65),
  pred_lte("decimalLatitude", -65), type="or")
occ_download_cached(pred_gte("elevation", 12000L))
occ_download_cached(pred_gte("elevation", 12000L), refresh = TRUE)

## End(Not run)

Cancel a download creation process.

Description

Cancel a download creation process.

Usage

occ_download_cancel(key, user = NULL, pwd = NULL, curlopts = list())

occ_download_cancel_staged(
  user = NULL,
  pwd = NULL,
  limit = 20,
  start = 0,
  curlopts = list()
)

Arguments

key

(character) A key generated from a request, like that from occ_download. Required.

user

(character) User name within GBIF's website. Required. See Details.

pwd

(character) User password within GBIF's website. Required. See Details.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

limit

Number of records to return. Default: 20

start

Record number to start at. Default: 0

Details

Note, these functions only cancel a job in progress. If your download is already prepared for you, this won't do anything to change that.

occ_download_cancel cancels a specific job by download key and returns a success message.

occ_download_cancel_staged cancels all jobs with status RUNNING or PREPARING. If none are found, it returns a message saying so; if some are found, they are cancelled and a message says so.

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
# occ_download_cancel(key="0003984-140910143529206")
# occ_download_cancel_staged()

## End(Not run)

Lists the downloads activity of a dataset

Description

Lists the downloads activity of a dataset

Usage

occ_download_dataset_activity(
  dataset,
  limit = 20,
  start = 0,
  curlopts = list()
)

Arguments

dataset

(character) A dataset key

limit

(integer/numeric) Number of records to return. Default: 20, Max: 1000

start

(integer/numeric) Record number to start at. Default: 0

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

a list with two slots:

  • meta: a single row data.frame with columns: offset, limit, endofrecords, count

  • results: a tibble with the nested data flattened, with many columns with the same download. or download.request. prefixes

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
res <- occ_download_dataset_activity("7f2edc10-f762-11e1-a439-00145eb45e9a")
res
res$meta
res$meta$count

# pagination
occ_download_dataset_activity("7f2edc10-f762-11e1-a439-00145eb45e9a",
limit = 3000)
occ_download_dataset_activity("7f2edc10-f762-11e1-a439-00145eb45e9a",
limit = 3, start = 10)

## End(Not run)

List datasets for a download

Description

List datasets for a download

Usage

occ_download_datasets(key, limit = 20, start = 0, curlopts = list())

Arguments

key

A key generated from a request, like that from occ_download()

limit

(integer/numeric) Number of records to return. Default: 20, Max: 1000

start

(integer/numeric) Record number to start at. Default: 0

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

a list with two slots:

  • meta: a single row data.frame with columns: offset, limit, endofrecords, count

  • results: a tibble with the results, of three columns: downloadKey, datasetKey, numberRecords

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
occ_download_datasets(key="0003983-140910143529206")
occ_download_datasets(key="0003983-140910143529206", limit = 3)
occ_download_datasets(key="0003983-140910143529206", limit = 3, start = 10)

## End(Not run)

Describes the fields available in GBIF downloads

Description

Describes the fields available in GBIF downloads

Usage

occ_download_describe(x = "dwca")

Arguments

x

a character string (default: "dwca"). Accepted values: "dwca", "simpleCsv", "simpleAvro", "simpleParquet", "speciesList".

Details

The function returns a list with the fields available in GBIF downloads. It is considered experimental by GBIF, so the output might change in the future.

Value

a list.

Examples

## Not run: 
occ_download_describe("dwca")$verbatimFields
occ_download_describe("dwca")$verbatimExtensions
occ_download_describe("simpleCsv")$fields

## End(Not run)

Get a download from GBIF.

Description

Get a download from GBIF.

Usage

occ_download_get(key, path = ".", overwrite = FALSE, ...)

Arguments

key

A key generated from a request, like that from occ_download

path

Path to write zip file to. Default: ".", with a .zip appended to the end.

overwrite

Will only overwrite existing path if TRUE.

...

named curl options passed on to crul::verb-GET. see curl::curl_options() for curl options

Details

Downloads the zip file to a directory you specify on your machine. crul::HttpClient() is used internally to write the zip file to disk. See crul::writing-options. This function only downloads the file. See occ_download_import to open a downloaded file in your R session. The speed of this function is of course proportional to the size of the file to download. For example, a 58 MB file on my machine took about 26 seconds.

Note

see downloads for an overview of GBIF downloads methods

This function used to check for HTTP response content type, but it has changed enough that we no longer check it. If you run into issues with this function, open an issue in the GitHub repository.

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
occ_download_get("0000066-140928181241064")
occ_download_get("0003983-140910143529206", overwrite = TRUE)

## End(Not run)

Import a downloaded file from GBIF.

Description

Import a downloaded file from GBIF.

Usage

occ_download_import(
  x = NULL,
  key = NULL,
  path = ".",
  fill = FALSE,
  encoding = "UTF-8",
  ...
)

as.download(path = ".", key = NULL)

## S3 method for class 'character'
as.download(path = ".", key = NULL)

## S3 method for class 'download'
as.download(path = ".", key = NULL)

Arguments

x

The output of a call to occ_download_get

key

A key generated from a request, like that from occ_download

path

Path to unzip file to. Default: "." Writes to folder matching zip file name

fill

(logical) (default: FALSE). If TRUE then in case the rows have unequal length, blank fields are implicitly filled. passed on to fill parameter in data.table::fread.

encoding

(character) encoding to read in data; passed to data.table::fread(). default: "UTF-8". other allowed options: "Latin-1" and "unknown". see ?data.table::fread docs

...

parameters passed on to data.table::fread(). See fread docs for details. Some fread parameters that may be particularly useful here are: select (select which columns to read in; others are dropped), nrows (only read in a certain number of rows)

Details

You can provide either x as input, or both key and path. We use data.table::fread() internally to read data.

Value

a tibble (data.frame)

Problems reading data

You may run into errors when using occ_download_import(); most often these are due to data.table::fread() not being able to parse the occurrence.txt file correctly. The fill parameter passes down to data.table::fread() and the ... allows you to pass on any other parameters that data.table::fread() accepts. Read the docs for fread for help.

countryCode result column and Namibia

The country code for Namibia is "NA". Unfortunately, an "NA" string will be read in to R as an NA/missing value. To avoid this, in this function we read in the data, then convert any NA/missing values to the character string "NA". When a country code is truly missing it will be an empty string.
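The effect can be reproduced with a small base R example. This is only an illustration of the idea: it uses read.csv() on made-up inline data, whereas occ_download_import() reads files with data.table::fread().

```r
# Illustration with base R's read.csv(); rgbif itself reads files with
# data.table::fread(), so this only mirrors the idea described above.
csv <- "countryCode,year\nNA,2001\n,2002\nDE,2003"
df <- read.csv(text = csv, stringsAsFactors = FALSE)

is.na(df$countryCode)   # TRUE FALSE FALSE: Namibia was read as missing

# convert the NA values back to the literal country code "NA"
df$countryCode[is.na(df$countryCode)] <- "NA"
df$countryCode          # "NA" "" "DE"
```

The truly missing code in the second row stays an empty string, so the two cases remain distinguishable.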

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
# First, kick off at least 1 download, then wait for the job to be complete
# Then use your download keys
res <- occ_download_get(key="0000066-140928181241064", overwrite=TRUE)
occ_download_import(res)

occ_download_get(key="0000066-140928181241064", overwrite = TRUE) %>%
  occ_download_import

# coerce a file path to the right class to feed to occ_download_import
# as.download("0000066-140928181241064.zip")
# as.download(key = "0000066-140928181241064")
# occ_download_import(as.download("0000066-140928181241064.zip"))

# download a dump that has a CSV file
# res <- occ_download_get(key = "0001369-160509122628363", overwrite=TRUE)
# occ_download_import(res)
# occ_download_import(key = "0001369-160509122628363")

# download and import a species list (in csv format)
# x <- occ_download_get("0000172-190415153152247")
# occ_download_import(x)

## End(Not run)

Lists the downloads created by a user.

Description

Lists the downloads created by a user.

Usage

occ_download_list(
  user = NULL,
  pwd = NULL,
  limit = 20,
  start = 0,
  curlopts = list()
)

Arguments

user

(character) User name within GBIF's website. Required. See Details.

pwd

(character) User password within GBIF's website. Required. See Details.

limit

(integer/numeric) Number of records to return. Default: 20, Max: 1000

start

(integer/numeric) Record number to start at. Default: 0

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

a list with two slots:

  • meta: a single row data.frame with columns: offset, limit, endofrecords, count

  • results: a tibble with the nested data flattened, with many columns with the same request. prefix

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_meta(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
occ_download_list(user="sckott")
occ_download_list(user="sckott", limit = 5)
occ_download_list(user="sckott", start = 21)

## End(Not run)

Retrieves the occurrence download metadata by its unique key.

Description

Retrieves the occurrence download metadata by its unique key.

Usage

occ_download_meta(key, curlopts = list())

Arguments

key

A key generated from a request, like that from occ_download

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

an object of class occ_download_meta, a list with slots for the download key, the DOI assigned to the download, license link, the request details you sent in the occ_download() request, and metadata about the size and date/time of the request

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_queue(), occ_download_wait(), occ_download()

Examples

## Not run: 
occ_download_meta(key="0003983-140910143529206")
occ_download_meta("0000066-140928181241064")

## End(Not run)

Download requests in a queue

Description

Download requests in a queue

Usage

occ_download_queue(..., .list = list(), status_ping = 10)

Arguments

...

any number of occ_download() requests

.list

any number of occ_download_prep() requests

status_ping

(integer) Seconds between pings checking the status of the download request. Generally, use larger numbers for larger requests. Default: 10 (i.e., 10 seconds). Must be 10 or greater.

Details

This function is a convenience wrapper around occ_download(), allowing the user to kick off any number of requests, while abiding by GBIF rules of 3 concurrent requests per user.

Value

a list of occ_download class objects, see occ_download_get() to fetch data

How it works

It works by using lazy evaluation to collect your requests into a queue (lazy evaluation is not used if you pass requests via the .list parameter). It then kicks off the first 3 requests. Then, in a while loop, we check the status of those requests, and when any request finishes (see When is a job done? below), we kick off the next, and so on. So in theory there may not always be exactly 3 running concurrently, but the function will usually keep 3 running concurrently.
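The scheduling idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code used by occ_download_queue(): simulate_queue is a hypothetical helper, and durations is a made-up vector giving the number of status checks each job needs before it finishes.

```r
# Simulate a queue that keeps at most `max_running` jobs active.
# `durations` is hypothetical: status checks needed per job.
simulate_queue <- function(durations, max_running = 3) {
  n <- length(durations)
  remaining <- durations
  waiting <- seq_len(n)    # jobs not yet started, in submission order
  running <- integer(0)    # jobs currently active
  finished <- integer(0)   # jobs in completion order
  peak <- 0

  while (length(finished) < n) {
    # start new jobs while there is a free slot
    while (length(running) < max_running && length(waiting) > 0) {
      running <- c(running, waiting[1])
      waiting <- waiting[-1]
    }
    peak <- max(peak, length(running))
    # one "status ping": every running job advances by one step
    remaining[running] <- remaining[running] - 1
    done <- running[remaining[running] <= 0]
    finished <- c(finished, done)
    running <- setdiff(running, done)
  }
  list(order = finished, peak = peak)
}

res <- simulate_queue(c(2, 1, 3, 1, 2))
res$order  # completion order of the five jobs
res$peak   # largest number of jobs active at once
```

The simulation shows why fewer than 3 jobs may be active near the end of the queue: once nothing is left waiting, the free slots stay empty.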

When is a job done?

We mark a job as done by checking the /occurrence/download/ API route with our occ_download_meta() function. If the status of the job is any of "succeeded", "killed", or "cancelled", then we mark the job as done and move on to other jobs in the queue.
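The done check described above amounts to a set membership test. A minimal sketch follows; is_done is a hypothetical helper, the lower-casing step is an assumption added so that upper-case status strings compare equal, and the example status values are illustrative only.

```r
# Statuses treated as "done", taken from the description above
done_statuses <- c("succeeded", "killed", "cancelled")

# Hypothetical helper: TRUE when a status string marks a finished job.
# tolower() is an assumption so that "SUCCEEDED" and "succeeded" both match.
is_done <- function(status) {
  tolower(status) %in% done_statuses
}

is_done(c("RUNNING", "SUCCEEDED", "PREPARING", "killed"))
```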

Beware

This function is still in development. There's a lot of complexity to this problem. We'll be rolling out fixes and improvements in future versions of the package, so expect to have to adjust your code with new versions.

Note

see downloads for an overview of GBIF downloads methods

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_wait(), occ_download()

Examples

## Not run: 
if (interactive()) { # don't run in automated example runs, too costly
# passing occ_download() requests via ...
out <- occ_download_queue(
  occ_download(pred('taxonKey', 3119195), pred("year", 1976)),
  occ_download(pred('taxonKey', 3119195), pred("year", 2001)),
  occ_download(pred('taxonKey', 3119195), pred("year", 2001),
    pred_lte("month", 8)),
  occ_download(pred('taxonKey', 5229208), pred("year", 2011)),
  occ_download(pred('taxonKey', 2480946), pred("year", 2015)),
  occ_download(pred("country", "NZ"), pred("year", 1999),
    pred("month", 3)),
  occ_download(pred("catalogNumber", "Bird.27847588"),
    pred("year", 1998), pred("month", 2))
)

# supports <= 3 requests too
out <- occ_download_queue(
  occ_download(pred("country", "NZ"), pred("year", 1999), pred("month", 3)),
  occ_download(pred("catalogNumber", "Bird.27847588"), pred("year", 1998),
    pred("month", 2))
)

# using pre-prepared requests via .list
keys <- c(7905507, 5384395, 8911082)
queries <- list()
for (i in seq_along(keys)) {
  queries[[i]] <- occ_download_prep(
    pred("taxonKey", keys[i]),
    pred_in("basisOfRecord", c("HUMAN_OBSERVATION","OBSERVATION")),
    pred("hasCoordinate", TRUE),
    pred("hasGeospatialIssue", FALSE),
    pred("year", 1993)
  )
}
out <- occ_download_queue(.list = queries)
out

# another pre-prepared example
yrs <- 1930:1934
queries <- list()
for (i in seq_along(yrs)) {
  queries[[i]] <- occ_download_prep(
    pred("taxonKey", 2877951),
    pred_in("basisOfRecord", c("HUMAN_OBSERVATION","OBSERVATION")),
    pred("hasCoordinate", TRUE),
    pred("hasGeospatialIssue", FALSE),
    pred("year", yrs[i])
  )
}
out <- occ_download_queue(.list = queries)
out
}
## End(Not run)

Download occurrence data using a SQL query

Description

Download occurrence data using a SQL query

Usage

occ_download_sql(
  q = NULL,
  format = "SQL_TSV_ZIP",
  user = NULL,
  pwd = NULL,
  email = NULL,
  validate = TRUE,
  curlopts = list()
)

occ_download_sql_validate(q = NULL, user = NULL, pwd = NULL)

occ_download_sql_prep(
  q = NULL,
  format = "SQL_TSV_ZIP",
  user = NULL,
  pwd = NULL,
  email = NULL,
  validate = TRUE,
  curlopts = list()
)

Arguments

q

sql query

format

only "SQL_TSV_ZIP" is supported right now

user

your GBIF user name

pwd

your GBIF password

email

your email address

validate

should the query be validated before submission. Default is TRUE.

curlopts

list of curl options

Details

This is an experimental feature, and the implementation may change throughout 2024. The feature is currently only available for preview by invited users. Contact [email protected] to request access.

Please see the article here for more information: https://docs.ropensci.org/rgbif/articles/gbif_sql_downloads.html

Value

an object of class 'occ_download_sql'

References

https://techdocs.gbif.org/en/data-use/api-sql-downloads

Examples

## Not run: 
occ_download_sql("SELECT gbifid,countryCode FROM occurrence 
                  WHERE genusKey = 2435098")

## End(Not run)

Wait for an occurrence download to be done

Description

Wait for an occurrence download to be done

Usage

occ_download_wait(
  x,
  status_ping = 5,
  curlopts = list(http_version = 2),
  quiet = FALSE
)

Arguments

x

an object of class occ_download or downloadkey

status_ping

(integer) seconds between each occ_download_meta() request. default is 5, and cannot be < 3

curlopts

(list) curl options, as named list, passed on to occ_download_meta()

quiet

(logical) suppress messages. default: FALSE

Value

an object of class occ_download_meta, see occ_download_meta() for details

Note

occ_download_queue() is similar, but handles many requests at once; occ_download_wait handles one request at a time
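Waiting on a single request is a polling loop: check the status, stop if the job is finished, otherwise sleep for status_ping seconds and try again. The sketch below illustrates that pattern only. wait_for and make_mock are hypothetical helpers, the mock replaces real occ_download_meta() calls, and reusing the done statuses listed for occ_download_queue() is an assumption.

```r
# Poll a status source until the job reaches a finished state.
# `get_status` is a hypothetical function returning the current status.
wait_for <- function(get_status, status_ping = 5, quiet = FALSE) {
  repeat {
    status <- get_status()
    if (!quiet) message("status: ", status)
    if (tolower(status) %in% c("succeeded", "killed", "cancelled")) {
      return(status)
    }
    Sys.sleep(status_ping)
  }
}

# Mock status source that steps through a fixed sequence of statuses
make_mock <- function(statuses) {
  i <- 0
  function() {
    i <<- min(i + 1, length(statuses))
    statuses[i]
  }
}

# status_ping = 0 only so the mock example finishes immediately
final <- wait_for(make_mock(c("PREPARING", "RUNNING", "SUCCEEDED")),
                  status_ping = 0, quiet = TRUE)
final
```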

See Also

Other downloads: download_predicate_dsl, occ_download_cached(), occ_download_cancel(), occ_download_dataset_activity(), occ_download_datasets(), occ_download_get(), occ_download_import(), occ_download_list(), occ_download_meta(), occ_download_queue(), occ_download()

Examples

## Not run: 
x <- occ_download(
  pred("taxonKey", 9206251),
  pred_in("country", c("US", "MX")),
  pred_gte("year", 1971)
)
res <- occ_download_wait(x)
occ_download_meta(x)

# works also with a downloadkey
occ_download_wait("0000066-140928181241064") 


## End(Not run)

Facet GBIF occurrences

Description

Facet GBIF occurrences

Usage

occ_facet(facet, facetMincount = NULL, curlopts = list(), ...)

Arguments

facet

(character) a character vector of length 1 or greater. Required.

facetMincount

(numeric) minimum number of records to be included in the faceting results

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

...

Facet parameters, such as for paging based on each facet variable, e.g., country.facetLimit

Details

All fields can be faceted on except for "lastInterpreted", "eventDate", and "geometry".

If a faceted variable is not found, it is silently dropped, returning nothing for that query
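Per-facet paging parameters follow the naming pattern <facet>.facetLimit and <facet>.facetOffset, so they can be built programmatically when many facets are requested. The sketch below only constructs the argument list; the final do.call() is left commented out because it needs network access.

```r
facets <- c("country", "basisOfRecord")
limits <- c(3, 6)

# Build named paging arguments such as country.facetLimit = 3
paging <- setNames(as.list(limits), paste0(facets, ".facetLimit"))
args <- c(list(facet = facets), paging)
str(args)

# With rgbif installed and network access, the call would be:
# do.call(rgbif::occ_facet, args)
```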

Value

A list of tibbles (data.frames), one for each facet (each element of the facet parameter).

See Also

occ_search() also has faceting ability, but can include occurrence data in addition to facets.

Examples

## Not run: 
occ_facet(facet = "country")

# facetMincount - minimum number of records to be included
#   in the faceting results
occ_facet(facet = "country", facetMincount = 30000000L)
occ_facet(facet = c("country", "basisOfRecord"))

# paging with many facets
occ_facet(
  facet = c("country", "basisOfRecord", "hasCoordinate"),
  country.facetLimit = 3,
  basisOfRecord.facetLimit = 6
)

# paging
## limit
occ_facet(facet = "country", country.facetLimit = 3)
## offset
occ_facet(facet = "country", country.facetLimit = 3,
  country.facetOffset = 3)

# Pass on curl options
occ_facet(facet = "country", country.facetLimit = 3,
  curlopts = list(verbose = TRUE))

## End(Not run)

Get data for GBIF occurrences by occurrence key

Description

Get data for GBIF occurrences by occurrence key

Usage

occ_get(
  key,
  fields = "minimal",
  curlopts = list(),
  return = NULL,
  verbatim = NULL
)

occ_get_verbatim(key, fields = "minimal", curlopts = list())

Arguments

key

(numeric/integer) one or more occurrence keys. required

fields

(character) Default ("minimal") will return just taxon name, key, latitude, and longitude. 'all' returns all fields. Or specify each field you want returned by name, e.g. fields = c('name', 'decimalLatitude','altitude').

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

return

Defunct. All components are returned now; index to the one(s) you want

verbatim

Defunct. verbatim records can now be retrieved using occ_get_verbatim()

Value

For occ_get a list of lists. For occ_get_verbatim a data.frame

References

https://www.gbif.org/developer/occurrence#occurrence

Examples

## Not run: 
occ_get(key=855998194)

# many occurrences
occ_get(key=c(101010, 240713150, 855998194))

# Verbatim data
occ_get_verbatim(key=855998194)
occ_get_verbatim(key=855998194, fields='all')
occ_get_verbatim(key=855998194,
 fields=c('scientificName', 'lastCrawled', 'county'))
occ_get_verbatim(key=c(855998194, 620594291))
occ_get_verbatim(key=c(855998194, 620594291), fields='all')
occ_get_verbatim(key=c(855998194, 620594291),
   fields=c('scientificName', 'decimalLatitude', 'basisOfRecord'))

# curl options, pass in a named list
occ_get(key=855998194, curlopts = list(verbose=TRUE))

## End(Not run)

Parse and examine further GBIF occurrence issues on a dataset.

Description

Parse and examine further GBIF occurrence issues on a dataset.

Usage

occ_issues(.data, ..., mutate = NULL)

Arguments

.data

Output from a call to occ_search(), occ_data(), or occ_download_import(). The data from occ_download_import is just a regular data.frame so you can pass in a data.frame to this function, but if it doesn't have certain columns it will fail.

...

Named parameters to only get back (e.g. cdround), or to remove (e.g. -cdround).

mutate

(character) One of:

  • split Split issues into new columns.

  • expand Expand abbreviated issue codes into descriptive names. For downloaded datasets this is less useful, since the issues already arrive expanded.

  • split_expand Split into new columns, and expand issue names.

For split and split_expand, values in cells become y ("yes") or n ("no")
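To make the filtering and the split option concrete, here is a base R sketch of the same idea on a toy table. It does not reproduce the internals of occ_issues(); the occ data frame is invented, and it assumes the issues column holds comma-separated issue codes.

```r
# Toy data; assumes issue codes are stored comma-separated in `issues`
occ <- data.frame(
  key = 1:3,
  issues = c("cdround,gass84", "gass84", ""),
  stringsAsFactors = FALSE
)

codes_per_row <- strsplit(occ$issues, ",", fixed = TRUE)
has_issue <- function(code) {
  vapply(codes_per_row, function(x) code %in% x, logical(1))
}

# keep only rows with the cdround issue (like occ_issues(x, cdround))
occ[has_issue("cdround"), ]
# drop rows with the cdround issue (like occ_issues(x, -cdround))
occ[!has_issue("cdround"), ]

# split: one column per issue code, with "y"/"n" values
all_codes <- sort(unique(unlist(codes_per_row)))
split_cols <- vapply(
  all_codes,
  function(code) ifelse(has_issue(code), "y", "n"),
  character(nrow(occ))
)
occ_split <- cbind(occ["key"], split_cols, stringsAsFactors = FALSE)
occ_split
```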

Details

See also the vignette Cleaning data using GBIF issues

Note that you can also query based on issues, e.g., occ_search(taxonKey=1, issue='DEPTH_UNLIKELY'). However, I imagine it's more likely that you want to search for occurrences based on a taxonomic name, or geographic area, not based on issues, so it makes sense to pull data down, then clean as needed using this function.

This function only affects the data element in the gbif class that is returned from a call to occ_search(). Maybe in a future version we will remove the associated records from the hierarchy and media elements as they are removed from the data element.

You'll notice that we sort columns to make it easier to glimpse the important parts of your data, namely taxonomic name, taxon key, latitude and longitude, and the issues. The columns are unchanged otherwise.

References

https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/OccurrenceIssue.html

Examples

## Not run: 
# what do issues mean, can print whole table
head(gbif_issues())
# or just occurrence related issues
gbif_issues()[which(gbif_issues()$type %in% c("occurrence")),]
# or search for matches
iss <- c('cdround','cudc','gass84','txmathi')
gbif_issues()[ gbif_issues()$code %in% iss, ]

# compare the raw output with the output after using occ_issues
(out <- occ_search(limit=100))
out %>% occ_issues(cdround)

# occ_data
(out <- occ_data(limit=100))
out %>% occ_issues(cdround)

# Parsing output by issue
(res <- occ_data(
  geometry='POLYGON((30.1 10.1,40 40,20 40,10 20,30.1 10.1))',
  limit = 600))

## or parse issues in various ways
### include only rows with cdround issue
gg <- res %>% occ_issues(cdround)
NROW(res$data)
NROW(gg$data)
head(res$data)[,c(1:5)]
head(gg$data)[,c(1:5)]

### remove data rows with certain issue classes
res %>% occ_issues(-cdround, -cudc)

### split issues into separate columns
res %>% occ_issues(mutate = "split")
res %>% occ_issues(-cudc, -mdatunl, mutate = "split")
res %>% occ_issues(gass84, mutate = "split")

### expand issues to more descriptive names
res %>% occ_issues(mutate = "expand")

### split and expand
res %>% occ_issues(mutate = "split_expand")

### split, expand, and remove an issue class
res %>% occ_issues(-cdround, mutate = "split_expand")

## Or you can use occ_issues without %>%
occ_issues(res, -cdround, mutate = "split_expand")

# from GBIF downloaded data via occ_download_* functions
res <- occ_download_get(key="0000066-140928181241064", overwrite=TRUE)
x <- occ_download_import(res)
occ_issues(x, -txmathi)
occ_issues(x, txmathi)
occ_issues(x, gass84)
occ_issues(x, zerocd)
occ_issues(x, gass84, txmathi)
occ_issues(x, mutate = "split")
occ_issues(x, -gass84, mutate = "split")
occ_issues(x, mutate = "expand")
occ_issues(x, mutate = "split_expand")

# occ_search/occ_data with many inputs - give slightly different output
# format than normal 2482598, 2498387
xyz <- occ_data(taxonKey = c(9362842, 2492483, 2435099), limit = 300)
xyz
length(xyz) # length 3
names(xyz) # matches taxonKey values passed in
occ_issues(xyz, -gass84)
occ_issues(xyz, -cdround)
occ_issues(xyz, -cdround, -gass84)

## End(Not run)

Search for catalog numbers, collection codes, collector names, and institution codes.

Description

Search for catalog numbers, collection codes, collector names, and institution codes.

Usage

occ_metadata(
  type = "catalogNumber",
  q = NULL,
  limit = 5,
  pretty = TRUE,
  curlopts = list()
)

Arguments

type

Type of data, one of catalogNumber, collectionCode, recordedBy, or institutionCode. Unique partial strings work too, like 'cat' for catalogNumber

q

Search term

limit

Number of results, default=5

pretty

If TRUE (default), uses cat to print the data; if FALSE, returns character strings.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options
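Resolving a unique partial string such as 'cat' to a full type name is the behaviour base R provides through match.arg(). The sketch below shows that mechanism; resolve_type is a hypothetical helper, and whether occ_metadata() uses match.arg() internally is an assumption.

```r
types <- c("catalogNumber", "collectionCode", "recordedBy", "institutionCode")

# Hypothetical helper: expand a unique partial string to the full type name
resolve_type <- function(type) match.arg(type, types)

resolve_type("cat")   # unique prefix of catalogNumber
resolve_type("rec")   # unique prefix of recordedBy

# "c" is ambiguous (catalogNumber or collectionCode), so it is an error
ambiguous <- try(resolve_type("c"), silent = TRUE)
inherits(ambiguous, "try-error")
```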

References

https://www.gbif.org/developer/occurrence#search

Examples

## Not run: 
# catalog number
occ_metadata(type = "catalogNumber", q=122)

# collection code
occ_metadata(type = "collectionCode", q=12)

# institution code
occ_metadata(type = "institutionCode", q='GB')

# recorded by
occ_metadata(type = "recordedBy", q='scott')

# data as character strings
occ_metadata(type = "catalogNumber", q=122, pretty=FALSE)

# Change number of results returned
occ_metadata(type = "catalogNumber", q=122, limit=10)

# Partial unique type strings work too
occ_metadata(type = "cat", q=122)

# Pass on curl options
occ_metadata(type = "cat", q=122, curlopts = list(verbose = TRUE))

## End(Not run)

Organizations metadata.

Description

Organizations metadata.

Usage

organizations(
  data = "all",
  country = NULL,
  uuid = NULL,
  query = NULL,
  limit = 100,
  start = NULL,
  curlopts = list()
)

Arguments

data

(character) The type of data to get. One or more of: 'organization', 'contact', 'endpoint', 'identifier', 'tag', 'machineTag', 'comment', 'hostedDataset', 'ownedDataset', 'deleted', 'pending', 'nonPublishing', or the special 'all'. Default: 'all'

country

(character) Filters by country.

uuid

(character) UUID of the data node provider. This must be specified if data is anything other than 'all', 'deleted', 'pending', or 'nonPublishing'.

query

(character) Query nodes. Only used when data='all'

limit

Number of records to return. Default: 100. Maximum: 1000.

start

Record number to start at. Default: 0. Use in combination with limit to page through results.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

A list of length two, consisting of a data.frame meta (when uuid is NULL) and data, which can be either a list or a data.frame depending on the requested type of data.

References

https://www.gbif.org/developer/registry#organizations

Examples

## Not run: 
organizations(limit=5)
organizations(query="france", limit=5)
organizations(country = "SPAIN")
organizations(uuid="4b4b2111-ee51-45f5-bf5e-f535f4a1c9dc")
organizations(data='contact', uuid="4b4b2111-ee51-45f5-bf5e-f535f4a1c9dc")
organizations(data='pending')
organizations(data=c('contact','endpoint'),
  uuid="4b4b2111-ee51-45f5-bf5e-f535f4a1c9dc")

# Pass on curl options
organizations(query="spain", curlopts = list(verbose=TRUE))

## End(Not run)

Parse taxon names using the GBIF name parser.

Description

Parse taxon names using the GBIF name parser.

Usage

parsenames(scientificname, curlopts = list())

Arguments

scientificname

A character vector of scientific names.

curlopts

list of named curl options passed on to HttpClient. see curl::curl_options for curl options

Value

A data.frame containing fields extracted from parsed taxon names. Fields returned are the union of fields extracted from all species names in scientificname.
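Because different names yield different parsed fields, the result has one column for every field seen in any name, with NA where a name lacks that field. The sketch below shows one way to build such a union in base R. The parsed list and its field names are illustrative stand-ins for parser output, not values returned by the GBIF service.

```r
# Illustrative parser output: the second name has an extra field
parsed <- list(
  list(scientificName = "Ajuga pyramidata",
       genusOrAbove = "Ajuga",
       specificEpithet = "pyramidata"),
  list(scientificName = "Secale cereale subsp. cereale",
       genusOrAbove = "Secale",
       specificEpithet = "cereale",
       infraSpecificEpithet = "cereale")
)

# Union of all field names across the parsed names
all_fields <- unique(unlist(lapply(parsed, names)))

rows <- lapply(parsed, function(p) {
  # start with NA for every field, then fill in what this name has
  row <- setNames(rep(NA_character_, length(all_fields)), all_fields)
  row[names(p)] <- unlist(p)
  as.data.frame(as.list(row), stringsAsFactors = FALSE)
})
df <- do.call(rbind, rows)
df
```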

Author(s)

John Baumgartner ([email protected])

References

https://www.gbif.org/developer/species#parser

Examples

## Not run: 
parsenames(scientificname='x Agropogon littoralis')
parsenames(c('Arrhenatherum elatius var. elatius',
             'Secale cereale subsp. cereale', 'Secale cereale ssp. cereale',
             'Vanessa atalanta (Linnaeus, 1758)'))
parsenames("Ajuga pyramidata")
parsenames("Ajuga pyramidata x reptans")

# Pass on curl options
# res <- parsenames(c('Arrhenatherum elatius var. elatius',
#          'Secale cereale subsp. cereale', 'Secale cereale ssp. cereale',
#          'Vanessa atalanta (Linnaeus, 1758)'), curlopts=list(verbose=TRUE))

## End(Not run)

Look up 2 character ISO country codes

Description

Look up 2 character ISO country codes

Usage

rgb_country_codes(country_name, fuzzy = FALSE, ...)

Arguments

country_name

Name of country to look up

fuzzy

If TRUE, uses agrep to do fuzzy search on names.

...

Further arguments passed on to agrep or grep
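The difference between the exact and fuzzy modes can be seen with grep() and agrep() on a small table. This is a sketch only: the countries table and the lookup function are hypothetical and do not reproduce the data or code behind rgb_country_codes().

```r
# Hypothetical lookup table
countries <- data.frame(
  code = c("US", "GB", "AE", "DE"),
  name = c("United States", "United Kingdom",
           "United Arab Emirates", "Germany"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the fuzzy switch described above
lookup <- function(country_name, fuzzy = FALSE, ...) {
  hits <- if (fuzzy) {
    agrep(country_name, countries$name, ...)
  } else {
    grep(country_name, countries$name, ...)
  }
  countries[hits, ]
}

lookup("United")               # exact substring match
lookup("Unted")                # misspelling: no exact match
lookup("Unted", fuzzy = TRUE)  # approximate match tolerates the typo
```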

Examples

## Not run: 
rgb_country_codes(country_name="United")

## End(Not run)

Defunct functions in rgbif

Description

Details

The above functions have been removed. See https://github.com/ropensci/rgbif and poke around the code if you want to find the old functions in previous versions of the package


Get the possible values to be used for (taxonomic) rank arguments in GBIF API methods.

Description

Get the possible values to be used for (taxonomic) rank arguments in GBIF API methods.

Usage

taxrank()

Examples

## Not run: 
taxrank()

## End(Not run)

Parse WKT into smaller bits

Description

Parse WKT into smaller bits

Usage

wkt_parse(wkt, geom_big = "bbox", geom_size = 40, geom_n = 10)

Arguments

wkt

(character) A WKT string. Required.

geom_big

(character) Only "bbox" works since rgbif 3.8.0.

geom_size

(integer) An integer indicating size of the cell. Default: 40.

geom_n

(integer) An integer indicating number of cells in each dimension. Default: 10.
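Reducing a polygon to its bounding box means taking the minimum and maximum of the x and y coordinates and writing them back as a rectangle. The base R sketch below illustrates that idea. wkt_bbox is a hypothetical helper, not the implementation of wkt_parse(), and it assumes a simple polygon with plain decimal coordinates (no scientific notation, no holes).

```r
# Hypothetical helper: bounding box of a simple WKT polygon, as WKT
wkt_bbox <- function(wkt) {
  # pull out every number, then pair them up as x/y coordinates
  nums <- as.numeric(
    regmatches(wkt, gregexpr("-?[0-9]+\\.?[0-9]*", wkt))[[1]]
  )
  xy <- matrix(nums, ncol = 2, byrow = TRUE)
  x <- range(xy[, 1])
  y <- range(xy[, 2])
  sprintf("POLYGON((%s %s,%s %s,%s %s,%s %s,%s %s))",
          x[1], y[1], x[2], y[1], x[2], y[2], x[1], y[2], x[1], y[1])
}

wkt_bbox("POLYGON((30.1 10.1,40 40,20 40,10 20,30.1 10.1))")
```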

Examples

wkt <- "POLYGON((13.26349675655365 52.53991761181831,18.36115300655365 54.11445544219924,
21.87677800655365 53.80418956368524,24.68927800655365 54.217364774722455,28.20490300655365
54.320018299365124,30.49005925655365 52.85948216284084,34.70880925655365 52.753220564427814,
35.93927800655365 50.46131871049754,39.63068425655365 49.55761261299145,40.86115300655365
46.381388009130845,34.00568425655365 45.279102926537,33.30255925655365 48.636868465271846,
30.13849675655365 49.78513301801265,28.38068425655365 47.2236377039631,29.78693425655365
44.6572866068524,27.67755925655365 42.62220075124676,23.10724675655365 43.77542058000212,
24.51349675655365 47.10412345120368,26.79865300655365 49.55761261299145,23.98615300655365
52.00209943876426,23.63459050655365 49.44345313705238,19.41584050655365 47.580567827212114,
19.59162175655365 44.90682206053508,20.11896550655365 42.36297154876359,22.93146550655365
40.651849782081555,25.56818425655365 39.98171166226459,29.61115300655365 40.78507856230178,
32.95099675655365 40.38459278067577,32.95099675655365 37.37491910393631,26.27130925655365
33.65619609886799,22.05255925655365 36.814081996401605,18.71271550655365 36.1072176729021,
18.53693425655365 39.16878677351903,15.37287175655365 38.346355762190846,15.19709050655365
41.578843777436326,12.56037175655365 41.050735748143424,12.56037175655365 44.02872991212046,
15.19709050655365 45.52594200494078,16.42755925655365 48.05271546733352,17.48224675655365
48.86865641518059,10.62677800655365 47.817178329053135,9.57209050655365 44.154980365192,
8.16584050655365 40.51835445724746,6.05646550655365 36.53210972067291,0.9588092565536499
31.583640057148145,-5.54509699344635 35.68001485298146,-6.77556574344635 40.51835445724746,
-9.41228449344635 38.346355762190846,-12.40056574344635 35.10683619158607,-15.74040949344635
38.07010978950028,-14.68572199344635 41.31532459432774,-11.69744074344635 43.64836179231387,
-8.88494074344635 42.88035509418534,-4.31462824344635 43.52103366008421,-8.35759699344635
47.2236377039631,-8.18181574344635 50.12441989397795,-5.01775324344635 49.55761261299145,
-2.73259699344635 46.25998980446569,-1.67790949344635 44.154980365192,-1.32634699344635
39.30493590580802,2.18927800655365 41.44721797271696,4.47443425655365 43.26556960420879,
2.18927800655365 46.7439668697322,1.83771550655365 50.3492841273576,6.93537175655365
49.671505849335254,5.00177800655365 52.32557322466785,7.81427800655365 51.67627099802223,
7.81427800655365 54.5245591562317,10.97834050655365 51.89375191441792,10.97834050655365
55.43241335888528,13.26349675655365 52.53991761181831))"
wkt <- gsub("\n", " ", wkt)

if (requireNamespace("sf", quietly=TRUE)) {
# to a bounding box in wkt format
wkt_parse(wkt, geom_big = "bbox")

}