Symbiota is an open-source content management system built to integrate virtual biodiversity databases. Currently used by over 700 natural history collections containing more than 30 million specimens, Symbiota is an essential tool for digitizing biological specimen data. Symbiota2 is an improved, refactored version of the original Symbiota core code structure, redesigned in response to user feedback to expand modularity and accessibility. While packages do exist for accessing Symbiota portals (for instance, see the rSymbiota package), R users currently cannot access the data offered by Symbiota2. Here, we describe SymbiotaR2, a package built to address this need by allowing users to access Symbiota2 portals from an R environment.
Below, we provide a general workflow for using SymbiotaR2, a description of the command structure, code for installing the package, and examples of using SymbiotaR2 functions. The Symbiota2 software is hosted on GitHub, and instructions for setting up a new Symbiota2 portal can be found on the documentation site. Finally, a review of the original Symbiota platform is offered in Gries et al., 2014.
SymbiotaR2 allows R users to download data from specified Symbiota2 portals, granting access to thousands of digitized flora and fauna specimen records across the United States. It does this by querying endpoints in the Symbiota2 API, then downloading a JSON object containing the requested data to a temporary directory on the local computer. The JSON object is then converted into an R format that is straightforward and easy to use. The general argument structure of all SymbiotaR2 functions allows users to specify whether they want to pull a single resource or a collection of resources from the API.
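To make the distinction between a single resource and a collection concrete, the sketch below builds the two kinds of request URL in base R. The helper build_endpoint is hypothetical and is not part of SymbiotaR2. The /taxa/12 path mirrors the @id field shown in the Taxa example later in this document, while the ?page= query string is an assumption about how the API paginates collections.

```r
# Hypothetical helper (not part of SymbiotaR2): build the request URL for
# either a single resource (id) or one page of a collection (page).
build_endpoint <- function(url, family, id = NULL, page = NULL) {
  stopifnot(is.character(url), length(url) == 1)
  base <- sub("/+$", "", url)  # drop any trailing slash
  if (!is.null(id)) {
    # Single resource, e.g. <portal>/api/taxa/12
    return(paste(base, family, id, sep = "/"))
  }
  if (is.null(page)) page <- 1  # fall back to the first page
  # Assumption: collections are paginated with a ?page= query parameter
  paste0(base, "/", family, "?page=", page)
}

myURL <- "http://imaginary-symbiota2-portal.com/api"
single_url <- build_endpoint(myURL, "taxa", id = 12)
collection_url <- build_endpoint(myURL, "taxa", page = 2)
print(single_url)
print(collection_url)
```

No request is sent here; the sketch only shows how one argument (id or page) could select between the two kinds of call.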
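In general, there are four steps for using SymbiotaR2:
1. Load the SymbiotaR2 package (see the installation instructions below).
2. Find the function corresponding to the kind of data you wish to pull from the Symbiota2 portal (e.g., Coordinates for co-ordinate data).
3. If you are unsure which function you need, browse the package index with library(help=SymbiotaR2).
4. Call the function, specifying the Symbiota2 portal with the url argument (see Example and Portal Specification below).

SymbiotaR2 can be downloaded by calling: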
library(devtools)
install_github("pearselab/SymbiotaR2")
Once it has passed peer review, you will be able to install it by running:
install.packages("SymbiotaR2")
Load the package using:
library(SymbiotaR2)
SymbiotaR2_setup will save to your .Rprofile a default URL, for automatic reference. Specifying a different url argument will let you refer to a portal besides the default. The code below demonstrates this:
SymbiotaR2_setup("http://imaginary-symbiota-portal.com/api", append=TRUE)
Coordinates() # Download from http://imaginary-symbiota-portal.com/api
Coordinates("http://another-imaginary-portal.com/api") # Download from a different portal
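The fallback from an explicit url argument to a saved default can be pictured with the following sketch. It assumes the default is stored as an R option. The option name SymbiotaR2_demo_url and the helper resolve_url are hypothetical stand-ins, not the package's actual internals.

```r
# Hypothetical sketch of default-portal resolution (not SymbiotaR2 internals).
# Assumption: the default URL is stored as an R option, as a line in
# .Rprofile could do via options().
resolve_url <- function(url = NULL) {
  if (is.null(url)) {
    url <- getOption("SymbiotaR2_demo_url")  # hypothetical option name
  }
  if (is.null(url)) {
    stop("No url supplied and no default portal has been set")
  }
  url
}

options(SymbiotaR2_demo_url = "http://imaginary-symbiota-portal.com/api")
default_portal <- resolve_url()  # falls back to the stored default
other_portal <- resolve_url("http://another-imaginary-portal.com/api")  # explicit url wins
```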
SymbiotaR2 consists of commands pulling from the Checklists, Collections, Crowdsource, Exsiccati, Glossary, ImageProcessor, Key, Media, Occurrence, Reference, Taxa, Traits, and UserRoles API families of the specified Symbiota2 portal. Note that because each Symbiota2 portal owner can load their own plugins into the API, SymbiotaR2 may not cover every API endpoint offered by a given Symbiota2 instance.
Below, we provide an example of pulling a single Taxa resource into the R environment by specifying an id argument in the command call (using a random, nonexistent URL). Please note that this example will not run as written (users need to specify a working Symbiota2 portal they can access), but it is included to demonstrate typical usage:
myURL <- "http://imaginary-symbiota2-portal.com/api"
myTaxa <- Taxa(id = 12, url = myURL)
str(myTaxa)
List of 23
$ @context : chr "/api/contexts/Taxa"
$ @id : chr "/api/taxa/12"
$ @type : chr "Taxa"
$ id : num 12
$ rankId : chr "/api/taxa/ranks/31"
$ scientificName : chr "Polygonum bistortoides"
$ unitIndicator1 : logi NA
$ unitName1 : chr "Polygonum"
$ unitIndicator2 : logi NA
$ unitName2 : chr "bistortoides"
$ unitIndicator3 : logi NA
$ unitName3 : logi NA
$ author : chr "Pursh"
$ phylogenySortSequence: logi NA
$ status : chr "AZTT-USDA Plants consistant"
$ source : logi NA
$ notes : logi NA
$ hybrid : logi NA
$ securityStatus : num 0
$ modifiedTimestamp : logi NA
$ initialTimestamp : chr "2019-01-11T21:44:39+00:00"
$ modifiedUserId : logi NA
$ taxaAuthorityId : list()
To bring a collection of resources from the Symbiota2 API into the R environment, specify the page argument in place of id to retrieve a list of resources (here, as a data.frame):
myURL <- "http://imaginary-symbiota2-portal.com/api"
myCoordinates <- Coordinates(page = 1, url = myURL)
str(myCoordinates)
'data.frame': 5 obs. of 2 variables:
$ latitude : num 32.2 32.2 32.2 32.2 32.2
$ longitude: num -111 -111 -111 -111 -111
If neither an id nor a page argument is provided, the functions return the list of resources at page = 1. Once downloaded, these R objects can be manipulated as needed for any downstream processes.
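As an illustration of such downstream processing, the sketch below works on mock objects shaped like the two results above. The values are typed in by hand (no portal is contacted), and the coordinate rows are invented for the example.

```r
# Mock objects mirroring the structures shown above; no portal is contacted.
myTaxa <- list(id = 12,
               scientificName = "Polygonum bistortoides",
               unitName1 = "Polygonum",
               unitName2 = "bistortoides",
               author = "Pursh",
               notes = NA)
myCoordinates <- data.frame(latitude  = c(32.2, 32.2, 32.3, 32.4, 32.2),
                            longitude = c(-111.0, -110.9, -111.0, -111.1, -110.8))

# Drop fields the portal left empty (returned as NA)
filled <- myTaxa[!vapply(myTaxa, function(x) all(is.na(x)), logical(1))]

# Build a display name from two fields of the list
full_name <- paste(filled$scientificName, filled$author)

# Keep only the records north of a chosen latitude
northern <- subset(myCoordinates, latitude > 32.25)
print(names(filled))
print(full_name)
print(nrow(northern))
```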
The code for SymbiotaR2 is structured hierarchically, and includes parameter type checking to ensure arguments are provided in the proper format. Additionally, all commands include a two-step URL check (.check.url).
The second step consists of an API call made at the end of the URL check. If either step fails, the error below will be triggered:
Error in .check.url(badURL) :
URL http://incorrect-portal-address.com/api cannot be reached; is it a valid Symbiota2 portal API?
If this error is received, make sure your portal address is spelled correctly. Note that functions are designed such that a forward slash (/) at the end of the URL is optional. If your URL is correctly spelled, make sure that the Symbiota2 portal manager has allowed you access to the portal.
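The kind of argument checking and slash handling described here can be sketched in base R as follows. The function normalise_url is a hypothetical illustration, not the package's .check.url, and it deliberately skips the network step of contacting the portal.

```r
# Hypothetical illustration (not the package's .check.url): validate the
# argument type and make a trailing slash optional. No network call is made.
normalise_url <- function(url) {
  if (!is.character(url) || length(url) != 1 || is.na(url)) {
    stop("'url' must be a single character string")
  }
  # Ensure exactly one trailing slash, whether or not one was supplied
  paste0(sub("/+$", "", url), "/")
}

without_slash <- normalise_url("http://imaginary-symbiota2-portal.com/api")
with_slash <- normalise_url("http://imaginary-symbiota2-portal.com/api/")
identical(without_slash, with_slash)  # both spellings normalise to one form

# A non-character argument is rejected before any request would be made
result <- tryCatch(normalise_url(42), error = function(e) conditionMessage(e))
print(result)
```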
All SymbiotaR2 functions come with tests, both for pulling a single Symbiota2 resource (using the id argument) and for pulling a collection of resources (using page). Tests for each function are contained in the tests/testthat directory.
Running these tests requires access to a fully configured Symbiota2 test instance, complete with demo data. Setting up such an instance is time-consuming, and running the tests against it costs both time and bandwidth. We therefore release cached data downloads, generated using vcr, for use with this package. Information about the vcr package can be found on the vcr page on GitHub.
To run the package tests, do the following:
1. Run R CMD build SymbiotaR2 from the command line.
2. Run R CMD check SymbiotaR2_0.0-1.tar.gz from the command line.

If you want to add new tests, or new functions that address new API endpoints, do the following:
1. Create a new file in tests/testthat for your tests if none of the existing files fits. Otherwise, add to one of the existing files.
2. Wrap each API call in a vcr cassette. Note that the folder fixtures contains the cassettes, and that SymbiotaR2 makes use of the file tests/testthat/helper-SymbiotaR2.R to set up the automatic tests.
3. Set the url variable at the top of the script to be wherever your test instance is. When committing your code to submit a pull request (see the final step), change it to the address at the top of the other tests (currently http://a02235015-6.bluezone.usu.edu/api/).
4. Submit a pull request to the master branch of this repository. Please use the pull request template and follow the contributor guidelines.

Here is an example of what a piece of testing code may look like:
context("AccessStats")
vcr::use_cassette(name = "AccessStats_id", {
data <- AccessStats(id = 4, url = url)
})
test_that("AccessStats_id", {
expect_equal(length(data), 12)
expect_type(data, "list")
})
The data <- AccessStats(id = 4, url = url) line is the Symbiota2 call, and the test_that block below it contains the test conditions: here, that the data object is a list of length 12.