| Title: | Interface to 'the CAVD DataSpace' |
|---|---|
| Description: | Provides a convenient API interface to access immunological data within 'the CAVD DataSpace' (<https://dataspace.cavd.org>), a data sharing and discovery tool that facilitates exploration of HIV immunological data from pre-clinical and clinical HIV vaccine studies. |
| Authors: | Ju Yeong Kim [aut], Sean Hughes [rev], Jason Taylor [aut, cre], Helen Miller [aut], Kellie MacPhee [rev], CAVD DataSpace [cph] |
| Maintainer: | Jason Taylor <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-07-01 08:22:58 UTC |
| Source: | https://github.com/ropensci/DataSpaceR |
DataSpaceR provides a convenient API for accessing datasets within the DataSpace database.
Uses the Rlabkey package to connect to DataSpace. Implements convenient methods for accessing datasets.
Ju Yeong Kim
Check that there is a netrc file with a valid entry for the CAVD DataSpace.
checkNetrc(netrcFile = getNetrcPath(), onStaging = FALSE, verbose = TRUE)
| Argument | Description |
|---|---|
| netrcFile | A character. File path to the netrc file to check. |
| onStaging | A logical. Whether to check the staging server instead of the production server. |
| verbose | A logical. Whether to print extra details for troubleshooting. |
The name of the netrc file
    ## Not run:
    checkNetrc()
    ## End(Not run)
Constructor for DataSpaceConnection
connectDS(login = NULL, password = NULL, verbose = FALSE, onStaging = FALSE)
| Argument | Description |
|---|---|
| login | A character. Optional. If there is no netrc file, a temporary one can be written by passing the login and password of an active DataSpace account. |
| password | A character. Optional. The password for the selected login. |
| verbose | A logical. Whether to print extra details for troubleshooting. |
| onStaging | A logical. Whether to connect to the staging server instead of the production server. |
Instantiates a DataSpaceConnection.
The constructor will try to take the values of the various labkey.*
parameters from the global environment. If they don't exist, it will use
default values. These are assigned to 'options', which are then used by the
DataSpaceConnection class.
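The lookup order described here (global environment first, then a default, then 'options') can be sketched in base R. This is only an illustration of the pattern, not the package's actual code: the helper `resolveSetting()`, the variable name `labkey.url.base`, and the default URL are assumptions made for this example.

```r
# Hypothetical helper: return a variable from the global environment if it
# exists there, otherwise fall back to the supplied default.
resolveSetting <- function(name, default) {
  get0(name, envir = globalenv(), inherits = FALSE, ifnotfound = default)
}

# Assumed variable name and default value, for illustration only.
baseUrl <- resolveSetting("labkey.url.base", "https://dataspace.cavd.org")

# Store the resolved value in options so later code can read it.
options(labkey.url.base = baseUrl)
getOption("labkey.url.base")
```

Defining the variable in the global environment before running the sketch makes the helper pick it up instead of the default.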
an instance of DataSpaceConnection
    ## Not run:
    con <- connectDS()
    ## End(Not run)
An R6 class for DataSpace browsing and fetching data in DataSpace.
| Field | Description |
|---|---|
| config | A list. Stores configuration of the connection object such as URL, path and username. |
| availableStudies | A data.table of available studies. |
| availableGroups | A data.table of available groups. |
| availableMabs | A data.table of available mAbs. |
| availableMabMixtures | A data.table. Metadata of available mAb mixtures. |
| availableDonors | A data.table. Metadata about all mAb donors in the DataSpace. |
| availableViruses | A data.table of metadata about all viruses in the DataSpace and virus name synonyms. |
| availablePublications | A data.table of available publications metadata and available datasets. |
| virusNameMappingTables | A list of data.tables containing virus name mappings. |
| mabGridSummary | Defunct. Use 'availableMabs'. |
| mabGrid | Defunct. Use 'availableMabs'. |
| virusMetadata | Defunct. Use 'virusNameMappingTables'. |
new()
Initialize a DataSpaceConnection object.
See connectDS.
DataSpaceConnection$new( login = NULL, password = NULL, verbose = FALSE, onStaging = FALSE )
| Argument | Description |
|---|---|
| login | A character. Optional. If there is no netrc file, a temporary one can be written by passing the login and password of an active DataSpace account. |
| password | A character. Optional. The password for the selected login. |
| verbose | A logical. Whether to print extra details for troubleshooting. |
| onStaging | A logical. Whether to connect to the staging server instead of the production server. |
A new 'DataSpaceConnection' object.
print()
Print the DataSpaceConnection object.
DataSpaceConnection$print()
getStudies()
Create a 'DataSpaceStudies' object.
DataSpaceConnection$getStudies(availableStudies = self$availableStudies)
| Argument | Description |
|---|---|
| availableStudies | An 'availableStudies' object, or a vector of 'study_id' values. |
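The 'availableStudies' argument accepts either a table or a plain vector of IDs. One way such an argument could be normalized is sketched below in base R. The helper `resolveIds()` and the mock table are hypothetical and are not part of DataSpaceR; the study IDs are borrowed from the examples in this manual.

```r
# Hypothetical helper: accept either a table with an ID column or a vector
# of IDs, and always return a character vector of unique IDs.
resolveIds <- function(x, idColumn = "study_id") {
  if (is.data.frame(x)) {
    # data.table objects are also data.frames, so this branch covers both.
    x <- x[[idColumn]]
  }
  unique(as.character(x))
}

# Mock stand-in for an 'availableStudies' table (made-up rows).
mockStudies <- data.frame(
  study_id = c("cvd408", "vtn505"),
  species = c("Human", "Human")
)

resolveIds(mockStudies)
resolveIds(c("cvd408", "cvd408"))
```

The same idea would apply to the other "get" methods, with `idColumn` swapped for the relevant ID column.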
getGroups()
Create a 'DataSpaceGroups' object.
DataSpaceConnection$getGroups(availableGroups = self$availableGroups)
| Argument | Description |
|---|---|
| availableGroups | An 'availableGroups' object, or a vector of 'group_id' values. |
getMabs()
Create a 'DataSpaceMabs' object.
DataSpaceConnection$getMabs( availableMabs = self$availableMabs, includeMixtures = "yes" )
| Argument | Description |
|---|---|
| availableMabs | An 'availableMabs' or 'availableMabMixtures' object, or a vector of 'mab_id' values. 'mab_id' values are inferred from 'availableMabMixtures' objects. |
| includeMixtures | Whether or not to include mAb mixtures. "yes", "no", or "only" are valid. The default, "yes", will return any available mAb mixtures for any mAb passed here. |
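A three-valued option like 'includeMixtures' is commonly validated with `match.arg()`. The base R sketch below shows that pattern on a mock table. The function `filterByMixture()`, the `is_mixture` column, and the reading of "no" (drop mixtures) and "only" (keep only mixtures) are assumptions inferred from the option names, not confirmed package behavior.

```r
# Hypothetical filter over a mock table; is_mixture is a made-up column.
filterByMixture <- function(tbl, includeMixtures = c("yes", "no", "only")) {
  # match.arg() rejects anything other than the three documented values.
  includeMixtures <- match.arg(includeMixtures)
  switch(includeMixtures,
    yes  = tbl,
    no   = tbl[!tbl$is_mixture, , drop = FALSE],
    only = tbl[tbl$is_mixture, , drop = FALSE]
  )
}

# Made-up rows: two single mAbs and one mixture of both.
mockMabs <- data.frame(
  id = c("mabA", "mabB", "mabA + mabB"),
  is_mixture = c(FALSE, FALSE, TRUE)
)

filterByMixture(mockMabs, "only")
```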
getDonors()
Create a 'DataSpaceDonors' object.
DataSpaceConnection$getDonors(availableDonors = self$availableDonors)
| Argument | Description |
|---|---|
| availableDonors | An 'availableDonors' object, or a vector of 'donor_id' values. |
getDaash()
Create a 'DataSpaceDaash' object.
DataSpaceConnection$getDaash(availableDaash = NULL)
| Argument | Description |
|---|---|
| availableDaash | An 'availableMabs' or 'availableDonors' object, or a vector of 'sequence_id' values. |
downloadPublicationData()
Download study related publication datasets.
DataSpaceConnection$downloadPublicationData( availablePublications = NULL, downloadDir = tempdir() )
| Argument | Description |
|---|---|
| availablePublications | An 'availablePublications' object or a vector of 'publication_id' values. |
| downloadDir | A character. Optional. Specifies the directory to download nonstandard datasets to. Defaults to the R session temp directory. |
getStudy()
Defunct. Use 'getStudies'.
DataSpaceConnection$getStudy()
getGroup()
Defunct. Use 'getGroups'.
DataSpaceConnection$getGroup()
getMab()
Defunct. Use 'getMabs'.
DataSpaceConnection$getMab()
filterMabGrid()
Defunct. Use 'availableMabs'.
DataSpaceConnection$filterMabGrid()
resetMabGrid()
Defunct. Use 'availableMabs'.
DataSpaceConnection$resetMabGrid()
refresh()
Refresh the connection object to update available studies and groups.
DataSpaceConnection$refresh()
clone()
The objects of this class are cloneable with this method.
DataSpaceConnection$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
    ## Not run:
    # Create a connection (Initiate a DataSpaceConnection object)
    con <- connectDS()
    # View available data
    con$availableStudies
    con$availableGroups
    con$availablePublications
    con$availableMabs
    con$availableMabMixtures
    con$availableDonors
    con$availableViruses
    # Pass an available object to a "get" method to get data
    cvd408 <- con$availableStudies[study_id == "cvd408"] |>
      con$getStudies()
    cd4Mabs <- con$availableMabs[grepl("CD4bs", mab_ab_binding_type)] |>
      con$getMabs()
    ## End(Not run)
An R6 class for DataSpace DAASH data.
DataSpaceConnection$getDaash()
DataSpaceR::DataSpaceConnection -> DataSpaceDaash
| Field | Description |
|---|---|
| mabMetadata | A data.table of mAbs with metadata found in the object. |
| donorMetadata | A data.table of donors with metadata found in the object. |
| daashMetadata | A data.table showing the donor and mAb metadata with CDS sequence_id values for the loaded DAASH dataset. |
| availableStructures | A data.table showing the mAb structures available to download. |
| datasets | A list of DAASH datasets loaded to the DAASH object. |
| variableDefinitions | A data.table of variable definitions. |
Inherited methods: DataSpaceR::DataSpaceConnection$downloadPublicationData(), DataSpaceR::DataSpaceConnection$filterMabGrid(), DataSpaceR::DataSpaceConnection$getDaash(), DataSpaceR::DataSpaceConnection$getDonors(), DataSpaceR::DataSpaceConnection$getGroup(), DataSpaceR::DataSpaceConnection$getGroups(), DataSpaceR::DataSpaceConnection$getMab(), DataSpaceR::DataSpaceConnection$getMabs(), DataSpaceR::DataSpaceConnection$getStudies(), DataSpaceR::DataSpaceConnection$getStudy(), DataSpaceR::DataSpaceConnection$resetMabGrid()
new()
Initialize a DataSpaceDaash object.
See DataSpaceConnection.
DataSpaceDaash$new(availableDaash)
| Argument | Description |
|---|---|
| availableDaash | An 'availableMabs' or 'availableDonors' object, or a vector of 'sequence_id' values. |
| config | A list. |
print()
Print the DataSpaceDaash object summary.
DataSpaceDaash$print()
getFastaFromSequences()
Return a fasta file for available daash sequences that have been loaded to the current object.
DataSpaceDaash$getFastaFromSequences( sequenceType = "nt", originalHeaders = FALSE, path = NULL )
| Argument | Description |
|---|---|
| sequenceType | A character. The type of fasta file to return: "nt" for nucleotide, "aa" for amino acid. |
| originalHeaders | A logical. Whether the original fasta headers should be provided. |
| path | The path to save the fasta files to. If left as the default, NULL, the fasta file is returned as a character vector. |
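The 'path' behavior (return a character vector when NULL, otherwise write a file) can be illustrated with a small base R sketch of the FASTA layout. The helper `toFasta()` and the sequences are made up for this example and do not reflect how DataSpaceR builds its output.

```r
# Hypothetical helper: turn IDs and sequences into FASTA-formatted lines.
toFasta <- function(ids, sequences, path = NULL) {
  stopifnot(length(ids) == length(sequences))
  # Each record is a ">" header line followed by its sequence line;
  # rbind() plus as.vector() interleaves headers and sequences.
  lines <- as.vector(rbind(paste0(">", ids), sequences))
  if (is.null(path)) {
    return(lines)
  }
  writeLines(lines, path)
  invisible(path)
}

# Made-up sequence IDs and nucleotide sequences.
fasta <- toFasta(c("seq1", "seq2"), c("ATGGCC", "TTGACA"))
fasta
```

Passing a file path as the third argument writes the same lines to disk instead of returning them.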
downloadAntibodyStructures()
Saves all antibody structures associated with the daash object's 'availableStructures' object.
DataSpaceDaash$downloadAntibodyStructures(path = tempdir(), mab_id = NULL)
| Argument | Description |
|---|---|
| path | The directory to export fasta files to. |
| mab_id | A subset of mab_ids to export. If left as the default, NULL, all structures in availableStructures are downloaded. |
refresh()
Refresh the DataSpaceDaash object to update datasets.
DataSpaceDaash$refresh()
clone()
The objects of this class are cloneable with this method.
DataSpaceDaash$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
    ## Not run:
    # Create a connection (Initiate a DataSpaceConnection object)
    con <- connectDS()
    # Get the daash object using either an availableMabs or
    # availableDonors object.
    daash <- con$availableMabs[mab_ab_binding_type %like% "CD4"] |>
      con$getDaash()
    # To get lineage sequences, query donors, then pipe available
    # donors to the connection getDaash object.
    daash <- con$availableDonors[
      lineage_sequences_available == TRUE & mab_count < 10,
    ] |>
      con$getDaash()
    # Inspect what datasets are available
    names(daash$datasets)
    # Inspect the `topCalls` dataset
    daash$datasets$topCalls
    ## End(Not run)
An R6 class for DataSpace MAb Donor data.
DataSpaceConnection$getDonors()
DataSpaceR::DataSpaceConnection -> DataSpaceDonors
| Field | Description |
|---|---|
| mabMetadata | A data.table of mAbs with metadata found in the object. |
| donorMetadata | A data.table of donors with metadata found in the object. |
| datasets | A list of data.table objects containing the related data loaded. |
| variableDefinitions | A data.table of variable definitions. |
Inherited methods: DataSpaceR::DataSpaceConnection$downloadPublicationData(), DataSpaceR::DataSpaceConnection$filterMabGrid(), DataSpaceR::DataSpaceConnection$getDaash(), DataSpaceR::DataSpaceConnection$getDonors(), DataSpaceR::DataSpaceConnection$getGroup(), DataSpaceR::DataSpaceConnection$getGroups(), DataSpaceR::DataSpaceConnection$getMab(), DataSpaceR::DataSpaceConnection$getMabs(), DataSpaceR::DataSpaceConnection$getStudies(), DataSpaceR::DataSpaceConnection$getStudy(), DataSpaceR::DataSpaceConnection$resetMabGrid()
new()
Initialize a DataSpaceDonors object.
See DataSpaceConnection.
DataSpaceDonors$new(donorIds)
| Argument | Description |
|---|---|
| donorIds | A character vector of 'donor_id' values. |
print()
Print the DataSpaceDonors object summary.
DataSpaceDonors$print()
loadDaash()
Load DAASH data to the object.
DataSpaceDonors$loadDaash()
refresh()
Refresh the 'DataSpaceDonors' object to update datasets.
DataSpaceDonors$refresh()
clone()
The objects of this class are cloneable with this method.
DataSpaceDonors$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
    ## Not run:
    # Create a connection (Initiate a DataSpaceConnection object)
    con <- connectDS()
    # Print available donors to the console
    con$availableDonors
    # Query the available donors object and pass that to `getDonors`
    # to get a DataSpaceDonors object
    donors <- con$availableDonors[lineage_sequences_available == TRUE & donor_clade == "B",] |>
      con$getDonors()
    # Load DAASH data to the object
    donors$loadDaash()
    ## End(Not run)
An R6 class for DataSpace Groups data.
DataSpaceConnection$getGroups()
DataSpaceR::DataSpaceConnection -> DataSpaceGroups
| Field | Description |
|---|---|
| availableDatasets | A data.table of datasets available in the object. |
| datasets | A list of data.table objects containing the availableDatasets that were loaded. |
| variableDefinitions | A data.table of variable definitions. |
Inherited methods: DataSpaceR::DataSpaceConnection$downloadPublicationData(), DataSpaceR::DataSpaceConnection$filterMabGrid(), DataSpaceR::DataSpaceConnection$getDaash(), DataSpaceR::DataSpaceConnection$getDonors(), DataSpaceR::DataSpaceConnection$getGroup(), DataSpaceR::DataSpaceConnection$getGroups(), DataSpaceR::DataSpaceConnection$getMab(), DataSpaceR::DataSpaceConnection$getMabs(), DataSpaceR::DataSpaceConnection$getStudies(), DataSpaceR::DataSpaceConnection$getStudy(), DataSpaceR::DataSpaceConnection$resetMabGrid()
new()
Initialize 'DataSpaceGroups' class.
See DataSpaceConnection.
DataSpaceGroups$new(groupIds = NULL)
| Argument | Description |
|---|---|
| groupIds | A character vector of 'group_id' values. |
print()
Print the DataSpaceGroups object.
DataSpaceGroups$print()
refresh()
Refresh loaded integrated datasets, and information of what datasets are available.
DataSpaceGroups$refresh()
clone()
The objects of this class are cloneable with this method.
DataSpaceGroups$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
    ## Not run:
    # Create a connection (Initiate a DataSpaceConnection object)
    con <- connectDS()
    # Get group by `group_id` or pass a filtered `availableGroups` object.
    groups <- con$getGroups(c(266, 267))
    groups <- con$availableGroups[label == "NYVAC durability comparison"] |>
      con$getGroups()
    # Retrieving group assay data for cvd408 from
    # DataSpace is done automatically when the groups object is created.
    groups$datasets$BAMA
    # Get variable information of the assay dataset
    groups$datasetDescription$BAMA
    ## End(Not run)
An R6 class for DataSpace MAb data.
DataSpaceConnection$getMabs()
DataSpaceR::DataSpaceConnection -> DataSpaceMabs
| Field | Description |
|---|---|
| mabMetadata | A data.table of mAbs with metadata found in the object. |
| donorMetadata | A data.table of donors with metadata found in the object. |
| mabMixMetadata | A data.table. A table of mAb mixtures with metadata found in this DataSpaceMabs instance. |
| mabMix | A data.table. A mapping table of mab_mix_id to mab_id found in this DataSpaceMabs instance. |
| datasets | A list of data.table objects containing the mAb-related datasets that were loaded. |
| variableDefinitions | A data.table of variable definitions. |
Inherited methods: DataSpaceR::DataSpaceConnection$downloadPublicationData(), DataSpaceR::DataSpaceConnection$filterMabGrid(), DataSpaceR::DataSpaceConnection$getDaash(), DataSpaceR::DataSpaceConnection$getDonors(), DataSpaceR::DataSpaceConnection$getGroup(), DataSpaceR::DataSpaceConnection$getGroups(), DataSpaceR::DataSpaceConnection$getMab(), DataSpaceR::DataSpaceConnection$getMabs(), DataSpaceR::DataSpaceConnection$getStudies(), DataSpaceR::DataSpaceConnection$getStudy(), DataSpaceR::DataSpaceConnection$resetMabGrid()
new()
Initialize a DataSpaceMabs object.
See DataSpaceConnection.
DataSpaceMabs$new(mabIds, includeMixtures)
| Argument | Description |
|---|---|
| mabIds | A character vector of 'mab_id' values. |
| includeMixtures | Whether or not to include mAb mixtures. "yes", "no", or "only" are valid. |
print()
Print the DataSpaceMabs object summary.
DataSpaceMabs$print()
loadDaash()
Load any available DAASH datasets.
DataSpaceMabs$loadDaash()
refresh()
Refresh the DataSpaceMabs object to update datasets.
DataSpaceMabs$refresh()
clone()
The objects of this class are cloneable with this method.
DataSpaceMabs$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
    ## Not run:
    # Create a connection (Initiate a DataSpaceConnection object)
    con <- connectDS()
    # Inspect available mabs, then pass subset to the `getMabs` method.
    vrc01 <- con$availableMabs[mab_name_std == "VRC01"] |>
      con$getMabs()
    # Inspect the `NABMAb` assay data.
    vrc01$datasets$NABMAb
    # Load DAASH data from mab object
    vrc01$loadDaash()
    # Inspect DAASH datasets
    vrc01$datasets$daash |>
      names()
    ## End(Not run)
An R6 class for DataSpace Study data.
DataSpaceConnection$getStudies()
DataSpaceR::DataSpaceConnection -> DataSpaceStudies
| Field | Description |
|---|---|
| studies | A character vector of 'study_id' values found in the object. |
| availableDatasets | A table of datasets available in the DataSpaceStudies object. |
| datasets | A list of data.table objects containing the availableDatasets that were loaded. |
| variableDefinitions | A list of data.table objects containing the data dictionaries of the integrated data loaded. |
| treatmentArm | A data.table. The table of treatment arm information for the connected studies. Not available for all study connections. |
| studyInfo | A list. Stores the information about the study. |
Inherited methods: DataSpaceR::DataSpaceConnection$downloadPublicationData(), DataSpaceR::DataSpaceConnection$filterMabGrid(), DataSpaceR::DataSpaceConnection$getDaash(), DataSpaceR::DataSpaceConnection$getDonors(), DataSpaceR::DataSpaceConnection$getGroup(), DataSpaceR::DataSpaceConnection$getGroups(), DataSpaceR::DataSpaceConnection$getMab(), DataSpaceR::DataSpaceConnection$getMabs(), DataSpaceR::DataSpaceConnection$getStudies(), DataSpaceR::DataSpaceConnection$getStudy(), DataSpaceR::DataSpaceConnection$resetMabGrid()
new()
Initialize a DataSpaceStudies object.
See DataSpaceConnection.
DataSpaceStudies$new(studyIds)
| Argument | Description |
|---|---|
| studyIds | A character. Name of the study to retrieve. |
print()
Print the DataSpaceStudies object.
DataSpaceStudies$print()
loadAvailableDatasets()
Load datasets to the studies object from an availableDatasets object.
DataSpaceStudies$loadAvailableDatasets( availableDatasets = self$availableDatasets, downloadDir = tempdir() )
| Argument | Description |
|---|---|
| availableDatasets | An 'availableDatasets' object or vector of 'study_id' values. |
| downloadDir | Optional. A character path specifying the directory to download nonstandard datasets to. The default is the R session temp directory. |
refresh()
Refresh the study object to update available datasets and treatment info.
DataSpaceStudies$refresh()
clone()
The objects of this class are cloneable with this method.
DataSpaceStudies$clone(deep = FALSE)
| Argument | Description |
|---|---|
| deep | Whether to make a deep clone. |
    ## Not run:
    # Create a connection (Initiate a DataSpaceConnection object)
    con <- connectDS()
    # Get studies by `study_id` or pass a filtered `availableStudies` object.
    studies <- con$getStudies(c("vtn505", "cvd408"))
    studies <- con$getStudies(
      con$availableStudies[grepl("BAMA", data_availability) & species == "Human"]
    )
    # Load BAMA to the studies object.
    studies$loadAssayDatasets("BAMA")
    studies$datasets$BAMA
    # Inspect variable information of the BAMA dataset
    studies$datasetDescriptions$BAMA
    # Inspect treatment arm information for all studies in study object
    studies$treatmentArm
    ## End(Not run)
Get a default netrc file path
getNetrcPath()
A character vector containing the default netrc file path
    ## Not run:
    getNetrcPath()
    ## End(Not run)
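How a default netrc path might be derived is sketched below in base R. This is not the package's implementation: `defaultNetrcPath()` is a hypothetical function, and the file names follow the usual curl convention (".netrc" on Unix-like systems, "_netrc" on Windows), which is assumed rather than confirmed for DataSpaceR.

```r
# Hypothetical re-implementation: build a netrc path in the home directory,
# choosing the file name by operating system type.
defaultNetrcPath <- function(home = Sys.getenv("HOME"),
                             osType = .Platform$OS.type) {
  fileName <- if (identical(osType, "windows")) "_netrc" else ".netrc"
  file.path(home, fileName)
}

defaultNetrcPath()
```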
Write a netrc file that is valid for accessing DataSpace.
writeNetrc(login, password, netrcFile = NULL, onStaging = FALSE, overwrite = FALSE)
| Argument | Description |
|---|---|
| login | A character. Email address used for logging in on DataSpace. |
| password | A character. Password associated with the login. |
| netrcFile | A character. Credentials will be written into that file. If left NULL, the netrc will be written into a temporary file. |
| onStaging | A logical. Whether to connect to the staging server instead of the production server. |
| overwrite | A logical. Whether to overwrite the existing netrc file. |
The database is accessed with the user's credentials.
A netrc file storing login and password information is required.
See here for instructions on how to register and set DataSpace credentials.
By default curl will look for the file in your home directory.
A character vector containing the netrc file path
    ## Not run:
    # First, create an account in the DataSpace App and read the terms of use
    # Next, create a netrc file using writeNetrc()
    writeNetrc(
      login = "[email protected]",
      password = "yourSecretPassword"
    )
    # Specify `netrcFile = getNetrcPath()` to write netrc in the default path
    ## End(Not run)
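For orientation, a netrc entry is a short plain-text record of machine, login and password. The base R sketch below writes such an entry to a temporary file and reads it back. It does not call writeNetrc(); the host name is assumed from the DataSpace URL, and the login and password are placeholders.

```r
# Write a netrc-style entry to a temporary file (never a real credential).
netrcFile <- tempfile(fileext = ".netrc")
entry <- c(
  "machine dataspace.cavd.org",   # assumed host name
  "login someone@example.org",    # placeholder login
  "password notARealPassword"     # placeholder password
)
writeLines(entry, netrcFile)

# Read the file back and split each line into a key and a value.
fields <- strsplit(readLines(netrcFile), " ", fixed = TRUE)
parsed <- vapply(fields, `[`, character(1), 2)
names(parsed) <- vapply(fields, `[`, character(1), 1)
parsed[["machine"]]
```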