This package provides a thin wrapper around Rlabkey and connects to the the CAVD DataSpace database, making it easier to fetch datasets from specific studies.
First, go to DataSpace now and set yourself up with an account.
In order to connect to the CAVD DataSpace via
DataSpaceR
, you will need a netrc
file in your
home directory that will contain a machine
name (hostname
of DataSpace), and login
and password
. There
are two ways to create a netrc
file.
writeNetrc
On your R console, create a netrc
file using a function
from DataSpaceR
:
writeNetrc(
login = "[email protected]",
password = "yourSecretPassword",
netrcFile = "/your/home/directory/.netrc" # use getNetrcPath() to get the default path
)
This will create a netrc
file in your home directory.
Make sure you have a valid login and password.
Alternatively, you can manually create a netrc file.
_netrc
.netrc
Sys.getenv("HOME")
in RThe following three lines must be included in the .netrc
or _netrc
file either separated by white space (spaces,
tabs, or newlines) or commas. Multiple such blocks can exist in one
file.
machine dataspace.cavd.org
login [email protected]
password supersecretpassword
See here
for more information about netrc
.
We’ll be looking at study cvd256
. If you want to use a
different study, change that string. You can instantiate multiple
connections to different studies simultaneously.
library(DataSpaceR)
con <- connectDS()
con
#> <DataSpaceConnection>
#> URL: https://dataspace.cavd.org
#> User: [email protected]
#> Available studies: 273
#> - 77 studies with data
#> - 5049 subjects
#> - 423195 data points
#> Available groups: 3
#> Available publications: 1530
#> - 12 publications with data
The call to connectDS
instantiates the connection.
Printing the object shows where it’s connected and the available
studies.
study_name | short_name | title | type | status | stage | species | start_date | strategy | network | data_availability | ni_data_availability |
---|---|---|---|---|---|---|---|---|---|---|---|
cor01 | NA | The correlate of risk targeted intervention study (CORTIS): A randomized, partially-blinded, clinical trial of isoniazid and rifapentine (3HP) therapy to prevent pulmonary tuberculosis in high-risk individuals identified by a transcriptomic correlate of risk | Phase III | Inactive | Assays Completed | Human | NA | NA | GH-VAP | NA | NA |
cvd232 | Parks_RV_232 | Limiting Dose Vaginal SIVmac239 Challenge of RhCMV-SIV vaccinated Indian rhesus macaques. | Pre-Clinical NHP | Inactive | Assays Completed | Rhesus macaque | 2009-11-24 | Vector vaccines (viral or bacterial) | CAVD | NA | NA |
cvd234 | Zolla-Pazner_Mab_test1 Study | Zolla-Pazner_Mab_Test1 | Antibody Screening | Inactive | Assays Completed | Non-Organism Study | 2009-02-03 | Prophylactic neutralizing Ab | CAVD | NA | NA |
cvd235 | mAbs potency | Weiss mAbs potency | Antibody Screening | Inactive | Assays Completed | Non-Organism Study | 2008-08-21 | Prophylactic neutralizing Ab | CAVD | NA | NA |
cvd236 | neutralization assays | neutralization assays | Antibody Screening | Active | In Progress | Non-Organism Study | 2009-02-03 | Prophylactic neutralizing Ab | CAVD | NA | NA |
cvd238 | Gallo_PA_238 | HIV-1 neutralization responses in chronically infected individuals | Antibody Screening | Inactive | Assays Completed | Non-Organism Study | 2009-01-08 | Prophylactic neutralizing Ab | CAVD | NA | NA |
con$availableStudies
shows the available studies in the
CAVD DataSpace. Check out the
reference page of DataSpaceConnection
for all available
fields and methods.
cvd256 <- con$getStudy("cvd256")
cvd256
#> <DataSpaceStudy>
#> Study: cvd256
#> URL: https://dataspace.cavd.org/CAVD/cvd256
#> Available datasets:
#> - Binding Ab multiplex assay
#> - Demographics
#> - Neutralizing antibody
#> Available non-integrated datasets:
con$getStudy
creates a connection to the study
cvd256
. Printing the object shows where it’s connected, to
what study, and the available datasets.
name | label | n | integrated |
---|---|---|---|
BAMA | Binding Ab multiplex assay | 6740 | TRUE |
Demographics | Demographics | 121 | TRUE |
NAb | Neutralizing antibody | 1419 | TRUE |
arm_id | arm_part | arm_group | arm_name | randomization | coded_label | last_day | description |
---|---|---|---|---|---|---|---|
cvd256-NA-A-A | NA | A | A | Vaccine | Group A Vaccine | 168 | DNA-C 4 mg administered IM at weeks 0, 4, and 8 AND NYVAC-C 10^7pfu/mL administered IM at week 24 |
cvd256-NA-B-B | NA | B | B | Vaccine | Group B Vaccine | 168 | DNA-C 4 mg administered IM at weeks 0 and 4 AND NYVAC-C 10^7pfu/mL administered IM at weeks 20 and 24 |
Available datasets and treatment arm information for the connection
can be accessed by availableDatasets
and
treatmentArm
.
We can grab any of the datasets listed in the connection
(availableDatasets
).
NAb <- cvd256$getDataset("NAb")
dim(NAb)
#> [1] 1419 33
colnames(NAb)
#> [1] "participant_id" "participant_visit" "visit_day"
#> [4] "assay_identifier" "summary_level" "specimen_type"
#> [7] "antigen" "antigen_type" "virus"
#> [10] "virus_type" "virus_insert_name" "clade"
#> [13] "neutralization_tier" "tier_clade_virus" "target_cell"
#> [16] "initial_dilution" "titer_ic50" "titer_ic80"
#> [19] "response_call" "nab_lab_source_key" "lab_code"
#> [22] "exp_assayid" "titer_id50" "titer_id80"
#> [25] "nab_response_id50" "nab_response_id80" "slope"
#> [28] "vaccine_matched" "study_prot" "virus_full_name"
#> [31] "virus_species" "virus_host_cell" "virus_backbone"
The cvd256 object is an R6
class,
so it behaves like a true object. Functions (like
getDataset
) are members of the object, thus the
$
semantics to access member functions.
We can get detailed variable information using
getDatasetDescription
. getDataset
and
getDatasetDescription
accept either the name
or label
field listed in
availableDatasets
.
fieldName | caption | type | description |
---|---|---|---|
ParticipantId | Participant ID | Text (String) | Subject identifier |
antigen | Antigen name | Text (String) | The name of the antigen (virus) being tested. |
antigen_type | Antigen type | Text (String) | The standardized term for the type of virus used in the construction of the nAb antigen. |
assay_identifier | Assay identifier | Text (String) | Name identifying assay |
clade | Virus clade | Text (String) | The clade (gene subtype) of the virus (antigen) being tested. |
exp_assayid | Experimental Assay Design Code | Integer | Unique ID assigned to the experiment design of the assay for tracking purposes. |
initial_dilution | Initial dilution | Number (Double) | Indicates the initial specimen dilution. |
lab_code | Lab ID | Text (String) | A code indicating the lab performing the assay. |
nab_lab_source_key | Data provenance | Integer | Details regarding the provenance of the assay results. |
nab_response_ID50 | Response call ID50 | True/False (Boolean) | Indicates if neutralization is detected based on ID50 titer. |
nab_response_ID80 | Response call ID80 | True/False (Boolean) | Indicates if neutralization is detected based on ID80 titer. |
neutralization_tier | Neutralization tier | Text (String) | A classification specific to HIV NAb assay design, in which an antigen is assessed for its ease of neutralization (1=most easily neutralized, 3=least easily neutralized) |
response_call | Response call | True/False (Boolean) | Indicates if neutralization is detected. |
slope | Slope | Number (Double) | The slope calculated using the difference between 50% and 80% neutralization. |
specimen_type | Specimen type | Text (String) | The type of specimen used in the assay. For nAb assays, this is generally serum or plasma. |
study_prot | Study Protocol | Text (String) | Study protocol |
summary_level | Data summary level | Text (String) | Defines the level at which the magnitude or response has been summarized (e.g. summarized at the isolate level). |
target_cell | Target cell | Text (String) | The cell line used in the assay to determine infection (lack of neutralization). Generally TZM-bl or A3R5, but can also be other cell lines or non-engineered cells. |
tier_clade_virus | Neutralization tier + Antigen clade + Virus | Text (String) | A combination of neutralization tier, antigen clade, and virus used for filtering. |
titer_ID50 | Titer ID50 | Number (Double) | The adjusted value of 50% maximal inhibitory dilution (ID50). |
titer_ID80 | Titer ID80 | Number (Double) | The adjusted value of 80% maximal inhibitory dilution (ID80). |
titer_ic50 | Titer IC50 | Number (Double) | The half maximal inhibitory concentration (IC50). |
titer_ic80 | Titer IC80 | Number (Double) | The 80% maximal inhibitory concentration (IC80). |
vaccine_matched | Antigen vaccine match indicator | True/False (Boolean) | Indicates if the interactive part of the antigen was designed to match the immunogen in the vaccine. |
virus | Virus name | Text (String) | The term for the virus (antigen) being tested. |
virus_backbone | Virus backbone | Text (String) | Indicates the backbone used to generate the virus if from a different plasmid than the envelope. |
virus_full_name | Virus full name | Text (String) | The full name of the virus used in the construction of the nAb antigen. |
virus_host_cell | Virus host cell | Text (String) | The host cell used to incubate the virus stock. |
virus_insert_name | Virus insert name | Text (String) | The amino acid sequence inserted in the virus construct. |
virus_species | Virus species | Text (String) | A classification for virus species using informal taxonomy. |
virus_type | Virus type | Text (String) | The type of virus used in the construction of the nAb antigen. |
visit_day | Visit Day | Integer | Target study day defined for a study visit. Study days are relative to Day 0, where Day 0 is typically defined as enrollment and/or first injection. |
To get only a subset of the data and speed up the download, filters
can be passed to getDataset
. The filters are created using
the makeFilter
function of the Rlabkey
package.
cvd256Filter <- makeFilter(c("visit_day", "EQUAL", "0"))
NAb_day0 <- cvd256$getDataset("NAb", colFilter = cvd256Filter)
dim(NAb_day0)
#> [1] 709 33
See ?makeFilter
for more information on the syntax.
To fetch data from multiple studies, create a connection at the project level.
This will instantiate a connection at the CAVD
level.
Most functions work cross study connections just like they do on single
studies.
You can get a list of datasets available across all studies.
cavd
#> <DataSpaceStudy>
#> Study: CAVD
#> URL: https://dataspace.cavd.org/CAVD
#> Available datasets:
#> - Binding Ab multiplex assay
#> - Demographics
#> - Enzyme-Linked ImmunoSpot
#> - Intracellular Cytokine Staining
#> - Neutralizing antibody
#> - PK MAb
#> Available non-integrated datasets:
knitr::kable(cavd$availableDatasets)
name | label | n | integrated |
---|---|---|---|
BAMA | Binding Ab multiplex assay | 170320 | TRUE |
Demographics | Demographics | 5049 | TRUE |
ELISPOT | Enzyme-Linked ImmunoSpot | 5610 | TRUE |
ICS | Intracellular Cytokine Staining | 195883 | TRUE |
NAb | Neutralizing antibody | 51382 | TRUE |
PKMAb | PK MAb | 3217 | TRUE |
In all-study connection, getDataset
will combine the
requested datasets. Note that in most cases, the datasets will have too
many subjects for quick data transfer, making filtering of the data a
necessity. The colFilter
argument can be used here, as
described in the getDataset
section.
conFilter <- makeFilter(c("species", "EQUAL", "Human"))
human <- cavd$getDataset("Demographics", colFilter = conFilter)
dim(human)
#> [1] 3142 36
colnames(human)
#> [1] "subject_id" "subject_visit"
#> [3] "species" "subspecies"
#> [5] "sexatbirth" "race"
#> [7] "ethnicity" "country_enrollment"
#> [9] "circumcised_enrollment" "bmi_enrollment"
#> [11] "agegroup_range" "agegroup_enrollment"
#> [13] "age_enrollment" "study_label"
#> [15] "study_start_date" "study_first_enr_date"
#> [17] "study_fu_complete_date" "study_public_date"
#> [19] "study_network" "study_last_vaccination_day"
#> [21] "study_type" "study_part"
#> [23] "study_group" "study_arm"
#> [25] "study_arm_summary" "study_arm_coded_label"
#> [27] "study_randomization" "study_product_class_combination"
#> [29] "study_product_combination" "study_short_name"
#> [31] "study_grant_pi_name" "study_strategy"
#> [33] "study_prot" "genderidentity"
#> [35] "studycohort" "bmi_category"
Check out the
reference page of DataSpaceStudy
for all available
fields and methods.
A group is a curated collection of participants from filtering of treatments, products, studies, or species, and it is created in the DataSpace App.
Let’s say you are using the App to filter and visualize data and want
to save them for later or explore in R with DataSpaceR
. You
can save a group by clicking the Save button on the Active Filter
Panel.
We can browse available the saved groups or the curated groups by
DataSpace Team via availableGroups
.
group_id | label | original_label | description | created_by | shared | n | studies |
---|---|---|---|---|---|---|---|
220 | NYVAC durability comparison | NYVAC_durability | Compare durability in 4 NHP studies using NYVAC-C (vP2010) and NYVAC-KC-gp140 (ZM96) products. | ehenrich | TRUE | 78 | cvd281, cvd434, cvd259, cvd277 |
228 | HVTN 505 case control subjects | HVTN 505 case control subjects | Participants from HVTN 505 included in the case-control analysis | drienna | TRUE | 189 | vtn505 |
230 | HVTN 505 polyfunctionality vs BAMA | HVTN 505 polyfunctionality vs BAMA | Compares ICS polyfunctionality (CD8+, Any Env) to BAMA mfi-delta (single Env antigen) in the HVTN 505 case control cohort | drienna | TRUE | 170 | vtn505 |
To fetch data from a saved group, create a connection at the project
level with a group ID. For example, we can connect to the “NYVAC
durability comparison” group which has group ID 220 by
getGroup
.
nyvac <- con$getGroup(220)
nyvac
#> <DataSpaceStudy>
#> Group: NYVAC durability comparison
#> URL: https://dataspace.cavd.org/CAVD
#> Available datasets:
#> - Binding Ab multiplex assay
#> - Demographics
#> - Enzyme-Linked ImmunoSpot
#> - Intracellular Cytokine Staining
#> - Neutralizing antibody
#> Available non-integrated datasets:
Retrieving a dataset is the same as before.
DataSpace maintains metadata about all viruses used in Neutralizing Antibody (NAb) assays. This data can be accessed through the app on the NAb antigen page and NAb MAb antigen page.
We can access this metadata in DataSpaceR
with
con$virusMetadata
:
assay_identifier | cds_virus_id | virus | virus_type | neutralization_tier | clade | antigen_control | virus_full_name | virus_name_other | virus_species | virus_host_cell | virus_backbone | panel_names |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NAB MAB | cds_1 | 0013095-2.11 | Env Pseudotype | 2 | NA | 0 | 0013095-2.11 [SG3Δenv] 293T/17 | NA | HIV | 293T/17 | SG3Δenv | Tiered diverse panel |
NAB MAB | cds_2 | 001428-2.42 | Env Pseudotype | 2 | C | 0 | 001428-2.42 [SG3Δenv] 293T/17 | NA | HIV | 293T/17 | SG3Δenv | Tiered diverse panel |
NAB MAB | cds_3 | 0041.v3.c18 | Env Pseudotype | 2 | C | 0 | 0041.v3.c18 [SG3Δenv] 293T/17 | 0041.V3.C18 | HIV | 293T/17 | SG3Δenv | NA |
NAB MAB | cds_4 | 0077.v1.c16 | Env Pseudotype | 2 | C | 0 | 0077.v1.c16 [SG3Δenv] 293T/17 | 0077.v1.c16 | HIV | 293T/17 | SG3Δenv | NA |
NAB | cds_252 | 00836-2.5 | Env Pseudotype | 1B | C | 0 | 00836-2.5 [SG3Δenv] 293T/17 | NA | HIV | 293T/17 | SG3Δenv | Tiered diverse panel |
NAB MAB | cds_5 | 0260.v5.c1 | Env Pseudotype | 2 | A | 0 | 0260.v5.c1 [SG3Δenv] 293T/17 | 0260.V5.C1 | HIV | 293T/17 | SG3Δenv | Tiered diverse panel |
See other vignette for a tutorial on accessing monoclonal antibody
data with DataSpaceR
:
DataSpace maintains a curated collection of relevant publications,
which can be accessed through the Publications
page through the app. Metadata about these publications can be
accessed through DataSpaceR
with
con$availablePublications
.
See Publication Data vignette for a tutorial on accessing publication data through DataSpaceR.
The followings are the tables of all fields and methods that work on
DataSpaceConnection
and DataSpaceStudy
objects
and could be used as a quick reference.
DataSpaceConnection
Name | Description |
---|---|
availableStudies |
The table of available studies. |
availableGroups |
The table of available groups. |
availablePublications |
The table of available publications. |
mabGrid |
The filtered mAb grid. |
mabGridSummary |
The summarized mAb grid with updated n_ columns and
geometric_mean_curve_ic50 . |
virusMetadata |
Metadata about all viruses in the DataSpace. |
filterMabGrid |
Filter rows in the mAb grid by specifying the values to keep in the
columns found in the mabGrid field. |
resetMabGrid |
Reset the mAb grid to the unfiltered state. |
getMab |
Create a DataSpaceMab object by filtered
mabGrid . |
getStudy |
Create a DataSpaceStudy object by study. |
getGroup |
Create a DataSpaceStudy object by group. |
downloadPublicationData |
Download data from a chosen publication. |
DataSpaceStudy
Name | Description |
---|---|
study |
The study name. |
group |
The group name. |
availableDatasets |
The table of datasets available in the study object. |
treatmentArm |
The table of treatment arm information for the connected study. Not available for all study connection. |
dataDir |
The default target directory for downloading non-integrated datasets. |
studyInfo |
Stores the information about the study. |
getDataset |
Get a dataset from the connection. |
getDatasetDescription |
Get variable information. |
setDataDir |
Set default target directory for downloading non-integrated datasets. |
DataSpaceMab
Name | Description |
---|---|
studyAndMabs |
The table of available mAbs by study. |
mabs |
The table of available mAbs and their attributes. |
nabMab |
The table of mAbs and their neutralizing measurements against viruses. |
studies |
The table of available studies. |
assays |
The table of assay status by study. |
variableDefinitions |
The table of variable definitions. |
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
#> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
#> [7] LC_PAPER=en_US.utf8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5 knitr_1.37
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8 digest_0.6.29 assertthat_0.2.1 R6_2.5.1
#> [5] jsonlite_1.8.0 magrittr_2.0.2 evaluate_0.15 highr_0.9
#> [9] httr_1.4.2 stringi_1.7.6 curl_4.3.2 tools_4.1.2
#> [13] stringr_1.4.0 Rlabkey_2.8.3 xfun_0.29 compiler_4.1.2