Accessing Non-Integrated Datasets

Many studies include data from assays which have not been integrated into the DataSpace. Some of these are available as “Non-Integrated Datasets,” which can be downloaded from the app as a zip file. DataSpaceR provides an interface for accessing non-integrated data from studies where it is available.

Viewing available non-integrated data

Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example.

library(DataSpaceR)
con <- connectDS()
vtn505 <- con$getStudy("vtn505")
vtn505
#> <DataSpaceStudy>
#>   Study: vtn505
#>   URL: https://dataspace.cavd.org/CAVD/vtn505
#>   Available datasets:
#>     - Binding Ab multiplex assay
#>     - Demographics
#>     - Intracellular Cytokine Staining
#>     - Neutralizing antibody
#>   Available non-integrated datasets:
#>     - ADCP
#>     - Demographics (Supplemental)
#>     - Fc Array

The print method on the study object will list available non-integrated datasets. The availableDatasets property shows some more info about available datasets, with the integrated field indicating whether the data is integrated. The value for n will be NA for non-integrated data until the dataset has been loaded.

knitr::kable(vtn505$availableDatasets)
name label n integrated
BAMA Binding Ab multiplex assay 10260 TRUE
Demographics Demographics 2504 TRUE
ICS Intracellular Cytokine Staining 22684 TRUE
NAb Neutralizing antibody 628 TRUE
ADCP ADCP NA FALSE
DEM SUPP Demographics (Supplemental) NA FALSE
Fc Array Fc Array NA FALSE

Loading non-integrated data

Non-Integrated datasets can be loaded with getDataset like integrated data. This will unzip the non-integrated data to a temp directory and load it into the environment.

adcp <- vtn505$getDataset("ADCP")
dim(adcp)
#> [1] 378  11
colnames(adcp)
#>  [1] "study_prot"             "participant_id"         "study_day"             
#>  [4] "lab_code"               "specimen_type"          "antigen"               
#>  [7] "percent_cv"             "avg_phagocytosis_score" "positivity_threshold"  
#> [10] "response"               "assay_identifier"

You can also view the file format info using getDatasetDescription. For non-integrated data, this will open a pdf into your computer’s default pdf viewer.

vtn505$getDatasetDescription("ADCP")

Non-integrated data is downloaded to a temp directory by default. There are a couple of ways to override this if desired. One is to specify outputDir when calling getDataset or getDatasetDescription.

If you will be accessing the data at another time and don’t want to have to re-download it, you can change the default directory for the whole study object with setDataDir.

vtn505$dataDir
#> [1] "/tmp/RtmpoDO8Tc"
vtn505$setDataDir(".")
vtn505$dataDir
#> [1] "/home/jmtaylor/Projects/DataSpaceR/vignettes"

If the dataset already exists in the specified dataDir or outputDir, it will be not be downloaded. This can be overridden with reload=TRUE, which forces a re-download.

Session information

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
#>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
#>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
#>  [7] LC_PAPER=en_US.utf8       LC_NAME=C                
#>  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5  knitr_1.37       
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8       digest_0.6.29    assertthat_0.2.1 R6_2.5.1        
#>  [5] jsonlite_1.8.0   magrittr_2.0.2   evaluate_0.15    highr_0.9       
#>  [9] httr_1.4.2       stringi_1.7.6    curl_4.3.2       tools_4.1.2     
#> [13] stringr_1.4.0    Rlabkey_2.8.3    xfun_0.29        compiler_4.1.2