--- title: "Accessing Non-Integrated Datasets" author: "Helen Miller" date: "2022-06-15" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Accessing Non-Integrated Datasets} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Many studies include data from assays which have not been integrated into the DataSpace. Some of these are available as "Non-Integrated Datasets," which can be downloaded from the app as a zip file. `DataSpaceR` provides an interface for accessing non-integrated data from studies where it is available. ## Viewing available non-integrated data Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example. ```r library(DataSpaceR) con <- connectDS() vtn505 <- con$getStudy("vtn505") vtn505 #> #> Study: vtn505 #> URL: https://dataspace.cavd.org/CAVD/vtn505 #> Available datasets: #> - Binding Ab multiplex assay #> - Demographics #> - Intracellular Cytokine Staining #> - Neutralizing antibody #> Available non-integrated datasets: #> - ADCP #> - Demographics (Supplemental) #> - Fc Array ``` The print method on the study object will list available non-integrated datasets. The `availableDatasets` property shows some more info about available datasets, with the `integrated` field indicating whether the data is integrated. The value for `n` will be `NA` for non-integrated data until the dataset has been loaded. ```r knitr::kable(vtn505$availableDatasets) ``` |name |label | n|integrated | |:------------|:-------------------------------|-----:|:----------| |BAMA |Binding Ab multiplex assay | 10260|TRUE | |Demographics |Demographics | 2504|TRUE | |ICS |Intracellular Cytokine Staining | 22684|TRUE | |NAb |Neutralizing antibody | 628|TRUE | |ADCP |ADCP | NA|FALSE | |DEM SUPP |Demographics (Supplemental) | NA|FALSE | |Fc Array |Fc Array | NA|FALSE | ## Loading non-integrated data Non-Integrated datasets can be loaded with `getDataset` like integrated data. This will unzip the non-integrated data to a temp directory and load it into the environment. ```r adcp <- vtn505$getDataset("ADCP") dim(adcp) #> [1] 378 11 colnames(adcp) #> [1] "study_prot" "participant_id" "study_day" #> [4] "lab_code" "specimen_type" "antigen" #> [7] "percent_cv" "avg_phagocytosis_score" "positivity_threshold" #> [10] "response" "assay_identifier" ``` You can also view the file format info using `getDatasetDescription`. For non-integrated data, this will open a pdf into your computer's default pdf viewer. ```r vtn505$getDatasetDescription("ADCP") ``` Non-integrated data is downloaded to a temp directory by default. There are a couple of ways to override this if desired. One is to specify `outputDir` when calling `getDataset` or `getDatasetDescription`. If you will be accessing the data at another time and don't want to have to re-download it, you can change the default directory for the whole study object with `setDataDir`. ```r vtn505$dataDir #> [1] "/tmp/RtmpoDO8Tc" vtn505$setDataDir(".") vtn505$dataDir #> [1] "/home/jmtaylor/Projects/DataSpaceR/vignettes" ``` If the dataset already exists in the specified `dataDir` or `outputDir`, it will be not be downloaded. This can be overridden with `reload=TRUE`, which forces a re-download. ## Session information ```r sessionInfo() #> R version 4.1.2 (2021-11-01) #> Platform: x86_64-pc-linux-gnu (64-bit) #> Running under: Ubuntu 18.04.5 LTS #> #> Matrix products: default #> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 #> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1 #> #> locale: #> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C #> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 #> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 #> [7] LC_PAPER=en_US.utf8 LC_NAME=C #> [9] LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] data.table_1.14.2 DataSpaceR_0.7.5 knitr_1.37 #> #> loaded via a namespace (and not attached): #> [1] Rcpp_1.0.8 digest_0.6.29 assertthat_0.2.1 R6_2.5.1 #> [5] jsonlite_1.8.0 magrittr_2.0.2 evaluate_0.15 highr_0.9 #> [9] httr_1.4.2 stringi_1.7.6 curl_4.3.2 tools_4.1.2 #> [13] stringr_1.4.0 Rlabkey_2.8.3 xfun_0.29 compiler_4.1.2 ```