---
title: "Accessing Monoclonal Antibody Data"
author:
- Ju Yeong Kim
- Jason Taylor
date: "2022-06-15"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Accessing Monoclonal Antibody Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
## Workflow overview
In the DataSpace [app](https://dataspace.cavd.org/cds/CAVD/app.view#mabgrid), the mAb grid workflow is as follows:
1. Navigate to the mAb Grid and browse the available mAb mixtures
2. Select the mAb mixtures that you'd like to investigate
3. Or filter rows by using columns:
    - mAb/Mixture
    - donor species
    - isotype
    - HXB2 location
    - tiers
    - clades
    - viruses
4. Click "Neutralization Curves" or "IC50 Titer Heatmap" to visualize the mAb data
5. Click "Export CSV" or "Export Excel" to download the mAb data
`DataSpaceR` offers a similar interface:
1. Browse the mAb Grid with `con$mabGridSummary`
2. Select the mAb mixtures by calling `con$filterMabGrid()` on any column found in `con$mabGrid`
3. Use `con$getMab()` to retrieve the mAb data
## Browse the mAb Grid
You can browse the mAb Grid by calling the `mabGridSummary` field in the connection object:
```r
library(DataSpaceR)
con <- connectDS()
knitr::kable(head(con$mabGridSummary))
```
|mab_mixture |donor_species |isotype |hxb2_location | n_viruses| n_clades| n_tiers| geometric_mean_curve_ic50| n_studies|
|:--------------|:-------------|:-------|:-------------|---------:|--------:|-------:|-------------------------:|---------:|
|10-1074 |human |IgG |Env | 7| 3| 2| 0.0213723| 1|
|10E8 |human |IgG3 |gp160 | 227| 11| 7| 0.4843333| 2|
|10E8 V2.0 |human | | | 28| 3| 1| 0.0031350| 1|
|10E8 V2.0/iMab |human | |gp160 | 13| 8| 2| 0.0462897| 1|
|10E8 V4.0 |human | | | 28| 3| 1| 0.0024094| 1|
|10E8 V4.0/iMab |human | | | 119| 12| 6| 0.0015396| 1|
This table is designed to mimic the mAb grid found in the app.
One can also access the unsummarized data from the mAb grid by calling `con$mabGrid`.
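To make the relationship between the two fields concrete, here is a minimal sketch of how a summary of this shape could be derived from an unsummarized table. The toy data, the `curve_ic50` column name, and the `geo_mean()` helper are assumptions made for illustration; this is not necessarily how `DataSpaceR` computes `mabGridSummary` internally.

```r
library(data.table)

# Toy stand-in for the unsummarized grid; the values and the `curve_ic50`
# column name are invented for illustration.
toy_grid <- data.table(
  mab_mixture = c("mAb A", "mAb A", "mAb A", "mAb B", "mAb B"),
  virus       = c("v1", "v2", "v3", "v1", "v2"),
  clade       = c("B", "B", "C", "B", "C"),
  tier        = c("1", "2", "2", "1", "1"),
  curve_ic50  = c(0.1, 1, 10, 2, 8)
)

# Geometric mean: exponentiate the mean of the logs (hypothetical helper).
geo_mean <- function(x) exp(mean(log(x)))

# One row per mAb mixture, with distinct counts and a geometric mean titer.
toy_summary <- toy_grid[, .(
  n_viruses = uniqueN(virus),
  n_clades  = uniqueN(clade),
  n_tiers   = uniqueN(tier),
  geometric_mean_curve_ic50 = geo_mean(curve_ic50)
), by = mab_mixture]

toy_summary
```

A geometric mean is a common choice for titers because they tend to span several orders of magnitude.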
## Filter the mAb grid
You can filter rows in the grid by specifying the values to keep in the columns found in the field `con$mabGrid`, such as `mab_mixture`, `donor_species`, `isotype`, `hxb2_location`, `tier`, `clade`, `virus`, and `studies`. `filterMabGrid` takes a column name and the values to keep, and filters the underlying tables (private fields). When you then access the `mabGridSummary` field (which is actually an [active binding](https://r6.r-lib.org/articles/Introduction.html#active-bindings)), it returns the filtered grid with updated `n_` columns and `geometric_mean_curve_ic50`.
```r
# filter the grid by viruses
con$filterMabGrid(using = "virus", value = c("242-14", "Q23.17", "6535.3", "BaL.26", "DJ263.8"))
# filter the grid by donor species (llama)
con$filterMabGrid(using = "donor_species", value = "llama")
# check the updated grid
knitr::kable(con$mabGridSummary)
```
|mab_mixture |donor_species |isotype |hxb2_location | n_viruses| n_clades| n_tiers| geometric_mean_curve_ic50| n_studies|
|:-----------|:-------------|:-------|:-------------|---------:|--------:|-------:|-------------------------:|---------:|
|11F1B |llama | | | 4| 2| 1| NA| 1|
|11F1F |llama | | | 4| 2| 1| 26.2961178| 1|
|1H9 |llama | |Env | 4| 2| 1| 5.0898322| 1|
|2B4F |llama | | | 4| 2| 1| 1.5242288| 1|
|2H10 |llama | | | 2| 2| 1| NA| 1|
|2H10/W100A |llama | | | 2| 2| 1| NA| 1|
|3E3 |llama | |gp160 | 3| 3| 1| 0.9944945| 1|
|4H73 |llama | | | 4| 2| 1| NA| 1|
|5B10D |llama | | | 4| 2| 1| NA| 1|
|9B6B |llama | | | 4| 2| 1| 24.3643637| 1|
|A14 |llama | |gp160 | 3| 3| 1| 1.8444582| 1|
|B21 |llama | |gp160 | 3| 3| 1| 0.0936399| 1|
|B9 |llama | |gp160 | 3| 3| 1| 0.0386986| 1|
|LAB5 |llama | | | 4| 2| 1| NA| 1|
Alternatively, we can use method chaining to call multiple filter methods and then browse the grid. Chaining works because each filter call returns the connection object itself, so the next call can follow directly with `$`; it is similar in spirit to the pipe. See Hadley Wickham's [Advanced R](https://adv-r.hadley.nz/r6.html) for more info.
```r
con$resetMabGrid()
con$
  filterMabGrid(using = "virus", value = c("242-14", "Q23.17", "6535.3", "BaL.26", "DJ263.8"))$
  filterMabGrid(using = "donor_species", value = "llama")$
  mabGridSummary
```
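The filtering pattern used above (private tables, a filter method, and a summary recomputed on access) can be sketched with a small R6 class. `ToyGrid`, `filterGrid()`, and `gridSummary` are hypothetical names invented for this example and are not part of `DataSpaceR`; the sketch only illustrates the general mechanism.

```r
library(R6)

# ToyGrid is a hypothetical class used only to illustrate the pattern.
ToyGrid <- R6Class("ToyGrid",
  public = list(
    initialize = function(grid) {
      private$.grid <- grid
    },
    # Keep only the rows whose `using` column matches one of `value`.
    filterGrid = function(using, value) {
      keep <- private$.grid[[using]] %in% value
      private$.grid <- private$.grid[keep, , drop = FALSE]
      invisible(self) # returning self is what allows chaining with `$`
    }
  ),
  active = list(
    # Active binding: recomputed from the private table on every access.
    gridSummary = function() {
      g <- private$.grid
      data.frame(
        n_mab_mixtures = length(unique(g$mab_mixture)),
        n_viruses      = length(unique(g$virus))
      )
    }
  ),
  private = list(.grid = NULL)
)

toy <- ToyGrid$new(data.frame(
  mab_mixture   = c("mAb A", "mAb A", "mAb B", "mAb C"),
  donor_species = c("llama", "llama", "llama", "human"),
  virus         = c("v1", "v2", "v1", "v1"),
  stringsAsFactors = FALSE
))

# Two chained filters, then the active binding is read like a field.
toy$
  filterGrid(using = "donor_species", value = "llama")$
  filterGrid(using = "virus", value = "v1")$
  gridSummary
```

Because the summary is an active binding rather than a stored value, it always reflects whatever filters have been applied so far.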
## Retrieve column values from the mAb grid
You can retrieve values from the grid by `mab_mixture`, `donor_species`, `isotype`, `hxb2_location`, `tier`, `clade`, `virus`, and `studies`, or any variables found in the `mabGrid` field in the connection object via `data.table` operations.
```r
# retrieve available viruses in the filtered grid
con$mabGrid[, unique(virus)]
#> [1] "6535.3" "Q23.17" "DJ263.8" "BaL.26" "242-14"
# retrieve available clades for 1H9 mAb mixture in the filtered grid
con$mabGrid[mab_mixture == "1H9", unique(clade)]
#> [1] "CRF02_AG" "B"
```
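The same `data.table` idioms extend to grouped queries. The sketch below runs on a toy table standing in for `con$mabGrid`; the rows are invented, and only the column names `mab_mixture`, `virus`, and `clade` are taken from the examples above.

```r
library(data.table)

# Toy stand-in for `con$mabGrid`; all values are invented.
toy_grid <- data.table(
  mab_mixture = c("mAb A", "mAb A", "mAb B", "mAb B", "mAb C"),
  virus       = c("v1", "v2", "v1", "v3", "v2"),
  clade       = c("B", "C", "B", "B", "C")
)

# Number of distinct viruses tested per clade.
viruses_per_clade <- toy_grid[, .(n_viruses = uniqueN(virus)), by = clade]

# mAb mixtures that were tested against every virus in a set of interest.
wanted <- c("v1", "v2")
hits <- toy_grid[virus %in% wanted, .(n_hit = uniqueN(virus)), by = mab_mixture]
tested_on_all <- hits[n_hit == length(wanted), mab_mixture]

viruses_per_clade
tested_on_all
```

A vector such as `tested_on_all` could then presumably be passed as the `value` argument of `con$filterMabGrid()` with `using = "mab_mixture"`.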
## Create a DataSpaceMab object
After filtering the grid, you can create a DataSpaceMab object that contains the filtered mAb data.
```r
mab <- con$getMab()
mab
#>
#> URL: https://dataspace.cavd.org
#> User: jmtaylor@scharp.org
#> Summary:
#> - 3 studies
#> - 14 mAb mixtures
#> - 1 neutralization tiers
#> - 3 clades
#> Filters:
#> - virus: 242-14, Q23.17, 6535.3, BaL.26, DJ263.8
#> - mab_donor_species: llama
```
There are six public fields available in the `DataSpaceMab` object: `studyAndMabs`, `mabs`, `nabMab`, `studies`, `assays`, and `variableDefinitions`. They are equivalent to the sheets in the Excel file or the CSV files you would download from the app via "Export Excel"/"Export CSV".
```r
knitr::kable(con$mabGridSummary)
```
|mab_mixture |donor_species |isotype |hxb2_location | n_viruses| n_clades| n_tiers| geometric_mean_curve_ic50| n_studies|
|:-----------|:-------------|:-------|:-------------|---------:|--------:|-------:|-------------------------:|---------:|
|11F1B |llama | | | 4| 2| 1| NA| 1|
|11F1F |llama | | | 4| 2| 1| 26.2961178| 1|
|1H9 |llama | |Env | 4| 2| 1| 5.0898322| 1|
|2B4F |llama | | | 4| 2| 1| 1.5242288| 1|
|2H10 |llama | | | 2| 2| 1| NA| 1|
|2H10/W100A |llama | | | 2| 2| 1| NA| 1|
|3E3 |llama | |gp160 | 3| 3| 1| 0.9944945| 1|
|4H73 |llama | | | 4| 2| 1| NA| 1|
|5B10D |llama | | | 4| 2| 1| NA| 1|
|9B6B |llama | | | 4| 2| 1| 24.3643637| 1|
|A14 |llama | |gp160 | 3| 3| 1| 1.8444582| 1|
|B21 |llama | |gp160 | 3| 3| 1| 0.0936399| 1|
|B9 |llama | |gp160 | 3| 3| 1| 0.0386986| 1|
|LAB5 |llama | | | 4| 2| 1| NA| 1|
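Assuming each of the six public fields of the `DataSpaceMab` object is a data frame-like table, a local equivalent of "Export CSV" could look like the sketch below. `write_tables()` is a hypothetical helper, not a `DataSpaceR` function, and toy tables are used so that the example runs without a connection.

```r
# Hypothetical helper: write each data frame in a named list to "<name>.csv".
write_tables <- function(tables, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(dir, paste0(names(tables), ".csv"))
  for (i in seq_along(tables)) {
    write.csv(tables[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# With a real object, the list could be built from the six fields, e.g.:
# fields <- c("studyAndMabs", "mabs", "nabMab",
#             "studies", "assays", "variableDefinitions")
# tables <- lapply(setNames(fields, fields), function(f) mab[[f]])

# Toy stand-ins (invented values) so the sketch runs offline.
tables <- list(
  studies = data.frame(study = c("s1", "s2")),
  mabs    = data.frame(mab_name_std = c("mAb A", "mAb B"))
)

out_dir <- file.path(tempdir(), "mab_export")
paths <- write_tables(tables, out_dir)
basename(paths)
```

Note that `write.csv()` cannot serialize list columns, so any such column would need to be dropped or flattened first.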
## View metadata concerning the mAb object
There are several metadata fields that can be exported in the mAb object.
```r
names(mab)
#> [1] ".__enclos_env__" "variableDefinitions" "assays"
#> [4] "studies" "nabMab" "mabs"
#> [7] "studyAndMabs" "config" "clone"
#> [10] "getLanlMetadata" "refresh" "print"
#> [13] "initialize"
```
DataSpaceR can also fetch and add metadata associated with downloaded mAbs via the `getLanlMetadata` method that is associated with the `DataSpaceMab` object.
```r
mab$getLanlMetadata()
```
The LANL metadata can now be found in the `mabs$lanl_metadata` variable. This is a list column, and its structure can vary depending on what data LANL has collected.
```r
mab$mabs[mab_name_std == "B9"]$lanl_metadata
#> [[1]]
#> [[1]]$epitopes
#> accession alt_names binding_type cite country dis
#> 1: NA NA NA
#> disrange donor epitope epitope_name hxb2_contig hxb2loc2end hxb2loc2start
#> 1: NA NA NA NA NA NA NA
#> hxb2locend hxb2locstart hxb2protein hxb2protein_id id immunogen
#> 1: NA NA gp160 18 3219
#> in_catnap in_feature_db is_adcc isotype keyword mab_name
#> 1: TRUE FALSE NA B9
#> modifydate neutralizing note origlocend origlocstart
#> 1: 2018-06-01 13:59:55 L NA NA
#> origprotein origprotein_id patient species strain subprotein
#> 1: NA 1 NA NA
#> subprotein_id subtype table total_cite_count total_note_count
#> 1: 1 ab 1 2
#> vaccine_adjuvant vaccine_component vaccine_strain vaccine_type
#> 1:
#>
#> [[1]]$params
#> [[1]]$params$id
#> [1] "3219"
#>
#> [[1]]$params$table
#> [1] "ab"
#>
#>
#> [[1]]$timestamp
#> [1] "2022-04-01 17:01:42z"
#>
#> [[1]]$source
#> [1] "https://www.hiv.lanl.gov/mojo/immunology/api/v1/epitope/ab?id=3219"
```
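Because the structure of a list column can vary from row to row, it helps to extract elements defensively. The sketch below mimics the shape of the output above with a toy table; `toy_mabs`, `pluck_chr()`, and the `lanl_source` column are hypothetical names, and the empty second entry is an assumption about what a sparse record might look like.

```r
library(data.table)

# Toy stand-in for `mab$mabs` after `getLanlMetadata()`. The second entry is
# deliberately empty to mimic a record where fields are missing.
toy_mabs <- data.table(
  mab_name_std  = c("B9", "mAb X"),
  lanl_metadata = list(
    list(
      params    = list(id = "3219", table = "ab"),
      timestamp = "2022-04-01 17:01:42z",
      source    = "https://www.hiv.lanl.gov/mojo/immunology/api/v1/epitope/ab?id=3219"
    ),
    list()
  )
)

# Pull one named element out of an entry, returning NA when it is absent.
pluck_chr <- function(x, name) {
  value <- x[[name]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Flatten the `source` element into an ordinary character column.
toy_mabs[, lanl_source := vapply(lanl_metadata, pluck_chr, character(1), name = "source")]
toy_mabs[, .(mab_name_std, lanl_source)]
```

Using `vapply()` with `character(1)` makes the extraction fail loudly if an entry ever yields something other than a single string.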
## Session information
```r
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
#> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
#> [7] LC_PAPER=en_US.utf8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5 knitr_1.37
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8 digest_0.6.29 assertthat_0.2.1 R6_2.5.1
#> [5] jsonlite_1.8.0 magrittr_2.0.2 evaluate_0.15 highr_0.9
#> [9] httr_1.4.2 stringi_1.7.6 curl_4.3.2 tools_4.1.2
#> [13] stringr_1.4.0 Rlabkey_2.8.3 xfun_0.29 compiler_4.1.2
```