--- title: "DrugBank Database XML Parser" author: "Mohammed Ali" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{DrugBank Database XML Parser} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "docs/articles/", out.width = "100%" ) ``` ## Introduction The main purpose of the `dbparser` package is to parse the [DrugBank](https://go.drugbank.com/) database which is downloadable in XML format from [this link](https://go.drugbank.com/releases/latest). The parsed data can then be explored and analyzed as desired by the user. In this tutorial, we will see how to use `dbparser` along with `dplyr` and `ggplot2` along with other libraries to do simple drug analysis ## Loading and Parsing the Data Before starting the code we are assuming the following: - user already downloaded *DrugBank* xml database file based on the [Read Me](https://docs.ropensci.org/dbparser/) instructions or the above note, - user saved the downloaded database in working directory as `C:\`. - user named the downloaded xml file **drugbank.xml**. Now we can loads the `drugs` info, `drug groups` info and `drug targets` actions info. ```{r eval=T} ## load dbparser package suppressPackageStartupMessages({ library(tidyr) library(dplyr) library(canvasXpress) library(tibble) library(dbparser) }) ## load drugs data drugs <- readRDS(system.file("drugs.RDS", package = "dbparser")) ## load drug groups data drug_groups <- readRDS(system.file("drug_groups.RDS", package = "dbparser")) ## load drug targets actions data drug_targets_actions <- readRDS(system.file("targets_actions.RDS", package = "dbparser")) ``` ## Exploring the data Following is an example involving a quick look at a few aspects of the parsed data. First we look at the proportions of `biotech` and `small-molecule` drugs in the data. ```{r eval=T} ## view proportions of the different drug types (biotech vs. small molecule) type_stat <- drugs %>% select(type) %>% group_by(type) %>% summarise(count = n()) %>% column_to_rownames("type") canvasXpress( data = type_stat, graphOrientation = "vertical", graphType = "Bar", showSampleNames = FALSE, title ="Drugs Type Distribution", xAxisTitle = "Count" ) ``` Below, we view the different `drug_groups` in the data and how prevalent they are. ```{r eval=T} ## view proportions of the different drug types for each drug group type_stat <- drugs %>% full_join(drug_groups, by = c("drugbank_id")) %>% select(type, group) %>% group_by(type, group) %>% summarise(count = n()) %>% pivot_wider(names_from = group, values_from = count) %>% column_to_rownames("type") canvasXpress( data = type_stat, graphType = "Stacked", legendColumns = 2, legendPosition = "bottom", title ="Drug Type Distribution per Drug Group", xAxisTitle = "Quantity", xAxis2Show = TRUE, xAxisShow = FALSE, smpTitle = "Drug Group") ``` Finally, we look at the `drug_targets_actions` to observe their proportions as well. ```{r eval=T} ## get counts of the different target actions in the data targetActionCounts <- drug_targets_actions %>% group_by(action) %>% summarise(count = n()) %>% arrange(desc(count)) %>% top_n(10) %>% column_to_rownames("action") ## get bar chart of the 10 most occurring target actions in the data canvasXpress( data = targetActionCounts, graphType = "Bar", legendColumns = 2, legendPosition = "bottom", title = "Target Actions Distribution", showSampleNames = FALSE, xAxis2Show = TRUE, xAxisShow = FALSE) ```