---
title: "BaseSet"
abstract: >
  Describes the background of the package, important functions defined in the
  package and some of the applications and usages.
date: "`r format(Sys.time(), '%Y %b %d')`"
output:
  html_document:
    fig_caption: true
    code_folding: show
    self_contained: yes
    toc_float:
      collapsed: true
    toc_depth: 3
author:
- name: Lluís Revilla
  email: lluis.revilla@gmail.com
vignette: >
  %\VignetteIndexEntry{BaseSet}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
  %\DeclareUnicodeCharacter{2229}{$\cap$}
  %\DeclareUnicodeCharacter{222A}{$\cup$}
editor_options:
  chunk_output_type: console
---

```{r setup, message=FALSE, warning=FALSE, include=FALSE}
knitr::opts_knit$set(root.dir = ".")
knitr::opts_chunk$set(collapse = TRUE,
                      warning = TRUE,
                      comment = "#>")
```

# Getting started

This vignette explains how to work with sets using this package.
The package provides a class to store the information efficiently and functions to work with it.

# The TidySet class

A `TidySet` object stores associations between elements and sets.
To create one, imagine we have several genes associated with a characteristic.

```{r from_list, message=FALSE}
library("BaseSet")
gene_lists <- list(
    geneset1 = c("A", "B"),
    geneset2 = c("B", "C", "D")
)
tidy_set <- tidySet(gene_lists)
tidy_set
```

This is then stored internally in three slots, accessible with `relations()`, `elements()`, and `sets()`.
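
One rough way to picture these three tables is with base R only. This is an illustrative sketch, not the package's internal code: the object names `example_lists`, `rel`, `elem` and `st` are hypothetical, and the layout (one row per element-set pair, plus one table of unique elements and one of unique sets) is an assumption based on the column names used in this vignette.

```{r slots_sketch}
# Hypothetical example data mirroring the list used above
example_lists <- list(
    geneset1 = c("A", "B"),
    geneset2 = c("B", "C", "D")
)
# One row per element-set pair, similar in spirit to relations()
rel <- stack(example_lists)
colnames(rel) <- c("elements", "sets")
rel$sets <- as.character(rel$sets)
# One row per unique element and per unique set,
# similar in spirit to elements() and sets()
elem <- data.frame(elements = unique(rel$elements))
st <- data.frame(sets = unique(rel$sets))
nrow(rel)
nrow(elem)
nrow(st)
```

Keeping the pairs in a long table like this is what allows extra columns to be attached to a relation, an element or a set independently.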
If you have more information for each element or set, it can be added:

```{r metadata, message=FALSE}
gene_data <- data.frame(
    stat1 = c(1, 2, 3, 4),
    info1 = c("a", "b", "c", "d")
)
tidy_set <- add_column(tidy_set, "elements", gene_data)
set_data <- data.frame(
    Group = c(100, 200),
    Column = c("abc", "def")
)
tidy_set <- add_column(tidy_set, "sets", set_data)
tidy_set
```

This data is stored in one of the three slots, which can be directly accessed using their getter methods:

```{r getters}
relations(tidy_set)
elements(tidy_set)
sets(tidy_set)
```

You can add as much information as you want; the only restriction concerns the "fuzzy" column of the `relations()`.
See the Fuzzy sets vignette: `vignette("Fuzzy sets", "BaseSet")`.

You can also use the standard R approach with `[`:

```{r}
gene_data <- data.frame(
    stat2 = c(4, 4, 3, 5),
    info2 = c("a", "b", "c", "d")
)

tidy_set$info1 <- NULL
tidy_set[, "elements", c("stat2", "info2")] <- gene_data
tidy_set[, "sets", "Group"] <- c("low", "high")
tidy_set
```

Observe that one can add, replace or delete columns this way.

# Creating a TidySet

As you can see, it is possible to create a TidySet from a list.
More commonly you can create it from a data.frame:

```{r tidyset_data.frame}
relations <- data.frame(elements = c("a", "b", "c", "d", "e", "f"),
                        sets = c("A", "A", "A", "A", "A", "B"),
                        fuzzy = c(1, 1, 1, 1, 1, 1))
TS <- tidySet(relations)
TS
```

It is also possible to create it from a matrix:

```{r tidySet_matrix}
m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0), ncol = 3, nrow = 3,
            dimnames = list(letters[1:3], LETTERS[1:3]))
m
tidy_set <- tidySet(m)
tidy_set
```

A TidySet can also be created from `GeneSet` and `GeneSetCollection` objects.
Additionally, the package has several functions to read files related to sets, such as OBO files (`getOBO`) and GAF files (`getGAF`).

# Converting to other formats

It is possible to extract the gene sets as a `list`, for use with functions such as `lapply`.

```{r as.list}
as.list(tidy_set)
```

Or if you need to apply some network methods and you need a matrix, you can create it with `incidence`:

```{r incidence}
incidence(tidy_set)
```

# Operations with sets

To work with sets several methods are provided.
In general you can provide a new name for the resulting set of the operation, but if you don't, one will be automatically provided using `naming()`.
All methods work with fuzzy and non-fuzzy sets.

## Union

You can make a union of two sets present in the same object.

```{r union}
BaseSet::union(tidy_set, sets = c("C", "B"), name = "D")
```

## Intersection

```{r intersection}
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = TRUE)
```

The `keep` argument used here determines whether all the other previous sets are kept:

```{r intersection2}
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = FALSE)
```

## Complement

We can look for the complement of one or several sets:

```{r complement}
complement_set(tidy_set, sets = c("A", "B"))
```

Observe that we haven't provided a name for the resulting set, but we can provide one if we prefer:

```{r complement2}
complement_set(tidy_set, sets = c("A", "B"), name = "F")
```

## Subtract

This is the equivalent of `setdiff`, but clearer:

```{r subtract}
out <- subtract(tidy_set, set_in = "A", not_in = "B", name = "A-B")
out
name_sets(out)
subtract(tidy_set, set_in = "B", not_in = "A", keep = FALSE)
```

See that in the first case there isn't any element present in A that is not in B, but the new set is stored.
In the second use case we focus just on the elements that are present in B but not in A.

# Additional information

The number of unique elements and sets can be obtained using the `nElements()` and `nSets()` methods, and the number of relations using `nRelations()`.

```{r n}
nElements(tidy_set)
nSets(tidy_set)
nRelations(tidy_set)
```

If you wish to know all of them in a single call you can use `dim(tidy_set)`: `r dim(tidy_set)`.
This summary doesn't provide the number of relations of each set.
You can quickly obtain that with `lengths(tidy_set)`: `r lengths(tidy_set)`

The size of each set can be obtained using the `set_size()` method.

```{r set_size}
set_size(tidy_set)
```

Conversely, the number of sets associated with each gene is returned by the `element_size()` function.

```{r element_size}
element_size(tidy_set)
```

The identifiers of elements and sets can be inspected and renamed using `name_elements` and `name_sets`:

```{r name}
name_elements(tidy_set)
name_elements(tidy_set) <- paste0("Gene", seq_len(nElements(tidy_set)))
name_elements(tidy_set)
name_sets(tidy_set)
name_sets(tidy_set) <- paste0("Geneset", seq_len(nSets(tidy_set)))
name_sets(tidy_set)
```

# Using `dplyr` verbs

You can also use `mutate()`, `filter()`, `select()`, `group_by()` and other `dplyr` verbs with TidySets.
You usually need to select which of the three slots you want to affect with `activate()`:

```{r tidyverse}
library("dplyr")
m_TS <- tidy_set %>%
  activate("relations") %>%
  mutate(Important = runif(nRelations(tidy_set)))
m_TS
```

You can use `activate()` to select which slot the verbs modify:

```{r deactivate}
set_modified <- m_TS %>%
  activate("elements") %>%
  mutate(Pathway = if_else(elements %in% c("Gene1", "Gene2"),
                           "pathway1",
                           "pathway2"))
set_modified
set_modified %>%
  deactivate() %>% # To apply a filter independently of where it is
  filter(Pathway == "pathway1")
```

If you think you need `group_by()`, this usually means that you need a new set.
You can create a new one with `group`.

```{r group}
# A new group with those elements in pathway1
set_modified %>%
  deactivate() %>%
  group(name = "new", Pathway == "pathway1")
```

```{r group2}
set_modified %>%
  group("pathway1", elements %in% c("Gene1", "Gene2"))
```

You can use `group_by()` but it won't return a `TidySet`.

```{r group_by}
set_modified %>%
  deactivate() %>%
  group_by(Pathway, sets) %>%
  count()
```

After grouping or mutating, we might sometimes be interested in moving a column that describes something from one slot to another.
We can do this with `move_to()`:

```{r moving}
elements(set_modified)
out <- move_to(set_modified, "elements", "relations", "Pathway")
relations(out)
```

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```