---
title: "BaseSet"
abstract: >
  Describes the background of the package, important functions defined in the
  package and some of the applications and usages.
date: "`r format(Sys.time(), '%Y %b %d')`"
output:
  html_document:
    fig_caption: true
    code_folding: show
    self_contained: yes
    toc_float:
      collapsed: true
      toc_depth: 3
author:
- name: Lluís Revilla
  email: lluis.revilla@gmail.com
vignette: >
  %\VignetteIndexEntry{BaseSet}
  %\VignetteEncoding{UTF-8}  
  %\VignetteEngine{knitr::rmarkdown}
  %\DeclareUnicodeCharacter{2229}{$\cap$}
  %\DeclareUnicodeCharacter{222A}{$\cup$}
editor_options: 
  chunk_output_type: console
---
```{r setup, message=FALSE, warning=FALSE, include=FALSE}
knitr::opts_knit$set(root.dir = ".")
knitr::opts_chunk$set(collapse = TRUE, 
                      warning = TRUE,
                      comment = "#>")
```

# Getting started

This vignette explains how to work with sets using this package.
The package provides a class  to store the information efficiently and functions to work with it. 


# The TidySet class

To create a `TidySet` object, to store associations between elements and sets 
image we have several genes associated with a characteristic.

```{r from_list, message=FALSE}
library("BaseSet")
gene_lists <- list(
    geneset1 = c("A", "B"),
    geneset2 = c("B", "C", "D")
)
tidy_set <- tidySet(gene_lists)
tidy_set
```

This is then stored internally in three slots `relations()`, `elements()`, and `sets()` slots.

If you have more information for each element or set it can be added:

```{r metadata, message=FALSE}
gene_data <- data.frame(
    stat1     = c( 1,   2,   3,   4 ),
    info1     = c("a", "b", "c", "d")
)

tidy_set <- add_column(tidy_set, "elements", gene_data)
set_data <- data.frame(
    Group     = c( 100 ,  200 ),
    Column     = c("abc", "def")
)
tidy_set <- add_column(tidy_set, "sets", set_data)
tidy_set
```

This data is stored in one of the three slots, which can be directly accessed using their getter methods:

```{r getters}
relations(tidy_set)
elements(tidy_set)
sets(tidy_set)
```

You can add as much information as you want, with the only restriction for a "fuzzy" column for the `relations()`. See the Fuzzy sets vignette: `vignette("Fuzzy sets", "BaseSet")`.

You can also use the standard R approach with `[`:

```{r}
gene_data <- data.frame(
    stat2     = c( 4,   4,   3,   5 ),
    info2     = c("a", "b", "c", "d")
)

tidy_set$info1 <- NULL
tidy_set[, "elements", c("stat2", "info2")] <- gene_data
tidy_set[, "sets", "Group"] <- c("low", "high")
tidy_set
```

Observe that one can add, replace or delete

# Creating a TidySet

As you can see it is possible to create a TidySet from a list.
More commonly you can create it from a data.frame:

```{r tidyset_data.frame}
relations <- data.frame(elements = c("a", "b", "c", "d", "e", "f"), 
                        sets = c("A", "A", "A", "A", "A", "B"), 
                        fuzzy = c(1, 1, 1, 1, 1, 1))
TS <- tidySet(relations)
TS
```

It is also possible from a matrix:

```{r tidySet_matrix}
m <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0), ncol = 3, nrow = 3,  
               dimnames = list(letters[1:3], LETTERS[1:3]))
m
tidy_set <- tidySet(m)
tidy_set
```

Or they can be created from a GeneSet and GeneSetCollection objects. 
Additionally it has several function to read files related to sets like the OBO files (`getOBO`) and GAF (`getGAF`)

# Converting to other formats

It is possible to extract the gene sets as a `list`, for use with functions such as `lapply`.

```{r as.list}
as.list(tidy_set)
```

Or if you need to apply some network methods and you need a matrix, you can create it with `incidence`:

```{r incidence}
incidence(tidy_set)
```

# Operations with sets

To work with sets several methods are provided. In general you can provide a new name for the resulting set of the operation, but if you don't one will be automatically provided using `naming()`. All methods work with fuzzy and non-fuzzy sets

## Union

You can make a union of two sets present on the same object.

```{r union}
BaseSet::union(tidy_set, sets = c("C", "B"), name = "D")
```

## Intersection


```{r intersection}
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = TRUE)
```

The keep argument used here is if you want to keep all the other previous sets:

```{r intersection2}
intersection(tidy_set, sets = c("A", "B"), name = "D", keep = FALSE)
```

## Complement

We can look for the complement of one or several sets:

```{r complement}
complement_set(tidy_set, sets = c("A", "B"))
```

Observe that we haven't provided a name for the resulting set but we can provide one if we prefer to

```{r complement2}
complement_set(tidy_set, sets = c("A", "B"), name = "F")
```

## Subtract

This is the equivalent of `setdiff`, but clearer:

```{r subtract}
out <- subtract(tidy_set, set_in = "A", not_in = "B", name = "A-B")
out
name_sets(out)
subtract(tidy_set, set_in = "B", not_in = "A", keep = FALSE)
```

See that in the first case there isn't any element present in B not in set A, but the new set is stored. 
In the second use case we focus just on the elements that are present on B but not in A.

# Additional information

The number of unique elements and sets can be obtained using the `nElements()` and `nSets()` methods.

```{r n}
nElements(tidy_set)
nSets(tidy_set)
nRelations(tidy_set)
```

If you wish to know all in a single call you can use `dim(tidy_set)`: `r dim(tidy_set)`.
This summary doesn't provide the number of relations of each set.
You can quickly obtain that with `lengths(tidy_set)`: `r lengths(tidy_set)`

The size of each set can be obtained using the `set_size()` method.

```{r set_size}
set_size(tidy_set)
```

Conversely, the number of sets associated with each gene is returned by the 
`element_size()` function.

```{r element_size}
element_size(tidy_set)
```

The identifiers of elements and sets can be inspected and renamed using `name_elements` and 

```{r name}
name_elements(tidy_set)
name_elements(tidy_set) <- paste0("Gene", seq_len(nElements(tidy_set)))
name_elements(tidy_set)
name_sets(tidy_set)
name_sets(tidy_set) <- paste0("Geneset", seq_len(nSets(tidy_set)))
name_sets(tidy_set)
```


# Using `dplyr` verbs

You can also use `mutate()`, `filter()`, `select()`, `group_by()` and other `dplyr` verbs with TidySets.
You usually need to activate which three slots you want to affect with `activate()`:

```{r tidyverse}
library("dplyr")
m_TS <- tidy_set %>% 
  activate("relations") %>% 
  mutate(Important = runif(nRelations(tidy_set)))
m_TS
```

You can use activate to select what are the verbs modifying:

```{r deactivate}
set_modified <- m_TS %>% 
  activate("elements") %>% 
  mutate(Pathway = if_else(elements %in% c("Gene1", "Gene2"), 
                           "pathway1", 
                           "pathway2"))
set_modified
set_modified %>% 
  deactivate() %>% # To apply a filter independently of where it is
  filter(Pathway == "pathway1")
```


If you think you need `group_by` usually this could mean that you need a new set.
You can create a new one with `group`. 


```{r group}
# A new group of those elements in pathway1 and with Important == 1
set_modified %>% 
  deactivate() %>% 
  group(name = "new", Pathway == "pathway1")
```

```{r group2}
set_modified %>% 
  group("pathway1", elements %in% c("Gene1", "Gene2"))
```

You can use `group_by()` but it won't return a `TidySet`. 

```{r group_by}
set_modified %>% 
    deactivate() %>% 
    group_by(Pathway, sets) %>%  
    count()
```


After grouping or mutating sometimes we might be interested in moving a column describing something to other places. We can do by this with:

```{r moving}
elements(set_modified)
out <- move_to(set_modified, "elements", "relations", "Pathway")
relations(out)
```

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```