---
title: "Finding communities in large datasets"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{large_datasets_communities}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(dendroNetwork)
```

## Community detection in very large datasets

When using larger datasets of tree-ring series, calculating the table with similarities can take a long time, and finding communities takes even longer. It is therefore recommended to use parallel computing for Clique Percolation: `clique_community_names_par(network, k=3, n_core = 4)`. This reduces the computation time significantly. For most datasets `clique_community_names()` is sufficiently fast, and for smaller datasets `clique_community_names_par()` can even be slower due to the overhead of parallelisation. Therefore, use the function `clique_community_names()` initially and switch to `clique_community_names_par()` only if it turns out to be very slow.

The workflow is similar to the one described in `vignette("dendroNetwork")`, with minor changes:

1.  load network.
2.  compute similarities.
3.  find the maximum clique size: `igraph::clique_num(network)`.
4.  detect communities for each clique size separately:
    -   `com_cpm_k3 <- clique_community_names_par(network, k=3, n_core = 6)`.
    -   `com_cpm_k4 <- clique_community_names_par(network, k=4, n_core = 6)`.
    -   and so on until the maximum clique size.
5.  merge these into a single data frame with `com_cpm_all <- rbind(com_cpm_k3,com_cpm_k4, com_cpm_k5,... )`.
6.  create a table with all communities for use in Cytoscape: `com_cpm_all <- com_cpm_all |> dplyr::count(node, com_name) |> tidyr::spread(com_name, n)`.
7.  continue with the visualisation in Cytoscape, see the relevant section in `vignette("dendroNetwork")`.
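
Steps 3 to 6 of the workflow above repeat the same call for each clique size, so they can be wrapped in a loop. The following is a minimal sketch, not part of the package: the helper names `cpm_all_sizes()` and `cpm_wide_table()` and the `detect` argument are hypothetical and introduced here only for illustration. It assumes that `network` is the igraph object from the first steps and that the result of `clique_community_names_par()` has the columns `node` and `com_name`, as used in step 6.

```r
# Hypothetical helper (not part of dendroNetwork): runs clique percolation for
# every clique size from 3 up to the maximum and stacks the results.
# `detect` defaults to the parallel function named in this vignette; the
# argument exists only so that another function with the same arguments
# (network, k, n_core) can be swapped in.
cpm_all_sizes <- function(network, n_core = 4,
                          detect = dendroNetwork::clique_community_names_par) {
  # Step 3: size of the largest clique in the network
  max_k <- igraph::clique_num(network)
  if (max_k < 3) {
    stop("The network contains no cliques of size 3 or larger.")
  }
  # Step 4: one run per clique size k = 3, 4, ..., max_k
  per_k <- lapply(seq(3, max_k), function(k) {
    detect(network, k = k, n_core = n_core)
  })
  # Step 5: stack the per-k results into a single data frame
  do.call(rbind, per_k)
}

# Step 6: one row per node and one column per community, for use in Cytoscape.
# Assumes the input has the columns `node` and `com_name`.
cpm_wide_table <- function(com_cpm_all) {
  com_cpm_all |>
    dplyr::count(node, com_name) |>
    tidyr::spread(com_name, n)
}
```

With these helpers, steps 3 to 6 reduce to `com_cpm_all <- cpm_all_sizes(network, n_core = 6) |> cpm_wide_table()`. Choose `n_core` to match the number of cores available on your machine.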