--- title: "Nested Sampling" author: "Richèl J.C. Bilderbeek" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Nested Sampling} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r check_empty_cache_at_start, include = FALSE} beastier::remove_beaustier_folders() beastier::check_empty_beaustier_folders() ``` ## Introduction ![](babette_logo.png) This vignette demonstrates how to use the Nested Sampling approach to obtain the marginal likelihood and a Bayes factor, as described in [1] ## Setup ```{r} library(babette) ``` `babette` needs to have 'BEAST2' installed to work. In the case 'BEAST2' is not installed, we'll use this fabricated data: ```{r} out_jc69 <- create_test_bbt_run_output() out_jc69$ns$marg_log_lik <- c(-1.1) out_jc69$ns$marg_log_lik_sd <- c(0.1) out_gtr <- out_jc69 ``` Here we setup how to interpret the Bayes factor: ```{r} interpret_bayes_factor <- function(bayes_factor) { if (bayes_factor < 10^-2.0) { "decisive for GTR" } else if (bayes_factor < 10^-1.5) { "very strong for GTR" } else if (bayes_factor < 10^-1.0) { "strong for GTR" } else if (bayes_factor < 10^-0.5) { "substantial for GTR" } else if (bayes_factor < 10^0.0) { "barely worth mentioning for GTR" } else if (bayes_factor < 10^0.5) { "barely worth mentioning for JC69" } else if (bayes_factor < 10^1.0) { "substantial for JC69" } else if (bayes_factor < 10^1.5) { "strong for JC69" } else if (bayes_factor < 10^2.0) { "very strong for JC69" } else { "decisive for JC69" } } # Should all be TRUE interpret_bayes_factor(1 / 123.0) == "decisive for GTR" interpret_bayes_factor(1 / 85.0) == "very strong for GTR" interpret_bayes_factor(1 / 12.5) == "strong for GTR" interpret_bayes_factor(1 / 8.5) == "substantial for GTR" interpret_bayes_factor(1 / 1.5) == "barely worth mentioning for GTR" interpret_bayes_factor(0.99) == "barely worth mentioning for GTR" interpret_bayes_factor(1.01) == "barely worth mentioning for JC69" interpret_bayes_factor(1.5) == "barely worth mentioning for JC69" interpret_bayes_factor(8.5) == "substantial for JC69" interpret_bayes_factor(12.5) == "strong for JC69" interpret_bayes_factor(85.0) == "very strong for JC69" interpret_bayes_factor(123.0) == "decisive for JC69" ``` ## Experiment In this experiment, we will use the same DNA alignment to see which DNA nucleotide substitution model is the better fit. Load the DNA alignment, a subset of taxa from [2]: ```{r fig.width=7} fasta_filename <- get_babette_path("anthus_aco_sub.fas") image(ape::read.FASTA(fasta_filename)) ``` ## Do the run In this vignette, the MCMC run is set up to be short: ```{r} mcmc <- beautier::create_test_ns_mcmc() ``` For academic research, better use a longer MCMC chain (with an effective sample size above 200). Here we do two 'BEAST2' runs, with both site models: ```{r} if (is_beast2_installed() && is_beast2_pkg_installed("NS")) { inference_model <- create_inference_model( site_model = beautier::create_jc69_site_model(), mcmc = mcmc ) beast2_options <- create_beast2_options( beast2_path = beastier::get_default_beast2_bin_path() ) out_jc69 <- bbt_run_from_model( fasta_filename = fasta_filename, inference_model = inference_model, beast2_options = beast2_options ) bbt_delete_temp_files( inference_model = inference_model, beast2_options = beast2_options ) inference_model <- create_inference_model( site_model = beautier::create_gtr_site_model(), mcmc = mcmc ) beast2_options <- create_beast2_options( beast2_path = beastier::get_default_beast2_bin_path() ) out_gtr <- bbt_run_from_model( fasta_filename = fasta_filename, inference_model = inference_model, beast2_options = beast2_options ) bbt_delete_temp_files( inference_model = inference_model, beast2_options = beast2_options ) } ``` I will display the marginal likelihoods in a nice table: ```{r} if (is_beast2_installed() && is_beast2_pkg_installed("NS")) { df <- data.frame( model = c("JC69", "GTR"), mar_log_lik = c(out_jc69$ns$marg_log_lik, out_gtr$ns$marg_log_lik), mar_log_lik_sd = c(out_jc69$ns$marg_log_lik_sd, out_gtr$ns$marg_log_lik_sd) ) knitr::kable(df) } ``` The Bayes factor is ratio between the marginal (non-log) likelihoods. In this case, we use the simpler JC69 model as a focus: ```{r} if (is_beast2_installed() && is_beast2_pkg_installed("NS")) { bayes_factor <- exp(out_jc69$ns$marg_log_lik) / exp(out_gtr$ns$marg_log_lik) print(interpret_bayes_factor(bayes_factor)) } ``` Whatever the support is, be sure to take the error of the marginal likelihood estimation into account. ## References * [1] Maturana, P., Brewer, B. J., Klaere, S., & Bouckaert, R. (2017). Model selection and parameter inference in phylogenetics using Nested Sampling. arXiv preprint arXiv:1703.05471. * [2] Van Els, Paul, and Heraldo V. Norambuena. "A revision of species limits in Neotropical pipits Anthus based on multilocus genetic and vocal data." Ibis. ```{r check_empty_cache_at_end, include = FALSE} beastier::remove_beaustier_folders() beastier::check_empty_beaustier_folders() ```