---
title: "Summarising results from predictNMB"
output: rmarkdown::html_vignette
bibliography: predictNMB-vignettes.bib
vignette: >
  %\VignetteIndexEntry{summarising-results-with-predictNMB}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```


```{r setup}
library(predictNMB)
library(parallel)
library(ggplot2)
library(flextable)
set.seed(42)
```

This vignette is purely about how to use the `autoplot()` method and `summary()` to visualise and summarise the simulations made using `{predictNMB}`. For an introduction to `{predictNMB}`, please see the [introductory vignette](https://docs.ropensci.org/predictNMB/articles/predictNMB.html).

Firstly, as an example case, we will prepare our sampling functions and run `screen_simulation_inputs()`.

```{r}
get_nmb_sampler_training <- get_nmb_sampler(
  wtp = 28033,
  qalys_lost = function() rnorm(n = 1, mean = 0.0036, sd = 0.0005),
  high_risk_group_treatment_cost = function() rnorm(n = 1, mean = 20, sd = 3),
  high_risk_group_treatment_effect = function() rbeta(n = 1, shape1 = 40, shape2 = 60),
  use_expected_values = TRUE
)

get_nmb_sampler_evaluation <- get_nmb_sampler(
  wtp = 28033,
  qalys_lost = function() rnorm(n = 1, mean = 0.0036, sd = 0.0005),
  high_risk_group_treatment_cost = function() rnorm(n = 1, mean = 20, sd = 3),
  high_risk_group_treatment_effect = function() rbeta(n = 1, shape1 = 40, shape2 = 60)
)
```

```{r, echo=FALSE}
sim_screen_obj <- readRDS("fixtures/predictNMB-sim_screen_obj.rds")
```

```{r, eval=FALSE}
cl <- makeCluster(2)
sim_screen_obj <- screen_simulation_inputs(
  n_sims = 500,
  n_valid = 10000,
  sim_auc = seq(0.7, 0.95, 0.05),
  event_rate = c(0.1, 0.2),
  fx_nmb_training = get_nmb_sampler_training,
  fx_nmb_evaluation = get_nmb_sampler_evaluation,
  cutpoint_methods = c("all", "none", "youden", "value_optimising"),
  cl = cl
)
stopCluster(cl)
```


# Making plots with predictNMB

## Plotting the results from `screen_simulation_inputs()` 

### Choosing the x-axis variable and other constants

In this simulation screen, we vary both the event rate and the model discrimination (sim_AUC).
There are many ways that we could visualise the data. The `autoplot()` function allows us to make some basic plots to compare the impact of different cutpoint methods on Net Monetary Benefit (NMB) and another variable of our choice. 

In this case, we can visualise the impact on NMB for different methods across varying levels of `sim_auc` or `event_rate`. We control this with the `x_axis_var` argument.

```{r}
autoplot(sim_screen_obj, x_axis_var = "sim_auc")
```

To avoid the overlap of points in this second plot, which visualises NMB at varying event rates, we can specify the `dodge_width` to be non-zero.
```{r}
autoplot(sim_screen_obj, x_axis_var = "event_rate", dodge_width = 0.002)
```

For these plots, one of the screened inputs will be the x-axis variable, but the other will only be displayed at a single level. For example, if we are looking at outcomes over a range of AUC values, the prevalence will be fixed. The default setting for this second input will assume the first level, so when we visualise the change in NMB specifying `sim_auc` as the x-axis variable, we only observe this for the case where `event_rate = 0.1`. We can choose to select another level with the `constants` argument. This argument expects a named list containing the values to keep for the screened inputs which are not shown on the x-axis.

```{r}
autoplot(sim_screen_obj, x_axis_var = "sim_auc", constants = list(event_rate = 0.1))
autoplot(sim_screen_obj, x_axis_var = "sim_auc", constants = list(event_rate = 0.2))
```

We see both a change to the plot as well as the message produced when the plot is made.

### Choosing a y-axis variable

There are three options for the y-axis. The default is the NMB, but you can also visualise the Incremental Net Monetary Benefit (INB) and the selected cutpoints. These are controlled by the `what` argument, which can be any of `c("nmb", "inb", "cutpoints")`. If a vector is used, only the first value will be selected. If you choose to visualise the INB, you must list your chosen reference strategy for the calculation in the `inb_ref_col`. In this case, we use treat-all (`"all"`).

```{r, message=FALSE}
autoplot(sim_screen_obj, what = "nmb")
autoplot(sim_screen_obj, what = "inb", inb_ref_col = "all")
autoplot(sim_screen_obj, what = "cutpoints")
```


### Selecting what to show on the plot

The plots show the median (the dot), the 95% confidence interval (thick vertical lines), the range (thin vertical lines), and the lines between the points by default. These can each be shown or hidden independently, and the width of the confidence interval can be controlled using the `plot_conf_level` argument.

```{r, message=FALSE}
autoplot(sim_screen_obj)
autoplot(sim_screen_obj, plot_range = FALSE)
autoplot(sim_screen_obj, plot_conf_level = FALSE)
autoplot(sim_screen_obj, plot_conf_level = FALSE, plot_range = FALSE)
autoplot(sim_screen_obj, plot_conf_level = FALSE, plot_range = FALSE, plot_line = FALSE)
```


### Other plot modifications

Currently, the lines and dots overlap. We can use `dodge_width` to apply a horizontal dodge for all layers.

```{r, message=FALSE}
autoplot(sim_screen_obj, dodge_width = 0.01)
```

The cutpoint methods can be renamed or removed. To rename them, pass a named vector to the `rename_vector` argument. The names of the vector are the new names and the values are the names you wish to replace.

```{r, message=FALSE}
autoplot(
  sim_screen_obj,
  rename_vector = c("Treat All" = "all", 
                    "Treat None" = "none",
                    "Youden Index" = "youden", 
                    "Value Optimisation" = "value_optimising")
)
```

You can reorder the methods by passing the order as the `methods_order` argument. Also note that this will remove all methods which aren't included, and it will factor the names AFTER it has renamed them. So, if you are both renaming and re-ordering, you must provide the updated names when you order them:

```{r, message=FALSE}
autoplot(sim_screen_obj, methods_order = c("all", "none"))
autoplot(
  sim_screen_obj,
  # Assign new names to the two methods of interest
  rename_vector = c("Treat All" = "all", "Treat None" = "none"), 
  
  # Call the methods by their new names
  methods_order = c("Treat All", "Treat None") 
)
```

The transparency of all layers can be controlled with `plot_alpha`.

```{r, message=FALSE}
autoplot(sim_screen_obj, plot_alpha = 0.2)
autoplot(sim_screen_obj, plot_alpha = 1)
```

## Plotting the results from `do_nmb_sim()`

Many of the same arguments that we used above can be used with the object returned from `do_nmb_sim()`

```{r, include=FALSE}
do_nmb_sim_obj <- sim_screen_obj$simulations[[1]]
```

```{r, eval=FALSE}
do_nmb_sim_obj <- do_nmb_sim(
  n_sims = 500,
  n_valid = 10000,
  sim_auc = 0.7,
  event_rate = 0.1,
  fx_nmb_training = get_nmb_sampler_training,
  fx_nmb_evaluation = get_nmb_sampler_evaluation,
  cutpoint_methods = c("all", "none", "youden", "value_optimising")
)
```

### `autoplot()`

The plots here show the results of a single simulation and compare the available cutpoints.

```{r}
autoplot(do_nmb_sim_obj) + theme_sim()
```

The y-axis variable and names and orders of methods can be controlled in the same way as previously:

```{r}
autoplot(do_nmb_sim_obj, what = "nmb") + theme_sim()
autoplot(
  do_nmb_sim_obj,
  what = "inb",
  inb_ref_col = "all",
  rename_vector = c(
    "Value-Optimising" = "value_optimising",
    "Treat-None" = "none",
    "Youden Index" = "youden"
  )
) + theme_sim()
autoplot(
  do_nmb_sim_obj,
  what = "cutpoints",
  methods_order = c("all", "none", "youden", "value optimising")
) + theme_sim()
```

These plots display the median as the solid bar, the grey part of the distributions are the outer 5% of the simulated values and the light blue region is the 95% CI. For the methods that select thresholds based on the values in the 2x2 table, including the value-optimising thresholds, this may look a little strange as the cutpoints are highly variable. This can be stabilised with more simulations. The fill colours of the histogram are controlled with `fill_cols` and the line for the median is controlled with `median_line_col`. The thickness of the median line is controlled with `median_line_size` and its transparency with `median_line_alpha`. 

```{r}
autoplot(
  do_nmb_sim_obj,
  fill_cols = c("red", "blue"),
  median_line_col = "yellow",
  median_line_alpha = 1,
  median_line_size = 0.9
) + theme_sim()
```

The `n_bins` argument controls the number of bins used for the histograms and the `label_wrap_width` is the number of characters above which to start a new line for the facet labels. This can be handy when using detailed names for the methods when the font of the label is relatively large compared to the plot, though a space is needed to determine where to split the label. The width of the confidence intervals can also be controlled by the `conf.level` argument in this `autoplot()` call.

```{r, fig.height=3, fig.width=6}
autoplot(
  do_nmb_sim_obj,
  n_bins = 15,
  rename_vector = c(
    "Value- Optimising" = "value_optimising",
    "Treat- None" = "none",
    "Treat- All" = "all",
    "Youden Index" = "youden"
  ),
  label_wrap_width = 5,
  conf.level = 0.8
) + theme_sim()
```


### `ce_plot()`

A cost-effectiveness plot may be useful to decision-makers deciding between competing alternatives as it shows both the incremental healthcare costs associated with the change as well as the incremental improvement in patient benefit (QALYs). For a detailed explanation on how to interpret cost-effectiveness plots, see [@fenwick2006].

For these plots, a reference group must be specified with `ref_col`. In this example, the reference group is to treat none.

```{r}
ce_plot(do_nmb_sim_obj, ref_col = "none")
```

Similarly to `autoplot()`, the methods can be selected and renamed using `methods_order` and `rename_vector`. The diagonal line is the cost-effectiveness plane and this is based on the willingness to pay. In this example, it was \$28033 and is stored within the `predictNMBsim` object via the `fx_nmb_evaluation` function that was originally used. Simulations underneath the cost-effectiveness plane are deemed cost-effective.

```{r}
attr(do_nmb_sim_obj$meta_data$fx_nmb_evaluation, "wtp")
```
The WTP can be adjusted using the WTP argument but it is not advisable to use a different value to what was used in the initial simulations because the plotted WTP will be different to the simulation objects. If a different WTP value should be investigated, it is advisable to re-run the simulations again as `do_nmb_sim` objects.

```{r}
ce_plot(do_nmb_sim_obj, ref_col = "none", wtp = 100000)
```

The line can be removed by using `show_wtp = FALSE`.

```{r}
ce_plot(do_nmb_sim_obj, ref_col = "none", show_wtp = FALSE)
```

The `shape` argument controls the shape aesthetic in `geom_point()`. It can also be mapped to whether the point is "cost-effective" (falls beneath the cost-effectiveness plane/WTP line) or the cutpoint "method", and the proportion of cost-effective simulations can be added to the methods section of the legend if `add_prop_ce = TRUE`. For the sake of demonstration, we'll change the WTP line for the "cost-effective" shape input.
```{r}
ce_plot(do_nmb_sim_obj, ref_col = "none", shape = 15, add_prop_ce = TRUE)
ce_plot(do_nmb_sim_obj, ref_col = "none", shape = "square", add_prop_ce = TRUE)
ce_plot(do_nmb_sim_obj, ref_col = "none", shape = "cost-effective", wtp = 80000, add_prop_ce = TRUE)
ce_plot(do_nmb_sim_obj, ref_col = "none", shape = "method", add_prop_ce = TRUE)
```

If you want the method to be mapped to shape but not colour, then, just as with any `ggplot` object, you can adjust the values on the returned plot with `ggplot2::scale_color_manual()`.
```{r}
ce_plot(do_nmb_sim_obj, ref_col = "none", shape = "method") + 
  ggplot2::scale_color_manual(values = rep("black", 3))
```


# Making tables with predictNMB

To make tables from the same objects as we used for the plots, we instead use `summary()`. This can be applied to either type of object (`screen_simulation_inputs()` or `do_nmb_sim()`). Using the `%>%` (or `|>`) operator, we can pass it straight to `flextable()` from the `{flextable}` package that we have already loaded.


```{r, eval=FALSE}
summary(sim_screen_obj)
```
```{r, echo=FALSE}
summary(sim_screen_obj) %>% flextable()
```

```{r, eval=FALSE}
summary(do_nmb_sim_obj)
```

```{r, echo=FALSE}
summary(do_nmb_sim_obj) %>% flextable()
```

By default, the methods are aggregated by the median and the 95% confidence intervals (and rounded to 2 and 1 decimal places, respectively). These are the default list of functions passed to the `summary()` as the `agg_functions` argument. These can be changed to any functions which aggregate a numeric vector.

```{r, eval=FALSE}
summary(
  do_nmb_sim_obj,
  agg_functions = list(
    "mean" = function(x) round(mean(x), digits=2),
    "min" = min, 
    "max" = max
  )
)
```

```{r, echo=FALSE}
summary(
  do_nmb_sim_obj,
  agg_functions = list(
    "mean" = function(x) round(mean(x), digits=2),
    "min" = min, 
    "max" = max
  )
) %>%
  flextable()
```

The `what` and `rename_vector` arguments work in the same way as they did when using `autoplot()`.

```{r, eval=FALSE}
summary(
  do_nmb_sim_obj,
  what = "inb",
  inb_ref_col = "all",
  rename_vector = c(
    "Value-Optimising" = "value_optimising",
    "Treat-None" = "none",
    "Youden Index" = "youden"
  )
)
```

```{r, echo=FALSE}
summary(
  do_nmb_sim_obj,
  what = "inb",
  inb_ref_col = "all",
  rename_vector = c(
    "Value-Optimising" = "value_optimising",
    "Treat-None" = "none",
    "Youden Index" = "youden"
  )
) %>%
  flextable()
```

```{r, eval=FALSE}
summary(sim_screen_obj)
```

```{r, echo=FALSE}
summary(sim_screen_obj) %>% flextable()
```

The summary table contains the same outputs for both `do_nmb_sim()` and `screen_simulation_inputs()`, but they are arranged slightly differently. Each row in table for this summary is for a unique set of inputs. By default, this is trimmed to include only those inputs that vary in our function call — here, `sim_auc` and `event_rate` — by using the `show_full_inputs` argument. By default, only the inputs that vary are shown. However, we can set `show_full_inputs = TRUE` to see more.

```{r, eval=FALSE}
summary(sim_screen_obj, show_full_inputs = TRUE)
```

In the table below, we merge repeated values using `merge_v()` and add the `theme_box()` to make it a bit easier to read. (You can see more about making tables with `{flextable}` [here](https://davidgohel.github.io/flextable/).)

```{r, echo=FALSE}
summary(sim_screen_obj, show_full_inputs = TRUE) %>%
  flextable() %>%
  merge_v(j = 1:9) %>% 
  theme_box()
```


# References