---
title: "Primers and probes calibration vignette"
author: "Edward Wallace"
date: "April 2022"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Primers and probes calibration vignette}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---


# Summary: calibrating primer sets from a real experimental test

This vignette shows how to use tidyqpcr functions to calibrate qPCR probes.

This is real qPCR data by Edward Wallace in Feb 2019, testing new RT-qPCR primer sets against _S. cerevisiae_ genes. We took exponential-phase total RNA previously extracted by Jamie Auxillos.

We tested 2-3 primer sets each for 7 genes:

* ECM38/YLR299W (3 primer sets)
* FET5/YFL041W (3 primer sets)
* GPT2/YKR067W
* ILV5/YLR355C
* NRD1/YNL251C
* RDL1/YOR285W
* TFS1/YLR178C

We started with two biological replicate RNA samples, treated with DNase and then split for a test sample with reverse transcriptase (RT) and negative control without reverse transcriptase (-RT). We also took a no template (NT) negative control. For each RT reaction we do serial 5x dilutions down to 125x to form a quantitative calibration curve.

The data were measured on a Roche LC480 instrument in a single 384-well plate.
Quantification was performed in the Roche LightCycler software prior to exporting the data.


## Setup knitr options and load packages

```{r setup,warning=FALSE,message=FALSE}
# knitr options for report generation
knitr::opts_chunk$set(
  warning = FALSE, message = FALSE, echo = TRUE, cache = FALSE,
  results = "show"
)

# Load packages
library(tidyr)
library(ggplot2)
library(dplyr)
library(tidyqpcr)

# set default theme for graphics
theme_set(theme_bw(base_size = 11) %+replace%
  theme(
    strip.background = element_blank(),
    panel.grid = element_blank()
  ))
```

# Set up experiment

## Describe which primer set we put in which well using a row key

In this experiment, each primer set was in a different row of the 384-well plate.
We describe this by creating a row key, a data frame describing the rows of the plate, the primer sets, and the genes that they detect.

We refer to each primer set as a `target_id`, because each primer set has a different target amplicon, on a different location in a gene.
tidyqpcr insists on having a variable called `target_id` that uniquely identifies each different target that you detect.

```{r create_rowkey,dependson="plate_functions"}
# Names of target genes
gene_name_levels <- c("ECM38", "FET5", "GPT2", "ILV5", "NRD1", "RDL1", "TFS1")
# ORF ids of target genes
target_levels <- c("YLR299W", "YFL041W", "YKR067W", "YLR355C",
                   "YNL251C", "YOR285W", "YLR178C")
# Repeats of gene names to account for testing multiple primer sets
gene_name_values <- c(rep(gene_name_levels[1:2], each = 3),
                      rep(gene_name_levels[3:7], each = 2))
# id numbers of multiple probesets (reflecting IDs as ordered)
target_id_levels <- paste(gene_name_values,
  c(1, 2, 3, 1, 3, 4, 1, 4, 1, 4, 1, 2, 4, 5, 1, 5),
  sep = "_"
)


rowkey <- tibble(
  well_row = LETTERS[1:16],
  gene_name = factor(gene_name_values, levels = gene_name_levels),
  target_id = factor(target_id_levels, levels = target_id_levels)
)
print(rowkey)
```

## Combine the row key describing primer sets with column key describing on samples and dilutions.

We use a default design built in to tidyqpcr, `create_colkey_4diln_2ctrl_in_24`.

```{r label_plates,dependson=c("plate_functions","create_rowkey")}
plate1plan <-
  label_plate_rowcol(
    create_blank_plate(),
    rowkey,
    create_colkey_4diln_2ctrl_in_24()
  ) %>%
  mutate(sample_id = paste(biol_rep, dilution_nice, sep = "_"))
```

## Spot-check the plate plan 

Checks that for selected technical replicate/probe/dilution combinations, the plate contains the right number of replicates.

```{r print_techreps,results="show"}
plate1plan %>%
  filter(tech_rep == "1",
         target_id == target_id_levels[1],
         dilution_nice == "1x")

plate1plan %>%
  filter(tech_rep == "2",
         target_id == target_id_levels[4])
```

## Display the plate plan

This can be printed out to facilitate loading the plate correctly.

```{r display_plates,fig.height=8,fig.width=12,dependson="label_plates"}
display_plate_qpcr(plate1plan)
```


# Analyse Cq (quantification cycle count) data

## Load and summarize data

```{r load_plates,dependson="label_plates",results="show"}
# NOTE: system.file() accesses data from this R package
# To use your own data, remove the call to system.file(),
# instead pass your data's filename to read_lightcycler_1colour_cq()
# or to another relevant read_ function
file_path_cq <- system.file("extdata",
              "Edward_qPCR_Nrd1_calibration_2019-02-02_Cq.txt.gz",
              package = "tidyqpcr")

plates <- 
  file_path_cq %>%
  read_lightcycler_1colour_cq() %>%
  right_join(plate1plan, by = "well")
```


```{r show_plates,dependson="load_plates",results="show"}
plates

summary(plates)
```

## Visualise Cq values for each well.

Visualising the Cq values shows that the Cq value is different for each primer set in each row.
Within each section of a row for a single replicate of dilutions, Cq consistently increases with dilutions as expected.
The grey tiles for most -RT and NT columns mean that the value is `NA`, i.e. no Cq value was reported. This is good.

```{r check_edge_effects,fig.height=4,fig.width=6,dependson="load_plates",results="show"}
display_plate_value(plates)
```

Visualisation might also help to identify unwanted positional effects.
For example, a PCR machine is broken, wells close to an edge of the plate can have different behaviour from central wells. 

## Plot unnormalized data shows that -RT and NT controls are low

This plot visualises the Cq data in a way that highlights the meaning instead of the position on the plate.
Again, it shows that the Cq value is different for each primer set, and that for each primer st Cq consistently increases with dilutions as expected.

Again, we detect no signal in NT (no template) negative control so those points are mostly missing.
There is a very weak signal with high Cq in some -RT (no reverse transcriptase) negative controls.

```{r plot_unnormalized,dependson="load_plates",fig.height=6,fig.width=9}

ggplot(data = plates) +
  geom_point(aes(x = target_id,
                 y = cq,
                 colour = dilution_nice,
                 shape = prep_type),
    position = position_jitter(width = 0.2, height = 0)
  ) +
  labs(
    y = "Cycle count to threshold",
    title = "All reps, unnormalized"
  ) +
  facet_wrap(~biol_rep) +
  scale_y_continuous(breaks = seq(from = 15, to = 35, by = 5)) + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.grid.major.y = element_line(colour="grey80", size = 0.2))
```


## Dilution series is linear for all probes

Visual display of linearity of cq with log(dilution).

```{r plot_dilutions,dependson="load_plates",fig.height=11,fig.width=6}
ggplot(data = filter(plates, prep_type == "+RT"), aes(x = dilution, y = cq)) +
  geom_point() +
  stat_smooth(
    formula = y ~ x, method = "lm", se = FALSE,
    aes(colour = "fit", linetype = "fit")
  ) +
  stat_smooth(
    formula = y ~ 1 + offset(-x * log(10) / log(2)), method = "lm", se = FALSE,
    aes(colour = "theory", linetype = "theory")
  ) +
  scale_x_log10(breaks = 10 ^ - (0:3)) +
  scale_y_continuous(breaks = seq(0, 30, 2)) +
  labs(
    y = "Cycle count to threshold",
    title = "All reps, unnormalized",
    colour = "Dilution", linetype = "Dilution"
  ) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

## Calculate primer efficiencies for all probes

Use regression to estimate linearity of cq with log(dilution), including the slope or efficiency.

```{r calibrate_dilutions,dependson="load_plates",results="show"}
calculate_efficiency_bytargetid(plates)
```


## Dilution series for nice probes only shows linearity clearly

```{r plot_dilutions_nice,dependson="load_plates",fig.height=6,fig.width=4}
target_id_levels_niceprobes <- 
    c("ECM38_3", "FET5_1", "GPT2_4", "ILV5_4",
      "NRD1_1",  "RDL1_4", "TFS1_1")

ggplot(
  data = filter(plates,
                prep_type == "+RT",
                target_id %in% target_id_levels_niceprobes),
  aes(x = dilution, y = cq)
) +
  geom_point() +
  stat_smooth(
    formula = y ~ x, method = "lm", se = FALSE,
    aes(colour = "fit", linetype = "fit")
  ) +
  stat_smooth(
    formula = y ~ 1 + offset(-x * log(10) / log(2)),
    method = "lm",
    se = FALSE,
    aes(colour = "theory", linetype = "theory")
  ) +
  scale_x_log10(breaks = 10 ^ - (0:3)) +
  scale_y_continuous(breaks = seq(0, 30, 2)) +
  labs(
    y = "Cycle count to threshold",
    title = "All reps, unnormalized",
    colour = "Dilution", linetype = "Dilution"
  ) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```


# Analyse amplification and melt curve data

## Load raw data for amplification and melt curves.


```{r load_amp,dependson="label_plates",results="show"}
# NOTE: system.file() accesses data from this R package
# To use your own data, remove the call to system.file(),
# instead pass your data's filename to read_lightcycler_1colour_cq()
# or to another relevant read_ function

file_path_raw <- system.file("extdata/Edward_qPCR_Nrd1_calibration_2019-02-02.txt.gz",
                             package = "tidyqpcr")

plate1curve <- file_path_raw %>%
    read_lightcycler_1colour_raw() %>%
    debaseline() %>%
    left_join(plate1plan, by = "well")

# amplification curve is program 2
platesamp <- plate1curve %>%
  filter(program_no == 2)

# melt curve is program 3 or 4, depending on cycler setting
platesmelt <- plate1curve %>%
  filter(program_no == 3) %>%
  calculate_drdt_plate() %>%
  filter(temperature >= 61)
```


## Plot de-baseline'd raw data for single well

```{r plotamp_A1,dependson="load_amp",fig.width=4,fig.height=3}
ggplot(
  data = platesamp %>% filter(well == "A1"),
  aes(x = cycle, y = fluor_signal)
) +
  geom_line() +
  scale_y_continuous(expand = c(0.01, 0.01))
```

## Plot all amplification curves

Broken up by technical replicate here, to avoid overplotting.

```{r plotamp_all,dependson="load_amp",fig.height=11,fig.width=7}
ggplot(
  data = platesamp %>%
    filter(tech_rep == "1"),
  aes(x = cycle,
      y = fluor_signal,
      colour = factor(dilution),
      linetype = prep_type)
) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  scale_linetype_manual(values = c("+RT" = "solid",
                                   "-RT" = "dashed",
                                   "NT" = "dotted")) +
  geom_line() +
  scale_x_continuous(breaks = seq(60, 100, 10),
                     minor_breaks = seq(60, 100, 5)) +
  labs(title = "All Amp Curves, tech_rep A")

ggplot(
  data = platesamp %>%
    filter(tech_rep == "2"),
  aes(x = cycle,
      y = fluor_signal,
      colour = factor(dilution),
      linetype = prep_type)
) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  scale_linetype_manual(values = c("+RT" = "solid",
                                   "-RT" = "dashed",
                                   "NT" = "dotted")) +
  geom_line() +
  scale_x_continuous(breaks = seq(60, 100, 10),
                     minor_breaks = seq(60, 100, 5)) +
  labs(title = "All Amp Curves, tech_rep B")
```

## Plot melt curve for single well

```{r plotmelt_A1,dependson="load_amp",fig.width=4,fig.height=1.5}
ggplot(
  data = platesmelt %>%
    filter(well == "A1"),
  aes(x = temperature, y = dRdT)
) +
  facet_wrap(~target_id) +
  geom_line() +
  scale_y_continuous(expand = c(0.02, 0.02))
```


## Plot all melt curves

Again broken up by technical replicate.

```{r plotmelt_all,dependson="load_amp",fig.height=11,fig.width=7}
ggplot(
  data = platesmelt %>%
    filter(tech_rep == "1"),
  aes(x = temperature,
      y = dRdT,
      colour = factor(dilution),
      linetype = prep_type)
) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  scale_linetype_manual(values = c("+RT" = "solid",
                                   "-RT" = "dashed",
                                   "NT" = "dotted")) +
  geom_line() +
  scale_x_continuous(breaks = seq(60, 100, 10),
                     minor_breaks = seq(60, 100, 5)) +
  labs(title = "All Melt Curves, tech_rep A")

ggplot(
  data = platesmelt %>%
    filter(tech_rep == "2"),
  aes(x = temperature,
      y = dRdT,
      colour = factor(dilution),
      linetype = prep_type)
) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  scale_linetype_manual(values = c("+RT" = "solid",
                                   "-RT" = "dashed",
                                   "NT" = "dotted")) +
  geom_line() +
  scale_x_continuous(breaks = seq(60, 100, 10),
                     minor_breaks = seq(60, 100, 5)) +
  labs(title = "All Melt Curves, tech_rep B")
```


## Plot zoomed melt curves

```{r plotmelt_zoomed,dependson="load_amp",fig.height=11,fig.width=7}
ggplot(
  data = platesmelt %>%
    filter(tech_rep == "1", prep_type == "+RT"),
  aes(x = temperature, y = dRdT, colour = factor(dilution))
) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  geom_line() +
  scale_x_continuous(
    breaks = seq(60, 100, 5),
    minor_breaks = seq(60, 100, 1),
    limits = c(73, 87)
  ) +
  labs(title = "Melt curves, zoomed, tech_rep A") +
  theme(
    panel.grid.major.x = element_line(colour = "grey50", size = 0.4),
    panel.grid.minor.x = element_line(colour = "grey70", size = 0.1)
  )

ggplot(
  data = platesmelt %>%
    filter(tech_rep == "2", prep_type == "+RT"),
  aes(x = temperature, y = dRdT, colour = factor(dilution))
) +
  facet_grid(target_id ~ biol_rep, scales = "free_y") +
  geom_line() +
  scale_x_continuous(
    breaks = seq(60, 100, 5),
    minor_breaks = seq(60, 100, 1),
    limits = c(73, 87)
  ) +
  labs(title = "Melt curves, zoomed, tech_rep B") +
  theme(
    panel.grid.major.x = element_line(colour = "grey50", size = 0.4),
    panel.grid.minor.x = element_line(colour = "grey70", size = 0.1)
  )
```


## Plot only zoomed melt curves for nice probes

```{r plotmelt_zoomed_nice,dependson="load_amp",fig.height=6,fig.width=4}
ggplot(
  data = platesmelt %>%
    filter(
      tech_rep == "1",
      prep_type == "+RT",
      dilution_nice == "1x",
      target_id %in% target_id_levels_niceprobes
    ),
  aes(x = temperature, y = dRdT, colour = biol_rep)
) +
  facet_grid(target_id ~ ., scales = "free_y") +
  geom_line() +
  scale_x_continuous(
    breaks = seq(60, 100, 5),
    minor_breaks = seq(60, 100, 1),
    limits = c(73, 87)
  ) +
  labs(title = "Nice probes, tech_rep A") +
  theme(
    panel.grid.major.x = element_line(colour = "grey50", size = 0.4),
    panel.grid.minor.x = element_line(colour = "grey70", size = 0.1)
  )
```


# Conclude acceptable primer sets

These probes have sensible standard curves, amplification curves, melt curves. In tie-breakers we pick the more highly detected probe.

* ECM38 set 3
* FET5 set 1 or 4
* GPT2 set 4
* ILV5 set 4
* NRD1 set 1 or 2
* RDL1 set 4
* TFS1 set 1