---
title: "Get started"
output: 
  rmarkdown::html_vignette:
    fig_width: 8
    fig_height: 5
vignette: >
  %\VignetteIndexEntry{getting_started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# `fellingdater`: Estimate, report and combine felling dates of historical tree-ring series

This R-package offers a suite of functions designed to assist you in inferring estimates for felling dates from dated tree-ring series. The presence of (partially) preserved sapwood or waney edge allows on a (pre-)historical timber allows to estimate the missing number of sapwood rings. from this estimate, a range for the actual felling date can be determined and reported, for individual series as well as for a group of related timbers. 

In cases where it can be assumed that a group of historical timbers were all felled simultaneously (i.e., in the same year), but due to the absence of the bark/cambial zone (waney edge) and the final formed tree ring, this cannot be determined, the preserved sapwood rings can be used to establish a date range for the felling event.

Taking into account the observed number of sapwood rings across all analysed samples and combining them into a single estimate, a more accurate and precise estimation of the felling date year for the group of timbers under study is likely to be obtained.

This vignette provides a quick overview of the package main functions, including a new tool to sum sapwood probability distributions, comparable to 'summed probability densities (SPD)' commonly applied to sets of radiocarbon (^14^C) dates.

## Installation

The latest developing version is hosted on [GitHub](https://github.com/ropensci/fellingdater/) and [R-universe](https://ropensci.r-universe.dev/fellingdater), and can be installed locally:

``` r
# install.packages("pak")
pak::pak("ropensci/fellingdater")
```

or

``` r
install.packages("fellingdater", repos = "https://ropensci.r-universe.dev")
```

## Basic example

In the following example the combined felling date range for a set of five dated tree-ring series is computed:

```{r setup}
library(fellingdater)

```

```{r example}
## a data set where all series have partially preserved sapwood:
sw_combine(trs_example1, plot = TRUE)

```

The light grey distributions represent the probability density function of the felling date range for each individual series. The dark grey distribution is the combined estimate for a common felling date.

The sapwood data used in the example above to estimate the felling date range, was published by [Hollstein (1980)](https://search.worldcat.org/nl/title/6391864) :

```{r model_sapwood_counts}
sw_model("Hollstein_1980", plot = TRUE)
```

## Main functions

### sw_interval

This function calculates the probability density function (PDF) and the highest probability density interval (HDI) for the range of potential felling dates, based on the observed number of sapwood rings, their chronological dating and the selected sapwood data and model.

In the example below, we observe 10 sapwood rings on a sample (with the last ring dated to 1234 AD) that originates from the Southern Baltic region. The sapwood model published by [Wazny, 1990](https://www.academia.edu/3486096/Aufbau_und_Anwendung_der_Dendrochronologie_f%C3%BCr_Eichenholz_in_Polen) covers this provenance region. The hdi delineates an interval in which the actual felling date is most likely situated. It represents the shortest interval within a probability distribution for a given probability mass or credible interval. The HDI summarizes the distribution by specifying an interval that spans most of the distribution, typically 95%, such that every point inside the interval has higher credibility than any point outside the interval.

```{r interval-individual}
sw_interval(n_sapwood = 10,
            last = 1234,
            hdi = TRUE,
            cred_mass = .95, 
            sw_data = "Wazny_1990", 
            densfun = "lognormal")
```

When `hdi = FALSE`, a matrix is returned with scaled _p_-values for each number of observed sapwood rings.

The results of this procedure can be visualized by setting `plot = TRUE`.

```{r }
# 10 sapwood rings observed and the Wazny 1990 sapwood model:
sw_interval(n_sapwood = 10,
            last = 1234,
            hdi = TRUE,
            cred_mass = .95, 
            sw_data = "Wazny_1990", 
            densfun = "lognormal",
            plot = TRUE)
```

### fd_report

The `fd_report()`function reports estimates of the felling date (fd) range for individual series.

```{r}
tmp <- data.frame(id = c("aaa", "bbb", "ccc"),
                 swr = c(10, 11, 12),
                 waneyedge = c(FALSE, FALSE,TRUE),
                 end = c(123, 456, 1789))
fd_report(tmp,
         series = "id",
         n_sapwood = "swr",
         last = "end",
         sw_data = "Wazny_1990")
```

### sw_combine

The the `sw_combine()` function combines felling dates of a group of related series with (partially) preserved sapwood, in order to narrow down the range of a common felling date. The function returns a list with:

-   the PDF for the felling date of the individual series and the PDF of the model that combines these individual series (`$dataRaw`),

-   the HDI for the combined estimate of the common felling date (`$hdi_model`),

-   the *Agreement index* (`$A_comb`) of the model, expressing how well the individual series fit into the model (ideally around 100%, and not lower than the critical threshold A_c = 60%) ,

-   an overview of the felling date range for the individual series (`$individual_series`), and their *Agreement index* (A_i) to the combined model.


#### trs_example0

A data set with dated tree-ring series, all with partially preserved sapwood. The names of the variables in the data set are mapped to the parameters of the `sw_combine()` function. In the example below, the numeric output is returned:

```{r}
trs_example0
# In trs_example0, the column names are not equivalent to the default names in `sw_combine()`.

output_comb <- sw_combine(trs_example0,
   series = "trs",
   last = "end",
   n_sapwood = "swr",
   waneyedge = "bark",
   cred_mass = .954,
   plot = FALSE
)

head(output_comb$rawData, 20)

output_comb[-1]
```

#### trs_example2

A data set with 5 tree-ring series, one of which has an exact felling date (waney edge present):

```{r}
trs_example2

sw_combine(trs_example2, plot = TRUE)

```

#### trs_example3

A data set where all tree-ring series have been measured up to the waney edge:

```{r }
sw_combine(trs_example3, plot = TRUE)
```

#### trs_example4

An attempt to compute a common felling date for a group of tree-ring series, all including partially preserved sapwood:

```{r }
sw_combine(trs_example4, plot = TRUE)
```

No common felling date can be computed for this particular data set. The model fails (agreement index of the model is below the critical threshold value (A_c) of 60%). Three series have an individual agreement index below 60%.

#### trs_example5

When no sapwood rings are observed and measured, only an `earliest possible felling date` (*terminus post quem*) can be determined:

```{r }
sw_combine(trs_example5,  plot = TRUE)
```

### sw_sum

The `sw_sum()` function calculates the the summed probability density (SPD) for a set of felling date ranges.
In the example, the SPD for a set of 9 series with partially preserved sapwood or waney edge is computed and plotted. A smoother for the SPD is plotted as well. The dots represent exact felling dates (i.e., series with waney edge). 

```{r }
spd7 <- sw_sum(trs_example7, plot = FALSE)
sw_sum_plot(spd7, window_smooth = 5, bar_col = "#95d5b2", trend_col = "#dda15e", dot_col = "forestgreen", dot_shape = 23, dot_size = 4)
```

## Helper functions

### sw_data_overview

The function `sw_data_overview()` provides an overview of all published sapwood data sets distributed with this package.

```{r overview}
sw_data_overview()

```

Load the original data with, for example, `get("Hollstein_1980")`.

### sw_data_info

More information on one of the sapwood data sets - how to cite the data set, the geographical area the data represents, the number of observations and some basic summary stats - can be retrieved by using the `sw_data_info()` function.

```{r info}
sw_data_info("Pilcher_1987")

```

### sw_model and sw_model_plot()

A graphical representation of a sapwood data set is provided by the `sw_model_plot()` function.

```{r model}

tmp <- sw_model(sw_data = "Sohar_2012_ELL_c", 
                densfun = "lognormal",
                cred_mass = 0.95,
                plot = FALSE)

sw_model_plot(tmp, 
              bar_fill = "steelblue3",
              bar_color = "grey60",
              line_color = "red3")

sw_data_info("Sohar_2012_ELL_c")

```

The `sw_model()` function fits a distribution to a data set of observed sapwood numbers and computes the highest posterior density interval (HDI) for a given credibility mass. The density function fitted to the sapwood data set should be one of:

-   "lognormal" (the default value),
-   "normal",
-   "weibull",
-   "gamma".

The credible interval should be a value between 0 and 1.

```{r sw_model_plot2}
sw_model("Wazny_1990", densfun = "gamma", cred_mass= .90, plot = TRUE)

```

When `plot = FALSE`, a list with the numeric output of the modelling process is returned.

### read_fh and get_header

The `read_fh()` function is an extension to the `dplR::read.fh()` function from the [**dplR package**](https://github.com/AndyBunn/dplR) ([Bunn 2008](https://doi.org/10.1016/j.dendro.2008.01.002), [Bunn 2010](https://doi.org/10.1016/j.dendro.2009.12.001), [Bunn *et al.* 2022](https://github.com/AndyBunn/dplR)). It allows to read .fh (format Heidelberg) files of ring widths AND additional information found in the HEADER fields are listed as attributes.

This function also works with ring-width data in CHRONO and HALF-CHRONO format.

Furthermore, the `read_fh()` function is case insensitive.

In the example below, an .fh file with dated tree-ring series from a medieval ship DOEL1 is read with `read_fh()`. The ring width measurement, in different formats, are converted to a `data.frame`.

```{r}
Doel1 <- system.file("extdata", "DOEL1.fh", package = "fellingdater")

Doel1_trs <- read_fh(Doel1, verbose = FALSE)

head(Doel1_trs, 15)

```

When `header = TRUE`, the `get_header()` function is triggered and HEADER fields in the .fh file are returned as a `data.frame`, instead of the ring-width measurements.

```{r}
read_fh(Doel1, verbose = FALSE, header = TRUE)

```

The `data.frame` with the HEADER fields can then be used as input for the sw_functions:

```{r fig.asp=1}
Doel1_header <- read_fh(Doel1, verbose = FALSE, header = TRUE)

Doel1_header |> 
   # first convert column 'waneyedge' to a logical vector 
   # (in the original .fh file header fields the presence of waney edge is
   # indicated by "WK" - a character string - in the corresponding HEADER field)
   dplyr::mutate(waneyedge = dplyr::if_else(grepl("wk", waneyedge, ignore.case = TRUE),
                               TRUE, FALSE)
   ) |> 
   sw_combine(plot = TRUE)
   
```

The attempt to combine the dated tree-ring series from DOEL1 into a single felling-date range fails. These tree-ring series cannot represent a single event.

However, leaving out the tree-ring series of the keelplank (series "K1_091") of this medieval ship shows that the hull planking and some frame elements could share a common felling date:

```{r fig.asp=1}
Doel1_header |> 
   dplyr::mutate(waneyedge = dplyr::if_else(grepl("wk", waneyedge, ignore.case = TRUE),
                               TRUE, FALSE)
   ) |> 
   dplyr::filter(series != "K1_091") |> 
   sw_combine(plot = TRUE)
```

For more details see: [Haneca & Daly (2014)](https://doi.org/10.1111/1095-9270.12037)

### cor_table

The `cor_table()` function computes common correlation values between dated tree-ring series (x) and a set of reference chronologies (y). This function assists in checking the end date of the series against absolutely dated reference chronologies, but also to select the most appropriate sapwood model for your data according to the provenance of the wood (by running it against chronologies that represent a geographically confined region).

The correlation values computed are:

-   glk: 'Gleichläufigkeit' or 'percentage of parallel variation' ([Buras & Wilmking 2015](https://doi.org/10.1016/j.dendro.2015.03.003); [Eckstein & Bauch 1969](https://doi.org/10.1007/BF02741777); [Huber 1943](https://doi.org/10.1007/BF02603303); [Visser 2020](https://doi.org/10.1111/arcm.12600))

-   glk_p: significance level associated with the glk-value ([Jansma 1995](https://dspace.library.uu.nl/handle/1874/45149))

-   r_pearson: the Pearson's correlation coefficient

-   t_St: Student's t-value

-   t_BP: t-values according to the [Baillie & Pilcher (1973)](https://repository.arizona.edu/handle/10150/260029) algorithm

-   t_Ho: t-values according to the [Hollstein (1980)](https://search.worldcat.org/nl/title/6391864) algorithm

```{r}
Doel1 <- system.file("extdata", "DOEL1.fh", package = "fellingdater")

Doel1_trs <- read_fh(Doel1, verbose = FALSE)

# crossdating ring-width series from Doel 1 against each other:
cor_results <- cor_table(Doel1_trs, output = "table", min_overlap = 80)
head(cor_results, 20)

```