Using edge list generating functions and dyad_id

spatsoc can be used in social network analysis to generate edge lists from GPS relocation data.

Edge lists are generated using either the edge_dist or the edge_nn function.

Note: The grouping functions and their application in social network analysis are further described in the vignette Using spatsoc in social network analysis - grouping functions.

Generate edge lists

spatsoc provides users with one temporal grouping function (group_times) and two edge list generating functions (edge_dist, edge_nn) to generate edge lists from GPS relocations. Users can consider edges defined by the spatial proximity between individuals (with edge_dist), by nearest neighbour (with edge_nn) or by nearest neighbour with a maximum distance (with edge_nn). The edge lists can be used directly by the animal social network package asnipe to generate networks.

1. Load packages and prepare data

spatsoc expects a data.table for all DT arguments and datetime columns to be of class POSIXct.

## Load packages
library(spatsoc)
library(data.table)
## Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

## Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
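
Since both requirements are easy to miss when data come from another source, a quick check before continuing can help. The following is only a minimal sketch on a small made-up table (`toy` and its values are hypothetical, and the UTC time zone is an assumption), not part of spatsoc itself:

```r
library(data.table)

# Hypothetical toy data standing in for the relocations read above
toy <- data.table(
  ID = c('A', 'B'),
  datetime = c('2016-11-01 00:00:54', '2016-11-01 02:01:22')
)

# Character timestamps are not yet POSIXct
is_posix_before <- inherits(toy$datetime, 'POSIXct')

# Cast in place, stating the time zone explicitly (UTC is an assumption)
toy[, datetime := as.POSIXct(datetime, tz = 'UTC')]

# Confirm both expectations before calling any spatsoc function
is_posix_after <- inherits(toy$datetime, 'POSIXct')
stopifnot(is.data.table(toy), is_posix_after)
```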

Next, we will group relocations temporally with group_times and generate edge lists with one of edge_dist or edge_nn. Note: these are mutually exclusive; select only one edge list generating function at a time.

2. a) edge_dist

edge_dist generates distance based edge lists, where relocations in each timegroup are considered edges if they are within the spatial distance defined by the user with the threshold argument. Relevant temporal and spatial distance thresholds depend on the species and study system. In this case, relocations within 5 minutes and 100 meters are considered edges.

This is the non-chain rule implementation similar to group_pts. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if fillNA is TRUE).

Optionally, edge_dist can return the distances between individuals (less than the threshold) in a column named ‘distance’ with argument returnDist = TRUE.

# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
#>            ID        X       Y            datetime population minutes timegroup
#>        <char>    <num>   <num>              <POSc>      <int>   <int>     <int>
#>     1:      A 715851.4 5505340 2016-11-01 00:00:54          1       0         1
#>     2:      A 715822.8 5505289 2016-11-01 02:01:22          1       0         2
#>     3:      A 715872.9 5505252 2016-11-01 04:01:24          1       0         3
#>     4:      A 715820.5 5505231 2016-11-01 06:01:05          1       0         4
#>     5:      A 715830.6 5505227 2016-11-01 08:01:11          1       0         5
#>    ---                                                                         
#> 14293:      J 700616.5 5509069 2017-02-28 14:00:54          1       0      1393
#> 14294:      J 700622.6 5509065 2017-02-28 16:00:11          1       0      1394
#> 14295:      J 700657.5 5509277 2017-02-28 18:00:55          1       0      1449
#> 14296:      J 700610.3 5509269 2017-02-28 20:00:48          1       0      1395
#> 14297:      J 700744.0 5508782 2017-02-28 22:00:39          1       0      1396

# Edge list generation
edges <- edge_dist(
  DT,
  threshold = 100,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup',
  returnDist = TRUE,
  fillNA = TRUE
)
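
To make the distance rule concrete, here is a rough sketch of the same idea for a single timegroup using base R's `dist`. It is an illustration on hypothetical toy coordinates, not spatsoc's implementation; in particular, how edge_dist treats a distance exactly equal to the threshold is not covered here, and isolated individuals are simply dropped rather than filled with NA.

```r
library(data.table)

# Hypothetical toy relocations: three individuals in a single timegroup
toy <- data.table(
  ID = c('A', 'B', 'C'),
  X = c(0, 30, 500),
  Y = c(0, 40, 500),
  timegroup = 1L
)
threshold <- 100

# Pairwise Euclidean distances between all individuals
dmat <- as.matrix(dist(toy[, .(X, Y)]))
dimnames(dmat) <- list(toy$ID, toy$ID)

# Keep ordered pairs within the threshold, excluding self-pairs
idx <- which(dmat < threshold & row(dmat) != col(dmat), arr.ind = TRUE)
toy_edges <- data.table(
  timegroup = 1L,
  ID1 = rownames(dmat)[idx[, 'row']],
  ID2 = colnames(dmat)[idx[, 'col']],
  distance = dmat[idx]
)
setorder(toy_edges, ID1, ID2)
toy_edges
```

A and B are 50 units apart and therefore appear twice (A-B and B-A), while C is too far from both to form an edge.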

2. b) edge_nn

edge_nn generates nearest neighbour based edge lists, where each individual is connected to its nearest neighbour. Edge lists can be defined either by nearest neighbour or by nearest neighbour with a maximum distance. As with the grouping functions and edge_dist, temporal and spatial thresholds depend on species and study system.

NAs are returned for the nearest neighbour if an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold.

Optionally, edge_nn can return the distances between individuals (less than the threshold, if one is provided) in a column named ‘distance’ with argument returnDist = TRUE.

# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')

# Edge list generation
edges <- edge_nn(
  DT,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup'
)

# Edge list generation using maximum distance threshold
edges <- edge_nn(
  DT, 
  id = 'ID', 
  coords = c('X', 'Y'),
  timegroup = 'timegroup', 
  threshold = 100
)

# Edge list generation using maximum distance threshold, returning distances
edges <- edge_nn(
  DT, 
  id = 'ID', 
  coords = c('X', 'Y'),
  timegroup = 'timegroup', 
  threshold = 100,
  returnDist = TRUE
)
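
The nearest neighbour rule with a maximum distance can be sketched in a similar way for one timegroup. This is a hedged illustration on hypothetical toy coordinates using base R, not the package's own code; `max_dist` is a made-up name standing in for the threshold argument.

```r
library(data.table)

# Hypothetical toy relocations: three individuals in a single timegroup
toy <- data.table(
  ID = c('A', 'B', 'C'),
  X = c(0, 30, 500),
  Y = c(0, 40, 500)
)
max_dist <- 100

# Pairwise Euclidean distances
dmat <- as.matrix(dist(toy[, .(X, Y)]))
diag(dmat) <- Inf  # an individual cannot be its own nearest neighbour

# Index of, and distance to, the closest other individual in each row
nn_idx <- apply(dmat, 1, which.min)
nn_dist <- dmat[cbind(seq_len(nrow(dmat)), nn_idx)]

toy_nn <- data.table(ID = toy$ID, NN = toy$ID[nn_idx], distance = nn_dist)

# Apply the optional maximum distance: neighbours farther away become NA
toy_nn[distance > max_dist, c('NN', 'distance') := .(NA_character_, NA_real_)]
toy_nn
```

Here A and B are each other's nearest neighbour. C's nearest neighbour is B, but it is farther away than `max_dist`, so C gets NA.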

Dyads

3. dyad_id

The function dyad_id can be used to generate a unique, undirected dyad identifier for edge lists.

# In this case, using the edges generated in 2. a) edge_dist
dyad_id(edges, id1 = 'ID1', id2 = 'ID2')
#> Key: <timegroup, ID1>
#>        timegroup    ID1    ID2  distance dyadID
#>            <int> <char> <char>     <num> <char>
#>     1:         1      A   <NA>        NA   <NA>
#>     2:         1      B      G  5.782904    B-G
#>     3:         1      C   <NA>        NA   <NA>
#>     4:         1      D   <NA>        NA   <NA>
#>     5:         1      E      H 65.061671    E-H
#>    ---                                         
#> 22942:      1457      G   <NA>        NA   <NA>
#> 22943:      1458      H   <NA>        NA   <NA>
#> 22944:      1459      I   <NA>        NA   <NA>
#> 22945:      1460      J   <NA>        NA   <NA>
#> 22946:      1461      J   <NA>        NA   <NA>

Once we have generated dyad ids, we can measure consecutive relocations, start and end relocation, etc. Note: since the edges are duplicated (A-B and B-A), you will need to either keep only the unique combinations of timegroup and dyadID, or divide counts by 2.
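
The effect of the duplicated edges on counts can be seen in a small sketch. The toy edge list below is hypothetical, and the `pmin`/`pmax` construction is just one simple way to build an undirected identifier for illustration; it is not necessarily how dyad_id builds its identifiers internally.

```r
library(data.table)

# Hypothetical toy edge list with each edge listed in both directions
toy_edges <- data.table(
  timegroup = c(1L, 1L, 2L, 2L, 2L, 2L),
  ID1 = c('A', 'B', 'A', 'B', 'C', 'D'),
  ID2 = c('B', 'A', 'B', 'A', 'D', 'C')
)

# Order each pair alphabetically so A-B and B-A share one identifier
toy_edges[, dyadID := paste(pmin(ID1, ID2), pmax(ID1, ID2), sep = '-')]

# Naive count: every co-occurrence is counted twice
naive <- toy_edges[, .N, by = dyadID]

# Option 1: keep one row per timegroup and dyad before counting
n_unique <- unique(toy_edges, by = c('timegroup', 'dyadID'))[, .N, by = dyadID]

# Option 2: halve the naive counts
halved <- naive[, .(dyadID, N = N / 2)]
```

Both options give the same number of timegroups per dyad, whereas the naive count is inflated by a factor of two.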

4. Dyad stats

# Get the unique dyads by timegroup
# NOTE: we are explicitly selecting only where dyadID is not NA
dyads <- unique(edges[!is.na(dyadID)], by = c('timegroup', 'dyadID'))

# NOTE: if we wanted to also include where dyadID is NA, we should do it explicitly
# dyadNN <- unique(DT[!is.na(NN)], by = c('timegroup', 'dyadID'))

# Get where NN was NA
# dyadNA <- DT[is.na(NN)]

# Combine where NN is NA
# dyads <- rbindlist(list(dyadNN, dyadNA))


# Set the order of the rows
setorder(dyads, timegroup)

## Count number of timegroups dyads are observed together
dyads[, nObs := .N, by = .(dyadID)]

## Count consecutive relocations together
# Shift the timegroup within dyadIDs
dyads[, shifttimegrp := shift(timegroup, 1), by =  dyadID]

# Difference between consecutive timegroups for each dyadID
# where difftimegrp == 1, the dyads remained together in consecutive timegroups
dyads[, difftimegrp := timegroup - shifttimegrp]


# Run id of diff timegroups
dyads[, runid := rleid(difftimegrp), by = dyadID]

# N consecutive observations of dyadIDs
dyads[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]

## Start and end of consecutive relocations for each dyad
# Don't consider runs that are not longer than one relocation
dyads[runCount > 1, start := fifelse(timegroup == min(timegroup), TRUE, FALSE), by = .(runid, dyadID)]

dyads[runCount > 1, end := fifelse(timegroup == max(timegroup), TRUE, FALSE), by = .(runid, dyadID)]

## Example output
dyads[dyadID == 'B-H', 
      .(timegroup, nObs, shifttimegrp, difftimegrp, runid, runCount, start, end)]
#>     timegroup  nObs shifttimegrp difftimegrp runid runCount  start    end
#>         <int> <int>        <int>       <int> <int>    <int> <lgcl> <lgcl>
#>  1:      1340    29           NA          NA     1       NA     NA     NA
#>  2:      1341    29         1340           1     2        3   TRUE  FALSE
#>  3:      1342    29         1341           1     2        3  FALSE  FALSE
#>  4:      1343    29         1342           1     2        3  FALSE   TRUE
#>  5:      1346    29         1343           3     3       NA     NA     NA
#>  6:      1347    29         1346           1     4        3   TRUE  FALSE
#>  7:      1348    29         1347           1     4        3  FALSE  FALSE
#>  8:      1349    29         1348           1     4        3  FALSE   TRUE
#>  9:      1351    29         1349           2     5       NA     NA     NA
#> 10:      1356    29         1351           5     6       NA     NA     NA
#> 11:      1357    29         1356           1     7        2   TRUE  FALSE
#> 12:      1358    29         1357           1     7        2  FALSE   TRUE
#> 13:      1360    29         1358           2     8       NA     NA     NA
#> 14:      1361    29         1360           1     9        1     NA     NA
#> 15:      1364    29         1361           3    10       NA     NA     NA
#> 16:      1383    29         1364          19    11       NA     NA     NA
#> 17:      1384    29         1383           1    12        7   TRUE  FALSE
#> 18:      1385    29         1384           1    12        7  FALSE  FALSE
#> 19:      1386    29         1385           1    12        7  FALSE  FALSE
#> 20:      1387    29         1386           1    12        7  FALSE  FALSE
#> 21:      1388    29         1387           1    12        7  FALSE  FALSE
#> 22:      1389    29         1388           1    12        7  FALSE  FALSE
#> 23:      1390    29         1389           1    12        7  FALSE   TRUE
#> 24:      1392    29         1390           2    13       NA     NA     NA
#> 25:      1393    29         1392           1    14        3   TRUE  FALSE
#> 26:      1394    29         1393           1    14        3  FALSE  FALSE
#> 27:      1395    29         1394           1    14        3  FALSE   TRUE
#> 28:      1445    29         1395          50    15       NA     NA     NA
#> 29:      1446    29         1445           1    16        1     NA     NA
#>     timegroup  nObs shifttimegrp difftimegrp runid runCount  start    end
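
If a one-row-per-run summary is more convenient than flag columns, a different technique is to mark breaks in the sequence and take a cumulative sum. The sketch below uses hypothetical toy timegroups and made-up column names (`newrun`, `runs`). Unlike the columns above, this variant counts the first timegroup of each stretch as part of the run, so its run lengths are not directly comparable with runCount.

```r
library(data.table)

# Hypothetical timegroups in which one dyad was observed together
toy <- data.table(dyadID = 'A-B', timegroup = c(1L, 2L, 3L, 6L, 8L, 9L))
setorder(toy, dyadID, timegroup)

# A new run starts at the first row, or whenever the gap to the
# previous timegroup is not exactly 1
toy[, newrun := is.na(shift(timegroup)) | timegroup - shift(timegroup) != 1L,
    by = dyadID]

# Cumulative sum of the break flags gives a run identifier
toy[, runid := cumsum(newrun), by = dyadID]

# One row per run: first and last timegroup, and number of timegroups
runs <- toy[, .(start = min(timegroup), end = max(timegroup), n = .N),
            by = .(dyadID, runid)]
runs
```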