Bayesian Information Criterion (BIC) for a partition.

This function calculates an extended version of the BIC, which combines a goodness-of-fit term based on the total residual sum of squares of the partition with a penalty on the number of clusters.

SCEM uses the following equation for the BIC of each partition:

\[BIC(P) = (np)\log \left\lbrace\frac{RSS(P)}{np}\right\rbrace + |P|(B_{n}^{-1}-1) \log(nB_{n}),\]

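where the partition \(P = \{S_1, \dots, S_Q\}\) consists of \(Q = |P|\) clusters, \(p\) is the number of time series, \(B_n\) is the bandwidth, and \(RSS(P) = \sum_{q=1}^{Q} RSS(S_q)\) is the total residual sum of squares over all clusters.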
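
As a minimal sketch, the criterion can be evaluated once the per-cluster residual sums of squares \(RSS(S_q)\) are available. The helper `ebic_from_rss()` is hypothetical and not part of SCEM: it takes precomputed RSS values rather than estimating them, and it assumes the bandwidth convention \(B_n = n^k\) described for the `bandwidth` argument.

```r
# Hypothetical helper, not part of SCEM: evaluates the BIC formula above
# from precomputed per-cluster residual sums of squares.
#   rss_by_cluster: numeric vector, one RSS(S_q) per cluster (length |P|)
#   n: representative number of observations per time series
#   p: number of time series (individuals)
#   k: bandwidth exponent, so that B_n = n^k (assumed convention)
ebic_from_rss <- function(rss_by_cluster, n, p, k) {
  B_n <- n^k
  rss_total <- sum(rss_by_cluster)            # RSS(P) = sum over q of RSS(S_q)
  n_clusters <- length(rss_by_cluster)        # |P|
  fit <- (n * p) * log(rss_total / (n * p))   # goodness-of-fit term
  penalty <- n_clusters * (1 / B_n - 1) * log(n * B_n)
  fit + penalty
}

# Toy numbers only: two clusters, n = 10 observations, p = 4 series, k = -0.5
ebic_from_rss(c(2, 3), n = 10, p = 4, k = -0.5)
```

With these toy numbers \(RSS(P) < np\), so the fit term is negative; when comparing candidate partitions, only differences in the criterion matter.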

The sample size (number of observations) of each individual time series is denoted by \(n\). With archaeological data, however, the time series in a data set need not all have the same number of observations.

To obtain a representative value for the sample size, we use the arithmetic mean \(n=(n_1+\dots+n_p)/p\), where \(n_i\) is the number of observations in the \(i\)-th time series.
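
A short sketch of this averaging, assuming that each row of a data frame in `paths` is one observation; the toy `paths` object below is made up for illustration.

```r
# Toy input shaped like the `paths` argument: one data frame per individual,
# with columns 'distance' and 'oxygen' (values are made up for illustration).
paths <- list(
  data.frame(distance = c(0, 5, 10),        oxygen = c(-5.1, -4.8, -5.3)),
  data.frame(distance = c(0, 4, 8, 12, 16), oxygen = c(-6.0, -5.7, -5.9, -6.2, -6.1))
)

# Number of observations n_i in each series (one row = one observation, assumed).
n_i <- vapply(paths, nrow, integer(1))

# Representative sample size: arithmetic mean (n_1 + ... + n_p) / p.
n <- mean(n_i)   # (3 + 5) / 2 = 4
```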

The factor \((B_{n}^{-1}-1)\log(nB_{n})\) is the tuning parameter that sets the penalty per cluster; it depends on the bandwidth \(B_n\) both through \(B_n^{-1}-1\) and through the term \(nB_n\) inside the logarithm. Replacing this factor with a different tuning parameter \(\gamma_{n}\) imposes a stronger or weaker penalty on the number of clusters.
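
The sketch below evaluates this per-cluster penalty for a few bandwidth exponents and shows how an arbitrary \(\gamma_n\) could be plugged in instead. Both `cluster_penalty()` and `penalised_criterion()` are hypothetical helpers, not SCEM functions, and the documented `EBIC()` signature has no \(\gamma_n\) argument. The bandwidth is assumed to be \(B_n = n^k\).

```r
# Hypothetical helper (not part of SCEM): the per-cluster penalty
# (B_n^{-1} - 1) * log(n * B_n), with bandwidth B_n = n^k.
cluster_penalty <- function(n, k) {
  B_n <- n^k
  (1 / B_n - 1) * log(n * B_n)
}

# k = 0 gives B_n = 1, so the penalty is exactly zero.
cluster_penalty(100, 0)

# Two illustrative bandwidth exponents for n = 100.
cluster_penalty(100, -0.33)
cluster_penalty(100, -0.2)

# Hypothetical variant with an arbitrary per-cluster penalty gamma in place of
# cluster_penalty(n, k); a larger gamma penalises extra clusters more strongly.
penalised_criterion <- function(rss_total, n, p, n_clusters, gamma) {
  (n * p) * log(rss_total / (n * p)) + n_clusters * gamma
}
```

For \(n = 100\), the exponent \(-0.33\) gives a per-cluster penalty of about 11.0, against about 5.6 for \(-0.2\). The penalty is not monotone in \(k\), though: as \(k\) approaches \(-1\), \(nB_n = n^{1+k}\) approaches 1, so the logarithm, and hence the penalty, shrinks toward zero.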

Usage

EBIC(paths, partition, bandwidth)

Arguments

paths	A list of data frames, where each frame contains the data for one individual. Every data frame should have two columns with names 'distance' and 'oxygen'.
partition	A list of vectors. Each element of the list is an integer vector giving the individuals (by position in paths) assigned to one group.
bandwidth	The exponent of the bandwidth used in the estimation process: bandwidth = k means that the bandwidth is n^k.
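
A sketch of inputs in the shape described above, with a hypothetical checker `check_ebic_inputs()` (not part of SCEM). It assumes that `partition` must split the individuals \(1, \dots, p\) into disjoint, non-empty groups, and it reads an atomic vector such as `1:p` as one singleton group per element; both are assumptions, not documented behaviour of `EBIC()`.

```r
# Toy inputs shaped as described above (all values are made up).
paths <- list(
  data.frame(distance = c(0, 5, 10), oxygen = c(-5.1, -4.8, -5.3)),
  data.frame(distance = c(0, 5, 10), oxygen = c(-6.0, -5.7, -5.9)),
  data.frame(distance = c(0, 5, 10), oxygen = c(-5.2, -4.9, -5.4))
)
partition <- list(c(1L, 3L), 2L)  # individuals 1 and 3 together, 2 alone
bandwidth <- -0.33                # bandwidth B_n = n^bandwidth

# Hypothetical checker, not part of SCEM.
check_ebic_inputs <- function(paths, partition) {
  stopifnot(is.list(paths))
  # Assumption: an atomic vector such as 1:p means one singleton group per element.
  if (!is.list(partition)) partition <- as.list(partition)
  for (df in paths) {
    if (!all(c("distance", "oxygen") %in% names(df))) {
      stop("each data frame needs columns 'distance' and 'oxygen'")
    }
  }
  if (any(lengths(partition) == 0)) stop("groups must be non-empty")
  members <- unlist(partition)
  if (anyDuplicated(members) > 0) stop("an individual appears in more than one group")
  if (!setequal(members, seq_along(paths))) stop("groups must cover every individual")
  invisible(TRUE)
}

check_ebic_inputs(paths, partition)
check_ebic_inputs(paths, 1:3)
```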

Value

A single numeric value: the extended BIC of the partition.

Examples

# Split the data into one data frame per individual (ID)
armenia_split = split(armenia, f = armenia$ID)
band = -0.33
p = length(armenia_split)
EBIC(armenia_split, 1:p, band)
#> [1] 69.70434