This function calculates an extended version of the Bayesian information criterion (BIC), which combines the total residual sum of squares with a penalty on the number of clusters.
SCEM uses the following equation for the BIC of each partition:
\[BIC(P) = (np)\log \left\lbrace\frac{RSS(P)}{np}\right\rbrace + |P|(B_{n}^{-1}-1) \log(nB_{n}),\]
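where \(RSS(P) = \sum_{q=1}^{Q} RSS(S_q)\) is the total residual sum of squares over the \(Q = |P|\) clusters \(S_1, \dots, S_Q\) of the partition \(P\), and \(p\) is the number of time series.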
The sample size of each individual time series (i.e. the number of observations) is denoted by \(n\), but in dealing with archaeological data, not all the time series in a data set will have the same number of observations.
In order to have a reasonable representative value for the sample size, we have chosen to use the natural arithmetic mean \(n=(n_1+\dots+n_p)/p\).
The term \((B_{n}^{-1}-1)\log(nB_{n})\), where \(B_{n}\) is the bandwidth, is the tuning parameter that places the penalty on the number of clusters (note that it also involves the term \(nB_{n}\)). Using a different tuning parameter \(\gamma_{n}\) in place of \((B_{n}^{-1}-1)\log(nB_{n})\) allows stronger or weaker penalties on the number of clusters.
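As a rough illustration of how the terms in the equation above fit together, the following base R sketch evaluates the formula directly from its ingredients. The helper `ebic_sketch` and all of its argument names are hypothetical and are not part of SCEM. It assumes the residual sum of squares has already been computed elsewhere and that the bandwidth is \(B_n = n^k\). The input values are made up.

```r
# Hypothetical helper (not part of SCEM): evaluates the displayed formula
# from a precomputed RSS. Passing `gamma` replaces the default penalty term.
ebic_sketch <- function(rss, n, p, n_clusters, k, gamma = NULL) {
  B_n <- n^k                                  # assumed bandwidth of order n^k
  if (is.null(gamma)) {
    gamma <- (1 / B_n - 1) * log(n * B_n)     # default penalty per cluster
  }
  n * p * log(rss / (n * p)) + n_clusters * gamma
}

# Made-up inputs, for illustration only
n_bar <- mean(c(10, 12, 14))                  # arithmetic mean of the series lengths
default_penalty <- ebic_sketch(rss = 30, n = n_bar, p = 3, n_clusters = 2, k = -0.33)
custom_penalty  <- ebic_sketch(rss = 30, n = n_bar, p = 3, n_clusters = 2, k = -0.33,
                               gamma = 5)
print(c(default = default_penalty, custom = custom_penalty))
```

Only the penalty term changes between the two calls, so their difference isolates the effect of swapping in a custom \(\gamma_{n}\).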
EBIC(paths, partition, bandwidth)
Argument | Description |
---|---|
paths | A list of data frames, where each frame contains the data for one individual. Every data frame should have two columns with names 'distance' and 'oxygen'. |
partition | A list of vectors. Each element in the list is a vector of integers, corresponding to individuals considered in one group. |
bandwidth | The order of the bandwidth used in the estimation process. Setting bandwidth = k gives a bandwidth of n^k, where n is the sample size. |
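To make the expected shapes of `paths` and `partition` concrete, here is a small sketch that builds both from simulated data. The helper `make_path` and the simulated values are hypothetical stand-ins, not real isotope measurements. The checks at the end assume that a partition should assign every individual to exactly one group.

```r
# Simulated stand-in data, purely illustrative
set.seed(1)
make_path <- function(m) {
  data.frame(distance = seq(1, 30, length.out = m), oxygen = rnorm(m))
}

# Three individuals with different numbers of observations
paths <- list(make_path(10), make_path(12), make_path(14))

# Individuals 1 and 2 form one group; individual 3 is on its own
partition <- list(c(1L, 2L), 3L)

# Sanity checks on the two inputs
has_cols <- sapply(paths, function(d) all(c("distance", "oxygen") %in% names(d)))
members  <- unlist(partition)
stopifnot(all(has_cols))
stopifnot(setequal(members, seq_along(paths)), anyDuplicated(members) == 0)
```

Objects built this way have the structure described in the table above and could then be passed as the first two arguments of `EBIC`, together with a bandwidth order.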
Value of the extended BIC function for the partition.
armenia_split = split(armenia,f = armenia$ID)
band = -0.33
p = length(armenia_split)
EBIC(armenia_split,1:p,band)
#> [1] 69.70434