Skip to content

Instantly share code, notes, and snippets.

@grenkoca
Last active March 22, 2023 14:00
Show Gist options
  • Select an option

  • Save grenkoca/314e971360cb5994382cdf5df6ba6480 to your computer and use it in GitHub Desktop.

Select an option

Save grenkoca/314e971360cb5994382cdf5df6ba6480 to your computer and use it in GitHub Desktop.
Estimate the optimal K for partitioning a dataset in R, as described in SC3 (Kiselev et al., 2017)
# See methods section of SC3 for details (Kiselev et al., 2017)
estkTW <- function(dataset) {
p <- ncol(dataset)
n <- nrow(dataset)
# compute Tracy-Widom bound
x <- scale(dataset)
muTW <- (sqrt(n - 1) + sqrt(p))^2
sigmaTW <- (sqrt(n - 1) + sqrt(p)) * (1/sqrt(n - 1) + 1/sqrt(p))^(1/3)
sigmaHatNaive <- tmult(x) # x left-multiplied by its transpose
bd <- 3.273 * sigmaTW + muTW # 3.2730 is the p=0.001 percentile point for the Tracy-Widom distribution
# compute eigenvalues and return the amount which falls above the bound
evals <- eigen(sigmaHatNaive, symmetric = TRUE, only.values = TRUE)$value
k <- 0
for (i in 1:length(evals)) {
if (evals[i] > bd) {
k <- k + 1
}
}
return(k)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment