import { ckmeans } from "https://deno.land/x/simplestatistics@v7.7.6/index.js";
Ckmeans clustering is an improvement on heuristic-based clustering approaches like Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.
Minimizing the difference within groups, which Wang & Song refer to as
withinss, or within sum-of-squares, means that groups are optimally
homogeneous within and the data is split into representative groups.
This is very useful for visualization, where you may want to represent
a continuous variable in discrete color or style groups. This function
can provide groups that emphasize differences between data.
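To make the withinss quantity concrete, here is a minimal sketch that computes it for a given grouping. The helper name withinSumOfSquares is hypothetical and is not part of simple-statistics; it only illustrates the measure that the clustering minimizes.

```javascript
// Hypothetical helper (not part of simple-statistics): for each group, sum the
// squared deviations of its values from the group mean, then add the groups up.
function withinSumOfSquares(groups) {
  let total = 0;
  for (const group of groups) {
    const mean = group.reduce((sum, x) => sum + x, 0) / group.length;
    for (const x of group) {
      total += (x - mean) ** 2;
    }
  }
  return total;
}

// Two ways of splitting the same ten values into three groups.
const tight = withinSumOfSquares([[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]);
const loose = withinSumOfSquares([[-1, -1, -1, -1, 2], [2, 2, 4], [5, 6]]);

console.log(tight, loose);
```

The first grouping has the lower value, which is why it is the preferred split of this data.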
Being a dynamic programming approach, this algorithm is based on two matrices that store incrementally-computed values for squared deviations and backtracking indexes.
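The two-matrix idea can be sketched as follows. This is a simplified illustration that runs in O(kn^2) time and is not the library's implementation; the function name clusterByDynamicProgramming and its internals are assumptions made for this example, and it assumes 1 <= k <= values.length.

```javascript
// Simplified O(k * n^2) sketch of one-dimensional optimal clustering by
// dynamic programming. Hypothetical code, not the library's implementation.
function clusterByDynamicProgramming(values, k) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;

  // Prefix sums give the squared deviation of any contiguous run in O(1).
  const prefixSum = [0];
  const prefixSquares = [0];
  for (const x of sorted) {
    prefixSum.push(prefixSum[prefixSum.length - 1] + x);
    prefixSquares.push(prefixSquares[prefixSquares.length - 1] + x * x);
  }

  // Sum of squared deviations of sorted[i..j] (inclusive) from its mean.
  const ssq = (i, j) => {
    const sum = prefixSum[j + 1] - prefixSum[i];
    const squares = prefixSquares[j + 1] - prefixSquares[i];
    return squares - (sum * sum) / (j - i + 1);
  };

  // cost[m][j]: least withinss when sorted[0..j] is split into m + 1 groups.
  // backtrack[m][j]: index where the last of those groups starts.
  const cost = Array.from({ length: k }, () => new Array(n).fill(Infinity));
  const backtrack = Array.from({ length: k }, () => new Array(n).fill(0));

  for (let j = 0; j < n; j++) cost[0][j] = ssq(0, j);
  for (let m = 1; m < k; m++) {
    for (let j = m; j < n; j++) {
      for (let i = m; i <= j; i++) {
        const candidate = cost[m - 1][i - 1] + ssq(i, j);
        if (candidate < cost[m][j]) {
          cost[m][j] = candidate;
          backtrack[m][j] = i;
        }
      }
    }
  }

  // Walk the backtracking matrix from the last group back to the first.
  const groups = [];
  let end = n - 1;
  for (let m = k - 1; m >= 0; m--) {
    const start = backtrack[m][end];
    groups.unshift(sorted.slice(start, end + 1));
    end = start - 1;
  }
  return groups;
}

const sketchGroups = clusterByDynamicProgramming(
  [-1, 2, -1, 2, 4, 5, 6, -1, 2, -1],
  3,
);
console.log(sketchGroups);
```

Because the data is sorted first, every optimal group is a contiguous run, so the cost matrix only needs to consider where the last group begins.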
This implementation is based on Ckmeans 3.4.6, which introduced a new divide and conquer approach that improved runtime from O(kn^2) to O(kn log(n)).
Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided.
References
Haizhou Wang and Mingzhou Song. Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming. The R Journal Vol. 3/2, December 2011. ISSN 2073-4859.
Examples
ckmeans([-1, 2, -1, 2, 4, 5, 6, -1, 2, -1], 3);
// The input, clustered into groups of similar numbers.
//= [[-1, -1, -1, -1], [2, 2, 2], [4, 5, 6]]