Classes (extension) | Data Analysis

KMeans
Extension

A simple yet effective clustering algorithm

Description

The k-means clustering algorithm is simple: given a set of data points, it finds a number ("k") of centroids that summarise the data distribution. Each centroid represents one cluster, and each data point is labelled as belonging to the cluster whose centroid is nearest.
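The two steps described above (label each point by its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines. This is a minimal illustration in Python rather than SuperCollider, and it is not the implementation used by this class: the function names `nearest` and `kmeans`, the Euclidean distance measure, the choice of the first k points as starting centroids, and the fixed iteration count are all assumptions made for the example.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(data, k, iterations=20):
    """Hypothetical helper: plain batch k-means on a list of equal-length points."""
    # Assumption: start from the first k data points (other initialisations are common).
    centroids = [list(p) for p in data[:k]]
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        assignments = [nearest(p, centroids) for p in data]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    # Final labels, consistent with the final centroid positions.
    assignments = [nearest(p, centroids) for p in data]
    return centroids, assignments

# Two well-separated groups of 2D points.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, assignments = kmeans(data, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Here the two starting centroids both come from the first group, yet the second iteration already separates the groups: one centroid settles near (0.1, 0.1) and the other near (5.03, 5.0).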

It can be used for unsupervised classification of data. It can also be used "on-line": results are available after only a few data points have been added, and as more data points are added the algorithm updates the cluster positions and labels accordingly.
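The on-line pattern can be illustrated with a small sketch, again in Python and again only as an assumption-laden illustration. The class `OnlineKMeans` is hypothetical; its method names echo the ones documented on this page, but the behaviour shown (seeding centroids from the first k points, re-fitting on every `add`, stopping when labels no longer change) is a guess at one reasonable design, not a description of how the SuperCollider class works internally.

```python
import math

class OnlineKMeans:
    """Hypothetical sketch of on-line use; not the SuperCollider KMeans class."""

    def __init__(self, k):
        self.k = k
        self.data = []         # every point added so far
        self.centroids = []    # at most k centroid positions
        self.assignments = []  # cluster index for each stored point

    def classify(self, datum):
        """Return the index of the nearest centroid (Euclidean distance)."""
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(datum, self.centroids[i]))

    def add(self, datum):
        """Store a point, seed a centroid from it while fewer than k exist, then re-fit."""
        self.data.append(list(datum))
        if len(self.centroids) < self.k:
            self.centroids.append(list(datum))
        self.update()

    def update(self, max_iterations=10):
        """Repeat the assign/update steps until the labels stop changing."""
        for _ in range(max_iterations):
            new = [self.classify(p) for p in self.data]
            if new == self.assignments:
                break
            self.assignments = new
            for c in range(len(self.centroids)):
                members = [p for p, a in zip(self.data, new) if a == c]
                if members:
                    self.centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]

km = OnlineKMeans(2)
for point in [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.8]]:
    km.add(point)  # centroids and labels are usable after every addition

print(km.assignments)            # [0, 1, 0, 1]
print(km.classify([0.1, 0.3]))   # 0
```

Re-fitting on every addition keeps the sketch short but costs more as the data grows; calling `update` only occasionally would be an equally valid choice.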

Any dimensionality of data can be clustered (the examples below use 2D data).

Class Methods

.new

Create a new instance.

Arguments:

k

The number of clusters to find.

Instance Methods

.add

Add a data point.

Arguments:

datum

.update

Run the learning step

.centroids

Returns:

centroid positions

.data

Returns:

data stored internally

.assignments

Returns:

the cluster assignment (label) of each stored data point

.classify

Arguments:

datum

.reset

.k

Examples