K-Means is an algorithm that detects clusters in a given set of points. It is unsupervised: you don't label the data or correct the results. It works in any number of dimensions (a plane, 3D space, 4D space, or any other finite-dimensional space). And OpenCV comes with this algorithm built right in!

The function you need to call to execute the algorithm is:

double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers)

This function is in the *cv* namespace, so you can call it as *cv::kmeans* or add a `using namespace cv;` directive and call it as *kmeans*. If you know how K-means works, the parameters should be self-explanatory.

`samples`

: *(input)* The data points that you need to cluster, exactly one point per row. For example, 50 points in a 2D plane need a matrix with 50 rows and 2 columns.

`clusterCount`

: *(input)* The number of clusters to split the data points into.

`labels`

: *(input/output)* On return, holds the index of the cluster each point belongs to. It can also supply an initial guess for each point (see `KMEANS_USE_INITIAL_LABELS` below).

`termcrit`

: *(input)* K-means is an iterative algorithm, so you need to specify the termination criteria (maximum number of iterations and/or desired accuracy).

`attempts`

: *(input)* The number of times the algorithm is run with different initial center placements.

`flags`

: *(input)* Possible values include:

`KMEANS_RANDOM_CENTERS`

: Initial centers are generated randomly.

`KMEANS_PP_CENTERS`

: Uses the kmeans++ center initialization.

`KMEANS_USE_INITIAL_LABELS`

: The first attempt uses the supplied *labels* to calculate centers. Later attempts use random or semi-random centers (combine with one of the above two flags to choose which).

`centers`

: *(output)* This matrix holds the center of each cluster, one center per row.

The function returns the compactness of the final clustering. What is compactness? It's a measure of how well the labeling fits the data: the sum of the squared distances from each point to the center of its cluster. The smaller, the better.

When *attempts* is 1, the value returned is the compactness of the only attempt that ran. If *attempts* is more than 1, the labeling returned is the one with the lowest compactness, and that compactness is the value returned.
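The compactness value can be reproduced by hand. The sketch below uses plain C++ containers instead of OpenCV matrices and sums the squared distance from each point to the center it is labeled with, which is how the OpenCV documentation defines the measure. The function name `compactness` is a hypothetical helper, not an OpenCV call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): sum of squared Euclidean
// distances from every sample to the center of the cluster it is labeled with.
// samples[i] and centers[k] are coordinate vectors of equal length;
// labels[i] is the index into centers for sample i.
double compactness(const std::vector<std::vector<double>>& samples,
                   const std::vector<int>& labels,
                   const std::vector<std::vector<double>>& centers)
{
    double total = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const std::vector<double>& center = centers[labels[i]];
        for (std::size_t d = 0; d < samples[i].size(); ++d) {
            const double diff = samples[i][d] - center[d];
            total += diff * diff;  // accumulate squared distance per dimension
        }
    }
    return total;
}
```

For the points (0,0), (2,0), (10,0), (12,0) with centers (1,0) and (11,0), the labeling {0, 0, 1, 1} gives a compactness of 4, while the poor labeling {0, 1, 0, 1} gives 164. A run that produced the first labeling would therefore win over one that produced the second.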

The C equivalent of the k-means function is:

int cvKMeans2(const CvArr* samples, int nclusters, CvArr* labels, CvTermCriteria termcrit, int attempts=1, CvRNG* rng=0, int flags=0, CvArr* centers=0, double* compactness=0)

The parameters are similar to those of the C++ interface.

`samples`

: *(input)* The data points that you need to cluster, exactly one point per row.

`nclusters`

: *(input)* The number of clusters to split the data points into.

`labels`

: *(input/output)* On return, holds the index of the cluster each point belongs to. It can also supply an initial guess for each point.

`termcrit`

: *(input)* K-means is an iterative algorithm, so you need to specify the termination criteria (maximum number of iterations and/or desired accuracy).

`attempts`

: *(input)* The number of times the algorithm is run with different initial center placements.

`rng`

: *(input)* A random number generator used to generate the initial guess. Supplying your own puts you in total control of what's happening.

`flags`

: *(input)* Possible values include:

`0`

: (the number 0) Initial centers are generated randomly.

`KMEANS_USE_INITIAL_LABELS`

: The first attempt uses the supplied *labels* to calculate centers. Later attempts use randomly generated centers.

`centers`

: *(output)* This matrix holds the center of each cluster.

`compactness`

: *(output)* Holds the compactness of the best labeling scheme.

If you're still using the C interface, I highly recommend you shift to the more intuitive and no-more-tears C++ interface!

You learned how to run K-means without implementing the algorithm yourself! You also learned about the C++ and C functions that you can use to run K-Means on your data sets.