machine learning - Clustering Baseline Comparison, KMeans


I'm working on an algorithm that makes a guess at k for KMeans clustering. I guess I'm looking for a data set to use as a comparison, or maybe a few data sets where the number of clusters is "known", so I can see how my algorithm is doing at guessing k.

I would first check the UCI repository for data sets: http://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numatt=&numins=&type=&sort=nameup&view=table

I believe there are some in there with labels.

There are also text clustering data sets that are used in papers as baselines, such as 20newsgroups: http://qwone.com/~jason/20newsgroups/

Another great method (one my thesis chair advocated) is to construct your own small example data set. The best way to go about it is to start small: try 2 or 3 variables that you can represent graphically, and label the clusters yourself.

The added benefit of a small, homebrew data set is that you know the answers, and it is great for debugging.
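As a rough illustration of the homebrew approach, here is a minimal sketch that builds a small 2-D data set with a known number of clusters and checks a k-guessing function against it. The names `make_blobs`, `score_guess`, and `guess_k`, along with the centers and spread, are all made up for this example. `guess_k` stands in for whatever algorithm you are testing.

```python
import random


def make_blobs(centers, points_per_cluster=50, spread=0.3, seed=0):
    """Return (points, labels) for 2-D Gaussian blobs around hand-picked centers.

    The label of each point is the index of the center it was drawn from,
    so the true number of clusters is simply len(centers).
    """
    rng = random.Random(seed)  # fixed seed so debugging runs are repeatable
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def score_guess(guess_k, datasets):
    """Fraction of labelled data sets on which guess_k(points) equals the true k.

    guess_k is a placeholder for the algorithm under test: it takes a list
    of points and returns an integer guess for the number of clusters.
    """
    hits = 0
    for points, labels in datasets:
        true_k = len(set(labels))
        if guess_k(points) == true_k:
            hits += 1
    return hits / len(datasets)


# Hand-picked centers, far apart relative to the spread (an assumption
# chosen so the three clusters are obvious when plotted).
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points, labels = make_blobs(centers)
true_k = len(set(labels))
```

With only two variables you can scatter-plot `points` colored by `labels` and see by eye whether your algorithm's guess is reasonable. Moving the centers closer together or raising `spread` gives harder cases once the easy one passes.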

