machine learning - Clustering Baseline Comparison, KMeans
I'm working on an algorithm that makes a guess at k for k-means clustering. I guess I'm looking for a data set to use as a comparison, or maybe a few data sets where the number of clusters is "known", so I can see how my algorithm is doing at guessing k.
I would first check the UCI repository for data sets: http://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numatt=&numins=&type=&sort=nameup&view=table
I believe there are some in there with labels.
There are also text clustering data sets that are used in papers as baselines, such as 20newsgroups: http://qwone.com/~jason/20newsgroups/
Another great method (one that my thesis chair advocated) is to construct your own small example data set. The best way to go about it is to start small: try 2 or 3 variables that you can represent graphically, and label the clusters yourself.
The added benefit of a small, homebrew data set is that you know the answers, which makes it great for debugging.
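As a rough sketch of the homebrew approach, here is one way you might generate a tiny labeled 2-D data set in Python using only the standard library. The function name `make_toy_clusters`, the three centers, and the spread value are all made up for illustration; pick whatever suits your case.

```python
import random

def make_toy_clusters(centers, points_per_cluster=50, spread=0.5, seed=0):
    """Return (points, labels) drawn from 2-D Gaussian blobs around the given centers."""
    rng = random.Random(seed)  # fixed seed so debugging runs are repeatable
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            # Sample each coordinate independently around the blob's center.
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels

# Three well-separated blobs (hypothetical values), so the "known" answer is k = 3.
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points, labels = make_toy_clusters(centers)
true_k = len(set(labels))
print(len(points), true_k)
```

You could then feed `points` to your k-guessing algorithm and compare its answer against `true_k`. Moving the centers closer together or raising `spread` should make the problem harder, which gives you a simple way to see where the guess starts to break down.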