machine learning - Clustering Baseline Comparison, KMeans

- July 15, 2011

I'm working on an algorithm that makes a guess at k for KMeans clustering. I guess I'm looking for a data set to use as a comparison, or maybe a few data sets where the number of clusters is "known", so I can see how my algorithm is doing at guessing k.

I would first check the UCI repository for data sets: http://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numatt=&numins=&type=&sort=nameup&view=table

I believe there are some in there with labels.
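With a labeled data set, the "known" k can be read off as the number of distinct class labels. Here is a minimal sketch of that idea, assuming a CSV layout with the label in the last column. The sample rows and the helper name `known_k` are made up for illustration and do not come from any real UCI file.

```python
import csv
import io

# Hypothetical rows: two numeric features followed by a class label.
# These values are invented for illustration only.
SAMPLE = """1.0,2.0,a
1.1,1.9,a
5.0,5.2,b
9.0,0.5,c
"""


def known_k(csv_text, label_column=-1):
    """Count distinct class labels, used here as the 'known' number of clusters."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return len({row[label_column] for row in rows})


true_k = known_k(SAMPLE)
```

Keep in mind that the number of class labels is only a proxy: classes in a real data set do not always form well-separated clusters, so a mismatch is not automatically a bug in the k-guessing algorithm.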

There are also text clustering data sets that are often used in papers as baselines, such as 20newsgroups: http://qwone.com/~jason/20newsgroups/

Another great method (one that my thesis chair always advocated) is to construct your own small example data set. The best way to go about this is to start small: try 2 or 3 variables that you can represent graphically, and then label the clusters yourself.

The added benefit of a small, homebrew data set is that you know the answers, and it is great for debugging.
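A homebrew data set like this could be sketched as follows. This is only one way to do it, assuming 2 variables and Gaussian blobs around hand-picked centers. The function names, the centers, and the spread are all hypothetical choices, and the fixed seed is there so a debugging run can be repeated.

```python
import random


def make_toy_clusters(centers, points_per_cluster=30, spread=0.3, seed=0):
    """Draw 2-D points around each center and record which center each came from."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def guess_is_correct(guessed_k, labels):
    """Compare a guessed k against the number of hand-assigned labels."""
    return guessed_k == len(set(labels))


# Three well-separated, hand-picked centers, so the right answer is k = 3.
CENTERS = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points, labels = make_toy_clusters(CENTERS)
true_k = len(set(labels))
```

Feeding `points` to the k-guessing algorithm and passing its answer to `guess_is_correct` gives a quick pass/fail check. Moving the centers closer together or raising `spread` makes the problem harder in a controlled way.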
