TDSM 10.29

From The Data Science Design Manual Wikia

If we have prior knowledge of the data set, we can make a good initial guess for k and then also try a few values of k close to that guess.

However, the "right" number of clusters is usually unknown. According to page 333 of the textbook, the easiest way to find the right k is to try every candidate value and pick the one that gives the best clustering.
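As a rough sketch of "try them all", the Python below runs k-means for each candidate k and records the within-cluster sum of squared errors (SSE). Several choices here are assumptions, not from the textbook: Lloyd's algorithm as the clustering method, a deterministic farthest-first seeding (so the result is reproducible), and hypothetical toy data of three well-separated blobs. The helper names `sq_dist`, `farthest_first_centers`, `kmeans_sse` and `error_curve` are illustrative only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k):
    """Assumed seeding: start at the first point, then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_sse(points, k, max_iter=100):
    """Lloyd's k-means; returns the within-cluster sum of squared errors."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments will not change again
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def error_curve(points, k_values):
    """Try every candidate k and record its clustering error."""
    return {k: kmeans_sse(points, k) for k in k_values}


# Hypothetical toy data: three Gaussian blobs of 30 points each.
rng = random.Random(42)
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blobs for _ in range(30)]

for k, sse in error_curve(points, range(1, 7)).items():
    print(k, round(sse, 1))
```

Note that the error tends to keep falling as k grows, so "best" cannot simply mean "lowest error"; some rule for trading error against the number of clusters is still needed.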

Textbook page 335 introduces the "elbow method" for estimating k. In the following figure from the textbook, the black curve shows the clustering error as a function of k, and its shape resembles an arm bent in a typing position: k = 1 and 2 form the upper arm, and k = 3 is the elbow, after which the error decreases much more slowly. Thus our estimate is k = 3.

[Figure (10-29.jpg): the textbook's error curve plotted against k, with the elbow at k = 3]
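The page reads the elbow off the figure by eye. One common way to locate it automatically (an assumption here, not the textbook's procedure) is to rescale both axes to [0, 1] and choose the k whose point lies farthest below the straight line joining the first and last points of the error curve. The function name `elbow_k` is hypothetical, and the error values in the example are made up for illustration; they are not read from the textbook figure.

```python
def elbow_k(ks, errors):
    """Return the k whose normalized point lies farthest below the chord
    joining the first and last points of a (decreasing) error curve."""
    ks, errors = list(ks), list(errors)
    if len(ks) != len(errors) or len(ks) < 3:
        raise ValueError("need at least three (k, error) pairs")
    k_lo, k_hi = ks[0], ks[-1]
    e_hi, e_lo = errors[0], errors[-1]  # assumes error is largest at the smallest k
    if e_hi == e_lo:
        raise ValueError("error curve is flat; no elbow to find")
    best_k, best_gap = ks[0], float("-inf")
    for k, e in zip(ks, errors):
        x = (k - k_lo) / (k_hi - k_lo)  # 0 at the first k, 1 at the last
        y = (e - e_lo) / (e_hi - e_lo)  # 1 at the first error, 0 at the last
        gap = 1.0 - x - y  # proportional to the distance below the chord x + y = 1
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Made-up error values for k = 1..6 (not from the textbook figure).
print(elbow_k(range(1, 7), [100, 40, 12, 10, 9, 8.5]))  # → 3
```

Normalizing both axes keeps the choice from depending on the units of the error; without it, the raw error scale would dominate the distance to the chord.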