DataScience With Python/R/SAS: Unsupervised Learning | Clustering | Machine Learning | Sci-kit-Learn

A cluster is a group of similar data points. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions.

k-means algorithm
K-means clustering aims to partition N observations into K clusters so that each observation belongs to the cluster with the nearest mean (the cluster centroid), which serves as a prototype of the cluster.

The k-means algorithm takes a dataset X of N points as input, together with a parameter K specifying how many clusters to create. The output is a set of K cluster centroids and a labeling of X that assigns each of the points in X to a unique cluster. All points within a cluster are closer in distance to their centroid than they are to any other centroid.
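The input/output contract described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard iterative scheme (assign each point to its nearest centroid, then move each centroid to the mean of its points), not the implementation used by scikit-learn. The function name `kmeans`, the helper `nearest_centroid`, the random initialisation from data points, and the toy data are all assumptions made for this example.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Illustrative k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct points of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label every point with its nearest centroid.
        labels = nearest_centroid(X, centroids)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    # Final labeling, consistent with the returned centroids.
    return centroids, nearest_centroid(X, centroids)

# Toy data (made up for this sketch): two blobs in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

The result depends on the initial centroids, which is why library implementations usually run several initialisations and keep the best one.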

Clustering with the iris dataset

Suppose we knew that the iris dataset contained 3 types of iris but had no taxonomist to label them. We could then try a clustering task: split the observations into well-separated groups called clusters.

Let's load the iris dataset and analyse it.
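One way to do this, assuming scikit-learn and pandas are installed, is to use the copy of the iris data bundled with scikit-learn. The variable names `iris` and `df` are choices made for this sketch.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled iris data: 150 flowers, 4 measurements each.
iris = load_iris()

# Put the measurements in a DataFrame for easier inspection.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target  # true species codes, used here only for inspection

print(df.shape)                     # number of rows and columns
print(df.head())                    # first few observations
print(df.describe())                # summary statistics per column
print(list(iris.target_names))      # species names behind the codes
print(df["target"].value_counts())  # observations per species
```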

Prepare X and y using pandas, instantiate the k-means clustering model, then fit it to the dataset.
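A minimal sketch of this step with scikit-learn's `KMeans` might look as follows. The settings `n_init=10` and `random_state=0` are assumptions chosen to make the run repeatable. Note that `y` is not passed to the model, since clustering is unsupervised; it is kept only to compare the clusters with the true species afterwards.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

# X holds the four measurements; y holds the true species codes.
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name="species")

# Instantiate the model with 3 clusters, one per expected iris type.
model = KMeans(n_clusters=3, n_init=10, random_state=0)

# Fit on X only: the labels in y are never shown to the model.
model.fit(X)

print(model.cluster_centers_)  # one centroid per cluster, 4 coordinates each

# Compare cluster assignments with the true species.
clusters = pd.Series(model.labels_, name="cluster")
print(pd.crosstab(y, clusters))
```

Cluster numbers are arbitrary, so cluster 0 need not correspond to species 0; the cross-tabulation shows which cluster lines up with which species.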

Predict the cluster for new data and plot the findings.
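The sketch below shows one possible way to do this. The two rows in `new_samples` are made-up measurements (one setosa-like, one virginica-like), the choice of petal length and petal width for the axes is arbitrary, and the output file name `iris_kmeans.png` is hypothetical. The `Agg` backend is selected only so that the script runs without a display; in an interactive session that line can be dropped and `plt.show()` used instead of saving.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# Fit the same 3-cluster model as before.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Made-up measurements: sepal length, sepal width, petal length, petal width (cm).
new_samples = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.7, 3.0, 5.2, 2.3],
])

# predict() assigns each new sample to the cluster with the nearest centroid.
new_labels = model.predict(new_samples)
print(new_labels)

# Plot petal length against petal width, coloured by cluster.
fig, ax = plt.subplots()
ax.scatter(X[:, 2], X[:, 3], c=model.labels_, cmap="viridis", s=20)
centers = model.cluster_centers_
ax.scatter(centers[:, 2], centers[:, 3], c="red", marker="x", s=100,
           label="centroids")
ax.scatter(new_samples[:, 2], new_samples[:, 3], c=new_labels, cmap="viridis",
           vmin=0, vmax=2, edgecolors="black", marker="*", s=250,
           label="new samples")
ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.set_title("K-means clusters on the iris dataset")
ax.legend()
fig.savefig("iris_kmeans.png")
```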
