Unsupervised learning is a machine learning technique in which the model learns from unlabeled data. It discovers patterns and structures in the input data without predefined labels.
Common uses include clustering, dimensionality reduction, and anomaly detection.
Clustering is a method of grouping data points such that points in the same group (cluster) are more similar to each other than to those in other groups.
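To make this definition concrete, here is a minimal sketch that assumes NumPy and scikit-learn's `make_blobs`. It compares the average Euclidean distance between points that share a group with the average distance between points in different groups. The helper `mean_within_between` is hypothetical and exists only for illustration. For well-separated groups, the first number should be clearly smaller than the second.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Synthetic data: 300 points around 3 centers (same settings as the K-Means example).
# The true labels y are used only to check the definition, not to learn anything.
X, y = make_blobs(n_samples=300, centers=3, random_state=42)

def mean_within_between(X, labels):
    """Hypothetical illustrative helper: (mean within-group, mean between-group) distance."""
    # Pairwise Euclidean distances, shape (n, n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # drop each point's zero distance to itself
    different = labels[:, None] != labels[None, :]
    return D[same].mean(), D[different].mean()

intra, inter = mean_within_between(X, y)
print(f"mean within-cluster distance:  {intra:.2f}")
print(f"mean between-cluster distance: {inter:.2f}")
```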
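K-Means is a clustering algorithm that partitions n observations into k clusters. Each cluster is represented by the centroid (the mean) of its points. The basic algorithm works as follows:
1. Choose the number of clusters k.
2. Select k random points as the initial centroids.
3. Assign each data point to its nearest centroid.
4. Recompute each centroid as the mean of the points assigned to it.
5. Repeat steps 3 and 4 until the centroids stop changing or a maximum number of iterations is reached.
Example using scikit-learn:
from sklearn.cluster import KMeans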
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# Generate synthetic data
X, y = make_blobs(n_samples=300, centers=3, random_state=42)
# n_samples=300: generate 300 data points
# centers=3: generate 3 cluster centers
# random_state=42: ensures reproducibility
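# Initialize KMeans model
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
# n_clusters=3: number of clusters to form
# n_init=10: run 10 different centroid initializations and keep the best result
# random_state=42: makes the centroid initialization reproducible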
# Fit the model to the data
kmeans.fit(X)
# Visualize clusters and centroids
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis') # c=kmeans.labels_: color by cluster label
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red') # s=200: size of centroids
plt.title("K-Means Clustering")
plt.show()
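The K-Means procedure can also be written from scratch in NumPy, which shows what happens inside the library call. This is a minimal sketch of the basic (Lloyd's) iteration under simple assumptions: random initialization from the data points, squared Euclidean distance, and a fixed iteration cap. The names `kmeans_lloyd` and `assign_to_nearest` are hypothetical. scikit-learn's `KMeans` uses k-means++ initialization by default, so its centroids and label numbering may differ from this sketch.

```python
import numpy as np
from sklearn.datasets import make_blobs

def assign_to_nearest(X, centroids):
    """Hypothetical helper: index of the nearest centroid for every point."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans_lloyd(X, k, max_iter=100, seed=0):
    """Hypothetical minimal Lloyd's-algorithm K-Means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points at random as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest centroid
        labels = assign_to_nearest(X, centroids)
        # Update: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels consistent with the returned centroids
    return centroids, assign_to_nearest(X, centroids)

# Usage on the same synthetic data as the scikit-learn example
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
centroids, labels = kmeans_lloyd(X, k=3)
print(centroids)
```

Because the starting centroids are random, a single run can settle in a poor local optimum; running several seeds and keeping the result with the smallest total squared distance is the idea behind the `n_init` parameter.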