Skip to content
Unverified — AI-generated content. Help verify this page

Clustering

Clustering groups unlabeled data points so that points within a cluster are more similar to each other than to points in other clusters. It is the most fundamental unsupervised learning task — used in customer segmentation, image compression, anomaly detection, topic discovery, and gene expression analysis.

Why Clustering Matters

Supervised learning requires labels. Labels are expensive. In most organizations, 95%+ of data is unlabeled. Clustering finds hidden structure without human annotation:

  • Customer segmentation: Group customers by purchasing behavior for targeted marketing
  • Image compression: Reduce color palette by clustering pixel colors
  • Document organization: Group similar documents without predefined categories
  • Anomaly detection: Points that don't belong to any cluster are anomalies
  • Pre-labeling: Generate pseudo-labels for semi-supervised pipelines

Intuition: What Makes a Good Cluster?

A good clustering maximizes intra-cluster similarity and minimizes inter-cluster similarity. The challenge: "similarity" depends on the distance metric, and the "right" number of clusters is rarely known in advance.


K-Means: Lloyd's Algorithm

The Idea

Partition n data points into K clusters by iterating between two steps:

  1. Assign each point to the nearest centroid
  2. Update each centroid to the mean of its assigned points

Mathematical Formulation

Given data X={x1,x2,,xn} where xiRd, K-Means minimizes the within-cluster sum of squares (WCSS):

J=k=1KxiCkxiμk2

where Ck is the set of points in cluster k and μk=1|Ck|xiCkxi is the centroid.

Worked Example — K-Means WCSS and One Iteration

Mini Dataset (K=2):

Pointx1x2
A11
B21
C12
D88
E98
F89

Initialize centroids: mu1 = [1, 1] (point A), mu2 = [9, 8] (point E)

Step 1: Assignment — assign each point to nearest centroid A: d(A, mu1)=0.00, d(A, mu2)=sqrt(64+49)=10.63 -> Cluster 1 B: d(B, mu1)=1.00, d(B, mu2)=sqrt(49+49)=9.90 -> Cluster 1 C: d(C, mu1)=1.00, d(C, mu2)=sqrt(64+36)=10.00 -> Cluster 1 D: d(D, mu1)=sqrt(49+49)=9.90, d(D, mu2)=1.00 -> Cluster 2 E: d(E, mu1)=10.63, d(E, mu2)=0.00 -> Cluster 2 F: d(F, mu1)=10.00, d(F, mu2)=sqrt(1+1)=1.41 -> Cluster 2

Cluster 1: {A, B, C}, Cluster 2:

Step 2: Update — recompute centroids mu1 = [(1+2+1)/3, (1+1+2)/3] = [1.33, 1.33] mu2 = [(8+9+8)/3, (8+8+9)/3] = [8.33, 8.33]

Step 3: Compute WCSS Cluster 1: (1-1.33)^2+(1-1.33)^2 + (2-1.33)^2+(1-1.33)^2 + (1-1.33)^2+(2-1.33)^2 = 0.11+0.11 + 0.44+0.11 + 0.11+0.44 = 1.33 Cluster 2: (8-8.33)^2+(8-8.33)^2 + (9-8.33)^2+(8-8.33)^2 + (8-8.33)^2+(9-8.33)^2 = 0.11+0.11 + 0.44+0.11 + 0.11+0.44 = 1.33 WCSS = 1.33 + 1.33 = 2.67

Interpret: "K-Means found two natural clusters in one iteration. The WCSS of 2.67 is low, indicating tight clusters. Each point is within ~0.67 units of its centroid."

Lloyd's Algorithm Step-by-Step

  1. Initialize K centroids μ1,,μK (randomly or via K-Means++)
  2. Assignment step: For each point xi, assign to nearest centroid:
ci=argminkxiμk2
  1. Update step: Recompute each centroid:
μk=1|Ck|xiCkxi
  1. Repeat until centroids converge (movement <ϵ) or max iterations reached

Convergence guarantee: WCSS decreases monotonically at each step. Since there are finitely many partitions, the algorithm always converges — but possibly to a local minimum.

K-Means++ Initialization

Random initialization can lead to poor convergence. K-Means++ selects initial centroids that are spread out:

  1. Choose first centroid uniformly at random from data
  2. For each remaining centroid, choose point x with probability proportional to D(x)2, where D(x) is the distance to the nearest existing centroid
  3. Repeat until K centroids are chosen

This gives an O(logK)-competitive approximation to the optimal WCSS.

From-Scratch NumPy Implementation

python
import numpy as np

class KMeansFromScratch:
    """K-Means clustering with K-Means++ initialization."""

    def __init__(self, n_clusters=3, max_iter=300, tol=1e-4, random_state=42):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol
        self.rng = np.random.RandomState(random_state)

    def _init_centroids_plus_plus(self, X):
        """K-Means++ initialization."""
        n_samples = X.shape[0]
        centroids = [X[self.rng.randint(n_samples)]]

        for _ in range(1, self.n_clusters):
            # Squared distances to nearest centroid
            dists = np.min(
                [np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0
            )
            # Probability proportional to D(x)^2
            probs = dists / dists.sum()
            idx = self.rng.choice(n_samples, p=probs)
            centroids.append(X[idx])

        return np.array(centroids)

    def fit(self, X):
        """Fit K-Means to data X of shape (n_samples, n_features)."""
        self.centroids_ = self._init_centroids_plus_plus(X)
        self.inertia_history_ = []

        for iteration in range(self.max_iter):
            # Assignment step: compute distances to all centroids
            distances = np.array([
                np.sum((X - c) ** 2, axis=1) for c in self.centroids_
            ]).T  # shape: (n_samples, n_clusters)
            self.labels_ = np.argmin(distances, axis=1)

            # Compute inertia (WCSS)
            inertia = sum(
                np.sum((X[self.labels_ == k] - self.centroids_[k]) ** 2)
                for k in range(self.n_clusters)
            )
            self.inertia_history_.append(inertia)

            # Update step: recompute centroids
            new_centroids = np.array([
                X[self.labels_ == k].mean(axis=0)
                if np.any(self.labels_ == k)
                else self.centroids_[k]  # keep old if empty cluster
                for k in range(self.n_clusters)
            ])

            # Check convergence
            shift = np.sum((new_centroids - self.centroids_) ** 2)
            self.centroids_ = new_centroids
            if shift < self.tol:
                break

        self.n_iter_ = iteration + 1
        self.inertia_ = self.inertia_history_[-1]
        return self

    def predict(self, X):
        """Assign new points to nearest centroid."""
        distances = np.array([
            np.sum((X - c) ** 2, axis=1) for c in self.centroids_
        ]).T
        return np.argmin(distances, axis=1)

Scikit-learn K-Means

python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate synthetic data
X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.8,
                        random_state=42)

# Always scale before clustering
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Fit K-Means
kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10,
                max_iter=300, random_state=42)
kmeans.fit(X_scaled)

print(f"Inertia (WCSS): {kmeans.inertia_:.2f}")
print(f"Iterations: {kmeans.n_iter_}")
print(f"Cluster sizes: {np.bincount(kmeans.labels_)}")

Choosing K: Elbow Method and Silhouette Score

Elbow Method

Plot WCSS (inertia) vs. K. The "elbow" — where adding more clusters yields diminishing returns — suggests the right K.

python
inertias = []
K_range = range(2, 11)

for k in K_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    inertias.append(km.inertia_)

plt.figure(figsize=(8, 4))
plt.plot(K_range, inertias, 'bo-')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('Inertia (WCSS)')
plt.title('Elbow Method')
plt.grid(True, alpha=0.3)
plt.show()

Limitation: The elbow is often ambiguous. Use silhouette score for a more quantitative answer.

Silhouette Score

For each point i, the silhouette coefficient is:

s(i)=b(i)a(i)max(a(i),b(i))

where:

  • a(i) = mean distance from i to all other points in the same cluster (cohesion)
  • b(i) = mean distance from i to all points in the nearest other cluster (separation)

The overall silhouette score is s¯=1ni=1ns(i).

Worked Example — Silhouette Score

Using the K-Means result above. Compute silhouette for point A=[1,1]:

Cluster 1: {A=[1,1], B=[2,1], C=[1,2]} Cluster 2:

Step 1: a(A) = mean distance to same-cluster points d(A,B) = sqrt((1-2)^2+(1-1)^2) = 1.000 d(A,C) = sqrt((1-1)^2+(1-2)^2) = 1.000 a(A) = (1.000 + 1.000) / 2 = 1.000

Step 2: b(A) = mean distance to nearest other cluster d(A,D) = sqrt(49+49) = 9.899 d(A,E) = sqrt(64+49) = 10.630 d(A,F) = sqrt(49+64) = 10.630 b(A) = (9.899 + 10.630 + 10.630) / 3 = 10.386

Step 3: Compute silhouette s(A) = (10.386 - 1.000) / max(1.000, 10.386) = 9.386/10.386 = 0.904

Interpret: "s(A) = 0.904, very close to 1.0, meaning point A is well-clustered (much closer to its own cluster than to the other). Values near 0 indicate borderline points, negative values suggest wrong cluster assignment."

  • s(i)1: Point is well-clustered
  • s(i)0: Point is on the border between clusters
  • s(i)<0: Point is likely in the wrong cluster
python
from sklearn.metrics import silhouette_score, silhouette_samples

sil_scores = []
for k in K_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    sil_scores.append(score)
    print(f"K={k}: Silhouette = {score:.3f}")

best_k = K_range[np.argmax(sil_scores)]
print(f"\nBest K by silhouette: {best_k}")

DBSCAN: Density-Based Clustering

The Idea

Unlike K-Means, DBSCAN doesn't require specifying K. It finds clusters as dense regions separated by sparse regions and can discover clusters of arbitrary shape.

Key Parameters

  • eps (ε): Maximum distance between two points to be considered neighbors
  • min_samples: Minimum number of points in the ε-neighborhood to form a core point

Point Types

  1. Core point: Has at least min_samples points within radius ε
  2. Border point: Within ε of a core point but not a core point itself
  3. Noise point: Neither core nor border — labeled as -1

Algorithm

  1. For each unvisited point p:
    • Find all points within ε distance (the ε-neighborhood)
    • If |Nε(p)| min_samples, p is a core point — start a new cluster
    • Expand the cluster by recursively adding all density-reachable points
  2. Points not reachable from any core point are noise

Choosing eps with the k-Distance Graph

python
from sklearn.neighbors import NearestNeighbors

# Use k = min_samples - 1 (convention)
k = 4
nn = NearestNeighbors(n_neighbors=k + 1)
nn.fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)

# Sort k-th nearest neighbor distances
k_distances = np.sort(distances[:, k])

plt.figure(figsize=(8, 4))
plt.plot(k_distances)
plt.xlabel('Points (sorted by distance)')
plt.ylabel(f'{k}-th Nearest Neighbor Distance')
plt.title('k-Distance Graph for Choosing eps')
plt.grid(True, alpha=0.3)
plt.show()
# Look for the "elbow" — that distance is a good eps

Scikit-learn DBSCAN

python
from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=5, metric='euclidean')
labels = dbscan.fit_predict(X_scaled)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print(f"Clusters found: {n_clusters}")
print(f"Noise points: {n_noise}")
print(f"Silhouette (excl. noise): "
      f"{silhouette_score(X_scaled[labels != -1], labels[labels != -1]):.3f}")

DBSCAN vs K-Means

FeatureK-MeansDBSCAN
Requires KYesNo
Cluster shapeSphericalArbitrary
Handles noiseNoYes (labels as -1)
Handles varying densityNoPoorly (single eps)
ComplexityO(nKdi)O(nlogn) with tree
DeterministicNo (init-dependent)Yes

Hierarchical Agglomerative Clustering

The Idea

Start with each point as its own cluster. Repeatedly merge the two closest clusters until only one remains. The result is a dendrogram — a tree showing the merge history.

Linkage Criteria

The "distance between clusters" depends on the linkage method:

  • Single linkage: d(A,B)=minaA,bBd(a,b) — prone to chaining
  • Complete linkage: d(A,B)=maxaA,bBd(a,b) — produces compact clusters
  • Average linkage: d(A,B)=1|A||B|aAbBd(a,b)
  • Ward's method: Minimizes the increase in total within-cluster variance — most commonly used

Ward's merge criterion: Merge clusters A and B that minimize:

Δ=|A||B||A|+|B|μAμB2
Worked Example — Ward's Merge Criterion

3 candidate cluster merges:

  • Merge X: A(3 points, mu=[1,1]) with B(3 points, mu=[2,2])
  • Merge Y: C(2 points, mu=[1,1]) with D(4 points, mu=[5,5])
  • Merge Z: E(5 points, mu=[3,3]) with F(5 points, mu=[3.5, 3.5])

Step 1: Delta for merge X Delta_X = (3*3)/(3+3) * ||(1-2)^2 + (1-2)^2|| = 9/6 * (1+1) = 1.5 * 2 = 3.0

Step 2: Delta for merge Y Delta_Y = (2*4)/(2+4) * ||(1-5)^2 + (1-5)^2|| = 8/6 * (16+16) = 1.333 * 32 = 42.67

Step 3: Delta for merge Z Delta_Z = (5*5)/(5+5) * ||(3-3.5)^2 + (3-3.5)^2|| = 25/10 * (0.25+0.25) = 2.5 * 0.5 = 1.25

Step 4: Ward's method merges the pair with smallest Delta Delta_Z = 1.25 < Delta_X = 3.0 < Delta_Y = 42.67 -> Merge E and F first (they're closest and similar-sized)

Interpret: "Ward's method prefers merging clusters that are close together AND similar in size. Merge Y has a huge Delta (42.67) because the centroids are far apart. Merge Z wins because the clusters are very close (0.5 units apart) even though they're larger."

Scikit-learn + Dendrogram

python
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage

# Compute linkage matrix for dendrogram
Z = linkage(X_scaled, method='ward')

plt.figure(figsize=(12, 5))
dendrogram(Z, truncate_mode='lastp', p=20, leaf_rotation=90)
plt.title('Hierarchical Clustering Dendrogram (Ward)')
plt.xlabel('Cluster Size')
plt.ylabel('Distance')
plt.show()

# Cut the dendrogram at desired number of clusters
agg = AgglomerativeClustering(n_clusters=4, linkage='ward')
agg_labels = agg.fit_predict(X_scaled)

print(f"Silhouette: {silhouette_score(X_scaled, agg_labels):.3f}")

Gaussian Mixture Models (GMM)

The Idea

K-Means assigns each point to exactly one cluster (hard assignment). GMMs model data as a mixture of K Gaussian distributions and provide soft assignments — the probability that each point belongs to each cluster.

Generative Model

The data is generated by:

  1. Choose cluster k with probability πk (mixing coefficient)
  2. Draw x from N(μk,Σk)

The probability density:

p(x)=k=1KπkN(xμk,Σk)

where:

N(xμ,Σ)=1(2π)d/2|Σ|1/2exp(12(xμ)TΣ1(xμ))

The EM Algorithm

Since we can't directly maximize the log-likelihood (it involves a sum inside the log), we use Expectation-Maximization:

E-step — Compute responsibilities (posterior probability that point i belongs to cluster k):

γik=πkN(xiμk,Σk)j=1KπjN(xiμj,Σj)
Worked Example — GMM E-Step (Responsibilities)

1D data, K=2 Gaussians. Parameters:

  • Cluster 1: pi1=0.5, mu1=2, sigma1=1
  • Cluster 2: pi2=0.5, mu2=6, sigma2=1

Compute responsibilities for x=3.5:

Step 1: N(3.5 | mu1=2, sigma=1) = (1/sqrt(2*pi)) * exp(-(3.5-2)^2/2) = 0.3989 * exp(-1.125) = 0.3989 * 0.3247 = 0.1295

Step 2: N(3.5 | mu2=6, sigma=1) = 0.3989 * exp(-(3.5-6)^2/2) = 0.3989 * exp(-3.125) = 0.3989 * 0.0439 = 0.0175

Step 3: Compute responsibilities gamma_1 = (0.5 * 0.1295) / (0.5 * 0.1295 + 0.5 * 0.0175) = 0.0648 / (0.0648 + 0.0088) = 0.0648 / 0.0735 = 0.881 gamma_2 = 0.0088 / 0.0735 = 0.119

Step 4: Verify: 0.881 + 0.119 = 1.000

Interpret: "Point x=3.5 has 88.1% responsibility to cluster 1 (centered at 2) and 11.9% to cluster 2 (centered at 6). Unlike K-Means which would assign it 100% to cluster 1, GMM gives soft assignments reflecting uncertainty."

M-step — Update parameters using responsibilities:

Nk=i=1nγikμk=1Nki=1nγikxiΣk=1Nki=1nγik(xiμk)(xiμk)Tπk=Nkn

Repeat until log-likelihood converges.

Scikit-learn GMM

python
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=4, covariance_type='full',
                       n_init=5, random_state=42)
gmm.fit(X_scaled)

# Soft assignments (probabilities)
probs = gmm.predict_proba(X_scaled)
labels = gmm.predict(X_scaled)

print(f"Log-likelihood: {gmm.score(X_scaled):.3f}")
print(f"BIC: {gmm.bic(X_scaled):.1f}")
print(f"AIC: {gmm.aic(X_scaled):.1f}")

# Use BIC to select number of components
bics = []
for k in range(2, 10):
    g = GaussianMixture(n_components=k, random_state=42)
    g.fit(X_scaled)
    bics.append(g.bic(X_scaled))

best_k = range(2, 10)[np.argmin(bics)]
print(f"Best K by BIC: {best_k}")

Covariance Types

TypeParameters per componentShape
'full'd(d+1)/2Arbitrary ellipsoid
'tied'd(d+1)/2 (shared)Same ellipsoid, different centers
'diag'dAxis-aligned ellipsoid
'spherical'1Sphere (like K-Means)

Real Dataset: Mall Customers Segmentation

python
import pandas as pd
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import numpy as np

# Load Mall Customers dataset
# Columns: CustomerID, Gender, Age, Annual Income (k$), Spending Score (1-100)
url = "https://raw.githubusercontent.com/dsrscientist/dataset1/master/mall_customers.csv"
df = pd.read_csv(url)
print(df.head())
print(f"\nShape: {df.shape}")
print(df.describe())

# Use Annual Income and Spending Score for 2D visualization
X = df[['Annual Income (k$)', 'Spending Score (1-100)']].values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# --- Elbow Method ---
inertias = []
sil_scores = []
K_range = range(2, 11)

for k in K_range:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    inertias.append(km.inertia_)
    sil_scores.append(silhouette_score(X_scaled, km.labels_))

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(K_range, inertias, 'bo-')
axes[0].set_title('Elbow Method')
axes[0].set_xlabel('K')
axes[0].set_ylabel('Inertia')

axes[1].plot(K_range, sil_scores, 'ro-')
axes[1].set_title('Silhouette Score')
axes[1].set_xlabel('K')
axes[1].set_ylabel('Score')
plt.tight_layout()
plt.show()

# --- K-Means with best K ---
best_k = 5  # Typical result for this dataset
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=42)
km_labels = kmeans.fit_predict(X_scaled)

# --- Compare all methods ---
methods = {
    'K-Means (K=5)': km_labels,
    'DBSCAN (eps=0.5)': DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled),
    'Agglomerative (K=5)': AgglomerativeClustering(n_clusters=5).fit_predict(X_scaled),
    'GMM (K=5)': GaussianMixture(n_components=5, random_state=42).fit_predict(X_scaled),
}

fig, axes = plt.subplots(1, 4, figsize=(20, 5))
for ax, (name, labels) in zip(axes, methods.items()):
    scatter = ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=30, alpha=0.7)
    ax.set_title(name)
    ax.set_xlabel('Annual Income (k$)')
    ax.set_ylabel('Spending Score')
    # Silhouette (skip noise-only results)
    valid = labels[labels != -1]
    if len(set(valid)) > 1:
        sil = silhouette_score(X_scaled[labels != -1], valid)
        ax.text(0.05, 0.95, f'Sil: {sil:.3f}', transform=ax.transAxes, va='top')

plt.tight_layout()
plt.show()

# --- Segment profiling ---
df['Cluster'] = km_labels
profile = df.groupby('Cluster').agg({
    'Age': 'mean',
    'Annual Income (k$)': 'mean',
    'Spending Score (1-100)': 'mean',
    'CustomerID': 'count'
}).rename(columns={'CustomerID': 'Count'}).round(1)
print("\nCluster Profiles:")
print(profile)

Hyperparameters Reference

K-Means

ParameterDefaultGuidance
n_clusters8Use elbow + silhouette to choose
init'k-means++'Always use K-Means++
n_init10More runs = better chance of global optimum
max_iter300Increase if not converging
tol1e-4Convergence threshold for centroid movement

DBSCAN

ParameterDefaultGuidance
eps0.5Use k-distance graph to choose
min_samples5Rule of thumb: 2×d (dimensions)
metric'euclidean'Try 'cosine' for text data

Agglomerative

ParameterDefaultGuidance
n_clusters2Use dendrogram to pick cutoff
linkage'ward'Ward for compact clusters, average for varied
metric'euclidean'Ward requires euclidean

GMM

ParameterDefaultGuidance
n_components1Use BIC/AIC to select
covariance_type'full'Start full, try diag if overfitting
n_init1Use 5-10 for stability
reg_covar1e-6Increase if singular covariance errors

Edge Cases and Pitfalls

Scaling Is Mandatory

K-Means and DBSCAN use distance metrics. If features have different scales, the feature with the largest range dominates. Always standardize with StandardScaler or MinMaxScaler.

Empty Clusters in K-Means

If a centroid has no assigned points, the cluster dies. Scikit-learn handles this by reinitializing the centroid, but from-scratch implementations must handle it explicitly.

DBSCAN with Varying Density

A single eps cannot capture clusters with different densities. Use HDBSCAN (hierarchical DBSCAN) instead:

python
import hdbscan

clusterer = hdbscan.HDBSCAN(min_cluster_size=15, min_samples=5)
labels = clusterer.fit_predict(X_scaled)

High-Dimensional Data

Distance metrics become less meaningful in high dimensions (curse of dimensionality). Apply dimensionality reduction before clustering.

Non-Convex Clusters

K-Means assumes convex, spherical clusters. For non-convex shapes, use DBSCAN, spectral clustering, or GMM.

Categorical Features

K-Means uses Euclidean distance and cannot handle categorical features directly. Use K-Modes or K-Prototypes for mixed data, or encode categoricals properly.


Comparison Summary

AlgorithmBest ForWeaknessesComplexity
K-MeansLarge datasets, spherical clustersRequires K, sensitive to initO(nKdi)
DBSCANArbitrary shapes, noise detectionStruggles with varying densityO(nlogn)
AgglomerativeSmall-medium data, dendrogramO(n2) memory, no predictO(n2logn)
GMMSoft assignments, elliptical clustersCan overfit, slow with many featuresO(nKd2i)
HDBSCANVarying density, robustMore complex to tuneO(nlogn)

Cross-References

"What I cannot create, I do not understand." — Richard Feynman