Skip to content
Unverified — AI-generated content. Help verify this page

Visualization Decision Tree

Choosing the wrong chart type is one of the most common EDA mistakes. This page provides a complete decision tree: given your data and your question, it tells you exactly which chart to use, with code for each.


Master Decision Flowchart


Chart Type Reference

Distribution Charts

Histogram

Use when: Showing the distribution shape of a single continuous variable.

python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

data = np.random.lognormal(4, 0.8, 5000)

fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(data, bins=50, edgecolor='white', alpha=0.7, color='steelblue')
ax.axvline(np.mean(data), color='red', linestyle='--', label=f'Mean: {np.mean(data):.1f}')
ax.axvline(np.median(data), color='green', linestyle='--', label=f'Median: {np.median(data):.1f}')
ax.set_title('Histogram — Distribution of Values')
ax.legend()
plt.show()
Good forBad for
Distribution shapeComparing many groups
Detecting skewnessSmall datasets (< 30)
Finding outliersCategorical data

KDE (Kernel Density Estimate)

Use when: Comparing distributions of the same variable across groups.

python
fig, ax = plt.subplots(figsize=(10, 5))
for group, color in [('A', 'steelblue'), ('B', 'coral'), ('C', 'green')]:
    data = np.random.normal(np.random.uniform(40, 60), 10, 500)
    sns.kdeplot(data, fill=True, alpha=0.3, label=group, color=color, ax=ax)
ax.set_title('KDE — Comparing Group Distributions')
ax.legend()
plt.show()

Box Plot

Use when: Comparing distributions across categories, detecting outliers.

python
data = pd.DataFrame({
    'value': np.concatenate([np.random.normal(m, 10, 200) for m in [40, 50, 45, 55]]),
    'group': np.repeat(['A', 'B', 'C', 'D'], 200)
})

fig, ax = plt.subplots(figsize=(10, 5))
sns.boxplot(data=data, x='group', y='value', ax=ax)
ax.set_title('Box Plot — Distribution Comparison')
plt.show()

Violin Plot

Use when: Full distribution shape comparison (richer than box plot).

python
fig, ax = plt.subplots(figsize=(10, 5))
sns.violinplot(data=data, x='group', y='value', inner='quart', ax=ax)
ax.set_title('Violin Plot — Full Distribution Shape')
plt.show()

ECDF (Empirical Cumulative Distribution)

Use when: Comparing distributions precisely, reading percentiles.

python
fig, ax = plt.subplots(figsize=(10, 5))
for group in data['group'].unique():
    subset = data[data['group'] == group]['value']
    sorted_data = np.sort(subset)
    ecdf = np.arange(1, len(sorted_data) + 1) / len(sorted_data)
    ax.step(sorted_data, ecdf, label=group)
ax.set_title('ECDF — Precise Distribution Comparison')
ax.set_ylabel('Cumulative Probability')
ax.legend()
plt.show()

Ridgeline (Joy Plot)

Use when: Comparing many distributions in a compact, visually striking way.

python
# Using seaborn FacetGrid
groups = pd.DataFrame({
    'value': np.concatenate([np.random.normal(m, 8, 300) for m in range(30, 80, 5)]),
    'month': np.repeat([f'Month {i}' for i in range(1, 11)], 300),
})

g = sns.FacetGrid(groups, row='month', height=1.2, aspect=6)
g.map_dataframe(sns.kdeplot, x='value', fill=True, alpha=0.6)
g.set_titles("{row_name}")
g.fig.suptitle('Ridgeline — Many Distributions', y=1.02)
plt.show()

Relationship Charts

Scatter Plot

Use when: Showing relationship between two continuous variables.

python
x = np.random.randn(500)
y = 0.7 * x + np.random.randn(500) * 0.5

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(x, y, alpha=0.4, s=20)
z = np.polyfit(x, y, 1)
ax.plot(np.sort(x), np.poly1d(z)(np.sort(x)), 'r-', linewidth=2)
ax.set_title(f'Scatter Plot (r = {np.corrcoef(x, y)[0,1]:.3f})')
plt.show()

Heatmap (Correlation Matrix)

Use when: Showing pairwise relationships among many variables.

python
corr = pd.DataFrame(np.random.randn(1000, 6), columns=[f'Var{i}' for i in range(6)]).corr()

fig, ax = plt.subplots(figsize=(8, 7))
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r', center=0, ax=ax)
ax.set_title('Correlation Heatmap')
plt.show()

Pair Plot

Use when: Scanning all pairwise relationships in one view.

python
# sns.pairplot(df[numeric_cols], hue='target', diag_kind='kde', corner=True)

Bubble Chart

Use when: Encoding a third dimension via bubble size on a scatter plot.

python
n = 100
fig, ax = plt.subplots(figsize=(10, 7))
scatter = ax.scatter(
    np.random.randn(n), np.random.randn(n),
    s=np.random.uniform(50, 500, n),
    c=np.random.randn(n), cmap='RdYlBu',
    alpha=0.6, edgecolors='white'
)
plt.colorbar(scatter, label='Color Dimension')
ax.set_title('Bubble Chart — 4 Dimensions')
plt.show()

Comparison Charts

Bar Chart (Vertical)

Use when: Comparing values across a small number of categories.

python
categories = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']
values = [42, 38, 55, 28, 47]

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(categories, values, color='steelblue')
ax.set_title('Bar Chart — Category Comparison')
plt.show()

Horizontal Bar

Use when: Many categories (>7) or long category labels.

python
fig, ax = plt.subplots(figsize=(10, 6))
sorted_idx = np.argsort(values)
ax.barh([categories[i] for i in sorted_idx], [values[i] for i in sorted_idx])
ax.set_title('Horizontal Bar — Sorted for Readability')
plt.show()

Grouped Bar

Use when: Comparing values across categories with sub-groups.

python
x = np.arange(4)
fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x - 0.2, [10, 20, 15, 25], 0.4, label='Group A')
ax.bar(x + 0.2, [12, 18, 20, 22], 0.4, label='Group B')
ax.set_xticks(x)
ax.set_xticklabels(['Q1', 'Q2', 'Q3', 'Q4'])
ax.set_title('Grouped Bar — Sub-group Comparison')
ax.legend()
plt.show()

Dot Plot / Lollipop

Use when: Cleaner alternative to bar chart (less ink, easier to read).

python
fig, ax = plt.subplots(figsize=(10, 5))
ax.hlines(y=categories, xmin=0, xmax=values, color='steelblue', linewidth=2)
ax.scatter(values, categories, color='steelblue', s=100, zorder=3)
ax.set_title('Lollipop Chart — Clean Comparison')
plt.show()

Dumbbell / Slope Chart

Use when: Showing before/after or two-point comparisons.

python
cats = ['Alpha', 'Beta', 'Gamma', 'Delta']
before = [30, 45, 20, 55]
after = [38, 42, 35, 50]

fig, ax = plt.subplots(figsize=(10, 5))
for i, cat in enumerate(cats):
    ax.plot([before[i], after[i]], [i, i], 'o-', color='steelblue', markersize=10)
    ax.text(before[i] - 2, i, str(before[i]), ha='right', va='center')
    ax.text(after[i] + 2, i, str(after[i]), ha='left', va='center')
ax.set_yticks(range(len(cats)))
ax.set_yticklabels(cats)
ax.set_title('Dumbbell — Before vs After')
plt.show()

Composition Charts

Pie Chart

Use when: Showing parts of a whole (max 5-6 categories).

python
fig, ax = plt.subplots(figsize=(8, 8))
ax.pie([35, 28, 22, 15], labels=['A', 'B', 'C', 'D'],
       autopct='%1.1f%%', startangle=90)
ax.set_title('Pie — Parts of a Whole')
plt.show()

Stacked Bar

Use when: Showing composition that changes across categories or time.

python
cats = ['Q1', 'Q2', 'Q3', 'Q4']
a = [20, 25, 30, 35]
b = [15, 20, 25, 20]
c = [10, 15, 10, 15]

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(cats, a, label='Product A')
ax.bar(cats, b, bottom=a, label='Product B')
ax.bar(cats, [a[i]+b[i] for i in range(4)], bottom=[0]*4, width=0, label='')
ax.bar(cats, c, bottom=[a[i]+b[i] for i in range(4)], label='Product C')
ax.set_title('Stacked Bar — Composition Over Time')
ax.legend()
plt.show()

Treemap

Use when: Hierarchical part-to-whole with nested categories.

python
import plotly.express as px

tree = pd.DataFrame({
    'category': ['Electronics']*3 + ['Clothing']*2 + ['Home']*2,
    'product': ['Phones', 'Laptops', 'Tablets', 'Shirts', 'Pants', 'Furniture', 'Decor'],
    'revenue': [500, 400, 200, 300, 250, 350, 150],
})

fig = px.treemap(tree, path=['category', 'product'], values='revenue',
                  title='Treemap — Hierarchical Composition')
fig.show()

Time Series Charts

Line Chart

Use when: Showing trends over time for one or more series.

python
dates = pd.date_range('2024-01-01', periods=365, freq='D')
values = np.cumsum(np.random.randn(365) * 2 + 0.5) + 100

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(dates, values, linewidth=1.5)
ax.set_title('Line Chart — Time Series Trend')
fig.autofmt_xdate()
plt.show()

Area Chart

Use when: Emphasizing volume under a time series or composition over time.

python
fig, ax = plt.subplots(figsize=(12, 5))
ax.fill_between(dates, 0, values, alpha=0.3, color='steelblue')
ax.plot(dates, values, color='steelblue', linewidth=1)
ax.set_title('Area Chart — Volume Emphasis')
fig.autofmt_xdate()
plt.show()

Quick Selection Table

Your QuestionChart TypeLibrary
What is the shape of my data?Histogram + KDEmatplotlib / seaborn
Are there outliers?Box plotseaborn
How do groups differ?Violin / Boxseaborn
Are two variables related?Scatter + regressionseaborn / plotly
Which features correlate?Heatmapseaborn
What is the trend over time?Line chartmatplotlib
What is the composition?Pie (few) / Stacked bar (many)matplotlib
How does composition change over time?Stacked areamatplotlib
Which category is biggest?Horizontal bar (sorted)matplotlib
What is the hierarchy?Treemap / Sunburstplotly
How do many features interact?Pair plotseaborn
Where are the dense regions?2D KDE / Hexbinseaborn / matplotlib
What is the exact distribution?ECDFseaborn
How do many distributions compare?Ridgelineseaborn FacetGrid
What changed between two points?Dumbbell / Slopematplotlib
How does location affect values?Choropleth / Bubble mapplotly

Anti-Patterns to Avoid

Anti-PatternProblemBetter Alternative
Pie chart with 10+ slicesUnreadableHorizontal bar
3D bar chartDistorts perception2D bar
Dual y-axisConfusing correlationTwo separate charts
Truncated y-axis (not starting at 0)Exaggerates differencesStart at 0 or use dot plot
Rainbow color paletteNot accessible, distractingSequential or diverging palette
Too many lines on one chartClutteredSmall multiples / FacetGrid
Default title ("Figure 1")UninformativeDescriptive title with key finding

Key Takeaways

  • Start with "What am I trying to show?" not "What chart looks cool?"
  • Distribution: histogram, KDE, box, violin, ECDF
  • Relationship: scatter, heatmap, pair plot, bubble
  • Comparison: bar (horizontal for many categories), dot/lollipop
  • Composition: pie (max 5), stacked bar, treemap
  • Time: line chart, area chart, calendar heatmap
  • Always use sorted horizontal bars for ranked categorical data
  • Use small multiples (FacetGrid) instead of overloaded single charts
  • Choose colorblind-friendly palettes (viridis, cividis, or seaborn's colorblind)

"What I cannot create, I do not understand." — Richard Feynman