Bias-Variance Tradeoff
The bias-variance tradeoff is one of the most fundamental concepts in machine learning. It explains why models fail to generalize and guides us in improving them.
Bias and Variance Defined
Bias: Error from incorrect assumptions in the model (underfitting)
- High bias → model is too simple
- Cannot capture underlying patterns
- Example: Using a line to fit a curved relationship
Variance: Error from sensitivity to training data fluctuations (overfitting)
- High variance → model is too complex
- Memorizes noise instead of learning patterns
- Example: Using a 20th-degree polynomial for 10 data points (see the sketch below)
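To make these two failure modes concrete, here is a minimal sketch on assumed toy data (a noisy sine relationship, chosen only for illustration). Note one substitution: with 10 points, any polynomial of degree 9 or higher can already pass through every observation, so the sketch uses degree 9 to show the same memorization behavior as the 20th-degree example above.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical toy data: a curved (sine) relationship observed with noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 10))                 # only 10 training points
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 10)

x_test = np.linspace(0, 1, 200)
y_true = np.sin(2 * np.pi * x_test)

# High bias: a straight line cannot capture the curvature (underfits).
line = Polynomial.fit(x, y, deg=1)

# High variance: with only 10 points, a degree-9 polynomial passes through
# every noisy observation exactly (memorizes noise, overfits).
wiggly = Polynomial.fit(x, y, deg=9)

print("degree-1 fit, test MSE:", np.mean((line(x_test) - y_true) ** 2))
print("degree-9 fit, test MSE:", np.mean((wiggly(x_test) - y_true) ** 2))
```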
Total Error Decomposition
For a model’s prediction $\hat{f}(x)$ and true function $f(x)$ (with observations $y = f(x) + \varepsilon$, noise variance $\sigma^2$), the expected squared error decomposes as:

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible Error}}$$

Where:
- Bias²: $\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2$ - How far is the average prediction from the truth?
- Variance: $\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]$ - How much do predictions vary across training sets?
- Irreducible Error: $\sigma^2$ - Noise in the data that cannot be removed

The expectations are taken over different training sets drawn from the same distribution.
Key insight: for a fixed dataset and model family, you cannot drive bias and variance to zero at the same time; reducing one typically increases the other.
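The decomposition is easiest to see by simulation. The sketch below uses an assumed toy setup (a known sine target with Gaussian noise, polynomial models of varying degree): it fits many models to independently resampled training sets, then estimates bias² and variance from the spread of their predictions. Low degrees show high bias; high degrees show high variance.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Assumed setup: the true function and noise level are known, so bias and
# variance can be estimated by simulation (on real data f(x) is unknown).
rng = np.random.default_rng(0)
noise_std = 0.2
x_grid = np.linspace(0, 1, 50)          # points where we evaluate the model

def true_f(x):
    return np.sin(2 * np.pi * x)

def estimate_bias_variance(degree, n_datasets=300, n_points=30):
    """Fit `degree`-degree polynomials to many resampled training sets,
    then average over x_grid to estimate bias^2 and variance."""
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise_std, n_points)
        preds[i] = Polynomial.fit(x, y, deg=degree)(x_grid)
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_f(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

for degree in (1, 3, 9):
    b2, var = estimate_bias_variance(degree)
    # Expected test MSE ≈ bias² + variance + irreducible noise (noise_std²)
    print(f"degree {degree}: bias²={b2:.4f}  variance={var:.4f}  "
          f"total≈{b2 + var + noise_std ** 2:.4f}")
```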
Model Complexity Spectrum
| Model Complexity | Bias | Variance | Training Error | Test Error | Behavior |
|---|---|---|---|---|---|
| Too Simple | High | Low | High | High | Underfits both train and test |
| Just Right | Balanced | Balanced | Medium | Medium | Good generalization |
| Too Complex | Low | High | Low | High | Overfits train, fails on test |
Visualizing the Tradeoff
As model complexity increases (x-axis from simple to complex), Bias² falls while Variance rises; their sum with the irreducible noise gives the U-shaped Total Error curve. The sweet spot, the optimal complexity, is where total error is minimized, balancing bias and variance.
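A rough way to reproduce this picture yourself: sweep model complexity (here, polynomial degree on assumed toy sine data) and plot training and test error. Training error keeps falling with complexity, while test error traces the U shape described above.

```python
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

# Assumed toy data: a noisy sine, with separate train and test draws.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_tr, y_tr = make_data(30)
x_te, y_te = make_data(200)

degrees = range(1, 13)
train_mse, test_mse = [], []
for d in degrees:
    model = Polynomial.fit(x_tr, y_tr, deg=d)
    train_mse.append(np.mean((model(x_tr) - y_tr) ** 2))
    test_mse.append(np.mean((model(x_te) - y_te) ** 2))

plt.plot(degrees, train_mse, 'o-', label='Training Error')   # keeps falling
plt.plot(degrees, test_mse, 'o-', label='Test Error')        # U-shaped
plt.xlabel('Polynomial Degree (Model Complexity)')
plt.ylabel('MSE')
plt.legend()
plt.show()
```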
Diagnosing Your Model
High Bias (Underfitting)
- Symptoms:
  - Training error is high
  - Validation error is high
  - Small gap between train and val error
- Solutions:
  - Increase model complexity (more layers, more neurons)
  - Train longer
  - Reduce regularization
  - Add more features
High Variance (Overfitting)
- Symptoms:
  - Training error is low
  - Validation error is high
  - Large gap between train and val error
- Solutions:
  - Get more training data
  - Add regularization (L2, dropout)
  - Reduce model complexity
  - Use data augmentation
  - Early stopping (see the sketch after this list)
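As one concrete example from the list above, here is a minimal early-stopping sketch. It assumes hypothetical helpers: a `train_one_epoch` function and `get_weights`/`set_weights` methods on the model are placeholders, while `compute_error` is the helper used elsewhere on this page.

```python
# Minimal early-stopping loop. `train_one_epoch`, `get_weights`/`set_weights`
# are hypothetical; `compute_error` is the helper used elsewhere on this page.
def train_with_early_stopping(net, X_train, y_train, X_val, y_val,
                              max_epochs=200, patience=10):
    best_val = float('inf')
    best_state = None
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_one_epoch(net, X_train, y_train)        # one pass over the data
        val_error = compute_error(net, X_val, y_val)
        if val_error < best_val:
            best_val = val_error
            best_state = net.get_weights()            # snapshot the best model
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                                 # val error stopped improving
    net.set_weights(best_state)                       # restore the best snapshot
    return net
```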
Learning Curves
Learning curves show how error changes with training set size:
```python
import matplotlib.pyplot as plt

# Generate learning curves. Assumes X_train, y_train, X_val, y_val are already
# loaded, and uses the train_model / compute_error helpers from this page.
train_sizes = [10, 50, 100, 200, 500, 1000, 2000]
train_errors = []
val_errors = []

for size in train_sizes:
    # Train on a subset of the training data
    X_subset = X_train[:size]
    y_subset = y_train[:size]
    net = train_model(X_subset, y_subset)
    train_errors.append(compute_error(net, X_subset, y_subset))
    val_errors.append(compute_error(net, X_val, y_val))

# Plot
plt.figure(figsize=(10, 6))
plt.plot(train_sizes, train_errors, 'o-', label='Training Error')
plt.plot(train_sizes, val_errors, 'o-', label='Validation Error')
plt.xlabel('Training Set Size')
plt.ylabel('Error')
plt.title('Learning Curves')
plt.legend()
plt.grid(True)
plt.show()
```

Interpreting Learning Curves
High Bias (Underfitting):
- Both train and val errors converge to high value
- Curves plateau early
- Small gap between curves
- Diagnosis: More data won’t help much!
High Variance (Overfitting):
- Large gap between train and val error
- Training error is low
- Validation error decreases slowly with more data
- Diagnosis: More data will help!
Well-Fitted Model:
- Small gap between train and val error
- Both errors are acceptably low
- Diagnosis: Model is working well!
Strategies for the Tradeoff
Reduce Variance (Fight Overfitting)
- Get more training data (most effective if possible)
- Add regularization:
  - L2 regularization / weight decay (see the ridge sketch after this list)
  - Dropout
  - Early stopping
- Reduce model complexity:
  - Fewer layers
  - Fewer neurons per layer
- Data augmentation (artificially expand the dataset)
- Ensemble methods (average multiple models)
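To make the L2 regularization item concrete, here is a small sketch of ridge regression (L2-regularized least squares) on assumed toy data. The closed-form solution shows how increasing the penalty shrinks the weights, accepting a little extra bias in exchange for lower variance.

```python
import numpy as np

# Ridge (L2-regularized) linear regression in closed form:
#   w = (XᵀX + λI)⁻¹ Xᵀy
# A larger penalty lam shrinks the weights: slightly more bias, less variance.
def ridge_fit(X, y, lam):
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Assumed toy data: few samples relative to features, so the lam=0
# (ordinary least squares) fit is noisy.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + rng.normal(0, 1.0, 15)

for lam in (0.0, 0.1, 1.0, 10.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:>5}  ||w||={np.linalg.norm(w):5.2f}  "
          f"distance from true w={np.linalg.norm(w - true_w):.2f}")
```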
Reduce Bias (Fight Underfitting)
- Increase model complexity:
  - More layers (deeper network)
  - More neurons per layer (wider network)
- Add more features or better features
- Reduce regularization (if too strong)
- Train longer (more epochs)
- Use a more advanced architecture
The Role of Training Data
Small dataset:
- Higher variance risk (easier to memorize)
- Need stronger regularization
- Simpler models often better
Large dataset:
- Lower variance risk
- Can use complex models
- Regularization less critical (but still helpful)
Rule of thumb:
- Need ≈10× examples per parameter for good generalization (see the quick check below)
- Deep learning often “breaks” this rule with clever techniques
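A quick back-of-the-envelope check of the 10× rule for a fully connected network. The layer sizes below are an assumed MNIST-style example, not taken from this page.

```python
# Rough "10x examples per parameter" check for a fully connected network.
# The layer sizes are an assumed MNIST-style example.
layer_sizes = [784, 128, 64, 10]

n_params = sum(n_in * n_out + n_out                       # weights + biases
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(f"parameters: {n_params:,}")                        # 109,386
print(f"rule-of-thumb dataset size: {10 * n_params:,}")   # ~1.1 million examples
# MNIST has only 60,000 training images, yet networks like this generalize
# well; one reason deep learning is said to "break" the rule.
```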
Modern Deep Learning Context
Deep learning challenges traditional bias-variance intuition:
Traditional view: Bigger models → more variance → worse generalization
Deep learning reality: Bigger models often generalize better!
Why?
- Overparameterization can be beneficial
- SGD provides implicit regularization
- Batch normalization, dropout, and other techniques control variance
- Neural networks find “simple” solutions in high-dimensional spaces
Double descent: test error first decreases, rises again near the interpolation threshold (where the model has just enough parameters to fit the training data exactly), then decreases once more as the parameter count grows further!
This doesn’t mean bias-variance is obsolete—it’s still useful for diagnosis:
- Training error high? → Bias problem
- Large train-val gap? → Variance problem
Practical Workflow
1. Start simple: Begin with a simple model
2. Check bias: Can it fit the training data?
   - If no → increase complexity
3. Check variance: Is there a large train-val gap?
   - If yes → add regularization or get more data
4. Iterate: Repeat until performance is satisfactory
5. Final evaluation: Test once on the held-out test set
```python
# Bias-variance diagnostic workflow (uses the compute_error helper from above)
def diagnose_model(model, X_train, y_train, X_val, y_val):
    """Diagnose whether a model has high bias or high variance."""
    train_error = compute_error(model, X_train, y_train)
    val_error = compute_error(model, X_val, y_val)

    print(f"Training Error: {train_error:.3f}")
    print(f"Validation Error: {val_error:.3f}")
    print(f"Gap: {val_error - train_error:.3f}")

    if train_error > 0.15:  # High training error
        print("\n⚠ HIGH BIAS (Underfitting)")
        print("Recommendations:")
        print("- Increase model complexity")
        print("- Add more features")
        print("- Reduce regularization")
        print("- Train longer")
    elif (val_error - train_error) > 0.10:  # Large train-val gap
        print("\n⚠ HIGH VARIANCE (Overfitting)")
        print("Recommendations:")
        print("- Get more training data")
        print("- Add regularization (L2, dropout)")
        print("- Reduce model complexity")
        print("- Use data augmentation")
    else:
        print("\n✓ GOOD FIT")
        print("Model is performing well!")
```

Learning Resources
Reading
- Understanding the Bias-Variance Tradeoff - Excellent visual explanation
- CS229: Bias-Variance Tradeoff
- ESL Chapter 7: Model Assessment and Selection
Related Concepts
- Regularization - Reducing variance
- Dropout - Variance reduction technique
- Model Selection - Choosing model complexity
- Cross-Validation - Estimating generalization error
Next Steps
- Learn specific regularization techniques
- Understand dropout for variance reduction
- Study model selection strategies
- Practice diagnosis in MNIST example