
Bias-Variance Tradeoff

The bias-variance tradeoff is one of the most fundamental concepts in machine learning. It explains why models fail to generalize and guides us in improving them.

Bias and Variance Defined

Bias: Error from incorrect assumptions in the model (underfitting)

  • High bias → model is too simple
  • Cannot capture underlying patterns
  • Example: Using a line to fit a curved relationship

Variance: Error from sensitivity to training data fluctuations (overfitting)

  • High variance → model is too complex
  • Memorizes noise instead of learning patterns
  • Example: Using a 20th-degree polynomial for 10 data points

Total Error Decomposition

For a model's prediction $\hat{f}(x)$ and true function $f(x)$:

\text{Expected Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}

Where:

  • Bias²: $(E[\hat{f}(x)] - f(x))^2$ - How far is the average prediction from the truth?
  • Variance: $E[(\hat{f}(x) - E[\hat{f}(x)])^2]$ - How much do predictions vary across training sets?
  • Irreducible Error: Noise in data that cannot be removed

Key insight: for a fixed model family and dataset, you generally cannot minimize bias and variance at the same time; reducing one typically increases the other.
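
This decomposition can be estimated empirically: train the same model class on many resampled training sets, then compare the average prediction to the truth (Bias²) and the spread of predictions around that average (Variance). Here is a minimal NumPy sketch; the sine target, noise level, and polynomial degrees are illustrative assumptions, not tied to any particular dataset.

import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed "true function" for this illustration
    return np.sin(2 * np.pi * x)

def fit_and_predict(degree, x_test, n_train=30, noise=0.3):
    # Draw one noisy training set, fit a polynomial, predict on x_test
    x = rng.uniform(0, 1, n_train)
    y = true_f(x) + rng.normal(0, noise, n_train)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x_test)

x_test = np.linspace(0.05, 0.95, 50)
for degree in (1, 4, 12):
    preds = np.stack([fit_and_predict(degree, x_test) for _ in range(200)])
    avg_pred = preds.mean(axis=0)
    bias_sq = np.mean((avg_pred - true_f(x_test)) ** 2)  # (E[f_hat] - f)^2, averaged over x
    variance = np.mean(preds.var(axis=0))                # E[(f_hat - E[f_hat])^2], averaged over x
    print(f"degree {degree:2d}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")

Low degrees show high bias² and low variance; high degrees show the reverse.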

Model Complexity Spectrum

Model Complexity | Bias | Variance | Training Error | Test Error | Behavior
---------------- | ---- | -------- | -------------- | ---------- | --------
Too Simple | High | Low | High | High | Underfits both train and test
Just Right | Balanced | Balanced | Medium | Medium | Good generalization
Too Complex | Low | High | Low | High | Overfits train, fails on test

Visualizing the Tradeoff

[Diagram: error versus model complexity. Bias² falls as complexity grows, variance rises, and their sum (total error) traces a U shape whose minimum marks the optimal complexity.]

The sweet spot is where total error is minimized—balancing bias and variance.
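
The same U-shape can be reproduced numerically by sweeping model complexity and tracking error on held-out data. A minimal sketch, again using NumPy polynomial regression on assumed synthetic data:

import numpy as np

rng = np.random.default_rng(1)

# Illustrative noisy samples from an assumed true function
x_train = rng.uniform(0, 1, 40)
x_val = rng.uniform(0, 1, 40)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 40)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.3, 40)

for degree in range(1, 13):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coefs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, val MSE = {val_mse:.3f}")
# Training MSE keeps falling with degree; validation MSE typically falls, then rises again.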

Diagnosing Your Model

High Bias (Underfitting)

  • Symptoms:
    • Training error is high
    • Validation error is high
    • Small gap between train and val error
  • Solutions:
    • Increase model complexity (more layers, more neurons)
    • Train longer
    • Reduce regularization
    • Add more features (see the sketch after this list)
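
To make the "increase complexity / add features" fixes concrete, here is a small NumPy sketch (the quadratic target is an assumption for illustration): a straight-line fit leaves large training error, while adding an x² feature brings it down to the noise level.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
y = 2 * x**2 + rng.normal(0, 0.1, 200)  # assumed curved relationship

def train_mse(features, y):
    # Ordinary least squares fit; return mean squared training error
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    return np.mean((features @ w - y) ** 2)

linear = np.column_stack([np.ones_like(x), x])           # intercept + x only
quadratic = np.column_stack([np.ones_like(x), x, x**2])  # add the x² feature

print("linear features:", round(train_mse(linear, y), 3))     # stays high: bias
print("with x² feature:", round(train_mse(quadratic, y), 3))  # close to the noise floor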

High Variance (Overfitting)

  • Symptoms:
    • Training error is low
    • Validation error is high
    • Large gap between train and val error
  • Solutions:
    • Get more training data
    • Add regularization (L2, dropout; see the sketch after this list)
    • Reduce model complexity
    • Use data augmentation
    • Early stopping
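
To illustrate the L2 item above: penalizing large weights shrinks the gap between training and validation error. A minimal ridge-regression sketch in NumPy on an overly flexible polynomial feature set (the data, degree, and penalty values are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(3)
x_train = rng.uniform(-1, 1, 25)
x_val = rng.uniform(-1, 1, 200)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 25)
y_val = np.sin(3 * x_val) + rng.normal(0, 0.2, 200)

def poly_features(x, degree=9):
    return np.vander(x, degree + 1, increasing=True)  # columns 1, x, x², ...

def ridge_fit(X, y, lam):
    # Closed-form L2-regularized least squares: (X'X + lam*I)^-1 X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X_tr, X_va = poly_features(x_train), poly_features(x_val)
for lam in (0.0, 1e-4, 1e-2, 1.0):
    w = ridge_fit(X_tr, y_train, lam)
    tr = np.mean((X_tr @ w - y_train) ** 2)
    va = np.mean((X_va @ w - y_val) ** 2)
    print(f"lambda={lam:g}: train MSE={tr:.3f}, val MSE={va:.3f}, gap={va - tr:.3f}")

Pushing the penalty too high swings the model back toward bias, which is the tradeoff itself.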

Learning Curves

Learning curves show how error changes with training set size:

# Generate learning curves: train on nested subsets and track train/val error.
# Assumes train_model, compute_error, and the data splits are defined elsewhere.
import matplotlib.pyplot as plt

train_sizes = [10, 50, 100, 200, 500, 1000, 2000]
train_errors = []
val_errors = []

for size in train_sizes:
    # Train on a subset of the training data
    X_subset = X_train[:size]
    y_subset = y_train[:size]
    net = train_model(X_subset, y_subset)
    train_errors.append(compute_error(net, X_subset, y_subset))
    val_errors.append(compute_error(net, X_val, y_val))

# Plot both curves against training set size
plt.figure(figsize=(10, 6))
plt.plot(train_sizes, train_errors, 'o-', label='Training Error')
plt.plot(train_sizes, val_errors, 'o-', label='Validation Error')
plt.xlabel('Training Set Size')
plt.ylabel('Error')
plt.title('Learning Curves')
plt.legend()
plt.grid(True)
plt.show()

Interpreting Learning Curves

High Bias (Underfitting):

  • Both train and val errors converge to high value
  • Curves plateau early
  • Small gap between curves
  • Diagnosis: More data won’t help much!

High Variance (Overfitting):

  • Large gap between train and val error
  • Training error is low
  • Validation error decreases slowly with more data
  • Diagnosis: More data will help!

Well-Fitted Model:

  • Small gap between train and val error
  • Both errors are acceptably low
  • Diagnosis: Model is working well!

Strategies for the Tradeoff

Reduce Variance (Fight Overfitting)

  1. Get more training data (most effective if possible)
  2. Add regularization:
    • L2 regularization / weight decay
    • Dropout
    • Early stopping (see the sketch after this list)
  3. Reduce model complexity:
    • Fewer layers
    • Fewer neurons per layer
  4. Data augmentation (artificially increase data)
  5. Ensemble methods (average multiple models)
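
Early stopping (listed under item 2) halts training once validation error stops improving and keeps the weights from the best validation epoch. A minimal self-contained sketch with full-batch gradient descent on an overparameterized linear model; the data shapes, learning rate, and patience are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(4)
X_train = rng.normal(size=(40, 100))      # more features than examples: easy to overfit
true_w = np.zeros(100)
true_w[:5] = 1.0                          # only a few features actually matter
y_train = X_train @ true_w + rng.normal(0, 0.5, 40)
X_val = rng.normal(size=(200, 100))
y_val = X_val @ true_w + rng.normal(0, 0.5, 200)

w = np.zeros(100)
best_val, best_w = np.inf, w.copy()
patience, bad_epochs = 20, 0
for epoch in range(5000):
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.01 * grad                                   # one full-batch gradient step
    val_mse = np.mean((X_val @ w - y_val) ** 2)
    if val_mse < best_val:                             # still improving: remember these weights
        best_val, best_w, bad_epochs = val_mse, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                     # stalled for `patience` epochs: stop
            print(f"stopped at epoch {epoch}, best val MSE {best_val:.3f}")
            break
w = best_w  # roll back to the best validation checkpoint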

Reduce Bias (Fight Underfitting)

  1. Increase model complexity:
    • More layers (deeper network)
    • More neurons per layer (wider network)
  2. Add more features or better features
  3. Reduce regularization (if too strong)
  4. Train longer (more epochs)
  5. Use more advanced architecture

The Role of Training Data

Small dataset:

  • Higher variance risk (easier to memorize)
  • Need stronger regularization
  • Simpler models often better

Large dataset:

  • Lower variance risk
  • Can use complex models
  • Regularization less critical (but still helpful)

Rule of thumb:

  • Need ≈10× examples per parameter for good generalization
  • Deep learning often “breaks” this rule through explicit regularization and the implicit regularization of SGD

Modern Deep Learning Context

Deep learning challenges traditional bias-variance intuition:

Traditional view: Bigger models → more variance → worse generalization

Deep learning reality: Bigger models often generalize better!

Why?

  • Overparameterization can be beneficial
  • SGD provides implicit regularization
  • Batch normalization, dropout, and other techniques control variance
  • Neural networks find “simple” solutions in high-dimensional spaces

Double descent: test error first decreases, rises again near the interpolation threshold (the point where the model can just barely fit the training data exactly), then decreases once more as the parameter count grows further.

This doesn’t mean bias-variance is obsolete—it’s still useful for diagnosis:

  • Training error high? → Bias problem
  • Large train-val gap? → Variance problem

Practical Workflow

  1. Start simple: Begin with a simple model
  2. Check bias: Can it fit training data?
    • If no → increase complexity
  3. Check variance: Large train-val gap?
    • If yes → add regularization or get more data
  4. Iterate: Repeat until satisfactory performance
  5. Final evaluation: Test on held-out test set
# Bias-variance diagnostic workflow
def diagnose_model(model, X_train, y_train, X_val, y_val):
    """Diagnose if model has high bias or high variance."""
    train_error = compute_error(model, X_train, y_train)
    val_error = compute_error(model, X_val, y_val)

    print(f"Training Error: {train_error:.3f}")
    print(f"Validation Error: {val_error:.3f}")
    print(f"Gap: {val_error - train_error:.3f}")

    if train_error > 0.15:  # High training error
        print("\n⚠ HIGH BIAS (Underfitting)")
        print("Recommendations:")
        print("- Increase model complexity")
        print("- Add more features")
        print("- Reduce regularization")
        print("- Train longer")
    elif (val_error - train_error) > 0.10:  # Large gap
        print("\n⚠ HIGH VARIANCE (Overfitting)")
        print("Recommendations:")
        print("- Get more training data")
        print("- Add regularization (L2, dropout)")
        print("- Reduce model complexity")
        print("- Use data augmentation")
    else:
        print("\n✓ GOOD FIT")
        print("Model is performing well!")
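
Called with a trained model and the same splits and compute_error helper assumed above, it prints the two errors, their gap, and a recommendation:

diagnose_model(net, X_train, y_train, X_val, y_val)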

Learning Resources

Reading

  • Regularization - Reducing variance
  • Dropout - Variance reduction technique
  • Model Selection - Choosing model complexity
  • Cross-Validation - Estimating generalization error

Next Steps

  1. Learn specific regularization techniques
  2. Understand dropout for variance reduction
  3. Study model selection strategies
  4. Practice diagnosis in MNIST example