Neural Network Foundations
Build a deep, mathematical understanding of how neural networks learn through gradient descent and backpropagation. This module emphasizes implementation from scratch to truly understand the mechanics of forward and backward propagation.
Why This Module Is Essential
Neural networks are the foundation of all modern deep learning. This module builds your understanding from the ground up, starting with simple perceptrons and progressing to multi-layer networks with backpropagation.
What makes this module critical:
- You’ll implement everything from scratch to truly understand the math
- Visual intuitions complement mathematical formalism
- Hands-on coding cements theoretical understanding
- Foundation for all subsequent learning and research
Learning Objectives
After completing this module, you will be able to:
- Mathematical Intuition: Develop a fundamental, mathematical understanding of how neural networks learn through gradient descent and backpropagation
- Implementation from Scratch: Implement a complete neural network in NumPy without high-level libraries to understand the mechanics of forward and backward propagation
- Optimization Mastery: Understand core optimization concepts including SGD, momentum, Adam, and their trade-offs
- Regularization Techniques: Apply L2 regularization and dropout to prevent overfitting and understand the bias-variance tradeoff
- Debugging Skills: Use gradient checking and systematic debugging to ensure correct implementations
Prerequisites
Before starting this module, ensure you have:
- Linear Algebra: Matrix multiplication, vectors, dot products
- Calculus: Derivatives, chain rule, gradients
- Python: Proficiency with NumPy for numerical computations
- Basic ML: Understanding of supervised learning concepts
Recommended preparation if needed:
- Khan Academy’s linear algebra course
- 3Blue1Brown’s essence of calculus series
Week 1: Building Blocks and Backpropagation
Day 1-2: From Linear Classifiers to Neural Networks
Core Concepts:
- Linear Classifiers
  - SVM loss and softmax loss
  - Linear decision boundaries
  - Score functions and loss functions
- Perceptron
  - Single neuron architecture
  - Activation functions (sigmoid, tanh, ReLU)
  - Limitations of single-layer networks
- Multi-Layer Perceptrons (MLPs)
  - Hidden layers and depth
  - Universal approximation theorem
  - Why depth matters
Learning Resources:
- Videos: Welch Labs Neural Networks Demystified Parts 1-3
- Reading: CS231n Linear Classification notes
- Code: Implement a two-layer network in NumPy
Checkpoint: Can you explain why we need non-linear activation functions?
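For the "two-layer network in NumPy" exercise above, a minimal sketch of one possible forward pass (affine → ReLU → affine → softmax) is shown below. Shapes, layer sizes, and function names are illustrative choices, not something prescribed by the assignment:

```python
import numpy as np

def forward_two_layer(X, W1, b1, W2, b2):
    """Affine -> ReLU -> affine -> softmax forward pass."""
    h = np.maximum(0, X @ W1 + b1)                        # ReLU supplies the non-linearity
    scores = h @ W2 + b2                                  # raw class scores (logits)
    scores = scores - scores.max(axis=1, keepdims=True)   # shift for numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    return probs, h

def cross_entropy_loss(probs, y):
    """Mean negative log-probability of the correct class."""
    N = y.shape[0]
    return -np.log(probs[np.arange(N), y] + 1e-12).mean()

# Tiny smoke test on random data (5 samples, 4 features, 3 classes).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
y = rng.integers(0, 3, size=5)
W1, b1 = 0.01 * rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = 0.01 * rng.standard_normal((8, 3)), np.zeros(3)
probs, _ = forward_two_layer(X, W1, b1, W2, b2)
print(cross_entropy_loss(probs, y))   # roughly ln(3) for near-uniform predictions
```

This sketch also answers the checkpoint directly: remove the `np.maximum` (ReLU) and the two affine layers collapse into a single linear map, so the "deep" network can only draw linear decision boundaries.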
Day 3-4: Backpropagation - The Learning Algorithm
Core Concept:
- Backpropagation
  - Chain rule application to neural networks
  - Computational graphs
  - Forward pass and backward pass
  - Gradient checking
Learning Resources:
- Videos:
  - Welch Labs Neural Networks Demystified Parts 4-5
  - 3Blue1Brown: Backpropagation calculus
- Reading: CS231n Backpropagation notes
- Code: Implement backpropagation from scratch
Critical Exercise: Implement gradient checking to verify your backprop implementation.
Checkpoint: Can you derive the backpropagation update for a two-layer network?
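One way the backward pass and the gradient-checking exercise can fit together, written against the same affine → ReLU → affine → softmax model as the Day 1-2 sketch. This is a sketch under those assumptions, not the official assignment solution:

```python
import numpy as np

def loss_and_grads(X, y, W1, b1, W2, b2):
    """Forward + backward pass for affine -> ReLU -> affine -> softmax."""
    N = X.shape[0]
    # Forward pass
    h_pre = X @ W1 + b1
    h = np.maximum(0, h_pre)
    scores = h @ W2 + b2
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()
    # Backward pass: apply the chain rule one node of the computational graph at a time
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N                              # dL/dscores for softmax + cross-entropy
    dW2 = h.T @ dscores
    db2 = dscores.sum(axis=0)
    dh = dscores @ W2.T
    dh_pre = dh * (h_pre > 0)                 # ReLU gate: pass gradient only where input > 0
    dW1 = X.T @ dh_pre
    db1 = dh_pre.sum(axis=0)
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def grad_check(f, param, analytic_grad, eps=1e-5, n_checks=5):
    """Compare analytic gradients to centered numerical differences at random entries."""
    rng = np.random.default_rng(0)
    for _ in range(n_checks):
        idx = tuple(rng.integers(0, d) for d in param.shape)
        old = param[idx]
        param[idx] = old + eps
        fp = f()
        param[idx] = old - eps
        fm = f()
        param[idx] = old
        num = (fp - fm) / (2 * eps)
        rel_err = abs(num - analytic_grad[idx]) / max(1e-8, abs(num) + abs(analytic_grad[idx]))
        print(f"index {idx}: numerical {num:.6f}, analytic {analytic_grad[idx]:.6f}, rel err {rel_err:.2e}")

# Usage sketch:
#   loss, grads = loss_and_grads(X, y, W1, b1, W2, b2)
#   grad_check(lambda: loss_and_grads(X, y, W1, b1, W2, b2)[0], W1, grads["W1"])
```

Relative errors around 1e-7 or smaller usually indicate a correct backward pass; errors near 1e-2 or larger almost always mean a bug.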
Day 5-7: Optimization Algorithms
Core Concept:
- Optimization Algorithms
  - Stochastic Gradient Descent (SGD)
  - Momentum
  - RMSprop
  - Adam optimizer
  - Learning rate schedules
Learning Resources:
- Videos: CS231n Lecture 7 (Training Neural Networks II)
- Reading: CS231n Optimization notes, Adam paper
- Code: Implement SGD, Momentum, and Adam from scratch
Experiments:
- Compare SGD vs Momentum vs Adam on MNIST
- Visualize loss landscapes
- Try different learning rates and observe convergence
Checkpoint: Can you explain when to use Adam vs SGD with momentum?
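Each optimizer update the exercise asks for is only a few lines of per-parameter arithmetic. A hedged sketch follows; function names and default hyperparameters are my own choices, not CS231n's:

```python
import numpy as np

def sgd(w, dw, lr=1e-3):
    """Vanilla SGD: step directly down the mini-batch gradient."""
    w -= lr * dw
    return w

def sgd_momentum(w, dw, v, lr=1e-3, mu=0.9):
    """Momentum: accumulate a velocity that smooths noisy gradients."""
    v = mu * v - lr * dw
    w += v
    return w, v

def adam(w, dw, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * dw
    v = beta2 * v + (1 - beta2) * dw**2
    m_hat = m / (1 - beta1**t)                # bias correction; t starts at 1
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def step_decay(lr0, epoch, drop=0.5, every=10):
    """Simple step schedule: multiply the learning rate by `drop` every `every` epochs."""
    return lr0 * (drop ** (epoch // every))
```

Running all three on the same MNIST mini-batches and plotting the loss curves side by side is the most direct way to see the trade-offs the checkpoint asks about.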
Week 2: Regularization and Practical Training
Day 8-10: Preventing Overfitting
Core Concepts:
- Regularization
  - L2 weight decay
  - Early stopping
  - Data augmentation
  - How regularization affects the loss landscape
- Dropout
  - Dropout as ensemble learning
  - Inverted dropout
  - When and where to apply dropout
  - Dropout at test time
- Bias-Variance Tradeoff
  - Underfitting vs overfitting
  - Model capacity
  - Diagnosing learning problems
Learning Resources:
- Videos: Welch Labs Neural Networks Demystified Parts 6-7
- Reading: CS231n Regularization notes
- Experiments:
  - Train with/without L2 regularization
  - Observe dropout’s effect on overfitting
  - Plot train vs validation curves
Checkpoint: Can you diagnose whether a model is overfitting or underfitting from its learning curves?
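For these experiments, the two moving parts are an L2 penalty added to the data loss and an inverted-dropout mask applied to hidden activations. A minimal sketch, where the 0.5 factor and the keep probability are conventional choices rather than requirements:

```python
import numpy as np

def l2_penalty(weights, reg=1e-4):
    """L2 weight decay term added to the data loss; its gradient contribution is reg * W per matrix."""
    return 0.5 * reg * sum(np.sum(W * W) for W in weights)

def dropout_forward(h, p_keep=0.8, train=True):
    """Inverted dropout: scale by 1/p_keep at train time so test time needs no change."""
    if not train:
        return h, None                                     # test time: identity
    mask = (np.random.rand(*h.shape) < p_keep) / p_keep    # zero out units, rescale survivors
    return h * mask, mask

def dropout_backward(dh, mask):
    """Backward pass simply reuses the same mask from the forward pass."""
    return dh * mask
```

Because the rescaling happens during training, the trained network can be evaluated as-is, which is the point of the "dropout at test time" bullet above.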
Day 11-12: Practical Training Considerations
Core Concept:
- Training Practices
  - Weight initialization strategies (Xavier, He initialization)
  - Learning rate selection and tuning
  - Batch size effects
  - Debugging neural networks
  - Hyperparameter search strategies
Learning Resources:
- Reading: CS231n Neural Network Tips and Tricks
- Videos: CS231n Lecture 6 (Training Neural Networks I)
- Practice: Debug intentionally broken implementations
Practical Skills:
- Initialize weights correctly
- Select appropriate learning rates
- Use gradient checking
- Diagnose vanishing/exploding gradients
- Monitor training with TensorBoard
Checkpoint: Can you systematically debug a neural network that isn’t learning?
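The two initialization schemes named above can be sketched in a few lines of NumPy. These are the Gaussian variants (uniform versions also exist), and the function names are my own:

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    """Xavier/Glorot initialization, a common default for tanh/sigmoid layers."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(1.0 / fan_in)

def he_init(fan_in, fan_out, rng=None):
    """He initialization, scaled up for ReLU layers (which zero roughly half the units)."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

# Biases can safely start at zero; it is the weights that must break symmetry.
W1 = he_init(784, 128)   # e.g. MNIST input -> hidden layer
```

If activations shrink toward zero or blow up as you go deeper, mis-scaled initialization is one of the first things to rule out when debugging.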
Day 13-14: Complete Implementation Project
Hands-On Project:
MNIST Digit Classification from Scratch
Assignment Requirements:
- Build a 2-3 layer neural network in pure NumPy
- Implement backpropagation from scratch
- Implement at least 2 optimizers (SGD + Adam)
- Add L2 regularization and dropout
- Achieve >95% test accuracy on MNIST
- Visualize learned features
- Debug with gradient checking
Deliverables:
- Complete, working implementation
- Training curves (loss and accuracy)
- Analysis of hyperparameter choices
- Comparison of different optimizers
- Visualization of first-layer weights
Time Estimate: 8-15 hours
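A skeleton of the project's training loop might look like the sketch below. The methods `model.loss_and_grads`, `model.apply_gradients`, and `model.predict` are hypothetical names standing in for whatever interface you build around your own forward/backward and optimizer code:

```python
import numpy as np

def train(model, X_train, y_train, X_val, y_val, epochs=20, batch_size=128):
    """Mini-batch training loop that records the curves required in the deliverables."""
    history = {"train_loss": [], "val_acc": []}
    n = X_train.shape[0]
    for epoch in range(epochs):
        perm = np.random.permutation(n)                  # reshuffle examples each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            loss, grads = model.loss_and_grads(X_train[idx], y_train[idx])
            model.apply_gradients(grads)                 # SGD/Adam step on every parameter
        val_acc = (model.predict(X_val) == y_val).mean()
        history["train_loss"].append(loss)
        history["val_acc"].append(val_acc)
        print(f"epoch {epoch + 1}: loss {loss:.4f}, val acc {val_acc:.4f}")
    return history
```

Saving the returned history makes it straightforward to produce the training curves and optimizer comparisons listed in the deliverables.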
Module Completion Criteria
You have completed this module when you can:
- ✅ Implement a multi-layer neural network from scratch in NumPy
- ✅ Implement backpropagation without references or libraries
- ✅ Explain the mathematical intuition behind backpropagation
- ✅ Implement SGD, Momentum, and Adam optimizers
- ✅ Apply L2 regularization and dropout correctly
- ✅ Diagnose and fix common training problems
- ✅ Use gradient checking to verify implementations
- ✅ Achieve >95% accuracy on MNIST with your from-scratch implementation
Key Resources
Primary Videos
- Welch Labs: Neural Networks Demystified (7 parts, ~2 hours total)
  - Best intuitive introduction to neural networks
  - Excellent visualizations of forward and backward propagation
- 3Blue1Brown: Neural Networks (4 videos, ~1 hour total)
  - Beautiful mathematical intuition
  - Backpropagation calculus explained visually
- CS231n Lectures 1-4 (Stanford, ~4 hours total)
  - Image classification, loss functions, optimization
  - Backpropagation and neural networks
Essential Reading
- CS231n Course Notes: Linear Classification, Optimization, Backpropagation, Neural Networks
- Michael Nielsen: “Neural Networks and Deep Learning” (Chapters 1-2)
- Original papers: Adam optimizer (Kingma & Ba, 2014)
Hands-On Practice
- CS231n Assignment 1: KNN, SVM, Softmax, Two-Layer Net
  - Essential programming assignment
  - Expect 8-15 hours
  - Don’t skip this!
Common Pitfalls
1. Skipping Implementation
Problem: Using PyTorch/TensorFlow without understanding the math underneath.
Solution: Force yourself to implement everything in NumPy first.
2. Not Using Gradient Checking
Problem: Subtle bugs in the backpropagation implementation.
Solution: Always verify with numerical gradient checking.
3. Wrong Weight Initialization
Problem: Vanishing or exploding gradients from the start.
Solution: Use Xavier or He initialization; never initialize weights to zeros.
4. Learning Rate Too High/Low
Problem: Divergence or extremely slow convergence.
Solution: Start with lr=1e-3 and adjust by factors of 10.
5. Not Monitoring Train/Val Curves
Problem: Can’t diagnose overfitting or underfitting.
Solution: Always plot both training and validation loss/accuracy.
Success Tips
- Code Everything from Scratch First
  - There’s no substitute for implementing backpropagation yourself
  - Only use frameworks after you understand what they’re doing
- Visualize Everything
  - Draw the computational graph
  - Plot loss curves
  - Visualize learned weights
  - Visualize activation distributions
- Embrace the Struggle
  - Debugging backpropagation is frustrating but educational
  - Every bug you fix deepens your understanding
  - The struggle is where learning happens
- Mathematical Understanding Before Code
  - Derive the gradient on paper first
  - Understand the shape of every tensor
  - Then implement
- Start Simple, Then Add Complexity
  - Get a simple 2-layer network working first
  - Add one feature at a time (dropout, momentum, etc.)
  - Verify each addition works
Time Investment
Total estimated time: roughly 16-26 hours over 2 weeks
- Videos: 3-4 hours
- Reading: 3-4 hours
- CS231n Assignment 1: 8-15 hours
- Additional exercises: 2-3 hours
Don’t rush this module. Solid foundations here will make everything else easier.
Connection to Advanced Topics
The concepts from this module appear everywhere in deep learning:
- Backpropagation → Used in all modern architectures (CNNs, Transformers, GANs)
- Optimization → Critical for training large models (GPT, BERT)
- Regularization → Essential for small datasets (medical imaging, few-shot learning)
- Debugging skills → Invaluable when implementing novel architectures
Next Steps
After completing this module:
- Immediate: Move to Module 2: CNNs
- Parallel: Continue with CS231n lectures and assignments
- Deepen: Read Christopher Olah’s blog posts on understanding neural networks
Key Takeaway
“There’s no substitute for coding everything from scratch at least once.”
Libraries hide crucial details. The theoretical knowledge from lectures becomes real when you debug your own backpropagation code. Deep understanding comes from implementation—embrace the struggle, that’s where learning happens.
Ready to begin? Start with Linear Classifiers.