Linear Classifiers
Linear classifiers form the foundation of neural networks. Before understanding deep learning, we must understand these simple models that create linear decision boundaries in feature space.
Support Vector Machines (SVM)
The SVM loss function (hinge loss) penalizes predictions that fall on the wrong side of the decision boundary.
For a datapoint $(x_i, y_i)$ with score vector $s = f(x_i)$, the loss is:

$$L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + \Delta)$$

where $s_j$ are the class scores and $\Delta$ is a margin (typically 1).
Intuition: The loss is zero when the correct class score is at least $\Delta$ higher than all incorrect class scores. Otherwise, we accumulate a penalty proportional to how far we are from the desired margin.
Key Properties:
- Creates a margin between classes
- Only cares about points near the decision boundary
- Robust to outliers on the correct side
- Loss saturates once margin is satisfied
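To make the formula concrete, here is a minimal NumPy sketch of the hinge loss for a single example; the function name and the example scores are illustrative, not from any particular library:

```python
import numpy as np

def hinge_loss(scores, correct_class, margin=1.0):
    """Multiclass SVM (hinge) loss for one example.

    scores: 1-D array of class scores s_j
    correct_class: index y_i of the true class
    margin: the margin Delta (typically 1)
    """
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0  # the sum runs over j != y_i
    return margins.sum()

# Correct class leads by more than the margin -> loss is zero
print(hinge_loss(np.array([3.2, 1.1, 0.5]), correct_class=0))  # 0.0
# Correct class barely leads -> margin violations accumulate
print(hinge_loss(np.array([1.2, 1.1, 0.5]), correct_class=0))  # 0.9 + 0.3 = 1.2
```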
Softmax Classifier
The softmax function converts raw scores into probabilities:

$$P(y = k \mid x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$$
The loss is the negative log-likelihood (cross-entropy):

$$L_i = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$$
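A minimal sketch of the same computation in NumPy (names are illustrative); subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the softmax output unchanged:

```python
import numpy as np

def cross_entropy_loss(scores, correct_class):
    """Softmax cross-entropy loss for one example.

    Subtracting max(scores) leaves the softmax output unchanged
    but prevents overflow in exp() for large scores.
    """
    shifted = scores - np.max(scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[correct_class])

print(cross_entropy_loss(np.array([3.2, 1.1, 0.5]), correct_class=0))
```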
Why Cross-Entropy?
Cross-entropy has several desirable properties:
- Penalizes confident wrong predictions heavily
- Has smooth gradients everywhere for optimization
- Connects to maximum likelihood estimation
- Never fully saturates (always encourages improvement)
Comparison with Hinge Loss:
- Softmax never stops optimizing (it always wants a higher probability for the correct class)
- SVM stops once margin is achieved
- Softmax outputs have probabilistic interpretation
- Both work well in practice
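The contrast is easy to see numerically. In this illustrative sketch (the helper functions are ours, compacted from the two losses above), the correct class already beats the others by far more than the margin, so the hinge loss is exactly zero while cross-entropy still returns a small positive value:

```python
import numpy as np

def hinge(scores, y, margin=1.0):
    m = np.maximum(0.0, scores - scores[y] + margin)
    m[y] = 0.0
    return m.sum()

def xent(scores, y):
    shifted = scores - np.max(scores)
    return -np.log(np.exp(shifted[y]) / np.sum(np.exp(shifted)))

# The correct class (index 0) already wins by far more than the margin.
scores = np.array([10.0, 2.0, 1.0])
print(hinge(scores, y=0))  # 0.0    -- margin satisfied, nothing left to push
print(xent(scores, y=0))   # ~0.0005 -- still positive, keeps encouraging improvement
```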
Decision Boundaries
Linear classifiers create hyperplane decision boundaries in the input space; in the binary case, the boundary is the set of points satisfying

$$w^\top x + b = 0$$
Limitations:
- Can only separate linearly separable data
- Cannot solve the XOR problem with a single linear classifier (see the sketch after this list)
- Limited expressiveness for complex patterns
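The XOR failure can be checked directly. The sketch below uses the classic perceptron update rule as a stand-in for any linear classifier (the training loop itself is illustrative, not prescribed by the text); because the four XOR points are not linearly separable, every pass over the data contains at least one misclassification:

```python
import numpy as np

# The four XOR points with labels in {-1, +1}: no single line separates them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

w, b = np.zeros(2), 0.0
for epoch in range(1000):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # misclassified -> perceptron update
            w += yi * xi
            b += yi
            errors += 1
    if errors == 0:                 # would only happen if XOR were separable
        break
print(f"errors in last epoch: {errors}")  # stays above zero
```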
Why This Matters: Neural networks overcome these limitations by stacking multiple linear transformations with nonlinearities, creating arbitrarily complex decision boundaries.
Mathematical Formulation
For a linear classifier with weights $W$ and bias $b$:

$$f(x_i) = W x_i + b$$

where:
- $x_i \in \mathbb{R}^D$ is the input
- $K$ is the number of classes, so $W \in \mathbb{R}^{K \times D}$ and $b \in \mathbb{R}^K$
- the output $s = f(x_i)$ is a vector of $K$ class scores

The classifier predicts:

$$\hat{y} = \arg\max_j s_j$$
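Putting the formulation together, a minimal forward pass and prediction might look like this (the shapes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3                        # input dimension and number of classes
W = rng.normal(size=(K, D))        # weight matrix
b = np.zeros(K)                    # one bias per class

def predict(x):
    scores = W @ x + b             # f(x) = Wx + b: a vector of K class scores
    return int(np.argmax(scores))  # predict the class with the highest score

x = rng.normal(size=D)
print("predicted class:", predict(x))
```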
Related Concepts
- Perceptron - Single neuron with activation function
- Multi-Layer Perceptrons - Stacking linear layers
- Backpropagation - Training neural networks
Next Steps
After understanding linear classifiers:
- Learn about the Perceptron algorithm
- Understand why multiple layers are needed
- Study activation functions that enable non-linearity