Papers Library
Deep dives into influential machine learning research papers. Each paper page includes problem statement, methodology, key innovations, results, and impact on the field.
Foundational Papers
Papers that established core techniques:
Transformers and Attention (2017)
- Attention Is All You Need ⭐
- Authors: Vaswani et al., Google Brain/Research, 2017
- Impact: Foundation for BERT, GPT, and modern NLP
- Key innovation: Self-attention mechanism, eliminating recurrence
- Citations: 100,000+ (among the most cited ML papers)
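The self-attention operation at the heart of the paper can be sketched in a few lines of NumPy. This is an illustrative single-head sketch with random weights, not the paper's full multi-head, masked version:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq, seq) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # each output mixes all value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
W = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
out = self_attention(X, *W)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in one step, no recurrence is needed to propagate information along the sequence.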
Computer Vision
Papers advancing visual understanding:
CNN Architectures (2012-2015)
- AlexNet - ImageNet Classification with Deep CNNs
- Authors: Krizhevsky, Sutskever, Hinton, 2012
- Impact: Started the deep learning revolution
- Key innovation: Deep CNNs with ReLU and dropout on ImageNet
- Result: 15.3% top-5 error (vs 26.2% previous best)
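The two training ingredients named above can be sketched directly. A minimal NumPy illustration; note that AlexNet used classic dropout with test-time scaling, while this shows the equivalent modern "inverted" form:

```python
import numpy as np

def relu(x):
    # ReLU: a cheap nonlinearity that avoids the saturation of tanh/sigmoid
    return np.maximum(0.0, x)

def dropout(x, p, rng, training=True):
    # Inverted dropout: zero units with probability p, rescale the survivors
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = relu(rng.normal(size=(2, 5)))
h = dropout(h, p=0.5, rng=rng)
print(h.shape)  # (2, 5)
```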
- VGG - Very Deep Convolutional Networks
- Authors: Simonyan & Zisserman, Oxford, 2014
- Impact: Demonstrated importance of depth
- Key innovation: Simple 3×3 conv design, up to 19 layers
- Result: 7.3% top-5 error on ImageNet
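The arithmetic behind the 3×3 design: three stacked 3×3 convolutions cover a 7×7 receptive field with fewer weights and more nonlinearities than a single 7×7 layer. A quick parameter count, with biases ignored and C input and output channels assumed:

```python
def conv_params(k, c_in, c_out):
    # Weight count of one k x k conv layer (biases ignored)
    return k * k * c_in * c_out

C = 256
stacked = 3 * conv_params(3, C, C)  # three 3x3 layers: 7x7 receptive field
single = conv_params(7, C, C)       # one 7x7 layer: same receptive field
print(stacked, single)  # 1769472 3211264
```

The stack uses 27C² weights versus 49C² for the single layer, while inserting two extra ReLUs along the way.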
- ResNet - Deep Residual Learning ⭐
- Authors: He et al., Microsoft Research, 2015
- Impact: Revolutionary skip connections
- Key innovation: Residual connections enable 100+ layer networks
- Result: 3.57% top-5 error, below the estimated ~5% human error rate
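The residual connection itself is one line: the block computes y = F(x) + x, so layers only need to learn a correction to the identity. A minimal NumPy sketch, with two hypothetical linear layers standing in for the paper's convolutions:

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = F(x) + x: the block learns a residual on top of the identity."""
    h = np.maximum(0.0, x @ W1)      # first layer + ReLU
    fx = h @ W2                      # second layer (no activation before the add)
    return np.maximum(0.0, fx + x)   # skip connection, then ReLU

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(4, d))
y = residual_block(x,
                   rng.normal(size=(d, d)) * 0.01,
                   rng.normal(size=(d, d)) * 0.01)
print(y.shape)  # (4, 16)
```

With near-zero weights the block is close to the identity function, which is why stacks of 100+ such blocks remain trainable.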
Vision Transformers (2021)
- Vision Transformer (ViT)
- Authors: Dosovitskiy et al., Google Research, 2021
- Impact: Transformers work for vision with sufficient data
- Key innovation: Image patches as tokens, pure transformer architecture
- Result: Matches or exceeds CNN performance with pre-training
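The "patches as tokens" step can be sketched with a reshape: split the image into non-overlapping p×p patches and flatten each into a vector. In the real model a linear projection and position embeddings follow before the transformer:

```python
import numpy as np

def patchify(img, p):
    """Split an HxWxC image (H, W divisible by p) into flattened pxp patches."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    x = img.reshape(H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 1, 3, 4)    # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p * C)   # one token per patch

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
tokens = patchify(img, p=8)
print(tokens.shape)  # (16, 192)
```

A 32×32 RGB image with 8×8 patches yields a 16-token sequence, each token a 192-dimensional vector.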
Multimodal Learning
Papers bridging vision and language:
Vision-Language Models (2021)
- CLIP - Contrastive Language-Image Pre-training ⭐
- Authors: Radford et al., OpenAI, 2021
- Impact: Enabled zero-shot image classification
- Key innovation: Contrastive pre-training on 400M image-text pairs
- Applications: Visual search, zero-shot classification, text-to-image
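The contrastive objective can be sketched as a symmetric cross-entropy over cosine similarities, where the matched (i, i) image-text pairs are the positives. An illustrative NumPy sketch; the 0.07 temperature matches CLIP's initialization, though the paper learns it during training:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched (i, i) pairs are positives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) scaled cosine similarities
    n = len(logits)
    # Image -> text direction: softmax over texts for each image
    logp_i = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_i = -logp_i[np.arange(n), np.arange(n)].mean()
    # Text -> image direction: softmax over images for each text
    logp_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_t = -logp_t[np.arange(n), np.arange(n)].mean()
    return (loss_i + loss_t) / 2

rng = np.random.default_rng(0)
print(round(clip_loss(rng.normal(size=(4, 32)), rng.normal(size=(4, 32))), 3))
```

Perfectly aligned embeddings drive the loss toward zero, which is what pushes matched image and text pairs together in the shared space.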
Generative Models
Papers for content generation:
Diffusion Models (2020-2022)
- DDPM - Denoising Diffusion Probabilistic Models ⭐
- Authors: Ho et al., UC Berkeley, 2020
- Impact: Foundation of modern diffusion models
- Key innovation: Predicting the added noise is easier than predicting the clean image directly
- Result: High-quality image generation, stable training
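The noise-prediction objective is compact: corrupt x₀ to x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε, then train the model to recover ε. A sketch below; the linear beta schedule is the paper's standard choice, while the zero-returning "model" is just a stand-in for a trained network:

```python
import numpy as np

def ddpm_loss(model, x0, alpha_bar, t, rng):
    """Simple DDPM objective: predict the noise that produced x_t from x_0."""
    eps = rng.normal(size=x0.shape)                 # true noise
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps  # forward diffusion to step t
    eps_hat = model(x_t, t)                         # model predicts the noise
    return np.mean((eps - eps_hat) ** 2)

# Hypothetical setup: linear beta schedule, zero "model" as placeholder.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))
loss = ddpm_loss(lambda x, t: np.zeros_like(x), x0, alpha_bar, t=500, rng=rng)
print(f"{loss:.3f}")
```

Because the target is plain Gaussian noise at every step, the loss is a simple regression, which is a large part of why training is stable.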
- DDIM - Denoising Diffusion Implicit Models
- Authors: Song et al., Stanford, 2021
- Impact: Made diffusion practical for production
- Key innovation: Deterministic sampling, 10-50x faster
- Result: Comparable quality with ~50 steps instead of 1000
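The deterministic (η = 0) DDIM update allows jumping between arbitrary timesteps, which is what makes ~50-step sampling possible. A sketch of the update rule, using a hypothetical linear schedule and a zero noise predictor as a stand-in for the trained model:

```python
import numpy as np

def ddim_step(x_t, eps_hat, a_t, a_prev):
    """One deterministic DDIM step (eta = 0) from timestep t to an earlier one."""
    # Predict the clean sample, then re-noise it to the earlier timestep
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat

# Hypothetical linear schedule; in practice it comes from the trained model.
T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
# Stride of 50 over the 1000 timesteps -> ~20 updates instead of 1000
ts = list(range(999, -1, -50))
for t, t_prev in zip(ts[:-1], ts[1:]):
    eps_hat = np.zeros_like(x)  # stand-in for the trained noise predictor
    x = ddim_step(x, eps_hat, alpha_bar[t], alpha_bar[t_prev])
print(x.shape)  # (8, 8)
```

Because each step is deterministic given the noise estimate, the sampler can skip most timesteps without the quality collapse a naive subsampling of the stochastic DDPM chain would cause.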
- DALL-E 2 ⭐
- Authors: Ramesh et al., OpenAI, 2022
- Impact: Demonstrated powerful text-to-image generation
- Key innovation: Two-stage CLIP + diffusion architecture
- Applications: Creative tools, product visualization, design
Browse by Year
2022
- DALL-E 2 - Text-to-image generation
2021
- DDIM - Fast diffusion sampling
- CLIP - Vision-language pre-training
- Vision Transformer - Transformers for vision
2020
- DDPM - Diffusion models foundation
2017
- Attention Is All You Need - Transformer architecture
2015
- ResNet - Residual connections
2014
- VGG - Deep simple CNNs
2012
- AlexNet - Deep learning revolution
Browse by Topic
Architecture Design
- AlexNet - First deep CNN
- VGG - Depth with simplicity
- ResNet - Skip connections
- Attention Is All You Need - Self-attention
- Vision Transformer - Transformers for images
Training Techniques
- AlexNet - Dropout, data augmentation
- DDPM - Noise prediction objective
- CLIP - Contrastive learning at scale
Transfer Learning & Zero-Shot
- CLIP - Zero-shot image classification
- Vision Transformer - Pre-training strategies
Generative Modeling
- DDPM - Diffusion model foundations
- DDIM - Fast deterministic sampling
- DALL-E 2 - Text-to-image generation
Most Influential Papers
Papers with the highest impact on the field:
- Attention Is All You Need (2017)
- 100,000+ citations
- Foundation for modern NLP and multimodal models
- Enabled BERT, GPT, CLIP, and transformers everywhere
- ResNet (2015)
- Skip connections revolutionized deep learning
- Enabled training of 100+ layer networks
- Still widely used as backbone
- CLIP (2021)
- Demonstrated power of natural language supervision
- Enabled zero-shot transfer
- Foundation for text-to-image models
- DDPM (2020)
- Established diffusion as preferred generative approach
- Powers Stable Diffusion, DALL-E 2, Midjourney
- Largely displaced GANs for high-quality image generation
- AlexNet (2012)
- Started the deep learning revolution
- Proved deep CNNs work at scale
- Launched modern era of computer vision
Explore More
- Concepts Library → - Core ML concepts explained
- Examples → - Implementation guides
- Blog → - Applications and insights
- Learning Paths → - Structured paper reading sequences