
Papers Library

Deep dives into influential machine learning research papers. Each paper page covers the problem statement, methodology, key innovations, results, and impact on the field.


Foundational Papers

Papers that established core techniques:

Transformers and Attention (2017)

  • Attention Is All You Need
    • Authors: Vaswani et al., Google Brain/Research, 2017
    • Impact: Foundation for BERT, GPT, and modern NLP
    • Key innovation: Self-attention mechanism, eliminating recurrence
    • Citations: 100,000+ (among the most-cited papers in machine learning)
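
The self-attention mechanism at the heart of the paper can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function names, shapes, and random weights are assumptions for demonstration.

```python
# Scaled dot-product self-attention: each token attends to every other
# token via softmax(Q K^T / sqrt(d_k)) V, with no recurrence.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv project to queries, keys, values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

The full paper stacks several such heads in parallel (multi-head attention) and interleaves them with feed-forward layers.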

Computer Vision

Papers advancing visual understanding:

CNN Architectures (2012-2015)

  • AlexNet - ImageNet Classification with Deep CNNs
    • Authors: Krizhevsky, Sutskever, Hinton, 2012
    • Impact: Started the deep learning revolution
    • Key innovation: Deep CNNs with ReLU and dropout on ImageNet
    • Result: 15.3% top-5 error (vs 26.2% previous best)
  • VGG - Very Deep Convolutional Networks
    • Authors: Simonyan & Zisserman, Oxford, 2014
    • Impact: Demonstrated importance of depth
    • Key innovation: Simple 3×3 conv design, up to 19 layers
    • Result: 7.3% top-5 error on ImageNet
  • ResNet - Deep Residual Learning
    • Authors: He et al., Microsoft Research, 2015
    • Impact: Made very deep networks trainable; long the default vision backbone
    • Key innovation: Residual (skip) connections enable 100+ layer networks
    • Result: 3.57% top-5 error on ImageNet, surpassing reported human-level accuracy
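
ResNet's residual connection is easy to see in code. A sketch, with illustrative two-layer blocks rather than the paper's convolutions: the block learns a residual F(x) and adds the input back, so the identity mapping is trivially representable and gradients flow through the skip path.

```python
# Residual block: y = relu(F(x) + x). With zero weights, F(x) = 0 and the
# block collapses to (a ReLU of) the identity -- the key to training depth.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    residual = relu(x @ W1) @ W2   # F(x): the learned residual
    return relu(residual + x)      # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))
W_zero = np.zeros((16, 16))        # untrained/zeroed block
y = residual_block(x, W_zero, W_zero)
assert np.allclose(y, relu(x))     # identity survives a do-nothing block
```

In a plain (non-residual) stack, a zeroed layer would destroy the signal; here it passes through unchanged, which is why 100+ layers remain optimizable.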

Vision Transformers (2021)

  • Vision Transformer (ViT)
    • Authors: Dosovitskiy et al., Google Research, 2021
    • Impact: Transformers work for vision with sufficient data
    • Key innovation: Image patches as tokens, pure transformer architecture
    • Result: Matches or exceeds CNN performance with pre-training
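
ViT's "image patches as tokens" idea can be sketched as follows. The sizes and the projection matrix are illustrative assumptions (the paper uses 224×224 inputs and 16×16 patches), not the authors' code.

```python
# ViT tokenization: split the image into fixed-size patches, flatten each,
# and linearly project to embeddings -- the transformer's input sequence.
import numpy as np

def patchify(img, patch):
    """img: (H, W, C) -> (num_patches, patch*patch*C) flattened patches."""
    H, W, C = img.shape
    patches = []
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            patches.append(img[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(patches)

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32, 3))   # tiny image for illustration
tokens = patchify(img, patch=8)      # 16 patches, each 8*8*3 = 192 values
E = rng.normal(size=(192, 64))       # learned linear projection to d_model
embeddings = tokens @ E              # sequence fed to a plain transformer
print(embeddings.shape)  # (16, 64)
```

After adding position embeddings and a class token, the rest of ViT is an unmodified transformer encoder.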

Multimodal Learning

Papers bridging vision and language:

Vision-Language Models (2021)

  • CLIP - Contrastive Language-Image Pre-training
    • Authors: Radford et al., OpenAI, 2021
    • Impact: Enabled zero-shot image classification
    • Key innovation: Contrastive pre-training on 400M image-text pairs
    • Applications: Visual search, zero-shot classification, text-to-image
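
CLIP's contrastive objective can be sketched as a symmetric cross-entropy over cosine similarities: matched image/text pairs sit on the diagonal of a similarity matrix and are pulled together, mismatched pairs pushed apart. A minimal sketch with assumed shapes; real CLIP uses learned encoders and a learned temperature.

```python
# CLIP-style contrastive loss over a batch of image/text embedding pairs.
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Row i of img_emb and txt_emb form a matched pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) cosine sims
    diag = np.arange(len(logits))            # diagonal = correct pairs
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_i2t + loss_t2i) / 2         # symmetric cross-entropy

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
aligned = clip_loss(emb, emb)        # perfectly matched pairs
shuffled = clip_loss(emb, emb[::-1]) # mismatched pairs
print(aligned < shuffled)  # True: matched pairs score lower loss
```

Zero-shot classification then reduces to embedding class-name prompts as "text" and picking the nearest one to the image embedding.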

Generative Models

Papers for content generation:

Diffusion Models (2020-2022)

  • DDPM - Denoising Diffusion Probabilistic Models
    • Authors: Ho et al., UC Berkeley, 2020
    • Impact: Foundation of modern diffusion models
    • Key innovation: Training the network to predict the added noise, which is simpler than predicting the clean image directly
    • Result: High-quality image generation, stable training
  • DDIM - Denoising Diffusion Implicit Models
    • Authors: Song et al., Stanford, 2021
    • Impact: Made diffusion practical for production
    • Key innovation: Deterministic sampling, 20-50x faster
    • Result: Same quality with 50 steps instead of 1000
  • DALL-E 2
    • Authors: Ramesh et al., OpenAI, 2022
    • Impact: Demonstrated powerful text-to-image generation
    • Key innovation: Two-stage CLIP + diffusion architecture
    • Applications: Creative tools, product visualization, design
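
The noise-prediction objective that DDPM introduced, and that the models above build on, can be sketched directly from the closed-form forward process. The schedule values and the placeholder "model" are illustrative assumptions, not a real network.

```python
# DDPM training step: corrupt x0 with Gaussian noise along a fixed schedule,
# then score the model on how well it predicts the noise that was added.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal fraction

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    a = alphas_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

def noise_prediction_loss(model, x0, t, rng):
    eps = rng.normal(size=x0.shape)       # the true noise
    x_t = q_sample(x0, t, eps)            # noisy sample at step t
    eps_hat = model(x_t, t)               # network's noise estimate
    return np.mean((eps - eps_hat) ** 2)  # DDPM's simple MSE objective

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16,))
# An "oracle" that inverts the forward process recovers eps exactly:
oracle = lambda x_t, t: (x_t - np.sqrt(alphas_bar[t]) * x0) / np.sqrt(1 - alphas_bar[t])
loss = noise_prediction_loss(oracle, x0, t=500, rng=rng)
assert loss < 1e-8  # a perfect noise predictor achieves (near-)zero loss
```

Sampling runs the process in reverse, denoising step by step; DDIM's contribution is a deterministic variant of that reverse pass needing far fewer steps.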

Browse by Year

2022

  • DALL-E 2 - Text-to-image generation

2021

  • ViT - Transformers for vision
  • CLIP - Vision-language pre-training
  • DDIM - Fast diffusion sampling

2020

  • DDPM - Diffusion models foundation

2017

  • Attention Is All You Need - Transformer architecture

2015

  • ResNet - Residual connections

2014

  • VGG - Deep simple CNNs

2012

  • AlexNet - Deep learning revolution

Browse by Topic

Architecture Design

  • Attention Is All You Need - Self-attention
  • ResNet - Residual connections
  • VGG - Depth with simple 3×3 convolutions
  • ViT - Image patches as transformer tokens

Training Techniques

  • AlexNet - Dropout, data augmentation
  • DDPM - Noise prediction objective
  • CLIP - Contrastive learning at scale

Transfer Learning & Zero-Shot

  • CLIP - Zero-shot classification from natural language
  • ViT - Large-scale pre-training, then transfer

Generative Modeling

  • DDPM - Diffusion foundations
  • DDIM - Fast deterministic sampling
  • DALL-E 2 - Text-to-image generation


Most Influential Papers

Papers with the highest impact on the field:

  1. Attention Is All You Need (2017)

    • 100,000+ citations
    • Foundation for modern NLP and multimodal models
    • Enabled BERT, GPT, CLIP, and transformers everywhere
  2. ResNet (2015)

    • Skip connections revolutionized deep learning
    • Enabled training of 100+ layer networks
    • Still widely used as backbone
  3. CLIP (2021)

    • Demonstrated power of natural language supervision
    • Enabled zero-shot transfer
    • Foundation for text-to-image models
  4. DDPM (2020)

    • Established diffusion as preferred generative approach
    • Powers Stable Diffusion, DALL-E 2, Midjourney
    • Replaced GANs for high-quality generation
  5. AlexNet (2012)

    • Started the deep learning revolution
    • Proved deep CNNs work at scale
    • Launched modern era of computer vision
