
Library Statistics

Current size and growth of the mlnotes knowledge base.

Last Updated: November 11, 2025

Overview

Total Content Items: 90

The mlnotes library has grown to 90 interconnected content items spanning concepts, papers, examples, learning paths, and blog posts.

By Content Type

| Type           | Count | Description                                  |
| -------------- | ----- | -------------------------------------------- |
| Concepts       | 40    | Core ML concepts, architectures, and methods |
| Blog Posts     | 13    | Applications, research guides, and overviews |
| Learning Paths | 11    | Structured learning sequences                |
| Papers         | 9     | Research paper analyses                      |
| Other          | 17    | Examples, resources, and domain overviews    |

Content Type Breakdown

  • 40 Concepts: Foundation of the knowledge base - atomic, reusable concept pages
  • 13 Blog Posts: Practical applications (CNNs, transformers, LLMs, VLMs, diffusion) and research methodology
  • 11 Learning Paths: Structured curricula from foundations to advanced topics
  • 9 Papers: In-depth analyses of seminal research papers
  • 17 Other: Examples, healthcare resources, datasets, and domain-specific content

By Difficulty

| Difficulty     | Count | Percentage |
| -------------- | ----- | ---------- |
| Beginner       | 15    | 17%        |
| Intermediate   | 48    | 53%        |
| Advanced       | 19    | 21%        |
| Not Classified | 8     | 9%         |

The majority of content (53%) targets intermediate learners, with solid coverage of foundational (17%) and advanced (21%) topics.

By Topic (Top 10 Tags)

| Rank | Tag              | Count | Description                              |
| ---- | ---------------- | ----- | ---------------------------------------- |
| 1    | deep-learning    | 16    | General deep learning concepts           |
| 2    | transformers     | 16    | Attention and transformer architectures  |
| 3    | neural-networks  | 14    | Foundational neural network concepts     |
| 4    | healthcare       | 13    | Healthcare AI applications               |
| 5    | computer-vision  | 12    | Vision and image processing              |
| 6    | attention        | 11    | Attention mechanisms                     |
| 7    | diffusion-models | 10    | Generative diffusion models              |
| 8    | multimodal       | 10    | Multimodal learning (VLMs, fusion)       |
| 9    | cnns             | 9     | Convolutional neural networks            |
| 10   | nlp              | 9     | Natural language processing              |

Topic Coverage Highlights

  • Transformers & Attention: 27 items (16 transformers + 11 attention) - comprehensive coverage
  • Computer Vision: 21 items (12 computer-vision + 9 cnns) - from foundations to modern architectures
  • Healthcare: 13 items - most comprehensive domain-specific coverage
  • Generative Models: 10 items (diffusion-models) - modern generative AI

Library Sections

Core Library (content/library/)

  • Concepts (concepts/): 40 concept pages organized by topic
  • Papers (papers/): 9 paper analyses from 2012-2022
  • Blog (blog/): 13 posts covering applications and research methods

Applications (content/applications/)

  • Healthcare: 17 pages (concepts, paths, resources, datasets)
    • Most comprehensive domain coverage
    • Complete learning path for EHR analysis
    • Research methodology and validation guidelines

Learning Paths (content/paths/)

  • 11 structured learning paths:
    • Foundation: Neural Networks, CNNs, Transformers, GPT
    • Advanced: VLMs, Diffusion, Advanced Training
    • Specialized: Healthcare EHR, Research Methods
    • Meta: Advanced concepts overview

Cross-References

The content-addressable architecture enables rich cross-referencing:

  • Every concept links to prerequisites and related concepts
  • Papers link to concept implementations and related work
  • Paths reference library content via stable IDs
  • Blog posts connect applications to underlying concepts

Estimated total cross-references: 500+ ContentLinks across 90 items
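To make the linking model concrete, here is a minimal TypeScript sketch of how stable IDs and ContentLinks could be represented. The type and field names below are illustrative assumptions, not the actual mlnotes registry schema.

```typescript
// Illustrative sketch only: type and field names are assumptions,
// not the actual mlnotes registry schema.

type ContentType = "concept" | "paper" | "blog" | "path" | "example" | "resource";

interface ContentItem {
  id: string;              // stable ID, e.g. "concepts/attention" (hypothetical)
  type: ContentType;
  title: string;
  tags: string[];
  prerequisites: string[]; // stable IDs of prerequisite items
  related: string[];       // stable IDs of related items
}

interface ContentLink {
  from: string;            // stable ID of the linking item
  to: string;              // stable ID of the target item
  kind: "prerequisite" | "related";
}

// Flatten each item's outgoing references into a list of ContentLinks.
function collectLinks(registry: ContentItem[]): ContentLink[] {
  return registry.flatMap((item) => [
    ...item.prerequisites.map((to): ContentLink => ({ from: item.id, to, kind: "prerequisite" })),
    ...item.related.map((to): ContentLink => ({ from: item.id, to, kind: "related" })),
  ]);
}
```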

Growth Metrics

Migration Progress

  • Source: 109 legacy module files
  • Migrated: 100 files (92%)
    • High-priority: 64/63 (101% - exceeded target)
    • Medium-priority: 30/31 (97%)
    • Low-priority: 6/15 (40% - remaining are low-value)
  • Result: 90 content items in new structure
  • Consolidation ratio: 100 source files → 90 content items (10% reduction through deduplication)

Quality Improvements

  • Atomicity: Split multi-concept pages into focused, reusable pages
  • Deduplication: Eliminated redundant content across modules
  • Rich linking: Added 500+ cross-references using ContentLink
  • Validation: Build-time checking prevents broken links (see the sketch after this list)
  • Stable IDs: Content can be safely reorganized without breaking links
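A minimal sketch of what build-time link checking could look like, assuming the ContentItem and collectLinks shapes sketched in the Cross-References section; this is an illustration, not the project's actual validator.

```typescript
// Illustrative build-time link check: fail the build if any ContentLink
// points at a stable ID that is missing from the registry.

function validateLinks(registry: ContentItem[]): string[] {
  const knownIds = new Set(registry.map((item) => item.id));
  const errors: string[] = [];
  for (const link of collectLinks(registry)) {
    if (!knownIds.has(link.to)) {
      errors.push(`${link.from}: broken ${link.kind} link -> ${link.to}`);
    }
  }
  return errors;
}

// A build script could then abort on any error:
//   const errors = validateLinks(registry);
//   if (errors.length > 0) { console.error(errors.join("\n")); process.exit(1); }
```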

Learning Time Estimates

Based on learning path metadata:

| Level            | Paths    | Total Time    |
| ---------------- | -------- | ------------- |
| Foundation       | 4 paths  | 54-73 hours   |
| Advanced         | 4 paths  | 40-55 hours   |
| Specialized      | 3 paths  | 25-35 hours   |
| Total Curriculum | 11 paths | 120-160 hours |

Approximately 3-4 months of dedicated study for the complete curriculum.
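The totals above come from summing per-path estimates in the learning path metadata; the aggregation could look like the following sketch, where the metadata shape (level, hours.min/max) is an assumption for illustration.

```typescript
// Illustrative aggregation of per-path time estimates; the metadata shape
// is an assumption for this sketch, not the actual path frontmatter.

interface PathMeta {
  id: string;
  level: "foundation" | "advanced" | "specialized";
  hours: { min: number; max: number };
}

function totalHours(paths: PathMeta[]): { min: number; max: number } {
  return paths.reduce(
    (acc, p) => ({ min: acc.min + p.hours.min, max: acc.max + p.hours.max }),
    { min: 0, max: 0 }
  );
}

// e.g. totalHours(allPaths) over the 11 paths yields roughly the
// 120-160 hour range shown in the table above.
```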

Content Distribution

By Module Origin

  • Module 01 (Foundations): 10 concepts + 1 example + 1 path = 12 items
  • Module 02 (CNNs): 5 concepts + 3 papers + 1 path + 1 blog = 10 items
  • Module 03 (Transformers): 6 concepts + 1 paper + 1 path + 1 blog = 9 items
  • Module 04 (GPT/LLMs): 6 concepts + 1 path + 1 blog = 8 items
  • Module 05 (VLMs): 4 concepts + 2 papers + 1 path + 1 blog + 1 overview = 9 items
  • Module 06 (Diffusion): 4 concepts + 3 papers + 1 path + 1 blog + 1 overview = 10 items
  • Module 07 (Advanced): 3 concepts + 1 path + 1 overview = 5 items
  • Module 09 (Research): 6 blog posts + 1 path = 7 items
  • Healthcare: 10 concepts + 3 overviews + 1 path + 2 resources + 1 index = 17 items
  • Section overviews: 3 items (foundation, advanced, paths)

Next Milestones

Short Term (Next 3 Months)

  • Expand to 120+ content items
  • Add 10+ code examples
  • Create 5+ additional learning paths
  • Expand computer vision and NLP domain coverage

Medium Term (6-12 Months)

  • Reach 200+ content items
  • Complete all major ML subdomain coverage
  • Add interactive demos and visualizations
  • Launch a community contributions framework

Long Term (1-2 Years)

  • Comprehensive ML encyclopedia (500+ items)
  • Multi-domain expertise (healthcare, robotics, finance)
  • Integration with external learning platforms
  • Multilingual support

This statistics page is automatically generated from the content registry. Run pnpm build:registry to update.
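As a rough illustration of the kind of tallying such a registry build step might perform to produce the counts above (the actual build:registry implementation is not shown here):

```typescript
// Hypothetical sketch of tallying registry items by an arbitrary key;
// not the actual build:registry implementation.

function countBy<T>(items: T[], key: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const k = key(item);
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}

// e.g. countBy(registry, (item) => item.type) -> { concept: 40, blog: 13, ... }
// Percentages in the difficulty table follow from counts[k] / registry.length.
```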