LLM Architecture
3 topics in AI & Machine Learning
Mixture of Experts
How Mixtral and, reportedly, GPT-4 route each token to a few specialized sub-networks instead of running the whole model: the architectural trick that lets AI be huge without being slow.
Neural Scaling Laws
Why bigger AI keeps getting better: the mathematical relationships that let researchers predict how well a model will perform from its size, training data, and compute before they finish training it.
Sparse Attention
How AI models handle very long documents without running out of memory: attention patterns that skip most token pairs, letting language models work with books, not just paragraphs.