Dynamic Expert Specialization: Multi-Domain MoE Adaptation
Progressive three-phase training schedule for MoEs, paired with a router that masks out expert models while remaining differentiable.
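The details are in the post itself; below is only a minimal sketch of the routing idea, assuming a straight-through top-k mask over gate scores (the class name, the use of PyTorch, and every parameter here are illustrative choices, not taken from the post).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedRouter(nn.Module):
    """Hypothetical sketch: route each token to its top-k experts with a hard
    0/1 mask, while keeping gradients flowing via a straight-through estimator."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> soft gate probabilities over experts
        probs = F.softmax(self.gate(x), dim=-1)
        # Hard 0/1 mask that keeps only the top-k experts per token
        topk = probs.topk(self.k, dim=-1).indices
        hard_mask = torch.zeros_like(probs).scatter_(-1, topk, 1.0)
        # Straight-through: the forward pass uses the hard mask,
        # the backward pass uses the soft probabilities
        return hard_mask + probs - probs.detach()


if __name__ == "__main__":
    router = MaskedRouter(d_model=16, n_experts=4, k=2)
    weights = router(torch.randn(8, 16))  # (8, 4) masked routing weights
    print(weights.sum(dim=-1))            # each row sums to k: the forward pass is a hard mask
```

The straight-through trick is one common way to get this behaviour: experts are fully masked out in the forward pass, yet the router stays trainable end to end because gradients flow through the soft gate probabilities.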
Using MoE in the attention mechanism of a Transformer for continual learning, aiming to compete with LoRA.
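The post doesn't say exactly where the experts sit, so purely as an illustration, one possible placement is a mixture of output-projection experts inside a single attention head; the module name, the single-head simplification, and the soft per-token routing below are all my own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAttention(nn.Module):
    """Illustrative sketch: single-head scaled-dot-product attention whose
    output projection is a small mixture of expert linear layers."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        h = attn @ v                               # (batch, seq, d_model)
        gates = F.softmax(self.router(h), dim=-1)  # (batch, seq, n_experts)
        # Per-token weighted sum over the expert output projections
        expert_out = torch.stack([e(h) for e in self.experts], dim=-2)
        return (gates.unsqueeze(-1) * expert_out).sum(dim=-2)
```

For continual learning, one natural use of such a design would be to add and train a new expert per task while freezing the others, which is one axis along which it could be compared against LoRA adapters.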
Mixture-of-Experts model in which blocks of layers inside the network are computed as weighted averages of different experts.
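A rough sketch of that idea, assuming learned per-block mixing weights and simple MLP experts (all names and sizes are placeholders, not taken from the post):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AveragedExpertBlock(nn.Module):
    """Illustrative sketch: a block whose output is a learned weighted
    average of several expert MLP sub-blocks applied in parallel."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        # One learnable mixing logit per expert, shared across all inputs
        self.mix_logits = nn.Parameter(torch.zeros(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.mix_logits, dim=0)  # mixing weights sum to 1
        out = torch.zeros_like(x)
        for weight, expert in zip(w, self.experts):
            out = out + weight * expert(x)
        return out
```

Unlike hard routing, every expert runs on every input here; the "mixture" shows up only as the learned averaging weights per block.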
Presentation on Unified Scaling Laws for Routed Language Models, which shows how MoE models scale with respect to parameter count and number of experts.
MoE model where each expert corresponds to a different domain, i.e. a source text dataset (Reddit, medical papers, etc.). It can extrapolate to new domains by copying and further training the nearest existing expert.
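A hedged sketch of the copy-the-nearest-expert step, assuming each known domain has an expert module plus an embedding vector; the helper name, the use of cosine similarity, and the dictionary layout are all my own assumptions rather than the post's method.

```python
import copy
import torch
import torch.nn as nn

def spawn_expert_for_new_domain(
    experts: dict[str, nn.Module],
    domain_embeddings: dict[str, torch.Tensor],
    new_domain: str,
    new_embedding: torch.Tensor,
) -> nn.Module:
    """Hypothetical helper: initialise an expert for an unseen domain by
    copying the expert of the most similar known domain, then fine-tune it
    on data from the new domain."""
    nearest = max(
        domain_embeddings,
        key=lambda d: float(torch.cosine_similarity(domain_embeddings[d], new_embedding, dim=0)),
    )
    new_expert = copy.deepcopy(experts[nearest])  # same weights, independent parameters
    experts[new_domain] = new_expert
    return new_expert
```

After the copy, presumably only the new expert (and any routing weights tied to it) would be trained on the new domain's data, leaving the existing experts untouched.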
A 2022 survey of Mixture-of-Experts (MoE) models: an overview of MoE variants, their strengths, and directions for future research.
Notes on the classic paper Adaptive Mixtures of Local Experts (Jacobs, Jordan, Nowlan & Hinton, 1991).
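For reference, the core formulation from that paper, up to notation: each expert $i$ produces an output $o_i(x)$, a gating network produces softmax weights $g_i(x)$, and training minimises a negative-log-likelihood error that pushes experts to compete and specialise rather than cooperate.

```latex
y(x) = \sum_i g_i(x)\, o_i(x),
\qquad
g_i(x) = \frac{e^{s_i(x)}}{\sum_j e^{s_j(x)}},
\qquad
E = -\log \sum_i g_i(x)\, e^{-\frac{1}{2}\lVert d - o_i(x) \rVert^2}
```

Here $d$ is the target and $s_i(x)$ are the gating network's logits.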