Dynamic Expert Specialization: Multi-Domain MoE Adaptation (December 27, 2025) | Mixture of Experts | PDF. A progressive three-phase training schedule for MoEs, plus a router that masks out models while remaining differentiable.
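As a rough illustration of the "differentiable masking" idea in the blurb above (my own sketch, not the paper's code): a router can emit a per-expert gate in (0, 1) with a sigmoid, so experts can be driven to (near) zero contribution while gradients still flow through the mask.

```python
import torch
import torch.nn as nn

class DifferentiableMaskRouter(nn.Module):
    """Toy router: emits a soft 0..1 mask per expert, so 'masked out'
    experts contribute roughly nothing while the mask stays differentiable."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> mask: (batch, n_experts), each entry in (0, 1)
        return torch.sigmoid(self.gate(x))

# Usage sketch: weight each expert's output by its mask value and sum:
#   y = (mask.unsqueeze(-1) * expert_outputs).sum(dim=1)
```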
Mixtures of SubExperts for Large Language Continual Learning (December 26, 2025) | Mixture of Experts | PDF. Using MoE in the attention mechanism of a Transformer for continual learning, aiming to compete with LoRA.
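A minimal sketch of what "MoE inside attention" could look like (purely my own illustration of the general idea, not the paper's architecture): replace a single query projection with several sub-expert projections mixed per token by a small router.

```python
import torch
import torch.nn as nn

class MoEQueryProjection(nn.Module):
    """Illustrative only: per-token softmax mixture over sub-expert W_q projections."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        weights = torch.softmax(self.router(x), dim=-1)            # (batch, seq, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=-2)   # (batch, seq, n_experts, d_model)
        return (weights.unsqueeze(-1) * outs).sum(dim=-2)          # (batch, seq, d_model)
```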
Soft Merging of Experts with Adaptive Routing (December 25, 2025) | Mixture of Experts | PDF. A Mixture of Experts model in which blocks of layers within the network are weighted averages of different experts.
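The core idea, as I understand it, is to average the experts' parameters (rather than their outputs) using the routing weights, then run a single forward pass through the merged module. A minimal sketch for a linear-layer expert (my own illustration, not the paper's code):

```python
import torch
import torch.nn as nn

class SoftMergedLinear(nn.Module):
    """Sketch of soft merging: routing probabilities form a weighted average
    of expert weight matrices; one forward pass through the merged layer."""
    def __init__(self, d_in: int, d_out: int, n_experts: int):
        super().__init__()
        self.expert_weights = nn.Parameter(torch.randn(n_experts, d_out, d_in) * 0.02)
        self.router = nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in); one routing decision per example
        p = torch.softmax(self.router(x), dim=-1)                     # (batch, n_experts)
        merged = torch.einsum("be,eoi->boi", p, self.expert_weights)  # (batch, d_out, d_in)
        return torch.einsum("boi,bi->bo", merged, x)                  # (batch, d_out)
```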
Unified Scaling Laws for Routed Language Models (December 22, 2025) | Mixture of Experts | PDF. Presentation for Unified Scaling Laws for Routed Language Models: shows how MoE models scale with respect to parameter count and the number of experts.
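For reference, the headline result (from memory, notation approximate) is a bilinear fit in the log of dense model size and the log of an effective expert count:

$$\log L(N, E) \;\approx\; a \log N + b \log \hat{E} + c \,\log N \,\log \hat{E} + d$$

where $N$ is the dense parameter count, $\hat{E}$ is a saturating "effective" number of experts, and $a, b, c, d$ are fitted coefficients.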
DEMix Layers: Disentangling Domains for Modular Language Modeling (December 22, 2025) | Mixture of Experts | PDF. An MoE model where each expert corresponds to a different domain, i.e. a source text dataset (Reddit, medical papers, etc.); it can extrapolate to new domains by copying and training the nearest expert.
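A hedged sketch of the "copy the nearest expert" recipe described above (function names and steps are my own illustration, not DEMix's code): score each existing domain expert on a sample of new-domain text, clone the best-scoring one, and fine-tune only the clone.

```python
import copy
import torch

def adapt_to_new_domain(experts, new_domain_batch, loss_fn):
    """Illustrative DEMix-style adaptation: pick the existing domain expert
    with the lowest loss on new-domain text, clone it, and return the clone
    to be fine-tuned on the new domain while the originals stay frozen."""
    with torch.no_grad():
        losses = [float(loss_fn(expert, new_domain_batch)) for expert in experts]
    nearest_idx = min(range(len(losses)), key=losses.__getitem__)
    return copy.deepcopy(experts[nearest_idx])
```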
Review of Sparse Expert Models in Deep Learning (December 21, 2025) | Mixture of Experts | PDF. A 2022 survey of Mixture of Experts (MoE) models: an overview of MoE variants, their strengths, and directions for future research.
Adaptive Mixtures of Local Experts (December 16, 2025) | Mixture of Experts | PDF. Notes on the classic Adaptive Mixtures of Local Experts (1991), co-authored by Hinton.
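For context (working from memory of the 1991 paper): the model's output is a gate-weighted sum of the expert outputs, $y = \sum_i g_i\, y_i$, with gates $g_i$ produced by a softmax gating network, and the paper's key change is to train with an error of the form

$$E = -\log \sum_i g_i \exp\!\Big(-\tfrac{1}{2}\,\lVert d - y_i \rVert^2\Big),$$

which pushes individual experts to specialize on different cases rather than all cooperating on every case.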