Dynamic Expert Specialization: Multi-Domain MoE Adaptation
Progressive three-phase training schedule for MoEs, paired with a router that masks out expert models while remaining differentiable.
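The details are in the post itself; below is only a minimal sketch of the routing idea, assuming a straight-through top-k mask over gate scores (the class name, the use of PyTorch, and every parameter here are illustrative choices, not taken from the post).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedRouter(nn.Module):
    """Hypothetical sketch: route each token to its top-k experts with a hard
    0/1 mask, while keeping gradients flowing via a straight-through estimator."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> soft gate probabilities over experts
        probs = F.softmax(self.gate(x), dim=-1)
        # Hard 0/1 mask that keeps only the top-k experts per token
        topk = probs.topk(self.k, dim=-1).indices
        hard_mask = torch.zeros_like(probs).scatter_(-1, topk, 1.0)
        # Straight-through: the forward pass uses the hard mask,
        # the backward pass uses the soft probabilities
        return hard_mask + probs - probs.detach()


if __name__ == "__main__":
    router = MaskedRouter(d_model=16, n_experts=4, k=2)
    weights = router(torch.randn(8, 16))  # (8, 4) masked routing weights
    print(weights.sum(dim=-1))            # each row sums to k: the forward pass is a hard mask
```

The straight-through trick is one common way to get this behaviour: experts are fully masked out in the forward pass, yet the router stays trainable end to end because gradients flow through the soft gate probabilities.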
Using MoE in the attention mechanism of a Transformer for continual learning, aiming to compete with LoRA.
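The post doesn't say exactly where the experts sit, so purely as an illustration, one possible placement is a mixture of output-projection experts inside a single attention head; the module name, the single-head simplification, and the soft per-token routing below are all my own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAttention(nn.Module):
    """Illustrative sketch: single-head scaled-dot-product attention whose
    output projection is a small mixture of expert linear layers."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        h = attn @ v                               # (batch, seq, d_model)
        gates = F.softmax(self.router(h), dim=-1)  # (batch, seq, n_experts)
        # Per-token weighted sum over the expert output projections
        expert_out = torch.stack([e(h) for e in self.experts], dim=-2)
        return (gates.unsqueeze(-1) * expert_out).sum(dim=-2)
```

For continual learning, one natural use of such a design would be to add and train a new expert per task while freezing the others, which is one axis along which it could be compared against LoRA adapters.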
Mixture-of-Experts model in which blocks of layers inside the network are computed as weighted averages of different experts.
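A rough sketch of that idea, assuming learned per-block mixing weights and simple MLP experts (all names and sizes are placeholders, not taken from the post):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AveragedExpertBlock(nn.Module):
    """Illustrative sketch: a block whose output is a learned weighted
    average of several expert MLP sub-blocks applied in parallel."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        # One learnable mixing logit per expert, shared across all inputs
        self.mix_logits = nn.Parameter(torch.zeros(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.mix_logits, dim=0)  # mixing weights sum to 1
        out = torch.zeros_like(x)
        for weight, expert in zip(w, self.experts):
            out = out + weight * expert(x)
        return out
```

Unlike hard routing, every expert runs on every input here; the "mixture" shows up only as the learned averaging weights per block.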
Presentation on Unified Scaling Laws for Routed Language Models, which shows how MoE models scale with respect to parameter count and number of experts.
MoE model where each expert corresponds to a different domain, i.e. a source text dataset (Reddit, medical papers, etc.). It can extrapolate to new domains by copying and further training the nearest existing expert.
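A hedged sketch of the copy-the-nearest-expert step, assuming each known domain has an expert module plus an embedding vector; the helper name, the use of cosine similarity, and the dictionary layout are all my own assumptions rather than the post's method.

```python
import copy
import torch
import torch.nn as nn

def spawn_expert_for_new_domain(
    experts: dict[str, nn.Module],
    domain_embeddings: dict[str, torch.Tensor],
    new_domain: str,
    new_embedding: torch.Tensor,
) -> nn.Module:
    """Hypothetical helper: initialise an expert for an unseen domain by
    copying the expert of the most similar known domain, then fine-tune it
    on data from the new domain."""
    nearest = max(
        domain_embeddings,
        key=lambda d: float(torch.cosine_similarity(domain_embeddings[d], new_embedding, dim=0)),
    )
    new_expert = copy.deepcopy(experts[nearest])  # same weights, independent parameters
    experts[new_domain] = new_expert
    return new_expert
```

After the copy, presumably only the new expert (and any routing weights tied to it) would be trained on the new domain's data, leaving the existing experts untouched.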
A 2022 survey of Mixture-of-Experts (MoE) models: an overview of MoE variants, their strengths, and directions for future research.
Notes on the classic paper Adaptive Mixtures of Local Experts (Jacobs, Jordan, Nowlan & Hinton, 1991).
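For reference, the core formulation from that paper, up to notation: each expert $i$ produces an output $o_i(x)$, a gating network produces softmax weights $g_i(x)$, and training minimises a negative-log-likelihood error that pushes experts to compete and specialise rather than cooperate.

```latex
y(x) = \sum_i g_i(x)\, o_i(x),
\qquad
g_i(x) = \frac{e^{s_i(x)}}{\sum_j e^{s_j(x)}},
\qquad
E = -\log \sum_i g_i(x)\, e^{-\frac{1}{2}\lVert d - o_i(x) \rVert^2}
```

Here $d$ is the target and $s_i(x)$ are the gating network's logits.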