Soft Merging of Experts with Adaptive Routing
Mixture-of-Experts model where blocks of layers within the network are computed as weighted averages of the parameters of different experts, rather than routing each input to a single discrete expert.
Read more →
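To make the idea concrete, here is a minimal NumPy sketch of parameter-level soft merging: the router's probabilities are used to average the experts' weights, and the input is then passed once through the single merged expert. All names and sizes (`expert_w`, `router_w`, `d_in`, and so on) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d_in, d_out, num_experts = 8, 4, 3                      # toy sizes (made up)
expert_w = rng.normal(size=(num_experts, d_in, d_out))  # one weight matrix per expert
expert_b = rng.normal(size=(num_experts, d_out))        # one bias vector per expert
router_w = rng.normal(size=(d_in, num_experts))         # router producing expert logits

x = rng.normal(size=(d_in,))                            # a single token representation

# Routing: a probability distribution over experts for this input.
p = softmax(x @ router_w)                               # shape (num_experts,)

# Soft merging: average the experts' *parameters* with the routing weights,
# then run the input through the single merged expert once.
merged_w = np.tensordot(p, expert_w, axes=1)            # (d_in, d_out)
merged_b = p @ expert_b                                 # (d_out,)
y = x @ merged_w + merged_b

print(p, y)
```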
Presentation for: Unified Scaling Laws for Routed Language Models. Shows how MoE models scale with respect to parameter count and number of experts.
Read more →
Survey paper on MoE (Mixture-of-Experts) models from 2022. Overview of MoE variants, their strengths, and future research directions.
Read more →
Notes on the classic: Adaptive Mixtures of Local Experts (1991) by Jacobs, Jordan, Nowlan & Hinton.
Read more →
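For contrast with the soft-merging entry above, here is a minimal NumPy sketch of the classic formulation, where a gating network mixes the experts' outputs rather than their parameters. Names and sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
d_in, d_out, num_experts = 8, 4, 3                      # toy sizes (made up)
expert_w = rng.normal(size=(num_experts, d_in, d_out))  # one linear expert per slot
gate_w = rng.normal(size=(d_in, num_experts))           # gating network weights

x = rng.normal(size=(d_in,))
g = softmax(x @ gate_w)                                 # gating probabilities over experts

# Classic mixture: every expert produces its own output; the gate mixes the *outputs*.
expert_out = np.einsum('i,eio->eo', x, expert_w)        # (num_experts, d_out)
y = g @ expert_out                                      # (d_out,)

print(g, y)
```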