On Mixture of Experts in Large-Scale Statistical Machine Learning Applications
Nhat Ho · Nov 1, 2024
Date: 2024-11-01
Time: 15:30-16:30 (Montreal time)
Location: In person, Burnside 1104
https://mcgill.zoom.us/j/81284191962
Meeting ID: 812 8419 1962
Passcode: None
Abstract:
Mixtures of experts (MoEs) are a class of statistical machine learning models that combine multiple component models, known as experts, to form more complex and accurate models. They have been incorporated into deep learning architectures to improve the ability of these architectures and AI models to capture the heterogeneity of the data, and to scale these architectures up without increasing the computational cost. In a mixture of experts, each expert specializes in a different aspect of the data, and the expert outputs are combined through a gating function to produce the final output. Parameter and expert estimation therefore plays a crucial role, enabling statisticians and data scientists to articulate and make sense of the diverse patterns present in the data. However, the statistical behaviors of parameter and expert estimates in a mixture of experts have remained poorly understood, owing to the complex interaction between the gating function and the expert parameters.
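
To make the gated combination concrete, below is a minimal sketch of a softmax-gated mixture of linear experts. The choice of linear experts, the dimensions, and all parameter names are illustrative assumptions for this sketch only, not the specific models analyzed in the talk.

```python
# Minimal sketch (illustrative assumptions): a softmax-gated mixture of linear experts.
import numpy as np

rng = np.random.default_rng(0)

d, K = 5, 3                         # input dimension, number of experts (assumed values)
W_gate = rng.normal(size=(K, d))    # gating parameters, one row per expert
W_exp = rng.normal(size=(K, d))     # expert parameters (linear experts)
b_exp = rng.normal(size=K)          # expert biases

def moe_predict(x):
    """Weight each expert's output by its softmax gating probability."""
    logits = W_gate @ x                     # gating scores for input x
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                      # softmax gating weights, sum to 1
    expert_out = W_exp @ x + b_exp          # each expert's prediction on x
    return float(gate @ expert_out)         # gate-weighted combination

x = rng.normal(size=d)
print(moe_predict(x))
```

In this sketch the gating weights depend on the same input as the experts, which is the source of the interaction between gating and expert parameters mentioned in the abstract.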