Machine Learning Paper Picks #1 | AI @ Mozilla

Scalable MatMul-free Language Modeling

Proposes a method to completely eliminate MatMul operations from LLMs while maintaining strong performance at billion-parameter scales.
📎 https://arxiv.org/abs/2406.02528
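
For intuition on how a dense layer can run without multiplications, here is a toy sketch. It assumes the weights are restricted to the ternary values -1, 0 and +1, so each output is just a signed sum of inputs. The function name `ternary_matvec` and the example values are hypothetical, and this is not the authors' implementation.

```python
def ternary_matvec(weights, x):
    """Compute W @ x for W with entries in {-1, 0, +1} using only add/subtract."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1 weight: add the input
            elif w == -1:
                acc -= xi   # -1 weight: subtract the input
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out


# Illustrative values only (not taken from the paper).
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.0]
y = ternary_matvec(W, x)
print(y)
```

No element of `W` is ever multiplied by an element of `x`; the weight only decides whether an input is added, subtracted, or ignored.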

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Introduces MMLU-Pro, an enhanced dataset that extends MMLU by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options.
📎 https://arxiv.org/abs/2406.01574
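
One consequence of going from four to ten options is a lower random-guess floor. The short sketch below works out that arithmetic, assuming uniform guessing over the options; the helper name `chance_accuracy` is hypothetical.

```python
def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return 1.0 / num_options


four_way = chance_accuracy(4)    # original MMLU format
ten_way = chance_accuracy(10)    # MMLU-Pro format
print(f"4 options: {four_way:.0%}, 10 options: {ten_way:.0%}")
```

With more options, a correct answer is less likely to come from a lucky guess.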

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

Introduces LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens.
📎 https://arxiv.org/abs/2406.00605

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Introduces Mamba-2, a new architecture whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
📎 https://arxiv.org/abs/2405.21060

Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images

Shows that existing self-supervised and multi-modal models trained purely on synthetic data are comparable to or outperform real-image baselines on a range of robustness metrics, but are more susceptible to adversarial noise.
📎 https://arxiv.org/abs/2405.20469

⭐ Throwback Thursday Selection ⭐
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

EleutherAI's pioneering work that took reproducibility research to a whole new level: Pythia is a suite of 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters.
📎 https://arxiv.org/abs/2304.01373
