- Scalable MatMul-free Language Modeling
Proposes a method to completely eliminate MatMul operations from LLMs while maintaining strong performance at billion-parameter scales.
📎 https://arxiv.org/abs/2406.02528
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Introduces MMLU-Pro, an enhanced dataset that extends MMLU by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options.
📎 https://arxiv.org/abs/2406.01574
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
Introduces LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens.
📎 https://arxiv.org/abs/2406.00605
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Introduces Mamba-2, a new architecture whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.
📎 https://arxiv.org/abs/2405.21060
- Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
Shows that existing self-supervised and multi-modal models trained purely on synthetic data match or outperform real-image baselines on a range of robustness metrics, but are more susceptible to adversarial noise.
📎 https://arxiv.org/abs/2405.20469
⭐ Throwback Thursday Selection ⭐
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
EleutherAI's pioneering work that took reproducibility research to a whole new level. Pythia is a suite of 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters.
📎 https://arxiv.org/abs/2304.01373