Phi-4 just dropped, only on Azure AI Foundry for now. It’s a dense 14B model claimed to be competitive with Llama 3.3 70B on many benchmarks. Per the paper, most of the gains come from improved data, training curriculum, and post-training rather than architectural changes.
Paper: https://arxiv.org/abs/2412.08905
https://x.com/SebastienBubeck/status/1867379311067512876
https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft’s-newest-small-language-model-specializing-in-comple/4357090
Abstract: We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous...