Luckily, the released models don't show a shocking level of performance:
- Only an 8K-token context window
- 8B and 70B parameters
- Performance similar to Claude 3 Sonnet / Mixtral
However, there is also an unreleased 400B model with GPT-4-level performance. From Meta:
"Our largest models exceed 400B parameters and are still training."
(GPT-4 is reportedly about 1.7T parameters, an unconfirmed estimate, so roughly 4x as big.)
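A quick check of the "roughly 4x" arithmetic, as a minimal sketch. Assumptions: the GPT-4 figure is the widely reported but unconfirmed ~1.7T estimate, and 400B is used as a lower bound for Llama 3's largest model, since Meta only says it "exceeds" 400B. The names `GPT4_PARAMS_EST`, `LLAMA3_LARGEST_PARAMS` and `size_ratio` are hypothetical, chosen for illustration.

```python
# Hypothetical constants for illustration only.
GPT4_PARAMS_EST = 1.7e12       # rumored GPT-4 size, not confirmed by OpenAI
LLAMA3_LARGEST_PARAMS = 400e9  # "exceed 400B", so this is a lower bound


def size_ratio(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b


ratio = size_ratio(GPT4_PARAMS_EST, LLAMA3_LARGEST_PARAMS)
print(f"GPT-4 (est.) vs. Llama 3 400B+: ~{ratio:.2f}x")
```

This gives about 4.25x. Because the Llama 3 figure is a lower bound, the real gap is at most this large, and smaller if the final model ends up above 400B.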
Llama 3 was trained on 16K GPUs running simultaneously. Meta plans to have compute equivalent to roughly 600K H100s by the end of 2024.
Meta also says: "In the coming months, we expect to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance."
https://llama.meta.com/llama3/