#Llama 3 released, bigger stuff to come


silent cliff
#

Not a shocking level of performance, luckily, for the released model

  • only 8k context length
  • 8B and 70B params
  • similar to Claude 3 sonnet / mixtral

However, there's also the unreleased 400B with gpt4 level of performance

Our largest models exceed 400B parameters and are still training.

(gpt4 is rumored to be ~1.7T, so roughly 4x as big)

Llama 3 is trained on 16K GPUs. They plan to have 600K by the end of 2024.

in the coming months we expect to introduce new capabilities, longer context windows, additional model sizes and enhanced performance
https://llama.meta.com/llama3/

humble elm
#

"Not a shocking level of performance" really is a rare sentence in these parts of town

silent cliff
#

Llama 3 released, bigger stuff to come

silent cliff
#

Zuck on:

  • Llama 3
  • open sourcing towards AGI
  • custom silicon, synthetic data, & energy constraints on scaling
  • Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more

Enjoy!

Timestamps

00:00:00 Llama 3
00:09:15 Coding on path to AGI
00:26:07 Energy bottlenecks
00:34:03 Is AI the most important technology ever?
00:3...

rough gulch
#

feels good to see that meta still sucks

thick marlin
#

Looks like Meta is finding ways to squeeze better and better performance out of smaller and smaller models. This seems really impressive (and probably harmless). Curious how they're doing it (maybe partly by using a larger model to train the smaller one?). Mistral might not survive this.

thick marlin
#

Also, no MoE. Will the 400b be using it? If not, how did they get results comparable to gpt4-turbo?

thick marlin
#

Is Chinchilla dead? 🤔

spare isle
#

but do they plan to open source the 400b model?

thorny radish
#

As for llama I've heard the explanation that they just wanted to have a smaller model so it's easier to run

slender raven
#

450 Billion is in training - more to come

thick marlin
#

I still can't get over the fact that the 8b is beating the original gpt4 on the leaderboard (English version)

subtle marsh
#

oh right

silent cliff
#

There's also some chance that the final 400b model outperforms gpt4. Also crazy

thick marlin
#

Unless OpenAI drops GPT5 before its release... which seems much more likely, now

humble elm