Llama 3 released, bigger stuff to come | PauseAI | Page 1

silent cliff Apr 18, 2024, 4:50 PM

#

Not a shocking level of performance, luckily, for the released models

only 8k context length
8b and 70b params
similar to Claude 3 sonnet / mixtral

However, there's also the unreleased 400B with gpt4 level of performance

Our largest models exceed 400B parameters and are still training.

(gpt4 is rumored to be ~1.7T, which OpenAI has not confirmed, so roughly 4x as big)

Llama 3 is trained on 16K GPUs. Meta plans to have roughly 600K H100-equivalents of compute by the end of 2024.

in the coming months we expect to introduce new capabilities, longer context windows, additional model sizes and enhanced performance
https://llama.meta.com/llama3/

Meta Llama

Meta Llama 3

Build the future of AI with Meta Llama 3. Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications.
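
A quick sanity check of the scale figures quoted in the post above, as a minimal sketch. The constant names are made up for illustration, the GPT-4 parameter count is an unconfirmed rumor, and "400B" is only a lower bound for the largest Llama 3 model, so treat the output as rough ratios rather than measurements.

```python
# Figures as quoted in the thread (not independently verified).
GPT4_PARAMS_RUMORED = 1_700_000_000_000   # rumored ~1.7T, unconfirmed
LLAMA3_LARGEST_PARAMS = 400_000_000_000   # "exceed 400B", so a lower bound
GPUS_USED_FOR_TRAINING = 16_000           # "trained on 16K GPUs"
GPUS_PLANNED_END_2024 = 600_000           # planned figure quoted in the post

# How many times larger the rumored GPT-4 is than the 400B model.
param_ratio = GPT4_PARAMS_RUMORED / LLAMA3_LARGEST_PARAMS

# How many times more GPUs the planned fleet has than the training run used.
gpu_ratio = GPUS_PLANNED_END_2024 / GPUS_USED_FOR_TRAINING

print(f"rumored GPT-4 / Llama 3 400B parameters: {param_ratio:.2f}x")
print(f"planned GPUs / GPUs used for Llama 3:    {gpu_ratio:.1f}x")
```

Under these assumptions the "roughly 4x" in the post holds up, and the planned GPU count is well over an order of magnitude above what the Llama 3 run used.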

humble elm Apr 18, 2024, 4:52 PM

#

"Not a shocking level of performance" really is a rare sentence in these parts of town

silent cliff Apr 18, 2024, 4:54 PM

#

Llama 3 released, bigger stuff to come

silent cliff Apr 18, 2024, 5:15 PM

#

Interview zuck + dwarkesh: https://www.youtube.com/watch?v=bc6uFV9CJGg

YouTube

Dwarkesh Patel

Mark Zuckerberg - Llama 3, Open Sourcing $10b Models, & Caesar Augu...

Zuck on:

Llama 3
open sourcing towards AGI
custom silicon, synthetic data, & energy constraints on scaling
Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more

Enjoy!

Timestamps

00:00:00 Llama 3
00:09:15 Coding on path to AGI
00:26:07 Energy bottlenecks
00:34:03 Is AI the most important technology ever?
00:3...


rough gulch Apr 18, 2024, 5:26 PM

#

feels good to see that meta still sucks

thick marlin Apr 18, 2024, 6:02 PM

#

Looks like Meta is finding ways to squeeze better and better performance out of smaller and smaller models. This seems really impressive (and probably harmless). Curious how they're doing it (maybe partially using a larger model to train the smaller one?). Mistral might not survive this.

thick marlin Apr 18, 2024, 6:39 PM

#

Also, no MoE. Will the 400b be using it? If not, how did they get results comparable to gpt4-turbo?

thick marlin Apr 18, 2024, 6:55 PM

#

Is Chinchilla dead? 🤔

spare isle Apr 19, 2024, 7:49 AM

#

but do they plan to open source the 400b model?

thorny radish Apr 19, 2024, 2:09 PM

#

thick marlin Is Chinchilla dead? 🤔

In case you didn't see it, this replication attempt actually seems to correct Chinchilla in the opposite direction: https://x.com/tamaybes/status/1780639257389904013

Tamay Besiroglu (@tamaybes) on X

The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)

#

As for Llama, I've heard the explanation that they just wanted a smaller model so it's easier to run
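
For context on the Chinchilla question, here is a rough sketch of how far past the usual rule of thumb the released models were trained. It assumes two things that are not stated in this thread: Meta's announced figure of about 15T pretraining tokens, and the common approximation of about 20 tokens per parameter as "Chinchilla-optimal". Function and constant names are illustrative only.

```python
# Assumptions (not from this thread): ~15T training tokens per Meta's
# announcement, and ~20 tokens per parameter as the Chinchilla rule of thumb.
TRAINING_TOKENS = 15 * 10**12
CHINCHILLA_TOKENS_PER_PARAM = 20

def tokens_per_param(n_params: int) -> float:
    """Training tokens seen per model parameter."""
    return TRAINING_TOKENS / n_params

def overtraining_factor(n_params: int) -> float:
    """How many times more data than the ~20 tokens/param rule of thumb."""
    return tokens_per_param(n_params) / CHINCHILLA_TOKENS_PER_PARAM

for name, n_params in [("8b", 8 * 10**9), ("70b", 70 * 10**9)]:
    print(
        f"Llama 3 {name}: {tokens_per_param(n_params):,.0f} tokens/param, "
        f"about {overtraining_factor(n_params):.0f}x the rule of thumb"
    )
```

If those assumptions hold, the small models were trained far beyond the compute-optimal point on purpose, which fits the "easier to run" explanation: extra training compute is traded for cheaper inference rather than Chinchilla being wrong.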

slender raven Apr 19, 2024, 6:45 PM

#

400B+ is in training - more to come

thick marlin Apr 19, 2024, 10:06 PM

#

I still can't get over the fact that the 8b is beating the original gpt4 on the LMSYS Chatbot Arena leaderboard (English category)

subtle marsh Apr 19, 2024, 10:59 PM

#

thick marlin I still can't get over the fact that the 8b is beating the original gpt4 on the ...

70b not 8b

thick marlin Apr 20, 2024, 6:00 PM

#

subtle marsh 70b not 8b

No, 8b is above GPT-4-0314 (the original)

subtle marsh Apr 20, 2024, 6:31 PM

#

oh right

silent cliff Apr 21, 2024, 5:27 AM

#

thick marlin No, 8b is above GPT-4-0314 (the original)

How??? Insane

#

There's also some chance that the final 400b model outperforms gpt4. Also crazy

silent cliff Apr 21, 2024, 7:12 AM

#

Using Llama 3 70b on https://groq.com/ is quite something

GroqChat

A GroqLabs AI Language Interface

thick marlin Apr 21, 2024, 9:38 PM

#

silent cliff How??? Insane

Idk, it's fallen (slightly) on the leaderboard last I checked, but still wild

thick marlin Apr 21, 2024, 9:39 PM

#

silent cliff There's also some chance that the final 400b model outperforms gpt4. Also crazy

That seems almost certain, given that the 70b is already threateningly close to gpt4's latest version, and is even overtaking Opus in some ways. This is a new era

#

Unless OpenAI drops GPT5 before its release... which seems much more likely, now

humble elm Apr 24, 2024, 12:35 PM

#

thick marlin Unless OpenAI drops GPT5 before its release... which seems much more likely, now

Didn't they say not in 2024? But they could release 4.5 ultra turbo max

thick marlin Apr 26, 2024, 5:40 PM

#

humble elm Didn't they say not in 2024? But they could release 4.5 ultra turbo max

Not aware of anyone saying this. Manifold still has it at >50%: https://manifold.markets/AmmonLam/will-gpt5-be-released-before-the-en-b5408f64eca6