#Llama 3.1 8B and 70B

quiet urchin
glossy dome
#

Together has the base models (8b, 70b) x (Reference, Turbo) listed, but they don't appear to be usable yet.

queen rain
#

They’re coming!

#

Llama 405b is out though

thin kettle
#

Ooh wonder how this will perform

quiet urchin
#

-> #announcements message

solar sphinx
#

any good?

waxen vortex
#

Better than 3.0 70b it seems, but I still notice the repetition issue

#

Finetunes should be pretty good though, if they can get it right.

empty oracle
#

Sentence repetition seems to be much less of a problem than on L3.0 70B
L3.0 fell apart after 4k, this seems to hold up to like 16k at the very least

quasi ocean
#

clearly 3.1 70b instruct is the best model under 10b...

#

looks like they put the data in wrong

#

InternLM2.5 is still much stronger on Open LLM Leaderboard

#

70B Instruct is still way lower than Qwen2 72B Instruct too

empty oracle
#

We tried to finetune Qwen2-7B and it just refused to learn

#

I think Qwen is a bit fried with synth

quasi ocean
#

ic

empty oracle
#

like, Qwen didn't learn even with 8x more LR than L3 8B

#

that's a bit strange

quasi ocean
#

how many tokens did they put into Qwen?

empty oracle
quasi ocean
empty oracle
#

But quite easy for L3

quasi ocean
#

hmm

#

what were you finetuning it for

empty oracle
#

which requires a richer vocabulary than purely synthetic data can provide

#

my finetune is called L3-8B-Celeste btw

#

dataset was like 50% short story writing (natural), 25% RP and 25% (Opus synth)

waxen vortex
#

Any good settings for the 70b? Is the version run by Fireworks a quant?

hushed bough
#

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web

#

Oops, just saw it posted in the 405B one, my bad

uncut galleon
#

npnp (: it's live now

#

appreciate the ping regardless 🫡

hushed bough
#

Awesome, thanks!

high lintel
#

Has anyone compared the censorship levels between L3 and L3.1?

waxen vortex
#

But otherwise it seemed very unrestricted

high lintel
#

seems pretty censored

solar sphinx
hushed bough
#

@uncut galleon The context length has been updated on DeepInfra's website for Llama 3.1. It now says 128K:

https://deepinfra.com/meta-llama/Meta-Llama-3.1-8B-Instruct

https://deepinfra.com/meta-llama/Meta-Llama-3.1-70B-Instruct

thin kettle
uncut galleon
#

@oak mango fyi^ sorry for nerfing it temporarily

solar sphinx
#

🤔 what are the best params you guys use to get the best results for the 70b model

#

i'm getting weird ass outputs