Distilled from 405B
https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f
1 messages · Page 1 of 1 (latest)
Distilled from 405B
https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f
Soon hopefully on OpenRouter -> #1265324647969194117 message
I can already use Groq API llama-3.1-8b-instant, llama-3.1-70b-versatile free (8k context).
I cannot yet use llama-3.1-405b-reasoning, which is "available to select Groq customers only" according to https://wow.groq.com/now-available-on-groq-the-largest-and-most-capable-openly-available-foundation-model-to-date-llama-3-1-405b/
Together has base models (8b, 70b) x (Reference, Turbo) listed but appears to be not usable yet.
Ooh wonder how this will perform
-> #announcements message
any good?
Better than 3.0 70b it seems, but I still notice the repition issue
Finetunes should be pretty good though, if they can get it right.
Sentence repetition seem to be much less of a problem than on L3.0 70B
L3.0 fell apart after 4k, this seems to hold up to like 16k at the very least
clearly 3.1 70b instruct is the best model under 10b...
looks like they put the data in wrong
InternLM2.5 is still much stronger on Open LLM Leaderboard
70B Instruct is still way lower than Qwen2 72B Instruct too
tbf Qwen is a bit of a benchmaxx
We tried to finetune Qwen2-7B and it just refused to learn
I think Qwen is a bit fried with synth
ic
how many tokens did they put into Qwen?
7T I think?
Most of it being being Chinese corpora likely
I think most english stuff they put is synth
hmm, so it's not like they saturated it with more than 15Tt
I think our set which is 50% natural data and 50% Opus synth was quite hard to learn for Qwen
But quite easy for L3
storytelling/RPing
which requires a richer vocabulary that purely synthetic data can provide
my finetune is called L3-8B-Celeste btw
dataset was like 50% short story writing (natural), 25% RP and 25% (Opus synth)
Any good settings for the 70b? Is the version run by Fireworks a quant?
@uncut galleon DeepInfra now has 3.1 8B and 70B:
https://deepinfra.com/meta-llama/Meta-Llama-3.1-8B-Instruct
https://deepinfra.com/meta-llama/Meta-Llama-3.1-70B-Instruct
Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web
Oops, just saw it posted in the 405B one, my bad
Awesome, thanks!
Has anyone compared the censorship levels between L3 and L3.1?
It occasionally denies your requests, but I didn't check which providers
But otherwise it seemed very unrestricted
seems pretty censored
I hear so many good things but I rlly need slightly bigger context windows 🥺 any chance u can expand it?
@uncut galleon The context length has been updated on DeepInfra's website for Llama 3.1. It now says 128K:
https://deepinfra.com/meta-llama/Meta-Llama-3.1-8B-Instruct
https://deepinfra.com/meta-llama/Meta-Llama-3.1-70B-Instruct
Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web
This is a hilarious way to test for censorship
thx! deploying now
@oak mango fyi^ sorry for nerfing it temporarily
So grateful!