Llama 3.1 8B and 70B | OpenRouter | Page 1

quasi ocean Jul 23, 2024, 3:18 PM

#

Distilled from 405B

https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f

Llama 3.1 - a meta-llama Collection

oak mango Jul 23, 2024, 3:27 PM

#

https://tenor.com/view/send-energy-sending-healing-dragon-ball-spirit-bomb-gif-16443313

Tenor

#

Giving my energy to this

quiet urchin Jul 23, 2024, 3:28 PM

#

Soon hopefully on OpenRouter -> #1265324647969194117 message

glossy dome Jul 23, 2024, 4:04 PM

#

I can already use Groq API llama-3.1-8b-instant, llama-3.1-70b-versatile free (8k context).
I cannot yet use llama-3.1-405b-reasoning, which is "available to select Groq customers only" according to https://wow.groq.com/now-available-on-groq-the-largest-and-most-capable-openly-available-foundation-model-to-date-llama-3-1-405b/

#

Together has base models (8b, 70b) x (Reference, Turbo) listed but appears to be not usable yet.

queen rain Jul 23, 2024, 4:21 PM

#

They’re coming!

#

Llama 405b is out though

thin kettle Jul 23, 2024, 4:39 PM

#

Ooh wonder how this will perform

quiet urchin Jul 23, 2024, 4:43 PM

#

-> #announcements message

solar sphinx Jul 23, 2024, 6:16 PM

#

any good?

waxen vortex Jul 23, 2024, 6:34 PM

#

Better than 3.0 70b it seems, but I still notice the repition issue

#

Finetunes should be pretty good though, if they can get it right.

empty oracle Jul 23, 2024, 6:51 PM

#

Sentence repetition seem to be much less of a problem than on L3.0 70B
L3.0 fell apart after 4k, this seems to hold up to like 16k at the very least

quasi ocean Jul 23, 2024, 6:54 PM

#

clearly 3.1 70b instruct is the best model under 10b...

#

looks like they put the data in wrong

#

InternLM2.5 is still much stronger on Open LLM Leaderboard

#

70B Instruct is still way lower than Qwen2 72B Instruct too

empty oracle Jul 23, 2024, 7:10 PM

#

quasi ocean 70B Instruct is still way lower than Qwen2 72B Instruct too

tbf Qwen is a bit of a benchmaxx

#

We tried to finetune Qwen2-7B and it just refused to learn

#

I think Qwen is a bit fried with synth

quasi ocean Jul 23, 2024, 7:11 PM

#

ic

empty oracle Jul 23, 2024, 7:11 PM

#

like, Qwen didn't learn even with 8x more LR than L3 8B

#

that's bit strange

quasi ocean Jul 23, 2024, 7:12 PM

#

how many tokens did they put into Qwen?

empty oracle Jul 23, 2024, 7:13 PM

#

quasi ocean how many tokens did they put into Qwen?

7T I think?
Most of it being being Chinese corpora likely
I think most english stuff they put is synth

quasi ocean Jul 23, 2024, 7:13 PM

#

empty oracle 7T I think? Most of it being being Chinese corpora likely I think most english s...

hmm, so it's not like they saturated it with more than 15Tt

empty oracle Jul 23, 2024, 7:14 PM

#

quasi ocean hmm, so it's not like they saturated it with more than 15Tt

I think our set which is 50% natural data and 50% Opus synth was quite hard to learn for Qwen

#

But quite easy for L3

quasi ocean Jul 23, 2024, 7:14 PM

#

hmm

#

what were you finetuning it for

empty oracle Jul 23, 2024, 7:23 PM

#

quasi ocean what were you finetuning it for

storytelling/RPing

#

which requires a richer vocabulary that purely synthetic data can provide

#

my finetune is called L3-8B-Celeste btw

#

dataset was like 50% short story writing (natural), 25% RP and 25% (Opus synth)

waxen vortex Jul 23, 2024, 8:11 PM

#

Any good settings for the 70b? Is the version run by Fireworks a quant?

hushed bough Jul 23, 2024, 9:19 PM

#

@uncut galleon DeepInfra now has 3.1 8B and 70B:

https://deepinfra.com/meta-llama/Meta-Llama-3.1-8B-Instruct

https://deepinfra.com/meta-llama/Meta-Llama-3.1-70B-Instruct

meta-llama/Meta-Llama-3.1-8B-Instruct - Demo - DeepInfra

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web

meta-llama/Meta-Llama-3.1-70B-Instruct - Demo - DeepInfra

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web

#

Oops, just saw it posted in the 405B one, my bad

uncut galleon Jul 23, 2024, 9:23 PM

#

npnp(: its live now

#

appreciate the ping regardless 🫡

hushed bough Jul 23, 2024, 9:30 PM

#

Awesome, thanks!

high lintel Jul 23, 2024, 9:38 PM

#

Has anyone compared the censorship levels between L3 and L3.1?

waxen vortex Jul 23, 2024, 9:39 PM

#

high lintel Has anyone compared the censorship levels between L3 and L3.1?

It occasionally denies your requests, but I didn't check which providers

#

But otherwise it seemed very unrestricted

high lintel Jul 23, 2024, 9:44 PM

#

seems pretty censored

solar sphinx Jul 24, 2024, 12:37 AM

#

empty oracle my finetune is called L3-8B-Celeste btw

I hear so many good things but I rlly need slightly bigger context windows 🥺 any chance u can expand it?

hushed bough Jul 24, 2024, 3:48 AM

#

@uncut galleon The context length has been updated on DeepInfra's website for Llama 3.1. It now says 128K:

https://deepinfra.com/meta-llama/Meta-Llama-3.1-8B-Instruct

https://deepinfra.com/meta-llama/Meta-Llama-3.1-70B-Instruct

meta-llama/Meta-Llama-3.1-8B-Instruct - Demo - DeepInfra

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web

meta-llama/Meta-Llama-3.1-70B-Instruct - Demo - DeepInfra

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes. Try out API on the Web

thin kettle Jul 24, 2024, 5:25 AM

#

high lintel seems pretty censored

This is a hilarious way to test for censorship

uncut galleon Jul 24, 2024, 5:32 AM

#

hushed bough <@392529839745269760> The context length has been updated on DeepInfra's website...

thx! deploying now

#

@oak mango fyi^ sorry for nerfing it temporarily

oak mango Jul 24, 2024, 12:32 PM

#

uncut galleon <@981296560749027410> fyi^ sorry for nerfing it temporarily

So grateful!

solar sphinx Jul 25, 2024, 4:31 PM

#

🤔 what are the best params you guys use to get the best results for the 70b model

#

i'm getting weird ass outputs

#Llama 3.1 8B and 70B