Llama 4 | OpenRouter | Page 2

flat whale Apr 11, 2025, 7:33 PM

#

I wish they'd colorize this table

#

Transcribed with Gemini Flash 2.0, I did not double check the values

deft knot Apr 11, 2025, 8:20 PM

#

candid slate Apr 11, 2025, 9:43 PM

#

questionable benchmark if you ask me

#

how is o1 mini better than 3.7 sonnet

#

still, something to be said about people taking shortcuts

formal lantern Apr 12, 2025, 3:09 PM

#

Just testing random stuff, this one was particular telling (identical prompt), interactive village webgame

flat whale Apr 12, 2025, 3:22 PM

#

When physics problems tell you to approximate a person as a point mass

shut lodge Apr 12, 2025, 4:02 PM

#

llama 4 moved up a bit so now its above 2.0 flash

#

oh but thats an older 2.0 flash

median flint Apr 12, 2025, 4:48 PM

#

formal lantern Just testing random stuff, this one was particular telling (identical prompt), i...

Hahaha

hollow yacht Apr 14, 2025, 7:54 AM

#

Supposedly there were some implementation issues with maverick in vllm

#

does anybody know whether the current providers have introduced the fix?

stiff isle Apr 14, 2025, 7:04 PM

#

hollow yacht Supposedly there were some implementation issues with maverick in vllm

so far it looks like there's been at least 3 bugs each a couple days apart that have each improved performance

#

I'm guessing most providers at this point have caught up

#

might not be the end of it though!

lucid pond Apr 15, 2025, 10:10 PM

#

@wary warren hope its ok to tag
SambaNova added support for both new models a few days ago, is there a reason they aren't available?

wary warren Apr 15, 2025, 10:11 PM

#

lucid pond <@165587622243074048> hope its ok to tag SambaNova added support for both new mo...

they didn’t support images and we don’t have a good way of toggling that off for a specific endpoint yet

#

checking in with them now

lucid pond Apr 15, 2025, 10:11 PM

#

ohhh ok

#

thanks!

stiff isle Apr 15, 2025, 10:34 PM

#

I'm afraid to say - Llama 4 already came with a rocky start... but with the release of GLM-4-32B-0414, I think it's officially dead (for non-long-context tasks).

agile pendant Apr 16, 2025, 12:34 AM

#

stiff isle I'm afraid to say - Llama 4 already came with a rocky start... but with the rele...

It's interesting that both Meta and OpenAI (both big players) released models that are basically dead out of the gate (OpenAI is already being abandoned after only 2 months from release and Llama 4 just seems to be a non-starter), while the smaller (and let's face it, mostly Chinese) modelmakers are eating their lunch with these amazing models that punch way above their weight

#

OpenAI, at least, is still the apparent king of multimodality for now, but Google's catching up there. There's a lot of catching up to do in that regard for the smaller guys

stiff isle Apr 16, 2025, 12:41 AM

#

agile pendant It's interesting that both Meta and OpenAI (both big players) released models th...

my guess is that openai is mostly milking margin instead of actually being behind right now

#

the challenger modelmakers simply aren't

mild blade Apr 16, 2025, 6:46 AM

#

agile pendant It's interesting that both Meta and OpenAI (both big players) released models th...

4.5 was never practical
massive ass model they'll likely use to distill smaller ones off of (if they didn't already for 4.1)

#

reminds me of claude's opus

rocky sphinx Apr 16, 2025, 9:31 AM

#

Hey folks 👋
I’ve enabled tool support for both Llama 4 Scout and Maverick models through OpenRouter, using the Together provider. We’re also using the paid endpoint.
However, the tools don’t seem to work properly with these models — unlike Claude and GPT, which are working as expected under the same conditions.
Just checking if anyone else has run into this issue or found a workaround. Appreciate any help!

arctic canopy Apr 29, 2025, 7:04 PM

#

Google announces that their Llama 4 and 3.3 serverless endpoints have graduated from (free) preview into (paid) GA, but doesn't publish pricing anywhere blobreeee https://developers.googleblog.com/en/llama-4-ga-maas-vertex-ai/

Announcing the general availability of Llama 4 MaaS on Vertex AI- G...

median flint Apr 29, 2025, 9:24 PM

#

Can't wait to use Llama 4 everywhere!

marble terrace Apr 29, 2025, 11:19 PM

#

@wary warren on Llama-4 Scout's model card it lists image input support https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/ but seems Openrouter only lists image input support for Llama 4 Maverick paid and not Scout. Is this a provider decision or is Llama 4 Scout on Openrouter meant to also support image inputs?

junior oyster Apr 30, 2025, 2:21 AM

#

https://x.com/MetaforDevs/status/1917294141408698669?t=bAwkK5E4MK4y85-U5iOTiQ&s=19

Meta for Developers (@MetaforDevs) on X

We’re excited to announce we’ve opened up slots to participate in a limited preview of Llama API - our developer platform for Llama app development. This is just step 1 for our API, and we look forward to hearing the community's feedback on it as we continue to iterate.

Llama

#

the wait-list link does not work properly for me

shut lodge Apr 30, 2025, 2:32 AM

#

I hope they open voice up to api in the future, and maybe even open source it

marble terrace Apr 30, 2025, 8:30 AM

#

marble terrace <@165587622243074048> on Llama-4 Scout's model card it lists image input support...

answered my own question Llama 4 Scout does support image inputs just Openrouter is not listing it in it's model info page

neon sun May 2, 2025, 9:22 PM

#

What do you mean we dont list it as image input? It does support images on OpenRouter.

shut lodge May 3, 2025, 3:24 AM

#

Llama api has early access, looks to be free usage, would be nice to get it added to BYOk

marble terrace May 3, 2025, 7:25 AM

#

neon sun What do you mean we dont list it as image input? It does support images on OpenR...

see screenshot Maverick lists image input pricing but Scout doesn't list it

shut lodge May 3, 2025, 5:47 PM

#

In open web ui I think scout is a very good task model (e.g. title generation, search prompt generation, etc) better results so far compared to Gemma 27b, and GPT 4.1 nano.

inland falcon May 3, 2025, 10:56 PM

#

That sounds about right. Very weird model. Terrible at code, zero rizz, and no creativity in writing. But it's smart for the size and reliable.

formal lantern May 4, 2025, 1:05 AM

#

inland falcon That sounds about right. Very weird model. Terrible at code, zero rizz, and no c...

surely Behemoth will deliver (one can hope)

robust frigate May 4, 2025, 1:37 AM

#

I bet the reason won't even touch qwen 3 scores

shut lodge May 4, 2025, 1:42 AM

#

robust frigate I bet the reason won't even touch qwen 3 scores

I wish we had a Qwen 3 with vision, thats the main reason I use maverick more than Qwen 3.

inland falcon May 4, 2025, 3:35 AM

#

Idk, Qwen 3 is still a mixed bag

shut lodge May 4, 2025, 3:49 AM

#

inland falcon Idk, Qwen 3 is still a mixed bag

well maybe I just have not used it enough yet, it seemed impressive on first testing, but then I have not really used it because I need vision models for most tasks

#

The main thing that impressed me was that I could run at 10 tokens per second on my cpu with the 30b 3a, it was definitely much more capable than any other model that I can run at 10 tokens per second, I can’t run the bigger one so I have not really used it at all

#

The benchmarks seemed promising, on aider I think 30b a3 got 39%, Gemma 3 4b I think got sub 1% if I remember correctly

inland falcon May 4, 2025, 6:57 AM

#

I haven't done enough testing yet either, I just hear very mixed things.

gloomy vault May 4, 2025, 10:55 AM

#

I'm trying to use Llama 4 via completion endpoint (https://openrouter.ai/docs/api-reference/completion)
But it always gives me instruct-style output.
This doesn't happen with other models (even the ones that's clearly marked as instruct models.)
Does a provide change my request to chat completion internally? Or is it a problem with Llama 4 itself?

Completion | OpenRouter | Documentation

vague cypress May 6, 2025, 10:30 AM

#

do you think LLama Maverick is better than 4.1 mini for building Agent, tools?

merry quarry May 6, 2025, 11:09 AM

#

vague cypress do you think LLama Maverick is better than 4.1 mini for building Agent, tools?

According to https://artificialanalysis.ai basically everything is better then Llama Maverick

vague cypress May 6, 2025, 11:11 AM

#

merry quarry According to https://artificialanalysis.ai basically everything is better then L...

I mean 4.1 mini, it costs similar

merry quarry May 6, 2025, 11:13 AM

#

vague cypress I mean 4.1 mini, it costs similar

There is mini in the charts, but hmm looking at other charts Llama 4 Maverick is 3x cheaper, 2x faster and has 2x lower latency

#

Oh, 4.1 was trained for agent use in mind, it would probably crush Llama then

vague cypress May 6, 2025, 11:19 AM

#

4.1 mini looks like close 4.1 in benchmark, weird 😂

vague cypress May 6, 2025, 11:19 AM

#

merry quarry Oh, 4.1 was trained for agent use in mind, it would probably crush Llama then

How about your practice test?

merry quarry May 6, 2025, 11:40 AM

#

vague cypress 4.1 mini looks like close 4.1 in benchmark, weird 😂

Yeah i was surprised too, maybe there is an issue with the benchmark

merry quarry May 6, 2025, 11:40 AM

#

vague cypress How about your practice test?

I don't use any of these models, I don't have any experience with them

deft knot May 6, 2025, 1:18 PM

#

merry quarry Yeah i was surprised too, maybe there is an issue with the benchmark

i dont think there is an issue

#

i've been saying frfom the very beginning that 4.1 mini is very close in coding to 4,1

merry quarry May 6, 2025, 1:25 PM

#

It's close in everything if you see the individual benchmarks

vague cypress May 6, 2025, 2:36 PM

#

In my experience, 4.1 is smarter but follows instructions worse than 4o in both the legacy and new migration system prompts

#

I’m struggling to find between intelligence and the ability to follow instructions when comparing 4.1 Mini and 4o Mini.

hardy sparrow May 8, 2025, 1:56 AM

#

is lama 4 that bad? that none is using it
Seem like more people ar sticking with Lama 3.3 70b

robust frigate May 8, 2025, 2:48 AM

#

imma wait another year to take llma seriosuly.

#

their reasoning models will suck too

hollow beacon May 8, 2025, 4:37 AM

#

hardy sparrow is lama 4 that bad? that none is using it Seem like more people ar sticking with...

Yes, compare to other competition right now. they lack behind them

inland falcon May 8, 2025, 7:41 AM

#

Llama 3.3 70B is not an alternative to Scout/Maverick. They fill different usecases. Scout/Maverick exist to have a general purpose model that can be served for stupidly cheap and fast. They suck at roleplay, EQ, coding, and others, but Maverick is $0.20/$0.60 on Groq at 240 tk/s which is just insane.

#

People will say "Sure but it's worse than V3." When #1, it's faster and cheaper than V3, and #2, they actually tie on multiple benchmarks. So depending on your usecase, it can very much be the smart choice for the job.

shut lodge May 9, 2025, 4:40 PM

#

inland falcon Llama 3.3 70B is not an alternative to Scout/Maverick. They fill different useca...

Plus the vision capabilities are quite good (wayyy better than llama 90b), i often chose it over V3 due to its vision ability and crazy fast speeds on samba nova. Definitely depends on the use case like you are saying, for code I often use V3.1, for general tasks I often use maverick, and for stem topics I often use R1 or for tough stem topics I will spend a little extra to use o4 mini high

inland falcon May 10, 2025, 3:51 AM

#

Always gotta use the right tool for the job 👍

inland falcon May 10, 2025, 4:32 AM

#

If you don't need the bonkers speed or open-weights though, it can definitely be hard to argue for. Flash Thinking 2.5 is better at nearly everything all-around, code, reasoning, EQ, creativity, etc. for close to the same price and still 100+ tk/s. 235B is significantly more creative and personable than either, at the same price, just way lower speed.

median flint May 10, 2025, 6:28 PM

#

inland falcon If you don't need the bonkers speed or open-weights though, it can definitely be...

Considering Llama 4 Scout hosted by Cerebras is 3-8k TPS for dirt cheap, I use it to ask questions about code and I want a very fast (albeit probably deficient) answer

#

I'll definitely setup Maverick to try the vision capabilities

shut lodge May 12, 2025, 6:34 PM

#

median flint Considering Llama 4 Scout hosted by Cerebras is 3-8k TPS for dirt cheap, I use i...

After seeing the crazy tokens per sec I now have it as the auto complete in openwebui.

calm jasper Jun 2, 2025, 9:20 AM

#

I've not been able to use Llama 4 Maverick yet because it doesn't handle tool calling very well. It looks like they've just updated the chat template to try to improve this, as well as the tool parser in vllm: https://github.com/vllm-project/vllm/pull/17917

I don't think there's any way to tell if the different openrouter providers are using these latest changes though?

proud basin Jun 2, 2025, 9:22 AM

#

calm jasper I've not been able to use Llama 4 Maverick yet because it doesn't handle tool ca...

There isn't. You'll have to ask the providers

main obsidian Jul 17, 2025, 9:05 PM

#

Is anyone hosting Scout at its full 10M context?

proud basin Jul 17, 2025, 9:44 PM

#

main obsidian Is anyone hosting Scout at its full 10M context?

No provider on OpenRouter is hosting it at its full context length

#

I know you didn't ask, but Llama 4 Scout & Maverick are horrible at long context

#

Look here to see the performance of a model at long context:

https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87

Fiction.liveBench July 17 2025

Benchmarking AI Models for Long Context Comprehension

#Llama 4