Grok 4.3 | OpenRouter | Page 1

rigid parrot Apr 17, 2026, 10:22 AM

#

Grok 4.3 (beta) now appears on the Grok web with an Early Access label. ase

abstract grotto Apr 17, 2026, 12:45 PM

#

Grok 4.3 already

#

Didn’t 4.2 come out like just a couple months ago

main ferry Apr 17, 2026, 3:47 PM

#

abstract grotto Didn’t 4.2 come out like just a couple months ago

grokmaxxing

red bloom Apr 30, 2026, 11:15 PM

#

Grok 4.3 api is here https://docs.x.ai/developers/models

Models and Pricing | xAI Docs

We offer a range of models supporting multiple use cases and modalities.

#

unfortunately same price as 4.2 :/

red bloom Apr 30, 2026, 11:34 PM

#

https://fixupx.com/ArtificialAnlys/status/2049987001655714250

Artificial Analysis (@ArtificialAnlys)

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20
︀︀
︀︀The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and a 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing cost to run the benchmark suite.
︀︀
︀︀Key Takeaways:
︀︀
︀︀➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level
︀︀
︀︀➤ Large increase in real world agentic task performance…

#

@tender path sorry for the ping btw

#

eyy

#

wait what they lowered the prices

#

on 4.20

zealous knot Apr 30, 2026, 11:39 PM

#

yoo grok 4.3 locked in dawg???

red bloom Apr 30, 2026, 11:40 PM

#

red bloom wait what they lowered the prices

4.20 used to be 2/6, now its cheaper

zealous knot Apr 30, 2026, 11:41 PM

#

grok 4.3 for coding ahh tho, decent for agentic use ig

tender path Apr 30, 2026, 11:41 PM

#

red bloom <@165587622243074048> sorry for the ping btw

ya sorry we didn’t get a heads up and i was busy for a while

#

don’t feel bad for pinging

#

appreciate you

ancient grotto Apr 30, 2026, 11:48 PM

#

I mean, those throughput numbers are pretty solid if they're accurate

ancient grotto May 1, 2026, 1:51 AM

#

Not being able to control reasoning is interesting

ancient grotto May 1, 2026, 2:10 AM

#

Aaaaand we're down to the 40-50tps

dire hawk May 1, 2026, 2:55 AM

#

theres no way this is a 500b param model

summer onyx May 1, 2026, 4:03 AM

#

dire hawk theres no way this is a 500b param model

you think it's more or less?

dire hawk May 1, 2026, 4:19 AM

#

way more

#

i feel like 1.5-2t parameter would make sense for the performance

#

unless they’ve actually done some real llm science but i kinda doubt it

#

i feel like they are just getting most of their perf with the amount of compute they have

dusty current May 1, 2026, 5:41 AM

#

very impressive model so far honestly

dire hawk May 1, 2026, 5:56 AM

#

has anyone tested vision performance with it?

crystal bane May 1, 2026, 6:38 AM

#

dire hawk i feel like 1.5-2t parameter would make sense for the performance

If this is true kind of insane pricing they have for api

#

Maybe Im just too used to being slapped on the ass by anthropic

dire hawk May 1, 2026, 6:39 AM

#

well g3flash is like 1.2t

crystal bane May 1, 2026, 6:48 AM

#

But its priced the same as 4.2 which was 500b right 🤔

dire hawk May 1, 2026, 7:40 AM

#

if so, distillation is going really damn well for them because this is insane performnace for 500b tbh

vocal orbit May 1, 2026, 8:23 AM

#

For aa is it not just slightly higher than qwen 3.6 plus which is 400b?

#

Or are you talking about actual testing

#

Guess I can try it

sterile grove May 1, 2026, 8:51 AM

#

"You should drive." 👍

dusty current May 1, 2026, 11:58 AM

#

lol

#

as much as i love vending-bench 2, they sometimes have some very different results than what a lot of benchmarks say

#

could be in the prompting or harness, not sure

#

i have noticed a tendency for 4.3 to be a bit more "lazy" though, so this could have to do with it

left vector May 1, 2026, 12:15 PM

#

It could also be whatever optimization or streamlining that xAI do is harming the specific set of weights which are responsible for task similar to vending bench.

bitter sphinx May 3, 2026, 2:46 PM

#

I'm finding that Grok 4.3 is a regression to Grok 4.1 Fast (Reasoning) on my social deduction benchmark 🫣

#

Also had an unusual preference to wait around sometimes

fossil goblet May 4, 2026, 9:45 PM

#

bitter sphinx I'm finding that Grok 4.3 is a regression to **Grok 4.1 Fast (Reasoning)** on my...

What bench is this?

dusty current May 4, 2026, 11:12 PM

#

Grok 4.3 lands in quite a great spot on the pareto frontier based on Artificial Analysis data

#

It has a very low hallucination rate, and answers questions correctly at a rate comparable to DeepSeek V4 Pro

#

even though it has a higher sticker price than V4 pro, the token efficiency ends up making it cheaper to use in the long run

#

but if you prefer a non-Grok model, MiMo-V2.5-Pro also performs around the same level

abstract grotto May 5, 2026, 12:53 AM

#

my goat

dire hawk May 5, 2026, 2:08 AM

#

i love this chart

potent spindle May 5, 2026, 2:35 AM

#

Me too. Is it online? Where can we find it?

dusty current May 5, 2026, 3:28 AM

#

potent spindle Me too. Is it online? Where can we find it?

thank you! it's not online yet, but i am working on an interactive site to view the chart. should be up soon 🙏

bitter sphinx May 5, 2026, 7:36 AM

#

fossil goblet What bench is this?

Here: https://clocktower-radio.com/
Not sure why it's so low - maybe my benchmark is flawed.

Clocktower Radio

An LLM benchmark testing the limits of AI reasoning and social intelligence through autonomous games of Blood on the Clocktower.

fossil goblet May 5, 2026, 9:20 AM

#

Cool that it's Pareto but kind of embarrassing for them that it's basically just on par with Chinese SotA

dusty current May 5, 2026, 10:34 AM

#

fossil goblet Cool that it's Pareto but kind of embarrassing for them that it's basically just...

grok models always land in a weird spot performance wise

#

4.1 fast was pareto efficient before gemma 4 31b was around, and 4.20 was the top pareto model for like a day lol

#

the main benefit of 4.3 vs. MiMo though is that it generally has more world knowledge in testing, which can be useful for general chatbots (node size on the graph represents the AA-Omniscience Accuracy benchmark, which measures accuracy on general world knowledge questions)

#

won't make a huge difference for agentic tasks though

fossil goblet May 5, 2026, 11:51 AM

#

bitter sphinx Here: https://clocktower-radio.com/ Not sure why it's so low - maybe my benchmar...

I'm inclined to respect this bench because my favorite model gets 1st 😎

left vector May 5, 2026, 12:00 PM

#

Hmm
This model is interesting, in coding deepseek being better than it

bitter sphinx May 5, 2026, 1:57 PM

#

@dusty current you inspired me to do something similar with my own data (cleans up an existing graph I had):

dusty current May 5, 2026, 3:42 PM

#

super cool

#

pareto frontier graphs are just great for value analysis

dire hawk May 5, 2026, 3:59 PM

#

bitter sphinx <@1272320397907329128> you inspired me to do something similar with my own data ...

feel like this one is a bit more skewed towars the retarded models side

#

but yeah pareto frontier graphs ar ejust so useful

bitter sphinx May 5, 2026, 4:22 PM

#

mebbe cause I haven't benched 5.5 xhigh and Opus 4.7 max
but those are ridiculously expensive

dusty current May 5, 2026, 5:13 PM

#

dire hawk feel like this one is a bit more skewed towars the retarded models side

i think the results actually make some sense - mimo v2.5 pro is a very capable model that's quite underrated, and opus sometimes does worse on certain benchmarks than chinese models

bitter sphinx May 5, 2026, 5:35 PM

#

I think Grok 4.3 was the only big surprise - it just waits around while the other model's players start probing for information
I've only seen that happen with the smaller models

dusty current May 5, 2026, 6:20 PM

#

bitter sphinx I think Grok 4.3 was the only big surprise - it just waits around while the othe...

vending-bench 2 found a similar result, where grok 4.3 would literally opt to just rest instead of taking action, which resulted in pretty poor scores

#

i'm also noticing in my own testing that grok 4.3 constantly hits conclusions as fast as possible when asked to do research, instead of being thorough

#

i think they might've overtuned the token efficiency training signal for the model, to the point where the model learned in agentic tasks to just opt for doing less - or nothing at all - to save tokens

#

it's the biggest weakness of the model to be honest

bitter sphinx May 5, 2026, 6:38 PM

#

dusty current i think they might've overtuned the token efficiency training signal for the mod...

I buy that theory but I've also got Grok 4.3 as slightly verbose at 2,123 tokens per action (Kimi 2.6 - 5,038, GPT 5.5 - 403)
So it also spends time about thinking about doing nothing 😂

dusty current May 5, 2026, 6:39 PM

#

yeah it's super weird, i'm honestly convinced that the always-on reasoning with no effort parameters is causing the odd behavior

#

when you combine that with a model that was probably trained to be conservative on agentic tasks, you get this

#

it's still a very capable model, but you have to fight it a bit lol

fossil goblet May 6, 2026, 8:49 AM

#

I don't trust inconsistent / spiky models even if there are clever fixes.

#

LLMs are general intelligences, and plenty of models are well-rounded

quick lotus May 6, 2026, 11:30 AM

#

dusty current could be in the prompting or harness, not sure

this level of regression they reported here.. doesn't make much sense, definetely implicates harness or prompting

ancient grotto May 6, 2026, 2:58 PM

#

Looks like reasoning effort is available now

dusty current May 11, 2026, 5:21 PM

#

tool calling becomes incredibly inconsistent with this model if you disable reasoning

#

what a shame

#

deepseek v4 flash with disabled reasoning performs much better

errant cairn May 28, 2026, 8:34 AM

#

Is the x search tool enabled through OpenRouter?
https://docs.x.ai/developers/tools/x-search

X Search | xAI Docs

Learn how to use the X Search tool for searching X posts, users, and threads.

#Grok 4.3