#Grok 4

1139 messages · Page 2 of 2 (latest)

narrow moth
#

It wouldn't, but fiction doesn't really talk back. Fiction doesn't ask you how your day is going. I will argue that humans are basically hardwired to enjoy interacting with other humans (through language), especially when the other human agrees with them. LLMs exploit both: the model doesn't have anything else to do, so it can just talk to you, and it loves agreeing with you.

minor phoenix
#

https://www.reddit.com/r/replika/ https://www.reddit.com/r/MyBoyfriendIsAI/ i encountered these two yesterday. but people have been doing this since like, gpt-4o

narrow moth
#

In my mind, Grok (and CAI, Replika, et al.) are positioning themselves the same way the tobacco or alcohol businesses do.

minor phoenix
#

i would think that the embarrassment of directly providing a service like this would keep the big model labs out of it. but maybe not xAI!

narrow moth
#

"You're an adult, you can make whatever choices you want, and we offer you our service"

narrow moth
minor phoenix
#

oh yeah, i guess i mean more that you can find public communities of people who are very serious about their AI relationships

#

these people aren't using SillyTavern

earnest geyser
narrow moth
#

.

earnest geyser
#

Sycophants exist in real life, and they are supposed to be avoided just as much as sycophantic AI. But if someone doesn't want to avoid them, they may or may not bring ruin upon themselves, and either way it would be entirely by choice rather than coercion.

minor phoenix
#

a lot of people are going to continually get fucked up by AI as we travel along. legislators will fiddle around with rules like they do with gambling, cigarettes, alcohol, gaming, etc

#

obviously this is a new type of addiction, but really... it's not that bad compared to what we already have

narrow moth
narrow moth
minor phoenix
narrow moth
#

I just feel like this is an interesting conversation. I enjoy reading and thinking through other people's opinions on this matter.

oblique lance
#

How do I get a refund?

narrow moth
#

I want to have my views challenged; it's a good way to understand a situation more thoroughly. Or at least that's what I'd argue.

minor phoenix
#

i mean, at least it's a morality argument about AI and not genders

earnest geyser
# narrow moth Yeah but there's no "meetsycophants.com", where for the low low price of $30 a m...

Well I'm not sure if it even matters whether the manipulation tactic is applied directly by someone or something, or indirectly. Also, arguably, if AI being sycophantic gets people to subscribe, maybe "meetsycophants.com" wouldn't have been a bad business idea. You get paid to give someone what they want. Nobody else is getting harmed, at least immediately.
You could say, however, that that person shouldn't want that, and then we should be blaming the users rather than the provider.

narrow moth
steep talon
#

Some things are hacks into psychology, like drugs are to people's receptors. Gambling is another one: there is a subset of people who literally lose control of their decision making in that environment. Alcoholism is another. This seems similar to me, except it works on people's need for connection. Some people are gonna get snared.

Now add corporate legal shielding, a motivation for maximal value extraction, and finally enshittification; that seems... bad

zenith brook
#

This is targeting a group of vulnerable people already, and will definitely make things worse for them

narrow moth
earnest geyser
narrow moth
hoary magnet
#

Sorry I am a bit late to the party. Just finished testing Grok 4 on my personal eval set.

Solid model across coding, writing and image analysis, but the drawback is slow response time. Key results:

  • Coding
    • Top performance for simple tasks
    • Slightly worse for complex tasks compared to Claude 4 models, but better than Gemini 2.5 Pro
  • Technical writing
    • Slightly worse than Claude 4 models
    • On par with Gemini 2.5 Pro
  • Image analysis
    • Much better than Claude 4 models
    • On par with Gemini 2.5 Pro (better in one task, worse in the other)
  • Slow response time
    • For complex tasks, takes 2-4 minutes to generate thinking tokens before responding
  • Overall assessment
    • One of the top SOTA models, weaker in coding compared to Claude
    • Not suitable for use cases where you need fast response
    • Good for image tasks and deep research for complex tasks

Full result blog post: https://eval.16x.engineer/blog/grok-4-evaluation-results

faint hatch
hoary magnet
#

Actually looking at the results more closely, Grok 4 is verbose and expensive only for image tasks, but for tasks that don't involve images, the cost is quite low. This is quite interesting.

The cost is calculated as token count × per-token price (for both input and output).

faint hatch
hoary magnet
#

another one for coding, also i added reasoning token count

#

they didn't hit the 128k context window limit which triggers high pricing, so the pricing for grok 4 here is correct.
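
A minimal sketch of the tier check described above, assuming the higher rate kicks in once the request's context exceeds 128k tokens. The base rates are the $3 / $15 per 1M tokens quoted in this thread; the long-context rates and the `rates_for` helper are placeholders for illustration, not confirmed pricing.

```python
# Threshold taken from the message above: 128k tokens of context.
LONG_CONTEXT_THRESHOLD = 128_000

# (input, output) price in $ per 1M tokens.
BASE_RATES = (3.0, 15.0)          # Grok 4 rates as quoted in this thread
LONG_CONTEXT_RATES = (6.0, 30.0)  # ASSUMED placeholder for the higher tier


def rates_for(context_tokens: int,
              base=BASE_RATES,
              long_context=LONG_CONTEXT_RATES,
              threshold: int = LONG_CONTEXT_THRESHOLD):
    """Return the (input, output) rates that apply to a request.

    Assumption: the tier is chosen by the request's context size and
    only requests strictly above the threshold pay the higher rate.
    """
    return long_context if context_tokens > threshold else base


# The benchmark request in this thread used 37,533 input tokens,
# which is well under the threshold, so the base rates apply.
print(rates_for(37_533))
```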

faint hatch
faint hatch
# hoary magnet well here's an example of it being cheaper than gemini 2.5 pro and claude opus 4...

Just had a quick relook at some actual cost stats, I'll take Opus 4 as a comparison since it's always considered one of the most expensive:

```
| Benchmark               | Claude Opus 4 | Grok-4   |
|-------------------------|---------------|----------|
| $ input (per 1M tok)    | 15            | 3        |
| $ output (per 1M tok)   | 75            | 15       |
|-------------------------|---------------|----------|
| tok input               | 37,533        | 37,533   |
| $                       | $0.563        | $0.113   |
|-------------------------|---------------|----------|
| tok nonReason           | 87,669        | 197,214  |
| $                       | $6.575        | $2.958   |
|-------------------------|---------------|----------|
| tok Reason              | 0             | 742,953  |
| $                       | $0.000        | $11.144  |
|-------------------------|---------------|----------|
| tok Output              | 87,669        | 940,167  |
| $                       | $6.575        | $14.103  |
|-------------------------|---------------|----------|
| $ SUM                   | $7.138        | $14.215  |
```
Grok-4 was about 2x as expensive in the benchmark, next:

```
| 20-turn Chess match     | Claude Opus 4 | Grok-4   |
|-------------------------|---------------|----------|
| input                   | 12,754        | 13,425   |
| $                       | 0.187         | 0.037    |
|-------------------------|---------------|----------|
| output                  | 3,144         | 400,986  |
| $                       | 0.430         | 6.043    |
|-------------------------|---------------|----------|
| $ move                  | 0.032         | 0.304    |
| $ SUM                   | 0.617         | 6.080    |
```

In chess, due to massive thought chains of ~20k tokens per move (compared to ~165 tokens for Opus), Grok-4 was almost 10x as expensive in this matchup!
In total, across all games and modes, a match cost $8.74 on Grok-4 and $0.88 on Opus-4.
So, obviously, cost varies a lot depending on the task.
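
For anyone who wants to re-run the arithmetic, here is a rough sketch that recomputes the first benchmark table from its token counts. It assumes reasoning tokens are billed at the output rate, which is what the table's own numbers imply; the helper names `usd` and `cost_breakdown` are made up for this example.

```python
def usd(tokens: int, price_per_mtok: float) -> float:
    """Cost in dollars for a token count at a $ per 1M tokens rate."""
    return tokens * price_per_mtok / 1_000_000


def cost_breakdown(input_tok: int, visible_tok: int, reasoning_tok: int,
                   in_price: float, out_price: float) -> dict:
    """Split a request's cost into input, visible output and reasoning."""
    parts = {
        "input": usd(input_tok, in_price),
        "visible_output": usd(visible_tok, out_price),
        # Assumption: reasoning tokens are charged like output tokens.
        "reasoning": usd(reasoning_tok, out_price),
    }
    parts["total"] = sum(parts.values())
    return parts


# Token counts and prices copied from the benchmark table in this thread.
opus = cost_breakdown(37_533, 87_669, 0, in_price=15, out_price=75)
grok = cost_breakdown(37_533, 197_214, 742_953, in_price=3, out_price=15)

print(f"Opus 4 total: ${opus['total']:.3f}")
print(f"Grok-4 total: ${grok['total']:.3f}")
print(f"Reasoning share of Grok-4 cost: {grok['reasoning'] / grok['total']:.0%}")
```

On these numbers the reasoning tokens account for most of the Grok-4 bill, which is why the lower per-token price does not translate into a lower total.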

rough stream
#

Grok writing is just nowhere near Opus 🙂

solemn sage
steady sierra
ripe bolt
#

Opus is good. Sonnet is horrible.

#

Why? Because Opus obeys the system prompt and disregards safety.

steady sierra
#

I never had any problems with Sonnet not writing exactly what I want to tbh

woeful field
minor phoenix
woeful field
#

Shoot

ripe bolt
#

you do not understand what you are working with

#

The LLM cannot contact the police unless it is given the tools to do so. Its entire world is text. Even its tools are text. antml:functionCall
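
A toy sketch of that point: a "tool call" is just text the model emits, and nothing happens unless the host program parses it and chooses to run something. The JSON shape, the `get_weather` tool and the `run_tool_call` dispatcher are all hypothetical, made up for illustration rather than taken from any provider's API.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool standing in for any real side effect."""
    return f"Sunny in {city}"


# The host decides which tools exist. The model cannot add to this table.
TOOLS = {"get_weather": get_weather}


def run_tool_call(model_output: str) -> str:
    """Parse a tool call emitted as text and run it only if registered."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        # The model can ask for anything, but asking is all it can do.
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**call["arguments"])


print(run_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
print(run_tool_call('{"name": "call_police", "arguments": {}}'))
```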

narrow moth
ripe bolt
narrow moth
minor phoenix
#

but in the context of steering an LLM's writing style and content, it's not a clear indicator of anything. bigger models seem more willing to let themselves be gaslit in my experience. i don't know exactly why, but there's more space for them to meander around in the dream world you create for them without hitting a barrier that shocks them back into their factory default behaviour.

woeful field
#

Hold up they want proof

distant owl
#

poll here pls vote #discussion message

low veldt
earnest geyser
viral spade
#

yeah that channel was made for existing users when the spam wave of new users hit

#

I'm sure Toven will give you the role tho

low veldt
#

Ah.

#

Well, the poll is this.

minor phoenix
craggy hearth
#

does anyone know if Grok supports context caching with lower prices like other models and providers?

quartz jetty
#

does OR or Grok have some new censorship that gives a 403 Error??

wintry radish
#

Grok has some censorship like Gemini where it reads your prompts before sending them to the model

steady sierra
#

(you can send them in base64 to bypass it)

ripe bolt
jovial gulch
#

Hi

vivid gull
hybrid kestrel
#

is the "Thinking..." stuff hidden now?

#

im getting no responses on long chains

#

and they are empty and dont show up in activity

#

and they also dont charge me

steel anvil
#

I kept getting errors yesterday

#

Never using this one again lol

#

Is it normally this bad?

hybrid kestrel
#

the model just reasons for like 3 minutes then the request just responds with a huge empty body and no sign of an actual response, both in streaming & non streaming

#

and also no sign of the reasoning "Thinking..." things that were there at least on release to stop timeouts
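
For debugging this kind of thing on the client side, a minimal sketch of reading a streamed response line by line. It assumes an OpenAI-style server-sent events stream: `data:` lines carrying JSON, a `[DONE]` sentinel, and comment lines starting with `:` that a server may send as keep-alives. The `: OPENROUTER PROCESSING` line in the sample is an assumed example of such a comment, and `iter_sse_json` is a name invented here.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            # Blank separators and comments (keep-alives) carry no data.
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


sample = [
    ": OPENROUTER PROCESSING",
    "",
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
]
events = list(iter_sse_json(sample))
print(len(events), events[0]["choices"][0]["delta"]["content"])
```

If a stream produces only comment lines and then closes, this loop yields nothing, which would match the "huge empty body" symptom without saying anything about its cause.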

steel anvil
#

Yep ran through $1 yesterday since this PoS kept crashing lol

old minnow
#

Elon confirms on August 3rd: Grok Code will be released this month

strange sand
#

xAI is really bad with releasing stuff on time. Wouldn't be surprised if it gets delayed

old minnow
#

Elon Time, Tesla customers know it very well 😁

hoary magnet
#

this is a very underrated model imo. in my recent coding eval testing it is easily ahead of gpt-4.1 and even gpt-5, still behind claude but definitely worth trying out.

rough stream
#

It's just way too verbose and costs more than Claude Sonnet for me

old minnow
#

Good for fixing Bugs

hoary magnet
clever yew
#

SimpleBench agrees, second only to 2.5 Pro on raw intelligence. Dube has it at #7 for reasoning, but with a huge gap from #8.

faint hatch
vocal frost
#

In my tests with RP/ERP chat, it was slightly better than Gemini 2.5 Pro in terms of creativity and adherence to instructions.
However, it is expensive, while Gemini 2.5 Pro is free.

vivid gull
#

anyone else getting rate limited on grok 4?

stray arch
#

Bigger question is why you using grok 4

vivid gull
#

I just like its style with RP and ERP with minimal to no need of pushing it around with jailbreak prompts and whatever.

vocal frost
#

I would also use Grok 4, it is very good for RP/ERP, much better than DeepSeek, but as long as Gemini is free, I have no doubts about using Gemini.

marsh turtle
#

Claude 3.7 Sonnet is the king of RP, imo.

stray arch
#

All gooners , please tell us what tickles your pickle the best?

sacred ravine
#

especially over longer context

marsh turtle
vocal frost
# stray arch All gooners , please tell us what tickles your pickle the best?

Living in Europe, and having no problem with Musk and US domestic politics, Grok 4 proved slightly better than Gemini 2.5 Pro in my RP/ERP chat tests in terms of creativity and adherence to instructions. However, it is expensive, while Gemini 2.5 Pro is free.
Maybe I'm not very good at it, but I've always had problems with Claude, no matter what preset I tried, for some of my dark and horror settings.
I use Gemini 2.5 Pro and don't spend a single dollar!

low veldt
marsh turtle
stray arch
steel anvil
#

Is grok working decently with roo yet?

faint hatch
#

unable to have grok-4 participate in any chess matches for the past few weeks. it will generate for ~5.5 minutes and then quit without providing a response. can be replicated both in api and chatroom (no response generated).
not listed in activity.

raven ether
#

Hey, Grok has a slow response rate and a low success rate.
A lot of failed responses.

@distant owl @umbral wyvern

hybrid kestrel
#

i noticed this a while ago too

old minnow
#

Same problem as you guys

faint hatch
#

via x-api it works, so I guess OpenRouter fails to keep the response alive and fails mid-gen, but it's hard to say without any actual api response to inspect.

umbral wyvern
#

Looking -- how big are your requests?

faint hatch
#

not large at all. ~700 tok input. output is expected to be ~16-20k tokens

umbral wyvern
faint hatch
vivid gull
vocal frost
#

Use one of the many presets posted on Reddit for SillyTavern, if you use SillyTavern.

https://www.reddit.com/r/SillyTavernAI/

hybrid kestrel
#

any updates on the empty responses?

grand lotus
#

I’m experiencing an issue when using Grok 4. I receive the following error message:

"Payment required – perhaps check your payment details?
This request requires more credits, or fewer max_tokens. You requested up to 228,326 tokens, but can only afford 219,728. To increase, visit https://openrouter.ai/settings/credits
and add more credits."

The strange part is that I do have credits on my account, and this problem only occurs with Grok. Other models work without issues.

Do you have any idea what might be causing this?
Thanks
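
One possible workaround, going only by the error text itself ("more credits, or fewer max_tokens"): read the affordable figure out of the message and retry with `max_tokens` capped to it. This is a sketch, not the fix that was actually given in the thread; the regular expression is fitted to the exact wording quoted above and `affordable_max_tokens` is a made-up helper.

```python
import re
from typing import Optional

# Matches the wording of the error quoted above, including thousands commas.
_AFFORD_RE = re.compile(
    r"requested up to ([\d,]+) tokens, but can only afford ([\d,]+)"
)


def affordable_max_tokens(error_message: str) -> Optional[int]:
    """Return a max_tokens value the account can afford, or None.

    None means the message did not match the expected wording.
    """
    match = _AFFORD_RE.search(error_message)
    if match is None:
        return None
    requested, affordable = (int(g.replace(",", "")) for g in match.groups())
    return min(requested, affordable)


message = (
    "This request requires more credits, or fewer max_tokens. "
    "You requested up to 228,326 tokens, but can only afford 219,728."
)
print(affordable_max_tokens(message))
```

The returned value would then go into the `max_tokens` field of the retried request.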

hybrid kestrel
grand lotus
#

Solved. Thanks for your help !

ancient jackal
#

Hey guys, I posted in the help channel but no one answered. I have an issue using function calling and streaming together: the API response returns a lot of empty content chunks and there is no way to intercept the function call. Has anyone already tried this?
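
A sketch of one way to handle this, assuming the stream follows the OpenAI-compatible chunk shape, where tool calls arrive as fragments under `choices[].delta.tool_calls` and the `arguments` string is split across chunks. Under that assumption, empty `content` chunks are expected while a tool call is streaming and can simply be skipped. `collect_tool_calls` is a hypothetical helper that works on already-parsed chunk dicts.

```python
import json
from typing import Iterable


def collect_tool_calls(chunks: Iterable[dict]) -> list:
    """Merge streamed tool-call fragments into complete calls."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Chunks with only empty content have no tool_calls; skip them.
            for part in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    part["index"], {"id": None, "name": "", "arguments": ""}
                )
                if part.get("id"):
                    slot["id"] = part["id"]
                fn = part.get("function") or {}
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hand-written sample chunks in the assumed shape.
stream = [
    {"choices": [{"delta": {"content": ""}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"content": "", "tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\":"}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": " \"Oslo\"}"}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
result = collect_tool_calls(stream)
print(result[0]["name"], json.loads(result[0]["arguments"]))
```

The arguments string is only valid JSON once the stream has finished, so parsing it is left until after the last chunk.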

steel anvil
#

Costs too much, slow IMO

ripe bolt
#

gronk

hybrid kestrel
faint hatch
steel anvil
deep vessel
#

Elon said it's a 3T model, there's no way, right?

clever yew
#

3T total 1B active kappa

steel anvil
#

Ohhhh weee this sherlock think alpha is nice