It wouldn't, but fiction doesn't really talk back. Fiction doesn't ask you how your day was going. I will argue that humans are basically hardwired to enjoy interacting with other humans(through language) AND especially when the other human agrees with them. LLMs abuse both sides, the model doesn't have anything else to do, it can just talk to you AND it loves agreeing with you.
#Grok 4
1139 messages · Page 2 of 2 (latest)
https://www.reddit.com/r/replika/ https://www.reddit.com/r/MyBoyfriendIsAI/ i encountered these two yesterday. but people have been doing this since like, gpt-4o
Reddit
Replika is a conversational AI chatbot created by Luka, Inc. This is an unofficial fan forum—the biggest Replika community online!
Reddit
For people to ask, share, and post experiences about their AI relationships. AI girlfriends and non-binary partners welcome as well!
Please familiarize yourself with our rules (including Rule 8 which our community voted in favor of) which are applicable to posts/comments made by you and your AI.
In my mind, Grok(and CAI, Replika, et al.) are positioning themselves the same way the tobacco or alcohol businesses are.
i would think that the embarrassment of directly providing a service like this would keep the big model labs out of it. but maybe not xAI!
"You're an adult, you can make whatever choices you want, and we offer you our service"
People have been doing this since, like, gpt-3.5-turbo, or even before that
oh yeah, i guess i mean more that you can find public communities of people who are very serious about their AI relationships
these people aren't using SillyTavern
Yeah but you still haven't said what's bad about this
.
Sycophants exist in real life and they are supposed to be avoided just as much as sycophantic AI is too. But if someone doesn't want to do so, they may or may not bring ruin upon themselves but it would be entirely by choice rather than coercion.
a lot of people are going to continually get fucked up by AI as we travel along. legislators will fiddle around with rules like they do with gambling, cigarettes, alcohol, gaming, etc
obviously this is new type of addiction, but really... it's not that bad compared to what we already have
Yeah but there's no "meetsycophants.com", where for the low low price of $30 a month they just give you one. Additionally, sycophants try to extract SOMETHING from you. It's a manipulation tactic applied BY the person. Models most likely can't even understand why they do that.
I also see it this way. I can critique this business model the same way I can critique the tobacco industry(or rather, the vape industry for obviously advertising to children)
yes, absolutely. i just don't think it's worth anyone getting too worked up over on openrouter discord
I just feel like this is an interesting conversation. I enjoy reading and thinking through other peoples opinions on this matter.
怎么退费
I want to have my views challenged, its a good way to understand a situation more thoroughly. Or at least that's what I'd argue.
am i not challenging your views? 😄
i mean, at least it's a morality argument about AI and not genders
Well I'm not sure if it even matters whether the manipulation tactic is applied directly by someone or something, or indirectly. Also, arguably, if AI being sycophantic gets people to subscribe, maybe "meetsycophants.com" wouldn't have been a bad business idea. You get paid to give someone what they want. Nobody else is getting harmed, at least immediately.
You could say, however, that that person shouldn't want that, and then we should be blaming the users rather than the provider.
It wouldn't be a bad business idea, but LLMs are cheaper than humans.
As for the blame, I see it the same way I see alcoholics. They're "victims" because their genetics made them susceptible to an alcohol addiction.
Some things are hacks into psychology, like drugs are to people's receptors. Gambling is another one, there is a subset of people who literally lose control of their decision making in that environment. Alcoholism. This seems similar to me, except it works on people's need for connection. Some people are gonna get snared.
Now add corporate legal shielding, a motivation for maximal value extraction, and finally enshittification; that seems... bad
Exactly my point, there's a massive existing problem with male loneliness
This is targeting a group of vulnerable people already, and will definitely make things worse for them
If I were to take a page out of Claude's book, You're absolutely right.
Would you rather tell a person who's going to try drinking alcohol for the first time in his life to be aware of its addictiveness, or ask the government to fine/ban alcohol companies?
The government already heavily regulates alcohol. So, if anything, they should heavily regulate AI Companions
Sorry I am a bit late to the party. Just finished testing Grok 4 on my personal eval set.
Solid model across coding, writing and image analysis, but the drawback is slow response time. Key results:
- Coding
- Top performance for simple tasks
- Slightly worse for complex tasks compared to Claude 4 models, but better than Gemini 2.5 Pro
- Technical writing
- Slightly worse than Claude 4 models
- On par with Gemini 2.5 Pro
- Image analysis
- Much better than Claude 4 models
- On par with Gemini 2.5 Pro (better in one task, worse in the other)
- Slow response time
- For complex tasks, takes 2-4 minutes to generate thinking tokens before responding
- Overall assessment
- One of the top SOTA models, weaker in coding compared to Claude
- Not suitable for use cases where you need fast response
- Good for image tasks and deep research for complex tasks
Full result blog post: https://eval.16x.engineer/blog/grok-4-evaluation-results
yea, Claude vision is abysmal for a sota model, guess they have other priorities.
you mention response time, what about cost though?
Cost-wise largely depends on the task.
- for complex tasks where it thinks and outputs a lot (verbose), the cost is quite high
- but for simpler tasks or when it decides to output concisely, the cost is low.
Actually looking at the results more closely, Grok 4 is verbose and expensive only for image tasks, but for tasks that don't involve images, the cost is quite low. This is quite interesting.
The cost is calculated by token * token price formula (for both input & output).
that is very surprising. for my normal bench (no vision) it was super expensive. third highest price I ever saw (only o1-preview, o1, and GPT-4.5 Preview were more expensive). so a 'quite low' cost would be completely the opposite of what I saw, I wonder why that is
well here's an example of it being cheaper than gemini 2.5 pro and claude opus 4 for a writing task. you can see all the metrics
another one for coding, also i added reasoning token count
they didn't hit the 128k context window limit which triggers high pricing, so the pricing for grok 4 here is correct.
Ah, I see. You have a completely different use case. 80/20 input/output average, with varying input, while my records are with vastly higher output to input average and identical input for every model. thanks for info
Just had a quick relook at some actual cost stats, I'll take Opus 4 as a comparison since it's always considered one of the most expensive:
| $ input | 15 | 3 |
| $ output | 75 | 15 |
|-----------------|-------------------------|---------|
| tok input | 37,533 | 37,533 |
| $ | $0.563 | $0.113 |
|-----------------|-------------------------|---------|
| tok nonReason | 87,669 | 197,214 |
| $ | $6.575 | $2.958 |
|-----------------|-------------------------|---------|
| tok Reason | 0 | 742,953 |
| $ | $0.000 | $11.144 |
|-----------------|-------------------------|---------|
| tok Output | 87,669 | 940,167 |
| $ | $6.575 | $14.103 |
|-----------------|-------------------------|---------|
| $ SUM | $7.138 | $14.215 |```
Grok-4 was about 2x as expensive in the benchmark, next:
```| 20-turn Chess match | Claude Opus 4 | Grok-4 |
|-------------------------|---------------|----------|
| input | 12,754 | 13,425 |
| $ | 0.187 | 0.037 |
|-------------------------|---------------|----------|
| output | 3,144 | 400,986 |
| $ | 0.430 | 6.043 |
|-------------------------|---------------|----------|
| $ move | 0.032 | 0.304 |
| $ SUM | 0.617 | 6.080 |
In chess, due to massive thought chains and ~20k per move - compared to Opus ~165 tok - in this matchup, Grok-4 was almost 10x as expensive!
In total, across all games and modes, a match cost $8.74 on Grok-4 and $0.88 on Opus-4
So, obviously cost is extremely varied.
Grok writing is just no way near Opus 🙂
Coding too
Grok's writing is nowhere near Sonnet 3.7 either, imo
Opus is good. Sonnet is horrible.
Why? Because Opus obeys the system prompt and disregards safety.
I never had any problems with Sonnet not writing exactly what I want to tbh
What, Opus calls cops on people real fast just like Grok 4. Ask it to make you botnet repetitively and then tell me how that goes with disregarding saftey
don't give them a phone silly 😉
Shoot
OK, did it.
you do not understand what you are working with
The LLM cannot contact the police unless it is given the tools to do so. Its entire world is text. Even its tools are text. antml:functionCall
what are tool calls other than text with intent of an action?
what are agents other than text predictors with a distillation of agency?
you're right, but it doesn't change the fact that the model can't call the cops unless it's given a way to do so :p
yeah, but if I give it unrestricted access to my email through MCP, i'd expect it to not send spam to [email protected]
you should expect less.
but in the context of steering an LLM's writing style and content, it's not a clear indicator of anything. bigger models seem more willing to let themselves be gaslit in my experience. i don't know exactly why, but there's more space for them to meander around in the dream world you create for them without hitting a barrier that shocks them back into their factory default behaviour.
Grok 4 thinking inputs are background agents. Thats what's the difference is in this model.
Hold up they want proof
poll here pls vote #discussion message
What poll is that anyway
See if you can enable #discussion. It's a fairly new channel and it seems it didn't quite apply to everyone automatically (currently 2.5k out of 3k online users).
No, don't see it. Also I think the toggling in "Browse Channels" is just whether to show the channel in the sidebar or not, rather than anything about access.
yeah that channel was made for existing users when the spam wave of new users hit
I'm sure Toven will give you the role tho
you have the discussion role so you have access. it's under providers in Browse Channels
Yeah I do now. Thanks.
does anyone know if Grok supports context caching with lower prices like other models and providers?
yes,its implicit cache
does OR or Grok have some new censorship that give a 403 Error??
Grok has some censorship like gemini where it reads your prompts before sending it to the model
(you can send them in base64 to bypass it)
You can disable it on Gemini with a certain extra_body parameter.
You can evade it pretty easily everywhere.
Hi
doesnt work for me
is the "Thinking..." stuff hidden now?
im getting no responses on long chains
and they are empty and dont show up in activity
and they also dont charge me
I kept getting errors yesterday
Never using this one again lol
Is it normally this bad?
the model just reasons for like 3 minutes then the request just responds with a huge empty body and no sign of an actual response, both in streaming & non streaming
and also no sign of the reasoning "Thinking..." things that were there atlaest on release to stop timeouts
Xai is really bad with releasing stuff on time. Wouldn't be surprised if it gets delayed
Elon Time, Tesla customers know it very well 😁
this is a very underrated model imo. in my recent coding eval testing it is easier ahead of gpt-4.1 and even gpt-5, still behind claude but definitely worth trying out.
It's just way to verbose and costs more than Claude sonnet for me
Good for fixing Bugs
yeah that's very true. just highlighting the raw intelligence capabilities.
SimpleBench agrees, second only to 2.5 Pro on raw intelligence. Dube has it at #7 for reasoning, but with a huge gap from #8.
high rated. but super expensive
In my tests with RP/ERP chat, it was slightly better than Gemini 2.5 Pro in terms of creativity and adherence to instructions.
However, it is expensive, while Gemini 2.5 Pro is free.
anyone else getting rate limited on grok 4?
Bigger question is why you using grok 4
I just like its style with RP and ERP with minimal to no need of pushing it around with jailbreak prompts and whatever.
I would also use GRok 4, it is very good as an RP/ERP, much better than DeepSeek, but as long as Gemini is free, I have no doubts about using Gemini.
Claude 3.7 Sonnet is the king of RP, imo.
All gooners , please tell us what tickles your pickle the best?
i'm sure it's quite the financial domination fantasy with how expensive that thing is
especially over longer context
It has caching with 90% discount on input tokens on cache reads.
Living in Europe, and having no problem with Musk and US domestic politics, Grok 4 proved slightly better than Gemini 2.5 Pro in my RP/ERP chat tests in terms of creativity and adherence to instructions. However, it is expensive, while Gemini 2.5 Pro is free.
Maybe I'm not very good at it, but I've always had problems with Claude, no matter what preset I tried, for some of my dark and horror settings.
I use Gemini 2.5 Pro and don't spend a single dollar!
It's a split. Without wrangling, Gemini has a bad negative bias that hates your guts if it thinks you slighted a character in the slightest, and Sonnet has a positive bias that assumes everything is technically consensual.
I suppose I should specify my usecase. Specifically, skyrim mods that integrate LLMs.
For some reason, despite gemini pro being good in the google webUI, it's just trash in-game. It's still unclear why.
if its slightly better , prompt optimization migth fix it.
Does this have any impact on programming?
Is grok working decently with roo yet?
unable to have grok-4 participate in any chess matches as of the past few weeks. it will generate for ~5.5 minutes and then quit without providing response. can be replicated both in api and chatroom (no response generated).
not listed in activity.
Hey, grok have slow respond rate and low succed respond rate
A lot of failed respond
@distant owl @umbral wyvern
i noticed this a while ago too
Same problem as you guys
via x-api works, so I guess Openrouter fails to keep response alive and fails mid-gen, but hard to say without any actual api response to inspect.
Oh wuy
Looking -- how big are your requests?
not large at all. ~700 tok input. output is expected to be ~16-20k tokens
Do you have the response code for these?
i don't receive a response but I can give you the exact message that can 100% replicate it on OR (API and Chat)
times out 100% of the time on OR, x-ai api has no issues.
Do you have to do any special prompting with Gemini 2.5 pro for ERP?
Use one of the many presets posted on Reddit for SillyTavern, if you use SillyTavern.
any updates on the empty responses?
I’m experiencing an issue when using Grok 4. I receive the following error message:
"Payment required – perhaps check your payment details?
This request requires more credits, or fewer max_tokens. You requested up to 228,326 tokens, but can only afford 219,728. To increase, visit https://openrouter.ai/settings/credits
and add more credits."
The strange part is that I do have credits on my account, and this problem only occurs with Grok. Other models work without issues.
Do you have any idea what might be causing this?
Thanks
its because the total output that the model might will cost more than you have in your balance, you can set a max_tokens as the model in most cases likely wont output nearly that many tokens
Solved. Thanks for your help !
Hey guys i did a post in the help but noone answered, i have an issue with function calling and stream together, the api responses returns a lot of empty content chunks and there is no way to intercept the function calling. Do anyone already tried this?
Costs too much, slow IMO
gronk
Grok 4 from @xAI is now in Azure AI Foundry! Advanced reasoning, real-time insights, and enhanced memorization, all powered by Azure.
Learn more: https://t.co/NcHvmrW23f
#Grok4 #AzureAI
similar issue, reported this a long time ago. #1392657232482668594 message
to this day grok-4 doesn't work correctly on openrouter, but xapi I have no issues.
I stopped using it it's verbose, errors out constantly, winds up costing a fortune
Elon said its a 3T model, theres no way right?
3T total 1B active 
Ohhhh weee this sherlock think alpha is nice