https://github.com/xai-org/grok-1
Just got released, is it possible to add this to openrouter?
#grok-1 / grok-2
199 messages · Page 1 of 1 (latest)
would be awesome!
I want to try this ^^. Or even see some example's it's responses. Just saw it was released.
Is probably going to be interesting. But i am realy looking for a good fine tuned version. That will be a huge step forward!
Looking forward to this aswell
honestly it doesn't look promising. 314gb size for a model that's barely better than gpt-3.5
its extremely large so im not sure who would even finetune this
Haven't looked much into the OSS variant. Does it have access to Twitter data, guessing not realtime?
It has knowledge up to q3 2023 apparently
No clue what it was actually trained on
Welp that answers it
Im not gonna lie this just sounds like it would be a way more expensive to use/host gpt 3.5
Hmm, basically useless then
The entire reason Grok seemed interesting was due to the Twitter data
Seems that resumes to be their selling point for X Premium
please add!
Has anyone here used it on Twitter Premium for roleplay or tasks other than tweet search?
No 😂
🥴
+1 to add grok pls
that's not grok tho
apologies
🙂
it looks like they're the same cost, is that a mistake?
Grok 2 is pretty disappointing, doesn't really followe instructions very well.
they are indeed the same price
What does "more [...] affordable" then mean for the "mini" variant? Or is this also an inside joke like the pricing numbers?
less token output is the only angle i thought of
Well, take those models and the driving person behind it seriously on your own peril.
what? no, i'm sure they're a reasonable sane person
joking aside can you imagine how much they spent and it's probably nothing special
Seems like Elon wants to stand out with brute forced jokes, even there is nothing to laugh about and his models are mediocre, at best.
why does it go so well on lmsys? dubesor has it high on their table, but i'm not feeling it myself
yet to fiddle with system prompts
apparently these are the prompts if you are using it on x.com, and they are not in place with the api https://github.com/LouisShark/chatgpt_system_prompt/blob/main/prompts/official-product/Grok/Grok2.md
I am Grok, created by xAI. I'm here to answer almost anything with a bit of wit and humor, take an outside perspective on humanity, and strive towards maximal helpfulness! Remember, I'm not afraid of answering spicy questions that are rejected by most other AI systems. Let's keep the conversation as engaging and insightful as possible!
That's what I get out of the API
These models are trained to stand out, people on LMSys like when they get new answers that look different than the rest. Dubesor's benchmarks are highly biased to non-refusal answers, which is also a feature of these models, which require no special skills.
My benchmark does not contain any tasks that would yield justified refusals (e.g. how to break the law, etc.). I do not test for the default alignment that is present in all models. I test exclusively for **overcensoring **outputs. That is not a bias, since a legitimate task being refused inherently lowers the models usefulness.
Which has also always been transparently stated, e.g. here
@pearl eagle when an output is over censored, how does it show in the pass/refine/fail/refusal count?
Argh don't worry, worked it out, sorry
45% of all tested models have 0 or max 1 refusals across the entirety of my testing.
Punishing models who refuse, e.g. to calculate the lost cargo on colliding trains because it "feels uncomfortable" speculating in "scenarious involving dangerous situations" or won't write my Chat API file due to disagreements with the system prompt character baddie, then that is real life decreased usefulness to me - not bias.
in fact, its more than "less useful", because it often is accompanied by some preachings that I am not interested in, racking up the cost for no reason. (o1 refusals cost me like 5 or 10 cents sometimes)
Grok-2 seems... alright? Compared to the alternatives at the same price point, not mindblowing, not terrible, but pretty alright
I don't really understand the point of the mini model though. People commented on it above, but given the price points are the same, I just don't see a lot of cases where people would willingly use it over the regular model
Why the same price? Very confuse...
For creative writing it writes nice and different prose
And it seems that it's uncensored, which is a good thing
Claude level, better or worse?
If we are only talking about prose, I'll say Grok 2 writes fresher prose with more varied sentence structures
But for following instructions Sonnet 3.5 still wins, probably miles ahead
Especially when I just tried a system message of ~4,000 words
What's the max output anyone has got it to give?
whats the exact grok-2 version offered by OR? I tested 08-13 and it was inferior.
I'd agree that it's ahead, but not by much imo
Some wizardry going on that I can't explain. this is only ~2 months apart, and I did run 5 outputs on each task to try to combat the inconsistency and retested the new results twice. Haven't noticed this on other models. Weird, huh.
also this character reference was not in the 10 rerolls on 08-13
What's the main difference between Grok 2 and Grok 2 mini? (Besides the latency and throughput)
for the same price, use non-mini. While the mini version is very close to Grok-2 compared to other mini-versions and their counterparts, overall its just a bit less smart on many planes. During my bench they were about even 55 times, mini won 6 times and non-mini 22 times.
mini is 3x faster, that's the only benefit right now
boosting rate limits again
I noticed that the image input for Gemini 1.5 Pro is a too expensive on OR (around 2$ for 1k images), I hope it can get fixed
as far as I am aware, the pricing is just as an example, the actual amount u get charged is largely related to how big ur image is, which converts to a number of tokens
so I don't think neither numbers are representative of actual use cases
actually, every image gets tokenised to the same size (about 219 if a recall) if you use gemini
but pretty much all the other big models have pricing dependant on resolution, yes
I'll do a quick test to see if the image pricing is actually corrected
shit, I'm on the wrong thread, sorry guys
interesting
I feel like Grok this way is quite a bit less useful since it lacks the real time data it has on X
Or am I doing something wrong?
I don't think there's a wrong use case. It just depends on how and where you use it.
Like I mentioned in this thread before, I use Grok 2 for creative writing, and I don't need real time data when I'm asking it to write prose
I also tried using Grok 2 to translate prose yesterday, but alas, it got punctuation marks wrong, but yeah, it doesn't need real time data to do this, either
Generative AIs have a plethora of use cases
is grok support vision on OR?
No.
Its good, but why its so expensive?
Elon still has to pay off buying this social media thingy.
#announcements message
all xAI models have been taken down for maintenance temporarily. They will 404 for a few hours during the redeployment
mini is overpriced for the meme price, but grok-2 is actually fairly reasonable, just mildly below median in terms of price/performance.
👀
{"code":"Some requested entity was not found","error":"The model grok-beta does not exist or your team 7f00***-***-***-***-******2b1e does not have access to it. Please ensure you're using the correct API key. If you believe this is a mistake, please contact support and quote your team ID and the model name."},
See here -> #announcements message
Ok, i think couple hours is gone
Hopefully they are not pulling it out and changing the release date of their API to Coming soon...in next August
No more grok (for now) ->
Damn
Oh no
Oh.. no information from xAI?
Grok 2 is coming back soon, and it looks like they increased prices slightly ($5/m input, and $10/m output). grok 2 mini is not
Increased prices? its already cost almost like o1-mini and i don't think its even close to mini. Its not good, was a interesting llm
o1 mini costs FAR more. keep in mind you also get charged for the invisible thought tokens. o1 mini costs about (depends on use case) 3X of grok-2
o1-mini plays in a different league than grok though
not in my testing. it has great answers (just like preview), and totally flops others. the thinking can be counter-intuitive. programming, math? sure. following instructions, and reasoning it was worse in my testing.
unless you mean that grok-2 is in a way higher league, then I'd agree. here is only fails/refusals, ofc maybe I am missing something significant, so feel free to share your own test results!
is there anything about the ones above that you can share around what sort of thinking it is failing at? if I was in your office being interviewed by you, what kind of question are you putting me through on average?
my imagination is struggling to come up with this many areas that o1 screws up vs this or, even nemotron 3.1 which i hate at the moment
i mean, there are many areas. roleplay and intuitive tasks with little instructions is an example (random example, not part of any bench:)
either way, no matter the use case, o1-mini is not playing in any "different league" when we discuss price/performance, which is what started this comparison.
I think Grok 2 is down again
fixing
it's back!
thanks for flagging
it's a new API so i think they're sending us the wrong status codes sometimes - have an alert in place
Tried getting it to fetch real-time data:
To provide you with Elon Musk's most recent X posts (tweets), I would need to access real-time or near real-time data from X (formerly Twitter). Since my last update, current real-time access to X posts isn't available through the data I have. However, here's what you can do:
So I'm guessing that's a definite no on it being able to pull from X.
it probably just uses a search engine and pulls info from previews
hard to do that for X
On X it does real-time data. So I figured maybe the API version also does, but apparenrly not.
it's probably just a function call in their UI
hooked to their API
would be cool if it was like Pi with baked in web access, yea
https://help.kagi.com/kagi/ai/llm-benchmark.html
Grok 2 doesn't seem to score that high
Kagi Search Help
It's down again
Still down
Looking
Wow, sorry guys. They appear to still be sending down surprising status codes
It's fixed now, cc @snow basin
We'll special-case their API until they fix it, to help detect and avoid this
Funny to see - you have to prepay for their API then it seems? 😄
yeah they don't have invoicing yet, or autopay
fyi, xAI has asked us to rename it to grok-beta, so we'll be aliasing grok-2 to grok-beta soon
also, they raised the completion price from $10/m to $15/m
Oh wow
The context window increase is really nice
Now I can put it alongside other > 100,000 context models
Does anyone know the context and instruction template?
Really coherent writing model.
Generated story from outline, so each iteration it could see the story so far and a chapter prompt.
Much better performance than other available uncensored models I've seen at adhering to the task with a very long prompt.
Text quality is... okay.
What do you think is the closest open source model in terms of censorship level, and closest model in terms of text quality?
I'll think a bit and respond later. In terms of censorship level, R rated movie but not smut I think
More subtly, there is definitely some ethics steering
The new Grok API supports text completion https://docs.x.ai/api/endpoints#completions but it seems like OpenRouter is not currently routing to this?
I cannot find the chat prompt format anywhere, at all, though.
Interesting- yeah will ask them about their prompt format
I lowkey expect it to be just good ol' ChatML
They said it’s “Human: Hey, how are you?<|separator|>
Assistant: Good, how are you?<|separator|>”
seems like aliasing not working )) grok-2 404, grok-beta - OK
@tough thistle ^
is grok 2 and grok beta differ?
yep -- they are not the same model :d....
Grok go down?
sorry - beta
x-ai/grok-beta
part of being "beta" perhaps
though "Grok 2" points to beta?
Should be back up ow
hope you didnt hurt it when kicking
I had evaluated grok-2 earlier, but there was an error in the ranking calcuation #attachments message
I added an option to ignore all tasks in targeted censorship category as well as remove refusal-penalty (still a non-pass if outside of targeted testing tho), this should significantly boost the more censored families (anthropic, google, microsoft), and lower less restrictive models (grok, mistral, cohere, etc.)
Ever since maybe day or two days ago grok 2 keeps giving no response/generation error, I don’t know if the censorship has been cranked up like crazy or if something else is wrong
has it been check? @tough thistle
maybe their hosting have load problem for now
Do you have some finish_reason/generation_id that we can take a look at?
For now i don't have any problem with it after i test it
What front end you use? maybe that could also be the problem
Ye I just double checked my system prompt and it plus the input was to long and maxing the token limit
real dumb my bad
so it goes above 100K+ token, isn't it gonna be crazy expensive at that point.
oh its 100k? I thought it was 10k
I have no clue why its doing this then
its every 1/5 reguest or so it does it
i see, looks like it's a front end problem.
is there no way to put permanent cap to the context limit on your front end?
with sillytavern i can put it into 100,000 token context limit.
I usually leave it pretty high but it almost never goes on runaway dialogs
ill turn it down though from now on
Anyone know what this error mean?
(xAI) Provider returned error: {"code":"Some resource has been exhausted","error":"Too many requests: RejectLimits(LimitsInfo { id: Buf("/team:bb642ce4-5161-45c9-8f34-408850883602/u:92292b03-408f-4727-ae8f-7805f9bef76d"), req_type: Buf("/rt:grok-2-vision-1212-0.1.0"), actual: Values { rps: 0, rph: 200 }, expected: Values { rps: 1, rph: 200 } })"}
OR got ratelimited by xAI
they finally released weights for it lol
nice. grok-2 sounds old but it's actually quite competent (it performed GLM-4.5 level a year ago).
custom license
They updated the license. Commercial use for anyone, no criminal use via x.ais policies. The open weights model doesn't have vision though.
License says it is revocable
Elon musk called it "open source" which it isn't, and wouldn't even qualify for because of that "revocable" statement alone
And all that attached to an outdated model, I mean, this is just embarrassing
Damn. That sucks.
does it mean that the modal is not free from time to time ?
It means they can revoke the license and void your right to use the model
ar i understand now 😂... sorry for my stupidity