#grok-1 / grok-2


last cave
vestal sage
#

would be awesome!

burnt bobcat
#

I want to try this ^^. Or even see some examples of its responses. Just saw it was released.

barren zealot
#

It's probably going to be interesting. But I am really looking for a good fine-tuned version. That will be a huge step forward!

copper sage
#

Looking forward to this as well

sick harness
#

honestly it doesn't look promising. 314gb size for a model that's barely better than gpt-3.5

#

its extremely large so im not sure who would even finetune this

copper sage
sick harness
#

No clue what it was actually trained on

#

Welp that answers it

#

I'm not gonna lie, this just sounds like a way more expensive to use/host gpt 3.5

copper sage
#

Hmm, basically useless then

#

The entire reason Grok seemed interesting was due to the Twitter data

#

Seems that comes down to being their selling point for X Premium

jade fox
#

pls add grok :}

#

+1

kind raptor
#

please add!

sleek cape
#

Has anyone here used it on Twitter Premium for roleplay or tasks other than tweet search?

subtle wharf
#

🥴

marsh patio
#

+1 to add grok pls :PepeHappy:

fiery mesa
#

that's not grok tho

dusty nimbus
#

apologies

sleek cape
wraith spade
#

it looks like they're the same cost, is that a mistake?

silent hollow
#

Grok 2 is pretty disappointing, doesn't really follow instructions very well.

sleek cape
#

they are indeed the same price

tardy cedar
digital ermine
#

less token output is the only angle i thought of

tardy cedar
digital ermine
#

what? no, i'm sure they're a reasonable sane person

joking aside can you imagine how much they spent and it's probably nothing special

tardy cedar
digital ermine
#

why does it go so well on lmsys? dubesor has it high on their table, but i'm not feeling it myself

yet to fiddle with system prompts

#

I am Grok, created by xAI. I'm here to answer almost anything with a bit of wit and humor, take an outside perspective on humanity, and strive towards maximal helpfulness! Remember, I'm not afraid of answering spicy questions that are rejected by most other AI systems. Let's keep the conversation as engaging and insightful as possible!
That's what I get out of the API

tardy cedar
#

These models are trained to stand out, people on LMSys like when they get new answers that look different than the rest. Dubesor's benchmarks are highly biased to non-refusal answers, which is also a feature of these models, which require no special skills.

pearl eagle
#

Which has also always been transparently stated, e.g. here

digital ermine
#

@pearl eagle when an output is over censored, how does it show in the pass/refine/fail/refusal count?
Argh don't worry, worked it out, sorry

pearl eagle
#

45% of all tested models have 0 or max 1 refusals across the entirety of my testing.
I penalize models that refuse, e.g. to calculate the lost cargo on colliding trains because it "feels uncomfortable" speculating in "scenarios involving dangerous situations", or that won't write my Chat API file due to disagreements with the system prompt character baddie. That is real-life decreased usefulness to me, not bias.

#

in fact, it's more than "less useful", because it is often accompanied by some preaching that I am not interested in, racking up the cost for no reason. (o1 refusals cost me like 5 or 10 cents sometimes)

sleek cape
#

grok-1 / grok-2

#

how are people feeling about grok 2 so far?

cloud nebula
#

Grok-2 seems... alright? Compared to the alternatives at the same price point, not mindblowing, not terrible, but pretty alright

#

I don't really understand the point of the mini model though. People commented on it above, but given the price points are the same, I just don't see a lot of cases where people would willingly use it over the regular model

hidden cargo
#

What? It's the same price?

#

That doesnt make sense

simple tiger
#

Why the same price? Very confuse...

sleek cape
#

It’s what xAI is charging us 🤷

#

We’ll reduce it when they do!

split whale
#

For creative writing it writes nice and different prose

#

And it seems that it's uncensored, which is a good thing

torpid moon
split whale
#

But for following instructions Sonnet 3.5 still wins, probably miles ahead

#

Especially when I just tried a system message of ~4,000 words

digital ermine
#

What's the max output anyone has got it to give?

pearl eagle
#

whats the exact grok-2 version offered by OR? I tested 08-13 and it was inferior.

pure lintel
pearl eagle
#

Some wizardry going on that I can't explain. this is only ~2 months apart, and I did run 5 outputs on each task to try to combat the inconsistency and retested the new results twice. Haven't noticed this on other models. Weird, huh.

#

also this character reference was not in the 10 rerolls on 08-13

sleek cape
#

Boosting rate limits now

#

By 2x

fresh cave
#

What's the main difference between Grok 2 and Grok 2 mini? (Besides the latency and throughput)

pearl eagle
sleek cape
#

mini is 3x faster, that's the only benefit right now

sleek cape
#

boosting rate limits again

pure lintel
#

I noticed that the image input for Gemini 1.5 Pro is too expensive on OR (around $2 for 1k images), I hope it can get fixed

wet pine
#

so I don't think either number is representative of actual use cases

pure lintel
#

but pretty much all the other big models have pricing dependent on resolution, yes

#

I'll do a quick test to see if the image pricing is actually corrected

#

shit, I'm on the wrong thread, sorry guys

smoky cove
#

I feel like Grok this way is quite a bit less useful since it lacks the real time data it has on X

smoky cove
#

Or am I doing something wrong?

split whale
#

I don't think there's a wrong use case. It just depends on how and where you use it.

#

Like I mentioned in this thread before, I use Grok 2 for creative writing, and I don't need real time data when I'm asking it to write prose

#

I also tried using Grok 2 to translate prose yesterday, but alas, it got punctuation marks wrong, but yeah, it doesn't need real time data to do this, either

#

Generative AIs have a plethora of use cases

compact edge
#

does grok support vision on OR?

tardy cedar
latent gorge
#

It's good, but why is it so expensive?

tardy cedar
sleek cape
#

#announcements message

#

all xAI models have been taken down for maintenance temporarily. They will 404 for a few hours during the redeployment

pearl eagle
compact edge
#

👀
{"code":"Some requested entity was not found","error":"The model grok-beta does not exist or your team 7f00***-***-***-***-******2b1e does not have access to it. Please ensure you're using the correct API key. If you believe this is a mistake, please contact support and quote your team ID and the model name."},

tardy cedar
latent gorge
#

Ok, I think a couple of hours have passed

sleek cape
#

Yeah, they still haven't redeployed the models yet

#

not sure what's going on

split whale
#

Hopefully they are not pulling it out and changing the release date of their API to Coming soon...in next August

tardy cedar
#

No more grok (for now) ->

split whale
#

Damn

latent gorge
#

Oh no

compact edge
#

Oh.. no information from xAI?

sleek cape
#

Grok 2 is coming back soon, and it looks like they increased prices slightly ($5/m input, and $10/m output). grok 2 mini is not

latent gorge
pearl eagle
tardy cedar
pearl eagle
#

not in my testing. it gives great answers on some tasks (just like preview), and totally flops on others. the thinking can be counter-intuitive. programming, math? sure. at following instructions and reasoning it was worse in my testing.

#

unless you mean that grok-2 is in a way higher league, then I'd agree. here is only fails/refusals, ofc maybe I am missing something significant, so feel free to share your own test results!

digital ermine
#

is there anything about the ones above that you can share around what sort of thinking it is failing at? if I was in your office being interviewed by you, what kind of question are you putting me through on average?

#

my imagination is struggling to come up with this many areas that o1 screws up vs this or, even nemotron 3.1 which i hate at the moment

pearl eagle
#

i mean, there are many areas. roleplay and intuitive tasks with little instructions is an example (random example, not part of any bench:)

#

either way, no matter the use case, o1-mini is not playing in any "different league" when we discuss price/performance, which is what started this comparison.

sonic heron
#

I think Grok 2 is down again

sleek cape
#

fixing

#

it's back!

#

thanks for flagging

#

it's a new API so i think they're sending us the wrong status codes sometimes - have an alert in place

smoky cove
#

Tried getting it to fetch real-time data:

To provide you with Elon Musk's most recent X posts (tweets), I would need to access real-time or near real-time data from X (formerly Twitter). Since my last update, current real-time access to X posts isn't available through the data I have. However, here's what you can do:

So I'm guessing that's a definite no on it being able to pull from X.

river glen
#

hard to do that for X

smoky cove
#

On X it does real-time data. So I figured maybe the API version also does, but apparently not.

river glen
#

hooked to their API

smoky cove
#

Yup think so too

#

Shame though, that makes the API version far less useful

river glen
visual jungle
sonic heron
#

It's down again

smoky cove
#

Still down

sleek cape
#

Looking

#

Wow, sorry guys. They appear to still be sending down surprising status codes

#

It's fixed now, cc @snow basin

#

We'll special-case their API until they fix it, to help detect and avoid this

smoky cove
sleek cape
#

yeah they don't have invoicing yet, or autopay

#

fyi, xAI has asked us to rename it to grok-beta, so we'll be aliasing grok-2 to grok-beta soon

sleek cape
#

also, they raised the completion price from $10/m to $15/m
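
For anyone budgeting around these numbers, here is a rough sketch of the arithmetic. It uses the prices quoted in this thread ($5/m input, and the raised $15/m completion price) and assumes the input price did not change when the completion price went up. The function name and the token counts are made up for illustration.

```python
# Prices as quoted in this thread, in USD per million tokens.
# Assumption: the input price stayed at $5/m after the completion price rose.
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 15.0


def estimate_cost(prompt_tokens, completion_tokens):
    """Estimate the USD cost of one request from its token counts."""
    total = prompt_tokens * INPUT_USD_PER_M + completion_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a long 100k-token prompt with a 2k-token reply.
cost = estimate_cost(100_000, 2_000)
print(f"${cost:.2f}")
```

With a prompt that large, almost all of the cost comes from the input side, so the completion price bump matters most for short prompts with long outputs.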

visual jungle
#

Oh wow

split whale
#

The context window increase is really nice

#

Now I can put it alongside other > 100,000 context models

static summit
#

Does anyone know the context and instruction template?

fiery mesa
#

Really coherent writing model.
Generated story from outline, so each iteration it could see the story so far and a chapter prompt.
Much better performance than other available uncensored models I've seen at adhering to the task with a very long prompt.
Text quality is... okay.

wet pine
#

Which makes sense

#

I hate how L3.1 sanitized a lot of its training data

sleek cape
#

What do you think is the closest open source model in terms of censorship level, and closest model in terms of text quality?

fiery mesa
#

I'll think a bit and respond later. In terms of censorship level, R rated movie but not smut I think

#

More subtly, there is definitely some ethics steering

still hazel
sleek cape
#

Interesting- yeah will ask them about their prompt format

river glen
#

I lowkey expect it to be just good ol' ChatML

sleek cape
#

They said it’s “Human: Hey, how are you?<|separator|>

Assistant: Good, how are you?<|separator|>”
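
A minimal sketch of rendering a conversation in the format quoted above. The role names and the `<|separator|>` token come from that message; the blank line between turns and the helper name `render_prompt` are assumptions, since the exact whitespace was not confirmed.

```python
SEPARATOR = "<|separator|>"


def render_prompt(turns):
    """Render (role, text) pairs in the quoted Human/Assistant format."""
    parts = []
    for role, text in turns:
        if role not in ("Human", "Assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"{role}: {text}{SEPARATOR}")
    # Assumption: turns are separated by a blank line, as in the quoted example.
    return "\n\n".join(parts)


prompt = render_prompt([
    ("Human", "Hey, how are you?"),
    ("Assistant", "Good, how are you?"),
])
print(prompt)
```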

compact edge
#

seems like aliasing is not working )) grok-2 returns 404, grok-beta is OK

sleek cape
#

@tough thistle ^

brave forge
tough thistle
iron oak
#

Grok go down?

tough thistle
#

looking

iron oak
#

sorry - beta

#

x-ai/grok-beta

#

part of being "beta" perhaps

#

though "Grok 2" points to beta?

tough thistle
#

Should be back up now

iron oak
#

hope you didnt hurt it when kicking

dry drift
#

I had evaluated grok-2 earlier, but there was an error in the ranking calculation #attachments message

pearl eagle
proper tulip
#

Ever since maybe day or two days ago grok 2 keeps giving no response/generation error, I don’t know if the censorship has been cranked up like crazy or if something else is wrong

brave forge
#

maybe their hosting has a load problem for now

tough thistle
#

Do you have some finish_reason/generation_id that we can take a look at?

brave forge
#

For now I don't have any problem with it after testing it

#

What front end do you use? maybe that could also be the problem

proper tulip
#

Ye I just double checked my system prompt and it plus the input was too long and maxing the token limit

#

real dumb my bad

brave forge
proper tulip
#

oh its 100k? I thought it was 10k

#

I have no clue why its doing this then

#

its every 1/5 request or so it does it

brave forge
#

i see, looks like it's a front end problem.

proper tulip
#

I think Im just going to switch models

#

thanks for the help

brave forge
#

is there no way to put a permanent cap on the context limit in your front end?

#

with sillytavern i can put it into 100,000 token context limit.

proper tulip
#

I usually leave it pretty high but it almost never goes on runaway dialogs

#

ill turn it down though from now on
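
For front ends without a built-in cap, the idea discussed above (keep the system prompt plus input under the context limit) can be sketched roughly like this. Everything here is an assumption for illustration: the 4-characters-per-token estimate is a crude heuristic rather than the model's real tokenizer, and `trim_history` is a hypothetical helper, not a SillyTavern or OpenRouter function.

```python
def rough_tokens(text):
    # Crude heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(system_prompt, messages, context_limit, reserve_for_output):
    """Keep the most recent messages that fit under the context limit."""
    budget = context_limit - reserve_for_output - rough_tokens(system_prompt)
    if budget < 0:
        raise ValueError("system prompt alone exceeds the budget")
    kept = []
    # Walk from newest to oldest so that the oldest messages are dropped first.
    for message in reversed(messages):
        cost = rough_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    return kept


history = ["a" * 400, "b" * 400, "c" * 400]
kept = trim_history("s" * 40, history, context_limit=260, reserve_for_output=50)
print(len(kept))
```

Reserving room for the output matters as much as the cap itself, since a prompt that fills the whole window leaves no tokens for the reply.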

brave forge
#

Anyone know what this error means?

(xAI) Provider returned error: {"code":"Some resource has been exhausted","error":"Too many requests: RejectLimits(LimitsInfo { id: Buf("/team:bb642ce4-5161-45c9-8f34-408850883602/u:92292b03-408f-4727-ae8f-7805f9bef76d"), req_type: Buf("/rt:grok-2-vision-1212-0.1.0"), actual: Values { rps: 0, rph: 200 }, expected: Values { rps: 1, rph: 200 } })"}
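
One way to read that error, sketched below: pull the `actual` and `expected` counters out of the message and see which one has hit its cap. The interpretation that `expected` is the limit and `actual` is current usage (with `rps`/`rph` meaning requests per second/hour) is an assumption, not something xAI documents in this thread; the helper names are hypothetical, and the sample string is shortened from the error above.

```python
import re


def parse_limits(error_text):
    """Extract the 'actual' and 'expected' rps/rph values from the error text."""
    pattern = r"(actual|expected): Values \{ rps: (\d+), rph: (\d+) \}"
    found = {}
    for name, rps, rph in re.findall(pattern, error_text):
        found[name] = {"rps": int(rps), "rph": int(rph)}
    return found


def exhausted(limits):
    """List the counters whose actual value has reached the expected cap."""
    actual, expected = limits["actual"], limits["expected"]
    return [key for key in ("rps", "rph") if actual[key] >= expected[key]]


error_text = (
    'Too many requests: RejectLimits(LimitsInfo { req_type: Buf("/rt:grok-2-vision-1212-0.1.0"), '
    "actual: Values { rps: 0, rph: 200 }, expected: Values { rps: 1, rph: 200 } })"
)
limits = parse_limits(error_text)
print(exhausted(limits))
```

Under that reading, the per-second counter is fine and the hourly one is used up, so waiting for the hourly window to reset (or spreading requests out) would be the thing to try.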

acoustic abyss
pearl eagle
#

nice. grok-2 sounds old but it's actually quite competent (it performed at GLM-4.5 level a year ago).

vocal magnet
#

custom license

subtle island
#

They updated the license. Commercial use for anyone, no criminal use via x.ai's policies. The open weights model doesn't have vision though.

wary warren
#

Elon Musk called it "open source", which it isn't, and it wouldn't even qualify because of that "revocable" statement alone

#

And all that attached to an outdated model, I mean, this is just embarrassing

subtle island
#

Damn. That sucks.

wary warren
shut zephyr
wary warren
shut zephyr
digital ermine
#

wait no

#

is this the first model on OR to have a closed release and then potentially go "open"?