#general

jovial sapphire
#

*the requests

#

and then i created a tampermonkey script to intercept the checkpoint

#

and display it when it was on A/B

ocean vortex
#

I'm not sure they don't assign the same properties to different models...

jovial sapphire
#

they assigned the same

ocean vortex
#

in a/b

jovial sapphire
#

it wasnt random cuz

#

we all got same performance for same checkpoints

#

just go on google and type "x28 checkpoint gemini"

#

you'll see

#

i'm trying to get

#

an a/b

#

to show you

#

just got it

#

i'll show you how you can see the checkpoint

#

now

#

so first you wait until theyre done

regal zealot
#

you said you were an LLM researcher before.. and now you are a Security Researcher? So what? are you trying to play us? OMG!

ocean vortex
jovial sapphire
#

when did i say i was

#

an llm researcher

#

lmao????

regal zealot
jovial sapphire
#

i said i can create a group with a stanford ML researcher

regal zealot
jovial sapphire
#

i can create a group with someone that is an llm/ml researcher

regal zealot
#

LLM same as ML.. but LLM is bigger..

jovial sapphire
#

i didn't say i was

regal zealot
#

oh my mistake..

jovial sapphire
#

np !

regal zealot
jovial sapphire
#

hahahah oh my god

#

the result is amazing!

#

prompt: "
User
Create a neo brutalist web page, esthethic that looks like a modern museum., 1000 lines, multiple pages, effects etc."

#

one shot

#

@ocean vortex

regal zealot
regal zealot
jovial sapphire
#

SOTA in coding

#

@ocean vortex

#

so when you submit A/B test you go here

regal zealot
#

WoW! then if it was one shot!!

#

i couldnt do that in one shot!!..

jovial sapphire
#

yes it is one shot

#

and then you can go to ```{"temperature":1,"topP":0.95,"topK":64,"jl":[],"maxOutputTokens":65536,"safetySettings":[[null,null,7,5],[null,null,8,5],[null,null,9,5],[null,null,10,5]],"enableCodeExecution":false,"enableBrowseAsATool":false,"Fj":false,"responseModalities":[],"zf":false,"enableSearchAsATool":true,"googleSearch":[],"thinkingBudget":-1,"outputResolution":"1K","model":"52c1a3b8c57b46fb"}```

#

and at the end you have "model":"52c1a3b8c57b46fb"

#

and this is the checkpoint
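
A minimal sketch of reading that field, assuming the captured request config is the JSON string pasted above (shortened here). The helper name `extract_model_id` is hypothetical, and this only parses a string you already captured; it does not show how the userscript intercepts the request.

```python
import json
from typing import Optional


def extract_model_id(raw_body: str) -> Optional[str]:
    """Return the "model" field of a captured request-config string, or None."""
    text = raw_body.strip()
    # Assumption: a pasted payload may be wrapped in one extra pair of double quotes.
    if text.startswith('"{') and text.endswith('}"'):
        text = text[1:-1]
    try:
        config = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(config, dict):
        return None
    model = config.get("model")
    return model if isinstance(model, str) else None


# Shortened copy of the config quoted in the chat.
captured = (
    '{"temperature":1,"topP":0.95,"topK":64,"maxOutputTokens":65536,'
    '"thinkingBudget":-1,"model":"52c1a3b8c57b46fb"}'
)
print(extract_model_id(captured))  # prints 52c1a3b8c57b46fb
```

Logging this ID next to each output is what lets you check the claim that the same checkpoint gives the same performance when it comes back.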

ocean vortex
#

or the actual checkpoint this is

jovial sapphire
#

"model:"

#

so when it was x28

#

it used to say

#

"model:x28blablabla"

#

but now it's gone

regal zealot
#

you mean its now renamed to 52blablablabla?

jovial sapphire
#

html code of the website it made

jovial sapphire
#

but very good

#

it's gemini 3

regal zealot
#

oh okay..

jovial sapphire
#

no other models

#

can one shot

#

this @ocean vortex

#

that's why i'm saying it will be sota

regal zealot
#

and how did you connect to gemini 3.0? before it have been released ?

#

how can i?

jovial sapphire
#

i know it is

#

youll see

regal zealot
#

how can i?

jovial sapphire
#

make urself an automating script

ocean vortex
#

or just obfuscated

jovial sapphire
#

broooo

#

fdsqgopàg^ksdqfkdsqf

ocean vortex
#

?

#

it literally is lol

jovial sapphire
#

bro

#

when you do another

#

a/b

#

it's not the same

#

and sometimes it comes back

#

and when it does

#

it's the same performance

#

that you had previously

#

with the same checkpoint

#

is your head full of air

#

i'm done with this debate

regal zealot
jovial sapphire
#

you can't understand

#

just nevermind blud

regal zealot
ocean vortex
tiny halo
#

hey guys does anyone know why i cant send photo to ai chat it wont let me send

ocean vortex
#

lmao

regal zealot
gaunt spade
ocean vortex
gaunt spade
#

@jovial sapphire yo bro how do u get A/B tests on AI studio

#

cuz i never got them

#

is it locked to some regions?

regal zealot
gaunt spade
#

i wanna try Gemini 3 checkpoints on AI studio

#

maybe they're better than riftrunner

#

in coding

rustic lichen
#

pls could u add an option to share chat?

regal zealot
jovial sapphire
#

i know

#

it's not the real name

rustic lichen
jovial sapphire
#

captain obvious

#

thats why it's called checkpoints

#

and we call them by their codename (x28, ECPT)

jovial sapphire
gaunt spade
jovial sapphire
#

ip

#

i can run prompts for you if you want

#

if you have very good prompts

gaunt spade
regal zealot
# rustic lichen yh

it'd be cool if u could even download the chat from the site to ur device.. because i want that!

jovial sapphire
#

0

rustic lichen
#

yh

gaunt spade
jovial sapphire
#

prompts if possible

gaunt spade
#

wait imma look it up

regal zealot
rustic lichen
#

yes

gaunt spade
jovial sapphire
#

but x28 was very good overall

gaunt spade
regal zealot
# rustic lichen yes

then we must develop a tool together that gets the chat messages very easily..

gaunt spade
#

from gemini 3

jovial sapphire
#

well if you have other coding prompts

#

don't hesitate

gaunt spade
jovial sapphire
#

i'll run them now

gaunt spade
#

made with x28

#

wait imma think

#

of my prompt

jovial sapphire
#

happy to see not everyone is a skeptic

regal zealot
jovial sapphire
#

you cant lol

#

its gone

#

now

gaunt spade
#

lol

#

like 3-4 weeks ago

jovial sapphire
#

ai studio

#

then lmarena

gaunt spade
#

yea lithiumflow was good too

#

but x28 is the best

jovial sapphire
#

not as good as

#

x28 yes

#

but still good

#

any coding idea? could be anything

gaunt spade
# jovial sapphire any coding idea? could be anything

'''Create an interactive 3D wormhole simulation using three.js, featuring a tunnel-like geometry with animated textures that give the illusion of depth and motion. Implement custom shaders or materials to simulate swirling energy, stars, or nebulae moving along the tunnel, and animate the camera to travel smoothly through the wormhole. Include controls to adjust parameters such as tunnel length, speed, color scheme, and distortion intensity in real time. Provide well-commented JavaScript code, an HTML setup snippet, and brief explanations of key three.js concepts used (geometry, materials, shaders, animation loop, and camera controls).'''

#

thats a good prompt

jovial sapphire
#

Okay!

#

ty

#

let's hope i get a good checkpoint

gaunt spade
gaunt spade
#

and how much time does it take you to encounter one?

#

js curious

jovial sapphire
#

some are very bad

jovial sapphire
#

sometimes 30m

gaunt spade
#

lol

jovial sapphire
#

here

#

are u on pc?

gaunt spade
#

im on my phone rn

jovial sapphire
#

shi wait i can upload

gaunt spade
#

did it come out already

jovial sapphire
#

the code to codepen

jovial sapphire
#

but

#

i can show you one run i did earlier

tiny halo
#

hey guys does anyone know why i cant send photo to ai chat it wont let me send

gaunt spade
#

i would like to see it

jovial sapphire
gaunt spade
#

was the output any good

#

wait brb

#

i got something rn

jovial sapphire
#

codepen is down...

jovial sapphire
#

here

#

one shot

#

"Create a neo brutalist web page, esthethic that looks like a modern museum., 1000 lines, multiple pages, effects etc." prompt

gaunt spade
#

was it the best checkpoint's output

jovial sapphire
#

it is one of the best i have tried rn yes

gaunt spade
#

thats why some checkpoints suck at your attempts

jovial sapphire
#

yes

#

surely

#

yay!

#

your prompt got an ab

#

let's see

gaunt spade
#

wait do u use a bot that spams prompts

jovial sapphire
#

this is so cool hahaha

jovial sapphire
gaunt spade
jovial sapphire
gaunt spade
jovial sapphire
#

i cant rn

gaunt spade
#

only 10KB, lets see

jovial sapphire
#

this is very cool

#

imma try with gemini 2.5 pro

gaunt spade
#

its a gemini 3 checkpoint too

#

on lmarena

#

lets see how it compares

jovial sapphire
#

same checkpoint

#

as the one who did

#

the website

jovial sapphire
#

52c1a3b8c57b46fb

gaunt spade
jovial sapphire
#

checked

gaunt spade
#

maybe riftrunner is a little worse/better

jovial sapphire
#

i'm talking about

#

my website

#

neobrutalist

gaunt spade
jovial sapphire
gaunt spade
#

i thought u meant that the checkpoint is riftrunner

jovial sapphire
#

imma try the same prompt with g2.5

#

oh no nono

#

its far better imo

#

i'll try SVG

gaunt spade
jovial sapphire
#

trying with classical g2.5 pro

#

super trippy

#

(gemini 3, not gemini 2.5)

#

LMAO

#

gemini 2.5 pro completely failed @gaunt spade

#

doesnt even work

balmy mist
#

you can still get gemini 3 through a/b on studio?

jovial sapphire
#

yes

#

FPS is bad

gaunt spade
#

which one is that

jovial sapphire
#

wait

jovial sapphire
#

i sent you the html

gaunt spade
#

i thought u tested another one

#

im yet to get riftrunner in lmarena

#

its quite difficult

gaunt spade
jovial sapphire
#

no other model gets it

#

trust me

regal zealot
#

have u ever jailbroken gemini? grok? i have, and it worked with them.. but the next day it had been restricted.. even when i tried encoding the prompt.. looks like lmarena shares the chats with companies, and maybe told them what i had tried to make. now if i give it the jailbreak prompt as a test, it will say, "Sorry, I can't assist with that."

gaunt spade
#

havent u tried it yet?

#

its a little worse than lithiumflow

#

i guess

jovial sapphire
#

i found it very bad

regal zealot
#

because it gets better every day...

gaunt spade
#

can u try another prompt for me

jovial sapphire
#

trying opus 4.1 on the wormhole thing

#

to see if it performs better than g3

regal zealot
jovial sapphire
#

it's nothing to edit

#

it's just a/b

#

just wait until it releases bro

#

it will release in a few days

#

no worries

gaunt spade
# jovial sapphire yes

can you do this one "Create a fully functional, identical and playable DOOM (1993) clone created in three.js <{5000 lines of code}>"

jovial sapphire
#

after

#

3k lines average

gaunt spade
#

make it 2000

jovial sapphire
#

cuz of ai studio limits

#

okay!

#

thanks

regal zealot
jovial sapphire
#

launching now

gaunt spade
jovial sapphire
regal zealot
#

doesnt a/b mean a for gemini and b for the other one? working together?

jovial sapphire
#

basically on ai studio

#

when you send more messages

#

you have a probability

#

that they will ask you to choose between two answers

#

for testing purposes

#

and one of them is gemini 3
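
A rough sketch of what that probability implies, assuming each message independently triggers an A/B test with a fixed chance `p`. Both the independence and the value of `p` are assumptions for illustration; the real rate is not stated anywhere in this chat.

```python
def expected_messages_until_ab(p: float) -> float:
    """Mean number of messages until the first A/B prompt (geometric distribution)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def prob_no_ab(p: float, n: int) -> float:
    """Chance of seeing no A/B prompt at all in n independent messages."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return (1.0 - p) ** n


p = 0.02  # hypothetical per-message A/B rate, not a measured value
print(expected_messages_until_ab(p))   # average wait, in messages
print(round(prob_no_ab(p, 100), 3))    # chance that 100 runs produce no A/B
```

Under these assumptions, a long dry streak is not surprising by itself, so it does not prove that A/B testing was switched off.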

regal zealot
jovial sapphire
#

launched your test buni

regal zealot
gaunt spade
jovial sapphire
#

looks very bad imo @gaunt spade

#

this is Opus 4.1

#

for wormhole

#

looool

#

very cheap looking particles animation

#

looks overengineered

#

and doesnt even look like a wormhole

regal zealot
jovial sapphire
#

gemini 3 one was so much better

regal zealot
#

and how can i connect to gemini-3.0-pro?

gaunt spade
regal zealot
gaunt spade
#

and its detailed with proper coding

regal zealot
gaunt spade
#

and run some prompts on AI studio until you get the A/B testing checkpoint

#

that means a better model will produce your output

#

and you get to choose between one of them

jovial sapphire
#

it takes a long time

regal zealot
# gaunt spade you need to be in a specific region (the united states)

you are the same as gemini, and ill tell you the same thing im gonna tell him when he does that:

" Why are you making things more complex and trying to hide the true solution from me? you could have easily told me to use a v.pn!! please dont do this again ever, as i have eyes, and can know, so u better not hide, otherwise ill use gpt "

regal zealot
jovial sapphire
#

u try both

#

and u see whos better

#

you have choice between both

regal zealot
gaunt spade
#

@jovial sapphire yooo

#

i finally got riftrunner

#

on lmarena

#

svg test

jovial sapphire
#

lets go

gaunt spade
#

it does so well

jovial sapphire
#

yes but

#

you shouldve tried

#

wormhole 🙁

gaunt spade
#

cuz i

jovial sapphire
#

okay try again

gaunt spade
#

didnt vote lol

jovial sapphire
#

nice!

gaunt spade
#

and i dont vote

#

if it got it

#

so i get to chat with it

#

a little more

#

@jovial sapphire imma make the wormhole and then doom

regal zealot
jovial sapphire
#

waiting on doom

#

no a/b for now

gaunt spade
jovial sapphire
#

LMAOOO

#

wth is that

#

bro

gaunt spade
jovial sapphire
#

thats what im saying

#

people here dont believe me!!!

gaunt spade
#

wow i didnt expect it to fail at the wormhole

jovial sapphire
#

AI studio is the real gemini 3 pro

regal zealot
gaunt spade
#

sadly

jovial sapphire
#

i will be so so so disappointed

regal zealot
gaunt spade
#

the outputs were the exact same

regal zealot
#

i didnt laugh this hard this whole day until nowww!!!!!

jovial sapphire
#

was not as good

#

as ai studio one

gaunt spade
jovial sapphire
#

thats why

gaunt spade
#

it was like riftrunner

jovial sapphire
#

im hella confused

gaunt spade
#

thats my point

jovial sapphire
#

ok mb

#

what will they release

#

id be so so sad

gaunt spade
#

thats my point 😭

jovial sapphire
#

i dont know what they will release

#

in the end i mean

gaunt spade
#

maybe riftrunner is gemini 3 flash

#

with some thinking

#

cuz rift-runner sounds like a FAST model

#

for its codename

regal zealot
#

@hallow hemlock maybe u tried the Gemini-3.0-ultra-lite 🤣🤣🤣🤣

gaunt spade
#

on my prompt

jovial sapphire
#

nah crazy

#

over 100 runs

#

no ab

gaunt spade
#

on AI studio

jovial sapphire
#

lets go!

#

yes i do, but change google accounts

#

and its gone

gaunt spade
gaunt spade
#

how many accs u got

jovial sapphire
#

+100

#

i buy phone numbers

#

for a few cents

#

online

gaunt spade
jovial sapphire
#

costs me like

#

1$ for 50 numbers

gaunt spade
#

but maybe it got patched or something

gaunt spade
#

or what country

jovial sapphire
#

any country

#

google takes all

#

first result not crazy

gaunt spade
#

and does ur hp go down

jovial sapphire
#

yes

#

wait

#

actually no it s kind of good

#

enemies walk towards me and attack me

#

when i approach them

gaunt spade
jovial sapphire
#

idk it's one type

#

it did another one

gaunt spade
jovial sapphire
#

but one error

#

ill fix it

gaunt spade
jovial sapphire
#

i'll see if it's the good checkpoint or not

gaunt spade
#

heres my doom made by riftrunner

gaunt spade
jovial sapphire
#

i'll fix the one that has

#

one error

#

and see if its better

gaunt spade
#

lets see

regal zealot
jovial sapphire
#

was bad checkpoint

#

actually no it was the same

#

52c1a3b8c57b46fb

#

here check this

jovial sapphire
#

"
User
can you do this one "Create a fully functional, identical and playable DOOM (1993) clone created in three.js, all in a single html file, 2000 lines of code.""

#

still better than every other model tho

jovial sapphire
#

prompts?

jade egret
#

bro chatGPT

#

you obsess with chatGPT?

gaunt spade
#

damn

#

good video

jovial sapphire
#

any prompts?

regal zealot
regal zealot
#

nevermind..

jovial sapphire
#

i need web

regal zealot
#

like i know how bitcoin works.. but monero i cannot understand ..

#

like i cannot understand how it hides the amount the sender has and sends to the receiver, and everything is hidden.. i cannot understand that..

jovial sapphire
#

ok

#

any style?

regal zealot
#

every time i ask an ai about it, such as gemini 2.5 pro, it cannot make me understand it..

regal zealot
jovial sapphire
#

okay

#

will try now

regal zealot
#

just dont host it urself online.. just send the .html here and ill host it myself in the terminal..

jovial sapphire
#

"Your task is to create a single-file, highly interactive HTML educational experience designed for a teenager who understands Bitcoin but is confused by Monero's complex privacy features. Focus on using relatable, casual language, and high-contrast, brutalist styling to demystify how Monero hides the sender, receiver, and transaction amount. The entire page must feature accurate, step-by-step animations or interactive visual analogies that explain the core mechanics of Ring Signatures, Stealth Addresses, and Ring Confidential Transactions (RingCT). The final product should be a visually striking and functionally clear demonstration that makes the user finally "get" the system. A neo-brutalist style axed on Monero visual identity. Make a masterpiece, minimum 1000 lines."

#

this is the prompt

regal zealot
#

u know how did u make the prompt?

jovial sapphire
#

asked gemini

#

i know monero

regal zealot
#

also do u know that i dont have gemini 3.0?

jovial sapphire
#

i use it

jovial sapphire
#

no one

regal zealot
#

alright.. ill try gemini 2.5 pro..

jovial sapphire
jovial sapphire
#

and send you the html

regal zealot
#

alright..

#

if he makes a short one.. tell him the user doesnt love short answers and wants longer ones..

#

and tell him he will read it..

#

also tell him the user knows what pgp is.. and what a signature and a public key mean..

jovial sapphire
#

Hahaha so cool

#

this is crazy

#

@gaunt spade

regal zealot
#

why u didnt give the html file or code?

#

cuz i cannot interact with it..

#

such as in the image , i cannot press buttons..

jovial sapphire
regal zealot
# jovial sapphire

thats funny what he is saying at the end.. dont do something.. HahHaha

jovial sapphire
#

@regal zealot

#

one shot is crazy

regal zealot
jovial sapphire
#

but you see, the result is amazing

#

gemini 3 is very good

regal zealot
#

yeah.. it could make it better, like what i imagine about the sender or receiver when they want to send: i'm thinking maybe if the hidden hands know the blockcha1.n node they use.. perhaps they might identify them..

#

or maybe my thought is wrong?

#

let it fix it.. so i can know more about moner0.. and also know why my view-only wallet needs a lot of time just for syncing..

jovial sapphire
#

it explained

#

everything

regal zealot
jovial sapphire
#

ask gemini 2.5

regal zealot
#

as of what i see?

jovial sapphire
#

to ai

regal zealot
burnt sinew
regal zealot
burnt sinew
regal zealot
#

alright, after i blocked, now im comfortable that all i see is a "blocked message--Show"

regal zealot
jovial sapphire
#

dont apologize

#

its an advice

regal zealot
#

i remembered that i asked that question a long time ago when 2.5 was new..

#

but it never explained it to me or made me understand that its secure, so that it doesnt have the issue i thought of..

jovial sapphire
#

you ask bad questions

#

thats why u dont get good answers

#

whats ur language?

regal zealot
#

how and why its bad?

jovial sapphire
#

i meant if u dont get good answers

#

maybe its because you dont ask the right questions

balmy mist
#

G3 out?

jovial sapphire
#

this week yes

jovial sapphire
#

via ai studio

#

if u have good prompts

#

i can try for you

tiny halo
#

can someone help me please i have an issue

regal zealot
jovial sapphire
#

what

#

is your main language?

tiny halo
#

i cant import photo to ai chat why please help

balmy mist
jovial sapphire
balmy mist
tiny halo
regal zealot
balmy mist
#

On which app lmarena?

tiny halo
jovial sapphire
#

its not private

#

who cares

regal zealot
balmy mist
tiny halo
regal zealot
balmy mist
#

Could be an issue with the app, but send screenshot of error message to help them debug I’m not dev but I can guide u in right direction lol

echo aurora
balmy mist
#

Wassup bro, how u been?

tiny halo
balmy mist
tiny halo
echo aurora
balmy mist
#

Bro same

balmy mist
tiny halo
balmy mist
#

But u got this tho, u using ai to help?

tiny halo
#

@balmy mist its jpg

regal zealot
#

brah whats even that slang ?? "ADDH"??

craggy night
#

How to generate image to image please

balmy mist
# tiny halo heere

Try refreshing(ctrl + shift + r), I’m outside right now, but I could help when I get back home,

echo aurora
# tiny halo heere

Ah yes, IIRC we had issues enabling vision with this model. At the moment it's not going to work with image upload, sorry to say.

regal zealot
balmy mist
#

I’m the same way, I’m tryna start my family now, that’s how far ai pushed me away from tech and I minored in it lol

balmy mist
#

Gpt 5

#

But 4.5 should work

tiny halo
balmy mist
#

That’s weird, there could be an issue with Claude models in arena, dm I got something u can use

#

lol yeah 23

#

Finished college a while back

#

But kinda miss it, adulting hard bro😂

#

Man i would say it get better but … lol im thinking about changing careers

#

Im a software engineer rn, but I want to branch out so bad, i hate being at my desk at home all day, ik its a privilege but this is making me crazy lol

gaunt spade
#

what grade u in

gaunt spade
#

u dont have to always learn about AI

molten cypress
#

hello here

#

i need some help

#

after uploading my image for video animation and prompt what key to press for generation because enter isn't working?

balmy mist
sharp mirage
#

Hi @echo aurora Is there any update for the UI?

jade egret
#

lmao chill why u suddenly mad

barren pine
#

Hello, I want to learn, share ideas, and connect with others passionate about AI.

crimson sky
#

hello, want to learn prompts, generate videos and share ideas

latent jungle
#

Proof of your success?

ashen plaza
#

<@&1349916362595635286>

ashen plaza
latent jungle
cloud zinc
#

$100k wow

whole swallow
hardy lion
#

there was a spammer who said you can make 100k by doing whatever he said or some nonsense

fleet lintel
#

scammers everywhere... i hate them

jovial willow
#

Does google login not work for anyone else

round sedge
sudden coral
#

i cant even open lmarena cf under attack just spins forever no matter what browser i try

patent bane
#

same

inland violet
#

guyz du u beliv mi

rocky mauve
#

Best model currently for coding python tasks?

#

Don’t want one with limits though

keen beacon
#

my feet hurts

#

dang

real rampart
#

Hey everyone! I've been sifting through a lot of posts here trying to find information about two battle arena models: Crystal and Quasarflux, and I've only found guesses that Crystal is LLama and Quasarflux is Grok. Is there any definitive information published anywhere, or is it always a secret? 👀

zealous sparrow
#

@quartz light are these models new

real rampart
warm robin
hollow imp
#

@ocean vortex bro even your system prompt is not doing anything against the heavily nerfed gpt 5 pro

#

The custom instructions are not un nerfing it

ocean vortex
# jade egret you obsess with chatGPT?

With the prompt you are trying, nearly all current models are gonna struggle, by chance, depending on their tokenizer. As far as I'm aware nothing major has changed with their current tokenizer. So essentially the model can't get this right without being overfit. You could test the tokenizer and how it performs, or to what extent reasoning can remedy it, but then you would need many more prompts like this than just 1 random word. Different models are gonna fail on different ones

hollow imp
#

@ocean vortex you said your prompt will make it think longer, is 4-6 minutes long? I tried many tasks which should take long, STEM and non-STEM

ocean vortex
#

But it's hard to say definitively if the system prompt is really the only change from their website to the API

#

on API it is much more blunt

#

Reminds me of grok4 while being as fast as non-reasoning model, on API

hollow imp
#

And the quality is worse

#

And I'm not talking about 5.1 I'm talking about 5 pro

#

I mean how can gpt 5.1 heavy thinking take longer than 5 pro and give better answers

ocean vortex
#

For pro there's no 5.1 Pro yet afaik

hollow imp
#

Wtf is that nerf

#

It was never close to 5 thinking heavy

ocean vortex
#

on API I only see 5.0 Pro

hollow imp
#

@ocean vortex bro what do I do

#

😭

ocean vortex
ocean vortex
#

but hence can't comment on it much

hollow imp
#

I'm using business subscription free trial

ocean vortex
#

Cause it's very hard to know without retrying the same exact thing

#

You may think it was 'easier task', but models do not exactly always work the way you may expect them to. They will end up thinking longer for tasks that you anticipated to take less.

#

And then the quality is next to impossible to compare if you are comparing different tasks

quartz light
#

btw look

keen beacon
#

guys i have a question

quartz light
# quartz light

my thinking injector for any model
it worked on grok's stealth model

keen beacon
#

is chat gpt 5.1 similar to chat gpt 4o

#

or im tripping

ocean vortex
keen beacon
#

i need to know

hollow imp
keen beacon
#

fr?

#

in chatgpt?

hollow imp
#

Yes

#

Frfr

ocean vortex
keen beacon
#

say on god @hollow imp

#

because chat gpt 4o is like a human buddy in old days

ocean vortex
# hollow imp Many infact

Can you try one now, to compare it directly all things being equal? Also shorter reasoning alone does not mean it is worse. It is actually a big plus if the final answer turns out better.

hollow imp
keen beacon
#

@hollow imp yo say on god

#

rn

#

bro

#

I NEED ANSWERS!

quartz light
#

just try it yourself

hollow imp
ocean vortex
quartz light
keen beacon
quartz light
keen beacon
#

if you say it, i believe...

hollow imp
hollow imp
ocean vortex
#

Cause that reddit post does not document a single one as far as I can see

keen beacon
ocean vortex
#

just a pointless thread for the most part

keen beacon
#

it ain't everything

#

but i saw toilet

hollow imp
ocean vortex
hollow imp
stray aspen
#

lmao

quartz light
#

yall

#

is ts prompt too much

keen beacon
#

bro hell no way

#

i have a limit on go chat

#

in chatgpt

#

thats tuff broooo

brave orbit
#
poll_question_text

What AI is better

victor_answer_votes

13

total_votes

21

victor_answer_id

1

victor_answer_text

Gemini

wheat onyx
#

As good as 5.1 is, very excited for gem3

zealous sparrow
wheat onyx
#

No idea how good it is at coding, but clearly it's the only use case...

stray aspen
#

gemini 3 is close

wheat onyx
#

Will be curious how flash is compared to 5.1 too

#

Oai really neutered 5.1 by disabling legal assistance too. So gemini will be an even bigger deal there.. It was already better than many lawyers

hollow ivy
#
poll_question_text

What experience have you made with Claude-4.5-Sonnet-Thinking?
When does its performance begin to degrade?

victor_answer_votes

4

total_votes

5

victor_answer_id

1

victor_answer_text

after 90k tokens already

wheat onyx
#

Yeah they told gpt not to do any legal stuff. It was already better than a lot of lawyers I'm familiar with in gpt5.0

#

So disappointing. I know people will be switching to gemini.

If you're a lawyer that's not double-checking the legal reasoning, or are using any case law, you deserve to get screwed

zealous sparrow
neat apex
#

I'd rather be judged by gpt 5 than by an average person honestly imo

#

So gpt 5.1 sucks?

sour spear
wheat onyx
wheat onyx
neat apex
#

I mean that

#

Gpt 5.1 is clearly way better overall

wheat onyx
#

Too many idiot lawyers used gpt without using their brain, so oai disabled it... Disable the lawyers instead

neat apex
#

Yeah, that is an issue

#

Gpt 5 is better than an average lazy lawyer, but it will barely be fed any context

wispy quarry
#

Hi, I'm new here 😄

wheat onyx
#

for translations and transcriptions, AI is pretty much good enough (as long as you give it just enough context to know what it's about)

keen beacon
warm robin
wheat onyx
#

GPT 5.1 medium is around 72% (plus), and high is around 76% (pro)

is SWE verified not the best benchmark for coding?

regal cargo
#

anyone know why its not working?

ocean vortex
ocean vortex
#

And is being exploited for this very reason. People like yourself thinking it is the best 👀

regal zealot
ocean vortex
#

Anthropic did it first, now Google and OpenAI doing the same. Focusing on SWE much more than they did before. Things like codeforces, LCB, Aider etc are secondary now

#

Kinda predictable if you ask me. They are always gonna focus on the things people are looking for. SWE is also easier to rig than having to worry about all the alternatives equally, in my opinion. Like with o3 they had to test a ton of different coding metrics. For 5.1 people are kinda happy just with improved SWE... 👀

wheat onyx
neat apex
neat apex
ocean vortex
#

Not correct on what exactly?

neat apex
#

It happens mostly because they try to adjust it "better"

keen beacon
#

3.2 good?

neat apex
#

Just like, lmarena nerfs Gemini 2.5 pro somehow since it is weird for it to be better than sonnet 4.5

ocean vortex
#

the order varies a lot depending on what you choose to look at. For LCB just about all Claude models suck but this is still coding

neat apex
#

It is only partially true, for example a model that has higher world knowledge (Gemini 2.5) can sometimes outperform another model like o4 mini, not because the model's quality is consistent, but because of the sum of its competences

ocean vortex
#

You are way oversimplifying it. There are many models from each of them. And I'm still not sure what you didn't agree with in my initial message you quoted tbh

#

yes

neat apex
#

4o is bad at coding, but he knows ball and performs well in more standardized tests

ocean vortex
#

incorrect

#

lmao

neat apex
#

I dont see a reason not to count human judged tests

And also it keeps being true in automated or LLM judged tests

#

Curious, i wanna read that

Like i said, i am being objective and did not claim they are really good

ocean vortex
#

How do you think current benchmarks were made if they aren't human tests? By aliens?

#

🗿

neat apex
#

Yeah, a test judged by something like Sonnet will be better than one judged by humans, mostly because it is smarter than an average human, rather than because of consistency

ocean vortex
#

This is the most ridiculous thing I have read this week, no offense

#

This is like asking some entity to launch investigation into itself

#

LLMs are terrible at self evaluation

neat apex
#

Anyone is bad at self evaluation, if you swear something is good, you will judge it as good and act that way

ocean vortex
#

The problem here is either you don't understand AI at all, or you are having issues trying to communicate it.

neat apex
#

The issue is not even AI, its about self-judging

ocean vortex
wispy quarry
#

Can I only generate videos here in the server, or can it also be done on the website?

ocean vortex
#

That's too much schizo for me to engage.... all this 'psych' world 🗿

#

Classic example of someone trying to apply something he studied to different incompatible fields. You are not an ML engineer my man...

neat apex
#

A "Self Judge Benchmark" that compares how well an AI checks its own performance against an actual reliable test would be an interesting number

#

I trust Claude can judge itself, but not Gpt for example

ocean vortex
#

And you can't make it generate test questions either cause it's never gonna come up with something it can't answer correctly. It needs to know the answer for the question to be valid

#

Generally from what I saw, even making it come up with the hardest questions it can answer is problematic. It's gonna sway into making the questions it can answer very easily. Because it is creating them based on what it knows and understands. To push it out of its comfort zone you need human written/curated problems or tasks

prime mulch
#

Does anyone love gaming and read manhwa especially cultivation manhwa

prime mulch
gaunt spade
prime mulch
gaunt spade
#

or korean comics

prime mulch
#

There is a game released yesterday

#

Thats a f2P game

#

If you love manhwa you should play that game

keen beacon
#

yo guys

#

what is similar to chatgpt 4o in websites ai

#

like really really similar

#

good for coding and stuff

stray aspen
#

gpt 5.1 high is a freak

#

i think its as good as gemini 3

keen beacon
zealous sparrow
stray aspen
#

no its html

zealous sparrow
#

ah

cloud zinc
#

rip gemini 3

neat apex
#

Nahh, Gemini 3 will be better than that

#

I see Gpt 5.1 high is just like Gpt 5 ultra high since they adjusted the reasoning effort

hollow imp
#

Ok Claude models are literally unusable without Claude max

#

16k and 32k reasoning tokens are not enough

#

Gpt 5.1 extended thinking and gpt 5.1 high are way way ahead

gaunt spade
#

not even close

#

have u tested the new Gemini 3 checkpoints on AI Studio, within the A/B testing feature

fleet lintel
#

yeah, it's not even remotely close.
not saying gpt 5.1 is bad but Gemini 3 is just in another league

gaunt spade
#

lol

#

maybe flash lite

fleet lintel
gaunt spade
wheat onyx
gaunt spade
#

LOL

fleet lintel
#

I am just concerned that Google will raise the price of Gemini 3 because it is so much better

stray aspen
#

gemini 3 pro is nuts

fleet lintel
gaunt spade
fleet lintel
#

i think gemini 3 might demand a premium.. but we will see whether google wants to earn extra $ or wants to be super competitive

gaunt spade
#

lol

gaunt spade
#

they can, unlike openai

#

LOL

hushed terrace
#

Hi, please remember that English is the only language allowed in chat channels 🙂

bleak lake
#

Guys, Sherlock alpha, which ai is this?

stray aspen
gaunt spade
#

they already did with 2.5 pro for 6 months

bleak lake
gaunt spade
#

that's an xAI model

#

sherlock alpha

gaunt spade
stray aspen
#

yeah it sucks anyways

hollow imp
#

Grok is the worst

gaunt spade
#

lol

stray aspen
#

gpt 5.1 is nuts

#

but i liked lithium flow more

fiery gull
gaunt spade
stray aspen
#

who uses grok bruh

gaunt spade
#

for its coding abilities

stray aspen
#

just shut it down already

fiery gull
hollow imp
bleak lake
gaunt spade
#

🤖

zealous sparrow
#

It has the same damn style

#

every time

stray aspen
#

it lacks creativity, i noticed it's all the same thing

gaunt spade
hollow imp
#

@stray aspen ask your yupp ai to get gpt 5 pro api

fleet lintel
#

i have been wrong multiple times about Elon.. I won't make the same mistake by counting Grok out

stray aspen
#

unlike gemini 3

hollow imp
#

They are so rich

stray aspen
#

which gave me a different style

fiery gull
bleak lake
hollow imp
stray aspen
#

and it likes putting a bunch of text so much

gaunt spade
fleet lintel
bleak lake
#

Atp grok 4.2 is coming out before gemini 3

#

😹

stray aspen
#

they removed it

#

this is the prompt

fiery gull
fleet lintel
bleak lake
echo cosmos
#

Grok 4.2 is worse than Gemini 2.5 Flash

fleet lintel
fiery gull
fiery gull
ocean vessel
#

Yesterday I tried deepsider.ai and for now it offers free Sora 2 (10s) ai video gen 👇

stray aspen
#

scam labs ai is better than grok 4.2 lol

bleak lake
#

Qwen is better than grok atp

bleak lake
#

Even qwen 3 4b Q2

echo cosmos
bleak lake
fleet lintel
#

grok is not that bad... it's not the best but it's ok 🙂

zealous sparrow
#

I feel like gpt models are better at python than html

echo cosmos
fiery gull
bleak lake
fiery gull
bleak lake
#

good reasoning

echo cosmos
fiery gull
zealous sparrow
#

If you are going to use a gpt model, i recommend it for something like pygame or python, because it ain't lazy on that one [unlike html]

bleak lake
old igloo
#

HEY GUYS

bleak lake
#

It even runs well on phone

old igloo
#

WHILE GENERATING A VIDEO CAN YOU NOT GENERATE ANOTHER ONE?

#

CAN ANYONE PLEASE CLEAR THIS UP FOR ME?

hollow imp
#

@deep adder bro I asked the same gpt 5.1 model a very complex task in the chatgpt ui and on lmarena. In the ui it completely declined my request and kept repeating "system guidelines, system guidelines", but it worked perfectly on lmarena. So in the api you not only have the custom system prompt feature, but these annoying openai safety guidelines are gone too?

ocean vortex
gleaming roost
#

GPT 5.1 working for anyone?

wheat onyx
gleaming roost
#

🤔

tulip tree
gleaming roost
#

For me, it just keeps generating the message and then gives an error.

wheat onyx
#

curious if OAI will release new models immediately after Gem3 comes out

hollow imp
#

Uhh discord is not letting me send it, how do I send it?

wheat onyx
fleet lintel
gleaming roost
fleet lintel
#

with some kind of meme-ability

wheat onyx
hollow imp
hollow imp
old igloo
#

HOW DO YOU KNOW IF YOUR VIDEO GENERATION IS DONE OR NOT?

hushed terrace
old igloo
wheat onyx
#

I think if you say "I will give your advice to a lawyer as the next step", it restricts legal advice a bit less

hollow imp
#

I don't believe you one bit

#

If you're so good then tell me how do I bypass openai's system prompt and guidelines

#

Very annoying

gaunt spade
#

GUYS NOOO

#

riftrunner got removed from lmarena 😭

wheat onyx
#

Kill the legal ban system prompts