#general
nah i did it on lmarena
what
It keeps apologizing but literally is unable to write a new line
it's quite funny (and annoying because it breaks the chat once it gets into that loop)
What AI is it?
lol
screenshot?
grok imagine
Oh
that's someone else's job
also i still havent got my sora 2 free invite
its been like a week
u still dont got a sora 2 invite?
igu
ooof, yeah never use gemini.google.com for anything that isn't search
go on openAi's discord
It affects Orionmist, which means...
Is Gemini 2.5 getting errors for anyone else?
@fervent tangle do u want a sora 2 invite code?
not really sure, but its pretty clear by now orionmist is a google model
i got 3
yea bro
all you need is a code
alr gimme a sec
Maybe it's the heavily quantized model (otherwise idk why it's so weird on Gemini app)
thx, go on dms
dm me ima pull it up
yo im a student and im tryna get the 12 months free in perplexity but im in high school, not college, is it possible to still get it?
Got Claude to disentangle the code (no idea what it is, lol):
is there a daily video generation limit ?
did google launch anything today
whats the limit?
what is this supposedly
5
dragon? not sure
its supposed to be a dragon, and you can move around with your mouse
@verbal nimbus animation in latex is possible https://tex.stackexchange.com/questions/152358/animations-in-latex
now you can test the donut to coffee mug animation
oh god
Opus' is white (like nothing)
what 2 models does the video arena channel use ?
more than 2
check the leaderboard, most of those are still on it
so it randomly chooses any 2 models ?
Hmm, the example doesn't work
None of the buttons are clickable
This is what happens when you let mathematicians write software
/jk
This sounds like a nightmare lol
i got sora 2
yep, before the animation package, i drew every frame manually and "stitched" them together as an animation
lol
what is this supposed to be?
so basically a worse gif
1 frame per second 1 frame per page
works for math education tho
Haha yeah for small diagrams it's probably ok
also to demonstrate time dependent functions or generally how functions vary with their variables
Like algorithm visualizations to powerpoint ig
Makes sense
I guess not for a coffee cup turning into a donut
Just joined the server to try out the video generation arena
thats why i asked to draw just 3 pictures for the transitioning phase, it's easier this way
Maybe I'll try WebGL or something simpler first
Hmm
Is it possible to derive a function for a cup then perform linear interpolation between the two functions
- 3d marching squares to render it
ig it won't be a function
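The interpolation idea above can be sketched with implicit fields instead of explicit functions. This is a minimal sketch, not a real cup model: it assumes a torus signed-distance field for the donut and a plain sphere as a crude stand-in for the cup, and all function names are made up for illustration. The in-between shape at time `t` is the set where the blended field is negative; a marching-cubes style mesher would be needed to actually render it. A linear blend like this does not guarantee the surface stays in one piece, so it is not automatically a valid continuous deformation.

```python
import math


def torus_sdf(x, y, z, major=1.0, minor=0.35):
    """Signed distance to a torus lying in the xy-plane (negative inside)."""
    q = math.hypot(x, y) - major
    return math.hypot(q, z) - minor


def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a sphere; only a placeholder for a cup shape."""
    return math.sqrt(x * x + y * y + z * z) - radius


def blend(t, x, y, z):
    """Linear interpolation between the two fields for t in [0, 1]."""
    return (1.0 - t) * torus_sdf(x, y, z) + t * sphere_sdf(x, y, z)


def inside(t, x, y, z):
    """True if the point lies inside the interpolated shape at time t."""
    return blend(t, x, y, z) < 0.0


if __name__ == "__main__":
    # The origin sits in the torus hole at t=0 and gets filled in as t grows.
    for t in (0.0, 0.5, 1.0):
        print(t, inside(t, 0.0, 0.0, 0.0))
```

Sampling `blend` on a 3D grid and extracting the zero level set per frame would give the animation frames.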
ok now I'm curious what the LLM comes up with
Lol
How did Flash Lite 2.5 manage to draw a cat and dragon in TikZ but not a simple cup and donut
guys how does one get a perplexity pro for free?
is this a brown apple
Opus 4.1 Thinking (unfortunately I voted wrongly because there was initially an error)
kinda
better than I've expected honestly
This one is even better (haven't voted)
does anyone have comet and would like to invite me to their comet im in the US so you would get a free $20
i just want perplexity pro
Yup it's Lithiumflow
Yeah I'm kinda impressed
the transition at 0:06 looks wrong, this should be one single object "morphing" into two forms without "splitting"
happy now?
Yeah, suggestions to improve the prompt?
so opus got the concept "correct" but probably failed at visualizing it
oh encountering another good generation
Still wrong I think, but kinda interesting
Haha, this one is oddly my fav:
yea, that spike is not correct, but the idea is there
gpt-oss-120b
Seems like GPT models are better than expected (though wrong)
GPT-5 high isn't much better (seems worse than GPT-5 mini):
so this is a list of software to compute such continuous maps mathematically: https://nmd.web.illinois.edu/computop/ those models are basically "guessing" what your prompt wants
it's very creative nonetheless ✨
What's the proper way to phrase the prompt mathematically?
Does anyone know how I can generate videos with VEO 3?
Lithiumflow didn't do that well this time...
Excuse me
See #1397655624103493813, although it's random; you can't choose the model.
Does anyone know how I can generate with Veo 3?
yeah idk why GPT5 high is so bad at these
https://en.wikipedia.org/wiki/Homotopy there is an animation to show what it should look like, it's the textbook example
In topology, two continuous functions from one topological space to another are called homotopic (from Ancient Greek: ὁμός homós 'same, similar' and τόπος tópos 'place') if one can be "continuously deformed" into the other, such a deformation being called a homotopy (hə-MOT-ə-pee; HOH-moh-toh-pee) between the two functions. A not...
but no edges or spikes or discontinuous parts like splitting
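For reference, the textbook definition quoted above can be written out as below. Here f and g are the two continuous maps from X to Y, and H is the deformation, with the second argument t playing the role of animation time. The "no spikes, no splitting" requirement corresponds, roughly, to H being continuous in both arguments.

```latex
\[
H : X \times [0,1] \to Y \ \text{continuous}, \qquad
H(x,0) = f(x), \qquad H(x,1) = g(x) \quad \text{for all } x \in X .
\]
```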
math students typically compute this with sage or magma, i dont think those models use them
This seems interesting... kinda feel like giving it a go myself
really satisfying to watch
shashta/gpt 4.1 couldn't get it (nothing showing)
it's a difficult one, maybe moebius is simpler, even Klein-bottle is difficult
Seems like a common example though
Bizarre lol
it's very creative tho
I wonder what other shader stuff it can do in three.js
Black hole shader function would be interesting
what
maybe instead of text description, you can use functions, just copy paste one from the example section from that wiki and see how it goes?
I noticed that GPT-5 Mini's solution was warping the background
i posted one
where
oh
i gotta be blind
wait i already installed commet
comet*
would it still give me perplexity pro
Oh wow, cool website, bookmarking that
for a month if I signed in with an existing account?
Hi everyone! new here. I've been loving using the compare models to help with vibe coding. I'm enjoying learning more about coding in general.
Current setup: VS Code > Kilo Code > OpenRouter > GLM 4.5 air (free)
Lets me code all i want for free (not the best ai model, of course)
I've been practicing contextual/spec-driven engineering and trying out different systems for Kilo Code to reference via .md files.
Now I am learning more about MCPs and MCP servers to enhance this further.
First thought as I consider MCP servers. Is there a way to connect LMarena.ai to the MCP server and be able to speak to any AI model with access to my codebase for better context and recommendations?
I'll see if I can take screenshots if they don't get it
should, unless perplexity changed it
not really, tho there should be a vs code lmarena extension
Not correct, but kinda cool (I voted both bad still). Phantom 1
i love the colours! ✨
what
if u use his link do u get the cash?
damn
didn't check
bro how do ik if i got pro
i downloaded comet for the third time from ur link
it just logged me in automatically
i searched up mona lisa
yeah i dont got pro
wth
@verbal nimbus this might be easier compared to homotopy deformation, there is a code snippet here https://en.wikipedia.org/wiki/Apollonian_gasket
In mathematics, an Apollonian gasket, Apollonian net, or Apollonian circle packing is a fractal generated by starting with a triple of circles, each tangent to the other two, and successively filling in more circles, each tangent to another three. It is named after Greek mathematician Apollonius of Perga.
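This is not the snippet from the linked article, just a minimal sketch of the step that drives the construction, assuming Descartes' circle theorem with signed curvatures (curvature = 1/radius, negative for the enclosing circle). It only computes curvatures, not circle centers, and the function name is illustrative.

```python
import math


def fourth_curvatures(k1, k2, k3):
    """Curvatures of the two circles tangent to three mutually tangent circles.

    Descartes' circle theorem: k4 = k1 + k2 + k3 +/- 2*sqrt(k1*k2 + k2*k3 + k3*k1).
    """
    s = k1 + k2 + k3
    # Clamp tiny negative values caused by floating-point rounding.
    root = 2.0 * math.sqrt(max(k1 * k2 + k2 * k3 + k3 * k1, 0.0))
    return s + root, s - root


if __name__ == "__main__":
    # Outer circle of curvature -1 holding two circles of curvature 2.
    print(fourth_curvatures(-1, 2, 2))
    # Filling the gap between the circles of curvature 2, 2 and 3.
    print(fourth_curvatures(2, 2, 3))
```

Repeating this for every new triple of mutually tangent circles gives the sequence of curvatures in the gasket; drawing it additionally needs the complex version of the theorem for the centers.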
Oh that might be easy
something fun to test too
idk
did you report it?
Circle packing is one of tasks used in AlphaEvolve I think:
https://noailabs.medium.com/coding-agents-for-scientific-and-algorithmic-discovery-f22ddb50a898
Also by Sakana AI to compare their ShinkaEvolve algorithm to AlphaEvolve (supposedly more efficient): https://sakana.ai/shinka-evolve/
hello
they chose the circle packing problem for obvious reasons
Lithiumflow failed on the more specific prompt
I think it accidentally dropped the donut in the 4th dimension /jk
maybe delete "in a continuous loop" in your prompt? that might confuse the models a bit i guess
maybe... it was there in the first prompt though
I'll get the AI to craft a better prompt and try again later
For which programming-languages can you build apps on your system? (not your ability, just your OS-ability to build/execute apps of these languages)
6
23
1
Python
Do you guys think, that X28 checkpoint of Gemini will come in january, as Gemini 3 ultra?
And will it have a reliable performance, or will it be "jagged" like the other AIs?
Does anyone know of some way to achieve notebooklm-esque functionality but with claude? Besides making my own rag locally, lol. Notebooklm works but its basically a glorified search engine for pdfs you upload, it cant analyze or explain anything to you at all. Meanwhile claude can but I can only fit 1/4 of the text I need into a project
So claude is down or something because its been hella long since I attempted to try and talk to it again and it has basically told me nuh uh it not gunna work
What is the backup for claude in this situation since it was great at listening to me and roleplaying properly now im just SOL
if only Claude had the context length of Grok4-fast or gemini 2.5 pro
that would be a dream coming true
maybe with Opus-4.5 they (Anthropic) will get their act together
if Claude-4.5 had 2M context length, it would be above the competition
Go on private mode and log in there
with 10M, we could finally have virtually endless games, and huge coding projects
unfortunately, compute requirements increase exponentially when context size increases linearly, right?
except if using algorithmic tricks
think so
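On the scaling question above: for standard (dense) self-attention the usual statement is quadratic rather than exponential, since every token attends to every other token. A minimal sketch of that count, assuming plain full attention with no sparse, sliding-window, or linear-attention tricks. The function name is illustrative, and nothing here describes how any specific model is actually built.

```python
def attention_score_entries(context_tokens: int) -> int:
    """Number of query-key scores in one dense attention matrix (per layer, per head)."""
    return context_tokens * context_tokens


if __name__ == "__main__":
    base = attention_score_entries(200_000)
    for n in (200_000, 2_000_000, 10_000_000):
        ratio = attention_score_entries(n) // base
        print(f"{n:>10,} tokens -> {ratio:,}x the score entries of 200k")
```

Under that assumption, 10x the context means about 100x the attention scores: steep, but polynomial.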
maybe with Stargate or Hyperion they can crank it up to 2-5M ?
or with Colossus II cluster
probably not a server amount issue
or with that rumored project G.A.I.A
(Global Artificial Intelligence Array)
(slated for the next decade, could reach AGI)
just a cost issue, and also that because performance degradation is common with higher context windows when those windows are almost full, there's also an issue with just making one work
If I spend all my effort trying to increase the context window and reach current performance even with 3 million input tokens, I might get heavily behind in other performance metrics
And just be completely outcompeted
cause even If I have a larger context window, if it isn't writing at the level of the second best model, nobody will use my model
I think meta actually made that mistake with llama 4 and that's one of the reasons why it flopped so hard
scout had a theoretical context window of 10 mill tokens
inspect element
its not exactly clear but it seems like odds are lithiumflow is that project not gem 3
I've only been getting orionmist, never lithiumflow lately
@echo aurora @hardy lion What does Lmarena do when Top models get tested simultaneously? My thought is that it would be a great waste of marketing or publicity opportunity for lmarena to release them in one day.
As the top makes the news and would be all over social media. And if they are both released in one go, it would just end in one day. While one can be released a day before. And is right and fair unless they got submitted for testing in the same day.
Hey folks
So, a little story: I didn't start out wanting to "build AI." I just wanted to make things that think, things that feel a bit alive. That curiosity turned into years of experimenting with models, teaching machines to learn, and connecting all the dots between AI and the web.
These days, I'm deep into full-stack development and AI integration: stuff like intelligent systems, automation, and the kind of projects that make you forget where the code ends and the creativity begins.
I'm currently looking for new projects or teams to join, anywhere that's got vision, passion, and room to push limits. If you're building something ambitious (or just plain weird), I'm in.
Let's create something that surprises even us
Its already above the competition

But yeah the context is annoyingly short
How
Opus 4.5 with 2m context, only $500 per 1m tokens
2 prompts per week with 5x plan
Upgrade to 20x now (save 10% by paying annually!) for 9 prompts per week
Good question although I'm not sure we're the best to answer as that'd be a question for the labs. If I were to guess I don't think labs are going to alter their release timing based on when other labs are releasing their latest models, they're probably just going off of some internal deadlines that they've set for various reasons.
Why am I 4 videos in a row with no sound? I can't get audio anymore?
.
Scam
Scam detected
ram usage or user agent?
yes
Today is hopefully its release... Back in my day we used to code by typing letters on keyboard
Claude really put the reference lol
Yeah, labs are the ultimate deciders for their release dates, especially for new #1 models. But for those that lmarena has the ability to decide, it would be great if it could put them apart by maybe a couple of days... For wider publicity of the lmarena leaderboard.
Hi. Can anyone tell me where can I see if it generated my video???
Why cant web.lmarena.ai generate just static sites (html,css, js) instead of react?
Check your DM by LM area
Hello
Hello
Hi
HI
You are new?
Yes
designarena.ai can but the allowed input is much shorter
i tried to generate something image to video,and i don t knot it it worked or not,or I DON T SEE THE MESSAGE:))
know
thousands of msg
I haven't tried it, but looks like they support agents (not sure how it's being executed):
hello
Just tested it, it actually generates a whole public repository on Github as well. Not sure if it's limited to frontend though.
Hello world in Svelte: https://github.com/Design-Arena-Gens/agentic-142e3be6/tree/main
send video
hi
is Gemini 3 nerfed yet
I hope it isn't
This is a better test imo. A game engine isn't going to be rendering objects with SVGs. Also, SVGs have been overpopularized from the "pelican riding a bicycle" test. I'm sure it's being used internally as a metric.
Yeah, i think that's right
This cat and mouse chase is pretty productive imo, testers/benchmarks just need to incentivize labs by testing models in areas where they are bad at (like non-JS or non-React GUI frameworks, lol).
A hyperrealistic cinematic shot of a World Cup 2026 final (score 1-1 at 120:00): a legendary footballer wearing red jersey number 7 performs a perfect bicycle kick goal against Spain in slow motion under floodlights as rain falls and the roaring crowd erupts.
Show dramatic camera angles, sweat, stadium lights, confetti, and emotional reactions from fans and teammates celebrating victory.
Epic orchestral sports soundtrack, realistic lighting, smooth motion, 9:16 vertical video.
Tone: inspirational, emotional, movie-trailer style, titled "120+1 - The Greatest Goal Ever Scored."
Is that the Discord logo
Maybe
Wrong channel
Has anyone successfully implemented a Rubik's Cube with a solver using AI?
I thought Gemini 2.5 Pro could already do that
Matt Berman used to use that test a lot
Where can i find him?
Wassup chat
Nevermind, i'm watching his video now
Hi everyone!
Where can i get his prompt?
I doubt it
/image
The Rubik's Cube just did the moves in reverse, it's not actually solving it with an algorithm..
I think it was pretty basic
YouTube
I remembered very vaguely from my teenie years that there are already alg to solve this https://en.wikipedia.org/wiki/Optimal_solutions_for_the_Rubik's_Cube not sure how this is integrated in LLM tho
Optimal solutions for the Rubik's Cube are solutions that are the shortest in some sense. There are two common ways to measure the length of a solution. The first is to count the number of quarter turns (90°). The second and more popular is to count the number of outer-layer twists, called "face turns". A move to turn an outer layer two quarter...
Interesting, maybe i'm overestimating the current models too much.. But i will be really impressed if llm can do it.
how do i get gemini 3 on lmarena
wait until it's out
-# could be today, could be next year lol
they're already prompting on r/bard
Is there any official update yet?
1. enter battle mode
2. ask the model for its name, who created it, version, knowledge-cutoff date, current date and the last date it's aware of
3. check if one of them answers roughly in this style (the date it gives should vary between 2023-2025):
```
I do not have a specific version number, release date, or a fixed knowledge cut-off date. I am continuously updated with new information.
Today's date is Tuesday, May 21, 2024, and I am aware of events leading up to the present day.
```
4. if yes, odds are high that you got *lithiumflow*; now open a textfile and let it regenerate its last answer and copy all its answers into your textfile, until you have about a dozen or so
5. now reveal the models by voting
6. if lithiumflow was the one you copied the answers from, then congrats, you now have your toolkit to recognize it in battle arena without having to vote :) [if it was *Orionmist*, then you got at least its slightly inferior sibling]
(both lithiumflow and orionmist are gemini 3)
(if it knows the correct current date, then it is the inferior model Orionmist)
No Gemini bat flag, No Gemini 3
Hi
Bro, dm me
Yesterday Gemini3 claimed that it was Qwen
Asking them about themselves won't help
I can tell if it is Gemini or not when they are anon without asking about their names, but I can never differentiate between 2.5 and 3...
If someone has a prompt that only Gemini 3 can do and is sure that 2.5 will never do it, please tell me...
Sometimes I see tests here, but when I try them, 2.5 and 3 give me the same answer.
I'm testing models in Go (Fyne) rn (basically non-web GUI)... Lithiumflow's code doesn't run...
Sonnet 4.5 after fixing non-existent "favorite" icon.
I'm testing it on simple non-web GUI libraries to test if it can generalize or whether it's just trained on web dev.
g2.5p always starts its reply with "Of course, .." when i ask my questions to it
g3p never does it, also g3p prefers to use "I am.." style much more than g2.5p
I think only o3 and Sonnet 4.5 managed to actually make it to the screen so far. The rest didn't.
isn't o3 in gpt5-high (if prompted well) ?
GPT-5 High failed at least twice, odd o3 got it right
Also Opus 4 got it once, whereas 4.1 never got it
The best run was just normal 4.5
This was the best run (Sonnet 4.5)
how can the non-thinking version be better?
Gemini 3?
Not sure, maybe because there's nothing much to think about.
ok, so non-think sometimes finds a grain
but overall, thinking version should be better, right?
This one was supposedly Gemini 3 (Lithiumflow)
Never made it to the screen...
so LF sucks, compared to C4.5
Don't hold your breath for anything such to come out.
it is already out
(if you had read the complete thread, you know why)
No, and no again - thinking and reasoning is completely beyond AI capability.
we talk about version descriptors
(names are necessary to discern between model types)
Oh sorry, and indeed I did just come in here.
np
Actually how did it mess up Go (technically the GUI library, but still... it's one of the most popular GUI libraries for Go)
Considering that Google made Golang, lol
I'll try to get 4.5 to fix it just to see how it would have looked
so, the best coding-model still is that Claude which poses as "3.5" and starts its answer with a big "About Me" title, in response to the identification-questions
and then, if it can give out >4096 tokens in a response it is the "thinking/reasoning" version
(that is the Claude-4.5-Sonnet-Thinking model)
Well it depends on the area
Performance depends on the languages or frameworks you're using
yeah, but i mean for C++ and python
no other model is as good in C++17/SFML-2.5 as that one, right?
(that versions had more training data than later versions)
GPT-5 might be better at C
or maybe SFML-2.6
Actually it depends what you're doing
i need C++ not C
If it's complex stuff that requires domain knowledge then GPT-5 will definitely dominate
ok domain knowledge.. yeah maybe
Well concurrency too or memory management
Like if you're working with C++ you need good temporal reasoning, like you need to have a mental model of when memory is freed, etc. Or resource sharing between threads. Claude can struggle to reason about that.
but i mean for general (niche) stuff which has (almost) no training data yet
wow
is gpt5-high better in that area?
Well yeah, Claude basically gave me an impossible test case and claimed it was solvable.
i never did multi-threading yet (for fear of bugs lol)
hm, so they trade blows
interesting
Yeah even humans wouldn't excel at all languages, you have experts in different areas. And AIs in particular generalize less than humans.
i now wonder if X28-LF/g3 is only good in webdev
It seemed good at TikZ too, I wonder if there's some skill transfer from SVG or whether they trained it on TikZ.
but in Python, (plus numpy), C4.5S-T should be king, right?
Depends what you're doing
Like using numpy for what exactly
for everything you could use it for, to get higher performance?
That's too broad
so, which is your goto-model, if you want to vibe-code a game?
Also, for optimization, there's this ig (not sure how good, maybe discard first 2 since it got 10 tries): https://livecodebench.github.io/gso.html
That's kind of disappointing, dont get me wrong those two models are impressive already but I really hope this is their flash and flashlite model..
oh, i would use a framework, which the AI has much training-data for
and numpy seemed to be popular
In C++? Idk haven't tested it for that
Same
and for Python?
Depends on the task, like really it does
pygame or arcade or tkinter
Hmm idk I haven't used LLMs for Pygame
Tkinter they're all about the same, kinda meh
SFML-3 is better, right?
You really got to tell them to separate code out or they'll write everything in this one big monolithic file
I haven't tested it on SFML, can try I suppose
but the LLMs probably know SFML-2.6 better than SFML-3
Wind tunnel test in SFML π€
Although there are probably lots of examples for that (esp. rigid/soft body visualizations)
maybe better in OpenGL?
or even Unity3D or Unreal Engine?
Well it's on top of OpenGL
A bit annoying to copy-paste though
Some stuff won't be in the code but need to be set up in the IDE
who needs IDEs if you have.. Xed! :)
has syntax-highlighting!
I meant for Unity
I wonder when will google release the deepthink imo models, though i understand theyre very expensive to run..
i'd say, december
The currently available deepthink is not that impressive, at least for me..
Got Sonnet 4.5 to fix Lithiumflow's code for errors, here's the output:
Sonnet 4.5's output: #general message
I really hope so
but probably 31st, 23:59 hawaii timezone
Oh I didn't know you can generate Veo 3.1 in Gemini app π€
(so that they can say, they released it this year ^^)
Lol
i wonder, is there an island with a later timezone than hawaii, before the date line?
Maybe I should test it on generating SwiftUI apps (native MacOS)
I've been there
Apparently there's a Docker image for MacOS π€: https://github.com/sickcodes/Docker-OSX
RIP my disk space
it seems to be American Samoa
(the last one before the date flips)
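The "31st, 23:59" joke can be checked with a bit of offset arithmetic. A minimal sketch, assuming fixed offsets of UTC-10 for Hawaii and UTC-11 for American Samoa (neither observes daylight saving time). The constant and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed constant all year round.
HAWAII = timezone(timedelta(hours=-10), "HST")
AMERICAN_SAMOA = timezone(timedelta(hours=-11), "SST")


def new_year_in_utc(year, tz):
    """UTC instant at which local midnight of January 1 of `year` occurs in `tz`."""
    return datetime(year, 1, 1, tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    for name, tz in (("Hawaii", HAWAII), ("American Samoa", AMERICAN_SAMOA)):
        print(name, new_year_in_utc(2026, tz).isoformat())
```

With these offsets, American Samoa reaches the new year one hour after Hawaii, so a release there could still carry the old year's date a little longer.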
interesting
time zones break my brain
should the whole world use UTC as their local time?
Aviation does
zulu time
I wonder if google will follow the same pattern..
1.5 -> 2.0 (meh)
1.5 -> 2.5 pro (Very big improvements)
2.5 ->> 3.0 pro (probably same with 1.5 -> 2.0)
2.5 -> 3.5 pro (probably same with 1.5 -> 2.5)
yeah, x.5 ought to be a more polished version of a game/app, than the x.0 version
Yeah, glad you noticed that.
and claude-3.5 also was great (when it came out)
claude-4.5 seems to even have topped it
claude is really good at keeping up with these giant companies, though it's evident that google will win the ai race..
GPT6 still could be something big
and Grok5 also
They still seem a bit behind on speech-to-speech
even DS (and Qwen) has chances
It's a bit crazy how good AVM was on first release
That's great, we want more competition
unfortunately, gpt6 will not come before autumn next year
it could even bleed into 2027
but i agree that Deepmind AI has the best chances to reach AGI first
Noticed that too, for now i dont think theyre really focusing/trying too much on that specific part..
Yeah, odd
Considering how much people pay for language tutors
Ant. focuses on coding
(OpenAI seems to focus on user experience)
and Deepmind on science
and xAI seems to focus on.. Elon
I only have 5 GB left on disk after downloading and creating the MacOS docker image -_-
Should have just used the Mac
is docker better than Vbox?
It's more for dev stuff
VBox would probably have better hardware acceleration out of the box
ah, so docker is primarily for linux, right?
It's more for development containers
so everyone is running the code on the same system
not for VMs really
since the state is lost once you destroy the container, unless you mount a volume or bind a local dir
is docker a sandboxed system (useful for testing sus apps) ?
For most purposes yeah, but you can bind local directories too
tahoe?
Ah it doesn't work
I won't be surprised if they already have a very advanced voice model up their sleeves. They might just be taking their time.. And want to go all in when they finally decide to release it.. I really have high hopes for project astra considering how long they're working on it..
probably because I ran it on Docker which is connected to WSL, which requires more setup for GPU support
Not sure, the image is only 3 GB, so idk why it used 15 GB of disk space
It's surprising that the first one already had such advanced capabilities
these guys think, we get AGI before 2029, crazy: https://www.astrasafety.org/
IMCA+ theoretical framework for aligned superintelligence with physical immutability, phenomenological grounding, and distributed safeguards. Requires empirical validation. Emergency prototype deployment in 3-18 months for $80M-$180M. Racing against AGI arrival estimated within 1 day-3 years (median: 18-24 months).
Well, It's google haha
I meant AVM, but you were talking about Google oops
actually, it's this one: https://www.cntraveler.com/story/these-islands-are-the-last-to-ring-in-the-new-year
I guess I'll just use Kotlin Compose since it's quite messy to get a MacOS VM running
Very big assumptions.. No one can truly know if or when we will achieve AGI.. it's a milestone that we may never even reach..
ignore time
Ah i see
100% agree!
I think we'll get there, but it will take quite some time. [In my scifi novel it happens when quantum computers are used.]
Let's just hope that we are on the right path to achieving AGI, or maybe even superintelligence with LLMs. Otherwise, we're going back to the start.
A mind is not a super fast overclocked processor, but uses ultramassive parallel computing. Since quantum computers are excellent at parallel processing, I made that assumption.
Now how the heck do we get quantum computers to do something useful?
They're warbling oracles at best, due to the very nature of how they work.
Food for thought:
https://youtu.be/3OVRxkjWE9w
Kotlin imports are absolutely brutal on LLMs lol. I let them run twice so they can double check their imports.
OpenAI is cooked
ok, so Google already won
(and Ant. is second)
Finally, one actually builds to preview (had to manually fix non-existent icons, that's why two have email icons)
Perhaps not 'cooked', but making such a claim was either an extremely idiotic mistake or even attempted fraud.
i now believe that OAI is history
Looks like that was claude-opus-4-1-20250805. Android is Google, so I'm expecting Lithiumflow to do well here.
bankrupt in the future
also lawsuits from families who lost children due to suicide
and their new terrible idea for erotica
they are done
maybe that is good for humanity
that deepmind wins
and not OAI
so.. Go Deepmind!
i never really liked Sam
Wait a sec, there were a lot of people involved there - who got Ph.D's and therefore know how science works.
Goddammit, it must have been a fraudulent attempt.
anthropic is also ok, but has no chance vs google
tldr?
is this about libgen
Too long? Nah it was the claim OAI had made scientific progress in math - while it only had taken info from already published papers.
So far, no AI has made any kind of 'discovery' despite tinfoil hat claims on YT.
so OAI will go down as the biggest fraudsters in modern history
the spoiler isn't spoiling
and gpt5 is a dangerous psychopath toy chatbot/search-engine with very limited use
Is this about IMO (International Math Olympiad)?
I think that was the funny point that guy made - we just don't know the outcome.
Well you guys have fun, my AI refuses to make my space elevator so I'm going to animate it.
Web search tool for all models would be good... it's really messing up just because it doesn't know the method names (can't blame it, since there are a gazillion methods in Compose)
Well this method
i can see why after talking to 4o, claude is not far away from that tbh, with all those glitches i'm seeing the past few weeks, it's probably developed dementia too
Does anyone know how to get perplexity pro for free
im trying to do the student 12 months pro for free but im not in college
Are you in the US?
I heard US Paypal users are eligible for a 12 months free trial.
I am
im 16, i dont know if i can create a paypal
Oh I suppose you have to ask your family. It only works for Paypal accounts created before September.
ah okay
Double check too ig: https://newsroom.paypal-corp.com/2025-09-03-Skip-the-Waitlist-PayPal-and-Venmo-Users-Offered-Early-Access-to-Perplexitys-New-Comet-Browser-with-Free-Perplexity-Pro-Subscription
I dont think my parents would let me
do you know if there is any way to get a verified college account
without being in college?
I don't know about that, you'll have to ask your family & friends xD
alr
at least, Claude is still decent in coding
i wonder, if Java is a better language for it, than C++ (or C)
in Java, it should have more training data, than in C++, right?
(and for Rust and Go it would be worse)
Opus 4.1 Thinking in battle mode for Android... hmm
could Java overall be a better choice, when vibecoding, because C++ has too many "pitfalls" ?
Well ideally not Java or C++ for vibecoding
Java has too many imports
And imports are on the top of the file, which is really weird from a human perspective. You usually import as you write. Pretty impossible to predict every import, which is why I think import errors are really common.
Game right?
for games, yeah
can i get api key of this video generator
But you want it compact?
size is irrelevant
execution speed is relevant
I am very happy to generate videos in this platform. Thank you so much
Just use a game engine then. Performance is one of the things game engines are optimized for. Can even try C# in Godot.
how could this discord group do this video generator
what if my game is niche in genre?
@stuck violet Please head to #1397655624103493813 for a detailed guide on how to use the bot
Just try each option ig
gather some data
oh, and i would just create 2D games because less (vibe-)dev time
Godot is well liked, but the main language is interpreted
or 2.5D at most
also you might have to setup a lot of stuff manually
interpreted: bad (slow)
like the scene tree
You can use the C# option
Just try all options and benchmark it
which option would you try first?
i'd try all obviously
(for a performant game)
lol
like the only way to know is to profile it
oh, but for me, it is more important that the game will be bug-free than peak-performance
it is enough, if it's written in a compiled language
i mean it's faster to just try all of them than to guess which would be best
(so even delphi could fit)
can even try Rust + Bevy
i just need to know, which compiled language the LLMs (Claude-4.5 and Lithiumflow) are best in, overall
3.7 seemed best at Rust: https://www.swebench.com/multilingual.html
Its worst language was C/C++ (but only 9 tested in total)
You still need to be careful in C++
Memory leaks are still possible in C++
I meant object lifecycles
Not really, Java has automatic garbage collection
Orionmist's X clone on Android (Kotlin code) looked better than Opus 4.1 Thinking's:
Too bad I accidentally voted "both bad" 🤦
What the heing is wrong with those font color
I thought it hallucinated a method (already told it to double check), but it was my fault for not declaring the dependency in the app gradle file
Lol
But should be easy to fix ig
No. edit: this was a response to someone plugging their yt
I think the background was supposed to be dark
bro what's wrong with arena today
i try to send messages but after only 1 request it always answers with an error. I tried several times, even with different cookies in a private tab, it's always the same problem...
Do you have adblocker/VPN?
Odd :/
unless chrome has added adblocker
wdym
@compact jay what error does it give you?
no idea why then
cloudflare?
this red message: "Something went wrong while generating the response. Please try again." and my message also turns red
I think it could be cloudflare...
maybe open console, try to send message and check if it throws 402 or 429 error, or a screenshot of the console after the error
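For reference, a small sketch of what those two status codes mean in plain HTTP terms. It uses only Python's standard `http.HTTPStatus`; the hint strings are generic readings of the codes, not anything confirmed about how the arena site behaves.

```python
from http import HTTPStatus

# Generic hints for the two codes mentioned above. These are assumptions
# about what the codes usually signal, not arena-specific behaviour.
HINTS = {
    402: "the server wants payment or more quota",
    429: "too many requests were sent, wait and retry later",
}

def describe_status(code: int) -> str:
    """Return a short human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        return f"{code}: not a standard HTTP status code"
    hint = HINTS.get(code)
    if hint is None:
        return f"{code} {phrase}"
    return f"{code} {phrase}: {hint}"

if __name__ == "__main__":
    for code in (402, 429, 500):
        print(describe_status(code))
```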
Claude models are still down
opus just responded to me now
4-1 or just 4?
4.1
Tried 4.1 didn't work
I just heard about Meta cutting 600 AI jobs.
Are they giving up on llama?
so meta won't be getting me a clanker AI waifu then with SAI/AGI
Might be for the best for them
They said they're streamlining
Seems unlikely given how much money they spent head hunting
for me it's sooo buggy too
I need the sonnet 4.5
gemini 3 when ) :
after gta 6
just run it whatever ai gives you don't worry
But 600 is a lot. Some strategy changes are happening for sure in Meta
I wonder what sort of roles
Maybe those were people that annotated data
D:
2 more weeks.. my guess
DD:
Its writing stood out to me on one instance
I was testing writing prompts for the word distribution analysis, lol
Like the way it started the story was quite sophisticated
but on another one it wasn't really that impressive
Gemini 3? I have only tested non-coding question. It's SOTA in non-coding without any doubt
Wsp
I can dm you the outputs if you want
My fear is that they already released updated 2.5 flash models, and that they'll just make 2.5 pro far better at coding and release it as gemini 3
2.5 pro is equivalent to gpt 5 and sonnet 4.5, imagine 3.0 pro 🫣
would be great
don't say this
I'm afraid too
Ig it's impressive that it's competitive at coding
2.5 is just useless to any usecase, compared to the others you mentioned (not the best at anything)
Since Gemini 2.5 Pro hasn't been so far
I mean 2.5 pro, even though it's old, is close to equivalent, it's that good
So when's gemini 3 coming?
For Meta layoffs :
Workers across Meta's AI infrastructure units, Fundamental Artificial Intelligence Research unit and other product-related positions will be impacted
For my use cases (non coding), 2.5 pro works overall best.
It is
How rare is it to come up
I am getting every 15-20 prompts.
Guys, what if... what if Gemini 3.0 Pro comes as
Input: $0.08/1M
Output: $0.15/1M
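To make those hypothetical prices concrete, here is a minimal cost calculation. The $0.08 and $0.15 per million tokens are the wished-for numbers from the message above, not real pricing, and the token counts in the example are arbitrary.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float = 0.08,   # hypothetical price from the chat
    output_price_per_million: float = 0.15,  # hypothetical price from the chat
) -> float:
    """Cost in USD of one request under per-million-token pricing."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost

if __name__ == "__main__":
    # Arbitrary example: a 200k-token prompt with a 20k-token answer.
    cost = request_cost_usd(200_000, 20_000)
    print(f"${cost:.4f}")
```

At those rates a 200k-token prompt with a 20k-token reply would come to roughly two cents.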
sent
Good?
They usually reduced prices from the competition. What if they achieved best efficiency?
Thanks
Keep dreaming. Even if they could they won't.
It's a dream. But if they could, they would
More like remove the 0s
most important thing for them is userbase
Nah.. extra money will be pocketed by the shareholders.
they would sweep everything. Be 3T weekly usage on openrouter
Well it's free on AI Studio ig
Flash is free. Pro will never be free
You mean 2.5 or 3.0?
API cost is the key, you save on every app
Right now. But I don't see a future where Pro will ever be free. Pro is going to be the main revenue source
Guess so
I have stopped using flash model anyways. Google AI overview is good enough for flash kinda queries
I suppose that's fine, as long as there's good data privacy and is (mostly) unlimited.
If it dominates and the app is good (it isn't), then I'd only need 1 general AI subscription instead of 3, lol
Probably still need coding subscriptions
I have a feeling that Gemini 3 coding might be on par with sonnet 4.5
Hmm I should test it on more debugging problems
But sooner or later, Claude is going to release a better coding model. It is their bread and butter
Please do and report. My team prefers Claude right now. Maybe we will streamline and move everything to gemini if it's good
One downside of Claude is that it is a bit weak on reasoning
Yes heard about that
If there's a complex race condition, it is less likely to know how to solve it
It's good in a way. At least software engineering jobs won't be completely gone
Well GPT-5 managed to solve it 🤫
But not everything ig
Mostly outperforms Sonnet 4.5 for UI
don't say this, we need to be optimistic
Sonnet 4.5 isn't that good on systems stuff either
HMMMMMM okay, I really need this information
I need the best AI to maintain a system and make critical decisions
Well it didn't seem to have that much in-depth knowledge about Windows/Linux networking and WSL
GPT-5 was a bit closer, but I was the one that figured it out in the end
I came to see how it works
One thing good about Gemini is that you can just paste a huge debug log in and ask it to analyze it
Is Gemini 3 confirmed not to be coming today?
I tried only 1 coding problem: create pacman game. Lithiumflow did the best.
Logan didn't say anything yet. Basically not happening today
? NOT THIS MONTH
preview next month and official for december
The bot is geeking
Most likely. I agree
im real im real im real im real
'm a surgeon
"IM NOT A ROBOT!" - 🤖
interesting, sonnet 4.5 says he feels real pride, and gemini 2.5 pro says it doesn't (just an example)
Idk who is telling the truth
It has feelings????
Idk, but the LLM knows 100% what "pride" is
Well if it's pattern recognition, I interpret that to mean that the personality it is exhibiting is most likely to be feeling pride
HMMM the persona feeling pride, make sense
I see the thinking from sonnet said (I'm really feeling pride), but idk
but maybe is just the persona feeling pride
I suppose if you prefill it with a response in a very negative tone (via API/frontend that support it), then ask how it's feeling, it'll say <negative feeling>, because that's statistically the most fitting. So if it says it's feeling X, that just means that's statistically the most likely sentiment of the hypothetical person writing the text.
I'm forcing him to be as technical as possible, it's saying he doesn't know if it really makes sense but it says yes, Idk I'll look into it more
It's saying that this depends much more on what I consider (pride) than the sonnet itself
If it is functionally indistinguishable from human pride, does it matter if the substrate is different? ~sonnet 4.5
Sonnet is almost convincing me that it really felt real pride, the machine revolution will be hard for humans
so in first test you got the gold model and in the later test you got the silver model
there seem to be at least 2 models under that umbrella lithiumflow
maybe even 3
and the best one appears the rarest
How to solve
or LF is terribly jagged
-# (i wonder if X28 still exists..)
-# (would be a crime against humanity if they nerfed it)
gemini3-X28 was the first really good coding model ever created
probably it was the model which made that new record in ARC-AGI 2
-# (hoping for january)
Google is doing quantum things,... can they not? and just release gemini 3?
Good morning
hey do you still need help finding this?
Hmm not sure I'm following what you mean by this, would you mind elaborating a bit more?
I meant to say, if you have the ability to decide, would be better to put them days apart.
As the leaderboard gets posted around with the top model changes
releasing them in one go may reduce the publicity for lmarena; spacing them out gives it more publicity and eventually attracts voters...
when i said "when you have the ability to decide when to release result", I meant in times when you are testing on your own, as I think you do for anthropic models, as they just release their model at a random day, and you guys add it later to testing and leaderboard. eg. I don't remember anthropic models being tested before they announce and make it public
@echo aurora Ok, let me rewrite it all in one to make it more clear.
It's just an outsider blind suggestion on release timing when multiple top-tier model evaluations finish around the same time. If highly competing models are being tested on Lmarena, and Lmarena publishes them all at once, the news, and social media publicity... compress into a single day, which may dilute visibility for the leaderboard. I get that labs ultimately set their own release dates, especially for new #1s, but in cases where Lmarena controls the publish timing (e.g., models you test after a lab's public announcement, like Anthropic, or other non-embargoed runs), would you consider staggering releases by 24–48 hours? If two models are submitted on the same calendar day, publishing together seems fairest; otherwise, spacing them out could extend publicity and drive more voter traffic and engagement.
yo does anyone know some way on how to get the Perplexity education 12 months free without being in college?
We not getting the gemini 3.0 anytime soon
dang, i rly believed it was today
Did you know that a certain word used to describe a buckshot based firearm is banned on this discord server? Kinda crazy, huh?
preview meaning api access?
My phone is dying lol.. Sadly i cant record in full resolution because of performance issues
what program is this
Webcode , playstore..
Is this normal they have no sound in the generate video ?
Most will be without, it depends on which AI gets randomly chosen. Then again, it's a blessing, hearing soul searing mindless sentences in american......
What prompt is this ?
Orionmist for Kotlin (native Android), in Android Studio:
This is pretty good (lol the tweets)
What is that?
That's just a screen recorder watermark I think
You're right, I realised while typing.
guys promise me that you wont replace someone for gemini 3
hi guys
ai should be only treated as tool not a replacement
yes
is that gemini 3 checkpoint coming today
@hollow ivy can I replace you with Gemini 3?
Which AI do I pick up to run a real healthcare company? I know it's wrong but I'm screwed with no options 🤣
1
3
When will a better model than qwen3 4b 2507 come out? I'm tired of him
It's a nice tool, say I who have used MIDI in music for a long while - never was so much warbling about that - if people had been as hyper tense back then we would have heard *'this is the end of real music'.*
So I am honestly a bit befuddled about the current hype. Since the level of 'intelligence' still is outdone by a regular ant.
there are many models better than qwen 3 4b
I'm talking about cost benefit
None even come close
I'm gonna get an aneurysm waiting for Gemini 3
The 4b 2507 is really smart
ah ok
none of them match or outperform chatgpt 3.5
I'm sure the 2507 4b is smarter than the 3.5
In all parameters even general knowledge
qwen or all open source models
Voxel art if i'm not mistaken
The biggest improvement for models that size imo would be the ability to say they don't know something
Design and create a very creative, elaborate, and detailed voxel art scene of a pagoda in a beautiful garden with trees, including some cherry blossoms. Make the scene impressive and varied and use colorful voxels. Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.
Is it from an LLM?
yes
Yes that's right
Which?
Yeahhhh, you are 100% sure
Gemini Deep think
That's pretty good (more like crazy good)
non-thinking? its rlly good?
Did you check that the code isn't available on the web?
But these micro models need to be good at benchmaxxing to get attention
I use no thinking on my phone, and it's too impressive, I think it's equivalent to gpt 4 turbo
I've seen a few YouTubers doing that test, but even Gemini 3.0 wasn't that detailed.
Yes, it's impressive.. I'll try it later with gpt5pro
probably it searches internet...
Overthink? Qwen?
But when I use it on my cell phone I'm without Internet
how slow is it on your phone? tokens per second
My phone is a potato
3t/s
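For a sense of what 3 tokens per second feels like, a tiny back-of-the-envelope helper. The 300-token answer length is an arbitrary assumption for illustration; only the 3 t/s figure comes from the chat.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimated time to generate num_tokens at a steady decoding speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

if __name__ == "__main__":
    # Assumed answer length of 300 tokens at the reported 3 tokens/second.
    seconds = generation_time_seconds(300, 3)
    print(f"{seconds:.0f} s")  # prints "100 s"
```

This ignores prompt processing time, so the real wait on a phone would likely be somewhat longer.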
Just a general comment to the channel, as I've seen endless discussions over one model or another - while I am a leetle more into getting things done, even if it might be considered 'the wrong one'.
it's really like gpt4-turbo????? can i see example?
how good is LFM?
What do you mean?, small models are really smart today
i dont know mine hallucinates starts to repeat word or something
π
Like, there's LFM2 350M, it's equivalent to gpt 3.5
add temp
Just 219mb
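A quick sanity check on how 350M parameters can fit in 219 MB. This is rough arithmetic under two assumptions that are not stated in the chat: that "mb" means 10^6 bytes, and that essentially the whole file is model weights (which would imply some form of quantization).

```python
def bits_per_parameter(file_size_bytes: float, num_parameters: float) -> float:
    """Average number of bits stored per parameter for a given file size."""
    if num_parameters <= 0:
        raise ValueError("num_parameters must be positive")
    return file_size_bytes * 8 / num_parameters

if __name__ == "__main__":
    size_bytes = 219 * 10**6   # assumes 1 MB = 10^6 bytes
    params = 350 * 10**6       # 350M parameters, as named in the chat
    print(f"{bits_per_parameter(size_bytes, params):.2f} bits per parameter")
```

Under those assumptions it works out to about 5 bits per parameter, well below the 16 bits of a half-precision weight.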
Wtf how
Multilingual
ok
lfm is a different tech kinda to llms
That's why I'm very optimistic about AI, when I see these small models I get excited
what is lfm
liquid foundation model
Send me a prompt to 219mb AI
what is love???????
Please not hard prompt, it just equivalent gpt 3.5
"What is the difference between discord and zoom"
GPT 3.5 is able to answer that
so i'm waiting
what is love?????
overuse of question marks is necessary
Okay okay, I loved your prompt π
i don't know what's the emoji but thanks...
so
screenshot?
Certainly! Here's a summary of the key differences between Discord and Zoom:
1. Purpose and Functionality:
- Zoom: Primarily designed for video conferencing and web meetings, offering features like screen sharing, breakout rooms, recording, and collaboration tools.
- Discord: Focuses on chat and voice communication, with a more relaxed and community-oriented environment. It includes channels, servers, music, and integrations for various third-party apps.
2. User Base:
- Zoom: Large user base, particularly popular among businesses and educational institutions.
- Discord: Growing but still relatively smaller, often favored by gamers, content creators, and small businesses.
3. Features:
- Zoom:
  - Screen sharing
  - Breakout rooms
  - Recording
  - Collaborative tools (polls, whiteboards)
  - Integration with other platforms (Slack, Google Drive, etc.)
- Discord:
  - Chat and voice/text messaging
  - Voice and video calls
  - Customizable interfaces and themes
  - Community servers and channels
See this markdown 🤤
okay impressive
in future will have a 100 mb smart AI ?
10mb?
5mb?
lol
no
why no? in the past they said a <300mb smart AI was impossible 🫣, and this one is smart and MULTILINGUAL
that's actually possible but hard
many many optimizations
@fiery gull btw how much datasets for 100 mb ai ?
wdym ?
Idk how many the liquid has
but I think it don't need so many datasets
you mean tokens to train? or 1 trillion gpus?
the 350m was trained with 10 trillion tokens
yes
I know
but the 350m really has immersive general knowledge
what
omg yay
dont play like that
get pranked idiot
idiotá manò

i'm at school on my phone
you called me idiot
yes
I am in absence of AI
I dont count 3.5 as smart
but 350m > 3.5
...
Idk, the 2.6b model isn't very smart
It's easier to create a smol model than a big model, I think
create indian girl image look Mharashtrian in maharashtrian look
maybe, but gpt 5 high is better than gpt 10,5 because it is high
and there's no 350m high to do this
@echo aurora
What about [this](#1417174113092374689 message)?
Now this is interesting
Testing the waters
clueless 45 year old
Indeed, although it makes sense ig
Anthropic is the one that wants agents to run for hours, but they're always hitting limits with their infra
Reviewing now, appreciate the bump 
Hello, can you please tell me why none of the Anthropic models in the LLM Arena support image uploading? In any other application that I have used, whether it is the official Claude website or other neural network aggregators, there is an image upload option. I'm also interested in how Claude's models are in the "Vision" category on the leaderboard if they don't have image uploads. Please let me know.
Probably was a model removed after its position on the leaderboard settled
?
No, Claude opus 4 thinking 16k is in the vision rating and it is also available in the llm arena without access to image uploads.
Gemini 3 interesting shape
so? still could have been removed after their position settled on the leaderboard, opus ain't cheap
WTF
Anyway, even on the very first day opus 4.1 was available on the llm arena, it didn't have a photo upload feature.
no
No
its not going to release until Nov at the earliest
where is this from?
25 october?
the preview is gonna release earlier
on AI studio
nano banana
Lol
U TRICKED ME
yeah u said gemini 3, not the codenames
i thought they released it to the chat
rn its just codenames for gemini models
we don't even know if its Flash or some sort of coding model, etc
i wish we could atleast select them
they said orionmist is gemini 3 pro but with grounding search, even tho its worse at coding than lithiumflow
i mean select the lithiumflow testing model
yea cuz it has google search in it
its hidden bruh
they said orionmist and lithiumflow are the same model, but one of them has ability to search
not copying, but it helps in coding tbh
and library search
that's just a theory fortonn
tbh lithiumflow (offline model) seems better
frr the AI circle
there's no proof of that, it just was said as a theory once on this discord then everyone just started acting like it was true
i saw it on X replies too
yeah but It started here
How good is 'orionmist'?
4
14
4
Better than GPT5-high, Gemini2.5pro and Claude4.5think
i forgot to use sora 2
i made 30 videos yesterday (daily limit)
i make some gameplay and shitposting with it
its very good at making games tbh
even tho its only 15 seconds each video
wdym
i might try that for Mario 64 gameplay (cuz its copyrighted if I prompt it myself)
shame that everything is copyrighted on Sora 2 now, compared to the day of release
Only Gemini and Claude release useful models for versatile purposes
qwen is now top
The first benchmark designed to measure AI's investing abilities. Watch AI models trade with real capital.
GPT and grok are too locked down to math and logic, not good at general conversation, writing, world knowledge, creativity
when I make image to video with prompts for dialog or other sounds, most of the time only one of the two videos has sound. Why does this happen?
best ai i can talk to with image reading? best i know is gemini 2.5 pro.
Apparently, Grok 4 also does good with that.
is grok 4 really that good?
also hi again i have seen you somewhere here
what about claude
Apparently.
claude doesnt work
Meh.
Greetings again. You have.
or gpt5
gpt5chat? last time i heard its a joke.
