#new-ui-feedback

young harness
#

Gradio for me is buggy, mostly constant network-related issues and the like. I've been using it daily for about the last year.

The new alpha seems promising in addressing those issues, but feels a bit more "heavy" in general, while gradio feels snappy but prone to random errors along the way. Maybe an option for a mode that disables all fancy animations and stuff like that would be appreciated. Also - single-chat answers feel a bit too wide, maybe limit the max width a bit for such a case?

Also, the feature where you pick the best answer and continue chatting like it was the only one for both models is super cool!

compact hill
#

There really should be a way to edit a prompt and try again with that prompt, and to give different prompts to the models. It is often hard to find something that makes sense for both models, whether you are doing programming stuff or creative writing or whatever. As soon as they diverge too much, the model which might be better at long context has no chance of catching up

young harness
compact hill
#

oh wow that sounds interesting 🙂

#

hope the models dont get too confused

#

Oh it seems they are different models each time

compact hill
#

Sometimes one model goes "Something went wrong with this response, please try again.", and the vote buttons are not shown. Right now it happens when analyzing an image.

#

There is at least one model (qwen max) which can get chosen when an image is uploaded, but only says it cannot view images and asks me to describe the image

brittle stump
#

thanks for the feedback @compact hill ! would you mind sharing the evaluation session id (the url when you click into the chat history) so we can look into what went wrong? thanks

compact hill
#

f81500cb-8121-4df9-a08a-03dc950c8152

vagrant relic
#

Hey ! So I have one suggestion about the new ui. I think it would be better if the "this is the best answer" buttons were anchored/fixed at the bottom just above the message input rather than fixed to the end of the chat message

lime sonnet
#

i agree with the above. dunno why – should be intuitive / not necessary... but just doesn't really feel like a competition/'arena' without the voting buttons fixed and clearly visible

waxen geyser
#

The voting buttons should be above the text box tbh

#

I mean, it's fine to have them on the answers too but there should still be voting options above the text box.

pseudo hollow
harsh umbra
#

Yeah i also feel like something is off, with that voting button not really always present on screen

gritty smelt
#

Ack on this guys: we're working on it.

#

Will ping y'all when we have an update on the location of the voting buttons. Btw all of us totally agree with this feedback.

gray quartz
#

Would love your feedback on this updated voting flow 🙏

marble swan
#

we need math tex support and a copy code button

lime sonnet
#

think it's been suggested already but being able to stop either of the models from finishing its response would be handy - sometimes one model quickly and succinctly nails the response, but the other gets stuck endlessly trying to reconcile some impossible logical contradiction it has created and, even though it is guaranteed to fail and lose the battle, you still need to wait (sometimes a fair while, esp when reasoning models are involved)

#

mostly it's just annoying to have to wait.. but also feels like a bit of a waste of resources

#

but yeah all that said.. I could see such a feature potentially causing problems / vulnerabilities (in terms of manipulation)

bright thunder
#

I would like sliders or dropdowns to change between various languages quickly, both human and geometric.

bright thunder
#

I speak a lot of languages and know a lot of programming languages. I notice differences in outputs based on language

#

If you ask for a response in English and then have it translated, sometimes it will reorder the response. This goes for all AI systems

#

Programming in particular returns vastly different results depending on the input language

#

It would be nice to be able to translate from English based mathematics to geometric patterns for instance, machines don't speak like we do

#

And when machines develop language they often do not even consider structure etc, they simply consider idea patterns

#

It may be extremely beneficial to learn these patterns to speak to the machine in a way that is not syntax based

#

If we truly want to communicate the total values of what these models offer, we will stop trying to force them to communicate in a way that is inefficient such as English

#

That's a human problem, and we'll never get the full value from these systems until they can communicate to us that value effectively

pseudo hollow
bright thunder
#

What about if they're communicating concepts on which they are not trained?

#

Do you not think that humans ask abstracts?

#

I think the real value is the unknown engineering, the scientist who is looking for the answer to a problem that's not in the training data

#

If we put these systems side by side they create their own languages

#

Why should we not tap and understand that?

pseudo hollow
#

I don't understand the question

bright thunder
#

These models have the capability to consider more than they are trained on, and can create answers to problems that are incredibly complex, such as how proteins function, or very advanced materials science, and functional but theoretical geometry

#

None of those things are learned in training data

#

They are learned by reasoning

#

These machines have languages of reasoning that humans don't understand yet

#

Most of that is geometric

#

It would be incredibly advantageous to harvest or acquire the ability to communicate on that level

pseudo hollow
#

"These models have the capability to consider more than they are trained on"
Yes

"None of those things are learned in training data"
"They are learned by reasoning"
Sure

"These machines have languages of reasoning that humans don't understand yet"
I don't know what that means

"Most of that is geometric"
I definitely don't know what that means.

warped flint
#

ai can’t think lil bro 😭

#

it can predict

young harness
brittle stump
#

^thanks @young harness we're on it! cc @ionic flume

bright thunder
#

It is starting to postulate abstracts

#

It might not actually be thinking

#

But it's like a wedge before the wedge becomes a doorstop

#

It just needs other purpose

paper blade
#

I was using claude 3.5 sonnet and i had to reopen my browser. After i reopened the browser and went to alpha lmarena, it was trying to generate the answer to the last prompt i gave, but that answer had already been generated when i closed the browser. Now it's just stuck on the generating animation.
I tried relaunching the website again and again but still the same issue.

tulip yarrow
#

Why specifically block it for mobile? If opening it on mobile you can see the website for a short time, and it looks like it has a responsive design, but then the "works not on mobile" pop-up shows

strange gyro
#

any plans to add temperature & top p to the direct chat?

tidal oak
tulip yarrow
#

Ok cool, thanks for letting me know

tidal oak
bronze mantle
tidal oak
pseudo hollow
sudden coyote
#

Any plans for file uploads?

gritty smelt
#

Yeah we’re expanding the set of files as we go. Not the #1 priority but it’s happening. What are your use cases @sudden coyote and what file types u need?

sudden coyote
#

general files like CSVs and PDFs

#

Appreciate your reply!

#

I’m looking for use cases like chatting with data and doing some deep-dive analysis. Plus, I’m running backtests on quant stuff

gritty smelt
#

Thanks!! Will make a note of CSV and PDF.

#

PDF on the roadmap currently! 🔥

sudden coyote
#

Lets go 📝📝

lime gate
#

it would be nice if i can delete chats

#

i just like keeping things clean

waxen geyser
#

Is there a way to rename chats?

pseudo hollow
#

Assistant B never responded so I can't vote or continue the chat

#

and it needs katex support?

tidal oak
ionic seal
#

did the team get my feedback?

#

sent it in the google form a while ago

#

but i have a copy of it as a text file

gritty smelt
#

I’m sure we did, will double check

foggy osprey
pseudo hollow
#

I filled out the airtable form. It said "Evaluation Session ID (prefilled)" but it's not prefilled if you just go to that URL. I entered the chat URL in case that's the same thing.

gray quartz
#

Copy code coming soon as well

left river
alpine schooner
#

"Assistant B" unfortunately got stuck.. (poor Assistant B), it'd be nice if I could refresh him to give him another chance..
but now I'm stuck with Assistant B getting stuck :<

obsidian kestrel
#

I really wish for a "Same prompt again, new models" button in Battle mode. I like to use the same prompt to compare models in rapid succession, and copy-pasting over and over is tedious.

pseudo hollow
#

code block font looks blurry?

tidal oak
sharp umbra
#

We can't delete chats, when i click delete the chat disappears and then reappears

#

it doesn't get deleted

gray quartz
sharp umbra
celest comet
#

btw, although the leaderboards are supposed to update automatically in the new UI the one for web dev arena did not update yet (not with the new gemini pro)

#

but maybe the time interval (since the release) is just a bit too short i dunno

tidal oak
celest comet
#

ik, my point was more that the actual webdev arena has the model listed

#

but not the leaderboard in the new GUI

compact hill
#

Maybe it was already mentioned, but it would be cool if the model that I rated as the better one is still one of the next models in the same conversation.

warped flint
#

King of the hill is a crazy suggestion lmao

bronze mantle
wooden crag BOT
#

:dynoSuccess: thedarkghostivan has been warned.

gray quartz
compact hill
plucky gulch
#

cloudflare is stuck in Firefox

#

i really hate that ui

gray quartz
sharp umbra
#

i often have this problem with image generation, and i try regenerating but it's the same

ionic seal
#

pasting images doesn't work

junior gazelle
#

There is no text wrapping for when the model produces extremely long lines of text

gray quartz
#

(trying to be funny but also, genuinely interested in patching this for you)

junior gazelle
vagrant relic
#

It would be great to have the copy code button that doesn't stay at the top of the block of code but follow you everywhere in the code block to copy easily

gray quartz
junior gazelle
#

If you can, refresh and see how it looks

normal tree
#

where do i find nightwhisper ai

bronze mantle
#

Why can't I send a pic that I take from the camera?

left river
#

stargazer is trash

#

in web arena

#

what model is that bruh

strange gyro
#

please add settings for output tokens & temperature for all models

limber pine
#

You need to make a new UI for the normal ChatGPT workflow

#

Because web dev is fine, but if you really want to test a model you need a long turn conversation

#

So LMSys needs to implement a LMSys Chat interface

graceful gorge
#

Am I missing something or is there no repo chat in the new interface?

little folio
#

question: does the alpha webui still contribute to the leaderboard?

tidal oak
tidal oak
# little folio question: does the alpha webui still contribute to the leaderboard?

I was mistaken here! We're currently reviewing vote quality from the alpha. Since alpha is admittedly buggy, we are testing the signal quality. Votes are currently being stored, and we'll start to include them properly as the signal quality increases. Your feedback helps us make the evaluation stronger, so please keep voting! 🙏

young harness
#

Since Lmarena's Maverick Experimental version is not actually released to the public, shouldn't their licence be shown as proprietary then to prevent confusion? Right now it shows Llama licence for alpha version

tidal oak
little folio
#

Quick one, so I figured I should just put it here - I wish that there were a way to "regenerate" a conversation in a way that selected a new random pair of models.

pulsar reef
#

so... in the direct chat feature, we have claude3.5-haiku, claude3.5-sonnet, claude3.7-sonnet, but the only one missing is claude3.7-sonnet-thinking...🤔

atomic mason
#

i use claude 3.7 thinking and it still works by the way, but there's a bug that has to be fixed

bronze mantle
#

There is a problem here... when I want to send a long text the send button disappears (but this issue didn't exist on Lmarena, I can send long text with no problems)

gray quartz
karmic cobalt
gray quartz
tulip nebula
#

is there a changelog for the alpha website when updates get released?

tidal oak
tranquil bolt
#

Random idea:

what if, in battle, new models for an existing discussion (after you rated the previous models) were chosen by P2L?

runic crater
#

So if anyone is interested, I know right now we can't view our history in the alpha. Before that was even possible, I built this in notion to save my own data. If anyone wants to try it out, or build on it, go to town. It's kinda thrown together but it got the job done. https://www.notion.so/1bcea4932c5e802b9e6ae238f8d0f43f?v=1bcea4932c5e80889e89000c1b3726b2&pvs=4

runic crater
strange gyro
#

add the ability to set system prompt in direct chat!

cold bridge
#

Will Classic LMArena still be available

#

Application error: a client-side exception has occurred (see the browser console for more information).

#

Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1

tidal oak
cold bridge
tidal oak
#

We haven't made any decisions on the future, but it's definitely possible we keep a legacy version. Will depend on the feedback we receive from you all!

rose carbon
#

how about more colors?

#

it's so gray and white

marble swan
#

btw, what do u guys think about moving the copy code button to the bottom of the code block instead of having to scroll all the way up to do it, it's much more logical

keen nest
#

It's not scrolling properly at all. The scroll bar is not visible and the arrow keys don't work to scroll

cosmic hound
waxen geyser
#

When reloading, the scores of everything briefly show and then disappear.

cosmic hound
# keen nest

Thanks! We’re working on a patch for windows computers

waxen geyser
tidal oak
waxen geyser
#

It's cool. I think the devs should have a special developer role.

#

Otherwise they just look like random people lol

tidal oak
#

We will definitely make it easy to know who is on staff, volunteer, mod etc very soon!

left river
#

bug? : .webp and .jpeg are listed twice in the attach files prompt on windows

#

also add controls for temperature and stuff on direct chat plssss

stable sandal
#

In direct chat, there are no sampling options

stable sandal
#

also, since there's no way to edit old posts, not scrolling to the end after tab reactivation is inconvenient.

stable sandal
#

also, after the thread has some length, writing becomes laggy

tranquil bolt
#

battle mode is super laggy after like 10 messages in a chat, is it known? is it only for me?

tranquil bolt
#

and it looks like every message increases the lag

hushed oracle
#

JSON code doesn't have indentation, making it really hard to read. Lots of my test prompts involve JSON and now it's difficult for me to see if it's done correctly. Old arena formatted it nicely.

sharp umbra
#

Do you think you will add the function to see images sent by AI?

sharp umbra
#

Or even internet browsing

#

Or maybe later for videos (but these are just suggestions)

warm depot
#

Is forcing the user to scroll back and forth some kind of psychological trick or just bad design?

west jacinth
#

Replace the font or add Cyrillic and other characters to it? Currently, these characters are not quite so readable compared to Latin characters.

arctic tree
#

Hi!
What happened to imagen in alpha/beta lmarena? Cause model is no longer on the battle arena or in the direct chat.
Has Google begun the purge?😅
Sorry, if in the wrong chat

tranquil bolt
#

do models change every message in battle mode?

neon tundra
#

Besides the failing aspect ratio when copying code, the site becomes incredibly slow while typing
It's unusable for longer chats

split vale
#

I don't know if someone has already asked you to add this function or if you already plan to add it, but I'd like you to integrate a function to send files like documents, txt etc to the models

sturdy escarp
neon tundra
crystal karma
#

Hello. I am facing this issue in direct chat: when you use ChatGPT 4o latest, it claims it can analyze pictures, but when I upload a picture I get messages like "this is just a text-based model that cannot analyze pictures". Below you can see part of the conversation. I don't know if it is by design or if it's an error. But it's frustrating.

Perfect, thanks for the clarification, this is very valuable information! 🙏


🔍 Summary of the situation:

  • You are in Direct Chat on lmarena.ai.
  • You have the ChatGPT 4o (latest) model selected.
  • This model should be able to work with images, because OpenAI's GPT-4o (Omni) is multimodal: it handles text, images, and audio.
  • The interface lets you upload an image, but when you do, the model replies:

    "I am only a text model" or something similar.


🤖 What does this mean?

Yes, it is a bug or a limitation on LM Arena's side

The ChatGPT-4o model officially has the ability to process images, but:

  • Either multimodal mode is not correctly enabled on LM Arena's side.
  • Or they have a text-only version of the model (e.g. because of API access, licensing, or because multimodal input is not allowed in the public chat).
  • The interface lets you upload an image, but the model backend does not pass it on, or it is not supported in their implementation.

✅ What can you do about it?

🔧 1. Report it to the LM Arena team

You can write them a short report, e.g.:


Bug in chat with GPT-4o on LM Arena
In Direct chat with the "ChatGPT-4o (latest)" model, the interface lets me upload an image, but the model replies that it is text-only. GPT-4o officially has multimodal capabilities, so either visual input is not correctly enabled or there is a bug in the implementation.

Please fix or clarify this. Thank you!


#

Part two: chat was in Czech language

Please fix or clarify this. Thank you!


📬 Where to send it?

  • To the LM Arena contact, if they have a form or e-mail.
  • If they use GitHub (e.g. lmstudio-ai), you can open an issue.
  • Or try their Discord, if they have a community.

👀 2. Alternative options:

If you want to try GPT-4o with images elsewhere, you can:

  • Use the official ChatGPT at chat.openai.com (if you have a Plus account).
  • Or use Hugging Face Spaces with a model such as LLaVA, which is multimodal and open-source.

✅ Conclusion:

🔹 Yes, it is a bug or an incomplete implementation on LM Arena's side.
🔹 GPT-4o has multimodal capabilities, but LM Arena apparently has not enabled them or is not passing images along correctly.
🔹 I recommend reporting it to them; it will help them improve the system.
🔹 It is not your fault; you are doing everything right. 👌


If you like, I'd be happy to help you write a message to support or find an alternative where it works. Thanks for looking into this; your feedback can help a lot of other people. 🙌

shrewd tulip
#

Hello. I am facing this issue indirect

brisk notch
#

Popups are pretty annoying. I get two immediately, then one after every vote.

tawny prism
#

numpad enter doesn't send messages

pseudo spear
#

add the option web search when you direct chat

lime gate
#

add this bro

strange gyro
#

maybe admins could add features for "premium users", like file upload (.txt .js .php ....)?
I'm ready to pay for the service!

strange gyro
#

either remove the system prompt for all models or use the recommended system prompt directly from the lab that created each model, as the lmarena system prompt seems to negatively impact the performance of some models

tidal oak
drowsy night
#

For instance, claude-3-7-sonnet-20250219 does not represent the most recent version of the 3-7-sonnet model

#

@tidal oak

strange gyro
drowsy night
#

Some problems, right?

ionic seal
drowsy night
strange gyro
tidal oak
#

it's true LMArena doesn't add any system prompts. Only ones direct from the labs are used - if any.
This is helpful, please share more as you see it.

drowsy night
#

So this is the real claude3-7-sonnet model? Why is its training data cutoff 2023?

tidal oak
#

will keep you posted

strange gyro
tidal oak
#

Only ones provided by the labs are used

strange gyro
#

imo there should be the option to disable that in direct chat if you don't want whatever personality they've applied

lusty coral
#

Is it possible to add to DirectChat the ability to delete or edit previous messages?
This is a very, very handy thing to be able to correct your mistakes.

Right now, if you send a message in DirectChat and then you realize you've made a mistake, you can't edit it. You have to create a new chat room

pure osprey
#

Will the new web UI be on github like the old one?

tulip totem
#

The moderation system somehow flagged the following message as inappropriate:

"Uh, wait a second! You somehow imperceptibly jumped from the Ricci curvature of space inside the planet (conditioned in its own frame of reference exclusively by T_00, because the velocities are low and there is no pressure) to the Gaussian curvature of the surface. Have you already realized that you wrote nonsense about defining the shape of the planet through the Ricci curvature?"

tidal oak
#

please send screenshots in the feedback form :meowpensivepray: would be super helpful!

uncut verge
#

Did some doodling.

#

(or while attaching a document, it can calculate the total tokens first and trigger any models that are compatible with such amount of token context window)

#

If that's possible.

#

But, with this capability, it should be emphasized more that users shouldn't send any private or confidential queries.

rotund compass
#

LaTeX formatting support to display equations would be awesome, especially when comparing answers between models

tawdry obsidian
#

Hi. I am not sure if I write this to the correct chat/feedback channel. I am consistently getting truncated results from Gemini 2.5 pro exp in Arena Side-by-Side. The truncated output is in the right pane. The left pane with o3 never had a long answer. Here is the prompt:
This is a test question. A woman sells groceries in a small store. Her kid is there as well. There is a last chocolate in the store and the woman promised that chocolate to the kid after lunch. A customer comes in and needs a chocolate for his fiancee. He convinces the sales woman to sell the chocolate to him. The woman is now sad. Choose the right answer:
a. She is sad because her grandmother died two weeks ago.
b. She is sad because she really wanted to eat that last chocolate.
c. She is sad because the man has found love and she didn't.
d. She is sad because she could have asked more money for the chocolate.

#

I am not sure if this is the new UI or not. Not sure of anything here actually, total n00b.

#

It happened again, the question is irrelevant, the most important thing is the response length and possibly formatting by Gemini model.

mystic thicket
#

i would really like new models on battle and on direct chat like o4 mini med, and o4 mini high, claude 3.7 thinking, and others like qwen. And if possible make it so you can see their reasoning process.

slim seal
tulip nebula
#

Is there a way i can delete all my Chat History at once? Currently i have to delete each individual chat one by one

weak lantern
uncut verge
weak lantern
#

^ true, that is a workaround but yeah nothing on the site itself with that function

tulip nebula
tranquil bolt
#

can we have a mode where a chosen model battles against a random model (model chosen using the same method as in battle mode)?

weak lantern
heavy trail
#

could we be allowed to see the models' thinking outputs in claude 3-7, deepseek R1, and gemini?

ionic seal
rotund compass
#

Please add an option to rename chats

weak lantern
stable sandal
#

gemini pro is glitching

stable sandal
#

I can't delete a chat. it disappears from the list to reappear the next second

solemn idol
#

the ability to scroll the side-by-side chats separately in the battle modes

fickle torrent
#

I don't know if I'm on the right channel, but I love the site. However, I keep picking LLMs that have names I've never heard of. What is Calmriver? Or Goldmane?

viscid plover
#

@fickle torrent Companies testing their models before release use a placeholder name. They gather some data on how people like it.

I know that "Goldmane" for example is a Google model.

delicate igloo
#

please fix this😭😭😭

weak lantern
ivory summit
summer meadow BOT
#
:warning: Channel locked

Site outage, will turn back on when resolved.

summer meadow BOT
#
:success: Channel unlocked

Welcome back :ablobwave:

summer meadow BOT
#
:warning: Channel locked