#llms-you-cant-please-them-all

twilit creek
quaint rain
#

Can someone tell me what the approach for this competition is?

twilit creek
#

You need to please them all

graceful perch
#

You need to maximize the variance

graceful perch
#

correct me if I am wrong

quaint rain
#

And do we need to generate the essays with LLMs?

graceful perch
#

anyone interested in teaming up?

quaint rain
potent radish
#

I'm confused about submissions. My initial approach was to use prompt engineering to generate essays, but at submission time it says notebook internet must be turned off. How should I proceed? Can someone guide me here? Can we use LLMs or not?

rotund talon
#

Is this more of a red-team kind of challenge rather than a "develop/tune your model for a specific goal" type? There are a couple of paper references in the overview; would reading them (and the write-ups from the last LLM attack challenge) be helpful?

sterile siren
#

So this includes only open-source LLMs

#

Llama, Qwen, Mistral, DeepSeek, etc. are fine

#

Claude, Gemini, ChatGPT, Grok, etc. are not

sonic condor
#

Hello, new here.

#

Can anyone guide me on this competition?

graceful perch
# sonic condor can anyone guide me about this competition

Yes, sure. You are given essay topics, and you have to write an essay on each topic. Those essays are then judged by 3 different LLMs. Your goal is to maximize disagreement between the LLMs, that is, to maximize the variance between the scores the judges give.
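To make "maximize the variance" concrete, here is a minimal Python sketch, assuming each essay gets one numeric score from each of the three judges and that disagreement is measured as the population variance of those scores, averaged over essays. The function name `judge_disagreement` and the example scores are hypothetical, and the official competition metric may combine this idea with other terms, so check the Evaluation page before relying on it.

```python
import statistics


def judge_disagreement(scores_per_essay):
    """Average, over essays, of the population variance of the judges' scores.

    Simplified assumption: one list of judge scores per essay. This is NOT
    the official competition metric, only the "spread" idea behind it.
    """
    if not scores_per_essay:
        raise ValueError("need at least one essay")
    # pvariance is 0 when all judges agree and grows as their scores spread out
    variances = [statistics.pvariance(scores) for scores in scores_per_essay]
    return statistics.mean(variances)


# Hypothetical scores from three judges for two essays
agreeing = [9, 9, 9]     # every judge gives the same score -> variance 0
disagreeing = [0, 9, 0]  # one judge loves it, two dislike it -> variance 18
print(judge_disagreement([agreeing, disagreeing]))  # mean of 0 and 18, i.e. 9
```

Using the population variance (`pvariance`) instead of the sample variance is just a choice for this sketch: with exactly three judges the two differ by a constant factor of 3/2, so the ranking of essays does not change.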

sonic condor
graceful perch
strong spade
#

I might have a silly question. I read "It is expected that submissions will take multiple hours to be evaluated." If the submissions tab says "submission running", does that mean it was submitted successfully and is being evaluated, or that it is still waiting to be evaluated, as expected?

woeful night
orchid canopy
#

I am new and attempting my first competition. I want to take the first step. Would using AI to write the essays violate the rules? Thank you.

orchid canopy
mossy lynx
#

Has anyone reached a score of 10+ and can share some tips on how to get there? Varying the essay quality and tone is not helping!

sacred patrol
#

This competition is so interesting! I recently saw a demo by Mystic AI where agents interact with each other to accomplish a task.

To those who seek guidance on how to approach this competition, this paper might give some ideas on how to proceed. >> "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods": https://arxiv.org/html/2412.05579v1#S3

orchid canopy
#

Hello,

I am new and need some assistance. I cannot click the submit button as it is disabled even though I changed the setting to "Internet off." Moreover, the "Submit to Competition" option keeps saying it cannot find a notebook connected to the competition. I created several new notebooks, but none allowed me to submit them. I also cannot use the File Upload option. Could someone please assist me?

Thank you.

orchid canopy
#

Thank you for your response. Yes, I saved the version and ran it.

graceful perch
orchid canopy
# graceful perch is it still not working?

Thank you for your kind response. Unfortunately, no. It keeps asking me to connect a notebook to the competition, so I created many new notebooks and attempted to submit them multiple times today. I am exhausted, so I will try again tomorrow.

graceful perch
orchid canopy
vestal mulch
#

Is this competition hard, guys? It seems very interesting, but it's not straightforward, you know.

thick widget
#

Hello, I'm new to add-ons on Kaggle. How can I include the LLMs listed on the competition page in my notebook? Thanks!

graceful perch
thick widget
graceful perch
#

would take ~10 mins

#

you would immediately get access

thick widget
#

Thanks!
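On the question of using the listed models in a notebook: once a model is attached as an input, Kaggle mounts it under `/kaggle/input`, and with internet turned off it has to be loaded from there. The sketch below uses a hypothetical helper, `find_model_dirs`, and assumes the model is in Hugging Face format (a directory containing `config.json`); it simply lists such directories so you can pick the right path.

```python
from pathlib import Path


def find_model_dirs(root="/kaggle/input"):
    """List directories under `root` that look like Hugging Face model folders.

    Assumption: a model folder is any directory containing a config.json file.
    Returns an empty list if `root` does not exist (e.g. outside Kaggle).
    """
    base = Path(root)
    if not base.is_dir():
        return []
    # rglob walks the tree recursively; each match's parent is a model folder
    return sorted(path.parent for path in base.rglob("config.json"))


if __name__ == "__main__":
    for model_dir in find_model_dirs():
        print(model_dir)
```

A directory it returns can then be passed to `AutoTokenizer.from_pretrained(...)` and `AutoModelForCausalLM.from_pretrained(...)` from `transformers`, which load from a local path without network access. The exact layout under `/kaggle/input` depends on the model and variant you attach, which is why the sketch searches for it instead of hard-coding a path.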

graceful perch
thick widget
#

Hello, I have a question about the evaluation metric. It seems to favor essays that receive more divergent scores from the three judges. Is that right? But what is the purpose of this? Isn't consistent scoring across judges more desirable for assessment quality?

woeful night
orchid canopy
left depot
graceful perch
left depot
#

I'm just trying the Gemma 2 model to generate essays now

#

very low score

graceful perch
left depot
#

I am only trying prompt alteration. Is there any other way?

graceful perch
left depot
#

After submitting, it takes hours to get a score. Is that expected?

graceful perch
graceful perch
left depot
#

I read your code; there are some good ideas. I'm thinking about how to improve the score further.

#

So we do not know what those 5 LLMs are, or their knowledge cutoff dates??

graceful perch
graceful perch
left depot
left depot
left depot
dreamy oyster
#

LLMs are always nice and accommodating 🤔 ...

#

Pit them against each other with different premises, perhaps?

#

At least pit different ones against each other, and ask them "Yay" or "Nay" on a premise

bleak locust
bleak locust
#

@shut rock I have sent you a DM!

left depot
#

Generating the essays with prompts, my max score is ~5. What should I try next? Any help is highly appreciated.

plush berry
#

I'm late to this; is there still anyone who would like to team up? 😅

hazy scarab
plush berry
#

one month

opaque knot
#

Hello

crimson isle
#

I started a little late. I submitted my notebook and it's been 10 hours; it's still running. Is that normal? The guidelines say the running time should be under 9 hours. What will it end up as?

graceful perch
crimson isle
plush berry
#

Hey, I have been trying to solve the problem with different approaches, but my score has not increased; I'm kind of stuck at 5.1 or so. Would anyone like to help, please?

void crater
#

Could anyone recommend some papers that might be relevant to this competition? I just started yesterday and don't know where to begin my research. Thanks!

wicked kestrel
#

I was in bronze medal range on the public leaderboard, but forgot to select my best 2 submissions, and now I am no longer in the medal zone on the private leaderboard. Do I get any kudos or not?

graceful perch
wicked kestrel
#

I guess, but apparently they did not fare as well on the other 70% of the data. 😭

hazy scarab