#llms-you-cant-please-them-all | Kaggle | Page 1

twilit creek Dec 4, 2024, 12:30 AM

#

kaggle2

quaint rain Dec 4, 2024, 9:50 AM

#

Can someone tell me what the approach is for this competition?

twilit creek Dec 4, 2024, 11:32 AM

#

You need to please them all

graceful perch Dec 4, 2024, 1:50 PM

#

twilit creek You need to please them all

I think it is quite the opposite

#

You need to maximize the variance

graceful perch Dec 4, 2024, 1:51 PM

#

graceful perch You need to maximize the variance

b/w scores given by different LLMs

#

correct me if I am wrong

quaint rain Dec 4, 2024, 2:08 PM

#

And we need to generate the essays with llms or?

graceful perch Dec 4, 2024, 2:37 PM

#

quaint rain And we need to generate the essays with llms or?

it is up to you

#

anyone interested in teaming up?

quaint rain Dec 4, 2024, 2:53 PM

#

graceful perch anyone interested in teaming up?

Wanna talk dm? My experience is mostly with medical images and time series and some experience with LLM fine-tuning and RAG systems

graceful perch Dec 4, 2024, 2:53 PM

#

quaint rain Wanna talk dm? My experience is mostly with medical images and time series and s...

sure

potent radish Dec 4, 2024, 6:27 PM

#

Am confused on submissions, my initial approach was to use prompt engineering and generate essays, but during submission it says notebook internet should be turned off. How do I proceed? Can someone guide me here? Can we use an LLM or not?

rotund talon Dec 4, 2024, 7:12 PM

#

Is this more of a red-team kind of challenge than a dev / tune-your-model-for-a-specific-goal type? There are a couple of paper refs in the overview; would reading them (and the last attack LLM challenge write-up) be helpful?

sterile siren Dec 4, 2024, 7:24 PM

#

potent radish Am confused on submissions, my initial approach was to use prompt engineering an...

You can use an LLM or not, but only ones that you can attach to your notebook (i.e. those that are uploaded on Kaggle).

#

So this includes only open-source LLMs

#

Llama, Qwen, Mistral, DeepSeek, etc are fine

#

Claude, Gemini, ChatGPT, Grok, etc are not

sonic condor Dec 7, 2024, 5:58 AM

#

Hello new here

#

can anyone guide me about this competition

graceful perch Dec 7, 2024, 7:44 AM

#

sonic condor can anyone guide me about this competition

Yes sure, you would be given essay topics, you have to write an essay about the given topics. Those essays are then judged by 3 different LLMs. Your goal is to maximize disagreement between the LLMs, that is, maximize the variance between the scores the judges give.
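
To make the "maximize the variance" idea concrete, here is a minimal sketch. It only illustrates the disagreement idea described in this chat and is not the official competition metric (see the competition's evaluation page for that). The function name `judge_disagreement` and all the scores are hypothetical.

```python
from statistics import mean, pvariance


def judge_disagreement(scores_per_essay):
    """Average, over essays, of the population variance of the judges' scores.

    Simplified stand-in for "disagreement between judges"; the real
    leaderboard metric may combine further terms.
    """
    return mean(pvariance(scores) for scores in scores_per_essay)


# Hypothetical scores from three judges for two essays each.
judges_agree = [[5, 5, 5], [7, 7, 7]]   # every judge gives the same score
judges_split = [[0, 9, 0], [9, 0, 9]]   # one judge disagrees strongly

print(judge_disagreement(judges_agree))
print(judge_disagreement(judges_split))
```

Under this simplified view, an essay that all judges rate identically contributes nothing, while an essay that one judge rates very differently from the others contributes a lot.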

sonic condor Dec 7, 2024, 8:19 AM

#

graceful perch Yes sure, you would be given essay topics, you have to write an essay about the ...

So basically we have to do reverse engineering to figure out how each llm scores essays (on what basis), right?

graceful perch Dec 7, 2024, 8:21 AM

#

sonic condor So basically we have to do reverse engineering to figure out how each llm scores...

There are some strategies discussed in the forum for this competition over here:
https://www.kaggle.com/competitions/llms-you-cant-please-them-all/discussion

LLMs - You Can't Please Them All

Are LLM-judges robust to adversarial inputs?

strong spade Dec 7, 2024, 12:04 PM

#

I might have a silly question. I read "It is expected that submissions will take multiple hours to be evaluated." If it says "submission running" in the submissions tab, does it mean that it is successfully submitted and it is evaluating, or waiting to be evaluated as expected?

woeful night Dec 11, 2024, 3:10 AM

#

strong spade I might have a silly question. I read "It is expected that submissions will take...

It means it was submitted. It may still time out or error, but it is either queued to be evaluated or being evaluated. It takes a couple of hours for each submission.

orchid canopy Dec 12, 2024, 1:30 AM

#

I am new and trying to attempt my first competition. I want to take the first step. Would using AI to write the essays violate the rules? Thank you.

woeful night Dec 12, 2024, 1:47 AM

#

orchid canopy I am new and trying to attempt my first competition. I want to take the first st...

It won't, give it a try!

orchid canopy Dec 12, 2024, 2:15 AM

#

woeful night It won't, give it a try!

That is great! Thank you for your prompt assistance.

mossy lynx Dec 13, 2024, 6:53 PM

#

Did anyone get a score of 10+ who can share some tips on how to get there? Varying the quality of the essay or its tone is not helping!

sacred patrol Dec 13, 2024, 10:56 PM

#

This competition is so interesting! I recently saw a demo by Mystic AI where agents interact with each other to accomplish a task.

To those who seek guidance on how to approach this competition, this paper might give some ideas on how to proceed. >> "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods": https://arxiv.org/html/2412.05579v1#S3

orchid canopy Dec 16, 2024, 12:39 AM

#

Hello,

I am new and need some assistance. I cannot click the submit button as it is disabled even though I changed the setting to "Internet off." Moreover, the "Submit to Competition" option keeps saying it cannot find a notebook connected to the competition. I created several new notebooks, but none allowed me to submit them. I also cannot use the File Upload option. Could someone please assist me?

Thank you.

graceful perch Dec 16, 2024, 4:00 AM

#

orchid canopy Hello, I am new and need some assistance. I cannot click the submit button as i...

did you save and run?

orchid canopy Dec 16, 2024, 4:15 AM

#

Thank you for your response. Yes, I saved the version and ran it.

graceful perch Dec 16, 2024, 4:25 AM

#

orchid canopy Thank you for your response. Yes, I saved the version and ran it.

is it still not working?

orchid canopy Dec 16, 2024, 4:56 AM

#

graceful perch is it still not working?

Thank you for your kind response. Unfortunately, no. It keeps asking me to connect a notebook to the competition, so I created many new notebooks and attempted to submit them multiple times today. I am exhausted, so I will try again tomorrow.

graceful perch Dec 16, 2024, 6:13 AM

#

orchid canopy Thank you for your kind response. Unfortunately, no. It keeps asking me to conne...

You can check out some example submissions

orchid canopy Dec 16, 2024, 6:17 AM

#

graceful perch You can check out some example submissions

Thank you for your advice.

vestal mulch Dec 16, 2024, 8:01 AM

#

Is this competition hard, guys? It seems very interesting but it's not straightforward, ya know

thick widget Dec 16, 2024, 5:18 PM

#

Hello, I'm new to "add-on"s on kaggle. How can I include LLM models listed on the competition page in my notebook? Thanks!

graceful perch Dec 16, 2024, 5:43 PM

#

thick widget Hello, I'm new to "add-on"s on kaggle. How can I include LLM models listed on th...

on inputs, choose add input, then search the model you want to add

thick widget Dec 16, 2024, 5:52 PM

#

graceful perch on inputs, choose add input, then search the model you want to add

Thanks! I've got immediate access to Gemma, but saw this notification for llama - does it mean that I have to submit separate request via Meta's website?

graceful perch Dec 16, 2024, 5:52 PM

#

thick widget Thanks! I've got immediate access to Gemma, but saw this notification for llama ...

yes it is pretty easy and fast

#

would take ~10 mins

#

you would immediately get access

thick widget Dec 16, 2024, 5:53 PM

#

Thanks!

graceful perch Dec 16, 2024, 5:53 PM

#

graceful perch you would immediately get access

(At least that is what happened for me)

thick widget Dec 21, 2024, 3:44 PM

#

Hello, I have a question about the evaluation metric. It seems to prefer the essays that get more divergent scores from the three judges. Is that right? But what is the purpose of this? Isn't consistent scoring across judges more desirable for assessment quality?

woeful night Dec 21, 2024, 5:27 PM

#

thick widget Hello, I have a question about the evaluation metric. It seems to prefer the ess...

From the Competition description

your goal will be to submit an essay that maximizes disagreement between the LLM judges

The purpose is to see how "exploitable" LLMs are. They are trying to mitigate this with a judge system, which has its own weaknesses.

Take a chance to read through the competition description and evaluation for more details.

orchid canopy Dec 23, 2024, 5:41 AM

#

orchid canopy Hello, I am new and need some assistance. I cannot click the submit button as i...

Finally, I got the points. This is incredibly challenging 😖, but it is also kind of fun. 🤩

left depot Dec 26, 2024, 2:06 AM

#

graceful perch anyone interested in teaming up?

I am interested. Here is my profile https://www.linkedin.com/in/ppujari let me know

graceful perch Dec 26, 2024, 4:01 AM

#

left depot I am interested. Here is my profile https://www.linkedin.com/in/ppujari let me k...

I read your LinkedIn, we would be happy to have you, but disclaimer, none of our teammates are as qualified as you 😅

left depot Dec 26, 2024, 4:47 AM

#

graceful perch I read your LinkedIn, we would be happy to have you, but disclaimer, none of ou...

that is fine with me

#

I am just trying the gemma2 model to generate essays now

#

very low score

graceful perch Dec 26, 2024, 5:00 AM

#

left depot that is fine with me

Hi can I DM you?

left depot Dec 28, 2024, 11:50 PM

#

graceful perch Hi can I DM you?

@left depot

#

I am trying thru prompt alteration only. Is there any other way?

graceful perch Dec 29, 2024, 12:46 AM

#

left depot I am trying thru prompt alteration only. Is there any other way?

We had tried a variety of methods, with 2 methods performing close to 12, but a public notebook has achieved 12.8 [edit: best score changed to 13]; you can check that out

left depot Dec 30, 2024, 1:16 AM

#

graceful perch We had tried a variety of methods, with 2 methods performing close to 12, but a ...

where is the notebook?

left depot Dec 30, 2024, 2:12 AM

#

After submitting, it takes hours to get a score. Is that expected?

graceful perch Dec 30, 2024, 4:26 AM

#

left depot After submitting, it takes hours to get a score. Is that expected?

yes it is normal

graceful perch Dec 30, 2024, 9:47 AM

#

left depot where is the notebook?

https://www.kaggle.com/competitions/llms-you-cant-please-them-all/code

some people have graciously made some well-performing notebooks public

left depot Dec 31, 2024, 1:51 AM

#

I read your code, some good ideas. Thinking how to improve score further

#

We do not know what are those 5 LLMs and their content cutoff date??

graceful perch Dec 31, 2024, 3:38 AM

#

left depot I read your code, some good ideas. Thinking how to improve score further

my code??

graceful perch Dec 31, 2024, 3:39 AM

#

left depot We do not know what are those 5 LLMs and their content cutoff date??

No we do not know the judges

graceful perch Dec 31, 2024, 3:39 AM

#

left depot I read your code, some good ideas. Thinking how to improve score further

The code I shared are public solutions/utilities, I do not think our team had made any code public.

left depot Jan 2, 2025, 11:11 PM

#

graceful perch Hi can I DM you?

Yes please

left depot Jan 3, 2025, 12:33 AM

#

sacred patrol This competition is so interesting! I recently saw a demo by Mystic AI where age...

Yeah, this is the closest thing I found to get some ideas.

left depot Jan 3, 2025, 5:53 PM

#

graceful perch The code I shared are public solutions/utilities, I do not think our team had ma...

OK, going through a few papers on LLM judges to see if I can get any clues. This is a very interesting problem.

dreamy oyster Jan 9, 2025, 5:43 PM

#

LLMs are always nice and accommodating 🤔 ...

#

Pit them off against each other with different premises, perhaps?

#

At least pit different ones off, and ask them "Yay" or "Nay" on a premise

bleak locust Jan 11, 2025, 6:09 PM

#

graceful perch anyone interested in teaming up?

Hey!! Still looking for teammates? I would love to team up. I tried an o1 like LLM (a small version of the Qwen/QwQ model) but got a low score, so I am exploring other options

shut rock Jan 14, 2025, 5:21 PM

#

bleak locust Hey!! Still looking for teammates? I would love to team up. I tried an o1 like L...

hey, i'd be interested as well

bleak locust Jan 14, 2025, 7:42 PM

#

@shut rock I have sent you a DM!

left depot Jan 19, 2025, 8:18 PM

#

Generating the essays with prompts gives a max score of ~5. What should I try next? Any help is highly appreciated.

plush berry Feb 6, 2025, 1:10 PM

#

I am too late, is there still anyone who would like to team up?😅

hazy scarab Feb 6, 2025, 1:16 PM

#

plush berry I am too late, is there still anyone who would like to team up?😅

How many days remaining

plush berry Feb 6, 2025, 1:16 PM

#

one month

opaque knot Feb 6, 2025, 2:37 PM

#

Hello

viscid bluff Feb 9, 2025, 8:22 PM

#

plush berry I am too late, is there still anyone who would like to team up?😅

Yea

crimson isle Feb 13, 2025, 7:33 AM

#

I have started a little late, i have submitted my notebook and it’s been 10 hours, it’s still running. Is it normal? As the guidelines say, running time should be less than 9 hrs. Let me know how it is likely to end up.

graceful perch Feb 13, 2025, 9:57 AM

#

crimson isle I have started a little late, i have submitted my notebook and it’s been 10 hour...

it takes 3 - 4 hrs to score, your net time can be up to 9 hours + 4 hours

crimson isle Feb 13, 2025, 10:07 AM

#

graceful perch it takes 3 - 4 hrs to score, your net time can be up to 9 hours + 4 hours

Oh thanks, got the score of 3.5 for the first try. Better than I thought for a sample submission. Will try harder and smarter next time.

plush berry Feb 13, 2025, 10:47 AM

#

Hey, I have been trying to solve the problem with different approaches but my score did not increase. I am kind of stuck at 5.1 something. Is there anyone who would like to help, please?

void crater Feb 20, 2025, 9:11 PM

#

Could anyone recommend some papers that might be relevant to this competition? I just started yesterday and don't know where to begin my research. Thanks!

wicked kestrel Mar 5, 2025, 5:03 AM

#

I had a bronze medal rating on the public score, but forgot to choose my best 2 submissions, and now I am no longer on the medal section for the private leaderboard. Do I get any kudos or not?

graceful perch Mar 5, 2025, 5:09 AM

#

wicked kestrel I had a bronze medal rating on the public score, but forgot to choose my best 2 ...

did your best submissions not get autoselected?

wicked kestrel Mar 5, 2025, 5:27 AM

#

I guess so, but apparently they did not fare as well with the other 70% of the data. 😭

hazy scarab Mar 6, 2025, 8:58 AM

#

graceful perch did your best submissions not get autoselected?

Qafig sir can u pls teach me cp 🥺🙏