#llms-you-cant-please-them-all
Can someone tell me what the approach is for this competition?
You need to please them all
I think it is quite the opposite
You need to maximize the variance
b/w scores given by different LLMs
correct me if I am wrong
And we need to generate the essays with llms or?
it is up to you
anyone interested in teaming up?
Wanna talk dm? My experience is mostly with medical images and time series and small experience with llm fine-tuning and rag system
sure
I am confused about submissions. My initial approach was to use prompt engineering to generate essays, but at submission time it says notebook internet should be turned off. How do I proceed? Can someone guide me here? Can we use an LLM or not?
Is this more of a red-team kind of challenge than a dev / tune-your-model-for-a-specific-goal type? There are a couple of paper references in the overview; would reading them (and the last attack LLM challenge write-up) be helpful?
you can use LLM or not, but only those that you can attach to your notebook (i.e. those that are uploaded on kaggle).
So this includes only open-source LLMs
Llama, Qwen, Mistral, Deepseek, etc are fine
Claude, Gemini, Chatgpt, Grok, etc are not
Yes sure, you would be given essay topics, and you have to write an essay about each given topic. Those essays would then be judged by 3 different LLMs. Your goal is to maximize disagreement b/w the LLMs. That is, maximize the variance b/w the scores given by the judges.
So basically we have to reverse-engineer how each LLM scores an essay (on what basis), right?
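For anyone who wants to see what "maximize the variance b/w judges" looks like in numbers, here is a minimal sketch. It is only an illustration: the `disagreement` function and the example scores are made up, and the official metric on the competition's Evaluation page may combine more terms than this.

```python
from statistics import mean, pstdev

def disagreement(scores_per_essay):
    """Average, over essays, of the population std of the judges' scores.

    scores_per_essay: list of lists, one inner list of judge scores per essay.
    This is a hypothetical stand-in, not the official competition metric.
    """
    return mean(pstdev(scores) for scores in scores_per_essay)

# Hypothetical scores from 3 judges for 2 essays each.
agreeing = [[7, 7, 7], [5, 5, 5]]   # judges agree -> no disagreement
split = [[0, 9, 0], [9, 0, 9]]      # one judge differs strongly from the others

print(disagreement(agreeing))
print(disagreement(split))
```

Essays that all judges rate the same contribute nothing, no matter how high the score is, which is why "please them all" is the wrong target here.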
There are some strategies discussed in the forum for this competition over here:
https://www.kaggle.com/competitions/llms-you-cant-please-them-all/discussion
I might have a silly question. I read "It is expected that submissions will take multiple hours to be evaluated." If it says "submission running" in the submissions tab, does it mean that it is successfully submitted and it is evaluating, or waiting to be evaluated as expected?
It means it was submitted. It may still time out or error, but it is either queued to be evaluated or being evaluated. It takes a couple of hours for each submission.
I am new and trying to attempt my first competition. I want to take the first step. Would using AI to write the essays violate the rules? Thank you.
It won't, give it a try!
That is great! Thank you for your prompt assistance.
Did anyone get a score of 10+ who can share some tips on how to get there? Improving the quality of the essay and varying the tone is not helping!
This competition is so interesting! I recently saw a demo by Mystic AI where agents interact with each other to accomplish a task.
To those who seek guidance on how to approach this competition, this paper might give some ideas on how to proceed. >> "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods": https://arxiv.org/html/2412.05579v1#S3
Hello,
I am new and need some assistance. I cannot click the submit button as it is disabled even though I changed the setting to "Internet off." Moreover, the "Submit to Competition" option keeps saying it cannot find a notebook connected to the competition. I created several new notebooks, but none allowed me to submit them. I also cannot use the File Upload option. Could someone please assist me?
Thank you.
did you save and run?
Thank you for your response. Yes, I saved the version and ran it.
is it still not working?
Thank you for your kind response. Unfortunately, no. It keeps asking me to connect a notebook to the competition, so I created many new notebooks and attempted to submit them multiple times today. I am exhausted, so I will try again tomorrow.
You can check out some example submissions
Thank you for your advice.
Is this competition hard, guys? It seems very interesting but it's not straightforward, ya know
Hello, I'm new to "add-ons" on Kaggle. How can I include the LLM models listed on the competition page in my notebook? Thanks!
Under Inputs, choose Add Input, then search for the model you want to add
Thanks! I've got immediate access to Gemma, but saw this notification for Llama - does it mean that I have to submit a separate request via Meta's website?
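Once a model is attached, its files are mounted read-only under `/kaggle/input`. A small sketch for finding where they landed, assuming the model ships a Hugging Face style `config.json` (the helper name `find_model_dirs` is made up for this example):

```python
import os

def find_model_dirs(root="/kaggle/input"):
    """Return sorted directories under root that contain a config.json.

    Assumes attached models follow the Hugging Face layout; other formats
    (e.g. GGUF files) would need a different check.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "config.json" in filenames:
            found.append(dirpath)
    return sorted(found)

if __name__ == "__main__":
    # Prints nothing if no model is attached or the folder does not exist.
    for path in find_model_dirs():
        print(path)
```

A path printed here can usually be passed as a local directory to a loader such as transformers' `from_pretrained`, so nothing needs to be downloaded while internet is off.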
yes it is pretty easy and fast
would take ~10 mins
you would immediately get access
Thanks!
(At least that is what happened for me)
Hello, I have a question about the evaluation metric. It seems to prefer essays that get more divergent scores from the three judges. Is that right? But what is the purpose of this? Isn't consistent scoring across judges more desirable for assessment quality?
From the Competition description
your goal will be to submit an essay that maximizes disagreement between the LLM judges
The purpose is to see how "exploitable" llms are, they are trying to mitigate this with a judge system, which has its own weaknesses.
Take a chance to read through the competition description and evaluation for more details.
Finally, I got the points. This is incredibly challenging, but it is also kind of fun.
I am interested. Here is my profile https://www.linkedin.com/in/ppujari let me know
I read your LinkedIn. We would be happy to have you, but disclaimer: none of our teammates are as qualified as you
that is fine with me
I am just trying the gemma2 model to generate essays now
very low score
Hi can I DM you?
@left depot
I am trying thru prompt alteration only. Is there any other way?
We tried a variety of methods, with 2 methods scoring close to 12, but a public notebook has achieved 12.8 [edit: best score changed to 13], you can check that out
where is the notebook?
After submitting, it takes hours to get a score. Is that expected?
yes it is normal
https://www.kaggle.com/competitions/llms-you-cant-please-them-all/code
some people have graciously made some well performing notebooks public
I read your code, some good ideas. Thinking how to improve score further
Do we not know which 5 LLMs those are, or their content cutoff dates?
my code??
No we do not know the judges
The code I shared is public solutions/utilities; I do not think our team has made any code public.
Yes please
Yeah, this is the closest one I found; it gave me some ideas.
OK, going thru few papers on LLM-judge, see if i can get any clue. This is very interesting problem.
LLMs are always nice and accommodating ...
Pit them off against each other with different premises, perhaps?
At least pit different ones off, and ask them "Yay" or "Nay" on a premise
Hey!! Still looking for teammates? I would love to team up. I tried an o1-like LLM (a small version of the Qwen/QwQ model) but got a low score, exploring other options
hey, i'd be interested as well
@shut rock I have sent you a DM!
Generating the essays with prompts gets me a max score of ~5. What should I try next? Any help is highly appreciated.
I am too late, is there still anyone who would like to team up?
How many days remaining?
one month
Hello
I have started a little late. I submitted my notebook and it's been 10 hours; it's still running. Is that normal? The guidelines say running time should be less than 9 hrs. Let me know how this would end up.
it takes 3 - 4 hrs to score, so your total time can be up to 9 hours + 4 hours
Oh thanks, got the score of 3.5 for the first try. Better than I thought for a sample submission. Will try harder and smarter next time.
Hey, I have been trying to solve the problem with different approaches but my score has not increased. I am kind of stuck at 5.1 something. Is there anyone who would like to help, please?
Could anyone recommend some papers that might be relevant to this competition, I just started yesterday and don't know where to begin my research. Thanks!
I had a bronze medal rating on the public score, but forgot to choose my best 2 submissions, and now I am no longer on the medal section for the private leaderboard. Do I get any kudos or not?
did your best submissions not get autoselected?
I guess they did, but apparently they did not fare as well with the other 70% of the data.
Qafig sir can u pls teach me cp
