#llm-20-questions | Kaggle | Page 1

nimble leaf May 18, 2024, 9:24 PM

#

Is anyone looking to collaborate on this challenge? I am really new to tuning llms and I welcome anyone who is interested.

neon bloom May 20, 2024, 12:48 AM

#

Does anyone here understand how this works? Like, what is the entrypoint here? How are the agents selected and called? What is the minimum interface we need to implement?

#

How does the environment know whether it's a question or a guess?

neon bloom May 20, 2024, 2:31 AM

#

Found some information hidden away in the submission modal. Basically the last function of your main.py file should take an observation and return the response.
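
A minimal sketch of what that can look like. It assumes the observation exposes a `turnType` field with the values "ask", "guess", and "answer", plus `questions` and `keyword` fields; these names are assumptions and should be checked against the starter notebook. The canned replies stand in for real LLM calls.

```python
from types import SimpleNamespace


def agent_fn(obs, cfg=None):
    """Entry point: takes an observation and returns a string response."""
    # Field names below are assumptions, not confirmed in this thread.
    if obs.turnType == "ask":
        # Questioner: return the next yes/no question.
        return "Is it a place?"
    if obs.turnType == "guess":
        # Guesser: return the keyword guess only.
        return "france"
    if obs.turnType == "answer":
        # Answerer: sees the keyword and the latest question.
        last_question = obs.questions[-1].lower()
        return "yes" if obs.keyword.lower() in last_question else "no"
    return "no"  # Fallback so the agent always returns something.


# Hypothetical observation, built by hand for a local smoke test.
obs = SimpleNamespace(turnType="answer", keyword="France",
                      questions=["Is it France?"])
print(agent_fn(obs))
```

Under these assumptions the turn type is what tells the agent whether a question, a guess, or an answer is expected, and `agent_fn` would be the last function defined in main.py.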

tropic pecan May 20, 2024, 8:52 PM

#

nimble leaf Is anyone looking to collaborate on this challenge? I am really new to tuning ll...

hi i want to collaborate

wicked cosmos May 23, 2024, 5:57 PM

#

Hi, this is my first kaggle competition, and I am looking for a team. I am not experienced with LLMs and tuning them, if anyone is interested in partnering up with me i would really appreciate it

brazen swan May 24, 2024, 4:07 PM

#

nimble leaf Is anyone looking to collaborate on this challenge? I am really new to tuning ll...

hello! i would like to collaborate

final slate May 27, 2024, 2:13 AM

#

Anyone looking to collaborate with me?

harsh pelican May 27, 2024, 6:59 AM

#

hello, i want to collaborate, too.

#

i have some experience in agent development

brisk hearth May 27, 2024, 11:40 AM

#

neon bloom Does anyone here understand how this works? Like, what is the entrypoint here? How ...

have you checked the starter notebook? it shows how to set up the agent.

atomic imp May 27, 2024, 5:50 PM

#

Are we supposed to fine tune the llms or is it just prompt engineering?

vague perch May 28, 2024, 12:29 AM

#

atomic imp Are we supposed to fine tune the llms or is it just prompt engineering?

im reasonably confident that both are acceptable. right now i am doing prompt engineering, but i think i will try and switch models (currently on gemma/7b-it-quant/v2 ) after i figure out why i am getting a "Validation Episode failed" for my submission with the starter notebook

flat acorn Jun 4, 2024, 8:21 PM

#

Can we use any llm? Like llama would work as well?
And what about paid ones like chatgpt through the api.
We are not just limited to any specific llm right

primal wraith Jun 9, 2024, 2:54 PM

#

flat acorn Can we use any llm? Like llama would work as well? And what about paid ones like...

i got llama to work, i dont see anything against it in the rules

lucid laurel Jun 12, 2024, 11:36 AM

#

flat acorn Can we use any llm? Like llama would work as well? And what about paid ones like...

I think you can use any, as long as your API key has enough credits to compete with other agents throughout the competition

abstract merlin Jun 13, 2024, 4:33 AM

#

Can anyone explain the basic structure of the code when it runs in the competition? How does it get the input question, and how does it send an output?

#

I am new so just trying to figure the basic stuff

soft moss Jun 13, 2024, 12:04 PM

#

tough comp... I was thinking of exploring RLHF as a way to fine tune one of the acceptable base models but I have also been exploring papers on implementing deterministic logic in LLMs. Looks like an active research area. Papers that got me thinking were https://arxiv.org/abs/2309.04658 and https://arxiv.org/abs/2310.18940 which use the Werewolf game. So far I have worked through prompt engineering, probabilistic keyword guesses, and tuning model parameters (temp, top P, etc.). Nothing in that space seems to make a huge difference. I have a public notebook for testing these techniques that you all could fork if you want to. Be aware, submission errors are present in some versions. V12 is functional. https://www.kaggle.com/code/matthewsfarmer/20q-gemma

arXiv.org

Exploring Large Language Models for Communication Games: An Empiric...

Communication games, which we refer to as incomplete information games that heavily depend on natural language communication, hold significant research value in fields such as economics, social science, and artificial intelligence. In this work, we explore the problem of how to engage large language models (LLMs) in communication games, and in r...

arXiv.org

Language Agents with Reinforcement Learning for Strategic Play in t...

Agents built with large language models (LLMs) have shown great potential across a wide range of domains. However, in complex decision-making tasks, pure LLM-based agents tend to exhibit intrinsic bias in their choice of actions, which is inherited from the model's training data and results in suboptimal performance. To develop strategic languag...

20q Gemma

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

#

One consistent problem I keep facing is that my answerer agent says "no" even though the answer should be an obvious "yes".

pale heart Jun 14, 2024, 9:00 AM

#

Thanks for those links Matthew, interesting reads. Your last link to 20q Gemma doesn't lead anywhere for me tho

#

ChatGPT is pretty good at asking questions btw
```
GPT: Is it a person, place, or thing?
ME: yes
GPT: Alright, let me rephrase that. Is it a person?
ME: yes
GPT: Is this person a real person?
ME: no
GPT: Is this person a character from a book?
ME: no
GPT: Is this person a character from a movie?
ME: yes
GPT: Is this character from a movie made in the last 20 years?
ME: yes
GPT: Is this character a male?
ME: yes
GPT: Is this character from a science fiction movie?
ME: yes
GPT: Is this character part of a movie franchise?
ME: yes
GPT: Is this character a superhero?
ME: no
GPT: Is this character a villain?
ME: no
GPT: Is this character a human?
ME: no
GPT: Is this character an alien?
ME: yes
GPT: Is this character from the "Star Wars" franchise?
ME: yes
GPT: Is this character part of the Rebel Alliance?
ME: yes
GPT: Is this character a droid?
ME: no
GPT: Is this character known for piloting a spaceship?
ME: no
GPT: Is this character known for their wisdom or mentorship?
ME: yes
GPT: Is this character Yoda?
ME: yes

GPT: Yes! Got it! Yoda it is. Thanks for playing! Do you want to play another round?
```

#

I haven't checked answering questions yet

primal wraith Jun 14, 2024, 10:14 AM

#

pale heart ChatGPT is pretty good at asking questions btw GPT: Is it a person, place, or...

Yea, unfortunately you can only really fit a 7b or maybe 13b model in the submissions

#

at least without having any help, llama3 7b does not even get anywhere close..

warped parcel Jun 14, 2024, 7:46 PM

#

Hey folks, I am quite confused: with the current rules and setup, I understand the keywords are hidden from us and will be changed during the final evaluation phase after the submission deadline, but can our notebooks still read keywords.py? cuz I see this keywords_list = json.loads(KEYWORDS_JSON) in the llm_20_questions.py provided

#

Currently with all words being visible, the leaderboard is basically a cheating ground with people using rule based approaches rather than actually relying on LLM reasoning... and not sure how this will play out in final eval

#

@lucid laurel @flat acorn you cannot use ChatGPT/OpenAI APIs. The agents will be running offline in their own environment during final evaluation, meaning no internet access.

balmy pivot Jun 16, 2024, 6:57 AM

#

As far as I get it, we will not have access to the set of possible keywords and this makes this competition really tough. Even though I have a quite reasonable agent with Gemma 2b, I'm still unable to guess keywords. I think only a few agents from the leaderboard are really able to guess correctly from time to time. And I guess the two top performers (guys from nvidia) are among them.
BTW I am really curious how you are able to use 7b or 13b models? With Gemma-7b-it-quant I get an "out of memory" error

soft moss Jun 16, 2024, 10:59 AM

#

pale heart Thanks for those links Matthew, interesting reads. Your last link to 20q Gemma ...

It was set to private. My bad. Fixed!

soft moss Jun 16, 2024, 11:00 AM

#

balmy pivot As far as I get it, we will not have access to the set of possible keywords and ...

Are you trying to eval within the notebook? If so, you need to only load one agent and have three simple agents. If you are only running one, try resetting the notebook, then rerunning, and make sure you have the GPU env running

soft moss Jun 16, 2024, 11:02 AM

#

warped parcel Currently with all words being visible, the leaderboard is basically a cheating ...

yeah, I've seen a few public notebooks directly reference the keywords. They should fall off the leaderboard when the full list is used.

soft moss Jun 16, 2024, 11:04 AM

#

pale heart ChatGPT is pretty good at asking questions btw GPT: Is it a person, place, or...

Interesting. I was using GPT4o through API and it stopped asking yes-or-no questions after the 4th question. What prompt did you use to constrain replies to "Is this...", did you alter any inference parameters?

balmy pivot Jun 16, 2024, 12:35 PM

#

soft moss Are you trying to eval within the notebook? If so, you need to only load one age...

I have just a single agent running. I get CUDA error: out of memory. But still Gemma-2b is doing a good job. I just have to work a bit on the guessing process

pale heart Jun 17, 2024, 7:45 AM

#

soft moss Interesting. I was using GPT4o through API and it stopped asking yes-or-no quest...

I created a custom gpt in the ui and have this set as instructions: You are playing the game 20 Questions. Your role is the Questioner. You ask questions that can only be answered with yes or no. Start with broader questions and then, depending on the answers you get, narrow down the questions until you find the correct answer. Your goal is to guess the keyword within 20 questions that can be a person, place, or thing. Good questions to ask should start off broad and get more specific, and ideally, each question should eliminate a large number of options until you get to the final few 20 Questions game questions. Here are some example questions for each category:

Animals:

Is it a mammal/bird/fish/reptile?
Is it big/small?
Can it fly?
Does it eat meat/grass/bugs?
Does it live in Africa/North America/Asia?

People:

Are they a real person?
Are they a man/woman?
Have I met them?
Are they alive?
Are they a movie star/athlete/politician/etc?

Places:

Is it famous?
Have I been to it?
Would I need a plane to reach it?
Is it a hot/cold/wet/dry place?
Is it often crowded?

Food:

Is it a raw food/cooked dish?
Is it a fruit/vegetable/meat?
Is it sweet/sour/savory?
Have I eaten it before?
Do I like it?

Objects:

Is there one in this room?
Is it light enough to pick up?
Is it rare/common?
Would it be expensive to buy?
Is it used more for fun/work?

Keep the tone humorous and light-hearted, making the game enjoyable and fun.

pure idol Jun 17, 2024, 9:13 AM

#

pale heart I created a custom gpt in the ui and have this set as instructions: You are play...

Would you mind making the code public, and sharing the same

pale heart Jun 17, 2024, 11:04 AM

#

pure idol Would you mind making the code public, and sharing the same

I don't have any code, those are just the instructions that I pasted into chatgpt

pure idol Jun 17, 2024, 11:35 AM

#

Oh ok my bad, I thought you used their api to create stuffs and created your own ui

pale heart Jun 17, 2024, 1:20 PM

#

pure idol Oh ok my bad, I thought you used their api to create stuffs and created your own...

nah, the 'customize' is just an option in the official chatgpt ui

soft moss Jun 27, 2024, 2:02 PM

#

In my testing, it seems that response parsing can be a huge limitation for a competitive agent. Here's an example:

#

#

I have an eval notebook where I added debug prints to each step in the text generation process.

#

The agents like to add extra text even when prompted not to, and if given specific commands to wrap output in "double asterisks" they spit out lots of them. Or they just repeat the command in double asterisks... pure silliness

#

#

if you are seeing outputs like this in your validation or replays, take some time to check parsing logic and the text generation.
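
One way to make the parsing more forgiving, sketched under the assumption that the model was asked to wrap its final answer in double asterisks. The helper name `extract_marked` and the fallback behaviour are illustrative and not taken from any notebook mentioned here.

```python
import re


def extract_marked(text, fallback=""):
    """Return the first non-empty **...** span in text, else fallback."""
    # Non-greedy match between pairs of double asterisks.
    for candidate in re.findall(r"\*\*(.*?)\*\*", text, flags=re.DOTALL):
        # Drop stray asterisks, quotes, colons, and whitespace at the edges.
        cleaned = candidate.strip(" \n\t*:\"'")
        if cleaned:
            return cleaned
    return fallback


print(extract_marked("Sure! **Is it a place?** Let me know."))
```

Empty spans such as a bare run of asterisks are skipped, and the fallback lets the caller substitute a safe default (for example "no" on an answer turn) instead of passing raw model output through.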

low lark Jul 1, 2024, 7:52 AM

#

what on earth is going on during this episode?

https://www.kaggle.com/competitions/llm-20-questions/leaderboard?dialog=episodes-episode-55228932

Just good luck?

LLM 20 Questions

Guess the secret word in this cooperative game of question asking and answering

whole lynx Jul 1, 2024, 2:22 PM

#

Any suggestion what to do if the validation episode errors out without actually outputting an error in the logs? My agent 0 logs just has an opening bracket "[" ...

fading slate Jul 2, 2024, 5:35 PM

#

hey guys, do you know if there are some models we can use or is it allowed to use anything with an api key?

soft moss Jul 2, 2024, 9:31 PM

#

if you can load the model weights into the env and fit them into the GPU memory of 1 T4, it is fair game. the kaggle env does not have internet access so all model weights must be included in your submission.

#

you can use the code below to download the model, then make sure to wrap that tmp folder into your submission:

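```python
from huggingface_hub import snapshot_download
from pathlib import Path
import shutil

# Start from a clean folder so stale files are not packaged by mistake.
model_path = Path("/tmp/submission/")
if model_path.exists():
    shutil.rmtree(model_path)
model_path.mkdir(parents=True)

# Download every file in the model repo into the submission folder.
snapshot_download(
    repo_id="repo-id",  # placeholder: replace with the Hugging Face repo id
    local_dir=model_path
)
```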
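
To wrap the folder, here is a sketch using Python's `tarfile` module. It assumes the submission is uploaded as a `submission.tar.gz` archive with main.py and the model files at its top level; the archive name and layout are assumptions, so check the submission dialog for the exact requirements.

```python
import tarfile
from pathlib import Path


def pack_submission(src_dir, archive_path):
    """Bundle everything under src_dir into a gzip-compressed tar archive."""
    src_dir = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in sorted(src_dir.iterdir()):
            # arcname keeps paths relative, so files sit at the archive root.
            tar.add(item, arcname=item.name)
    return Path(archive_path)


# Example call (run after the download and after copying main.py in):
# pack_submission("/tmp/submission", "submission.tar.gz")
```

Gzip over several gigabytes of weights can be slow, so a faster compressor may be worth trying if packaging time becomes a problem.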

soft moss Jul 3, 2024, 3:51 PM

#

whole lynx Any suggestion what to do if the validation episode errors out without actually ...

check out the replay to see if the agents were called and if they were active. this means that there was no response from the agent when it was called, but there was no error in your script (libraries, syntax, etc).

soft moss Jul 3, 2024, 3:52 PM

#

low lark what on earth is going on during this episode? https://www.kaggle.com/competiti...

setting a hard coded keyword guess as 'france' is my hunch on why it was won

tranquil laurel Jul 5, 2024, 2:11 PM

#

#

What does Err mean and why did i get huge decreases in points (because of it?)

#

What does the [1st] mean? It sometimes can be [3rd] or other things too

tranquil laurel Jul 5, 2024, 4:26 PM

#

This Err for example, doesn't give a huge decrease in points

pale heart Jul 8, 2024, 7:20 PM

#

what kind of resources do you need to run the gemma 7b model? I run out of memory when I try to let 2 agents play against another. I also tried fine tuning, and that ran out of memory. I also tried fine tuning it on google colab's A100, and that ran out of memory as well..

soft moss Jul 8, 2024, 8:16 PM

#

tranquil laurel This Err for example, doesn't give a huge decrease in points

You'll need to load the quantized version. The 7b model, when fully loaded on the GPU, takes up a little over 17 GB. The quantized version (like what is loaded into the starter notebook) is a little under 9 GB.

#

If you want a model without quantization, you might try with 2B.

#

If you're going that small though, I'd opt for Phi 3 mini or Qwen 1.5B Instruct
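
Those figures line up with a back-of-the-envelope estimate of weight memory as parameter count times bytes per parameter. The sketch below assumes roughly 8.5 billion parameters for the "7b" model (an assumption, not stated in this thread) and counts weights only, ignoring activations and the KV cache.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Estimate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 8.5e9  # assumed parameter count for the "7b" model
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(params, bits):.1f} GB")
```

Real usage will be higher than the weight-only number, which is why a model that nearly fills a 15 GB card on load tends to run out of memory once generation starts.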

sage iron Jul 10, 2024, 11:58 AM

#

Why are most top scoring notebooks old? The corresponding teams stop submitting because their agents seem good enough. Is this advisable?

soft moss Jul 10, 2024, 1:09 PM

#

sage iron Why are most top scoring notebooks old? The corresponding teams stop submitting ...

Why change what isn't broken? They probably had a strong or overfit submission early in the competition and will update if they start dropping down.

#

I think the best practice is to upload your submission after internal testing. If your bot can win a game of 20 questions against itself a few times and the question/guesses seem to be logical, you should submit. It may not rise to the top of the leaderboard but the overfit models will drop once the secret keyword list is used.

sage iron Jul 10, 2024, 3:00 PM

#

Seriously, what the hell is this -194

#

Some submissions get penalized for no reason.

#

I assume that top scores remain high because they never meet some type of opponents

tranquil laurel Jul 10, 2024, 4:04 PM

#

sage iron Seriously, what the hell is this -194

Maybe your agent ran out of memory bcs your teammate gives lengthy questions (which results in err for your agent) hence - for you, + for 3 others

sage iron Jul 10, 2024, 4:41 PM

#

tranquil laurel Maybe your agent ran out of memory bcs your teammate gives lengthy questions (wh...

Hmm, the problem is that we cannot avoid such teammates, weird. Moreover, the answer should be yes or no. Why would it need time. Does the fake teammate intentionally ask my agent to spend time before answering😕

tranquil laurel Jul 10, 2024, 4:45 PM

#

sage iron Hmm, the problem is that we cannot avoid such teammates, weird. Moreover, the an...

You should adjust your own agent instead, simplify your prompts if you can and truncate the questioner prompt if more than 100 characters for example. (My agent does this too)
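
A sketch of that kind of guard, assuming a 100-character cap and a hypothetical helper name `clip_question`. It cuts at a word boundary where possible before the text is placed into the prompt.

```python
def clip_question(question, max_chars=100):
    """Collapse whitespace and cap an incoming question at max_chars."""
    text = " ".join(question.split())
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    # Prefer cutting at the last space so a word is not split in half.
    head, sep, _ = clipped.rpartition(" ")
    return head if sep else clipped


print(clip_question("Is   it    a place?"))
```

Capping the input keeps the answerer's prompt length predictable no matter what the paired questioner sends.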

soft moss Jul 10, 2024, 4:58 PM

#

tranquil laurel Maybe your agent ran out of memory bcs your teammate gives lengthy questions (wh...

I would test your model in your kaggle notebook by loading the model onto cuda:0. If the GPU memory is 14.* / 15 you may want to load a quantized version.

sage iron Jul 13, 2024, 6:01 PM

#

It seems that the competition is getting interesting! Saw some newly submitted notebooks scoring high, overtaking the top one submitted a month ago

sweet jewel Jul 13, 2024, 9:02 PM

#

Hey all quick question. The rules say "Guesses are limited to 100 characters". But looking at llm_20_questions.py it seems the response in the turn = guess should only include "the <keyword>" ? And not long multi word responses like "My guess is that the keyword is..." ?

austere belfry Jul 15, 2024, 2:38 AM

#

So many yes/no bots 😢

balmy pivot Jul 15, 2024, 10:02 AM

#

austere belfry So many yes/no bots 😢

What do you mean by yes/no bots?

soft moss Jul 15, 2024, 1:29 PM

#

balmy pivot What do you mean by yes/no bots?

I think they are referring to the answerer agents that don't call an llm for text generation but instead just answer yes or no all the time regardless of the keyword or question.

balmy pivot Jul 15, 2024, 1:33 PM

#

Oh, I see 🙂 generally I think there will be much fun after closing the submissions. In my opinion the "things" category is extremely hard to guess and I guess max 5% of bots will compete if not less. I made quite an effort to make good bots (without referring to the public keyword list at all) but was able to guess maybe 3-4 times during the entire competition. And it's always been a place. Let's see what happens. I don't care at all about the current leaderboard. Its importance is zero now in my opinion

soft moss Jul 15, 2024, 2:29 PM

#

balmy pivot Oh, I see 🙂 generally I think there will be much fun after closing the submissions...

Right! Without a ground-truth benchmark, our only true agent evaluation is self vs. self. The leaderboard seems to be overfit with keyword bots.

sage iron Jul 16, 2024, 2:27 PM

#

On the hidden test set, will all bots be initialized with 600 as reward so any bot can face any bot?

wintry coral Jul 17, 2024, 3:32 PM

#

Has anyone been able to use a more recent version of transformers? I'm trying to use Gemma2 and even though I'm installing transformers 4.42.4 to the submission libraries I get a validation error saying that there is no Gemma2ForCausalLM to import from transformers

quick orbit Jul 23, 2024, 1:35 PM

#

wintry coral Has anyone been able to use a more recent version of transformers? I'm trying to...

I've been working on gemma2 for a few days but whatever I do I get the same validation error. I suppose it is related to the kaggle environment. I have now moved to llama3

final slate Jul 23, 2024, 4:03 PM

#

gemma 2 is not supported in the kaggle docker file/environment yet

sage iron Aug 3, 2024, 7:40 PM

#

Competition getting more interesting?