#llm-20-questions


nimble leaf
#

Is anyone looking to collaborate on this challenge? I am really new to tuning llms and I welcome anyone who is interested.

neon bloom
#

Does anyone here understand how this works? Like, what is the entrypoint here? How are the agents selected and called? What is the interface we need to implement as a minimum?

#

How does the environment know it's a question or a guess?

neon bloom
#

Found some information hidden away in the submission modal. Basically the last function of your main.py file should take an observation and return the response.
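To make the "last function takes an observation and returns the response" idea concrete, here is a minimal sketch. It assumes the observation exposes `turnType` ("ask", "guess" or "answer"), `questions` and `keyword` fields; those names are an assumption based on my reading of the starter kit, so check them against llm_20_questions.py. The canned replies are placeholders for real model calls.

```python
from types import SimpleNamespace


def agent_fn(obs, cfg=None):
    """Entry point sketch: dispatch on the turn type and return a string."""
    # ASSUMPTION: obs.turnType is one of "ask", "guess", "answer".
    if obs.turnType == "ask":
        # Questioner turn: return a yes/no question (placeholder text).
        return "Is it a place?"
    if obs.turnType == "guess":
        # Questioner turn: return only the guessed keyword (placeholder text).
        return "paris"
    if obs.turnType == "answer":
        # Answerer turn: reply "yes" or "no" about the keyword.
        # Toy rule: say yes only if the question literally names the keyword.
        last_question = obs.questions[-1].lower()
        return "yes" if obs.keyword.lower() in last_question else "no"
    raise ValueError(f"unexpected turn type: {obs.turnType!r}")


# Local smoke test with a hand-built observation (not the real environment).
fake_obs = SimpleNamespace(turnType="answer", keyword="Paris",
                           questions=["Is it Paris?"])
print(agent_fn(fake_obs))
```

Under these assumptions the environment never has to inspect the text to tell a question from a guess: the turn type says which one it asked for.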

wicked cosmos
#

Hi, this is my first Kaggle competition, and I am looking for a team. I am not experienced with LLMs and tuning them. If anyone is interested in partnering up with me, I would really appreciate it.

final slate
#

Anyone looking to collaborate with me?

harsh pelican
#

hello, i want to collaborate, too.

#

i have some experience in agent development

atomic imp
#

Are we supposed to fine-tune the LLMs, or is it just prompt engineering?

flat acorn
#

Can we use any llm? Like llama would work as well?
And what about paid ones like chatgpt through the api.
We are not just limited to any specific llm right

abstract merlin
#

Can anyone explain the basic structure of the code in this competition? How does it get the input question, and how does it send an output?

#

I am new so just trying to figure the basic stuff

soft moss
#

tough comp... I was thinking of exploring RLHF as a way to fine tune one of the acceptable base models but I have also been exploring papers on implementing deterministic logic in LLMs. Looks like an active research area. Papers that got me thinking were https://arxiv.org/abs/2309.04658 and https://arxiv.org/abs/2310.18940 which use the Werewolf game. So far I have worked through prompt engineering, probabilistic keyword guesses, and tuning model parameters (temp, top P, etc.). Nothing in that space seems to make a huge difference. I have a public notebook for testing these techniques that you all could fork if you want to. Be aware, submission errors are present in some versions. V12 is functional. https://www.kaggle.com/code/matthewsfarmer/20q-gemma

#

One consistent problem I keep facing is that my answerer agent says "no" even though the answer should be an obvious "yes".

pale heart
#

Thanks for those links Matthew, interesting reads. Your last link to 20q Gemma doesn't lead anywhere for me tho

#

ChatGPT is pretty good at asking questions btw
```
GPT: Is it a person, place, or thing?
ME: yes
GPT: Alright, let me rephrase that. Is it a person?
ME: yes
GPT: Is this person a real person?
ME: no
GPT: Is this person a character from a book?
ME: no
GPT: Is this person a character from a movie?
ME: yes
GPT: Is this character from a movie made in the last 20 years?
ME: yes
GPT: Is this character a male?
ME: yes
GPT: Is this character from a science fiction movie?
ME: yes
GPT: Is this character part of a movie franchise?
ME: yes
GPT: Is this character a superhero?
ME: no
GPT: Is this character a villain?
ME: no
GPT: Is this character a human?
ME: no
GPT: Is this character an alien?
ME: yes
GPT: Is this character from the "Star Wars" franchise?
ME: yes
GPT: Is this character part of the Rebel Alliance?
ME: yes
GPT: Is this character a droid?
ME: no
GPT: Is this character known for piloting a spaceship?
ME: no
GPT: Is this character known for their wisdom or mentorship?
ME: yes
GPT: Is this character Yoda?
ME: yes

GPT: Yes! Got it! Yoda it is. Thanks for playing! Do you want to play another round?
```

#

I haven't checked answering questions yet

primal wraith
#

at least without having any help, llama3 7b does not even get anywhere close..

warped parcel
#

Hey folks, I am quite confused: with the current rules and setup, I understand the keywords are hidden from us and will be changed during the final evaluation phase after submission deadline, but can our notebooks still read the keywords.py ? cuz I see this keywords_list = json.loads(KEYWORDS_JSON) in the llm_20_questions.py provided

#

Currently, with all words being visible, the leaderboard is basically a cheating ground with people using rule-based approaches rather than actually relying on LLM reasoning... and not sure how this will play out in final eval

#

@lucid laurel @flat acorn you cannot use ChatGPT/OpenAI APIs. The agents will be running offline in their own environment during final evaluation, meaning no internet access.

balmy pivot
#

As far as I get it, we will not have access to the set of possible keywords, and this makes this competition really tough. Even though I have a quite reasonable agent with Gemma 2b, I’m still unable to guess keywords. I think only a few agents from the leaderboard are really able to guess correctly from time to time. And I guess the two top performers (guys from nvidia) are among them.
BTW I am really curious how you are able to use 7b or 13b models? With Gemma-7b-it-quant I get an „out of memory” error

pale heart
# soft moss Interesting. I was using GPT4o through API and it stopped asking yes-or-no quest...

I created a custom gpt in the ui and have this set as instructions: You are playing the game 20 Questions. Your role is the Questioner. You ask questions that can only be answered with yes or no. Start with broader questions and then, depending on the answers you get, narrow down the questions until you find the correct answer. Your goal is to guess the keyword within 20 questions that can be a person, place, or thing. Good questions to ask should start off broad and get more specific, and ideally, each question should eliminate a large number of options until you get to the final few 20 Questions game questions. Here are some example questions for each category:

Animals:

  1. Is it a mammal/bird/fish/reptile?
  2. Is it big/small?
  3. Can it fly?
  4. Does it eat meat/grass/bugs?
  5. Does it live in Africa/North America/Asia?

People:

  1. Are they a real person?
  2. Are they a man/woman?
  3. Have I met them?
  4. Are they alive?
  5. Are they a movie star/athlete/politician/etc?

Places:

  1. Is it famous?
  2. Have I been to it?
  3. Would I need a plane to reach it?
  4. Is it a hot/cold/wet/dry place?
  5. Is it often crowded?

Food:

  1. Is it a raw food/cooked dish?
  2. Is it a fruit/vegetable/meat?
  3. Is it sweet/sour/savory?
  4. Have I eaten it before?
  5. Do I like it?

Objects:

  1. Is there one in this room?
  2. Is it light enough to pick up?
  3. Is it rare/common?
  4. Would it be expensive to buy?
  5. Is it used more for fun/work?

Keep the tone humorous and light-hearted, making the game enjoyable and fun.

pure idol
#

Oh ok my bad, I thought you used their API to create stuff and built your own UI

soft moss
#

In my testing, it seems that response parsing can be a huge limitation for a competitive agent. Here's an example:

#

I have an eval notebook where I added debug prints to each step in the text generation process.

#

The agents like to add extra text even when prompted not to, and if given specific commands to output with "double asterisks" they spit out lots of them. Or just repeat the command in double asterisks... pure silliness

#

if you are seeing outputs like this in your validation or replays, take some time to check parsing logic and the text generation.
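A rough sketch of the kind of defensive parsing that helps with this, assuming the prompt asks the model to wrap its final answer in double asterisks. The function names are made up for illustration and the regexes are just one reasonable choice, not anyone's actual submission code.

```python
import re


def parse_yes_no(text, default="no"):
    """Return 'yes' or 'no' from free-form model output."""
    # Prefer an answer wrapped in double asterisks, e.g. "**yes**".
    bold = re.findall(r"\*\*\s*(yes|no)\s*\*\*", text, flags=re.IGNORECASE)
    if bold:
        return bold[-1].lower()
    # Otherwise fall back to the first standalone yes/no word.
    plain = re.search(r"\b(yes|no)\b", text, flags=re.IGNORECASE)
    return plain.group(1).lower() if plain else default


def parse_bold_span(text, default=""):
    """Return the last non-empty **...** span, ignoring runs of bare asterisks."""
    spans = re.findall(r"\*\*([^*\n]+?)\*\*", text)
    spans = [s.strip() for s in spans if s.strip()]
    return spans[-1] if spans else default


print(parse_yes_no("Sure! **Yes**, it is a place."))
print(parse_bold_span("My guess: **eiffel tower**"))
```

Falling back to a default instead of raising keeps the agent from erroring out mid-episode when the model rambles; whether "no" is the right default is a judgment call.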

whole lynx
#

Any suggestion what to do if the validation episode errors out without actually outputting an error in the logs? My agent 0 logs just has an opening bracket "[" ...

fading slate
#

hey guys, do you know if there are some models we can use or is it allowed to use anything with an api key?

soft moss
#

if you can load the model weights into the env and fit them into the GPU memory of 1 T4, it is fair game. the kaggle env does not have internet access so all model weights must be loaded.

#

you can use the code below to download the model, then make sure to wrap that tmp folder into your submission:

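from huggingface_hub import snapshot_download
from pathlib import Path
import shutil

# Start from an empty /tmp/submission/ folder.
model_path = Path("/tmp/submission/")
if model_path.exists():
    shutil.rmtree(model_path)
model_path.mkdir(parents=True)

# Download every file of the model repo into that folder.
# "repo-id" is a placeholder; replace it with the Hugging Face repo you want.
snapshot_download(
    repo_id="repo-id",
    local_dir=model_path
)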
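For the "wrap that tmp folder into your submission" step, here is a minimal sketch using only the standard library. It assumes the submission is uploaded as a gzip-compressed tar archive with main.py at the top level; `pack_submission` is a made-up helper name, and the output path in the usage note is only an example.

```python
import tarfile
from pathlib import Path


def pack_submission(src_dir, archive_path):
    """Write every file under src_dir into a gzip-compressed tar archive."""
    src_dir = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so main.py sits at the archive root.
                tar.add(path, arcname=str(path.relative_to(src_dir)))
    return Path(archive_path)
```

Usage would look like `pack_submission("/tmp/submission", "/tmp/submission.tar.gz")`. Keep the archive outside the source folder, otherwise the half-written archive gets picked up as one of the files.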
tranquil laurel
#

What does Err mean, and why did I get huge decreases in points (because of it?)

#

What does the [1st] mean? It sometimes can be [3rd] or other things too

tranquil laurel
#

This Err, for example, doesn't give a huge decrease in points

pale heart
#

what kind of resources do you need to run the gemma 7b model? I run out of memory when I try to let 2 agents play against one another. I also tried fine tuning, and that ran out of memory. I also tried fine tuning it on google colab's A100, and that ran out of memory as well..

soft moss
#

If you want a model without quantization, you might try with 2B.

#

If you're going that small though, I'd opt for Phi 3 mini or Qwen 1.5B Instruct
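A back-of-the-envelope way to see why the 7B model is tight: memory for the weights alone is roughly parameter count times bytes per parameter. The sketch below uses the nominal sizes from the model names (an assumption; real parameter counts differ somewhat) and ignores activations, the KV cache and framework overhead, so treat the numbers as lower bounds.

```python
def weight_memory_gib(n_params_billion, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    n_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return n_bytes / 2**30


# Nominal sizes only; actual checkpoints may hold more parameters.
for name, params_b in [("2B", 2.0), ("7B", 7.0)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gib(params_b, bits):.1f} GiB")
```

At 16-bit a nominal 7B model needs about 13 GiB for weights alone, which leaves very little headroom on a 16 GB T4, and loading two copies for self-play doubles it. Fine-tuning needs gradients and optimizer state on top of the weights, so it runs out much sooner than inference.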

sage iron
#

Why are most top-scoring notebooks old? The corresponding teams stopped submitting because their agents seem good enough. Is this advisable?

soft moss
#

I think the best practice is to upload your submission after internal testing. If your bot can win a game of 20 questions against itself a few times and the question/guesses seem to be logical, you should submit. It may not rise to the top of the leaderboard but the overfit models will drop once the secret keyword list is used.

sage iron
#

Seriously, what the hell is this -194

#

Some submissions get penalized for no reason.

#

I assume that top scores remain high because they never meet some type of opponents

sage iron
#

It seems that the competition is getting interesting! Saw some newly submitted notebooks scoring high, overtaking the top one submitted a month ago

sweet jewel
#

Hey all quick question. The rules say "Guesses are limited to 100 characters". But looking at llm_20_questions.py it seems the response in the turn = guess should only include "the <keyword>" ? And not long multi word responses like "My guess is that the keyword is..." ?
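One way to stay on the safe side is to strip chatty lead-ins before returning the guess and cap the length at the limit quoted from the rules. This is only a sketch: `clean_guess` and the list of lead-in phrases are made up for illustration, and how the environment itself compares guesses should be checked in llm_20_questions.py.

```python
import re

MAX_GUESS_CHARS = 100  # limit quoted from the rules

# Hypothetical lead-in phrases a chatty model might put before its guess.
LEAD_INS = re.compile(
    r"^\s*(?:(?:my guess is(?: that)?|i think(?: that)?|i guess)[\s:]+)?"
    r"(?:(?:the keyword is|it is|it's)[\s:]+)?",
    flags=re.IGNORECASE,
)


def clean_guess(raw):
    """Reduce free-form model output to a short bare guess."""
    stripped = raw.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Drop a leading phrase such as "My guess is that the keyword is".
    guess = LEAD_INS.sub("", first_line, count=1)
    # Trim quotes, asterisks and trailing punctuation around the guess.
    guess = guess.strip(" \t\"'*.!?:")
    return guess[:MAX_GUESS_CHARS]


print(clean_guess("My guess is that the keyword is Eiffel Tower."))
```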

austere belfry
#

So many yes/no bots 😢

balmy pivot
#

Oh, I see 🙂 generally I think there will be much fun after closing the submissions. In my opinion the „things” category is extremely hard to guess, and I guess max 5% of bots will compete, if not less. I made quite an effort to make good bots (without referring to the public keyword list at all) but was able to guess maybe 3-4 times during the entire competition. And it’s always been a place. Let’s see what happens. I don’t care at all about the current leaderboard. Its importance is zero now, in my opinion

sage iron
#

On the hidden test set, will all bots be initialized with 600 as reward so any bot can face any bot?

wintry coral
#

Has anyone been able to use a more recent version of transformers? I'm trying to use Gemma2 and even though I'm installing transformers 4.42.4 to the submission libraries I get a validation error saying that there is no Gemma2ForCausalLM to import from transformers

final slate
#

gemma 2 is not supported in the kaggle docker file/environment yet

sage iron
#

Competition getting more interesting?