# Need help determining if this is possible


steady scarab
#

We have been tasked with creating a prototype of a game-playing LLM agent for the game Damath.

Basically, Damath is a variation of Checkers with an operator on each usable square and a value on each piece.
Points are scored by capturing a piece, using the formula

{your piece}{operator}{enemy piece}

For example, if my piece has a value of 5 and captures an enemy piece with a value of 3, landing on a square marked "x", I get 5 * 3 = 15 points for that move.
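A minimal sketch of that scoring rule in Python, in case it helps to pin the formula down. The operator symbols other than "x" and the use of exact fractions for division are assumptions, not something stated above.

```python
from fractions import Fraction

# Assumed operator symbols; only "x" is mentioned explicitly in the post.
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "x": lambda a, b: a * b,
    "/": lambda a, b: Fraction(a, b),  # exact division, raises on b == 0
}

def capture_score(own_value, operator, enemy_value):
    """Score one capture as {your piece}{operator}{enemy piece}."""
    return OPERATORS[operator](own_value, enemy_value)

print(capture_score(5, "x", 3))  # 15
```

The operator is the one on the landing square, so the same capture can score differently depending on where the capturing piece ends up.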

The game ends if there are no valid moves left for the last moving player. The one with the higher score wins.

Our adviser instructed us to create a dataset describing the game and its mechanics, specifically how piece movement works, and then use that dataset to fine-tune the model.

That attempt was unsuccessful, so we tried building an agent with LangChain that uses tool calling.
We broke the problem down into smaller steps, and we are currently at the board state -> valid moves step.
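For reference, the board state -> valid moves step can be sketched as a plain function, which is roughly what a tool call would wrap. This is only a sketch under assumptions: an 8x8 board, a hypothetical `(row, col) -> (color, value)` dictionary, red moving toward higher rows, and no promoted pieces, chained captures, or mandatory-capture rule.

```python
BOARD_SIZE = 8  # assumption: standard 8x8 board

def in_bounds(square):
    row, col = square
    return 0 <= row < BOARD_SIZE and 0 <= col < BOARD_SIZE

def valid_moves(board, color):
    """List (from, to, captured) moves for regular pieces of `color`.

    board maps (row, col) -> (color, value). Assumption: "red" moves
    toward higher rows and any other color toward lower rows.
    """
    step = 1 if color == "red" else -1
    slides, captures = [], []
    for (row, col), (owner, _value) in sorted(board.items()):
        if owner != color:
            continue
        for dcol in (-1, 1):
            near = (row + step, col + dcol)
            far = (row + 2 * step, col + 2 * dcol)
            if in_bounds(near) and near not in board:
                # Empty diagonal neighbour: a plain move.
                slides.append(((row, col), near, None))
            elif (in_bounds(far) and near in board
                  and board[near][0] != color and far not in board):
                # Enemy neighbour with an empty square behind it: a capture.
                captures.append(((row, col), far, near))
    return captures + slides

board = {(2, 1): ("red", 5), (3, 2): ("blue", 3)}
print(valid_moves(board, "red"))
# [((2, 1), (4, 3), (3, 2)), ((2, 1), (3, 0), None)]
```

Even if the LLM has to do the "thinking", a function like this is still useful as ground truth for checking what the model outputs.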

However, our adviser might be against using tool calls as he wants the LLM to do the "thinking" part.

Here are my questions. If you are answering a specific question, please indicate which one using this format:

Question #:
<your text here>

Question #:
...

More details from us:

  • Limited resources (4 GB GPU)
  • Small models (using Ollama to run Llama 3 / 3.1 / 3.2, Llama 3 Groq Tool Use, and an 8B-parameter DeepSeek distilled model)
  • Wrote a prompt instructing the o3-mini model on the ChatGPT website and successfully got valid moves from a board state
steady scarab
#

Findings:

  • Giving the model instructions on how to find valid moves from a board state works with the o3-mini model on ChatGPT, but not with small models such as Llama 3 / 3.1 / 3.2, Llama 3 Groq Tool Use, or the 8B-parameter DeepSeek distilled model.
  • With DeepSeek, the same instructions I used on ChatGPT did not work, so we broke the task into smaller steps, starting with finding the red pieces. That worked once; it still needs more testing.
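To test the "find the red pieces" step more than once, the model's answer can be compared against a simple parser. A minimal sketch, assuming a hypothetical text format where each square is a token such as `r5` (red piece, value 5), `b3` (blue piece, value 3), or `.` (empty); the actual board format we use may differ.

```python
def find_pieces(board_text, prefix):
    """Return (row, col, value) for every token starting with `prefix`."""
    found = []
    for row, line in enumerate(board_text.strip().splitlines()):
        for col, token in enumerate(line.split()):
            if token.startswith(prefix):
                found.append((row, col, int(token[1:])))
    return found

# Hypothetical two-row excerpt of a board.
board_text = """
. r5 . r2
b3 . . .
"""
print(find_pieces(board_text, "r"))  # [(0, 1, 5), (0, 3, 2)]
```

Running the same prompt over many boards and counting how often the model's list matches this output gives a success rate instead of a single anecdote.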

Questions:

  1. Can fine-tuning really solve the problem? At the moment I suspect it will not work, because our model is small and our dataset is small, given that it only contains information about the game and its mechanics.
  2. If not, can you point me to online resources explaining why fine-tuning is not advised with small models and small datasets? I found some discussions, but no concrete evidence to support this claim.
  3. If our adviser remains against function/tool calling, we plan to pay for o3-mini through the API, since we got valid moves from the o3-mini model on ChatGPT. However, I found discussions saying the OpenAI API performs quite differently from, or worse than, ChatGPT itself. I am only using o3-mini because of its reasoning feature; other OpenAI models did not give me the correct result. Should I be worried about this? Can I get behavior similar to ChatGPT by using LangChain?

I am relatively new to all of this (a little over a year), so I'm not very familiar with the field. Any help is appreciated.

steady scarab
#

You can also respond by focusing on just this part:


We can only fine-tune and run small models using Ollama, with a very small dataset

Is this possible?

steady scarab
#

.

woven cairn
#

Either fine-tune on moves,
or do reinforcement learning on wins,
or use a reasoning model such as o3-mini-high or DeepSeek R1 with some good prompting.
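For the fine-tuning route, the dataset would be board positions paired with their moves rather than a description of the rules. A rough sketch of building one JSONL record; the `prompt`/`completion` field names, the board text, and the square labels are placeholders, and the expected format depends on the fine-tuning tool you use.

```python
import json

def to_training_record(board_text, moves):
    """Build one JSONL line pairing a board with its valid moves."""
    prompt = "Board:\n" + board_text.strip() + "\nList every valid move for red."
    completion = "; ".join(f"{src}->{dst}" for src, dst in moves)
    return json.dumps({"prompt": prompt, "completion": completion})

# Hypothetical position and moves, e.g. produced by a rule-based generator.
line = to_training_record(". r5\n. .", [("a1", "b2")])
print(line)
```

Generating the moves with ordinary code means you can produce thousands of records cheaply, which matters more for a small model than hand-written explanations.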