# Tips on improving the generation quality of lower-parameter models


main comet
#

All,

I am currently fine-tuning on a dataset of 15,000+ high-quality, human-written items in order to generate similar instances. Fine-tuning Mistral-22B and Mistral-7B (loss levels off at about 1.1) gives good (needing minor edits) to really good (needing no edits) examples.

I would like to run the model on a Raspberry Pi 5 for kicks and giggles. For an acceptable run-time per token, the model needs to be 2B parameters or less (Gemma-2B, Llama-3.2-1B, etc.). However, the output of these models under similar conditions (loss flattens out at about 1.6) is just garbage and gibberish.

Is the solution here just more epochs? Using DPO on a segment of the dataset and generated outputs from the model? Or adjustment of some of the finetuning parameters? Or should I just bite the bullet and resort to using Mistral-7B and cope with the poor generation time?

I wouldn't be too torn up if I couldn't get this to run as pictured, but your help would be appreciated.

pliant wasp
#

It's hard to understand what you mean by "garbage and gibberish". That doesn't give any insight into the outcome of your tune; please provide some samples instead.

If the model doesn't understand your dataset to begin with, it will have a hard time adapting to it. That being said, if by gibberish you mean pure nonsense or broken output, it sounds more like a template issue to me.

If you run an instruct or otherwise already-tuned model, make sure you use the template that the model was previously tuned on.

If you tune a base model, make sure your template is defined correctly during training and applied the same way later at inference with the tokenizer.
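To illustrate the point about keeping the template identical between training and inference, here is a minimal sketch. It assumes an Alpaca-style template (the poster's actual template is not shown in this thread), and the function names and the `"</s>"` end-of-sequence string are hypothetical; in practice the EOS token would come from the model's tokenizer.

```python
# Assumed Alpaca-style template; replace with the template actually used.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Prompt text used at inference time (the model continues after it)."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def build_training_text(instruction: str, response: str, eos_token: str) -> str:
    """Training text: the exact inference prompt, then the answer, then EOS.

    Reusing build_prompt() guarantees both stages see the same template.
    """
    return build_prompt(instruction) + response.strip() + eos_token


if __name__ == "__main__":
    instruction = "Create a magic item: a set of bagpipes."
    prompt = build_prompt(instruction)
    # "</s>" is an assumption; use the tokenizer's own EOS token in practice.
    example = build_training_text(instruction, "Wondrous item, rare ...", "</s>")
    print(example.startswith(prompt))
```

Routing both stages through one function means a template change cannot silently apply to training but not inference, or the other way around.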

#

Some models do pick up your dataset from a tune even with a bad chat template, but the outcome is undefined and the model can still behave weirdly, possibly in ways you won't notice.

main comet
#

Thanks for your response,

Here's some more information about my issue.

By garbage and gibberish generation, I mean that the model is simply unable to generate a valid set of rules for a tabletop RPG. For example:

#

The template is modified from the Alpaca template, pictured below.

#

Is there anything I can do to improve the template?

pliant wasp
#

Oh wait

#

So, if I understand you correctly, the exact same template works on the bigger models but not on the small ones?

#

If that's the case:

I see in your sample that the dataset is specialized around the RPG DnD, so I suspect the smaller models retain much less knowledge of this area from pre-training than the bigger models do, since it is a very small topic compared to the amount of data they were trained on.

What I would suggest is that you create an evaluation set of prompts, run the candidate models on it, and see which one performs best before you pick one to tune further.
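A rough sketch of what such an evaluation set could look like. Everything here is an assumption for illustration: the prompts, the required keywords, and the scoring heuristic (keyword coverage penalized by repeated lines) are hypothetical, and each model is represented by a plain `generate(prompt) -> str` callable that you would wire up to your own inference code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation items: a prompt plus keywords a valid answer should contain.
EVAL_SET = [
    {"prompt": "Create a rare magic item: a set of bagpipes.",
     "required": ["Wondrous item", "attunement"]},
    {"prompt": "Create a common magic item: a lantern.",
     "required": ["Wondrous item", "common"]},
]


def repeated_line_ratio(text: str) -> float:
    """Fraction of non-empty lines that duplicate another line (0.0 = none)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 1.0  # Treat empty output as fully degenerate.
    return 1.0 - len(set(lines)) / len(lines)


def score_output(text: str, required: List[str]) -> float:
    """Keyword coverage in [0, 1], scaled down when lines repeat."""
    if required:
        hits = sum(1 for kw in required if kw.lower() in text.lower())
        coverage = hits / len(required)
    else:
        coverage = 1.0
    return coverage * (1.0 - repeated_line_ratio(text))


def evaluate(generate: Callable[[str], str], eval_set: List[dict]) -> float:
    """Mean score of one model over the whole evaluation set."""
    scores = [score_output(generate(item["prompt"]), item["required"])
              for item in eval_set]
    return sum(scores) / len(scores)


def rank_models(models: Dict[str, Callable[[str], str]],
                eval_set: List[dict]) -> List[Tuple[str, float]]:
    """Return (name, mean score) pairs, best model first."""
    results = [(name, evaluate(fn, eval_set)) for name, fn in models.items()]
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

A heuristic like this only catches obvious failures such as missing sections or looping output. Reading a handful of samples per model by hand is still worthwhile before committing to one.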

Otherwise, you could do continued pre-training to specialize the model in your area and then fine-tune on top of that.

#

Also, having URLs like this might cause hallucinations and other issues:

  • **Spells: **Spells: **[Spells: Spells The bagpipe casts the following spells.](https://www.dndbeyond.com/spells)
#

Instead, handle that on your front end: catch the keywords in the output and map them to the correct URLs there.
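A minimal sketch of that front-end mapping, assuming the model output is plain text with no links in it. The `KEYWORD_URLS` table and the `linkify` name are hypothetical; only the `/spells` URL comes from this thread.

```python
import re

# Hypothetical keyword-to-URL table maintained on the front end.
KEYWORD_URLS = {
    "Spells": "https://www.dndbeyond.com/spells",
}


def linkify(text: str, keyword_urls: dict = KEYWORD_URLS) -> str:
    """Wrap each known keyword in a markdown link (case-sensitive, whole words)."""
    if not keyword_urls:
        return text
    # Longest keywords first so overlapping keywords prefer the longer match.
    ordered = sorted(keyword_urls, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b")
    return pattern.sub(
        lambda m: f"[{m.group(1)}]({keyword_urls[m.group(1)]})", text
    )


if __name__ == "__main__":
    print(linkify("The bagpipe casts Spells."))
```

This way the model only has to produce the keyword, and the URL is always correct because it never passes through the model.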

#

You want as high-quality a dataset as possible.
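One cleaning step in that direction, as a sketch: replacing markdown links in the training text with just their link text, so the model never sees the URLs. The regex and the `strip_links` name are assumptions, and it only handles simple `[text](http...)` links without nested brackets.

```python
import re

# Matches simple markdown links: [text](http://... or https://...)
MD_LINK = re.compile(r"\[([^\]]*)\]\((?:https?://[^)\s]+)\)")


def strip_links(text: str) -> str:
    """Replace each markdown link with its visible text."""
    return MD_LINK.sub(r"\1", text)


if __name__ == "__main__":
    sample = "**[Spells](https://www.dndbeyond.com/spells)**: fire bolt"
    print(strip_links(sample))
```

Running this over every dataset item before training keeps the keywords while dropping the URLs.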

#

```
 * Cantrip: The bagpipe performs an ability as a cantrip. The bagpipe has a 2 foot radius and can cast the following spells: **Cantrips: **Cantrip: The bagpipe performs an ability as a cantrip. The bagpipe has a 2 foot radius and can cast the following spells:
 * **Spells: **Spells: **[Spells: Spells The bagpipe casts the following spells.](https://www.dndbeyond.com/spells)
 * **Spells: **Cantrip: The bagpipe performs an ability as a cantrip. The bagpipe has a 2 foot radius and can cast the following spells:
 * **Spells: **Spells The bagpipe casts the following spells.
```
#

In general, that can cause a lot of confusion for an LLM

#

Especially if it doesn't know the data inside out