plush nova Nov 26, 2024, 9:04 AM

#

I am training a model (Nemo) on an Alpaca-format dataset. The idea is to force the model to answer with a standard markdown response.
I noticed an issue with how the newline is interpreted in the list part, while everywhere else the newline (always present as "\n" in my Alpaca dataset) works fine.

In other words, the resulting model leaves a "\n" in the list, like:

Section one

word1 \n
word2\n

Section two

while it was trained on something like:

Section one\n* word1\n* word2\n\n# Section two

So, somehow, it understands the "\n" as a newline, but also leaves it in the output (wrong).
Any ideas?
I thought about using '<br>' instead of '\n', but I'm not sure.

languid night Nov 26, 2024, 9:06 AM

#

Most surely in the dataset the newlines are escaped too

#

Check that out

plush nova Nov 26, 2024, 9:09 AM

#

languid night Most surely in the dataset the newlines are escaped too

Actually no. The dataset is:

Section one\n* word1\n* word2\n\n# Section two

languid night Nov 26, 2024, 9:11 AM

#

How are you checking the output?

#

Are you using the standard unsloth inference example

plush nova Nov 26, 2024, 9:15 AM

#

languid night Are you using the standard unsloth inference example

The output is observed in LMStudio. It shows the response of the model in a pretty format. If you then click on the response you get the markdown format

languid night Nov 26, 2024, 9:15 AM

#

Then those are escaped newlines

#

Since non escaped are normally formatted

plush nova Nov 26, 2024, 9:17 AM

#

languid night Then those are escaped newlines

Yes, but I did not mess up the Alpaca format with escaping in the list. In other words, it learned to escape those?

languid night Nov 26, 2024, 9:17 AM

#

Just apply an easy find on your data to make sure

#

Or directly .replace("\\n", "\n")
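
A minimal sketch of that check, assuming the Alpaca-style records are already loaded as a list of dicts with `instruction`, `input`, and `output` keys. The helper names and the sample records are hypothetical and are not part of Unsloth or any other library:

```python
# Two characters: a backslash followed by "n" (an escaped newline that
# survived as literal text), as opposed to the single newline character "\n".
LITERAL_BACKSLASH_N = "\\n"


def find_escaped_newlines(records):
    """Return (record index, field name) for every string field that
    contains a literal backslash-n."""
    hits = []
    for i, rec in enumerate(records):
        for key, value in rec.items():
            if isinstance(value, str) and LITERAL_BACKSLASH_N in value:
                hits.append((i, key))
    return hits


def unescape_newlines(records):
    """Return a copy of the records with every literal backslash-n
    replaced by a real newline character."""
    return [
        {
            key: value.replace(LITERAL_BACKSLASH_N, "\n") if isinstance(value, str) else value
            for key, value in rec.items()
        }
        for rec in records
    ]


# Hypothetical sample: the first record is clean, the second one carries
# an escaped newline inside the list.
sample = [
    {"instruction": "List the words", "input": "",
     "output": "Section one\n* word1\n* word2\n\n# Section two"},
    {"instruction": "List the words", "input": "",
     "output": "Section one\n* word1 \\n* word2"},
]

print(find_escaped_newlines(sample))                      # [(1, 'output')]
print(find_escaped_newlines(unescape_newlines(sample)))   # []
```

If the first call returns an empty list for the real dataset, the escaped newline is coming from somewhere else, for example the prompt template.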

plush nova Nov 26, 2024, 9:20 AM

#

languid night Just apply an easy find on your data to make sure

The dataset is quite small and generated by some kind of CSV conversion: the result is entirely "under control", and I can tell that the JSON has no escaping. Unfortunately.

plush nova Nov 26, 2024, 9:34 AM

#

languid night Or directly .replace("\\n", "\n")

Thanks !!
You showed me the light. Actually I left a "\n" in the prompt, where I explain/reinforce the structure of the markdown I want in the output.

Bottom line: "\n" is processed correctly 🤩
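
Since the stray "\n" turned out to be in the prompt rather than in the dataset, the same kind of check can be run on the prompt text itself. A minimal sketch, assuming the prompt is a plain Python string; the function name and the template text below are made up for illustration:

```python
def literal_newline_positions(text):
    """Return the index of every literal backslash-n (two characters) in text.
    Real newline characters are not reported."""
    positions = []
    start = 0
    while True:
        idx = text.find("\\n", start)
        if idx == -1:
            return positions
        positions.append(idx)
        start = idx + 2  # skip past the two-character match


# Hypothetical prompt: the first line ends with a real newline, the second
# line describes the markdown structure using escaped (literal) newlines.
prompt_template = (
    "Answer in markdown.\n"
    "Use this structure: # Section\\n* item\\n* item\n"
)

# repr() shows real newlines as \n and literal backslash-n as \\n,
# which makes the difference visible at a glance.
print(repr(prompt_template))
print(len(literal_newline_positions(prompt_template)), "literal backslash-n found")
```

Any hit here is text the model sees as a backslash followed by "n", so it can learn to reproduce it verbatim in its answers.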

languid night Nov 26, 2024, 10:16 AM

#

plush nova Thanks !! You showed me the light. Actually I left a "\n" in the prompt, where I ...

slothsunglasses