#Fine-tuning for summarization


keen galleon
#

I have a general question, but I will focus on a specific case. Suppose I want to fine-tune Phi-3 for a summarization task. How should the input data be structured?

Is it always better to use the same structure the model was pre-trained with? For instance, the Phi-3 paper says the model was pre-trained using the template below.

So, in order to do the given task, should I use the structure

<|user|>\n {full_text} <|end|>\n <|assistant|> {summarized_text}

or should I use a different template? How can I specifically tell it that I want it to summarize? Should I add <system> to the template?

<system> {message_telling_it_to_summarize} </system> <|user|>\n {full_text} <|end|>\n <|assistant|> {summarized_text}

Can I use different "tags", like switching from <assistant> to <bot>, or should I keep the same tags? Is there any paper comparing different types of templates for different tasks?

thick fog
#

Generally, you can structure it however you please. I highly recommend trying to add your own tokens to mark the document you want to summarize and the output.
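A minimal sketch of what that could look like, assuming made-up marker names (`<|document|>`, `<|/document|>`, `<|summary|>`, `<|/summary|>` are purely illustrative and not part of Phi-3's vocabulary). It only builds the training strings; registering the markers with the tokenizer is left as a comment.

```python
# Hypothetical marker tokens. The names are illustrative only and are
# NOT part of Phi-3's vocabulary.
DOC_START = "<|document|>"
DOC_END = "<|/document|>"
SUMMARY_START = "<|summary|>"
SUMMARY_END = "<|/summary|>"

CUSTOM_TOKENS = [DOC_START, DOC_END, SUMMARY_START, SUMMARY_END]


def build_prompt(full_text: str) -> str:
    """Everything the model sees before it starts writing the summary."""
    return f"{DOC_START}{full_text.strip()}{DOC_END}{SUMMARY_START}"


def build_example(full_text: str, summarized_text: str) -> str:
    """One full training string: prompt, target summary, closing marker."""
    return f"{build_prompt(full_text)}{summarized_text.strip()}{SUMMARY_END}"


# With Hugging Face transformers, new markers like these are typically
# registered before training, roughly like this (not executed here):
#   tokenizer.add_special_tokens({"additional_special_tokens": CUSTOM_TOKENS})
#   model.resize_token_embeddings(len(tokenizer))

if __name__ == "__main__":
    print(build_example("A long article about sailing ships.", "Ships sail."))
```

At inference time you would feed only `build_prompt(...)` and stop generation at the closing summary marker.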

#

That said, using the chat format may help the model converge more quickly, since it is a format the model has already seen.
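For comparison, here is a rough sketch of the chat-format route. The tag layout follows the template from the question, and the `<|system|>` turn is an assumption based on Phi-3's published chat format, so check it against the chat template shipped with your tokenizer. `render_phi3_chat` is a hypothetical helper, not a library function.

```python
from typing import Optional


def render_phi3_chat(
    full_text: str,
    summarized_text: Optional[str] = None,
    system_message: Optional[str] = None,
) -> str:
    """Render one example in a Phi-3-style chat layout (assumed tag format).

    If summarized_text is None, the string ends right after the assistant
    tag, which is the form you would use as an inference prompt.
    """
    parts = []
    if system_message:
        # Optional instruction turn, e.g. "Summarize the following text."
        parts.append(f"<|system|>\n{system_message}<|end|>\n")
    parts.append(f"<|user|>\n{full_text}<|end|>\n")
    parts.append("<|assistant|>\n")
    if summarized_text is not None:
        # Training target, closed with the same end-of-turn tag.
        parts.append(f"{summarized_text}<|end|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_phi3_chat(
        full_text="A long article about sailing ships.",
        summarized_text="Ships sail.",
        system_message="Summarize the following text.",
    ))
```

If you are using Hugging Face transformers, `tokenizer.apply_chat_template(messages, tokenize=False)` should produce the model's own layout from a list of role/content messages, which avoids hand-writing the tags.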