I'm a newbie. I want to continue pre-training a large language model on more than 7,000 pages of book data. Does Unsloth support this? I'd really appreciate help from experienced people on a few issues, such as: what format should my book data be in so that it can be used for training, and should I train an Instruct model or a non-Instruct model? I am very grateful and welcome all comments from everyone.
#I want to continue pre-training a large language model with custom data
absolutely we support it
we wrote an entire blog post about it: https://unsloth.ai/blog/contpretraining
And documentation: https://docs.unsloth.ai/basics/continued-pretraining
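On the data format question, here is a rough sketch of one common way to prepare raw book text. It is not taken from the linked posts, so please check it against them. It assumes the trainer reads a JSONL file with one `"text"` field per record and that each record should end with the tokenizer's EOS token. The helper names `chunk_book` and `to_jsonl_lines`, the `max_chars` limit, and the `"</s>"` placeholder are all made up for illustration.

```python
import json


def chunk_book(text, max_chars=2000):
    """Split raw book text into chunks of roughly max_chars characters,
    breaking only on blank-line paragraph boundaries (hypothetical helper)."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def to_jsonl_lines(chunks, eos_token="</s>"):
    """Wrap each chunk as a {"text": ...} record ending with an EOS marker.
    The eos_token default is a placeholder; use your tokenizer's real one."""
    return [json.dumps({"text": c + eos_token}, ensure_ascii=False) for c in chunks]


if __name__ == "__main__":
    sample = "Chapter 1\n\nIt was a quiet morning.\n\nNothing moved in the valley."
    for line in to_jsonl_lines(chunk_book(sample, max_chars=40)):
        print(line)
```

Writing the resulting lines to a `.jsonl` file gives you a raw-text dataset with no instruction template, which is the usual shape for continued pre-training as opposed to instruction fine-tuning.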
Many thanks to @chilly mica for your extremely valuable feedback. I will test it, and if I run into any problems, I hope you can help me.