# I am thinking of mixing songs and dialogue into the dataset. Dialogue lines are easy to split at the pauses of a few seconds between them, but songs are hard to split. What should I do?
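For reference, the pause-based splitting that works for dialogue can be sketched as below. This is a minimal sketch, assuming mono float audio in [-1, 1] held in a NumPy array. The helper `split_on_silence` and its parameters (`min_gap_s`, `frame_s`, `threshold`) are hypothetical, not taken from any particular tool. It computes per-frame RMS energy and starts a new segment only after at least `min_gap_s` seconds of low-energy frames. Songs likely defeat this rule because the music seldom drops to silence for seconds at a time.

```python
import numpy as np


def split_on_silence(samples, sr, min_gap_s=2.0, frame_s=0.05, threshold=0.01):
    """Hypothetical helper: return (start, end) sample ranges of non-silent
    segments separated by at least min_gap_s seconds of low-energy audio.

    Assumes `samples` is a 1-D mono float array in [-1, 1] sampled at `sr` Hz.
    """
    frame_len = max(1, int(round(frame_s * sr)))
    n_frames = len(samples) // frame_len
    # RMS energy per fixed-length frame (a trailing partial frame is dropped)
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    voiced = rms >= threshold
    # Number of consecutive silent frames that counts as a "line break"
    min_gap_frames = int(np.ceil(min_gap_s * sr / frame_len))

    segments = []
    start = None        # first voiced frame of the current segment
    last_voiced = None  # most recent voiced frame seen
    for i, is_voiced in enumerate(voiced):
        if not is_voiced:
            continue
        if start is None:
            start = i
        elif i - last_voiced - 1 >= min_gap_frames:
            # The silence since the last voiced frame was long enough: close the segment
            segments.append((start * frame_len, (last_voiced + 1) * frame_len))
            start = i
        last_voiced = i
    if start is not None:
        segments.append((start * frame_len, (last_voiced + 1) * frame_len))
    return segments


if __name__ == "__main__":
    sr = 16000

    def tone(seconds):
        t = np.arange(int(seconds * sr)) / sr
        return 0.5 * np.sin(2 * np.pi * 440 * t)

    # 1 s line, 3 s pause, 1 s line, 0.5 s pause, 1 s line
    audio = np.concatenate([tone(1), np.zeros(3 * sr), tone(1),
                            np.zeros(sr // 2), tone(1)])
    print(split_on_silence(audio, sr))
```

With the default 2-second gap, the 3-second pause splits the audio while the 0.5-second pause does not, so the last two tones come back as one segment. Tuning `min_gap_s` and `threshold` per recording is usually needed.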