Currently, the notebook trains multi-turn conversations in bulk, predicting only the last assistant turn. This misses early-turn supervision and can reduce dialogue coherence, especially on small datasets (~1k examples).
Proposed Feature:
Add support for sequential multi-turn training via a method like dataset.sequentialize(), which:
Converts each conversation into progressive chunks (turns 1–2 → target 2, 1–4 → target 4, … 1–N → target N)
Preserves supervision on all assistant turns
Can be batched for GPU efficiency
Benefits:
Models every assistant turn, not just the last
Improves multi-turn coherence
Particularly helpful for small datasets