#Any guidance on where my time start ends

12 messages · Page 1 of 1 (latest)

velvet magnet
#

Let me know if you get an answer for this. I also need to figure out a policy of adding timestamps.

dusk hazel
#

If we had some example training data from someone experienced that would probably help.

velvet magnet
#

We think it looks a little like this, but haven't verified yet: <|0.0|>I have a cute dog.<|4.4|><|4.8|>She is small but runs so fast.<|9.2|> timestamps should be rounded to nearest 0.02 seconds.

dusk hazel
#

But how does that look w/r/t the starts/ends of the waveforms?

dusk hazel
velvet magnet
#

Oh, you mean where to position the timestamps with respect to the word waveforms within the audio file? I assume we should position them as close to the starting and stopping of audible voice. So assuming there is no background noise and no filler-words that we might want to ignore (I'm not sure it we should ignore filler words or not...), the timestamps would have zero amplitude on one side and the smallest perceivable amplitude on the other. But I'm still not sure how sensitive Whisper training will be to variations or imperfect positions. I hope to find that out, too.

dusk hazel
#

yeah, exactly. I position just at the boundaries, as seen in the spectrogram (which more-clearly shows subtle/low-volume, but important, parts of a vocal pattern; the waveform display may not present these as visibly).

#

I found this, which may be important, but I'm not sure if it relates to fine-tuning yet (I've not investigated). It might relate to the practical functionality of inference though:
https://github.com/openai/whisper/discussions/759#discussioncomment-4934838

GitHub

I am trying to fine-tune the whisper to improve the WER for a simulated telephone records in English. I am using the "small model" and a dataset of around 32 hours in English with the aud...

#


whisper/whisper/audio.py

Line 17 in 7858aa9
 CHUNK_LENGTH = 30 
velvet magnet
#

Good point. What tool do you use to see spectrogram?

#

Oh, I didn't realize Audacity does that, too.