#you just keep track of how much time has
So I need to keep track of how much time has elapsed since playback started, right?
yeah, pretty much
I can try and dig up the code I used for the demo I linked, but I'm not sure I have it anymore - only really relevant if you're using python anyway
Yes please. I'm developing an iOS app, but even Python code will help me get the idea. Thanks a lot for the info, @nimble sand
@scenic patio I've actually lost the code I used for that demo, but on a theoretical level it's pretty simple:
- You track at what time you started playing back the very first audio chunk.
- When you receive a websocket message that contains a transcript, you add the end time of the previous transcript to each of its timestamps. This way you build out a timeline for the entire audio, rather than having the time reset to 0 on every transcript.
- You use the modified timestamps from step 2 to show the words depending on the time tracked in step 1
Here's the python code I use to adjust the timestamps:
import json

data = json.loads(self.connection.recv())  # Wait for a new websocket message
alignment_data = data.get("normalizedAlignment")
# Skip messages without alignment, or with an empty character list
if alignment_data and alignment_data["chars"]:
    # This format change is just for the sake of making the data easier to parse.
    formatted_list = []
    for i in range(len(alignment_data["chars"])):
        new_char = {
            "character": alignment_data["chars"][i],
            # Add self._current_audio_ms to each timestamp
            "start_time_ms": alignment_data["charStartTimesMs"][i] + self._current_audio_ms,
            "duration_ms": alignment_data["charDurationsMs"][i],
        }
        formatted_list.append(new_char)
    # Update self._current_audio_ms (it's initialized as 0 at the very beginning of the streaming)
    self._current_audio_ms = formatted_list[-1]["start_time_ms"] + formatted_list[-1]["duration_ms"]
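For the third step (showing text depending on the playback time), a minimal sketch could look like the one below. It assumes you have collected every adjusted character from all messages into one list shaped like `formatted_list`, sorted by start time. The function name `text_spoken_so_far` and the sample data are made up for illustration.

```python
from bisect import bisect_right


def text_spoken_so_far(chars, elapsed_ms):
    """Return the characters whose start time is at or before elapsed_ms.

    `chars` is assumed to be sorted by "start_time_ms", which holds if the
    adjusted characters are appended in the order the messages arrive.
    """
    start_times = [c["start_time_ms"] for c in chars]
    count = bisect_right(start_times, elapsed_ms)
    return "".join(c["character"] for c in chars[:count])


# Made-up sample timings, for illustration only
sample_chars = [
    {"character": "H", "start_time_ms": 0, "duration_ms": 80},
    {"character": "i", "start_time_ms": 80, "duration_ms": 70},
    {"character": " ", "start_time_ms": 150, "duration_ms": 50},
    {"character": "y", "start_time_ms": 200, "duration_ms": 60},
    {"character": "o", "start_time_ms": 260, "duration_ms": 60},
]

print(repr(text_spoken_so_far(sample_chars, 160)))
```

Here `elapsed_ms` would be the time since the first audio chunk started playing, in milliseconds.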
Got it! But suppose it starts buffering; then we'll go out of sync, right?
yeah, if that happens I don't know if there's much you can do
ideally it shouldn't buffer
but with multilingual v2, it can happen
We're using "eleven_monolingual_v1"
what you can probably do then might be to detect when buffering is happening and "pause" the timer ticking up to compensate
yeah that should not buffer
Exactly, if it's buffering, we can pause the timer
I personally haven't even tried implementing that since buffering is so rare, but I'm sure it's doable
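An untested sketch of the "pause the timer" idea, assuming your audio player gives you some signal for when it stalls and when it resumes (how you detect that depends on the player and is not shown here). `PlaybackClock` is a hypothetical helper, not part of any library; the `now` parameter exists only so the clock can be driven by a fake time source.

```python
import time


class PlaybackClock:
    """Stopwatch that only counts time while audio is actually playing."""

    def __init__(self, now=time.monotonic):
        self._now = now            # time source returning seconds
        self._started_at = None    # None while paused or not yet started
        self._accumulated = 0.0    # seconds counted before the last pause

    def start(self):
        """Start or resume counting; does nothing if already running."""
        if self._started_at is None:
            self._started_at = self._now()

    def pause(self):
        """Stop counting, e.g. when the player reports it is buffering."""
        if self._started_at is not None:
            self._accumulated += self._now() - self._started_at
            self._started_at = None

    def elapsed_ms(self):
        """Milliseconds of playback so far, excluding paused stretches."""
        running = 0.0 if self._started_at is None else self._now() - self._started_at
        return (self._accumulated + running) * 1000.0
```

You would call `start()` when the first chunk begins playing, `pause()` when buffering is detected, `start()` again when playback resumes, and compare `elapsed_ms()` against the adjusted timestamps.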
Thanks @nimble sand
@nimble sand How frequently do you call the player observer?
I mean, how many times per second do you refresh the display?
oh, I personally haven't tried doing something like this
the way I did it was just printing the new word depending on the timing
I'd say you'd probably want to do it fairly often if you want to highlight the currently spoken word
...often enough that you probably want to figure out a better system
than just constantly refreshing
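One possible alternative to constant refreshing, sketched under assumptions: group the per-character timings into words once, then at any moment work out which word is current and how long until the next word starts, so the UI only needs to update at word boundaries. The function names and sample data below are hypothetical, and words are split on whitespace only.

```python
from bisect import bisect_right


def group_into_words(chars):
    """Collapse per-character timings into (word, start_ms, end_ms) tuples."""
    words, current, start, end = [], "", None, None
    for c in chars:
        if c["character"].isspace():
            if current:
                words.append((current, start, end))
            current, start = "", None
        else:
            if start is None:
                start = c["start_time_ms"]
            current += c["character"]
            end = c["start_time_ms"] + c["duration_ms"]
    if current:
        words.append((current, start, end))
    return words


def current_word_and_delay(words, elapsed_ms):
    """Return (index of the word to highlight, ms until the next word starts).

    The index is None before the first word starts; the delay is None once
    the last word has started. A word stays current until the next one starts.
    """
    starts = [w[1] for w in words]
    i = bisect_right(starts, elapsed_ms)
    index = i - 1 if i > 0 else None
    delay = starts[i] - elapsed_ms if i < len(starts) else None
    return index, delay


# Made-up sample timings, for illustration only
sample_chars = [
    {"character": "H", "start_time_ms": 0, "duration_ms": 80},
    {"character": "i", "start_time_ms": 80, "duration_ms": 70},
    {"character": " ", "start_time_ms": 150, "duration_ms": 50},
    {"character": "y", "start_time_ms": 200, "duration_ms": 60},
    {"character": "o", "start_time_ms": 260, "duration_ms": 60},
]
sample_words = group_into_words(sample_chars)
print(sample_words)
print(current_word_and_delay(sample_words, 160))
```

The returned delay could be used to schedule a single timer for the next highlight change instead of polling many times per second; if playback pauses for buffering, that timer would need to be cancelled and rescheduled.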