#you just keep track of how much time has

scenic patio
#

So that means I need to keep track of how much time has elapsed since playback started, right?

nimble sand
#

yeah, pretty much

#

I can try to dig up the code I used for the demo I linked, but I'm not sure I still have it - it's only really relevant if you're using Python anyway

scenic patio
#

Yes please. I'm developing an iOS app, but even Python code will help me get the idea. Thanks a lot for the info, @nimble sand

nimble sand
#

@scenic patio I've actually lost the code I used for that demo, but on a theoretical level it's pretty simple:

  1. You track the time at which you started playing back the very first audio chunk.
  2. When you receive a websocket message that contains a transcript, you add the end time of the previous transcript to each of its timestamps. That way you build up a transcript for the entire audio stream, rather than having the time reset to 0 on every transcript.
  3. You use the adjusted timestamps from step 2 to decide which words to show, based on the time tracked in step 1.

Here's the Python code I use to adjust the timestamps:

import json  # at the top of the module

data = json.loads(self.connection.recv())  # Wait for a new websocket message
alignment_data = data.get("normalizedAlignment")
if alignment_data is not None:
    # This format change is just for the sake of making the data easier to parse.
    formatted_list = []
    for char, start_ms, duration_ms in zip(
        alignment_data["chars"],
        alignment_data["charStartTimesMs"],
        alignment_data["charDurationsMs"],
    ):
        formatted_list.append({
            "character": char,
            # Add self._current_audio_ms so the time is relative to the start of the whole stream
            "start_time_ms": start_ms + self._current_audio_ms,
            "duration_ms": duration_ms,
        })

    # Update self._current_audio_ms (it's initialized as 0 at the very beginning of the streaming)
    if formatted_list:  # Skip the update if the message contained no characters
        last = formatted_list[-1]
        self._current_audio_ms = last["start_time_ms"] + last["duration_ms"]
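
Steps 1 and 3 could be sketched along the same lines. This is only a rough illustration, not the lost demo code: `PlaybackClock` and `spoken_text` are made-up names, and it assumes you keep appending every `formatted_list` to one long list in playback order.

```python
import bisect
import time


class PlaybackClock:
    """Step 1: remember when the first audio chunk started playing."""

    def __init__(self, now=time.monotonic):
        self._now = now           # injectable clock, handy for testing
        self._started_at = None   # None until playback has started

    def start(self):
        self._started_at = self._now()

    def elapsed_ms(self):
        if self._started_at is None:
            return 0.0
        return (self._now() - self._started_at) * 1000.0


def spoken_text(chars, elapsed_ms):
    """Step 3: return the characters that have started by elapsed_ms.

    chars is the accumulated list of dicts built in step 2, in playback
    order, so the start_time_ms values are non-decreasing.
    """
    starts = [c["start_time_ms"] for c in chars]
    count = bisect.bisect_right(starts, elapsed_ms)
    return "".join(c["character"] for c in chars[:count])
```

You'd call `clock.start()` when the first chunk begins playing, then pass `clock.elapsed_ms()` to `spoken_text` whenever you update the display.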
nimble sand
#

ideally it shouldn't buffer

#

but with multilingual v2, it can happen

scenic patio
#

We have "eleven_monolingual_v1"

nimble sand
#

what you could probably do then is detect when buffering is happening and "pause" the timer while it lasts, so the elapsed time only counts audio that actually played
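
A pausable timer like that might look roughly as follows. This is an untested-in-production sketch: `PausableClock` is a made-up name, and how you detect buffering depends entirely on your audio player, so that part is left out.

```python
import time


class PausableClock:
    """Playback timer that stops counting while audio is buffering."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._started_at = None   # when the current running stretch began
        self._accumulated = 0.0   # seconds counted in earlier stretches

    def resume(self):
        """Call when audio starts (or resumes) actually playing."""
        if self._started_at is None:
            self._started_at = self._now()

    def pause(self):
        """Call when the player reports that it is buffering."""
        if self._started_at is not None:
            self._accumulated += self._now() - self._started_at
            self._started_at = None

    def elapsed_ms(self):
        running = 0.0
        if self._started_at is not None:
            running = self._now() - self._started_at
        return (self._accumulated + running) * 1000.0
```

Calling `pause()` or `resume()` twice in a row is harmless, which helps if the player fires duplicate state events.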

scenic patio
#

Exactly, if it's buffering, we can pause the timer

nimble sand
#

I personally haven't even tried implementing that since buffering is so rare, but I'm sure it's doable

scenic patio
#

Thanks @nimble sand

scenic patio
#

@nimble sand How frequently do you call the player observer?

#

I mean, how many times per second do you refresh the display?

nimble sand
#

I'd say you'd probably want to do it fairly often if you want to highlight the currently spoken word

#

...often enough that you probably want to figure out a better system

#

than just constantly refreshing
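
One possible alternative to constant refreshing (just a guess at what a "better system" could be) is to group the character timings into words and only wake up when the next word starts. `word_spans` and `ms_until_next_word` are made-up names, and the input is assumed to be the accumulated character list with adjusted timestamps.

```python
def word_spans(chars):
    """Group per-character timings into (word, start_ms, end_ms) tuples."""
    spans = []
    current, start, end = "", None, None
    for c in chars:
        if c["character"].isspace():
            # Whitespace closes the word collected so far, if any.
            if current:
                spans.append((current, start, end))
            current, start, end = "", None, None
            continue
        if not current:
            start = c["start_time_ms"]
        current += c["character"]
        end = c["start_time_ms"] + c["duration_ms"]
    if current:
        spans.append((current, start, end))
    return spans


def ms_until_next_word(spans, elapsed_ms):
    """Return how long to wait until the next word starts, or None if none is left."""
    for _, start, _ in spans:
        if start > elapsed_ms:
            return start - elapsed_ms
    return None
```

The idea would be to schedule a single timer for `ms_until_next_word(...)` milliseconds, update the highlight when it fires, and then schedule the next one. If the playback timer gets paused for buffering, the pending timer would need to be cancelled and rescheduled.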