you just keep track of how much time has | ElevenLabs | Page 1

scenic patio Feb 19, 2024, 1:01 PM

#

So that means I need to keep track of how much time has elapsed since playback started, right?

nimble sand Feb 19, 2024, 1:07 PM

#

yeah, pretty much

#

I can try and dig up the code I used for the demo I linked, but I'm not sure I have it anymore - only really relevant if you're using python anyway

scenic patio Feb 19, 2024, 1:14 PM

#

Yes please. I'm developing an iOS app, but even Python code will help me get some idea. Thanks a lot for the info, @nimble sand

nimble sand Feb 19, 2024, 1:23 PM

#

@scenic patio I've actually lost the code I used for that demo, but on a theoretical level it's pretty simple:

1. Track the time at which you started playing back the very first audio chunk.
2. When you receive a websocket message that contains a transcript, add the end time of the previous transcript to each of its timestamps. This way you build out a transcript of the entire audio, rather than the time resetting to 0 on every message.
3. Use the adjusted timestamps from step 2 to show the words depending on the time tracked in step 1.

Here's the python code I use to adjust the timestamps:

import json  # belongs at the top of the module

data = json.loads(self.connection.recv())  # Wait for a new websocket message
alignment_data = data.get("normalizedAlignment")
if alignment_data is not None:
    # This format change is just for the sake of making the data easier to parse.
    formatted_list = []
    for char, start_ms, duration_ms in zip(
        alignment_data["chars"],
        alignment_data["charStartTimesMs"],
        alignment_data["charDurationsMs"],
    ):
        formatted_list.append({
            "character": char,
            # Add self._current_audio_ms so the time is relative to the start of the whole stream
            "start_time_ms": start_ms + self._current_audio_ms,
            "duration_ms": duration_ms,
        })

    # Update self._current_audio_ms (it's initialized as 0 at the very beginning of the streaming)
    if formatted_list:  # Guard against a message with no characters
        last = formatted_list[-1]
        self._current_audio_ms = last["start_time_ms"] + last["duration_ms"]
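
For step 3, a rough sketch of how the adjusted character list could be turned into words and looked up by elapsed playback time. This is not the lost demo code: `group_into_words` and `word_at` are hypothetical helper names, and it assumes the per-character dicts from every message have been collected into one list in order.

```python
from bisect import bisect_right


def group_into_words(chars):
    """Merge per-character entries into per-word entries, splitting on whitespace."""
    words = []
    current = None
    for c in chars:
        if c["character"].isspace():
            # Whitespace closes the word in progress, if there is one.
            if current is not None:
                words.append(current)
                current = None
            continue
        end_ms = c["start_time_ms"] + c["duration_ms"]
        if current is None:
            current = {
                "text": c["character"],
                "start_time_ms": c["start_time_ms"],
                "end_time_ms": end_ms,
            }
        else:
            current["text"] += c["character"]
            current["end_time_ms"] = end_ms
    if current is not None:
        words.append(current)
    return words


def word_at(words, elapsed_ms):
    """Return the most recently started word at elapsed_ms, or None before the first word."""
    starts = [w["start_time_ms"] for w in words]
    i = bisect_right(starts, elapsed_ms) - 1
    return words[i] if i >= 0 else None
```

Grouping over the accumulated list, rather than per message, means a word that is split across two websocket messages still comes out as one entry.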

scenic patio Feb 19, 2024, 1:30 PM

#

nimble sand @scenic patio I've actually lost the code I used for that demo, but on ...

Got it! But suppose playback starts buffering; then we'd go out of sync, right?

nimble sand Feb 19, 2024, 1:30 PM

#

scenic patio Got it! But suppose playback starts buffering; then we'd go out of sync, right?

yeah, if that happens I don't know if there's much you can do

#

ideally it shouldn't buffer

#

but with multilingual v2, it can happen

scenic patio Feb 19, 2024, 1:31 PM

#

We have "eleven_monolingual_v1"

nimble sand Feb 19, 2024, 1:31 PM

#

what you could probably do then is detect when buffering is happening and "pause" the timer so it stops ticking up, to compensate

nimble sand Feb 19, 2024, 1:31 PM

#

scenic patio We have "eleven_monolingual_v1"

yeah that should not buffer

scenic patio Feb 19, 2024, 1:31 PM

#

Exactly, if it's buffering, we can pause the timer

nimble sand Feb 19, 2024, 1:32 PM

#

I personally haven't even tried implementing that since buffering is so rare, but I'm sure it's doable
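
The "pause the timer" idea could be sketched roughly like this. It is untested against a real player: `PlaybackClock` is a hypothetical name, and detecting buffering is player-specific and not shown, so the sketch assumes your player tells you when playback stalls and resumes.

```python
import time


class PlaybackClock:
    """Tracks elapsed playback time, excluding any time spent paused."""

    def __init__(self, now=time.monotonic):
        self._now = now            # injectable time source, in seconds
        self._started_at = None    # None while paused or not yet started
        self._accumulated = 0.0    # seconds played before the last pause

    def start(self):
        """Start or resume the clock. Does nothing if already running."""
        if self._started_at is None:
            self._started_at = self._now()

    def pause(self):
        """Stop the clock, e.g. when the player reports buffering."""
        if self._started_at is not None:
            self._accumulated += self._now() - self._started_at
            self._started_at = None

    def elapsed_ms(self):
        """Return the total time played so far, in milliseconds."""
        running = 0.0 if self._started_at is None else self._now() - self._started_at
        return (self._accumulated + running) * 1000.0
```

Call `start()` when the first audio chunk begins playing, `pause()` when buffering is detected and `start()` again when playback resumes. `elapsed_ms()` then stays comparable to the adjusted transcript timestamps.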

scenic patio Feb 19, 2024, 1:33 PM

#

Thanks @nimble sand

scenic patio Feb 19, 2024, 1:56 PM

#

@nimble sand How frequently do you call the player observer?

#

I mean, how many times per second do you refresh the display?

nimble sand Feb 19, 2024, 1:57 PM

#

scenic patio I mean, how many times per second do you refresh the display?

oh, I personally haven't tried doing something like this
the way I did it was just printing the new word depending on the timing

#

I'd say you'd probably want to do it fairly often if you want to highlight the currently spoken word

#

...often enough that you probably want to figure out a better system

#

than just constantly refreshing
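
One possible "better system", as a sketch only: instead of refreshing on a fixed interval, work out how long it is until the next word starts and schedule a single update for that moment. `ms_until_next_word` is a hypothetical helper, and it assumes a list of word entries sorted by start time, each with a `start_time_ms` key.

```python
from bisect import bisect_right


def ms_until_next_word(words, elapsed_ms):
    """Return how many ms remain until the next word starts, or None if none is left."""
    starts = [w["start_time_ms"] for w in words]
    # Index of the first word that starts strictly after elapsed_ms.
    i = bisect_right(starts, elapsed_ms)
    if i == len(starts):
        return None
    return starts[i] - elapsed_ms
```

After each update you would schedule a one-shot timer for the returned delay, and recompute it whenever playback pauses or resumes, so the display only changes when the highlighted word actually changes.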
