#Still no voice activity detection?

mild citrus
#

I have a bot set up with the simple recording example and it works perfectly, and I have a realtime STT setup that can read WAV files. The only thing I'm missing is a way to obtain what a user is saying as they say it. My goal was to use voice state detection, similar to how discord.js does it. This bit in JS just tells you who is speaking as soon as they start:

    connection.receiver.speaking.on('start', (userId) => {
        // Fires as soon as a user starts speaking
        console.log(`${userId} is speaking`);
    });

That way, once the state is 'speaking' I could fire off the recording, and once speaking is done, close it and send the WAV off to STT. But from what I can tell it's still not a feature? (please correct me if I'm wrong, please)
I've been pulling my hair out about this all night trying to find a way to do it, but I can't get STT working in Node.js, and I can't get speaking state in Python. I'm losing it
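
For the "close and send the WAV off to STT" step described above, here is a minimal sketch of wrapping raw PCM bytes in a WAV container in memory, using only the standard library. The helper name `pcm_to_wav_bytes` is hypothetical, and the defaults (48 kHz, stereo, 16-bit) are an assumption about the decoded audio the voice receive path hands over. Adjust them if your source differs.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 48000,
                     channels: int = 2, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV header and return the whole file as bytes.

    The defaults are assumptions (48 kHz, stereo, 16-bit samples).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, per channel
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable
    return buf.getvalue()
```

The returned bytes can be written to disk or handed to any STT client that accepts a WAV file, which avoids the temp-file round trip.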

merry lotus
#

I'm not familiar with voice, but I think it's just recording a bit and processing it and recording a bit and processing it.

#

There are some examples of what you want in #creations

mild citrus
#

Right, but with user detection I'll be able to pass the STT as "Specific User said: STT" to an AI, that's the end goal

merry lotus
#

pycord handles each user independently i think

mild citrus
#

I did see that, yeah, it would probably cause problems with the AI since it's being developed for realtime conversation. Sending random chunks of audio to STT will likely cause issues, especially if it's cutting off mid-sentence, would probably overwhelm / confuse my AI haha

#

That's why I needed user speaking detection, so I can capture what they say as they say it

merry lotus
#

feel free to make a feature request on github

#

although I'm fairly certain that someone has done this before you with pycord

merry lotus
#

i'm fairly certain that there is finished product in #creations

mild citrus
#

I will search again, thank you

#

OK yeah, I found it (I was searching 'voice' instead of 'transcription'). Their solution is just as hacky as mine is :sobber: I'll give it a try though

mild citrus
#

Something in their audio processor is broken from what I'm getting. It can't find a file it thinks it's making, or something along those lines, so I'm still stuck

mild citrus
#

I got it working, but it's PAINFULLY slow :sobber:

So I'm back to square one

edgy pasture
#

I'm doing pretty much the exact same thing if you weren't able to figure out a good solution

#

Mine is working pretty well

mild citrus
#

i was considering just editing the code i grabbed from creations thread and adding in my realtime STT code where they have their stt code

simple ore
edgy pasture
# mild citrus i totally missed this, what did you end up doing?

Basically I just made my own Sink by extending the WaveSink class. Then I use the write() method to get the audio data from each user. Once you aren't receiving any more data from that user, you can send that data to whatever S2T API/algorithm you're using!
I append all the audio data to a buffer for each user, then send that chunk to the API, but I believe you could also stream it directly to an API that accepts a stream of data
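
The approach described above (buffer per user, flush once that user goes quiet) could be sketched roughly as follows. This is not the poster's code and it does not import the Discord library: `UtteranceBuffer`, `on_utterance`, `silence_seconds`, and `flush_idle` are hypothetical names, and the `write(data, user)` shape is an assumption modelled on the sink method mentioned above. A custom sink would forward its `write()` calls here and call `flush_idle()` periodically, for example from a background task.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Hashable


class UtteranceBuffer:
    """Collects raw audio per user and hands it off after a pause in speech."""

    def __init__(self, on_utterance: Callable[[Hashable, bytes], None],
                 silence_seconds: float = 0.8,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.on_utterance = on_utterance        # called with (user, audio_bytes)
        self.silence_seconds = silence_seconds  # gap that ends an utterance
        self.clock = clock                      # injectable for testing
        self._buffers: Dict[Hashable, bytearray] = defaultdict(bytearray)
        self._last_seen: Dict[Hashable, float] = {}

    def write(self, data: bytes, user: Hashable) -> None:
        """Append one audio chunk for a user and note when it arrived."""
        self._buffers[user].extend(data)
        self._last_seen[user] = self.clock()

    def flush_idle(self) -> None:
        """Emit and clear the buffer of every user who has gone quiet."""
        now = self.clock()
        for user, last in list(self._last_seen.items()):
            if now - last >= self.silence_seconds:
                audio = bytes(self._buffers.pop(user))
                del self._last_seen[user]
                if audio:
                    self.on_utterance(user, audio)
```

Because each utterance arrives with its user attached, the callback can label the transcript per speaker before passing it on. The silence threshold is a guess and would need tuning so sentences are not cut at short pauses.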

edgy pasture
#

One other thing is that I am dealing with a memory leak right now so something in my code is not right. But it certainly works! It just runs out of memory rn haha

edgy pasture
#

Fixed that leak! It has nothing to do with this :)

mild citrus
#

👀

mild citrus
#

me trying to find the politest way to ask to "try" your code 🤣

edgy pasture
#

Oops just saw this haha. I haven't been monitoring this as much since I fixed my stuff 😆

#

I'm definitely down to give you my code!

mild citrus
#

you're good, i just wanted to see if i could get my end working well without being like "uwu gib code :blush:"

#

literally please and thank you though

edgy pasture
#

I can prolly strip out all the stuff that's applicable to my bot and just make a little library for it and put it on GitHub

#

Oh jeez, you were asking about this like a month ago originally! I'll try to work on that today. I don't think it should be too hard to do

mild citrus
#

XD It's fine, my bot's a hobby project that I don't actively dev on, she can wait

#

But i do appreciate it and the help you're giving

edgy pasture
#

Okay cool haha. Did you just find a workaround since or has it been stalled?
No problem! It was kinda tough to figure out how to do it lol so it'd be nice if other people didn't have to go through that pain haha

mild citrus
#

Nah i just shelved the idea lol, literally I just need it for collabs (she's an AI vtuber) And I've been trying to move away from using sillytavern and into plain code but im still pretty new to doing more than editing and googling code

#

and googling code for something like this is pretty difficult when it's never been done 🤣

edgy pasture
#

Ohhh this would be a very tough one to take on if you're new to coding lol. You are very brave for even trying haha

#

Honestly good job regardless

mild citrus
#

Ye I can read and understand code but it's a whole other world trying something from scratch for me

edgy pasture
#

ChatGPT has helped wonders for stuff like this 🙂 but still requires a lot of know-how

#

Are you trying to get into coding or was this just kinda a one off for your friend?

mild citrus
#

yknow my situation is weird to explain, ive been programming since i was 14, skidding minecraft hacked clients

its like I don't traditionally know how to program, but I have the problem-solving skills to work with stuff and make it work?

also my "friend" is an AI, think neuro-sama if you've ever heard of her

#

Like I learned how to do an entire selenium and realtime-stt project to run my voice to sillytavernai and get a response back for practically realtime conversation with an ai, but that's really easy stuff

#

i have the discord bot set up and sending messages to ai apis and receiving etc and that's mostly my own code but again, easy stuff

edgy pasture
#

I totally get it haha. It can take as much time as you'll give it. That's very cool though! So now you're trying to talk to the AI through your mic on Discord i'm guessing

mild citrus
#

That's correct, I'm trying to move the whole process over to Discord rather than have 3 separate apps all hacked together in Python. Plus it'll make doing collabs easier, and everyone will have speech recognition. The first time I tried it, I just hooked up my mic and Discord into one virtual audio cable and told the AI "good luck figuring out who's talking"

#

not to mention that voice recognition is only the first half of the whole thing I've got planned, once i finally have voice data going to the ai, i have to set up edge tts, and then process the audio up an octave or two, then send that back through discord so the ai can speak back inside voice chat

#

which looking at it is somehow the easier part

edgy pasture
#

Ohh okay that's cool! I did something like that for my stream haha. it wasn't thru discord but i talk thru my mic and it uses that as input for an AI character and then i use synthesized voices to talk back to me!

#

Also I use Deepgram, you can just replace that with whatever S2T service you want

#

Lemme know if you need anything else! Also please lemme know if i left any hardcoded tokens in there haha. i was doing that to test but i think I got them 🙂