Still no voice activity detection? | Pycord | Page 1

mild citrus Jan 24, 2024, 4:42 AM

#

I have a bot set up with the simple recording example and it works perfectly, I have a realtime STT setup that can read WAV files. The only thing I'm missing is a way to obtain what a user is saying as they say it. My goal was to use voice state detection, similar to how discordjs does it. This bit in JS, which just tells you who is speaking as soon as they do:

    // Fires as soon as a user starts transmitting audio.
    connection.receiver.speaking.on('start', (userId) => {
        console.log(`${userId} is speaking`);
    });

That way once the state is 'speaking' I could fire off the recording, once speaking is done, close and send the wav off to STT. But from what I can tell it's still not a feature? (please correct me if I'm wrong, please)
I've been pulling my hair out about this all night trying to find a way to do it, but I can't get STT working in nodejs, and I can't get speaking state in python, I'm losing it
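
A minimal sketch of the "close and send the wav off to STT" step, in Python since the bot is Pycord. It assumes the captured audio is raw 16-bit stereo PCM at 48 kHz, which is believed to match what Pycord's decoder produces but should be checked against your version. `pcm_to_wav_bytes` is a hypothetical helper name, not part of any library.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = 48000,  # assumption: Discord voice sample rate
    channels: int = 2,         # assumption: stereo
    sample_width: int = 2,     # assumption: 16-bit samples (2 bytes)
) -> bytes:
    """Wrap raw PCM in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

The returned bytes can be written to a `.wav` file or handed to an STT client that accepts WAV data, so nothing needs to touch disk between recording and transcription.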

merry lotus Jan 24, 2024, 4:44 AM

#

I'm not familiar with voice, but I think it's just recording a bit and processing it and recording a bit and processing it.

#

There are some examples of what you want in #creations

mild citrus Jan 24, 2024, 4:46 AM

#

Right, but with user detection I'll be able to pass the STT as "Specific User said: STT" to an AI, that's the end goal

merry lotus Jan 24, 2024, 4:46 AM

#

pycord handles each user independently i think

mild citrus Jan 24, 2024, 4:53 AM

#

I did see that, yeah, it would probably cause problems with the AI since it's being developed for realtime conversation. Sending random chunks of audio to STT will likely cause issues, especially if it's cutting off mid-sentence, would probably overwhelm / confuse my AI haha

#

That's why I needed user speaking detection, so I can capture what they say as they say it

merry lotus Jan 24, 2024, 4:56 AM

#

feel free to make a feature request on github

#

although I'm fairly certain that someone has done this before you with pycord

mild citrus Jan 24, 2024, 4:57 AM

#

merry lotus although I'm fairly certain that someone has done this before you with pycord

I read a few threads and there was one "I tried" thread but then they never updated on whether it worked or not, which was months ago sobber

merry lotus Jan 24, 2024, 4:58 AM

#

i'm fairly certain that there is a finished product in #creations

mild citrus Jan 24, 2024, 5:00 AM

#

I will search again, thank you

#

OK yeah I found it, (I was searching 'voice' instead of transcription) their solution is just as hacky as mine is sobber I'll give it a try though

mild citrus Jan 24, 2024, 5:36 AM

#

Something in their audio processor is broken from what I'm getting, it can't find the file it thinks it's making or something along those lines, so I'm still stuck

mild citrus Jan 24, 2024, 3:16 PM

#

I got it working, but it's PAINFULLY slow sobber

So I'm back to square one

edgy pasture Feb 3, 2024, 10:41 PM

#

I'm doing pretty much the exact same thing if you weren't able to figure out a good solution

#

Mine is working pretty well

mild citrus Feb 12, 2024, 1:42 AM

#

edgy pasture Mine is working pretty well

i totally missed this, what did you end up doing?

#

i was considering just editing the code i grabbed from creations thread and adding in my realtime STT code where they have their stt code

simple ore Feb 16, 2024, 8:08 PM

#

edgy pasture Mine is working pretty well

If you are willing to contribute to py-cord that would be great rooBless

edgy pasture Feb 16, 2024, 8:24 PM

#

mild citrus i totally missed this, what did you end up doing?

Basically I just made my own Sink by extending the WaveSink class. Then I use the write() method to get the audio data from each user. Once you aren't receiving any more data from that user, you can send that data to whatever S2T API/algorithm you're using!
I append all the audio data to a buffer for each user, then send that chunk to the API, but I believe you could also stream it directly to an API that accepts a stream of data
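
The buffering described above can be sketched roughly as follows. This is not the poster's actual code: `UtteranceBuffer`, `feed`, `pop_finished`, and the 0.8 second timeout are all hypothetical names and values chosen for illustration. The idea is that a sink's `write(data, user)` override would call `feed(user, data)` for every packet, while a separate loop polls `pop_finished()` and forwards whatever it returns to the STT service. A lock is included on the assumption that `write()` runs on a different thread than the polling loop.

```python
import threading
import time
from typing import Dict, List, Optional, Tuple


class UtteranceBuffer:
    """Collects PCM per user and releases it once that user goes quiet."""

    def __init__(self, silence_timeout: float = 0.8) -> None:
        self.silence_timeout = silence_timeout  # seconds without data = done talking
        self._chunks: Dict[int, bytearray] = {}
        self._last_seen: Dict[int, float] = {}
        self._lock = threading.Lock()

    def feed(self, user_id: int, pcm: bytes, now: Optional[float] = None) -> None:
        """Append a packet of audio for one user and record when it arrived."""
        now = time.monotonic() if now is None else now
        with self._lock:
            self._chunks.setdefault(user_id, bytearray()).extend(pcm)
            self._last_seen[user_id] = now

    def pop_finished(self, now: Optional[float] = None) -> List[Tuple[int, bytes]]:
        """Return (user_id, audio) for each user silent for at least the timeout."""
        now = time.monotonic() if now is None else now
        finished: List[Tuple[int, bytes]] = []
        with self._lock:
            for user_id, last in list(self._last_seen.items()):
                if now - last >= self.silence_timeout:
                    finished.append((user_id, bytes(self._chunks.pop(user_id))))
                    del self._last_seen[user_id]
        return finished
```

Because the flush is driven by elapsed time rather than by the next incoming packet, an utterance is released shortly after the speaker stops instead of waiting for someone to talk again. Each result carries the user ID, which is what makes the "Specific User said: ..." labelling possible.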

edgy pasture Feb 16, 2024, 8:25 PM

#

simple ore If you are willing to contribute to py-cord that would be great <:rooBless:88152...

Well the basics of it use everything py-cord offers. Figuring that out was a bit tough though haha

#

One other thing is that I am dealing with a memory leak right now so something in my code is not right. But it certainly works! It just runs out of memory rn haha

edgy pasture Feb 17, 2024, 10:42 AM

#

Fixed that leak! It has nothing to do with this :)

mild citrus Feb 17, 2024, 3:46 PM

#

👀

mild citrus Feb 17, 2024, 4:07 PM

#

me trying to find the politest way to ask to "try" your code 🤣

mild citrus Feb 19, 2024, 11:36 PM

#

edgy pasture Basically I just made my own Sink by extending the WaveSink class. Then I use th...

Would it be possible for me to use / demo this? I finally got the other guy's code working and it has a fatal flaw: it won't update and transcribe until it receives another recording, so it's always one behind and not actually realtime.

edgy pasture Feb 19, 2024, 11:37 PM

#

Oops just saw this haha. I haven't been monitoring this as much since I fixed my stuff 😆

#

I'm definitely down to give you my code!

mild citrus Feb 19, 2024, 11:38 PM

#

you're good i just wanted to see if i could get my end working well without being like "uwu gib code blush "

#

literally please and thank you though

edgy pasture Feb 19, 2024, 11:40 PM

#

I can prolly strip out all the stuff that's applicable to my bot and just make a little library for it and put it on GitHub

#

Oh jeez you were asking about this like a month ago originally! I'll try to work on that today. I don't think it should be too hard to do

mild citrus Feb 19, 2024, 11:43 PM

#

XD It's fine, my bot's a hobby project that I don't actively dev on, she can wait

#

But i do appreciate it and the help you're giving

edgy pasture Feb 19, 2024, 11:44 PM

#

Okay cool haha. Did you just find a workaround since or has it been stalled?
No problem! It was kinda tough to figure out how to do it lol so it'd be nice if other people didn't have to go through that pain haha

mild citrus Feb 19, 2024, 11:46 PM

#

Nah i just shelved the idea lol, literally I just need it for collabs (she's an AI vtuber) And I've been trying to move away from using sillytavern and into plain code but im still pretty new to doing more than editing and googling code

#

and googling code for something like this is pretty difficult when it's never been done 🤣

edgy pasture Feb 19, 2024, 11:47 PM

#

Ohhh this would be a very tough one to take on if you're new to coding lol. You are very brave for even trying haha

#

Honestly good job regardless

mild citrus Feb 19, 2024, 11:48 PM

#

Ye I can read and understand code but it's a whole other world trying something from scratch for me

edgy pasture Feb 19, 2024, 11:49 PM

#

ChatGPT has helped wonders for stuff like this 🙂 but still requires a lot of know-how

#

Are you trying to get into coding or was this just kinda a one off for your friend?

mild citrus Feb 19, 2024, 11:52 PM

#

yknow my situation is weird to explain, ive been programming since i was 14, skidding minecraft hacked clients

its like I don't traditionally know how to program, but I have the problem-solving skills to work with stuff and make it work?

also my "friend" is an AI, think neuro-sama if you've ever heard of her

#

Like I learned how to do an entire selenium and realtime-stt project to run my voice to sillytavernai and get a response back for practically realtime conversation with an ai, but that's really easy stuff

#

i have the discord bot set up and sending messages to ai apis and receiving etc and that's mostly my own code but again, easy stuff

mild citrus Feb 19, 2024, 11:56 PM

#

edgy pasture Basically I just made my own Sink by extending the WaveSink class. Then I use th...

im sure with enough research id be able to understand every word of this but at current time its more than my free time allows for 😂 (im a trucker)

edgy pasture Feb 20, 2024, 1:12 AM

#

I totally get it haha. It can take as much time as you'll give it. That's very cool though! So now you're trying to talk to the AI through your mic on Discord i'm guessing

mild citrus Feb 20, 2024, 1:26 AM

#

That's correct, I'm trying to move the whole process over to discord rather than have 3 separate apps all hacked together in python, plus it'll make doing collabs easier and everyone will have speech recognition, the first time I tried it i just hooked up my mic and discord into one virtual audio cable and told the ai "good luck figuring out who's talking"

#

not to mention that voice recognition is only the first half of the whole thing I've got planned, once i finally have voice data going to the ai, i have to set up edge tts, and then process the audio up an octave or two, then send that back through discord so the ai can speak back inside voice chat

#

which looking at it is somehow the easier part

edgy pasture Feb 20, 2024, 10:27 AM

#

Ohh okay that's cool! I did something like that for my stream haha. it wasn't thru discord but i talk thru my mic and it uses that as input for an AI character and then i use synthesized voices to talk back to me!

#

Here's the code btw! https://github.com/baribarton/DiscordRealTimeTranscription

Library for transcribing speech real time using a Discord bot. I use this in my bot Crit Scribbler: https://top.gg/bot/1120586873920831508 - baribarton/DiscordRealTimeTranscription

#

I also found this one online if you want a more straightforward approach: https://github.com/NthnUlmr/DiscordLiveTranscriptionBot

A discord bot which transcribes your audio in real time using a combination of API calls to other services. - NthnUlmr/DiscordLiveTranscriptionBot

#

Also I use Deepgram, you can just replace that with whatever S2T service you want

#

Lemme know if you need anything else! Also please lemme know if i left any hardcoded tokens in there haha. i was doing that to test but i think I got them 🙂
