#ChatGPT Voice Assistant

114 messages · Page 1 of 1 (latest)

sinful zealot
#

I developed a voice assistant that works with the ChatGPT API using Python. I am posting the question to ChatGPT with API using OpenAI Whisper (speech-to-text). I'm voicing the response with google gTTS (text-to-speech). Being able to voice chat with ChatGPT is great

lilac breach
#

are you able to drop a github link? I wanna try this out

candid root
#

that‘s badass

latent charm
#

im trying to get one of these made for me. I want to train it on specific data and have it ask me questions. Im a pilot and want it to quiz me on my aircraft systems. Any idea what something like this would cost? or how many hours should it take?

uneven bluff
#

costs depends on usage right? no of tokens used. I think u need to insert ur own API key for this to work

latent charm
#

I mean for someone to build it for me. So I can upload "train" it for my specific usecase. I want to be able to use it in the car while im driving

lilac breach
#

Honestly I just wanna put something like this in some Alexa hardware

#

Sounds very badass

stiff panther
#

i think if you want a realistic voice you can use the elevenLabs API. they have rlly good voices and i used it for my chatbot

thick merlin
#

nice! I made the same using azure services

teal sigil
thick merlin
sinful zealot
crimson flax
#

hello ai

sinful zealot
sinful zealot
sinful zealot
sinful zealot
sinful zealot
tawny cradle
sinful zealot
# thick merlin https://github.com/ricardoborges/shellGPT

While I was developing the project, I used the google module of the Speech_recognition library. I was able to make up to 50 requests per day! I replaced this library with OpenAI Whisper for speech-to-text conversion 🙂 So I was able to perform unlimited speech-to-text conversion. All that was required was a free OpenAI ChatGPT API key. I recommend using OpenAI Whisper.

sinful zealot
sinful zealot
stiff panther
#

for the google tts

stiff panther
sinful zealot
#

I suggest you look into the Python gTTS module. The paid one is speech-to-text conversion and there is a 50 request per day limit for free usage. However, the text-to-speech conversion process uses the free voice-over available on Google translate.

sinful zealot
stiff panther
#

ohhh its python. srry i only do html js and css lol

#

oh nvm i found a js version

sinful zealot
#

Sorry I don't write javascript code.

stiff panther
#

ok

thick merlin
#

I would like to build something for Android like that, I'm having problem to call open ai API using kotlin

sinful zealot
sinful zealot
thick merlin
#

the whisper doesn't handle ogg audio right?

sinful zealot
thick merlin
#

another thing I'm trying to do is to use whatsapp audio as prompt input, but it's ogg

#

I think the google cloud can speech-to-text directly on ogg

sinful zealot
#

Google speech-to-text is limited to 50 requests per day if you don't have a paid API key. You can use Whisper's subtitle engine for speech-to-text unlimitedly.

#

At least I'm pretty sure my daily query count hasn't hit the limit. I do more than 100 speech-to-text operations per day with Whisper.

#

The project we will build is ChatGPT oriented. There are alternatives to other functions included. But ChatGPT has no alternative! For this reason, the best option to optimize the process would be to get a faster and more detailed response from ChatGPT. 🙂

thick merlin
#

whisper it's not free but is very cheap

#

$0.006 per minute

sinful zealot
#

This fee applies if you're actually developing a product. Free for testing or personal use.

#

The only api key you need is OpenAI API KEY.

#

Whisper does the speech-to-text conversion locally. You just have to include the library in your project. Valid for Python.

sinful zealot
thick merlin
sinful zealot
#

Unfortunately my GPU is not usable by the torch. A new CUDA version.

thick merlin
#

I got, you're using whisper on-premise not as service

#

are you running whisper on CPU?

sinful zealot
#

No. My projects are for hobby purposes. I'm just having some fun 🙂

thick merlin
#

but.. my intel i7 is from 2011 so is very outdated

thick merlin
sinful zealot
#

When the whisper library is included while working with the Python language, it downloads the model used in the first run to my machine. It does the speech-to-text operation locally.

thick merlin
#

that's why i'm using azure services

sinful zealot
#

How long did it take you to convert from speech to text?

#

with Whisper?

thick merlin
#

38 seconds long for 6 seconds of audio

sinful zealot
#

I used Whisper in the video I shared here. speech-to-text process takes place in an average of 5 seconds.

thick merlin
#

how about you?

#

cool

#

it seems that I need a new CPU LoL

sinful zealot
#

Here is a video where I share the time between processes during development. However, you may have to translate it into English.
https://www.youtube.com/watch?v=Qo5FZFCoN14

ChatGPT API ile iletişim kurduğım sesli asistanı optimize ettim. Mümkün olan en hızlı şekilde işlemlerin çalışması için yazdığım kodu her açıdan revize ettim.

Neler Yaptım:

1 - Öncelikle ses kayıt aşamasını elden geçirdim. Daha önce 7 saniyelik süre ile bir soru sorabiliyordum. Bu, kısa sorularda gereksiz bekleme, uzun sorularda ise sürenin...

▶ Play video
sinful zealot
thick merlin
#

thanks

sinful zealot
#

NVIDIA GeForce GTX 1660 Ti

#

NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0

#

You are welcome.

thick merlin
#

what whisper model are you using?

#

base?

sinful zealot
#

yep

sinful zealot
#

I spell-checked the output of the words that OpenAI's Whisper speech-to-text engine detects from our voice, in Turkish and English. According to the results I got in my previous videos and the misunderstanding of my voice questions, I observed that there were erroneous transformations in sentences containing words in different languages. Sorry if my pronunciation is bad. I tried to do this test as much as I could pronounce. https://www.youtube.com/watch?v=OFbiyCupttc

OpenAI'nin geliştirdiği Whisper sesten yazıya motorunun, sesimizden algıladığı kelimelerin çıktısı üzerinde, Türkçe ve İngilizce olarak yazım denetimi gerçekleştirdim. Elde ettiğim sonuçlara ve daha önceki videolarımda sesli sorularımın yanlış anlaşılmasına göre, içerisinde farklı dilde kelimeler bulunan cümlelerde hatalı dönüşümler gerçekleştiğ...

▶ Play video
#

Whisper cannot accurately transcribe sounds containing words in different languages when working with any language.

sinful zealot
#

I changed the voice the voice assistant uses. 🙂 https://www.youtube.com/watch?v=8esy8Io3saU

Daha önce yazıdan konuşmaya çeviri için gTTS kullandığımı belirtmiştim. gTTS, Google'ın Translate sayfasında kullandığı ücretsiz metni seslendirme işlevini kullanır. Tek bir ses tipi vardır. Ancak tüm dillerde seslendirme yapabilir. Bu sesi kullanmak istemediğim için, ffmpeg kullanarak ses dosyasını manipüle ettim. Böylece farklı bir ses elde et...

▶ Play video
south swallow
#

wow, this's an excellent idea!!🤗 is there any github link or binaries that we could try on? @sinful zealot

sinful zealot
#

https://www.youtube.com/watch?v=W5yB6XX0hAc Here I am both testing and having some fun. If you want to watch. 🙂

Yapay zeka destekli sesli asistanıma, Instagram'ın Reels videolarını dinlettim. Farklı kişilerin seslerini nasıl algıladığını ve daha doğrusu bir cihazdan, başka bir cihaza portlanan sesi nasıl algıladığını'da test etmiş oldum. Videoda, içeriği hoparlörden mikrofona ilettiğim için, aşırı derecede echo ve ses seviyesinde %500 gibi yüksek bir raka...

▶ Play video
stiff panther
stiff panther
somber violet
stiff panther
#

honestly I regret not doing python earlier

somber violet
#

Yeah... but honestly just do what ever makes you happy and dont be afraid to ask questions

#

Its surprising you are here and not consuming some crappy tiktoks

stiff panther
#

tiktok is for a bunch of bozos who think they're cool and they wanna fit in

#

and plus yt shorts is so much better

tawny cradle
quartz notch
sinful zealot
#

I guess your demo doesn't work when I access it with a browser using a computer?

quartz notch
#

@sinful zealot it should, it should give a prompt when you click the screen or click submit for the api key which is stored in localStorage

#

@sinful zealot if it does not then please send me the error as shown under dev tools (F12)

sinful zealot
quartz notch
#

@sinful zealot which browser you using?

sinful zealot
#

Firefox 🙂

quartz notch
#

ahh I see the problem 1 sec

#

had to look on stackoverflow

sinful zealot
#

You can define your own api key for demo. I don't think you have to overpay for it.

#

I think you can introduce it this way without any problems.

quartz notch
#

not sure if it will work on firefox , but you can try again

#

Actually let me download firefox and try it

sinful zealot
#

So you can get people to try your software and give feedback.

quartz notch
#

@sinful zealot you have to turn on speech recognition on firefox: