#ChatGPT Voice Assistant
114 messages · Page 1 of 1 (latest)
are you able to drop a github link? I wanna try this out
that‘s badass
im trying to get one of these made for me. I want to train it on specific data and have it ask me questions. Im a pilot and want it to quiz me on my aircraft systems. Any idea what something like this would cost? or how many hours should it take?
costs depends on usage right? no of tokens used. I think u need to insert ur own API key for this to work
I mean for someone to build it for me. So I can upload "train" it for my specific usecase. I want to be able to use it in the car while im driving
Honestly I just wanna put something like this in some Alexa hardware
Sounds very badass
i think if you want a realistic voice you can use the elevenLabs API. they have rlly good voices and i used it for my chatbot
nice! I made the same using azure services
I could work with you to build it, if interested. We could turn it into a product if there’s demand
I can do it for you, send me a message
Unfortunately, I don't think we'll be allowed to train an AI language model.
hello ai
In the explanation section on Youtube, I explained what kind of process I followed, what problems I encountered and how I solved it. I also explained which libraries I use and why. I recommend you take a look. Currently, my voice assistant application is hobby work.
Cost does not depend on usage. I am working with totally free libraries and api key.
It is not necessary to start such a costly and lengthy language model training. Many successful autopilot systems can be reported as commands after a few simple voice recognition and transcription processes. You don't need artificial intelligence for this.
Thank you very much for the suggestion. I will research this and want to try it as soon as possible.
This is great. I will review your project. Congrats on having a github link. It's an experimental hobby for me right now. I hope to improve a little more.
Good work Mohican. I'm using Alexa inside the Alexa Developer Console with Davinci-003 but have been unable to upgrade that to gpt-3.5-turbo so far.
Have you been able to achieve a continuous chat (e.g. it remembers it previous responses)?
This is certainly the way to go 
While I was developing the project, I used the google module of the Speech_recognition library. I was able to make up to 50 requests per day! I replaced this library with OpenAI Whisper for speech-to-text conversion 🙂 So I was able to perform unlimited speech-to-text conversion. All that was required was a free OpenAI ChatGPT API key. I recommend using OpenAI Whisper.
Thanks. Since I'm using the API it doesn't remember previous questions. In order to be able to chat continuously, I ask about a sub-title in the final output. I used the gpt-3.5-turbo model.
I had a look at the elevenLabs api. There is a limit of 10,000 characters per month for the free API. There is no such limit for Google gTTS. If OpenAI announces a text-to-speech engine, I will prefer to use it. 🙂
but dont you have to put ur credit card in? im not old enough to have that...
for the google tts
no 🙂
oh. can you give me the link for that?
I suggest you look into the Python gTTS module. The paid one is speech-to-text conversion and there is a 50 request per day limit for free usage. However, the text-to-speech conversion process uses the free voice-over available on Google translate.
Sorry I don't write javascript code.
ok
thank you bro for this feedback! it's a very low cost right?
I would like to build something for Android like that, I'm having problem to call open ai API using kotlin
If you want a more professional and faster chatbot, get the plus api from OpenAI. Alternatives exist for other costly transactions. 🙂
Unfortunately I only code in C, Python, PHP and Shell. I don't know Java or Javascript or Kotlin.
the whisper doesn't handle ogg audio right?
I recommend using wav.
another thing I'm trying to do is to use whatsapp audio as prompt input, but it's ogg
I think the google cloud can speech-to-text directly on ogg
Google speech-to-text is limited to 50 requests per day if you don't have a paid API key. You can use Whisper's subtitle engine for speech-to-text unlimitedly.
At least I'm pretty sure my daily query count hasn't hit the limit. I do more than 100 speech-to-text operations per day with Whisper.
The project we will build is ChatGPT oriented. There are alternatives to other functions included. But ChatGPT has no alternative! For this reason, the best option to optimize the process would be to get a faster and more detailed response from ChatGPT. 🙂
whisper it's not free but is very cheap
$0.006 per minute
This fee applies if you're actually developing a product. Free for testing or personal use.
The only api key you need is OpenAI API KEY.
Whisper does the speech-to-text conversion locally. You just have to include the library in your project. Valid for Python.
Take a look at the documentation. https://platform.openai.com/docs/guides/speech-to-text
An API for accessing new AI models developed by OpenAI
that's OK if you install whisper on your machine, but with my CPU was very slow. are you using GPU?
Unfortunately my GPU is not usable by the torch. A new CUDA version.
I got, you're using whisper on-premise not as service
are you running whisper on CPU?
No. My projects are for hobby purposes. I'm just having some fun 🙂
but.. my intel i7 is from 2011 so is very outdated
me also
are you calling the web api to whisper or did you install the software on your machine?
When the whisper library is included while working with the Python language, it downloads the model used in the first run to my machine. It does the speech-to-text operation locally.
yep, I tried once, was very slow in my cpu
that's why i'm using azure services
38 seconds long for 6 seconds of audio
I used Whisper in the video I shared here. speech-to-text process takes place in an average of 5 seconds.
what CPU are you running?
Here is a video where I share the time between processes during development. However, you may have to translate it into English.
https://www.youtube.com/watch?v=Qo5FZFCoN14
ChatGPT API ile iletişim kurduğım sesli asistanı optimize ettim. Mümkün olan en hızlı şekilde işlemlerin çalışması için yazdığım kodu her açıdan revize ettim.
Neler Yaptım:
1 - Öncelikle ses kayıt aşamasını elden geçirdim. Daha önce 7 saniyelik süre ile bir soru sorabiliyordum. Bu, kısa sorularda gereksiz bekleme, uzun sorularda ise sürenin...
Intel® Core™ i7-9750H × 12
thanks
NVIDIA GeForce GTX 1660 Ti
NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0
You are welcome.
yep
I spell-checked the output of the words that OpenAI's Whisper speech-to-text engine detects from our voice, in Turkish and English. According to the results I got in my previous videos and the misunderstanding of my voice questions, I observed that there were erroneous transformations in sentences containing words in different languages. Sorry if my pronunciation is bad. I tried to do this test as much as I could pronounce. https://www.youtube.com/watch?v=OFbiyCupttc
OpenAI'nin geliştirdiği Whisper sesten yazıya motorunun, sesimizden algıladığı kelimelerin çıktısı üzerinde, Türkçe ve İngilizce olarak yazım denetimi gerçekleştirdim. Elde ettiğim sonuçlara ve daha önceki videolarımda sesli sorularımın yanlış anlaşılmasına göre, içerisinde farklı dilde kelimeler bulunan cümlelerde hatalı dönüşümler gerçekleştiğ...
Whisper cannot accurately transcribe sounds containing words in different languages when working with any language.
I changed the voice the voice assistant uses. 🙂 https://www.youtube.com/watch?v=8esy8Io3saU
Daha önce yazıdan konuşmaya çeviri için gTTS kullandığımı belirtmiştim. gTTS, Google'ın Translate sayfasında kullandığı ücretsiz metni seslendirme işlevini kullanır. Tek bir ses tipi vardır. Ancak tüm dillerde seslendirme yapabilir. Bu sesi kullanmak istemediğim için, ffmpeg kullanarak ses dosyasını manipüle ettim. Böylece farklı bir ses elde et...
wow, this's an excellent idea!!🤗 is there any github link or binaries that we could try on? @sinful zealot
Thank you. Unfortunately, there is currently no github link or binary available. But I've made close progress in sharing an executable binary.
https://www.youtube.com/watch?v=W5yB6XX0hAc Here I am both testing and having some fun. If you want to watch. 🙂
Yapay zeka destekli sesli asistanıma, Instagram'ın Reels videolarını dinlettim. Farklı kişilerin seslerini nasıl algıladığını ve daha doğrusu bir cihazdan, başka bir cihaza portlanan sesi nasıl algıladığını'da test etmiş oldum. Videoda, içeriği hoparlörden mikrofona ilettiğim için, aşırı derecede echo ve ses seviyesinde %500 gibi yüksek bir raka...
Kiddo
how did u know I'm a kid 😂
also i have started py on replit now and I gotta say it's SO much easier
I was 13 when I coded with html lol. Now 25. Check python if you wanna do something cool with coding.
honestly I regret not doing python earlier
Yeah... but honestly just do what ever makes you happy and dont be afraid to ask questions
Its surprising you are here and not consuming some crappy tiktoks
tiktok is for a bunch of bozos who think they're cool and they wanna fit in
and plus yt shorts is so much better
When I was 13 we used Fortran and BASIC. Later came Pascal, LISP, LPL, Visual Basic and VBA. After HTML and CSS came Javascript and now, at 73, I'm using Apps Script and within VS Code various AI Apis and extensions.
I wrote a similar app that works in the browser and on phones https://github.com/jay23606/chat-gpt-voice
I guess your demo doesn't work when I access it with a browser using a computer?
@sinful zealot it should, it should give a prompt when you click the screen or click submit for the api key which is stored in localStorage
@sinful zealot if it does not then please send me the error as shown under dev tools (F12)
Uncaught ReferenceError: webkitSpeechRecognition is not defined
recognition https://raw.githack.com/jay23606/chat-gpt-voice/main/chat.html:58
<anonymous> https://raw.githack.com/jay23606/chat-gpt-voice/main/chat.html:65
@sinful zealot which browser you using?
Firefox 🙂
You can define your own api key for demo. I don't think you have to overpay for it.
I think you can introduce it this way without any problems.
not sure if it will work on firefox , but you can try again
Actually let me download firefox and try it
https://discord.com/channels/974519864045756446/1085028026443640982 I have a Discord Bot that I developed with ChatGPT and DALL-E. I used my own api key for this bot. It is available for free. You can follow the same path for your own demo application. It will not be costly for you.
So you can get people to try your software and give feedback.
@sinful zealot you have to turn on speech recognition on firefox: