#gpt-realtime
been wanting to play with whisper for awhile
any recommendations between the model sizes? I want something that works nearly real-time for conversational AI
well I guess up to a few seconds of lag at most
👋

Sup
sup
sup
ngl I'm gonna use this tool to not pay attention in class
how to use this?
Namaste
sooo what’s whisper?
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.
so hopefully it will be able to tell what british people are saying, unlike siri and alexa
how to use whisper?
I'm a bit late but I think any size will work for that. In most cases you won't need to push it above medium unless there is a bunch of made up words or slurred speech
You can google openai whisper github. There's python install instructions and some people made bindings for other langs
can anyone wanted to become my friend
they have a very easy to use python package
pip install openai-whisper
then to use in your program you do something like
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
See more here: https://github.com/openai/whisper
I just ran this. Thank the heavens for fiber.
a
Does anyone know of any npm packages for running whisper inside of NodeJS? Or literally anything at all to run Whisper within NodeJS in general?
The C++ fork of whisper has node bindings. You can find it on npm as whisper.cpp
Ok
so this basically runs locally on your computer?
what ?
whisper?
yeah
you can run it
somewhere else too
to host it and implement it
in an app or smth
Thanks for your response. Do you know about its license?
For use?
It's under MIT license
ey yo does anyone know where to do ai?
A good place to start is the S.U.G.M.A. forums
Try the MacWhisper app if you have an M1 or M2.
hm
what is whisper
It's an open source speech to text model published by openai
LOUD
yello
Whisper-UI Update: You can now bulk-transcribe, save & search transcriptions with Streamlit & SQLAlchemy 2.0
I'd built a hacky Streamlit UI for OpenAI's Whisper a few months back and there had been a bit of interest so finally got myself to rewrite it to make it a little nicer. Update includes
- Ability to download entire YouTube playlists and upload multiple files at once
- Ability to browse, filter, and search through saved audio files (For now, this is done with a simple SQLite database & SQLAlchemy ORM)
- Auto-export of transcriptions in multiple formats (was a feature request)
- Simple substring based search for transcript segments. This is done with a simple LIKE query on the SQLite database.
- Fully reworked UI with a cleaner layout and more intuitive navigation.
Repo: github.com/hayabhay/whisper-ui
I found that for EN so far medium seemed to work best for the size? But for transcribe + translate I think large is minimum.
did you have accuracy issues with the smaller models?
it was ok-ish for EN, but the accuracy of medium is way better, the smaller models are extremely hit or miss for non-EN languages (like less than 50% iirc)
FYI I was using Whisper.CPP and it has its own pre-baked tuning, but I don't think it heavily diverges from the CUDA version
wow
Hmm
Yup, the whisper.cpp fork actually has an example for transcribing twitch streams in the repo
bro that is a phone transcription
I have no idea what you're trying to say here?
the example in what you linked has the guy holding a phone to record?
Read what it says right above that. The phone is just an example to show that it can be run on whatever hardware
Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications. As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: whisper.objc
Since the original implementation of whisper is meant to run on gpu and is very vram heavy
how would i get it to listen to desktop audio?
Did you actually read the readme? It has a bunch of examples. You can also look in the examples folder and twitch.sh in there to see how to use it in the use case you asked about
no im asking you questions because you're helpful and nice xD
Has anyone built an asynchronous communication app or add-on with Whisper?
I had hacked a live translation bit a few months ago (desktop only) and will add to the UI soon (using streamlit-webrtc)
What is whisper? I work gpt 3 api and stuff but what is whisper?
Whisper is a speech to text transcription model published by openai
Hey I'm having some trouble installing whisper via my macbook terminal. I'm running:
pip install -U openai-whisper
and it gives me:
ERROR: Cannot install openai-whisper==20230117 and openai-whisper==20230124 because these package versions have conflicting dependencies.
The conflict is caused by:
openai-whisper 20230124 depends on torch
openai-whisper 20230117 depends on torch
I had installed torch, now when I try to run "pip install torch" it just says:
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch
anyone around to help out?
Are you using an M1 mac? Iirc there are still some pytorch issues on M1
it's a 2018 i7
Hm, that's odd then. Try running pip install git+https://github.com/openai/whisper.git
Different message but still an error haha
ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement torch (from openai-whisper) (from versions: none)
ERROR: No matching distribution found for torch
My python version is 3.11.1 tho
Well that issue seems easier to approach at least
Try setting up a venv with a python version in the range and trying again?
@dim viper mind trying to help if that doesn't fix his issue? I'm heading to bed
👍 got to bed night man
It seems I can't install any older versions of python now...
pip install python==3.10
gives me:
ERROR: Could not find a version that satisfies the requirement python==3.10 (from versions: none)
ERROR: No matching distribution found for python==3.10
Cheers thank you for your time!
one sec
what shows up when u type python --version
is it 3.11.1?
yeah 3.11.1
hmm u can try installing something like 3.10 u can find it on pythons website under /downloads
but like book said
create a venv
google venv python and you can find the information on how to set up virtual environments
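For reference, the pip output earlier in this thread lists candidates marked "Requires-Python >=3.7,<3.11", which would explain why a 3.11.1 interpreter finds nothing to install. A minimal sketch of that range check, assuming those bounds (they are copied from the error text, not from any official support matrix, and the helper name `in_supported_range` is made up for illustration):

```python
import sys

# Bounds copied from the "Requires-Python >=3.7,<3.11" text in the pip error.
# Treat them as an assumption about that moment in time, not a permanent rule.
LOWEST = (3, 7)
FIRST_UNSUPPORTED = (3, 11)

def in_supported_range(version=None):
    """Return True if (major, minor) falls inside >=3.7,<3.11."""
    major_minor = tuple((version or sys.version_info)[:2])
    return LOWEST <= major_minor < FIRST_UNSUPPORTED

print(in_supported_range((3, 10, 9)))   # a 3.10 interpreter is inside the range
print(in_supported_range((3, 11, 1)))   # the version from the chat is not
```

Once a 3.10 interpreter is installed, something like `python3.10 -m venv .venv` followed by `source .venv/bin/activate` gives pip a compatible Python to work with.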
brb gotta go take dog out
im having internet outage rn .... if i disappear its cause of the internet ill reply as soon as im back online
What would be the expected processing time? I have a 4 hour long audio file, and im guessing that will take a while?
did u end up figuring it out
Hey thanks for checking in, I downloaded MacWhisper and it wasn’t working on my version of MacOS so I upgraded to Ventura and now the app works so that’s a start I guess, I’m going to try and reinstall from repo this evening and will report back 🫡
Sounds good! hope everything goes well 🙏
is youtube for kids
technically, no, but i think almost all kids watch it
👍
hlo
Just passing by to give a big shoutout to the Whisper project!
I’m working on an accessibility/quality of life tool and Whisper is being INVALUABLE for it
Whisper really is overlooked, it's incredibly powerful and the fact that it's open source only makes it better
wait till its supported or find someone to pay for you thats in the US?
Hi Guy's
Hi all, quick question: Has anyone tried to create an AWS Lambda function that runs Whisper's transcribe on a file from S3? I can't think of a reason it wouldn't work, but when I search Google, I cant find anyone else that's done it. Which makes me think I'm missing something.
I share with you the project I did with Whisper, Embedding and GPT-3
allows you to load any youtube video and start getting information through a chat
how can i get whisper to record twitch streams in real time on windows 10?
IDK
can I try it?
Where are u from?
HI
Did he use chat gpt api to make it or is there a free option of doing it y’all?
I can't share link in this channel, but you can go to the api-projects channel
and you have to search for: YoutubeGPT: Satya Nadella interview
and you will have all the information and the project repo
I'm using OpenAI Whisper, Embedding and GPT-3 API
If you could introduce your technology to create chatGPT?
I'm building something so that it can be used for free
I will be posting on twitter, if you want to follow me my username is dani_avila7
But very similar
I gotcha
Let’s trade a follow I just made a Twitter yesterday 😁
SullyBillions is my Twitter
egg incubator research study
tes
Damn bro I feel bad 💀💀
it's a pity that things like ChatGPT are being used for fraud.
hey guys do you too have this problem "An error occurred. If this issue persists please contact us through our help center at help.openai.com."
I need help regarding streamlit cloud
No such file or directory: 'ffmpeg'
I have tried everything since 3 days nothing working for me
Please help
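A "No such file or directory: 'ffmpeg'" error usually means the ffmpeg binary is not on PATH in the environment where the code runs. A small sketch for confirming that from Python (the helper name `find_executable` is made up for illustration):

```python
import shutil

def find_executable(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)

path = find_executable()
if path is None:
    print("ffmpeg not found on PATH; Whisper's audio loading will fail")
else:
    print(f"ffmpeg found at {path}")
```

On Streamlit's hosted service, system packages are, as far as I know, declared in a packages.txt file at the repo root, so a line reading ffmpeg there is the usual suggestion. Treat that as an assumption to verify against the Streamlit docs.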
Accusing someone basing your evidence on what AI believes is even worse than using AI to write assignments. Change my mind
When a conversation starts, a time log is necessary. If the dialogue continues for a while, it may be necessary to recall when a specific question was asked or when an answer was received. Although AI cannot hold real-time information, time stamps for the conversation would be helpful.
hello where the real input area?
how to make the ai write phd thesis
Fr it’s gonna b tuff out here
Hi all new Friend
hello from norway
I'm using whisper in my project, and I'm having some weird behavior sometimes. Here's a couple errors i've saved from a recent test. Because I'm testing, the audio it's receiving is basically the same each time. just sometimes it gives errors like these, and most of the time it doesnt. I'm probably just gonna wrap some of this in a retry, but wondering if you guys have had similar experiences.
RuntimeError: The size of tensor a (18) must match the size of tensor b (10) at non-singleton dimension 3
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, 6, -1] because the unspecified dimension size -1 can be any value and is ambiguous
Haven't seen that error before, sorry.
How do I start ?
hi from myanmar
Write a letter to the witcher
You can use it as HTTP API on deepinfra.com
how do i use whisper?
Here's the github repo: https://github.com/openai/whisper
Once you actually install it with pip implementing it in your code is really simple
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
trying to mess around with cuda
found out our server has an ancient GPU that isn't supported. So I set up port forwarding to do some testing from my own computer
I'm not sure I'm seeing any improvement from doing transcribe().cuda()
I'm wondering if I might be going about this the wrong way
I'm transcribing audio samples from an IVR
so 5-10 second long clips grabbed from the IVR over http
The examples I saw on github of other people's projects, specifically the one that functions as a rest api. And I throw a Lock on the transcribe method.
The issue I had yesterday with those runtime exceptions were because I didn't put a Lock on the transcribe method
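A minimal sketch of the locking idea described here, assuming a single shared model object whose transcribe method is not safe to call from several threads at once. `LockedTranscriber` and `EchoModel` are hypothetical names; `EchoModel` only exists so the sketch runs without Whisper installed.

```python
import threading

class LockedTranscriber:
    """Serialize calls to a model's transcribe() across threads."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def transcribe(self, audio, **kwargs):
        # Only one thread at a time may enter the model.
        with self._lock:
            return self._model.transcribe(audio, **kwargs)

class EchoModel:
    """Hypothetical stand-in so the sketch runs without Whisper installed."""
    def transcribe(self, audio, **kwargs):
        return {"text": f"transcript of {audio}"}

shared = LockedTranscriber(EchoModel())
results = []
threads = [
    threading.Thread(target=lambda n=n: results.append(shared.transcribe(f"clip{n}.wav")))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))
```

With a real model this would be `LockedTranscriber(whisper.load_model("tiny"))`. The trade-off is that the lock makes requests queue up one at a time, which avoids the tensor shape errors but caps concurrency at one transcription per model instance.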
Guess I'm just real confused about how to get better performance out of it. I have an rtx 3090TI
How can I get better concurrency while working with Whisper? Any advise?
First off is it actually hitting your gpu?
When we just have a couple calls coming in, it works alright with an average response of like 4 seconds. but when we got like 50 it goes upwards of 10
How can I check that? I'm looking at task manager, and i dont see any GPU% coming from python
When i had an earlier version of python installed, it told me it wasnt compatible with my gpu and i updated to 3.10
so now i get no such error
makes me think it is using it
but not putting a big load on it
Is your gpu usage increasing when you run it?
no, not in a noticeable way
If you don't end up installing the CUDA Development Tools it will hit your CPU without any logs
ye, I don't think I can link here but let me try?
Just google windows cuda development tools it's the first link
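One way to check whether PyTorch can see the GPU at all, sketched under the assumption that Whisper is running on top of a normal PyTorch install. The helper name `pick_device` is made up; `torch.cuda.is_available()` is the real PyTorch call.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, "cpu" if not, None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"device: {device}")

# With Whisper installed, the device can be requested explicitly and read back:
#   import whisper
#   model = whisper.load_model("tiny", device=device)
#   print(model.device)
```

If this prints "cpu" on a machine with an NVIDIA card, the installed PyTorch build or the CUDA setup is the likely culprit rather than Whisper itself.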
cool
I really appreciate your help. I think whisper is gonna help save us some decent money on call processing
I think whisper is amazing and just got overlooked by their completion models upgrading shortly after
What size model were you using btw?

tiny
Haven't had too many inaccuracies. gonna see if i can bump it up with the cuda help
most issues I get are like numbers sometimes dont process correctly
zero dollars and zero cents transcribes to $0.00 and $0.00
lol
switching to gpu shouldn't change anything with what it outputs, mainly just performance
Only real option for increased accuracy is using a larger model
Yeah, but the model should improve accuracy right?
yeah
If the cuda can handle 100 calls at once and keep responses under 3 seconds i can try a larger model
👍 Let me know how it goes
how do i install it then?
scroll under setup on that page
^ At the end of the day it's a python library, so you just use pip: pip install -U openai-whisper
ive never installed pip either how do i do that?
Have you used python before?
no...
Probably not a great starting point then. Whisper is a library that you use in code, not a standalone app
ah
I think my PC froze in the middle of that install and now my graphics drivers aren't working lol

lol well. i couldnt really tell if it was using cuda or not
then i think our server started to refuse all my requests out of nowhere
i think i pissed off some firewall
its not making much of a dent in anything
like im doing 40 calls at once, and it's grabbing a bunch of audio samples from the ivr, and my cpu is just going between 3 and 20%. gpu going between 1 and 7%
try going to a larger model just to see where the load goes?
I'm running into a networking issue cause im not on the same network. it doesnt like all these requests for audio samples lol. after a while it starts failing when ffmpeg goes to load. I'm gonna bump it up a couple models
I set it to medium and I still can't tell lol. it says python is using like 1% GPU or lower. and like 15% CPU. with like 40 calls, over the course of 2 minutes, it transcribes 80 5 second clips.
its cool tho, with tiny we would have hands full of little error, medium is pretty perfect
it looks like when it first starts getting requests, it takes like 5 seconds average, but once it gets going, it goes down to under 2 seconds each
is there any kinda verbose mode we can call? I'm fairly certain it should be using the cuda now but its barely noticeable
Not really, you could chuck it something longer and see?
Omg
This is the best
Thing i ever saw
Thank you creators
Ur a blessing
@outer scarab i lov y
Gonna try tomorrow. It's a little late over here. Do you know of any benchmarks people may have done with GPU performance with different cards? Gotta make a purchase decision for work
Purely for whisper you might find it in one of the discussions on the repo otherwise I got nothing. :/
thank u but i am not a creator!
Thank you very much for whisper and the github repo !
I try it on windows with a Shadow (GPU Cloud, NVIDIA RTX A4500) and it's soooo Amazing on the large model 🙂
I am trying to separate the speakers in the rendered text, do you have any ideas?
Who is creator
Look up speaker diarization openai whisper
Lots of results on Google
MuseNet
has anyone made improvements to real time transcription?
There's been a few approaches to realtime transcription in the discussions tab of the repo. I know the cpp fork of it has a few examples that do realtime as well
That's basically what I'm doing
I can make a guide later if you guys want.
The key to doing transcription live is to store your audio in a buffer, and send samples to whisper. You can do this without files by using a rest API and serving the samples as bytes. ffmpeg works fine with urls and that's the first step in whisper. I'll share examples later
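A rough sketch of the buffering step, assuming the incoming stream is raw 16-bit mono PCM at 16 kHz (the rate Whisper resamples to internally). `ChunkBuffer` is a hypothetical name, and how each chunk is then served to Whisper is left out.

```python
SAMPLE_RATE = 16_000      # Hz; Whisper resamples its input to 16 kHz mono
BYTES_PER_SAMPLE = 2      # signed 16-bit PCM

class ChunkBuffer:
    """Accumulate raw PCM bytes and hand back fixed-length chunks."""

    def __init__(self, chunk_seconds=5.0):
        self._chunk_bytes = int(chunk_seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
        self._pending = bytearray()

    def feed(self, data):
        """Add bytes from the stream; return a list of complete chunks."""
        self._pending.extend(data)
        chunks = []
        while len(self._pending) >= self._chunk_bytes:
            chunks.append(bytes(self._pending[:self._chunk_bytes]))
            del self._pending[:self._chunk_bytes]
        return chunks

    def flush(self):
        """Return whatever is left (possibly empty) and reset the buffer."""
        rest, self._pending = bytes(self._pending), bytearray()
        return rest

buf = ChunkBuffer(chunk_seconds=1.0)
one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
print(len(buf.feed(one_second[:10_000])))              # not enough audio for a chunk yet
print(len(buf.feed(one_second[10_000:] + one_second)))  # now complete chunks come back
```

Shorter chunks lower the delay before text appears but give the model less context per call, so the chunk length is the main knob to tune.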
open ai is epic
Any great full stack engineer (with iOS and web expertise) who want to build something useful that can positively impact the society, must haves, passion, drive, curiosity, work ethic...please DM me with your resume or your friends resume !
id love to help but im not full stack
I have seen many people do this already and it still results in seconds of time wait until you are done.
We're doing this approach to transcribe live calls for an IVR.
When you say seconds, I feel like you think that matters more than it really does
TV subtitles take seconds. And the IVR takes time to do stuff. But when we were sending 100 x 5s samples to it in 2 minutes, it would solve and response in under 2 seconds consistently
it's more than acceptable for industry use for live applications imo
this audio is over websocket btw, so its a continuous stream coming into the ivr
@charred barn If you just have a look at whisper-cpp it has a streaming project and it solves the words but has a delay. Each word would need to be returned within 300 ms after it is done.
This is what I am referring to.
The best they can do is 1.2 s from what i saw
so what's a sub 1 second delay an issue for? if we were just doing 1 call, it solves stuff real quick
and im also pinging from the opposite side of the world right now cause our server doesnt have a gpu suitable
1.2 second delay on a phrase will compound over time
no it wont
will lag over time
@muted axle how do I use this
I would think you would have an issue over time with the slight delay. Are you interacting with the tokenizer?
I got timed out
I'm not interacting with whisper beyond just calling transcribe.cuda
you mentioned that delay was an issue. But in our application we have a suitable delay threshold
i was looking at this version of whisper that allows batch processing
i was gonna see if i could get an improvement overall from delaying 1 second to collect audio before sending to whisper
Whisper is really good, but this limitation has prevented me from trying to implement it.
you can dm your link if you want
Especially when other solutions are so close and are fully developed.
i wish they'd add a leveling system here so we could get trusted enough to react to stuff or add links to discussion lol
ya would be good
what are you trying to use it for ?
Being able to react instantly from incoming audio in conversational style
but that doesnt really tell me what your project is. like, what are you trying to achieve? I dont see what a very short delay would prevent
The fastest solution I have used so far is riva
or triton from nvidia
people talk fast
what kind of response times are you getting with those?
reaction times and delays are what makes it feel clunky or not
just how you would talk with some one
if some one takes 1.5 seconds to respond to you every time during a live conversation that isn't natural feeling
not to mention any other latency added for other computation needed.
That is why it needs to be 300 ms or faster
That makes sense
Seems like whisper isn't targeted to this use case, but it is so good not to try
it'd be nice if they added native support for websocket audio
just send me back all the words 1 at a time lol
the whisperX
you know what'd be even nicer? if i didnt need to use ffmpeg at all
oh ya
i already can serve the audio in the exact format that its encoding it to
but it still calls it every time
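If the audio really is already 16 kHz mono signed 16-bit PCM, one possible way around the ffmpeg call is to decode the bytes yourself and pass an array to transcribe, which accepts arrays as well as file paths. A stdlib-only sketch of the decoding step follows; `pcm16le_to_floats` is a made-up helper, and the numpy hand-off in the trailing comment is an assumption rather than something tested here.

```python
import sys
from array import array

def pcm16le_to_floats(raw):
    """Convert signed 16-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()   # the input is little-endian by assumption
    return [s / 32768.0 for s in samples]

floats = pcm16le_to_floats(b"\x00\x00\xff\x7f\x00\x80")
print(floats)

# Assumption: with numpy and Whisper installed, something like
#   audio = np.asarray(floats, dtype=np.float32)
#   result = model.transcribe(audio)
# would skip the ffmpeg call, since transcribe() also accepts an array.
```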
the model is good but the rest of the software isn't as useful
i love streams like the unix/linux way
so ffmpeg isn't a big deal if it isn't slow but if its slow then it needs to be removed.
every fraction of a second helps
yep
How to enter whisper?
Whisper is a model used for speech to text transcription. You can find instructions on how to use it here: https://github.com/openai/whisper. There isn't a website or anything of that sort though if that's what you are asking.
Thanks
"I hope this message finds you well. I wanted to discuss the topic of WHISPER, a rapidly growing field that combines engineering and technology to solve complex problems. Learning more about it can be a valuable investment for personal and professional growth with numerous career opportunities. I'm happy to provide information and resources through discussion, online resources, or workshops/conferences. Let me know if you have questions or would like to chat further. Best regards."
Can anyone help me get the phone code?
now what should i do
You talking about my code?
i wanna know whats whisper?
can anyone help me with how to use this gpt bot??
This is the whisper channel. Probably want a gpt channel.
Whisper is model used for speech to text transcription
Has anyone tried integrating Whisper into Audacity, maybe via plugin?
Has anyone tried integrating Whisper into React App?
How can I use whisper
Dude thats crazy!!!
thx 🙏
Wow, that's crazy !
Is whisper available as a Discord bot?
yes
So cool
whisper - open source is the way
hii
AWS Lambda functions can’t run continuously for more than 15 mins.
Thanks for the response! We will be breaking up the inferences so that they are short enough to be run on lambda.
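A small sketch of how the splitting could be planned, assuming fixed-length windows. The function name and both numbers in the example are illustrative, and nothing here says how long a given window actually takes to transcribe on Lambda.

```python
def plan_chunks(total_seconds, chunk_seconds):
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        windows.append((start, end))
        start = end
    return windows

# A 4-hour file cut into 10-minute pieces (both numbers are illustrative).
windows = plan_chunks(4 * 3600, 10 * 60)
print(len(windows), windows[0], windows[-1])
```

Cutting at fixed times can split a word in half, so in practice a little overlap between windows, or cutting on silence, may be worth considering.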
yes, but only to upload the audio file and return the results via API
Helmo
If I am correct Whisper is what was used to give ChatGPT human like responses correct?
No, whisper is for voice to text
oh
Hello everyone, hope you are well
I have a question regarding whisper, what's the best way today to modify the formatting of the text output ?
I personally just parse the srt file by hand but pretty sure there are also libs for that
To choose right output format is the best way. There are plenty of them. I am using it from cli, not (yet) from python script.
Thanks for the response, I'm using Python but I'm unable to find a command to help with that
what do you want to transfer it to?
Thanks for the response, CLI means Command-line interface right ? what command could help with that ?
to a text or a word document, I just want to remove the words "speaker 1" and "speaker 2" and instead have whisper put the text from speaker 1 in italics and no change on speaker 2, with a line break between speakers
I would also like to remove the timestamp
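For the timestamp part, a minimal sketch of parsing SRT output by hand, assuming the usual layout of a cue number, a "start --> end" line, then the text. The sample cues are invented, and speaker labels from a diarization tool are not handled here.

```python
import re

TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")

def srt_to_text(srt):
    """Drop cue numbers and timestamp lines from SRT, keeping one line per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Remove the leading cue index, then the "start --> end" line.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        if lines:
            cues.append(" ".join(lines))
    return "\n".join(cues)

# Invented sample in SRT layout.
sample = """1
00:00:00,000 --> 00:00:02,500
Hello, thanks for calling.

2
00:00:02,500 --> 00:00:05,000
How can I help you today?
"""
print(srt_to_text(sample))
```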
wait since when does whisper support multiple speakers 
sorry, I did not give an answer with the full context and I was lost in my own head. I'm tinkering with another module called diarization which helps by identifying speakers
I should probably seek help on that specific module rather than with whisper itself
can you give a sample output and we can move to DM if you like since this doesn't really have much to do with whisper itself 
sure
Hey guys, I am interested in building with Whisper.
I am trying to transcribe my calls. Would you guys say it's best to build it on top of something like Twilio flex or is there a way for OpenAI to listen in and transcribe calls based on a Chrome extension or something like that?
ohio radio is down radio.garden/visit/columbus-oh/oHcdAaW1
HEY
AdamAI: The first AI-powered video search engine uses Whisper 🙂 Check it out in api-projects page
Demo
you could probably grab stuff with an extension, but you still need to spawn a local webserver running whisper
not really familiar with twilio flex 
Looks cool
thats freakin EPIC
Sounds like a pyramid scheme and I'm not sure if that's the right channel 
lol
hmmm any one interested in my gaming server?
what game
fortnite, fallguys, minecraft, rocket league and roblox
yes minecraft
wanna join then click on my profile and about me
done'
hey guys im new here
but td I saw lots of cool projects
but for this Reverse Video Search project AdamAI did any of yall experience an issue where sometimes results dont show?
but after trying again and refreshing it works
but why is that?
Wow very cool
anyone can invite BlueWillow&
Hi how do I start using Whisper and where is it located?
Do I have to be a tech person to install it?
you can download the necessary stuff off OpenAI's github for Whisper
nobody and everybody is a tech person, it's just a matter of googling the instructions to set it up
How can I use whisper?
I aint gonna use whisper just cause u told me to.
Any idea when GPT 4 is releasing or the newBing
[mp3 @ 000001ed70287cc0] Format mp3 detected only with low score of 1, misdetection possible!
[mp3 @ 000001ed70287cc0] Failed to read frame size: Could not seek to 1026.
C:\Users\18186\AppData\Local\Temp\Vocaroo 1fhNXMiDDXG6_gchawfnh2335b7b68aaefaa9608d380451ceb5d05a9f5109.mp3: Invalid argument
anyone encounter anything like this?
i cant get whisper to read my mp3.
is there a website where i can try whisper?
InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
If I am not mistaken local Whisper supported .ogg sound format
If it is possible can you please add .ogg to support? For the example, .ogg is the default format for the Telegram audiomessages.
||Yes, I know that I can convert file, however that's tough||
iirc, Whisper uses FFmpeg to convert the input audio anyway, since it has to be in a very specific format before converting
with .ogg I think you can simply just change the file name to .wav
Hooray! Very happy to hear OpenAI's announcement about Whisper API.
anyone play around with the new API yet? wondering if its possible to get the "enable_word_time_offsets" param working?
Haven't messed with it but I don't see a way from the docs :/
the docs don't mention ANY params at all, though I think it has some
getting a very basic, text-only response
Could try setting response_format to verbose_json or vtt?
I gotta set aside some time to mess with it, my gpu could barely run the medium model. Happy to hear verbose_json has decent info
its phrase-level offsets but its something
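A sketch of pulling the phrase-level offsets out of a verbose_json style response, assuming it carries a "segments" list whose items have "start", "end" and "text" keys. The payload below is invented for illustration and `segment_lines` is a made-up helper.

```python
def segment_lines(response):
    """Format each segment of a verbose_json-style dict as "[start-end] text"."""
    lines = []
    for seg in response.get("segments", []):
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text'].strip()}")
    return lines

# Hypothetical payload shaped like a verbose_json response; values are invented.
fake_response = {
    "text": "Hello there. How can I help?",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " Hello there."},
        {"start": 1.2, "end": 2.9, "text": " How can I help?"},
    ],
}
for line in segment_lines(fake_response):
    print(line)
```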
gahh. yea miscommunication, re: translate and transcribe being the same endpoint
the good info is in the translate part of the docs
oh no wait, there it is in transcribe too.
damn ok, guess I missed this
anyway thx!
Congrats on launching the Whisper API. Maybe I can ditch the 10 AWS-T4 server instances I've been running. lol
Haha same here. A whisper API is... actually quite industry changing. It's pretty damn fast.
I haven't benchmarked the API yet but I'm currently averaging about 4.7 seconds of audio transcribed per second on a Nvidia T4 GPU.
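As a rough worked example of what a figure like that means for a long file: processing time is audio length divided by the speed factor. The 4.7 comes from the message above; the 4-hour length is just an example, and real throughput will vary with model size and audio content.

```python
def estimated_processing_seconds(audio_seconds, speed_factor):
    """Time to transcribe audio at `speed_factor` seconds of audio per second."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor

# 4.7 is the T4 figure quoted above; the 4-hour length is just an example.
seconds = estimated_processing_seconds(4 * 3600, 4.7)
print(f"{seconds / 60:.0f} minutes")
```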
Im getting
const openai = new OpenAIApi(configuration);
const completion = await openai.createTranscription(fs.readFileSync("Fears.mp3"), "whisper-1","this is a youtube comentary","text",1);
The whisper API service is 3.3x faster and 50-75% cheaper for me.
Did it look like i was talking to you??
Thanks for sharing this. I was looking at hosting yesterday and the API popped up today. Seemed like the whisper api was way cheaper but I'm glad to have confirmation.
I hope they add a model parameter. I find the medium model is the ultimate bang for buck in terms of quality and speed.
Huuh???
PSA: transcode your files to mp3 before uploading them to the transcription endpoint. In my testing, mp3 transcription is 100-125% faster than WAV, 80% faster than webm.
It takes 4 seconds to transcode 1 min of audio. Blazing.
Hey, does the api also have a translation function or is it limited to transcribe?
is the translate better than google/aws translate?
Use the translations endpoint.
Could be your internet connection. Internally at least with whisper-asr-webservice, ffmpeg decompresses to uncompressed pcm_s16le
oof i am a dummy. internet connection is probably a major factor - the wav is 4x the mp3
Guys, how do I get access to GPT3.5 model in the API playground?
Hmm, even taking that into account it seems like mp3s run a little faster. For reference, a 5:33 min mp3 (5.5mb) takes 22 seconds to transcribe. The same file in wav format (21mb) takes 52 seconds to transcribe. On my current (terrible hotspot) connection, internet speeds account for roughly 19 seconds of overhead. That's still a 11 second delta. Would appreciate seeing results from others here.
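The arithmetic behind that 11 second figure, spelled out with the numbers from the message (reading the 19 seconds as extra upload time for the larger wav is an interpretation, not something stated outright):

```python
mp3_seconds = 22        # 5:33 clip as mp3 (5.5 MB)
wav_seconds = 52        # same clip as wav (21 MB)
network_overhead = 19   # extra time attributed to the connection

unexplained = wav_seconds - mp3_seconds - network_overhead
print(unexplained)  # 11
```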
I'm getting the same performance +/- 0.1 seconds
Thanks a ton.
Great tip
when will have a tts model?
hey there folks, i'm working on a project that requires me to transcribe audio live from the user input and store it in a variable. i'm a noob regarding this and would like to know if there's any way to achieve this
hi all, I feel like I'm missing something really obvious. How am I supposed to structure my request using generic fetch?
```
const form = new FormData();
form.append("file", fs.readFileSync("audio.mp3"));
form.append("model", "whisper-1");
const response = await fetch(
  "https://api.openai.com/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: form,
  }
);
```
I keep getting
```
error: {
  message: '1 validation error for Request\n' +
    'body -> file\n' +
    " Expected UploadFile, received: <class 'str'> (type=value_error)",
  type: 'invalid_request_error',
  param: null,
  code: null
}
```
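The "Expected UploadFile, received: <class 'str'>" message suggests the server saw the file field as a plain text value. One plausible cause, and this is an assumption rather than something confirmed by the docs, is that the multipart part was sent without a filename. The sketch below builds a multipart body by hand to show the difference between a file part and a text part; the helper names and the boundary value are made up.

```python
def file_part(boundary, field, filename, content, content_type="audio/mpeg"):
    """Build one multipart/form-data part that a server can treat as a file."""
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    return header.encode() + content + b"\r\n"

def text_part(boundary, field, value):
    """Build a plain text field; note there is no filename here."""
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"\r\n\r\n'
    )
    return header.encode() + value.encode() + b"\r\n"

boundary = "example-boundary"          # arbitrary illustrative value
body = (
    file_part(boundary, "file", "audio.mp3", b"\x00\x01fake-bytes")
    + text_part(boundary, "model", "whisper-1")
    + f"--{boundary}--\r\n".encode()
)
print(body.decode("latin-1"))
```

HTTP client libraries normally build this for you; the point is only that the file part carries a filename and its own content type, while the model field does not.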
hi guys, for transcribing 21 seconds of speech it's taking 5 seconds. Is there any way to decrease the latency?
it's making it unusable
5 seconds for 21 seconds is way too much
Is live transcription with the new API possible and if yes then what’s the best resource to learn about it? Has anyone done it yet?
What type of file are you uploading? MP3 seems to be the most performant
mp3 only
For 90s clip of audio, it takes me roughly 5s of processing
well I guess it can process fast in terms of maximum time but the minimum time is not low enough
I'm processing a 5s clip in roughly 900ms
i guess we can expect it to improve only over time as I presume it's the model issue itself
I think it's just due to the nature of how the system is set up. It processes faster with longer audio files since overhead plays less of a role
There's probably a good ~300ms or so for a job to be created and assigned to a ready GPU
yes that's what i was thinking
but it rather defeats the purpose of STT
it should ideally be near real-time
i'd love that too
I hope OpenAI will do it eventually
right now for my project I've been sending requests to whisper every 500ms or so and then taking the latest response when it's needed
but it's pretty unideal since it can lose words at the end
yeah
potentially thinking of using the prompt with the current translation to let me send shorter segments (rather than the whole built up buffer)
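A sketch of that idea: send only the newest audio segment and pass the tail of the transcript so far as the prompt. `RollingTranscriber` and `fake_transcribe` are hypothetical; the real call would be whatever function wraps the transcription request, and whether a prompt actually stabilizes short segments is untested here.

```python
class RollingTranscriber:
    """Send only new audio, passing the tail of the transcript as the prompt."""

    def __init__(self, transcribe_fn, prompt_chars=200):
        self._transcribe = transcribe_fn     # callable(audio_bytes, prompt) -> str
        self._prompt_chars = prompt_chars
        self.transcript = ""

    def add_segment(self, audio_bytes):
        """Transcribe one new segment and append its text to the transcript."""
        prompt = self.transcript[-self._prompt_chars:]
        text = self._transcribe(audio_bytes, prompt).strip()
        if text:
            self.transcript = f"{self.transcript} {text}".strip()
        return text

def fake_transcribe(audio_bytes, prompt):
    """Hypothetical stand-in for an API call; it just decodes the bytes."""
    return audio_bytes.decode()

rolling = RollingTranscriber(fake_transcribe)
rolling.add_segment(b"hello")
rolling.add_segment(b"world")
print(rolling.transcript)
```

Compared with resending the whole buffer every 500ms, each request stays small, at the cost of relying on the prompt for continuity across segment boundaries.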
On your tatas
Hi all
I want to use whisper api but I get this error when I send form data to the api, anyone knows what's the issue?
Hi
Can you share your code? I have some issues with the api
Is it possible to Train the chatGpt and get the trained data from ChatGpt by Api Hit
Try omitting the content-type.
which content type?
https://platform.openai.com/docs/api-reference/audio
in here they mentioned about multipart/form-data
An API for accessing new AI models developed by OpenAI
Content-Type: multipart/form-data in your js
I have this
you can see in the second image
formData will automatically add the content type and the multipart boundaries required to do a form post.
what's the problem in here?
Remove "Content-Type": "multipart/form-data"
Thank you so much, it's working now
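Why removing the header helps: a multipart body can only be split apart if the Content-Type header also carries the boundary string, and a hand-written "multipart/form-data" has none. A short sketch using Python's standard header parsing to show the difference (`boundary_of` is a made-up helper, and the boundary value is arbitrary):

```python
from email.message import Message

def boundary_of(content_type):
    """Return the boundary parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("boundary")

print(boundary_of("multipart/form-data; boundary=abc123"))  # header as a client would generate it
print(boundary_of("multipart/form-data"))                   # hand-written header, no boundary
```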
Having an issue where I'm only getting the first sentence of a transcript back...
INPUT
"key": "file",
"data": "IMTBuffer(369427, binary, THE BINARY DATA",
"fileName": "file.m4a",
"fieldType": "file"
OUTPUT
"data": { "text": "Scientific Advertising, by Claude C. Hopkins." }, "fileSize": 56
More context: Totally noob way to run whisper, I know (via make.com HTTP module). It seems that OpenAI is requesting the file in binary, so I'm using an HTTP module to convert the file into binary, then sending via the API... if I just put the file URL in the HTTP request it does not seem to work at all
however, this only returns the first sentence of the file for whatever reason
an alternative approach - does not work
error i receive: 1 validation error for Request
body -> file
Expected UploadFile, received: <class 'str'> (type=value_error)
Trying in postman - no luck
must have been an issue with the file - works fine with an MP3
Random question - if it's an interview between two people, can you instruct via the prompt to label the speakers?
no there's no speaker diarization functionality offered so far
Thanks, makes sense - would be nice to instruct it to output in Markdown, though that could be done via GPT of course
anyone else getting a huge error when using the nodejs library to get a transcript?
looks like it's just printing out the request
trying to use fetch just gives me 400 bad request
yeah, the nodejs usage of createTranscription seems to be broken
is the entire api broken or am i doing something wrong?
formData.append("file", fs.createReadStream("/home/simon/Projects/resident/mpthreetest.mp3"));
formData.append("model", "whisper-1");
formData.append("language", "en");
const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
headers : {
Authorization: `Bearer ${apiKey}`,
},
method: "POST",
body: formData,
});
console.log(res)
let data = await res.json();
console.log(data)```
try adding a third argument to the first append, that includes the filename: { filename: "test.mp3" }
nope
anyone have working fetch code i could see as a reference?
adding content-type gives a different error
i got things working client-side only, still trying to translate it to server-side code, hitting similar errors as you @hidden summit
what does buffer equal in this case?
got it
How can I access the Whisper AI through the OpenAI API? The documentation isn't too clear
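import { Configuration, OpenAIApi } from "openai";
import fs from 'fs';
// Assumes the API key is in the OPENAI_API_KEY environment variable
const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);
const mp3 = fs.createReadStream("audio.mp3")
const resp = await openai.createTranscription(mp3, "whisper-1");
// resp is the whole HTTP response; the transcript is in resp.data
console.log(resp.data)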
Has anyone managed to pass a .mp3 from an external host?
Hi, how do I pass response_format="srt" in python?
import openai
import unicodedata
audio_file= open("./audio.mp3", "rb")
transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript)
anyone experiencing the below error when using the whisper API?
"type": "server_error",
"param": null,
"code": null
}
} 500 {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 6fa27ee8db754a9596de625d704367ce in your email.)', 'type': 'server_error', 'param': None, 'code': None}} {'Date': 'Thu, 02 Mar 2023 22:43:44 GMT', 'Content-Type': 'application/json', 'Content-Length': '365', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'X-Request-Id': '6fa27ee8db754a9596de625d704367ce'}
contact our help center at help.openai.com and include the request ID (6fa27ee8db754a9596de625d704367ce) in your email.
im trying to now
it gives an odd error though
sometimes it works, but sometimes it gives the error TypeError [ERR_INVALID_STATE]: Invalid state: chunk ArrayBuffer is zero-length or detached
anyone else getting this error message (401) using the whisper api : You didn't provide an API key. You need to provide your API key..., I get this error using curl requests or with the axios library (trying to get it browser compatible without nodejs-specific code)
would like to talk to anyone that has managed to make a whisper api curl request, the 401 message above does not go away...
does Whisper support text-to-speech synthesis ?
Gm. So i have a customer who has possible multiple use for the project. Will the api alone provide access to various use cases?
both .webm and .wav being created by the standard Chrome MediaRecorder as audio/webm or audio/webm not working. Whisper says invalid file format. Is it due to codec used on MediaRecorder side?
Got it. Issue resolved by setting the filetype and filename. curl_file_create($voiceFile, $fileType, "audio.wav")
What are you using to make the request? Running into the same issue with little success, seeing a code snippet that works would be great
You could try the following code.
import requests
headers = {
'Authorization': 'Bearer sk-***',
}
files = {
'file': open('./audio.mp3', 'rb'),
'model': (None, 'whisper-1'),
'response_format': (None, 'srt'),
}
response = requests.post('https://api.openai.com/v1/audio/transcriptions', headers=headers, files=files)
print(response.text)
How do I send prompt to whisper api?
Where can I get the secret key for whisper model?
You can use same secret key that is available for your openai account
@ionic valley OK, thank you, I go to try it now
Hi, anyone had any success in using power automate to call the whisper api?
Does Whisper support timestamps for words as well?
Hey guys! I'm trying to do a POST request to OpenAI Whisper in React Native with Expo. I get an audio recording in an .m4a file, but the response I get from the API is
{"error": {"code": null, "message": "The audio file could not be decoded or its format is not supported.", "param": null, "type": "invalid_request_error"}}
Any ideas? I don't think this is related to the format, but maybe I'm missing something?
Okay figured it out. Will leave it here for someone else having troubles.
When I was recording the audio, I had passed these options to the Expo Audio object:
ios: {
extension: ".m4a",
audioQuality: Audio.IOSAudioQuality.HIGH,
sampleRate: 44100,
numberOfChannels: 2,
bitRate: 128000,
},
Somehow this doesn't work well with OpenAI. Changed it to this one (as in the Expo docs):
ios: {
extension: ".m4a",
outputFormat: Audio.IOSOutputFormat.MPEG4AAC,
audioQuality: Audio.IOSAudioQuality.MAX,
sampleRate: 44100,
numberOfChannels: 2,
bitRate: 128000,
linearPCMBitDepth: 16,
linearPCMIsBigEndian: false,
linearPCMIsFloat: false,
},
Working good now!
Anyone have code that can feed a remote url into whisper?
I don't believe that's an option as they require you to send the data. You'll have to fetch it yourself and send it
check out whisperX on github
is there a way to separate speakers with the api?
There is not an option for speaker identification at the moment.
Darn
No, speaker diarization is not supported by whisper. You'll find a few projects on the discussions tab of the repo that handle it but that means running it locally
the response_format can be used to generate a SRT file which should have timestamps
Iirc it's phrase level timestamps not word level sadly 
let file = await response.blob();
formData.append("file", file, "audio.m4a");```
btw
Yup that should work, probably also add logic to segment it if the file is too large
Wondering if we can co-opt this thread that was meant for the open source version of Whisper and include the API version of Whisper? Or start another thread to avoid confusion...
co-opting it should be fine, it was one of the least used channels before
Does anyone know how to contact OpenAI sales?
Business api is like 250k I think
Anyone has got a way to search for common-password.txt
just looking for a common word less than 16 characters
Using playground's whisper just gives me whitespace when uploading MP3s
have you found out yet?
Nope
I'm getting this error while trying to use the nodeJS library
RequiredError: Required parameter model was null or undefined when calling createTranscription.
Code:
const openai = new OpenAIApi(configuration);
openai.createTranscription({
file: mp3Content,
model: 'whisper-1',
responseFormat: 'text',
}).then((response) => {
console.log(response.data);
message.reply(response.data);
}).catch((error) => {
console.error(error);
});
I'm using the latest library so idk what could be causing the issue
Replace above code with this code and try:
const openai = new OpenAIApi(configuration);
// createTranscription takes positional arguments (file, model, prompt, responseFormat, ...), not an options object
openai.createTranscription(mp3Content, 'whisper-1', undefined, 'text')
.then((response) => {
console.log(response.data);
message.reply(response.data);
}).catch((error) => {
console.error(error);
});
Getting error 429 @ionic valley
Hey, I've been using the Whisper API for a bit now, but for me none of the calls to the transcription endpoint seem to be logged in the Usage panel 🤔 (although I can see API calls to other endpoints just fine), is anyone else experiencing the same?
LIquefaction induced failure of shallow foundations
anyone experimented with temperature and beam_size? I have various repetitive gaps in my transcribes, and I want to "fix" it. Anyone had the same problem ?
Code bot chat GPT
You could specify prompt in the following way.
import requests
headers = {
'Authorization': 'Bearer sk-***',
}
files = {
'file': open('test.mp3', 'rb'),
'model': (None, 'whisper-1'),
'response_format': (None, 'srt'),
'language': (None, 'en'),
'prompt': (None, 'The transcript is about a message \
sent From Sarah Cranmer To Laila Alizadeh \
regarding travel arrangements.')
}
response = requests.post('https://api.openai.com/v1/audio/transcriptions',
headers=headers,
files=files)
print(response.text)
I think the types of the nodejs library are messed up for whisper
const fs = require("fs");
const { Configuration, OpenAIApi } = require("openai");
const configuration = new Configuration({
apiKey: process.env.OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);
const resp = await openai.createTranscription(
fs.createReadStream("audio.mp3"),
"whisper-1"
);
The example given gives me a typing error:
Argument of type 'ReadStream' is not assignable to parameter of type 'File'.
https://github.com/openai/openai-node/issues/77 ah ok it's a known issue
Is there a way to stream audio?
That would be insane
guys i had a question and i can't seem to find the answer anywhere: i used my phone number multiple times for other email ids, but i don't have those email ids anymore and i can't seem to use my phone number anymore, so if anyone could help me with this plsssss help me
were you able to fix it
does whisper have a max file size on the server side?
I'm sending a request to the OpenAI Whisper API and it says I need to provide a model, but I did put the model in my code.
Here is my code:
fetch("https://api.openai.com/v1/audio/transcriptions", {
method: "POST",
body: {
contentType: "multipart/form-data",
filename: file,
model: "whisper-1",
},
headers: {
Authorization: `Bearer ${this.token}`,
model: "whisper-1",
},
})
.then((res) => res.json())
.then((json) => {
if(this.debug === true) console.log(json);
return json;
});
How long does it take for whisper to transcribe something for you guys? For me it's taking like 3 seconds (using mp3), which for my purposes is pretty slow (would need it to be halved or so)
What's the length of your audio file?
Does it support dual channel transcribing?
.
I'm dealing with large audio file only, not sure it helps for you. for an audio file of 1hour it takes me 10 Min.
what's the use of "temperature" and "beam_size" ?
you can use Prompt for whisper ? Does it impact the integrity of what you are transcribing ?
is the prompt usage only through API or does it work with Whisper natively ?
@inland epoch can you check this?
It's for a conversation, so anywhere between 1 to 15 seconds
The file in question was like 3 sec
new here, so this is just my first reading... to me it looks like your command is trying to use a lib that is looking for modern CPU features (for example AVX2) that perhaps your CPU/GPU does not support. Have you installed an nvidia card and its drivers/libraries? Might check that the path mentioned, /usr/lib64-nvidia, is populated with the expected stuff and is part of your path
you running that with gpu or tpu accelerator?
yes thx
How to fix could not parse multipart form error?
@restive beacon Depends on what you're using to send the request and how you're formatting it.
Okay
@restive beacon I see you posted code up above, seems like you're using JS and Fetch. Assuming you are pulling file from an HTML Input element, you could try something like this:
var inputElement = document.getElementById("myHTMLInputElement");
var file = inputElement.files[0];
var data = new FormData();
data.append("file", file);
// Each form field needs its own name/value pair
data.append("model", "whisper-1");
fetch("https://api.openai.com/v1/audio/transcriptions", {
method: "POST",
body: data,
headers: {
Authorization: `Bearer ${this.token}`,
},
}).then((res) => res.json())
.then((json) => {
if(this.debug === true) console.log(json);
return json;
});
Node JS then?
nodejs should be something like this:
const fs = require('fs');
const fetch = require('node-fetch');
const FormData = require('form-data');
var data = new FormData();
data.append("file", fs.createReadStream('example-file.json'));
// Each form field needs its own name/value pair
data.append("model", "whisper-1");
fetch("https://api.openai.com/v1/audio/transcriptions", {
method: "POST",
body: data,
headers: {
Authorization: `Bearer ${this.token}`,
},
}).then((res) => res.json())
.then((json) => {
if(this.debug === true) console.log(json);
return json;
});
@restive beacon Actually, node's version of fetch and FormData require some special treatment, try this:
import fs from 'fs';
import fetch from 'node-fetch'
import FormData from 'form-data';
var testfile = fs.createReadStream('./audio_only_spanish.wav');
var data = new FormData();
data.append("file", testfile);
data.append("model", "whisper-1");
fetch("https://api.openai.com/v1/audio/transcriptions", {
method: "POST",
body: data,
headers: {
Authorization: `Bearer ${this.token}`,
},
}).then((res) => res.json())
.then((json) => {
console.log(json);
});
This is commonjs
the second one is with "type": "module", in the package.json
@restive beacon I just tried the second one I posted with my own API-Key and I got a proper response:
{
text: 'por tenernos la confianza de venir hasta acá, de invertir en su viaje del español. Nosotros aprendimos de ustedes y creo que ustedes aprendieron un poquito de español de nuestra ciudad y de Querétaro. Así es, y bueno, Querétaro es una ciudad muy bonita. De hecho, vamos a hablar un poquito del estado, no sólo de la ciudad en este episodio, pero la verdad es que hay muchos otros lugares en México que vale la pena conocer. ¡Gracias por ver el video!'
}
I know
And
Thanks so much 🙏
hmm, sorry sergio, I got same initial errors about tensorflow libs, in colab with gpu, but they were not blocking me. the invalid argument about ae.mp3 and the low confidence that it was an mp3, and the failed to read the expected frame size says ae.mp3 might be corrupt.
It's for setting prediction window, I want to narrow it a little bit to have more frequent phrase transcribed
1
Do you guys know if it's possible to use the "Whisper model" (openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt) as an orthographic corrector?
What exactly do you mean by orthographic corrector? It won't help with sentence structure but could help with spelling? You'll probably want to use medium or large though
Folks, is there a way to just detect the language with Whisper API?
There was about 50 seconds of silence at the ending in my recording, the only audio was "Do you like computers" and this was what it transcribed. "Do you like to use computers? Well, yes. I'd like to show you... the product. First of all it offers scans easy. It scans to a specified area. First of all, it is physically novel. There is no woman. If you had the opportunity, you could Project a virtual church scholar. or actually a university student... to engage in friends with one another... "
Is it possible to pass url to whisper so that it download the audio there instead of uploading? My workflow is that someone upload to cloud storage. Downloading from cloud storage and then uploading it to OpenAI feels like a bandwidth wasted. 🤔
You have to pass the file in, there's no way to pass in a url.
Anyone have an idea when ChatGPT will be aware of whisper api so it can provide help with it ?
Likely not for a while. There's been zero mention of them updating the knowledge base so far
hii
Hi all. i found that the Whisper API doesn’t work properly when returning anything other than JSON, ie. srt, vtt. An OpenAI.error is returned with a HTTP code of 200 (the correct output is actually returned, just in the error). Seems like the API isn’t able to handle the other accepted formats
Anyone find similar problem?
I have not found that problem with the Whisper API. It returned both SRT and VTT correctly during testing last Friday.
Confirmed that I am seeing the same today.
I haven't seen a roadmap yet, but I would assume they'd expand the number of languages that can be transcribed at some point.
playing around with the whisper api that openai released a week or so ago. Wondering how I can get timestamps working as in the selfhosted Whisper?
does anyone know why the recognize_whisper function in the python speech_recognition lib is not working? seems like it's not generating any output
Can the Whisper API handle video input, or only audio?
Thanks. Maybe I can use Transloadit's audio extraction API in the middle.
Hi, Do you know if there is any company that offers the services of a chatbot but with integrated chatgpt technology?
Thanks
What sort of turnaround times per minute are people getting with the Whisper API?
or some way that I can use chatgpt technology to train it with my website and other pdf documents, and integrate it into my website like search engine
Hi guys, a question, how would you make whisper create a vtt file if said file weighs more than 25MB? 
You'll need to split the file into 25mb chunks and then adjust the times in the response based on the duration of the previous ones
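A minimal Python sketch of the "adjust the times" step, assuming each chunk comes back as VTT or SRT text with timestamps relative to the start of that chunk. `shift_timestamps` is a made-up helper name; it only rewrites the cue timing lines and does not renumber SRT cue indices:

```python
import re

# Matches HH:MM:SS.mmm (SRT uses a comma) and the shorter MM:SS.mmm form
TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})([.,])(\d{3})")


def shift_timestamps(subtitle_text: str, offset_seconds: float) -> str:
    """Add offset_seconds to every timestamp on cue timing lines (those with "-->")."""
    offset_ms = round(offset_seconds * 1000)

    def _shift(match):
        hours, minutes, seconds, sep, millis = match.groups()
        total = ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000
        total += int(millis) + offset_ms
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        # Keep the original decimal separator ("." for VTT, "," for SRT)
        return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

    shifted = []
    for line in subtitle_text.splitlines():
        shifted.append(TIMESTAMP.sub(_shift, line) if "-->" in line else line)
    return "\n".join(shifted)


# Second chunk of a file whose first chunk lasted 600 seconds
second_chunk = "WEBVTT\n\n00:00:01.000 --> 00:00:04.500\nHello again"
print(shift_timestamps(second_chunk, 600))
```

The offset for each chunk is the summed duration of all chunks before it.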
so, no prompt usage ?
I haven't tried messing around with prompt but i'm pretty sure it won't affect the format style. It should affect the actual text
i see, ty
Anyone tried combining whisper api and pyannote library to get speaker diarization?
Anyone following the Windows 11 voice command preview? Pretty cool the way it works, the show numbers/grid thing is really nice, I imagine you'll be able to relabel these numbers with some text once you know the few you're looking for, it would be great if it could just monitor what you're doing for the day and give you the voice commands, then let you tailor the keyword mapping etc, it could even calculate the amount of time it took you to complete certain tasks and suggest where voice commands would have been faster.
Should be able to build something similar with Whisper... also handy if you want to talk to ChatGPT 😀
Has anyone else noticed Whisper get stuck in loops when run on GPU? I am running it on the gpu on my local system, and it gives basically perfect results when run on cpu, and on gpu I'm seeing the expected device system usage and compute-time speedups, but the output is filled with stuff like this, as one example:
[03:33.500 --> 03:34.500] Oh man, no way.
[03:34.500 --> 03:35.500] Oh man, no way.
[03:35.500 --> 03:36.500] Oh man, no way.
[03:36.500 --> 03:37.500] Oh man, no way.
[03:37.500 --> 03:38.500] Oh man, no way.
(x12)
The above text does actually appear in the audio, but only once and about a minute earlier in the clip -- the above timestamps have been pushed later because of the correctly-transcribed text occurred in the interim.
whisper api doesn't have the ability to do speech diarization
instead u can use something like pyannote.audio and then pipe the broken down chunks to whisper
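A variation on that, sketched in Python: transcribe the whole file, then label each transcript segment with whichever speaker turn overlaps it the most. The dict shapes here (`start`, `end`, `speaker`, `text`) are assumptions for illustration, not the actual output format of pyannote or the API:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach to each transcript segment the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Hypothetical diarization turns and transcript segments
turns = [
    {"start": 0.0, "end": 4.0, "speaker": "A"},
    {"start": 4.0, "end": 9.0, "speaker": "B"},
]
segments = [
    {"start": 0.5, "end": 3.5, "text": "have you got it now?"},
    {"start": 3.8, "end": 8.0, "text": "okay, we've got it"},
]
labeled = label_segments(segments, turns)
for seg in labeled:
    print(seg["speaker"], seg["text"])
```

Segments that straddle two speakers get the dominant one, so expect some mislabels around turn changes.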
Is Whisper able to translate from English to other languages? What would be the best way to turn existing English subtitles into another language? Would GPT be better?
Pretty sure whisper can do other languages to English, not the other way around. If you already have subtitles then you wouldn't need whisper as it just transcribes. You'd probably want to use gpt or a translations service like deepl or google translate
got it, thank you! I've found Google Translate to not really be the best in terms of accuracy, so maybe I'll give GPT a shot. Thanks!
Hi all, does the Whisper API support word level timestamps? If not, is it on the roadmap?
is whisper as good as azure STT combined with azure NR for avg quality audio ?
JOE
I've been struggling with this for the last few hours. This keeps returning "message": "you must provide a model parameter". Anyone know where I'm going wrong?
$url = 'https://api.openai.com/v1/audio/transcriptions';
$file_path = $_FILES['file']['tmp_name'];
$file_name = basename($_FILES['file']['name']);
$file_type = mime_content_type($file_path);
$file_data = file_get_contents($file_path);
$body = array(
'model' => 'whisper-1',
'file' => array(
'name' => $file_name,
'type' => $file_type,
'bits' => $file_data,
),
);
$headers = array(
'Authorization' => 'Bearer ' . $OPENAI_API_KEY,
);
$request_args = array(
'method' => 'POST',
'headers' => $headers,
'body' => $body,
'timeout' => 0,
);
$response = wp_remote_post($url, $request_args);
Any clue what might be going wrong when i try to use the createTranscription function?
Whenever I use the createTranscription function i get a "TypeError: localVarFormParams.getHeaders is not a function".
I'm in a react native project, recording the microphone, so the code to trigger whisper is simply:
const recordingURI = recording.getURI()
const response = await fetch(recordingURI)
openai
.createTranslation(response.body, 'whisper-1')
.then((response) => {
//do something here
});```
What am I doing wrong here?
ChatGPT has helped me ok with Whisper. I was able to create all the scripts I wanted today
It was a bit wrestling. But with python-openai lib it is really simple. So apart from the request itself, chatgpt can help you with everything else
What do you want to do with the whisper reponse? I mean, like where do you want it to go?
is there any way to explicitly tell the whisper API the language spoken in the audio, rather than rely on it being auto-detected?
Yup it's a param you can supply https://platform.openai.com/docs/api-reference/audio/create#audio/create-language
ah nice, thanks @void egret! i kept looking for something that looked like a proper API reference page and couldn't find it on https://platform.openai.com/docs/guides/speech-to-text - guess i didn't look hard enough. cheers!
what is whisper?
Whisper is a speech to text model developed by openai. Given audio it generates a transcription of it
I’ll reiterate
Did anyone test whisper against for instance azure noise reduction + STT?
Calibrated audio likely boost STT performance
I've exclusively used whisper, you might find some performance tests on the discussions tab of the repo though
anyone know how to solve this error? error': {'message': 'Maximum content size limit (26214400) exceeded (68638790 bytes read)', 'type': 'server_error', 'param': None, 'code': None
How large was the file?
tried this before, haven't been able to get it right, the speaker just got messed up when I combined both
anyone knows how to circumvent the 25 MB file size limit?
Trying to work it out right now as well by splitting the file into segments of 24mb. Haven’t figured it out yet.
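One way to plan the split, as a rough Python sketch: work out how many seconds of audio fit under the limit at a given bitrate, then cut on those boundaries. This assumes constant-bitrate audio; `max_chunk_seconds`, `chunk_offsets` and the 5% safety margin are made up for illustration:

```python
import math

# 25 MB, matching the 26214400 bytes quoted in the error message
LIMIT_BYTES = 25 * 1024 * 1024


def max_chunk_seconds(bitrate_kbps, limit_bytes=LIMIT_BYTES, safety=0.95):
    """Longest chunk (whole seconds) that stays under the limit at this bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return math.floor(limit_bytes * safety / bytes_per_second)


def chunk_offsets(total_seconds, chunk_seconds):
    """Return (start, end) pairs in seconds that cover the whole file."""
    return [
        (start, min(start + chunk_seconds, total_seconds))
        for start in range(0, math.ceil(total_seconds), chunk_seconds)
    ]


seconds = max_chunk_seconds(128)
print(seconds, chunk_offsets(3600, seconds))
```

The start of each pair doubles as the offset to add back to that chunk's timestamps.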
for some reason the transcriptions in vtt arrive truncated in my heroku deploy but perfect on my local pc, any idea what it could be?
when I code a English file into Chinese, here is the code I entered "!whisper "test.mp3" --language Chinese", but the result is only few Chinese others are English, is there anyone can help, BTW I don't know python at all
just got it worked, split chunks into 25 seconds, haven't really looked at accuracy, but it works on file larger than 25 MB
Do you mind sharing code?
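from io import BytesIO

import openai
from flask import jsonify, request
from pydub import AudioSegment

def transcribe():
    # Assumes this runs inside a Flask request handler (route decorator not shown)
    file = request.files['file']
    audio = file.read()
    try:
        audio_file = BytesIO(audio)
        audio_segment = AudioSegment.from_file(audio_file, format="mp3")
        # Split the audio file into chunks of 25 seconds (pydub slices are in milliseconds)
        chunks = audio_segment[::25000]
        # Transcribe each chunk and concatenate the results
        results = []
        for chunk in chunks:
            with BytesIO() as chunk_file:
                chunk.export(chunk_file, format="mp3")
                chunk_file.seek(0)
                # The file name tells the API which audio format to expect
                chunk_file.name = "audio.mp3"
                transcription = openai.Audio.transcribe("whisper-1", chunk_file)
                text = transcription['text']
                results.append(text)
        return jsonify({"text": " ".join(results)})
    except Exception as e:
        return jsonify({"error": str(e)}), 500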
shoutout gpt3.5 turbo for that code
im doing multiple request at same time i think a limitation truncated the response but why it doesn't happens in my local pc ? 🤔
search on google github Majdoddin nlp (cant post link)
its a project built with pyannote audio and whisper might be able to help you figure out whats going on with yours by digging in theirs
hey guys. who do i speak to for feedback on whisper?
okay. what is this? i got muted for reporting a bug with the API?!?!? this is absolute bullocks! LOL
Anyone experience whisper hallucinating on empty sections? is there anyway to prevent this? maybe with vad filter?
i literally just posted the same question, including all the hallucinated responses... but i got automodded.... lol... because one of the the responses was a 'high risk word'
sorry to hear that
you got any headway on that?
i haven't. but most of my hallucinations are due to short audio segments. i'm thinking of either combining them into longer segments or just dropping them altogether
do you get things like www dot globalonenessproject dot org ? i got lots of those...
can i please create a thread to share all the hallucinations? they are so funny... i had another one here: "Produced & Uploaded by Houthi Movies"
and another.... "For more information on Geography Now, visit geography(.)nsw(.)ca"
Anyone know if there is only one or more whisper models available via api?
Afaik only whisper-1 is available rn
come on mods... who can i chat to for feedback? lol
I get things like "subs by broth3rmax"
Hi, does anyone know a project/algorithm that could reliably detect when someone has finished speaking, using real time transcription with whisper?
I want to make an API call as soon as the user finished speaking/at the end of his sentence
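Not a Whisper feature, but a common workaround is to watch the audio level and treat a long enough run of quiet frames after speech as "finished". A minimal Python sketch, assuming you already compute one energy value per audio frame; the threshold and frame count are arbitrary illustrative values:

```python
def find_end_of_speech(frame_energies, threshold=0.01, min_silent_frames=25):
    """Return the index of the first frame of the closing silence, or None."""
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= threshold:
            # Any loud frame counts as speech and resets the silence counter
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silent_frames:
                return index - min_silent_frames + 1
    return None


# 5 quiet frames, 10 frames of speech, then 30 quiet frames
energies = [0.0] * 5 + [0.5] * 10 + [0.0] * 30
print(find_end_of_speech(energies))
```

With 20 ms frames, 25 silent frames is half a second of quiet before the API call is triggered. A real VAD would be more robust against background noise than a plain energy threshold.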
no
But I am leaving I have had to change too many things And then they want my phone number which I'm not giving so I'm leaving this server id dumb
Does anyone have language detection examples to use?
can somebody help me with the whisper thing I want to to take audio from the microphone and turn it into text for python btw
you want to google diarization
there are a few libraries out there. i've tried a few and they work pretty well, but my computer does not have the capacity to process large files. it took 1hr for pyannote to diarize 20mins of audio!!! so i now diarize using google api remotely and then pass the segments to whisper API for processing.
one thing i noted tho, the whisper API (unlike the whisper library which you process locally) tends to make things up when the audio is not very clear, or when the audio segment is too short.
you can also convert the files to mp3, then if you're performing diarization, a longer audio file with complete sentences will give you better results
I have a MP3 player, maybe I can help??
insyaallah brother
"please like and subscribe.... bye bye bye bye bye......." this is so bad.....
actual transcript in the video / audio during the 17.3 seconds were:
[speaker 1] okay, let's have a look. share screen.
[speaker 2] it's not popping up for me.
[speaker 1] you haven't got that?
[speaker 2] nope
[speaker 1] okay let's try again.
[speaker 2] and if anybody's listening, audio to the recording afterwards i will walk and talk you through this.
[speaker 1] have you got it now?
[speaker 2] okay, we've got it, we've got it, we've got it.
I'm trying to hit the Whisper API from javascript. Is the correct order of endpoints to hit the files endpoint first to upload my audio file, then to pass the uploaded path to the transcription endpoint?
If so, what value should I pass for the purpose parameter to the files endpoint? The examples for the files endpoint all have purpose="fine-tune" in them, but that doesn't sound right for the purpose of transcription.
You shouldn't be using files at all
Yeah I think you just send the file directly from your computer to the whisper endpoint.
Can you hear me
Hello, I'm a novice with a few good computer skills and I've started a project to dub an English video into French. It is a mentorship accessible on YouTube.
I used Whisper to transcribe the 3 hours long video.
I had a few mistakes, like for example, "How" which became "So" according to the model.
The voice is that of a man who speaks American with a strong accent. I have the impression that the "tiny.en" model gave the best results. I didn't try to tune the settings, but do you think it could be improved for American?
The next part of the project will be the translation, either ChatGpt or Deepl I don't know yet, and I wanted to know if you had any advice on which tools to use.
Text to speech, time-stamped? to then integrate them to the video. Do you think that it is possible to automate it with the time-stamping.
I hope I'm not going off topic. Thanks in advance, Rom
My project is to make accessible to people who have difficulty with English, a series of videos with always the same person, the same voice.
Is it possible to train Whisper on a particular voice?
For example, to translate perfectly several passages or a complete video and to provide it to Whisper as an example?
Does anyone have a github repo recommendation for generating subtitles?
I wonder how you get the lines to be properly timestamped. Is that something you do during the transcription, or afterwards?
Both seem complicated
But not the whisper API, I think. I'm not sure if my laptop can run the whisper locally
Specify the response_format in the request to the api, it supports json, text, srt, verbose_json, or vtt
lol just saw you were asking the same
Run it on Google colab for example
Ahh so just send the pure audio data, without needing any concept of something that exists on disk. Ok! Thanks
if any1 has good experience in tkinter module pls hit me up, its a basic help ( im trying to build a GUI for tic tac toe)
Idk if you're using curl or an SDK but here's the endpoint I think:
curl https://api.openai.com/v1/audio/transcriptions \
-X POST \
-H 'Authorization: Bearer TOKEN' \
-H 'Content-Type: multipart/form-data' \
-F file=@/path/to/file/audio.mp3 \
-F model=whisper-1
From https://platform.openai.com/docs/api-reference/audio/create
Was hoping someone could give some insight into an error I keep getting:
Whisper data from response: {"error":{"message":"Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']","type":"invalid_request_error","param":null,"code":null}}
iPhone:
topicsResult: null
Are you uploading data that was in one of those formats?
Sorry, was having trouble posting the rest - this is a very strict server!
Here is the full description. I think I am uploading an mp4. My blob's type is "video/mp4"
Hi is it possible to do real time transcription using Whisper opensource model?
I heard somewhere whisper only natively supports 30 second or less audio clips, meaning it can provide the best accuracy as long as your clip is around that length. Is this 100% true? I'm trying to transcribe lengthy audio clips (some being hours in length), and I'd rather split them into smaller clips larger than 30s each to make the most of the 50 requests per minute limit. Would splitting the clips into 60 or 120 second clips (or longer) be fine to do as well? Or would it be less accurate than 30s for each clip?
search a project name : openai_whisper_stt
on a site called huggingface*co
you can search in a database with many project ideas
you can also search on Google Colab or jupyter
good question, i transcribed a 3h long video and i found the result good
These are from the openai-python library?
I wonder what the verbose json includes. I'll have to try it out I guess
Yup, they're specified here in the docs https://platform.openai.com/docs/api-reference/audio/create#audio/create-response_format
I've got a collection of phone calls made to 911 in the middle of a flood from almost 30 years ago, and I'd like to use Whisper to transcribe them. The audio quality is generally poor; they were originally recorded on giant clunky reel-to-reel tape systems. Whisper seems to do a generally decent job on them -- the medium.en model works best based on my testing.
I did a test batch of three calls using the following PowerShell command:
ls -file | % { & Write-Host "Now working on $_" -ForegroundColor green; whisper $_.FullName --model medium.en --language en }
It did the first call fine, then got stuck on the second one. After it spent two hours transcribing a call that only lasted 90 seconds, I canceled the job. That particular call file ends in about seven seconds of silence. My theory is that Whisper is failing to detect that, and gets stuck in a loop analyzing the same period of silence over and over looking for speech that isn't there. There are probably quite a few calls like that out of the roughly 2,000 calls in the collection.
So my question is: how does the --no_speech_threshold parameter work? My googling has led me to Github tickets discussing things, but so far none of them have helped. It defaults to 0.6. Should the value be higher or lower in order to make Whisper less sensitive to chunks of silence?
Did Microsoft buy OpenAI?
I think your answer is to perform preprocessing, i.e. getting rid of the chunks of silent audio is easier than messing around with Whisper's settings.
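A minimal sketch of that kind of preprocessing, assuming the audio has already been decoded into a list of float samples in the range -1..1. The function name, `frame_len` and `threshold` are illustrative, not part of Whisper or any library. It only trims silence at the end of a clip, which is the case described for the stuck call.

```python
def trim_trailing_silence(samples, frame_len, threshold):
    """Drop trailing frames whose RMS amplitude is below threshold."""
    end = len(samples)
    while end > 0:
        start = max(0, end - frame_len)
        frame = samples[start:end]
        # Root-mean-square amplitude of this frame
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            break  # found audible content, stop trimming
        end = start
    return samples[:end]
```

The threshold would need tuning per recording; noisy tape audio may never fall to true zero, so a value that works on clean audio could trim nothing here.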
does removing the silent audio make whisper cheaper / faster to run?
I'm not sure where you're pulling the video/audio from, but you might need to turn the data into a stream buffer and then turn that into a blob.
This is how you do it via an S3 link:
import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { Readable } from "stream";

// Assumes credentials and region come from the environment
const s3 = new S3Client({});

// Function name is illustrative; the original snippet only showed the body
const getBlobFromS3 = async (bucketName: string, fileKey: string) => {
  // Set the parameters for the S3 getObject operation
  const params = { Bucket: bucketName, Key: fileKey };
  // Call the S3 getObject operation with the specified parameters
  const getObjectOutput = await s3.send(new GetObjectCommand(params));
  // In Node.js the response body is a Readable stream
  const body = getObjectOutput.Body as Readable;
  // Convert the stream data to a buffer
  const chunks: Buffer[] = [];
  for await (const chunk of body) {
    chunks.push(Buffer.from(chunk));
  }
  const buffer = Buffer.concat(chunks);
  // Create a new Blob object from the buffer
  return new Blob([buffer], { type: "application/octet-stream" });
};
const transcribe = async (fileBlob: Blob, fileKey: string) => {
  // Build the multipart form: the audio blob (named after its key) plus the model
  const formData = new FormData();
  formData.append("file", fileBlob, fileKey);
  formData.append("model", "whisper-1");
  const response = await fetch(
    "https://api.openai.com/v1/audio/transcriptions",
    {
      method: "POST",
      body: formData,
      headers: {
        // Template literal needs backticks to interpolate the key
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
    }
  );
  return await response.json();
};
I would say the accuracy goes up. Whisper often just makes things up when there is no audio.
Ya, it returned Russian-looking text to me when the audio had a ton of blank noise lol
Where my people at?
I had the same idea
No one has any idea about the question I asked above. I found this python script but couldn't get it to work. it's on github
It's on GitHub :
End-to-end-Youtube-audio-translation-aws-serverless
Hi, i’m trying to work with the whisper api for the first time. i wanted to know if there’s any way to get the transcription of a youtube video without downloading its audio.
i tried to use YouTubeDL to extract the info but i’m stuck at the file parameter
Can we fine-tune it to detect silence?
I was afraid of that. Manually checking all 2000+ calls for areas of silence is probably not practical. I think I'll write a Python script to handle the batch job and build in a timer on each job so I can kill the process and move on to the next one if something takes too long.
Actually I did some more testing, and setting --no_speech_threshold 0.25 on the command seems to have made it work fine. I think I'll try running the whole batch with that. I'll just have to keep an eye on it, and if it gets stuck I'll move the finished files to some other folder, move the problem file to quarantine until I can go through and remove the silence, and then restart the loop on the remaining files.
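A rough sketch of the batch-with-timer idea described above, using only the standard library. The function names are made up for illustration. The `whisper` flags are the ones quoted in this thread, and the timeout value is whatever you consider "too long" for one call.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run one command; return 'ok', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child process if the timeout expires
        done = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if done.returncode == 0 else "failed"


def transcribe_batch(files, build_cmd, timeout_s):
    """Run build_cmd(path) for every file and sort the paths by outcome."""
    results = {"ok": [], "failed": [], "timeout": []}
    for path in files:
        outcome = run_with_timeout(build_cmd(path), timeout_s)
        results[outcome].append(path)
    return results


def whisper_cmd(path):
    # Same CLI flags as the commands quoted in this thread
    return ["whisper", str(path), "--model", "medium.en",
            "--language", "en", "--no_speech_threshold", "0.25"]
```

Calling `transcribe_batch(paths, whisper_cmd, timeout_s=1800)` would leave the stuck files in `results["timeout"]`, which is the quarantine list to clean up by hand later.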
Anybody else keep getting "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']"?
The video file I am sending is indeed webm, and this was working yesterday.
anyone else getting invalid responses and error statuses when making a post request to whisper?
got status code 100 before, now its always 0
How do you transcribe the audio without downloading the audio?
Do you mean video?
You can use the ytdl library to download only the audio stream from YT. Then pass that audio stream to Whisper to transcribe it.
If you really don’t want to use the audio, then maybe try to use ytdl to download the subtitles? I haven’t tried but I think that’s a function.
are there any more whisper apps or programs that can aid in language learning?
such as the one mentioned on the open AI website?
also, are there any extensions to have whisper integrated into chatgpt?
how to make navbar aesthetic
Man whisper combined with other tools really makes transcription embarrassingly easy - I wanted to learn about a popular Java library that somehow had very few videos so I took a video in Hebrew and ran it through Whisper
22 lines of code, including all the setup, and I have an mp4 and also something that's decently generic for future similar needs
(the subtitle utility in Whisper changed a bit so I did have to lock in on an older version of the whisper python library so I could use some code I had already used in the past)
is there whisper for personal usage (not api) ?
Yup, it's open source so you can run it locally https://github.com/openai/whisper
But is there any web version so people with no coding skills use it as well?
there are lots of hosted versions to do various different things on hugging face, but your mileage may vary and for anything of significant length they will likely time out
what are you trying to do?
I want to send a mp3 file and want whisper to transcribe it.
Ok I found this model on the website called Replicate. The best thing is that I wasn't even asked for an API key 🙂
You can import Whisper into python from HuggingFace's transformers package.
I mentioned that I am not very good at programming
if it's just a single mp3 you're probably best off just using a service and you can probably do it for free with a trial although a lot of these AI transcription services are probably using outdated technology at this point (I may be entirely wrong on this) so your mileage may vary
otherwise I could probably help you with a Google Colab notebook that you could run for free, but then again - you probably shouldn't be running code from random people on the internet if you don't know what you're doing 😆
Hi everyone. Just want to know how to improve the inference speed of Whisper if I run it on my local PC (not using the Whisper API service). I mean the Whisper model that was released in September 2022.
Yeah Replicate is probably your best bet, first 30 minutes of compute free (~2hrs of audio content), then ~$0.30 per hour of audio (on the large model; smaller will be cheaper but worse, esp for non-English)
You can use it in Google colab, I know a script if you need it
It's on GitHub :
Youtube Videos Transcription with OpenAI's Whisper
In the first 24 hours of execution, Whisper transcribed 204 phone calls for me, running on a local machine. 1,744 to go! I estimate another eight and a half days to finish the entire collection.
Search for it on GitHub and launch it in Google Colab.
And read the code.
Or install it on your own machine and launch it; it's a few lines in a shell.
is there any example project for real-time transcription from mic using whisper and pyaudio?
specs? i wanna host my own whisper model on a vps and wondering what i would need. i will only be using english
could you help me to learn python
Does the openai whisper api host the audio
or we have to host it first and then transcribe
Can I ask it to code a website for me from here?
Does anyone know if there is documentation available for the 'prompt' flag and how to use it? I need Whisper to create more sequences, preferably after each comma. Currently, Whisper can put 2 sentences into one sequence. There is also an issue with synchronizing the end time of one sequence with the start time of the next one. Is it possible to generate, for example, a JSON file with the start time of each word?
a few more examples on how to use the prompt most effectively (especially for assisting translation) would be nice. I'm doing some testing and will post if I find anything useful
According to this graphic from the Github page, there is timing data being used. No idea how to access it and may need to be run locally and modified
If you set response_format to verbose_json in your requests, it will return timing information like this. See the API reference for more info: https://platform.openai.com/docs/api-reference/audio
Thank you for your response. I have read those details carefully and I am using that JSON verbose format exactly. However, the problem is that some sentences have 20-25 words and should be split into at least 4-5 parts. These captions are unreadable if they cover half of the video player's screen. If it were possible to obtain the start time for each word, I would easily split them myself.
You could check the output when using the srt and vtt response formats. I haven't looked at them yet but they might be different
Another thing you could try is breaking your audio into smaller few-second chunks to restrict how long they are
I have checked srt and vtt formats and encountered the same problem. Dividing the audio into several seconds-long chunks is not a good solution. How can I find out where the spoken words end in, let's say, a one-hour long audio file so that the splitting does not cut off any words? Perhaps the new API will bring some sensible solutions.
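One workaround is to split each long segment yourself and estimate the times. This is a sketch, assuming each segment is a dict with `start`, `end` and `text` keys as in the verbose_json output. The function name and `max_words` are illustrative. The times are interpolated by word count, so they are only approximations and not real word-level timestamps.

```python
def split_segment(segment, max_words=7):
    """Split one {'start', 'end', 'text'} segment into shorter captions.

    Times are spread evenly across the words, so they are estimates only.
    """
    words = segment["text"].split()
    if not words:
        return []
    per_word = (segment["end"] - segment["start"]) / len(words)
    captions = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        captions.append({
            "start": round(segment["start"] + i * per_word, 3),
            "end": round(segment["start"] + (i + len(chunk)) * per_word, 3),
            "text": " ".join(chunk),
        })
    return captions
```

Because each caption starts exactly where the previous one ends, the end/start times of neighbouring captions stay in sync. Fast or slow speech inside a segment will still drift from the estimate.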
Has anyone made an easy to use version of Whisper yet? And how would you say it compares to machine translating that was done earlier for media?
Also, how would i ever start learning all this stuff, i don't think standard cs degree comes even close to this, and I'm 18 now, by the time i get through all the procedural learning, the field will already have become saturated like the rest of innovations
No, you send the audio (e.g. an MP3) to the API and its response is the transcription. The file does not persist unless you save it somewhere else.
is there an ability to differentiate between people?
Oh so you dont need to upload to a storage first?
Then send the link to api
Nothing official, you might find some projects that manage to do it, but they would require running it locally
You have to send the audio data, not a link
Correct. You are posting the audio file to the API.
Ahh thats so nicee other platforms require you to upload to s3 or sum first
It has a 25mb max so you'll need to split the data into chunks if it's too much
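A back-of-the-envelope sketch for planning those chunks, assuming a constant-bitrate file and reading the limit conservatively as 25,000,000 bytes. The function names and the 5% safety margin are assumptions for illustration. Cutting purely by time can split a word, so cutting at a pause near each boundary would be safer.

```python
MAX_UPLOAD_BYTES = 25 * 1000 * 1000  # conservative reading of the 25 MB limit


def max_chunk_seconds(bitrate_kbps, limit_bytes=MAX_UPLOAD_BYTES, margin=0.95):
    """Longest chunk (whole seconds) that stays under the upload limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return int(limit_bytes * margin / bytes_per_second)


def chunk_ranges(total_seconds, chunk_seconds):
    """Return (start, end) pairs in seconds that cover the whole recording."""
    ranges = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges
```

For a one-hour 128 kbps MP3 this gives three chunks of roughly 25 minutes or less each.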
I wish they had voice cloning as well
neat, will try and find one. thank you.
I tried using it with the language set to English to transcribe audio. It works well: even though the audio was in Ukrainian, it output an English translation.
But when I use the parameter language=uk to get Ukrainian, it returns code 0 and still charges money for the work)
Has anybody tried working with a language other than English?
how do you get timestamps out of .transcribe?
Whisper doesn't provide word level timestamps
i dont need word level, I just need phrase-level
Set response_format to verbose_json
can I do this with the python lib or do I have to use the rest api to add response_format?
I tried in the params arg, but didn't work
Python lib as in running it locally or their api library?
yeah their api library
Has anyone done a comparison of Whisper to Google Speech-to-Text?
awesome that worked, cheers
You'll probably find it in the discussion tab or readme of the github repo
wonder why it wasn't showing up in the type hints in vs code
I wrote a python script that lets me record from my mic and it sends it to both. Pretty cool to see the differences.
How well does google speech to text handle named entity recognition?
Really good, think Google Home Assistant good.
I don't use any home assistant type of devices
Or maybe I misunderstood your question
Picking up when to capitalize the names of people or products
I'm going to see right now actually
I've compared both and it seemed that Whisper was more accurate, identified punctuation and capitalization better than Google.
That tracks with what I've heard as they leverage their llm for whisper
Whisper is more accurate with sentence structure and capitalization. But its spelling is more likely to be wrong. It's more phonetic but you can understand what it meant. Models are down for me so I can't continue testing
is there a way to get a word from embedding vector with node.js ?
You can't go from embeddings back to text, you need to store what the text maps to
You'll want to use cosine similarity, if you use a vector database like weaviate or pinecone they do the math for you. #dev-chat is the channel for this
I need to know which portion of the content each embedding vector corresponds to
I was trying with Supabase pgvector
So I will try with Pinecone, ty!
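To make the two points above concrete, here is a small sketch in plain Python: the text is stored next to each vector at embedding time, and cosine similarity picks the closest stored entry. The function names and the in-memory list are illustrative stand-ins for what a vector database would do for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_text(query_vec, store):
    """store is a list of (text, vector) pairs saved when embedding."""
    best = max(store, key=lambda item: cosine_similarity(query_vec, item[1]))
    return best[0]
```

The lookup returns the stored text, which is how you know which portion of the content a vector corresponds to. The vector itself cannot be turned back into words.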
I want to translate english audio into other languages, so far it looks like the translate flag is only for converting to english.. how can i specify another language for that?
They don't currently translate into other languages
can we transcribe in realtime? like text streaming from a live recording
Can someone tell me why this code isn't returning a response when doing a post request?: ```
// Next.js API route support: https://nextjs.org/docs/api-routes/introduction
const dotenv = require("dotenv").config()
const axios = require("axios")
const fs = require("fs")
const FormData = require("form-data")
const formidable = require("formidable")
const key = process.env.OPEN_AI_KEY
const model = "whisper-1"
export default async function handler(req, res) {
if (req.method !== "POST") {
res.status(405).send({ message: "Only POST requests allowed" })
return
}
return new Promise((resolve, reject) => {
let formObj = new formidable.IncomingForm()
const formData = new FormData()
formData.append("model", model)
formObj.parse(req, function (error, fields, file) {
let filepath = file.fileupload.filepath
formData.append("file", fs.createReadStream(filepath))
axios
.post("https://api.openai.com/v1/audio/transcriptions", formData, {
headers: {
Authorization: `Bearer ${key}`,
"Content-Type": `multipart/form-data; boundary=${formData._boundary}`,
},
})
.then((res) => {
return res.status(200).send({ res: res.data })
})
.catch((err) => {
return res.status(500).send({ error: err })
})
.finally(() => res.status(204).end())
})
})
}```
Is it because of the boundary of the form data?
It seems the issue is not with the boundary of the form data but with how you are handling the response from the axios post request. In your .then block, you are using the res variable as both the response from your Next.js API route and as the axios response, causing a conflict.
To fix this issue, you can rename the axios response variable to avoid conflicts with the Next.js API route's res. Here's the updated code: ```
const dotenv = require("dotenv").config()
const axios = require("axios")
const fs = require("fs")
const FormData = require("form-data")
const formidable = require("formidable")
const key = process.env.OPEN_AI_KEY
const model = "whisper-1"
export default async function handler(req, res) {
if (req.method !== "POST") {
res.status(405).send({ message: "Only POST requests allowed" })
return
}
return new Promise((resolve, reject) => {
let formObj = new formidable.IncomingForm()
const formData = new FormData()
formData.append("model", model)
formObj.parse(req, function (error, fields, file) {
let filepath = file.fileupload.filepath
formData.append("file", fs.createReadStream(filepath))
axios
.post("https://api.openai.com/v1/audio/transcriptions", formData, {
headers: {
Authorization: `Bearer ${key}`,
"Content-Type": `multipart/form-data; boundary=${formData._boundary}`,
},
})
.then((axiosRes) => {
return res.status(200).send({ res: axiosRes.data })
})
.catch((err) => {
return res.status(500).send({ error: err })
})
.finally(() => res.status(204).end())
})
})
}```
In this updated code, I changed the axios response variable from res to axiosRes to avoid the conflict. This should resolve the issue and return a response as expected.
Alright thanks for pointing that out
However, still not getting a response. This is what I have in the frontend:
const submitHandler = (event) => {
event.preventDefault()
const data = new FormData(event.target)
data.set("fileupload", data.get("fileupload"))
const config = {
headers: { "content-type": "multipart/form-data" },
}
axios
.post("/api/whisper", data, config)
.then((res) => console.log(res))
.catch((err) => console.log(err))
}
return (
<>
<Head>
<title>Lirical App</title>
<meta name='viewport' content='width=device-width, initial-scale=1' />
<link rel='icon' href='/favicon.ico' />
</Head>
<main className={styles.main}>
<div className={styles.center}>
<form onSubmit={submitHandler}>
<label htmlFor='fileupload'>Upload Audio file</label>
<input required type='file' name='fileupload' accept='audio/*' />
<button type='submit'>Submit</button>
</form>
</div>
</main>
</>
)
Are there any py or open ai dependencies I need installed?
Hello, I'm trying to find open source code for Whisper in Python
You can find info on running the model locally here: https://github.com/openai/whisper
Okay thanks, Im trying to use it to make a chatbot, just for the stt
Ok I got my file to post to my whisper endpoint correctly, but how should I append to formData if the file doesn't include a file path from client?
hey, how can I avoid unsupported chars ? ```
{
"text": "Dis bonjour \u00e0 ma m\u00e8re."
}```
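Those `\u00e0` sequences are standard JSON escapes, not unsupported characters, so any JSON parser should turn them back into the accented letters. A quick check in Python, using the response body quoted above as a raw string:

```python
import json

# The response body exactly as quoted above (raw string keeps the backslashes)
raw = r'{"text": "Dis bonjour \u00e0 ma m\u00e8re."}'

text = json.loads(raw)["text"]
print(text)  # Dis bonjour à ma mère.

# When writing JSON back out, ensure_ascii=False keeps the accents readable
readable = json.dumps({"text": text}, ensure_ascii=False)
print(readable)  # {"text": "Dis bonjour à ma mère."}
```

If the escapes show up in your UI, the likely cause is displaying the raw response string instead of the parsed value.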
anyone has tried sending a recorded audio file from safari straight to whisper endpoint v1?
it always complains
"message": "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']",
i have tested my code on chrome, it works, but not on safari
even chatgpt4 didnt help hehe
const speechToText = async (blob: Blob) => {
const formData = new FormData();
formData.append('file', new File([blob], 'audio.mp3', { type: 'audio/mp3' }));
formData.append('model', 'whisper-1');
return axios({
method: 'POST',
url: 'https://api.openai.com/v1/audio/transcriptions',
data: formData,
headers: {
'Authorization': 'Bearer <TOKEN>',
'Content-Type': 'multipart/form-data',
},
})
.then((response) => {
return response.data.text;
});
}
it is, safari is becoming a headache.
What backend are you using?
I used formidable in the backend to intercept the incoming form and get the file from the files object
I’ll also try my code on safari soon
Also, after I think I got everything working right, how would I be hitting the quota limit already? Can we not use it with free trial?
NOOOO
I find some youtube vids misleading then when they say it's free
I think you need to put await
const submitHandler = async (event) => {
event.preventDefault()
const data = new FormData(event.target)
data.set("fileupload", data.get("fileupload"))
const config = {
headers: { "content-type": "multipart/form-data" },
}
await axios
.post("/api/whisper", data, config)
.then((res) => console.log(res))
.catch((err) => console.log(err))
}
trying to understand this
Cell In [49], line 1
----> 1 result = whisper.transcribe(audio='talking.wav', model="Whisper")
File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/whisper/transcribe.py:75, in transcribe(model, audio, verbose, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, condition_on_previous_text, **decode_options)
32 """
33 Transcribe an audio file using Whisper
34
(...)
72 the spoken language ("language"), which is detected when `decode_options["language"]` is None.
73 """
74 dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
---> 75 if model.device == torch.device("cpu"):
76 if torch.cuda.is_available():
77 warnings.warn("Performing inference on CPU when CUDA is available")
AttributeError: 'str' object has no attribute 'device'
on MacOs M1 processor
it seems in transcribe.py model is a string by default, but it is expecting a property model.device
hmm. Seems to be a bug, they probably wanted to load the model when a string is provided, but didn't write that part ; you can get past it by providing a model instance instead of a string, for example model = whisper.load_model("base") then ... model=model)
so, question answered I guess. might file a bug, not sure if this even latest code, will look into it 🙂
How to get rid of your toxic thoughts
What r you guys talking about
^^^i am also curious
Whisper is an open source speech to text transcription model developed by openai.
anyone know how to significantly speed up the offline models in whisper? like medium/large?
Need to use GPU inference and viable resources.
If you’re asking about the details beyond that: quantisation of the inference data types, for example lower-precision float types or the AI-specific data types which seem to be a thing now. (Off the top of my head)
@dense pulsar gladia is a service that claims to have massively sped up the large model, haven't had the time to test it out yet.
also haven't tested out the official openai whisper model on the platform, but I assume it's quite fast.
Else you could have a look at using the huggingface transformers versions since they might be more easy to optimize. There are also a bunch of discussions on the whisper github that you could check out.
👽
x
is anyone able to look at my post and tell me if its possible to do this https://discord.com/channels/974519864045756446/1087070780615045180
any way to use the whisper API just using fetch / curl ?
Passing in a link to a mp3 or audio file
#gpt-realtime make a game in python
give me a two direction stepwise regression code
#gpt-realtime give me a two direction stepwise regression code
#gpt-realtime make a game in python
#gpt-realtime sorry
Lol, is this your real key?
I was thinking the exact same thing. If so, you need to make a couple of changes and quite urgent.
🤣
I have question about whisper use - can it translate from english to other language?
I paid for the gpt plus chat subscription to the account registered by mail maksim543761@gmail.com. What should I do? I didn't even get an email\
@dense zodiac I hope you aren’t under the paid plan with that key
Or else someone gonna spam up to 120 usd with your key
Has anyone gotten whisper to work with audio recorded in Safari?
(I mean the whisper api - not sure which channel is the correct location)
Hi, I need to transcribe an audio file in a non-English language and translate the transcribed output into another language with the actual timestamps, so the TTS can be in sync with the original file.
I have tried WhisperX but the results were not that great. Please help me here.
the tricky part is synchronization
Yumi Karahashi, the princess of my heart, has gotten married. I am stunned.
how do you translate to another language than English?
Hello, does Whisper work 'on the fly' like Google Speech-to-Text? I mean, is it as fast as Google? I would like to make a conversational bot using the Whisper API, ChatGPT API and ElevenLabs API. I worry that Whisper is too slow and the user experience won't be fluid.
No, it's not that fast at all. It's a bummer, as for example a Pixel 6/7 with Tensor can do this immediately on the phone. For online use it's still far from perfect in my opinion
What API do you recommend then? Is Google Speech-to-Text the best on the market?
I do not know exactly how quick the Whisper API's responses can be, so perhaps you should still give it a try. My results are based on my own hardware, and so far it's always slower than 1x speed,
which makes it not the best fit for what you are looking for. My assumption is you want to make your own kind of Google Assistant/Siri-like bot. Thumbs up for this + integration with the Home Assistant open platform 🙂
what should the initial prompt look like?
just the start of the audio?
random list of important words?
What's whisper
What s whisper
Hello everyone, I have a program on Github that will run whisper in batch mode if anyone wants it? Message me directly because OpenAI won't let me post a simple link to it on Github!
guys
who uses the Microsoft ChatGPT, like the chat in Microsoft Edge
@cosmic lanternone
Well, other question, why should I?
i do
I’d recommend that you remove the email from discord messages
Unless you want someone to send you phishing mails or scam mails, it is just a friendly reminder since some people are “not nice”
As far as your question goes, I’d recommend contacting OpenAI through their support mail
I was trying Whisper with Bengali. It doesn't seem to figure out that it is Bengali and just translates to Hindi. I had more luck with Kannada. Does Whisper support Bengali?
Whisper is quite good with Hindi BTW.
You were translating from Hindi to English or from English to Hindi?
so i have a few long audio files, around 10-15 minutes what I want to do is separate the audio files at the end of each sentence for that I have a test document separating them. can whisper know when a portion of audio relative to the text starts and ends so that I can extract that data to separate the large audio files into small audio files of each sentence
@mortal plover I was trying to transcribe Bangla.
Sorry for saying translate. 🤦♂️
Can this do anything? Per say math?
And………. Does it work while I don’t have safari going
Hello, is it possible to implement a feature like ':on-progress' (which calls a function) in Whisper ASR, similar to the ':verbose true' functionality? I would like to use in backend.. I can't catch the stdout just at the end. Any idea? (at transcribe fn)
hey i am trying to install whisper following this video https:// www. youtube .com/ watch?v=XX-ET_-onYU
but i have to install a ''git'' , what is it? ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?
the video didn't explain this
post the link with space pls
On OSX just "brew install git" and solved.
?
Git is a version control tool for developers; we use it to manage our projects. You can upload the source code of your program to GitHub and use git to make changes there
To install it, we first need to know which operating system or distribution you use.
in the terminal console. I don't know what you use..
brew?
Do you use linux distributions? @autumn bolt
brew .sh ok, sorry, that is Mac
https:// hub. tcno.co/ai/whisper/install/ i was following this made it all the way to the last line
but then got the error
pip3 install git+https: //github.com/openai/whisper.git
(the last line)
because the tutorial is incomplete
what is the error message?
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?
i didnt install this git
pip3 install git
ok
wait.. I check it
C:\Users\gusta>pip3 install git
ERROR: Could not find a version that satisfies the requirement git (from versions: none)
ERROR: No matching distribution found for git
pip3 install GitPython
I didn't chk this command but try it in PowerShell:
winget install git
Yes, maybe that is Windows specific
he is thinking now
downloaded it
now installing but quite slowly
''redifine the entry''
ops ''refine the entry''
the message is: Several packages matched entry criteria. Refine the entry.
now what do i do? @autumn bolt
What is your language? We need to translate to understand what is it saying...
here
first he asked if i agree with terms of contract i said y (YES)
Ah OK tnx
then this
ok i will try thanks
Okay, if it is not successful, tell us
I use Mac, so that sounds like a Windows issue. I don't have experience with this.
If you still have the issue, you can visit the link in my message above
I mean this www.atlassian.com/git/tutorials/install-git
.
is this normal right
My issue is here.. #gpt-realtime message .. is this place.. the OpenAI support or just community?