#gpt-realtime

spring python
#

first

#

been wanting to play with whisper for awhile

#

any recommendations between the model sizes? I want something that works nearly real-time for conversational AI

#

well I guess up to a few seconds of lag at most

tribal karma
#

👋

uncut vale
vagrant latch
#

Sup

shrewd lintel
#

sup

unreal halo
#

sup

atomic adder
#

ngl I'm gonna use this tool to not pay attention in class

autumn bolt
#

how to use this?

soft warren
#

Namaste

keen agate
#

sooo what’s whisper?

vast grove
#

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.

keen agate
#

so hopefully it will be able to tell what british people are saying, unlike siri and alexa

autumn bolt
#

how to use whisper?

void egret
void egret
sturdy dock
#

can anyone wanted to become my friend

meager fulcrum
# autumn bolt how to use whisper?

they have a very easy to use python package
pip install openai-whisper
then to use in your program you do something like

import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

See more here: https://github.com/openai/whisper

pale island
latent granite
#

a

full mirage
#

Does anyone know of any npm packages for running whisper inside of NodeJS? Or literally anything at all to run Whisper within NodeJS in general?

void egret
inner ingot
#

Ok

fiery warren
#

so this basically runs locally on your computer?

fiery warren
#

whisper?

raven folio
#

yeah

#

you can run it

#

somewhere else too

#

to host it and implement it

#

in an app or smth

fiery warren
#

FOr use?

void egret
#

It's under MIT license

autumn bolt
#

ey yo does anyone know where to do ai?

inner mesa
thorny portal
#

Try the MacWhisper app if you have an M1 or M2.

vague narwhal
#

hm

placid siren
#

what is whispe

void egret
#

It's an open source speech to text model published by openai

rustic sigil
#

LOUD

livid kraken
#

yello

halcyon wraith
#

Whisper-UI Update: You can now bulk-transcribe, save & search transcriptions with Streamlit & SQLAlchemy 2.0
I'd built a hacky Streamlit UI for OpenAI's Whisper a few months back and there had been a bit of interest so finally got myself to rewrite it to make it a little nicer. Update includes

  1. Ability to download entire YouTube playlists and upload multiple files at once
  2. Ability to browse, filter, and search through saved audio files (For now, this is done with a simple SQLite database & SQLAlchemy ORM)
  3. Auto-export of transcriptions in multiple formats (was a feature request)
  4. Simple substring based search for transcript segments. This is done with a simple LIKE query on the SQLite database.
  5. Fully reworked UI with a cleaner layout and more intuitive navigation.
    Repo: github.com/hayabhay/whisper-ui
wary knot
spring python
wary knot
#

FYI I was using Whisper.CPP and it has its own pre-baked tuning, but I don't think it heavily diverges from the CUDA version

autumn bolt
plucky palm
#

can i use whisper AI to transcribe livestreams on twitch?

#

@dim viper

void egret
plucky palm
void egret
plucky palm
void egret
#

Read what it says right above that. The phone is just an example to show that it can be run on whatever hardware

Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications. As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: whisper.objc

#

Since the original implementation of whisper is meant to run on gpu and is very vram heavy

plucky palm
void egret
plucky palm
willow cobalt
#

Has anyone built an asynchronous communication app or add-on with Whisper?

halcyon wraith
limpid prawn
#

What is whisper? I work gpt 3 api and stuff but what is whisper?

void egret
#

Whisper is a speech to text transcription model published by openai

shrewd rose
gilded palm
#

Hey I'm having some trouble installing whisper via my macbook terminal. I'm running:
pip install -U openai-whisper

and it gives me:
ERROR: Cannot install openai-whisper==20230117 and openai-whisper==20230124 because these package versions have conflicting dependencies.

The conflict is caused by:
openai-whisper 20230124 depends on torch
openai-whisper 20230117 depends on torch

I had installed torch, now when I try to run "pip install torch" it just says:
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

#

anyone around to help out?

void egret
void egret
#

Hm, that's odd then. Try running pip install git+https://github.com/openai/whisper.git

gilded palm
# void egret Hm, that's odd then. Try running `pip install git+https://github.com/openai/whis...

Different message but still an error haha

ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement torch (from openai-whisper) (from versions: none)
ERROR: No matching distribution found for torch

#

My python version is 3.11.1 tho

void egret
#

Well that issue seems easier to approach at least

#

Try setting up a venv with a python version in the range and trying again?
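
A quick way to confirm whether an interpreter falls inside the range quoted in the pip error above (`>=3.7,<3.11`) is a small check like the one below. This is only a sketch: the bounds are copied from that error message, and the helper name `python_in_range` is made up for illustration.

```python
import sys

# Bounds copied from the pip error above: Requires-Python >=3.7,<3.11
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 11)


def python_in_range(version=None):
    """Return True if a (major, minor) version falls in the quoted range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {current} in range: {python_in_range()}")
```

If this prints `False`, a virtual environment built from an older interpreter (for example `python3.10 -m venv .venv`, assuming Python 3.10 is installed separately) is the usual workaround.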

#

@dim viper mind trying to help if that doesn't fix his issue? I'm heading to bed

dim viper
#

👍 got to bed night man

gilded palm
#

pip install python==3.10

gives me:
ERROR: Could not find a version that satisfies the requirement python==3.10 (from versions: none)
ERROR: No matching distribution found for python==3.10

gilded palm
dim viper
#

what shows up when u type python --version

#

is it 3.11.1?

gilded palm
#

yeah 3.11.1

dim viper
#

hmm u can try installing something like 3.10 u can find it on pythons website under /downloads

#

but like book said

#

create a venv

#

google venv python and you can find the information on how to set up virtual environments

#

brb gotta go take dog out

#

im having internet outage rn .... if i disappear its cause of the internet ill reply as soon as im back online

muted trench
#

What would be the expected processing time? I have a 4 hour long audio file, and im guessing that will take a while?

dim viper
gilded palm
# dim viper did u end up figuring it out

Hey thanks for checking in, I downloaded MacWhisper and it wasn’t working on my version of MacOS so I upgraded to Ventura and now the app works so that’s a start I guess, I’m going to try and reinstall from repo this evening and will report back 🫡

dim viper
finite mirage
#

is youtube for kids

autumn bolt
#

technically, no, but i think almost all kids watch it

amber vortex
#

👍

woeful tapir
#

hlo

olive monolith
#

Mainland and Hong Kong credit card payments are not supported

#

How should I pay?

raven escarp
#

Just passing by to give a big shoutout to the Whisper project!

#

I’m working on an accessibility/quality of life tool and Whisper is being INVALUABLE for it

void egret
#

Whisper really is overlooked, it's incredibly powerful and the fact that it's open source only makes it better

dim viper
rotund leaf
#

Hi Guy's

dreamy palm
#

Hi all, quick question: Has anyone tried to create an AWS Lambda function that runs Whisper's transcribe on a file from S3? I can't think of a reason it wouldn't work, but when I search Google, I cant find anyone else that's done it. Which makes me think I'm missing something.

frozen pendant
#

I share with you the project I did with Whisper, Embedding and GPT-3

allows you to load any youtube video and start getting information through a chat

plucky palm
#

how can i get whisper to record twitch streams in real time on windows 10?

patent steppe
#

IDK

thorny ledge
#

Where are u from?

celest blade
#

HI

iron oar
frozen pendant
#

I can't share link in this channel, but you can go to the api-projects channel

#

and you have to search for: YoutubeGPT: Satya Nadella interview

#

and you will have all the information and the project repo

#

I'm using OpenAI Whisper, Embedding and GPT-3 API

mystic dune
#

If you could introduce your technology to creat chatGPT?

frozen pendant
iron oar
#

damn chatgpt3 api I wish there was a free option

#

I like the idea of searching via vid

frozen pendant
#

I'm building something so that it can be used for free

iron oar
#

yh I respect that

#

I mean it’s nice

#

I’m working on something more complex

frozen pendant
#

I will be posting on twitter, if I want to follow me my username is dani_avila7

iron oar
#

But very similar

#

I gotcha

#

Let’s trade a follow I just made a Twitter yesterday 😁

#

SullyBillions is my Twitter

daring cloud
#

egg incubator research study

white mauve
#

tes

compact marten
mortal plover
silent basin
#

it's a pity that things like ChatGPT are being used for fraud.

glossy moth
#

hey guys your too have this problem ""An error occurred. If this issue persists please contact us through our help center at help.openai.com.""

delicate bridge
#

I need help regarding streamlit cloud
No such file or directory: 'ffmpeg'
I have tried everything since 3 days nothing working for me
Please help

stable nimbus
# compact marten

Accusing someone basing your evidence on what AI believes is even worse than using AI to write assignments. Change my mind

fickle python
#

Hii ALL

#

How're you?

tropic wyvern
#

When a conversation starts, a time log is necessary. If the dialogue continues for a while, it may be necessary to recall when a specific question was asked or when an answer was received. Although AI cannot hold real-time information, time stamps for the conversation would be helpful.

#

hello where the real input area?

marble apex
#

how to make the ai write phd thesis

compact marten
mellow laurel
#

Hi all new Friend

round shore
#

hello from norway

charred barn
#

I'm using whisper in my project, and I'm having some weird behavior sometimes. Here's a couple errors i've saved from a recent test. Because I'm testing, the audio it's receiving is basically the same each time. just sometimes it gives errors like these, and most of the time it doesnt. I'm probably just gonna wrap some of this in a retry, but wondering if you guys have had similar experiences.

RuntimeError: The size of tensor a (18) must match the size of tensor b (10) at non-singleton dimension 3
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, 6, -1] because the unspecified dimension size -1 can be any value and is ambiguous

void egret
frozen axle
#

How do I start ?

indigo wasp
#

hi from myanmar

sweet drum
#

Write a letter to the witcher

harsh wigeon
winter wraith
#

how do i use whisper?

void egret
charred barn
#

trying to mess around with cuda

#

found out our server has an ancient GPU that isn't supported. So I set up port forwarding to do some testing from my own computer

#

I'm not sure I'm seeing any improvement from doing transcribe().cuda()

#

I'm wondering if I might be going about this the wrong way

#

I'm transcribing audio samples from an IVR

#

so 5-10 second long clips grabbed from the IVR over http

#

The examples I saw on github of other people's projects, specifically the one that functions as a rest api. And I throw a Lock on the transcribe method.

#

The issue I had yesterday with those runtime exceptions were because I didn't put a Lock on the transcribe method
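
The lock described above could look roughly like this. It is a minimal sketch, not the poster's actual code: `LockedTranscriber` and `FakeModel` are hypothetical names, and `FakeModel` only stands in for a loaded Whisper model so the example runs without the library.

```python
import threading
import time


class LockedTranscriber:
    """Serialize calls to a shared model's transcribe() across threads."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def transcribe(self, audio, **options):
        # Only one thread at a time may run inference on the shared model.
        with self._lock:
            return self._model.transcribe(audio, **options)


class FakeModel:
    """Hypothetical stand-in for a loaded Whisper model (demo only)."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def transcribe(self, audio, **options):
        # Track how many calls overlap; with the lock this never exceeds 1.
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)
        self.active -= 1
        return {"text": f"transcript of {audio}"}


def run_demo(num_calls=8):
    """Fire several concurrent requests and report the peak overlap."""
    fake = FakeModel()
    transcriber = LockedTranscriber(fake)
    results = [None] * num_calls

    def worker(index):
        results[index] = transcriber.transcribe(f"clip-{index}.wav")["text"]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_calls)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, fake.max_active


if __name__ == "__main__":
    texts, peak = run_demo()
    print(f"{len(texts)} clips transcribed, peak concurrent calls: {peak}")
```

In real use the object returned by `whisper.load_model(...)` would take the place of `FakeModel`. Note that a lock only serializes requests; it avoids concurrent access to the model but does not by itself add throughput.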

#

Guess I'm just real confused about how to get better performance out of it. I have an rtx 3090TI

#

How can I get better concurrency while working with Whisper? Any advise?

void egret
#

First off is it actually hitting your gpu?

charred barn
#

When we just have a couple calls coming in, it works alright with an average response of like 4 seconds. but when we got like 50 it goes upwards of 10

#

How can I check that? I'm looking at task manager, and i dont see any GPU% coming from python

#

When i had an earlier version of python installed, it told me it wasnt compatible with my gpu and i updated to 3.10

#

so now i get no such error

#

makes me think it is using it

#

but not putting a big load on it

void egret
#

Is your gpu usage increasing when you run it?

charred barn
#

no, not in a noticeable way

void egret
#

If you don't end up installing the CUDA Development Tools it will hit your CPU without any logs
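
One way to check from Python which device inference would land on, assuming PyTorch is what runs the model (as it does for the openai-whisper package), is to ask `torch.cuda.is_available()`. The helper name `pick_device` below is made up for illustration, and the commented lines assume openai-whisper is installed.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Inference would run on: {device}")
    # With openai-whisper installed, the model can be placed explicitly:
    #   import whisper
    #   model = whisper.load_model("tiny", device=device)
    #   print(model.device)
```

If this reports `cpu` on a machine with an NVIDIA card, the installed PyTorch build probably has no CUDA support, which would match the low GPU usage described here.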

charred barn
#

this cuda toolkit?

#

I can check that out now

void egret
#

ye, I don't think I can link here but let me try?

#

Just google windows cuda development tools it's the first link

charred barn
#

cool

#

I really appreciate your help. I think whisper is gonna help save us some decent money on call processing

void egret
#

I think whisper is amazing and just got overlooked by their completion models upgrading shortly after

#

What size model were you using btw?

civic minnow
charred barn
#

tiny

#

Haven't had too many inaccuracies. gonna see if i can bump it up with the cuda help

#

most issues I get are like numbers sometimes dont process correctly

#

zero dollars and zero cents transcribes to $0.00 and $0.00

#

lol

void egret
#

switching to gpu shouldn't change anything with what it outputs, mainly just performance

#

Only real option for increased accuracy is using a larger model

charred barn
#

Yeah, but the model should improve accuracy right?

#

yeah

#

If the cuda can handle 100 calls at once and keep responses under 3 seconds i can try a larger model

void egret
#

👍 Let me know how it goes

charred barn
#

scroll under setup on that page

void egret
#

^ At the end of the day it's a Python library, so you just use pip: pip install -U openai-whisper

undone bear
#

ive never installed pip either how do i do that?

void egret
#

Have you used python before?

undone bear
#

no...

void egret
#

Probably not a great starting point then. Whisper is a library that you use in code, not a standalone app

undone bear
#

ah

charred barn
#

I think my PC froze in the middle of that install and now my graphics drivers aren't working lol

void egret
charred barn
#

lol well. i couldnt really tell if it was using cuda or not

#

then i think our server started to refuse all my requests out of nowhere lulw i think i pissed off some firewall

#

its not making much of a dent in anything

#

like im doing 40 calls at once, and it's grabbing a bunch of audio samples from the ivr, and my cpu is just going between 3 and 20%. gpu going between 1 and 7%

void egret
#

try going to a larger model just to see where the load goes?

charred barn
#

I'm running into a networking issue cause im not on the same network. it doesnt like all these requests for audio samples lol. after a while it starts failing when ffmpeg goes to load. I'm gonna bump it up a couple models

charred barn
#

I set it to medium and I still can't tell lol. it says python is using like 1% GPU or lower. and like 15% CPU. with like 40 calls, over the course of 2 minutes, it transcribes 80 5 second clips.

#

its cool tho, with tiny we would have hands full of little error, medium is pretty perfect

#

it looks like when it first starts getting requests, it takes like 5 seconds average, but once it gets going, it goes down to under 2 seconds each

#

is there any kinda verbose mode we can call? I'm fairly certain it should be using the cuda now but its barely noticable

void egret
#

Not really, you could chuck it something longer and see?

heady estuary
#

Omg

#

This is the best

#

Thing i ever saw

#

Thank you creators

#

Ur a blessing

#

@outer scarab i lov y

charred barn
#

Gonna try tomorrow. It's a little late over here. Do you know of any benchmarks people may have done with GPU performance with different cards? Gotta make a purchase decision for work

void egret
#

Purely for whisper you might find it in one of the discussions on the repo otherwise I got nothing. :/

outer scarab
modest flax
#

Thank you very much for whisper and the github repo !
I try it on windows with a Shadow (GPU Cloud, NVIDIA RTX A4500) and it's soooo Amazing on the large model 🙂

#

I am trying to separate the speakers in the rendered text, do you have any ideas?

heady estuary
charred barn
#

Lots of results on Google

worn bison
#

MuseNet

scarlet frost
#

Guys

rancid nimbus
#

has anyone made improvements to real time transcription?

void egret
#

There's been a few approaches to realtime transcription in the discussions tab of the repo. I know the cpp fork of it has a few examples that do realtime as well

charred barn
#

That's basically what I'm doing

#

I can make a guide later if you guys want.

#

The key to doing transcription live is to store your audio in a buffer, and send samples to whisper. You can do this without files by using a rest API and serving the samples as bytes. ffmpeg works fine with urls and that's the first step in whisper. I'll share examples later
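
The buffering idea above can be sketched as follows. This is an illustration under stated assumptions, not the poster's implementation: `AudioChunker` is a hypothetical name, and the defaults assume 16 kHz mono 16-bit PCM with 5-second chunks, matching the clip lengths mentioned earlier.

```python
class AudioChunker:
    """Accumulate raw PCM bytes and emit fixed-length chunks."""

    def __init__(self, sample_rate=16000, bytes_per_sample=2, chunk_seconds=5.0):
        # Number of bytes that make up one chunk of the requested duration.
        self.chunk_bytes = int(sample_rate * bytes_per_sample * chunk_seconds)
        self._buffer = bytearray()

    def feed(self, data):
        """Add incoming bytes and return a list of complete chunks."""
        self._buffer.extend(data)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            del self._buffer[: self.chunk_bytes]
        return chunks

    def flush(self):
        """Return whatever is left (possibly empty) and clear the buffer."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest


if __name__ == "__main__":
    chunker = AudioChunker()
    # Simulate a stream arriving in 20 ms packets (640 bytes at 16 kHz, 16-bit).
    packet = b"\x00" * 640
    ready = []
    for _ in range(600):  # 12 seconds of audio
        ready.extend(chunker.feed(packet))
    print(f"{len(ready)} full chunks, {len(chunker.flush())} bytes left over")
```

Each chunk returned by `feed` would then be handed to the transcription step, whether that is a local model or an HTTP endpoint.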

inland reef
#

open ai is epic

livid copper
#

Any great full stack engineer (with iOS and web expertise) who want to build something useful that can positively impact the society, must haves, passion, drive, curiosity, work ethic...please DM me with your resume or your friends resume !

full plover
rancid nimbus
charred barn
#

We're doing this approach to transcribe live calls for an IVR.

#

When you say seconds, I feel like you think that matters more than it really does

#

TV subtitles take seconds. And the IVR takes time to do stuff. But when we were sending 100 x 5s samples to it in 2 minutes, it would solve and response in under 2 seconds consistently

#

it's more than acceptable for industry use for live applications imo

#

this audio is over websocket btw, so its a continuous stream coming into the ivr

rancid nimbus
#

@charred barn If you just have a look at whisper-cpp it has a streaming project and it solves the words but has a delay. Each word would need to be returned within 300 ms after it is done.

#

This is what I am referring to.

#

The best they can do is 1.2 s from what i saw

charred barn
#

so what's a sub 1 second delay an issue for? if we were just doing 1 call, it solves stuff real quick

#

and im also pinging from the opposite side of the world right now cause our server doesnt have a gpu suitable

rancid nimbus
#

1.2 second delay on a phrase will compound over time

charred barn
#

no it wont

rancid nimbus
#

will lag over time

charred barn
#

nope

#

i disagree with you fully

#

we are doing 50 calls at once

#

in testing

harsh wing
#

@muted axle how do I use this

rancid nimbus
charred barn
#

I got timed out

#

I'm not interacting with whisper beyond just calling transcribe.cuda

#

you mentioned that delay was an issue. But in our application we have a suitable delay threshold

#

i was looking at this version of whisper that allows batch processing

#

i was gonna see if i could get an improvement overall from delaying 1 second to collect audio before sending to whisper

rancid nimbus
#

Whisper is really good, but this limitation has prevented me from trying to implement it.

charred barn
#

you can dm your link if you want

rancid nimbus
#

Especially when other solutions are so close and are fully developed.

charred barn
#

i wish they'd add a leveling system here so we could get trusted enough to react to stuff or add links to discussion lol

rancid nimbus
#

ya would be good

charred barn
#

what are you trying to use it for ?

rancid nimbus
#

Being able to react instantly from incoming audio in conversational style

charred barn
#

but that doesnt really tell me what your project is. like, what are you trying to achieve? I dont see what a very short delay would prevent

rancid nimbus
#

The fastest solution I have used so far is riva

#

or triton from nvidia

#

people talk fast

charred barn
#

what kind of response times are you getting with those?

rancid nimbus
#

reaction times and delays are what makes it feel clunky or not

#

just how you would talk with some one

#

if some one takes 1.5 seconds to respond to you every time during a live conversation that isn't natural feeling

#

not to mention any other latency added for other computation needed.

#

That is why it needs to be 300 ms or faster

charred barn
#

That makes sense

rancid nimbus
#

Seems like whisper isn't targeted to this use case, but it is so good not to try

charred barn
#

it'd be nice if they added native support for websocket audio

#

just send me back all the words 1 at a time lol

rancid nimbus
#

the whisperX

charred barn
#

you know what'd be even nicer? if i didnt need to use ffmpeg at all

rancid nimbus
#

oh ya

charred barn
#

i already can serve the audio in the exact format that its encoding it to

#

but it still calls it every time
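
As far as I know, `transcribe()` in the openai-whisper package also accepts a NumPy array of samples instead of a file path, and in that case the ffmpeg decoding step is not invoked. The sketch below only shows the conversion; the helper name `pcm16_to_float32` is made up, and it assumes the audio is already 16 kHz, mono, 16-bit little-endian PCM.

```python
import numpy as np


def pcm16_to_float32(pcm_bytes):
    """Convert 16-bit little-endian mono PCM bytes to float32 in [-1.0, 1.0)."""
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    return samples.astype(np.float32) / 32768.0


if __name__ == "__main__":
    raw = b"\x00\x00\xff\x7f\x00\x80"  # three samples: 0, 32767, -32768
    audio = pcm16_to_float32(raw)
    print(audio.dtype, audio.shape)
    # Assumption: with openai-whisper installed and 16 kHz mono input,
    # the array could be passed directly:
    #   result = model.transcribe(audio)
```

If the source audio has a different sample rate or more than one channel, it would still need resampling or downmixing first.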

rancid nimbus
#

the model is good but the rest of the software isn't as useful

#

i love streams like the unix/linux way

#

so ffmpeg isn't a big deal if it isn't slow but if its slow then it needs to be removed.

charred barn
#

every fraction of a second helps

rancid nimbus
#

yep

waxen grove
#

How to enter whisper?

void egret
#

Whisper is a model used for speech to text transcription. You can find instructions on how to use it here: https://github.com/openai/whisper. There isn't a website or anything of that sort though if that's what you are asking.

modest glacier
#

Thanks

spiral tartan
#

"I hope this message finds you well. I wanted to discuss the topic of WHISPER, a rapidly growing field that combines engineering and technology to solve complex problems. Learning more about it can be a valuable investment for personal and professional growth with numerous career opportunities. I'm happy to provide information and resources through discussion, online resources, or workshops/conferences. Let me know if you have questions or would like to chat further. Best regards."

radiant sapphire
#

Can anyone help me get the phone code?

hazy dagger
#

now what should i do

charred barn
pseudo prawn
#

i wanna know whats whisper?

polar pewter
#

can anyone help me with how to use this gpt bot??

rancid nimbus
#

This is the whisper channel. Probably want a gpt channel.

rain flame
timid lintel
#

Has anyone tried integrating Whisper into Audacity, maybe via plugin?

hollow turtle
#

Has anyone tried integrating Whisper into React App?

celest shell
#

How can I use whisper

autumn turret
fathom timber
#

Dude thats crazyQ!!!

iron oar
#

thx 🙏

autumn bolt
#

So

#

I can send messages here

#

Wow

#

Can anyone see ma messages

#

Nop ?

vast wharf
#

Wow, that's crazy !

autumn bolt
#

Is whisper available as a Discord bot?

tawny light
autumn bolt
#

So cool

blazing valley
#

whisper - open source is the way

clear current
#

hii

empty onyx
dreamy palm
slow dew
copper swan
#

Helmo

lilac sleet
#

If I am correct Whisper is what was used to give ChatGPT human like responses correct?

marsh breach
lilac sleet
sonic mango
#

Hello everyone, hope you are well

#

I have a question regarding whisper, what's the best way today to modify the formatting of the text output ?

tidal canopy
spiral hawk
sonic mango
tidal canopy
#

what do you want to transfer it to?

sonic mango
sonic mango
# tidal canopy what do you want to transfer it to?

to a text or a word document, I just want to remove the words "speaker 1" and "speaker 2" and instead have whisper put the text from speaker 1 in italics and no change on speaker 2, with line breaks in between speakers

#

I would also like to remove the timestamp

tidal canopy
#

wait since when does whisper support multiple speakers lul

sonic mango
#

sorry, I did not give an answer with the full context and I was lost in my own head. I'm tinkering with another module called diarization which helps by identifying speakers

#

I should probably seek help on that specific module rather than with whisper itself

tidal canopy
#

can you give a sample output and we can move to DM if you like since this doesn't really have much to do with whisper itself SeemsBlob

sonic mango
#

sure

candid junco
#

Hey guys, I am interested in building with Whisper.

I am trying to transcribe my calls. Would you guys say it's best to build it on top of something like Twilio flex or is there a way for OpenAI to listen in and transcribe calls based on a Chrome extension or something like that?

still monolith
#

ohio radio is down radio.garden/visit/columbus-oh/oHcdAaW1

fickle merlin
#

Shh

#

Whisper

buoyant cairn
#

HEY

uncut venture
#

AdamAI: The first AI-powered video search engine uses Whisper 🙂 Check it out in api-projects page

uncut venture
tidal canopy
frigid ferry
candid junco
uncut venture
#

thx 🙂

#

try it out

#

in projects section the link is there

tidal canopy
#

Sounds like a pyramid scheme and I'm not sure if that's the right channel Thonk

void lotus
#

lol

autumn bolt
#

hmmm any one interested in my gaming server?

autumn bolt
autumn bolt
#

yes minecraft

#

wanna join then click on my profile and about me

#

done'

elfin acorn
#

hey guys im new here

#

but td I saw lots of cool projects

#

but for this Reverse Video Search project AdamAI did any of yall experience an issue where sometimes results dont show?

#

but after trying again a refreshin it works

#

but why is that?

frozen field
#

Wow very cool

mighty marlin
#

turkey

#

süüü

ruby grail
#

anyone can invite BlueWillow&

errant inlet
#

Hi how do I start using Whisper and where is it located?

#

Do I have to be a tech person to install it?

void sail
void sail
torpid kraken
#

How can I use whisper?

autumn bolt
#

Download it from the GitHub

plain gyro
native ledge
#

Any idea when GPT 4 is releasing or the newBing

plucky palm
#

[mp3 @ 000001ed70287cc0] Format mp3 detected only with low score of 1, misdetection possible!
[mp3 @ 000001ed70287cc0] Failed to read frame size: Could not seek to 1026.
C:\Users\18186\AppData\Local\Temp\Vocaroo 1fhNXMiDDXG6_gchawfnh2335b7b68aaefaa9608d380451ceb5d05a9f5109.mp3: Invalid argument

#

anyone encounter anything like this?

#

i cant get whisper to read my mp3.

muted axle BOT
#
Introducing ChatGPT and Whisper APIs

Developers can now integrate ChatGPT and Whisper models into their apps and products through our API.

cursive torrent
#

is there a website where i can try whisper?

pine sun
#

InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

If I am not mistaken local Whisper supported .ogg sound format
If it is possible can you please add .ogg to support? For the example, .ogg is the default format for the Telegram audiomessages.

||Yes, I know that I can convert file, however that's tough||

timid lintel
#

iirc, Whisper uses FFmpeg to convert the input audio anyway, since it has to be in a very specific format before converting

near dawn
blissful dove
#

Hooray! Very happy to hear OpenAI's announcement about Whisper API.

valid cradle
#

anyone play around with the new API yet? wondering if its possible to get the "enable_word_time_offsets" param working?

void egret
#

Haven't messed with it but I don't see a way from the docs :/

valid cradle
#

the docs don't mention ANY params at all, though I think it has some

#

getting a very basic, text-only response

void egret
#

Could try setting response_format to verbose_json or vtt?

valid cradle
#

therrrre it is thank you!

#

where did you even see that as an option btw?

void egret
#

I gotta set aside some time to mess with it, my gpu could barely run the medium model. Happy to hear verbose_json has decent info

valid cradle
#

its phrase-level offsets but its something
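
For reference, pulling the phrase-level offsets out of a `verbose_json` response might look like this. The sample payload is hypothetical and trimmed to the fields used; only `segments`, `start`, `end`, and `text` are relied on, and `segment_offsets` is a made-up helper name.

```python
import json

# Hypothetical response body, trimmed to the fields used below.
SAMPLE_RESPONSE = """
{
  "text": "Hello there. How are you?",
  "segments": [
    {"id": 0, "start": 0.0, "end": 1.4, "text": " Hello there."},
    {"id": 1, "start": 1.4, "end": 2.9, "text": " How are you?"}
  ]
}
"""


def segment_offsets(response_text):
    """Return (start, end, text) tuples, one per transcribed phrase."""
    payload = json.loads(response_text)
    return [
        (segment["start"], segment["end"], segment["text"].strip())
        for segment in payload.get("segments", [])
    ]


if __name__ == "__main__":
    for start, end, text in segment_offsets(SAMPLE_RESPONSE):
        print(f"[{start:6.2f} -> {end:6.2f}] {text}")
```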

valid cradle
#

gahh. yea miscommunication, re: translate and transcribe being the same endpoint

#

the good info is in the translate part of the docs

#

oh no wait, there it is in transcribe too.

#

damn ok, guess I missed this

#

anyway thx!

sleek pebble
#

Congrats on launching the Whisper API. Maybe I can ditch the 10 AWS-T4 server instances I've been running. lol

rotund furnace
#

Haha same here. A whisper API is... actually quite industry changing. It's pretty damn fast.

sleek pebble
#

I haven't benchmarked the API yet but I'm currently averaging about 4.7 seconds of audio transcribed per second on a Nvidia T4 GPU.
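
A throughput figure like this gives a rough answer to the earlier question about a 4-hour file. The sketch below is simple arithmetic; the 4.7x factor is this one poster's T4 number (model size not stated), and both function names are made up.

```python
def estimate_seconds(audio_seconds, realtime_factor):
    """Estimate wall-clock time given seconds of audio processed per second."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def format_duration(seconds):
    """Format a duration in seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    four_hours = 4 * 3600
    print(format_duration(estimate_seconds(four_hours, 4.7)))
```

At 4.7x real time, four hours of audio works out to roughly 51 minutes of processing; a slower card or a larger model would stretch that accordingly.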

indigo totem
#

Im getting

const openai = new OpenAIApi(configuration);


const completion = await openai.createTranscription(fs.readFileSync("Fears.mp3"), "whisper-1","this is a youtube comentary","text",1);

sleek pebble
#

The whisper API service is 3.3x faster and 50-75% cheaper for me.

autumn bolt
lavish ferry
#

I hope they add a model parameter. I find the medium model is the ultimate bang for buck in terms of quality and speed.

plain gyro
lavish ferry
#

PSA: transcode your files to mp3 before uploading them to the transcription endpoint. In my testing, mp3 transcription is 100-125% faster than WAV, 80% faster than webm.

#

It takes 4 seconds to transcode 1 min of audio. Blazing.

shut spoke
#

Hey, does the api also have a translation function or is it limited to transcribe?

#

is the translate better than google/aws translate?

sleek pebble
sleek pebble
lavish ferry
solid tapir
#

Guys, how do I get access to GPT3.5 model in the API playground?

lavish ferry
# sleek pebble Could be your internet connection. Internally at least with whisper-asr-webservi...

Hmm, even taking that into account it seems like mp3s run a little faster. For reference, a 5:33 min mp3 (5.5mb) takes 22 seconds to transcribe. The same file in wav format (21mb) takes 52 seconds to transcribe. On my current (terrible hotspot) connection, internet speeds account for roughly 19 seconds of overhead. That's still an 11 second delta. Would appreciate seeing results from others here.

sleek pebble
lavish ferry
marble night
#

when will have a tts model?

night niche
#

hey there folks, i'm working on a project that requires me to transcribe audio live from the user input and store it in a variable. i'm a noob regarding this and would like to know if there's any way to achieve this

peak elm
#

hi all, I feel like I'm missing something really obvious. How am I supposed to structure my request using generic fetch?

```js
const form = new FormData();
form.append("file", fs.readFileSync("audio.mp3"));
form.append("model", "whisper-1");

const response = await fetch(
    "https://api.openai.com/v1/audio/transcriptions",
    {
        method: "POST",
        headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: form,
    }
);
```
#

I keep getting

```
error: {
    message: '1 validation error for Request\n' +
      'body -> file\n' +
      "  Expected UploadFile, received: <class 'str'> (type=value_error)",
    type: 'invalid_request_error',
    param: null,
    code: null
  }
```
wheat kindle
#

hi guys, for transcribing 21 seconds of speech it's taking 5 seconds. Is there any way to decrease the latency?

#

it's making it unusable

#

5 seconds for 21 seconds is way too much

quartz cove
#

Is live transcription with the new API possible and if yes then what’s the best resource to learn about it? Has anyone done it yet?

peak elm
peak elm
wheat kindle
#

well I guess it can process fast in terms of maximum time but the minimum time is not low enough

peak elm
#

I'm processing a 5s clip in roughly 900ms

wheat kindle
#

i guess we can only expect it to improve over time, as I presume it's an issue with the model itself

peak elm
#

I think it's just due to the nature of how the system is set up. It processes faster with longer audio files since overhead plays less of a role

#

There's probably a good ~300ms or so for a job to be created and assigned to a ready GPU

wheat kindle
#

yes that's what i was thinking

#

but it rather defeats the purpose of STT

#

it should ideally be near real-time

peak elm
#

i'd love that too

wheat kindle
#

I hope OpenAI will do it eventually

peak elm
#

right now for my project I've been sending requests to whisper every 500ms or so and then taking the latest response when it's needed

#

but it's pretty unideal since it can lose words at the end

wheat kindle
#

yeah

peak elm
#

potentially thinking of using the prompt with the current translation to let me send shorter segments (rather than the whole built up buffer)
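
The idea in the last message (send only the newest audio segment and pass the transcript so far as the prompt) could look roughly like the Python sketch below. The `transcribe` callable is a stand-in for whatever actually calls Whisper, and the names `RollingTranscriber` and `max_prompt_chars` are invented for illustration. Whether the prompt really carries enough context for short segments is untested here.

```python
class RollingTranscriber:
    """Transcribe short segments, feeding earlier text back as the prompt."""

    def __init__(self, transcribe, max_prompt_chars=200):
        # transcribe(audio_bytes, prompt) -> str is supplied by the caller.
        self.transcribe = transcribe
        self.max_prompt_chars = max_prompt_chars
        self.parts = []

    def push(self, segment):
        """Transcribe one new segment and return the full text so far."""
        so_far = " ".join(self.parts)
        # Only the tail of the transcript is sent as context.
        prompt = so_far[-self.max_prompt_chars:]
        text = self.transcribe(segment, prompt).strip()
        if text:
            self.parts.append(text)
        return " ".join(self.parts)
```

Each `push` then costs one request for a short clip instead of re-sending the whole buffer, at the risk of cutting words at segment boundaries.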

plain gyro
azure chasm
#

Hi all
I want to use whisper api but I get this error when I send form data to the api, anyone knows what's the issue?

azure chasm
still wave
#

Is it possible to Train the chatGpt and get the trained data from ChatGpt by Api Hit

sleek pebble
azure chasm
sleek pebble
azure chasm
#

you can see in the second image

sleek pebble
# azure chasm I have this

formData will automatically add the content type and the multipart boundaries required to do a form post.

sleek pebble
#

Remove "Content-Type": "multipart/form-data"

azure chasm
deep bane
#

Having an issue where I'm only getting the first sentence of a transcript back...

INPUT
"key": "file",
"data": "IMTBuffer(369427, binary, THE BINARY DATA",
"fileName": "file.m4a",
"fieldType": "file"

OUTPUT

"data": { "text": "Scientific Advertising, by Claude C. Hopkins." }, "fileSize": 56

#

More context: totally noob way to run Whisper, I know (via the make.com HTTP module). It seems OpenAI wants the file in binary, so I'm using an HTTP module to convert the file into binary, then sending it via the API. If I just put the file URL in the HTTP request it doesn't seem to work at all

however, this only returns the first sentence of the file for whatever reason

#

an alternative approach - does not work

error i receive: 1 validation error for Request
body -> file
Expected UploadFile, received: <class 'str'> (type=value_error)

#

Trying in postman - no luck

#

must have been an issue with the file - works fine with an MP3

deep bane
#

Random question - if it's an interview between two people, can you instruct via the prompt to label the speakers?

wheat kindle
deep bane
#

Thanks, makes sense - would be nice to instruct it to output in Markdown, though that could be done via GPT of course

hidden summit
#

anyone else getting a huge error when using the nodejs library to get a transcript?

#

looks like its just printing out the request

#

trying to use fetch just gives me 400 bad request

stiff leaf
#

yeah, the nodejs usage of createTranscription seems to be broken

hidden summit
#

is the entire api broken or am i doing something wrong?

#

formData.append("file", fs.createReadStream("/home/simon/Projects/resident/mpthreetest.mp3"));
formData.append("model", "whisper-1");
formData.append("language", "en");

const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    headers : {
        Authorization: `Bearer ${apiKey}`,
    },
    method: "POST",
    body: formData,
});

console.log(res)
let data = await res.json();
console.log(data)
stiff leaf
#

try adding a third argument to the first append, that includes the filename: { filename: "test.mp3" }

hidden summit
#

nope

#

anyone have working fetch code i could see as a reference?

#

adding content-type gives a different error

stiff leaf
#

i got things working client-side only, still trying to translate it to server-side code, hitting similar errors as you @hidden summit

hidden summit
#

got it

wet herald
#

How can I access the Whisper AI through the OpenAI API? The documentation isn't too clear

sour ermine
#

Has anyone managed to pass a .mp3 from an external host?

marble abyss
#

Hi is there an app like otter.ai that uses whisper?

sterile apex
#

Hi, how do I pass response_format="srt" in python?

import openai
import unicodedata
audio_file= open("./audio.mp3", "rb")

transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript)
subtle radish
#

anyone experiencing the below error when using the whisper API?

"type": "server_error",
"param": null,
"code": null
}
} 500 {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 6fa27ee8db754a9596de625d704367ce in your email.)', 'type': 'server_error', 'param': None, 'code': None}} {'Date': 'Thu, 02 Mar 2023 22:43:44 GMT', 'Content-Type': 'application/json', 'Content-Length': '365', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'X-Request-Id': '6fa27ee8db754a9596de625d704367ce'}

quaint jay
hidden summit
#

it gives an odd error though

#

sometimes it works, but sometimes it gives the error TypeError [ERR_INVALID_STATE]: Invalid state: chunk ArrayBuffer is zero-length or detached

storm thorn
#

anyone else getting this error message (401) using the whisper api : You didn't provide an API key. You need to provide your API key..., I get this error using curl requests or with the axios library (trying to get it browser compatible without nodejs-specific code)

#

would like to talk to anyone that has managed to make a whisper api curl request, the 401 message above does not go away...

unique anchor
#

does Whisper support text-to-speech synthesis ?

native ruin
#

Gm. So i have a customer who has possible multiple use for the project. Will the api alone provide access to various use cases?

errant narwhal
#

both .webm and .wav being created by the standard Chrome MediaRecorder as audio/webm or audio/webm not working. Whisper says invalid file format. Is it due to codec used on MediaRecorder side?

errant narwhal
storm thorn
amber sage
ionic valley
#

How do I send prompt to whisper api?

proper kayak
#

Where can I get the secret key for whisper model?

ionic valley
proper kayak
#

@ionic valley OK, thank you, I go to try it now

indigo flame
#

Hi, anyone had any success in using power automate to call the whisper api?

hearty prism
#

Does Whisper support timestamps for words as well?

inland epoch
#

Hey guys! I'm trying to do a POST request to OpenAI Whisper in React Native with Expo. I get an audio recording in an .m4a file, but the response I get from the API is

{"error": {"code": null, "message": "The audio file could not be decoded or its format is not supported.", "param": null, "type": "invalid_request_error"}}

Any ideas? I don't think this is related to the format, but maybe I'm missing something?

inland epoch
#

Okay figured it out. Will leave it here for someone else having troubles.

When I was recording the audio, I had passed these options to the Expo Audio object:

ios: {
                extension: ".m4a",
                audioQuality: Audio.IOSAudioQuality.HIGH,
                sampleRate: 44100,
                numberOfChannels: 2,
                bitRate: 128000,
},

Somehow this doesn't work well with OpenAI. Changed it to this one (as in the Expo docs):

ios: {
                extension: ".m4a",
                outputFormat: Audio.IOSOutputFormat.MPEG4AAC,
                audioQuality: Audio.IOSAudioQuality.MAX,
                sampleRate: 44100,
                numberOfChannels: 2,
                bitRate: 128000,
                linearPCMBitDepth: 16,
                linearPCMIsBigEndian: false,
                linearPCMIsFloat: false,
},

Working good now!

hidden summit
#

Anyone have code that can feed a remote url into whisper?

void egret
hidden summit
#

Yeah

#

That would be a step

vivid sparrow
hidden summit
#

is there a way to separate speakers with the api?

patent shale
hidden summit
#

Darn

void egret
patent shale
void egret
#

Iirc it's phrase level timestamps not word level sadly

hidden summit
void egret
#

Yup that should work, probably also add logic to segment it if the file is too large

patent shale
#

Wondering if we can co-opt this thread that was meant for the open source version of Whisper and include the API version of Whisper? Or start another thread to avoid confusion...

void egret
#

co-opting it should be fine, it was one of the least used channels before

faint latch
#

Does anyone know how to contact OpenAI sales?

timber shard
#

Business api is like 250k I think

hushed crystal
#

Anyone has got a way to search for common-password.txt

#

just looking for a common word less than 16 characters

fickle coyote
#

Using playground's whisper just gives me whitespace when uploading MP3s

unkempt wolf
ionic valley
potent niche
#

I'm getting this error while trying to use the nodeJS library

RequiredError: Required parameter model was null or undefined when calling createTranscription.

Code:

const openai = new OpenAIApi(configuration);
  openai.createTranscription({
    file: mp3Content,
    model: 'whisper-1',
    responseFormat: 'text',
  }).then((response) => {
    console.log(response.data);
    message.reply(response.data);
  }).catch((error) => {
    console.error(error);
  });
#

I'm using the latest library so idk what could be causing the issue

ionic valley
#

Replace above code with this code and try:

const openai = new OpenAIApi(configuration);
  openai.createTranscription({
    file: mp3Content,
    engine: 'whisper-1',
    responseFormat: 'text',
  }).then((response) => {
    console.log(response.data);
    message.reply(response.data);
  }).catch((error) => {
    console.error(error);
  });
red zephyr
#

Getting error 429 @ionic valley

dim jacinth
#

Hey, I've been using the Whisper API for a bit now, but for me none of the calls to the transcription endpoint seem to be logged in the Usage panel 🤔 (although I can see API calls to other endpoints just fine), is anyone else experiencing the same?

remote gazelle
#

LIquefaction induced failure of shallow foundations

spiral hawk
#

anyone experimented with temperature and beam_size? I have various repetitive gaps in my transcribes, and I want to "fix" it. Anyone had the same problem ?

lavish jungle
#

Code bot chat GPT

amber sage
# ionic valley How do I send prompt to whisper api?

You could specify prompt in the following way.

import requests

headers = {
    'Authorization': 'Bearer sk-***',
}

files = {
    'file': open('test.mp3', 'rb'),
    'model': (None, 'whisper-1'),
    'response_format': (None, 'srt'),
    'language': (None, 'en'),
    # Adjacent string literals are joined, so no stray indentation ends up in the prompt
    'prompt': (None, 'The transcript is about a message '
                     'sent From Sarah Cranmer To Laila Alizadeh '
                     'regarding travel arrangements.')
}

response = requests.post('https://api.openai.com/v1/audio/transcriptions', 
                         headers=headers, 
                         files=files)
print(response.text)
peak elm
#

I think the types of the nodejs library are messed up for whisper

#
const { Configuration, OpenAIApi } = require("openai");
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);
const resp = await openai.createTranscription(
  fs.createReadStream("audio.mp3"),
  "whisper-1"
);

The example given gives me a typing error:

Argument of type 'ReadStream' is not assignable to parameter of type 'File'.

rain bloom
#

There is a way to stream audio?

pallid slate
#

That would be insane

sonic cloud
#

guys i had a question and i can't seem to find the answer anywhere. i used my phone number multiple times for other email ids, but i don't have those email ids anymore and i can't seem to use my phone number anymore, so if anyone could help me with this plsssss help me

shut spoke
#

does whisper have a max file size on the server side?

restive beacon
#

I'm sending a request to the OpenAI Whisper API and it says I need to provide a model, but I did put the model in my code.
Here is my code:


    fetch("https://api.openai.com/v1/audio/transcriptions", {
      method: "POST",

      body: {
        contentType: "multipart/form-data",

        filename: file,

        model: "whisper-1",
      },

      headers: {
        Authorization: `Bearer ${this.token}`,

        model: "whisper-1",
      },
    })
      .then((res) => res.json())
      .then((json) => {
        if(this.debug === true) console.log(json);

        return json;
      });
unkempt wolf
#

How long does it take for whisper to transcribe something for you guys? For me it's taking like 3 seconds (using mp3), which for my purposes is pretty slow (would need it to be halved or so)

sonic mango
#

What's the length of your audio file?

rare furnace
#

Does it support dual channel transcribing?

sonic mango
sonic mango
sonic mango
#

is the prompt usage only through API or does it work with Whisper natively ?

restive beacon
unkempt wolf
#

The file in question was like 3 sec

unkempt bolt
#

new here, so this is just my first reading... to me it looks like your command is trying to use a lib that is looking for modern cpu features (for example AVX2) that perhaps your cpu/gpu does not support. Have you installed an nvidia card and its drivers/libraries? Might check the path mentioned /usr/lib64-nvidia is populated with expected stuff and is part of your path

#

you running that with gpu or tpu accelerator?

potent niche
restive beacon
#

How to fix could not parse multipart form error?

sour stag
#

@restive beacon Depends on what you're using to send the request and how you're formatting it.

sour stag
#

@restive beacon I see you posted code up above, seems like you're using JS and Fetch. Assuming you are pulling file from an HTML Input element, you could try something like this:

var inputElement = document.getElementById("myHTMLInputElement");
var file = inputElement.files[0];
var data = new FormData();
data.append("file", file);

// Each form field needs its own name and value
data.append("model", "whisper-1");

fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    body: data,
    headers: {
      Authorization: `Bearer ${this.token}`,
    },
  }).then((res) => res.json())
    .then((json) => {
      if(this.debug === true) console.log(json);

      return json;
    });
restive beacon
#

@sour stag I'm not using html

#

I will create whisper npm package

#

But

sour stag
#

Node JS then?

restive beacon
#

Yeah

#

In now i'm getting this error

#

But idk how to fix

sour stag
#

nodejs should be something like this:

const fs = require('fs');
const fetch = require('node-fetch');
const FormData = require('form-data');

var data = new FormData();
// The file has to be audio in a supported format
data.append("file", fs.createReadStream('example-file.mp3'));
// Each form field needs its own name and value
data.append("model", "whisper-1");

fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    body: data,
    headers: {
      Authorization: `Bearer ${this.token}`,
    },
  }).then((res) => res.json())
    .then((json) => {
      if(this.debug === true) console.log(json);

      return json;
    });
sour stag
#

@restive beacon Actually, node's version of fetch and FormData require some special treatment, try this:

import fs from 'fs';
import fetch from 'node-fetch'
import FormData from 'form-data';

var testfile = fs.createReadStream('./audio_only_spanish.wav');

var data = new FormData();
data.append("file", testfile);
data.append("model", "whisper-1");

fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    body: data,
    headers: {
      Authorization: `Bearer ${this.token}`,
    },
  }).then((res) => res.json())
    .then((json) => {
        console.log(json);
    });
sour stag
#

the second one is with "type": "module", in the package.json

#

@restive beacon I just tried the second one I posted with my own API-Key and I got a proper response:

{
  text: 'por tenernos la confianza de venir hasta acá, de invertir en su viaje del español. Nosotros aprendimos de ustedes y creo que ustedes aprendieron un poquito de español de nuestra ciudad y de Querétaro. Así es, y bueno, Querétaro es una ciudad muy bonita. De hecho, vamos a hablar un poquito del estado, no sólo de la ciudad en este episodio, pero la verdad es que hay muchos otros lugares en México que vale la pena conocer. ¡Gracias por ver el video!'
}
restive beacon
#

And

#

Thanks so much 🙏

unkempt bolt
#

hmm, sorry sergio, I got same initial errors about tensorflow libs, in colab with gpu, but they were not blocking me. the invalid argument about ae.mp3 and the low confidence that it was an mp3, and the failed to read the expected frame size says ae.mp3 might be corrupt.

spiral hawk
north relic
#

1

pearl schooner
#

isn't this just midjourney?

#

also, why advertise in a voice ai channel

normal ravine
void egret
heady star
#

Folks, is there a way to just detect the language with Whisper API?
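
As far as this channel knows there's no detect-only endpoint, but a transcription requested with `response_format="verbose_json"` should include the language the model settled on, so one workaround is to transcribe a short clip and read that field. A small Python sketch that only parses such a response. The field name `language` is my recollection of the response shape, so treat it as an assumption, and `detected_language` is a made-up helper:

```python
import json


def detected_language(response_text):
    """Pull the language out of a verbose_json transcription response."""
    data = json.loads(response_text)
    # "language" is the field name assumed here; returns None if it is absent.
    return data.get("language")
```

Sending only the first few seconds of audio keeps the cost of this down.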

warm nymph
#

There was about 50 seconds of silence at the ending in my recording, the only audio was "Do you like computers" and this was what it transcribed. "Do you like to use computers? Well, yes. I'd like to show you... the product. First of all it offers scans easy. It scans to a specified area. First of all, it is physically novel. There is no woman. If you had the opportunity, you could Project a virtual church scholar. or actually a university student... to engage in friends with one another... "

dense mountain
#

Is it possible to pass url to whisper so that it download the audio there instead of uploading? My workflow is that someone upload to cloud storage. Downloading from cloud storage and then uploading it to OpenAI feels like a bandwidth wasted. 🤔

void egret
noble crystal
#

Anyone have an idea when ChatGPT will be aware of whisper api so it can provide help with it ?

void egret
#

Likely not for a while. There's been zero mention of them updating the knowledge base so far

tepid escarp
#

hii

fluid locust
#

Hi all. i found that the Whisper API doesn’t work properly when returning anything other than JSON, ie. srt, vtt. An OpenAI.error is returned with a HTTP code of 200 (the correct output is actually returned, just in the error). Seems like the API isn’t able to handle the other accepted formats

#

Anyone find similar problem?

patent shale
#

I have not found that problem with the Whisper API. It returned both SRT and VTT correctly for me during testing last Friday.

#

Confirmed that I am seeing the same today.

sonic mango
#

Is there some kind of roadmap for whisper functionality ?

#

functionalities*

patent shale
# sonic mango functionalities*

I haven't seen a roadmap yet, but I would assume they'd expand the number of languages that can be transcribed at some point.

hoary surge
#

playing around with the whisper api that openai released a week or so ago. Wondering how I can get timestamps working as in the selfhosted Whisper?
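
One way to get timestamps out of the API is to request `srt` or `vtt` output, which carries segment-level (not word-level) times, and parse it. A rough Python sketch of the parsing side, assuming ordinary SRT cues. `parse_srt` is a made-up helper, not part of any library:

```python
import re

_TS = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def _seconds(stamp):
    """Convert an HH:MM:SS,mmm stamp to seconds."""
    h, m, s, ms = (int(x) for x in _TS.match(stamp).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text):
    """Turn SRT text into a list of (start_sec, end_sec, text) tuples."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that is not a normal cue
        start, end = (part.strip() for part in lines[1].split("-->"))
        segments.append((_seconds(start), _seconds(end), " ".join(lines[2:])))
    return segments
```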

merry junco
#

does anyone know why the recognize_whisper function in the speech_recognition python lib isn't working? it doesn't seem to generate any output

flat horizon
#

Can the Whisper API handle video input, or only audio?

patent shale
#

audio only

#

File types supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm

flat horizon
#

Thanks. Maybe I can use Transloadit's audio extraction API in the middle.

toxic ingot
#

Hi, Do you know if there is any company that offers the services of a chatbot but with integrated chatgpt technology?

Thanks

flat horizon
#

What sort of turnaround times per minute are people getting with the Whisper API?

toxic ingot
#

or some way that I can use chatgpt technology to train it with my website and other pdf documents, and integrate it into my website like search engine

upbeat totem
#

Hi guys, a question: how would you make whisper create a vtt file if the audio file is larger than 25MB?

void egret
void egret
#

I haven't tried messing around with prompt but i'm pretty sure it won't affect the format style. It should affect the actual text

upbeat totem
#

i see, ty

fluid locust
#

Anyone tried combining whisper api and pyannote library to get speaker diarization?

autumn bolt
#

Anyone following the Windows 11 voice command preview? Pretty cool the way it works, the show numbers/grid thing is really nice, I imagine you'll be able to relabel these numbers with some text once you know the few you're looking for, it would be great if it could just monitor what you're doing for the day and give you the voice commands, then let you tailor the keyword mapping etc, it could even calculate the amount of time it took you to complete certain tasks and suggest where voice commands would have been faster.

Should be able to build something similar with Whisper... also handy if you want to talk to ChatGPT 😀

final mica
#

Has anyone else noticed Whisper get stuck in loops when run on GPU? I am running it on the gpu on my local system, and it gives basically perfect results when run on cpu, and on gpu I'm seeing the expected device system usage and compute-time speedups, but the output is filled with stuff like this, as one example:
[03:33.500 --> 03:34.500] Oh man, no way.
[03:34.500 --> 03:35.500] Oh man, no way.
[03:35.500 --> 03:36.500] Oh man, no way.
[03:36.500 --> 03:37.500] Oh man, no way.
[03:37.500 --> 03:38.500] Oh man, no way.
(x12)
The above text does actually appear in the audio, but only once and about a minute earlier in the clip -- the above timestamps have been pushed later because of the correctly-transcribed text occurred in the interim.
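
This doesn't address why the GPU run loops, but as a stopgap the repeated lines can be filtered out after the fact. A Python sketch assuming the output has already been parsed into `(start, end, text)` tuples. `collapse_repeats` is a hypothetical helper, not something Whisper provides:

```python
def collapse_repeats(segments, max_repeats=1):
    """Keep at most max_repeats copies of a consecutively repeated line.

    `segments` is a list of (start, end, text) tuples.
    """
    kept = []
    run = 0  # how many times the current text has already repeated
    for seg in segments:
        if kept and seg[2].strip() == kept[-1][2].strip():
            run += 1
            if run >= max_repeats:
                continue  # drop the extra copy
        else:
            run = 0
        kept.append(seg)
    return kept
```

It will also drop genuine back-to-back repeats, so it's only reasonable when the repeats are known to be an artifact.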

dim viper
opaque oak
#

Is Whisper able to translate from English to other languages? What would be the best way to turn existing English subtitles into another language? Would GPT be better?

void egret
opaque oak
gloomy ice
#

Hi all, does the Whisper API support word level timestamps? If not, is it on the roadmap?

round osprey
#

is whisper as good as azure STT combined with azure NR for avg quality audio ?

wooden torrent
#

JOE

lean girder
#

I've been struggling with this for the last few hours. This keeps returning "message": "you must provide a model parameter". Anyone know where I'm going wrong?

$url = 'https://api.openai.com/v1/audio/transcriptions';

$file_path = $_FILES['file']['tmp_name'];
$file_name = basename($_FILES['file']['name']);
$file_type = mime_content_type($file_path);
$file_data = file_get_contents($file_path);

$body = array(
'model' => 'whisper-1',
'file' => array(
'name' => $file_name,
'type' => $file_type,
'bits' => $file_data,
),
);

$headers = array(
'Authorization' => 'Bearer ' . $OPENAI_API_KEY,
);

$request_args = array(
'method' => 'POST',
'headers' => $headers,
'body' => $body,
'timeout' => 0,
);

$response = wp_remote_post($url, $request_args);

summer void
#

Any clue what might be going wrong when i try to use the createTranscription function?

Whenever I use the createTranscription function i get a "TypeError: localVarFormParams.getHeaders is not a function".

I'm in a react native project, recording the microphone, so the code to trigger whisper is simply:

  const recordingURI = recording.getURI()
  const response = await fetch(recordingURI)

  openai
  .createTranslation(response.body, 'whisper-1')
  .then((response) => {
    //do something here
  });```


What am I doing wrong here?
final wolf
#

ChatGPT has helped me ok with Whisper. I was able to create all the scripts I wanted today
It was a bit wrestling. But with python-openai lib it is really simple. So apart from the request itself, chatgpt can help you with everything else

final wolf
stiff leaf
#

is there any way to explicitly tell the whisper API the language spoken in the audio, rather than rely on it being auto-detected?

void egret
stiff leaf
fair sierra
#

what is whisper?

void egret
#

Whisper is a speech to text model developed by openai. Given audio it generates a transcription of it

round osprey
#

Did anyone test whisper against for instance azure noise reduction + STT?

#

Calibrated audio likely boost STT performance

void egret
#

I've exclusively used whisper, you might find some performance tests on the discussions tab of the repo though

boreal hill
#

anyone know how to solve this error? error': {'message': 'Maximum content size limit (26214400) exceeded (68638790 bytes read)', 'type': 'server_error', 'param': None, 'code': None

void egret
#

How large was the file?

fluid locust
#

anyone knows how to circumvent the 25 MB file size limit?
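
The limit itself can't be bypassed from the client side as far as anyone here has found, so the practical route is splitting (or re-encoding at a lower bitrate). A Python sketch that works out split points from the bitrate so each piece stays under the limit. `plan_chunks` and the 10% safety `margin` are assumptions made for illustration:

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def plan_chunks(duration_s, bitrate_kbps, max_bytes=MAX_UPLOAD_BYTES, margin=0.9):
    """Return (start_s, end_s) pairs whose encoded size stays under max_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    # Leave some headroom for container overhead and bitrate variation.
    chunk_s = int(max_bytes * margin / bytes_per_second)
    if chunk_s < 1:
        raise ValueError("bitrate too high for the size limit")
    chunks = []
    start = 0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks
```

For example, an hour of 128 kbps audio comes out as three pieces of at most 1474 seconds. Cutting at fixed times can split a word in half, so cutting at a pause near each boundary is probably better.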

subtle radish
upbeat totem
#

for some reason the transcriptions in vtt arrive truncated in my heroku deploy but perfect on my local pc, any idea what it could be?

solemn dirge
#

when I code a English file into Chinese, here is the code I entered "!whisper "test.mp3" --language Chinese", but the result is only few Chinese others are English, is there anyone can help, BTW I don't know python at all

fluid locust
fluid locust
# subtle radish Do you mind sharing code?

from io import BytesIO

import openai
from flask import request  # assumption: this runs inside a Flask route
from pydub import AudioSegment

def transcribe():
    file = request.files['file']
    audio = file.read()

    audio_segment = AudioSegment.from_file(BytesIO(audio), format="mp3")

    # Slicing with a step yields consecutive chunks of that many milliseconds
    # (25 seconds each here)
    chunks = audio_segment[::25000]

    # Transcribe each chunk and concatenate the results
    results = []
    for chunk in chunks:
        with BytesIO() as chunk_file:
            chunk.export(chunk_file, format="mp3")
            chunk_file.seek(0)
            # The name lets the API work out the file format
            chunk_file.name = "audio.mp3"
            transcription = openai.Audio.transcribe("whisper-1", chunk_file)
            results.append(transcription['text'])

    return " ".join(results)
#

shoutout gpt3.5 turbo for that code

upbeat totem
dim viper
stone chasm
#

hey guys. who do i speak to for feedback on whisper?

stone chasm
#

okay. what is this? i got muted for reporting a bug with the API?!?!? this is absolute bullocks! LOL

fluid locust
#

Anyone experience whisper hallucinating on empty sections? is there anyway to prevent this? maybe with vad filter?
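
A crude version of the VAD idea is an energy gate: only send the stretches of audio that are louder than some threshold. A Python sketch over raw PCM sample values. The function names, the 480-sample frame, and the threshold of 500 are all arbitrary assumptions, and a real VAD would be more robust than this:

```python
import math


def frame_rms(samples):
    """Root-mean-square level of a list of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_spans(samples, frame_len=480, threshold=500.0):
    """Return (start, end) sample index pairs for frames louder than threshold."""
    spans = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        loud = frame_rms(frame) >= threshold
        if loud and spans and spans[-1][1] == start:
            # Adjacent loud frame: extend the current span.
            spans[-1] = (spans[-1][0], start + len(frame))
        elif loud:
            spans.append((start, start + len(frame)))
    return spans
```

Only the returned spans would then be cut out and transcribed, so the model never sees the silent sections.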

stone chasm
fluid locust
#

sorry to hear that
you got any headway on that?

stone chasm
#

i haven't. but most of my hallucinations are due to short audio segments. i'm thinking of either combining them into longer segments or just dropping them altogether

#

do you get things like www dot globalonenessproject dot org ? i got lots of those...

stone chasm
#

can i please create a thread to share all the hallucinations? they are so funny... i had another one here: "Produced & Uploaded by Houthi Movies"

#

and another.... "For more information on Geography Now, visit geography(.)nsw(.)ca"

fiery warren
#

Anyone know if there is only one or more whisper models available via api?

amber dune
#

Afaik only whisper-1 is available rn

peak vergeBOT
#
Please speak English.

@stone chasm, your message was removed. We do not currently have capacity to support other languages.

stone chasm
#

come on mods... who can i chat to for feedback? lol

fluid locust
uncut cipher
#

Hi, does anyone knows a project/algorithm that could reliably detect when someone finished speaking using real time transcription with whisper?
I want to make an API call as soon as the user finished speaking/at the end of his sentence
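
One simple approach that doesn't depend on Whisper at all is to watch the microphone level and treat a long enough quiet stretch after speech as the end of the utterance. A Python sketch of that state machine. The class name, threshold, and frame count are assumptions to tune, and the per-frame loudness is expected to come from your own audio capture code:

```python
class EndOfSpeechDetector:
    """Signal the end of an utterance after enough quiet frames follow speech."""

    def __init__(self, threshold=500.0, min_quiet_frames=25):
        self.threshold = threshold
        self.min_quiet_frames = min_quiet_frames
        self.heard_speech = False
        self.quiet_run = 0

    def feed(self, frame_level):
        """Feed one frame's loudness; return True once the speaker has stopped."""
        if frame_level >= self.threshold:
            self.heard_speech = True
            self.quiet_run = 0
            return False
        if not self.heard_speech:
            return False  # leading silence, nobody has spoken yet
        self.quiet_run += 1
        if self.quiet_run >= self.min_quiet_frames:
            # Reset so the detector can be reused for the next utterance.
            self.heard_speech = False
            self.quiet_run = 0
            return True
        return False
```

When `feed` returns True, that's the moment to send the buffered audio off for transcription. It detects pauses, not sentence ends, so a slow speaker may trigger it mid-sentence.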

dusty willow
#

But I am leaving I have had to change too many things And then they want my phone number which I'm not giving so I'm leaving this server id dumb

stone pine
#

Does anyone have language detection examples to use?

autumn bolt
#

can somebody help me with the whisper thing I want to to take audio from the microphone and turn it into text for python btw

stone chasm
#

there are a few libraries out there. i've tried a few and they work pretty well, but my computer doesn't have the capacity to process large files. it took 1hr for pyannote to diarize 20mins of audio!!! so i now diarize using google api remotely and then pass the segments to whisper API for processing.
one thing i noted tho, the whisper API (unlike the whisper library which you process locally) tends to make things up when the audio is not very clear, or when the audio segment is too short.

stone chasm
hearty fable
#

I have a MP3 player, maybe I can help??

stone chasm
#

insyaallah brother

stone chasm
#

"please like and subscribe.... bye bye bye bye bye......." this is so bad.....
actual transcript in the video / audio during the 17.3 seconds were:
[speaker 1] okay, let's have a look. share screen.
[speaker 2] it's not popping up for me.
[speaker 1] you haven't got that?
[speaker2] nope
[speaker 1] okay let's try again.
[speaker 2] and if anybody's listening, audio to the recording afterwards i will walk and talk you through this.
[speaker 1] have you got it now?
[speaker 2] okay, we've got it, we've got it, we've got it.

earnest horizon
#

I'm trying to hit the Whisper API from javascript. Is the correct order of endpoints to hit the files endpoint first to upload my audio file, then to pass the uploaded path to the transcription endpoint?

If so, what value should I pass for the purpose parameter to the files endpoint? The examples for the files endpoint all have purpose="fine-tune" in them, but that doesn't sound right for the purpose of transcription.

void egret
fair coyote
slow urchin
#

Can you hear me

gentle sinew
#

Hello, I'm a novice with a few good computer skills and I've started a project to dub an English video into French. It is a mentorship accessible on YouTube.
I used Whisper to transcribe the 3 hours long video.
I had a few mistakes, like for example, "How" which became "So" according to the model.
The voice is that of a man who speaks American with a strong accent. I have the impression that the "tiny.en" model gave the best results. I didn't try to tune the settings, but do you think it could be improved for American?
The next part of the project will be the translation, either ChatGpt or Deepl I don't know yet, and I wanted to know if you had any advice on which tools to use.
Text to speech, time-stamped? to then integrate them to the video. Do you think that it is possible to automate it with the time-stamping.
I hope I'm not going off topic. Thanks in advance, Rom

gentle sinew
#

My project is to make accessible to people who have difficulty with English, a series of videos with always the same person, the same voice.
Is it possible to train Whisper on a particular voice?
For example, to translate perfectly several passages or a complete video and to provide it to Whisper as an example?

final wolf
#

Does anyone have a github repo recommendation for generating subtitles?
I wonder how you get the lines to be properly timestamped. Is that something you do during the transcription, or afterwards?

Both seem complicated

gentle sinew
#

whisper does it perfectly

#

Vtt or srt or json

final wolf
#

But not the whisper API, I think. I'm not sure if my laptop can run the whisper locally

void egret
bold cipher
#

Hello!!!

#

Timestamps how ?????????????

bold cipher
gentle sinew
earnest horizon
autumn bolt
#

if any1 has good experience in tkinter module pls hit me up, its a basic help ( im trying to build a GUI for tic tac toe)

fair coyote
earnest horizon
#

Was hoping someone could give some insight into an error I keep getting:

Whisper data from response:  {"error":{"message":"Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']","type":"invalid_request_error","param":null,"code":null}}
iPhone:
topicsResult:  null
void egret
#

Are you uploading data that was in one of those formats?

earnest horizon
#

Sorry, was having trouble posting the rest - this is a very strict server!

#

Here is the full description. I think I am uploading an mp4. My blob's type is "video/mp4"

outer summit
#

Hi is it possible to do real time transcription using Whisper opensource model?

dense pulsar
#

I heard somewhere whisper only natively supports 30 second or less audio clips, meaning it can provide the best accuracy as long as your clip is around that length. Is this 100% true? I'm trying to transcribe lengthy audio clips (some being hours in length), and I'd rather split them into smaller clips larger than 30s each to make the most of the 50 requests per minute limit. Would splitting the clips into 60 or 120 second clips (or longer) be fine to do as well? Or would it be less accurate than 30s for each clip?
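
A rough sketch of the bookkeeping for splitting a long recording, not an answer to the accuracy question: the model itself works on 30-second windows, and the chunk length below is just a parameter you can experiment with. plan_chunks and its optional overlap are hypothetical helpers; cutting the audio at these offsets would be done separately with a tool such as ffmpeg.

```python
def plan_chunks(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 0.0):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words cut at a boundary appear twice.
        start = end - overlap_s
    return chunks


# A two-hour recording in 120-second pieces with 2 seconds of overlap.
pieces = plan_chunks(2 * 60 * 60, chunk_s=120, overlap_s=2)
print(len(pieces), pieces[:2], pieces[-1])
```

Comparing the transcripts of the same file at 30, 60, and 120 seconds is probably the quickest way to see whether longer chunks cost any accuracy on your audio.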

gentle sinew
#

Search for a project named openai_whisper_stt
on a site called Hugging Face (huggingface*co)

#

you can search in a database with many project ideas

#

you can also search on Google Colab or Jupyter

gentle sinew
keen gulch
#

whispering

#

whisper is honestly cool

final wolf
#

I wonder what the verbose json includes. I'll have to try it out I guess

void egret
remote jasper
#

I've got a collection of phone calls made to 911 in the middle of a flood from almost 30 years ago, and I'd like to use Whisper to transcribe them. The audio quality is generally poor; they were originally recorded on giant clunky reel-to-reel tape systems. Whisper seems to do a generally decent job on them -- the medium.en model works best based on my testing.

I did a test batch of three calls using the following PowerShell command:

ls -file | % { & Write-Host "Now working on $_" -ForegroundColor green; whisper $_.FullName --model medium.en --language en }

It did the first call fine, then got stuck on the second one. After it spent two hours transcribing a call that only lasted 90 seconds, I canceled the job. That particular call file ends in about seven seconds of silence. My theory is that Whisper is failing to detect that, and gets stuck in a loop analyzing the same period of silence over and over looking for speech that isn't there. There are probably quite a few calls like that out of the roughly 2,000 calls in the collection.

So my question is: how does the --no_speech_threshold parameter work? My googling has led me to Github tickets discussing things, but so far none of them have helped. It defaults to 0.6. Should the value be higher or lower in order to make Whisper less sensitive to chunks of silence?

tacit hill
#

Did Microsoft buy OpenAI?

stone chasm
stone chasm
noble snow
#

does removing the silent audio make whisper cheaper / faster to run?

noble snow
#

This is how you do it via a S3 link

#

// Assumes the AWS SDK v3; `s3` is an S3Client instance created elsewhere,
// and getFileBlob is just an illustrative name for the wrapper function
import { GetObjectCommand } from "@aws-sdk/client-s3";
import { Readable } from "stream";

const getFileBlob = async (bucketName: string, fileKey: string): Promise<Blob> => {
// Set the parameters for the S3 getObject operation
const params = {
Bucket: bucketName,
Key: fileKey,
};

// Call the S3 getObject operation with the specified parameters
const getObjectOutput = await s3.send(new GetObjectCommand(params));
//@ts-ignore
const body = getObjectOutput.Body;

// Extract the Readable stream from the SdkStream object
// @ts-ignore
const readableStream = Readable.from(body);

// Convert the stream data to a buffer
const chunks = [];
for await (const chunk of readableStream) {
chunks.push(chunk);
}
const buffer = Buffer.concat(chunks);

// Create a new Blob object from the buffer
return new Blob([buffer], { type: "application/octet-stream" });
};

#

const transcribe = async (fileBlob: Blob, fileKey: string) => {
// Build the multipart form: the audio file plus the model name
const formData = new FormData();

formData.append("file", fileBlob, fileKey);

formData.append("model", "whisper-1");

const response = await fetch(
"https://api.openai.com/v1/audio/transcriptions",
{
method: "POST",
body: formData,
headers: {
// Template literal needs backticks, otherwise this line does not parse
Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
},
}
);
return await response.json();
};

stone chasm
noble snow
slow urchin
#

Where my people at?

gentle sinew
tepid plover
#

Hi, i’m trying to work with the whisper api for the first time. i wanted to know if there’s any way to get the transcription of a youtube video without downloading its audio.
i tried to use YouTubeDL to extract the info but i’m stuck at the file parameter

small juniper
remote jasper
#

I was afraid of that. Manually checking all 2000+ calls for areas of silence is probably not practical. I think I'll write a Python script to handle the batch job and build in a timer on each job so I can kill the process and move on to the next one if something takes too long.
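
The per-job timer idea could look roughly like this. It is a sketch that assumes the whisper command is on the PATH and reuses the flags from the PowerShell one-liner; run_with_timeout and whisper_command are made-up names, and the demo uses a sleeping Python process as a stand-in so it runs without any audio.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_s):
    """Run one command; return 'ok', 'failed', or 'timeout'."""
    try:
        completed = subprocess.run(command, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


def whisper_command(audio_path):
    """Build the CLI call with the same flags as the PowerShell loop."""
    return ["whisper", str(audio_path), "--model", "medium.en", "--language", "en"]


# Demo with a stand-in command that sleeps longer than the limit.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, timeout_s=1))
```

In the real batch you would loop over the call files, call run_with_timeout(whisper_command(path), timeout_s=...) for each one, and write the paths that return "timeout" to a list for later inspection.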

remote jasper
#

Actually I did some more testing, and setting --no_speech_threshold 0.25 on the command seems to have made it work fine. I think I'll try running the whole batch with that. I'll just have to keep an eye on it, and if it gets stuck I'll move the finished files to some other folder, move the problem file to quarantine until I can go through and remove the silence, and then restart the loop on the remaining files.
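
As far as I understand the open-source transcribe code, a 30-second window is dropped as silence when the model's no-speech probability is above --no_speech_threshold and the average log-probability of the decoded text is not above --logprob_threshold (default -1.0). So lowering the value makes Whisper more willing to skip silent chunks, which fits the 0.25 result. A sketch of that decision as a standalone function, not the library's actual code:

```python
def should_skip_window(no_speech_prob, avg_logprob,
                       no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Decide whether a window would be treated as silence and skipped."""
    if no_speech_threshold is None:
        return False
    skip = no_speech_prob > no_speech_threshold
    # A confident decoding overrides the silence guess.
    if logprob_threshold is not None and avg_logprob > logprob_threshold:
        skip = False
    return skip


# Hypothetical numbers for a window of tape hiss with a low-confidence decoding.
print(should_skip_window(0.4, -1.5))                            # default 0.6
print(should_skip_window(0.4, -1.5, no_speech_threshold=0.25))  # lowered
```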

noble snow
#

Does anybody else keep getting "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']"?

#

The video file I am sending is indeed webm, and this was working yesterday

dense pulsar
#

anyone else getting invalid responses and error statuses when making a post request to whisper?

#

got status code 100 before, now its always 0

stone chasm
#

Do you mean video?

#

You can use the ytdl library to download only the audio stream from YT. Then pass that audio stream to Whisper to transcribe it.

#

If you really don’t want to use the audio, then maybe try to use ytdl to download the subtitles? I haven’t tried but I think that’s a function.

autumn bolt
#

are there any more whisper apps or programs that can aid in language learning?

#

such as the one mentioned on the open AI website?

#

also, are there any extensions to have whisper integrated into chatgpt?

modest yacht
#

how to make navbar aesthetic

carmine tree
#

Man whisper combined with other tools really makes transcription embarrassingly easy - I wanted to learn about a popular Java library that somehow had very few videos so I took a video in Hebrew and ran it through Whisper
22 lines of code, including all the setup, and I have an mp4 and also something that's decently generic for future similar needs
(the subtitle utility in Whisper changed a bit so I did have to lock in on an older version of the whisper python library so I could use some code I had already used in the past)

high mountain
#

is there whisper for personal usage (not api) ?

void egret
high mountain
#

But is there any web version so people with no coding skills use it as well?

carmine tree
#

there are lots of hosted versions to do various different things on hugging face, but your mileage may vary and for anything of significant length they will likely time out

#

what are you trying to do?

high mountain
#

I want to send a mp3 file and want whisper to transcribe it.

#

Ok I found this model on a website called Replicate. The best thing is that I wasn't even asked for an API key 🙂

small juniper
high mountain
#

I mentioned that I am not very good at programming

carmine tree
#

if it's just a single mp3 you're probably best off just using a service and you can probably do it for free with a trial although a lot of these AI transcription services are probably using outdated technology at this point (I may be entirely wrong on this) so your mileage may vary

#

otherwise I could probably help you with a Google Colab notebook that you could run for free, but then again - you probably shouldn't be running code from random people on the internet if you don't know what you're doing 😆

main salmon
#

Hi everyone. I just want to know how to improve the inference speed of Whisper when I run it on my local PC (not using the Whisper API service). I mean the Whisper model that was released in September 2022.

severe spruce
gentle sinew
#

It's on GitHub :
Youtube Videos Transcription with OpenAI's Whisper

remote jasper
#

In the first 24 hours of execution, Whisper transcribed 204 phone calls for me, running on a local machine. 1,744 to go! I estimate another eight and a half days to finish the entire collection.

gentle sinew
#

And read the code.
Or install it yourself and launch it; it's a few lines in a shell

wintry cobalt
#

is there any example project for real-time transcription from mic using whisper and pyaudio?

rare raft
unborn pecan
#

could you help me to learn python

wise sparrow
#

Does the openai whisper api host the audio

#

or we have to host it first and then transcribe

dry harbor
#

Can I ask it to code a website for me from here?

trim sinew
#

Does anyone know if there is documentation available for the 'prompt' flag and how to use it? I need Whisper to create more sequences, preferably after each comma. Currently, Whisper can put 2 sentences into one sequence. There is also an issue with synchronizing the end time of one sequence with the start time of the next one. Is it possible to generate, for example, a JSON file with the start time of each word?

mild basin
#

a few more examples on how to use the prompt most effectively (especially for assisting translation) would be nice. I'm doing some testing and will post if I find anything useful

mild basin
#

According to this graphic from the Github page, there is timing data being used. No idea how to access it and may need to be run locally and modified

mild basin
trim sinew
# mild basin If you set response_format to verbose_json in your requests, it will return timi...

Thank you for your response. I have read those details carefully and I am using that JSON verbose format exactly. However, the problem is that some sentences have 20-25 words and should be split into at least 4-5 parts. These captions are unreadable if they cover half of the video player's screen. If it were possible to obtain the start time for each word, I would easily split them myself.
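
Until per-word timestamps are available, one workaround is to split each long segment by word count and interpolate the times linearly. This is only an approximation (it assumes every word takes the same amount of time), and split_segment is a made-up helper:

```python
def split_segment(start: float, end: float, text: str, max_words: int = 6):
    """Split one long caption into shorter ones with interpolated times."""
    words = text.split()
    if not words:
        return []
    per_word = (end - start) / len(words)
    parts = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        part_start = start + i * per_word
        part_end = start + (i + len(group)) * per_word
        parts.append((round(part_start, 3), round(part_end, 3), " ".join(group)))
    return parts


# A made-up 10-word segment lasting 10 seconds, split into 5-word captions.
for part in split_segment(0.0, 10.0, "a b c d e f g h i j", max_words=5):
    print(part)
```

Because each part ends exactly where the next one starts, this also avoids gaps or overlaps between consecutive captions inside one segment.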

mild basin
trim sinew
tropic hare
#

Has anyone made an easy to use version of Whisper yet? And how would you say it compares to machine translating that was done earlier for media?

#

Also, how would i ever start learning all this stuff, i don't think standard cs degree comes even close to this, and I'm 18 now, by the time i get through all the procedural learning, the field will already have become saturated like the rest of innovations

patent shale
warped sedge
#

is there an ability to differentiate between people?

wise sparrow
#

Then send the link to api

void egret
void egret
patent shale
wise sparrow
#

Ahh thats so nicee other platforms require you to upload to s3 or sum first

void egret
#

It has a 25mb max so you'll need to split the data into chunks if it's too much

wise sparrow
#

I wish they have voice cloning aswell

warped sedge
feral cedar
#

I tried it with the language set to English to transcribe audio, and it works well: even though the audio was in Ukrainian, it output an English translation.
But when I use the parameter language=uk to get Ukrainian, it returns code 0 and still charges me for the work)

#

Has anybody tried working with a language other than English?

magic lance
#

how do you get timestamps out of .transcribe?

void egret
magic lance
void egret
magic lance
#

I tried in the params arg, but didn't work

void egret
#

Python lib as in running it locally or their api library?

magic lance
void egret
#

Add response_format="verbose_json"

#

In that function call

serene plume
#

Has anyone done a comparison of Whisper to Google Speech-to-Text?

magic lance
void egret
magic lance
#

wonder why it wasn't showing up in the type hints in vs code

serene plume
#

I wrote a python script that lets me record from my mic and it sends it to both. Pretty cool to see the differences.

void egret
#

How well does google speech to text handle named entity recognition?

serene plume
#

Really good, think Google Home Assistant good.

void egret
#

I don't use any home assistant type of devices

serene plume
#

Or maybe I misunderstood your question

void egret
#

Picking up when to capitalize the names of people or products

serene plume
#

I'm going to see right now actually

patent shale
void egret
#

That tracks with what I've heard as they leverage their llm for whisper

serene plume
#

Whisper is more accurate with sentence structure and capitalization. But its spelling is more likely to be wrong. It's more phonetic, but you can understand what it meant. Models are down for me so I can't continue testing

slim hull
#

is there a way to get a word from embedding vector with node.js ?

void egret
#

You can't go from embeddings back to text, you need to store what the text maps to

slim hull
#

so

#

how can I get the distance between that embeddings? I'm a little confused

void egret
#

You'll want to use cosine similarity, if you use a vector database like weaviate or pinecone they do the math for you. #dev-chat is the channel for this
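
For the distance question, a small pure-Python sketch: cosine similarity between two vectors, plus a lookup that keeps each text next to its embedding so you always know which portion of the content a vector belongs to. The three-dimensional vectors are toy values, not real embeddings, and most_similar is a made-up helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, store):
    """Return the (text, vector) pair in store that is closest to query."""
    return max(store, key=lambda item: cosine_similarity(query, item[1]))


# Store the original text together with its vector; it cannot be recovered later.
store = [
    ("Whisper transcribes audio", [0.9, 0.1, 0.0]),
    ("Pinecone stores vectors", [0.0, 0.2, 0.9]),
]
text, _ = most_similar([1.0, 0.0, 0.1], store)
print(text)
```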

slim hull
#

I need to know which portion of the content each embedding vector corresponds to

#

I was trying with Supabase pgvector dallef . So I will try with pinecone, ty!

long matrix
#

I want to translate english audio into other languages, so far it looks like the translate flag is only for converting to english.. how can i specify another language for that?

void egret
#

They don't currently translate into other languages

gleaming meadow
#

can we transcribe in realtime? like text streaming from a live recording

raven gate
#

Can someone tell me why this code isn't returning a response when doing a post request?:

// Next.js API route support: https://nextjs.org/docs/api-routes/introduction
const dotenv = require("dotenv").config()
const axios = require("axios")
const fs = require("fs")
const FormData = require("form-data")
const formidable = require("formidable")

const key = process.env.OPEN_AI_KEY
const model = "whisper-1"

export default async function handler(req, res) {
    if (req.method !== "POST") {
        res.status(405).send({ message: "Only POST requests allowed" })
        return
    }

    return new Promise((resolve, reject) => {
        let formObj = new formidable.IncomingForm()

        const formData = new FormData()
        formData.append("model", model)

        formObj.parse(req, function (error, fields, file) {
            let filepath = file.fileupload.filepath
            formData.append("file", fs.createReadStream(filepath))
            axios
                .post("https://api.openai.com/v1/audio/transcriptions", formData, {
                    headers: {
                        Authorization: `Bearer ${key}`,
                        "Content-Type": `multipart/form-data; boundary=${formData._boundary}`,
                    },
                })
                .then((res) => {
                    return res.status(200).send({ res: res.data })
                })
                .catch((err) => {
                    return res.status(500).send({ error: err })
                })
                .finally(() => res.status(204).end())
        })
    })
}
#

Is it because of the boundary of the form data?

lost thistle
#

It seems the issue is not with the boundary of the form data but with how you are handling the response from the axios post request. In your .then block, you are using the res variable as both the response from your Next.js API route and as the axios response, causing a conflict.

#

To fix this issue, you can rename the axios response variable to avoid conflicts with the Next.js API route's res. Here's the updated code:

#
const dotenv = require("dotenv").config()
const axios = require("axios")
const fs = require("fs")
const FormData = require("form-data")
const formidable = require("formidable")

const key = process.env.OPEN_AI_KEY
const model = "whisper-1"

export default async function handler(req, res) {
    if (req.method !== "POST") {
        res.status(405).send({ message: "Only POST requests allowed" })
        return
    }

    return new Promise((resolve, reject) => {
        let formObj = new formidable.IncomingForm()

        const formData = new FormData()
        formData.append("model", model)

        formObj.parse(req, function (error, fields, file) {
            let filepath = file.fileupload.filepath
            formData.append("file", fs.createReadStream(filepath))
            axios
                .post("https://api.openai.com/v1/audio/transcriptions", formData, {
                    headers: {
                        Authorization: `Bearer ${key}`,
                        "Content-Type": `multipart/form-data; boundary=${formData._boundary}`,
                    },
                })
                .then((axiosRes) => {
                    return res.status(200).send({ res: axiosRes.data })
                })
                .catch((err) => {
                    return res.status(500).send({ error: err })
                })
                .finally(() => res.status(204).end())
        })
    })
}
#

In this updated code, I changed the axios response variable from res to axiosRes to avoid the conflict. This should resolve the issue and return a response as expected.

raven gate
#

Alright thanks for pointing that out

#

However, still not getting a response. This is what I have in the frontend:

const submitHandler = (event) => {
        event.preventDefault()

        const data = new FormData(event.target)
        data.set("fileupload", data.get("fileupload"))

        const config = {
            headers: { "content-type": "multipart/form-data" },
        }

        axios
            .post("/api/whisper", data, config)
            .then((res) => console.log(res))
            .catch((err) => console.log(err))
    }

    return (
        <>
            <Head>
                <title>Lirical App</title>
                <meta name='viewport' content='width=device-width, initial-scale=1' />
                <link rel='icon' href='/favicon.ico' />
            </Head>
            <main className={styles.main}>
                <div className={styles.center}>
                    <form onSubmit={submitHandler}>
                        <label htmlFor='fileupload'>Upload Audio file</label>
                        <input required type='file' name='fileupload' accept='audio/*' />
                        <button type='submit'>Submit</button>
                    </form>
                </div>
            </main>
        </>
    )
#

Are there any py or open ai dependencies I need installed?

upbeat elm
#

Hello, I'm trying to find open-source code for Whisper in Python

void egret
upbeat elm
#

Okay thanks, Im trying to use it to make a chatbot, just for the stt

raven gate
#

Ok I got my file to post to my whisper endpoint correctly, but how should I append to formData if the file doesn't include a file path from client?

cobalt pivot
#

hey, how can I avoid unsupported chars ? ```

{
"text": "Dis bonjour \u00e0 ma m\u00e8re."
}```
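
Those are not unsupported characters, just JSON escape sequences for à and è. Any JSON parser turns them back into the real characters; in Python, for example:

```python
import json

# The raw response body exactly as it appears on the wire.
raw = '{"text": "Dis bonjour \\u00e0 ma m\\u00e8re."}'

decoded = json.loads(raw)
print(decoded["text"])

# When re-serializing, ensure_ascii=False keeps the accented characters as-is.
print(json.dumps(decoded, ensure_ascii=False))
```

So the fix is usually to parse the response as JSON rather than printing the raw body.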

autumn bolt
#

anyone has tried sending a recorded audio file from safari straight to whisper endpoint v1?

#

it always complains

"message": "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']",

#

i have tested my code on chrome, it works, but not on safari

#

even chatgpt4 didnt help hehe

raven gate
#

Strange things indeed

#

Safari is kinda like IE imo

autumn bolt
#

const speechToText = async (blob: Blob) => {
const formData = new FormData();
formData.append('file', new File([blob], 'audio.mp3', { type: 'audio/mp3' }));
formData.append('model', 'whisper-1');

return axios({
method: 'POST',
url: 'https://api.openai.com/v1/audio/transcriptions',
data: formData,
headers: {
'Authorization': 'Bearer <TOKEN>',
'Content-Type': 'multipart/form-data',
},
})
.then((response) => {
return response.data.text;
});
}

#

it is, safari is becoming a headache.

raven gate
#

I used formidable in the backend to intercept the incoming form and get the file from the files object

#

I’ll also try my code on safari soon

#

Also, after I think I got everything working right, how would I be hitting the quota limit already? Can we not use it with free trial?

idle basin
#

NOOOO

raven gate
#

I find some youtube vids misleading then when they say it's free

odd heron
forest rover
#

trying to understand this

Cell In [49], line 1
----> 1 result = whisper.transcribe(audio='talking.wav', model="Whisper")

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/whisper/transcribe.py:75, in transcribe(model, audio, verbose, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, condition_on_previous_text, **decode_options)
     32 """
     33 Transcribe an audio file using Whisper
     34
   (...)
     72 the spoken language ("language"), which is detected when `decode_options["language"]` is None.
     73 """
     74 dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
---> 75 if model.device == torch.device("cpu"):
     76     if torch.cuda.is_available():
     77         warnings.warn("Performing inference on CPU when CUDA is available")

AttributeError: 'str' object has no attribute 'device'
#

on MacOs M1 processor

#

it seems in transcribe.py model is a string by default, but it is expecting a property model.device

#

Hmm. Seems to be a bug; they probably wanted to load the model when a string is provided, but didn't write that part. You can get past it by providing a model instance instead of a string, for example model = whisper.load_model("base") then ... model=model)

#

so, question answered I guess. might file a bug, not sure if this even latest code, will look into it 🙂

gaunt plank
#

How to get rid of your toxic thoughts

fickle cave
#

What r you guys talking about

strong pendant
#

^^^i am also curious

void egret
#

Whisper is an open source speech to text transcription model developed by openai.

dense pulsar
#

anyone know how to significantly speed up the offline models in whisper? like medium/large?

short rover
#

Need to use GPU inference and viable resources.

#

If you’re asking about the details beyond that. Quantisation of the inference data types. For example lower resolution float types or the AI data types which seems to be a thing now. (Top of my head)

exotic cairn
#

GPU >>

#

such a big difference for sure

left mist
#

@dense pulsar gladia is a service that claims to have massively sped up the large model, haven't had the time to test it out yet.

#

also haven't tested out the official openai whisper model on the platform, but I assume it's quite fast.

#

Else you could have a look at using the huggingface transformers versions since they might be more easy to optimize. There are also a bunch of discussions on the whisper github that you could check out.

craggy heart
#

hello rvry one

#

hello evry one

mellow nacelle
#

👽

autumn bolt
#

x

lapis flicker
torpid coral
#

any way to use the whisper API just using fetch / curl ?

#

Passing in a link to a mp3 or audio file

barren compass
#

no, you need to send the file data

#

fetch it yourself from the url, then send it
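
A sketch of that two-step flow using only the Python standard library: download the bytes, then upload them as multipart/form-data. The endpoint and model name are the ones used earlier in this channel; build_multipart and transcribe_from_url are hypothetical helpers, and error handling is left out.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_from_url(audio_url, api_key, filename="audio.mp3"):
    """Download the audio, then upload the bytes (not the link) to the API."""
    with urllib.request.urlopen(audio_url) as resp:
        audio = resp.read()
    body, content_type = build_multipart({"model": "whisper-1"}, filename, audio)
    request = urllib.request.Request(
        "https://api.openai.com/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["text"]
```

The filename extension matters here, since the API appears to use it to recognize the audio format, so pass a name that matches the real file type.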

hybrid flint
zealous cedar
#

give me a two direction stepwise regression code

willow gale
wise lily
#

are you there?

wicked saffron
stone chasm
#

Lol, is this your real key?

uneven kayak
#

I was thinking the exact same thing. If so, you need to make a couple of changes and quite urgent.

static pagoda
#

🤣

mortal plover
#

I have question about whisper use - can it translate from english to other language?

neon zinc
#

I paid for the gpt plus chat subscription to the account registered by mail maksim543761@gmail.com. What should I do? I didn't even get an email\

raven gate
#

@dense zodiac I hope you aren’t under the paid plan with that key

#

Or else someone gonna spam up to 120 usd with your key

peak saffron
#

Has anyone gotten whisper to work with audio recorded in Safari?

#

(I mean the whisper api - not sure which channel is the correct location)

tacit hedge
#

Hi, I need to transcribe an audio file in a non-English language and translate the transcribed output into another language with the actual timestamps, so the TTS can be in sync with the original file.

I have tried WhisperX, but the results were not that great. Please help me here.

regal dagger
#

the tricky part is synchronization

dark parrot
#

Yumi Karahashi, the princess of my heart, has gotten married. I am stunned.

mortal plover
high mountain
#

Hello, does Whisper work 'on the fly' like Google Speech-to-Text? I mean, is it as fast as Google? I would like to make a conversational bot using the Whisper API, the ChatGPT API, and the ElevenLabs API. I worry that Whisper is too slow and the user experience won't be fluid.

mortal plover
high mountain
#

What API do you recommend then? Is Google Speech-to-Text the best on the market?

mortal plover
#

I don't know exactly how quickly the Whisper API responds, so perhaps you should still give it a try. My results are based on my own hardware, and so far it's always slower than 1x speed,

#

which makes it not the best fit for what you are looking for. My assumption is that you want to make your own Google Assistant/Siri-like bot. Thumbs up for this + integration with the Home Assistant open platform 🙂

storm oak
#

what should the initial prompt look like?
just the start of the audio?
random list of important words?

gray thunder
#

What's whisper

small juniper
#

What s whisper

analog willow
#

Hello everyone, I have a program on Github that will run whisper in batch mode if anyone wants it? Message me directly because OpenAI won't let me post a simple link to it on Github!

autumn bolt
#

guys

#

who uses the microsft chat gpt like the cht in microsoft edge

#

@cosmic lanternone

grim hound
covert shuttle
#

Unless you want someone to send you phishing mails or scam mails, it is just a friendly reminder since some people are “not nice”

#

As far as your question goes, I’d recommend contacting OpenAI through their support mail

trim rampart
#

I was trying Whisper with Bengali. It doesn't seem to figure out that it is Bengali and just translates to Hindi. I had more luck with Kannada. Does Whisper support Bengali?

#

Whisper is quite good with Hindi BTW.

mortal plover
still briar
#

So I have a few long audio files, around 10-15 minutes each. What I want to do is split the audio files at the end of each sentence, and I have a text document that separates the sentences. Can Whisper tell when the portion of audio matching each piece of text starts and ends, so that I can use that data to split the large audio files into small audio files, one per sentence?

trim rampart
#

@mortal plover I was trying to transcribe Bangla.

trim rampart
#

Sorry for saying translate. 🤦‍♂️

covert skiff
#

Can this do anything? Per say math?

#

And………. Does it work while I don’t have safari going

pure veldt
#

Hello, is it possible to implement a feature like ':on-progress' (which calls a function) in Whisper ASR, similar to the ':verbose true' functionality? I would like to use in backend.. I can't catch the stdout just at the end. Any idea? (at transcribe fn)

autumn bolt
#

hey i am trying to install whisper following this video https:// www. youtube .com/ watch?v=XX-ET_-onYU

#

but i have to install a ''git'' , what is it? ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

the video didnt explained this

#

post the link with space pls

autumn bolt
lilac karma
#

For install we need first know what is your distribution?

autumn bolt
autumn bolt
lilac karma
#

Do you use linux distributions? @autumn bolt

autumn bolt
#

windows

#

1

#

0

autumn bolt
#

but then got the error

#

because the tutorial is incomplete

#

what is the error message?

#

ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

#

i didnt install this git

#

pip3 install git

#

ok

#

wait.. I check it

#

C:\Users\gusta>pip3 install git
ERROR: Could not find a version that satisfies the requirement git (from versions: none)
ERROR: No matching distribution found for git

#

pip3 install GitPython

lilac karma
autumn bolt
#

Yes, maybe that is Windows specific

autumn bolt
#

downloaded it

#

now installing but quite slowly

#

''redifine the entry''

#

ops ''refine the entry''

#

the message is: Several packages matched entry criteria. Refine the entry.

#

now what do i do? @autumn bolt

lilac karma
# autumn bolt

What is your language? We need to translate to understand what is it saying...

autumn bolt
#

its portuguese

#

i translated

#

it

autumn bolt
#

first he asked if i agree with terms of contract i said y (YES)

lilac karma
lilac karma
#

Visit this website

autumn bolt
#

ok i will try thanks

lilac karma
#

Okay, if it's not successful, tell us

autumn bolt
lilac karma
autumn bolt
#

.

#

is this normal right