#gpt-realtime | OpenAI | Page 1

spring python Feb 3, 2023, 8:41 AM

#

first

#

been wanting to play with whisper for awhile

#

any recommendations between the model sizes? I want something that works nearly real-time for conversational AI

#

well I guess up to a few seconds of lag at most

tribal karma Feb 3, 2023, 9:32 AM

#

👋

uncut vale Feb 3, 2023, 10:05 AM

#

hello

vagrant latch Feb 3, 2023, 10:47 AM

#

Sup

shrewd lintel Feb 3, 2023, 10:52 AM

#

sup

unreal halo Feb 3, 2023, 10:58 AM

#

sup

atomic adder Feb 3, 2023, 11:17 AM

#

ngl I'm gonna use this tool to not pay attention in class

autumn bolt Feb 3, 2023, 12:39 PM

#

how to use this?

soft warren Feb 3, 2023, 12:45 PM

#

Namaste

keen agate Feb 3, 2023, 12:45 PM

#

sooo what’s whisper?

vast grove Feb 3, 2023, 1:11 PM

#

keen agate sooo what’s whisper?

https://openai.com/blog/whisper/

#

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.

keen agate Feb 3, 2023, 1:14 PM

#

so hopefully it will be able to tell what british people are saying, unlike siri and alexa

autumn bolt Feb 3, 2023, 3:24 PM

#

how to use whisper?

void egret Feb 3, 2023, 4:05 PM

#

spring python any recommendations between the model sizes? I want something that works nearly ...

I'm a bit late but I think any size will work for that. In most cases you won't need to push it above medium unless there is a bunch of made up words or slurred speech
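A rough sketch for anyone weighing the sizes: pick the largest model that fits a VRAM budget. The figures are the approximate requirements listed in the openai/whisper README around the time of this conversation, so treat them as ballpark values. `pick_model` and the example budgets are hypothetical names used only for illustration.

```python
# Approximate VRAM needs (GB) per model size, as listed in the whisper README
# in early 2023. Ballpark figures, not guarantees.
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}

# Ordered from smallest/fastest to largest/most accurate.
SIZES = ["tiny", "base", "small", "medium", "large"]


def pick_model(vram_budget_gb: float) -> str:
    """Return the largest model size whose approximate VRAM need fits the budget.

    Falls back to "tiny" when nothing fits (hypothetical helper, not part of whisper).
    """
    best = "tiny"
    for name in SIZES:
        if APPROX_VRAM_GB[name] <= vram_budget_gb:
            best = name
    return best


if __name__ == "__main__":
    print(pick_model(6))   # a 6 GB card
    print(pick_model(24))  # a 24 GB card
```

Memory is only one constraint. For a lag budget of a few seconds, the smaller sizes are also noticeably faster, so it is worth timing two or three sizes on your own audio before settling.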

void egret Feb 3, 2023, 4:06 PM

#

autumn bolt how to use whisper?

You can google openai whisper github. There's python install instructions and some people made bindings for other langs

sturdy dock Feb 3, 2023, 5:39 PM

#

does anyone want to become my friend

meager fulcrum Feb 3, 2023, 6:03 PM

#

autumn bolt how to use whisper?

they have a very easy to use python package
pip install openai-whisper
then to use in your program you do something like

import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

See more here: https://github.com/openai/whisper

pale island Feb 3, 2023, 6:14 PM

#

meager fulcrum they have a very easy to use python package `pip install openai-whisper` then to...

I just ran this. Thank the heavens for fiber.

latent granite Feb 4, 2023, 1:28 AM

#

a

full mirage Feb 4, 2023, 1:28 AM

#

Does anyone know of any npm packages for running whisper inside of NodeJS? Or literally anything at all to run Whisper within NodeJS in general?

void egret Feb 4, 2023, 1:56 AM

#

full mirage Does anyone know of any npm packages for running whisper inside of NodeJS? Or li...

The C++ fork of whisper has node bindings. You can find it on npm as whisper.cpp

inner ingot Feb 4, 2023, 8:45 AM

#

Ok

fiery warren Feb 4, 2023, 12:12 PM

#

so this basically runs locally on your computer?

raven folio Feb 4, 2023, 1:40 PM

#

fiery warren so this basically runs locally on your computer?

what ?

fiery warren Feb 4, 2023, 1:40 PM

#

whisper?

raven folio Feb 4, 2023, 1:41 PM

#

yeah

#

you can run it

#

somewhere else too

#

to host it and implement it

#

in an app or smth

fiery warren Feb 4, 2023, 3:28 PM

#

raven folio in an app or smth

Thanks for your response. Do you know about its license?

#

For use?

void egret Feb 4, 2023, 5:04 PM

#

It's under MIT license

raven folio Feb 4, 2023, 5:36 PM

#

fiery warren Thanks for your response. Do you know about its license?

MIT

autumn bolt Feb 4, 2023, 6:44 PM

#

ey yo does anyone know where to do ai?

inner mesa Feb 5, 2023, 12:05 AM

#

autumn bolt ey yo does anyone know where to do ai?

A good place to start is the S.U.G.M.A. forums

thorny portal Feb 5, 2023, 12:10 AM

#

Try the MacWhisper app if you have an M1 or M2.

vague narwhal Feb 5, 2023, 1:29 PM

#

hm

placid siren Feb 5, 2023, 2:31 PM

#

what is whisper

void egret Feb 5, 2023, 2:36 PM

#

It's an open source speech to text model published by openai

rustic sigil Feb 5, 2023, 2:53 PM

#

LOUD

livid kraken Feb 6, 2023, 11:01 AM

#

yello

halcyon wraith Feb 6, 2023, 6:26 PM

#

Whisper-UI Update: You can now bulk-transcribe, save & search transcriptions with Streamlit & SQLAlchemy 2.0
I'd built a hacky Streamlit UI for OpenAI's Whisper a few months back and there had been a bit of interest so finally got myself to rewrite it to make it a little nicer. Update includes

Ability to download entire YouTube playlists and upload multiple files at once
Ability to browse, filter, and search through saved audio files (For now, this is done with a simple SQLite database & SQLAlchemy ORM)
Auto-export of transcriptions in multiple formats (was a feature request)
Simple substring based search for transcript segments. This is done with a simple LIKE query on the SQLite database.
Fully reworked UI with a cleaner layout and more intuitive navigation.
Repo: github.com/hayabhay/whisper-ui

wary knot Feb 6, 2023, 7:54 PM

#

spring python any recommendations between the model sizes? I want something that works nearly ...

I found that for EN so far medium seemed to work best for the size? But for transcribe + translate I think large is minimum.

spring python Feb 6, 2023, 8:01 PM

#

wary knot I found that for EN so far medium seemed to work best for the size? But for tran...

did you have accuracy issues with the smaller models?

wary knot Feb 6, 2023, 8:42 PM

#

spring python did you have accuracy issues with the smaller models?

it was ok-ish for EN, but the accuracy of medium is way better, the smaller models are extremely hit or miss for non-EN languages (like less than 50% iirc)

#

FYI I was using Whisper.CPP and it has its own pre-baked tuning, but I don't think it heavily diverges from the CUDA version

shrewd rose Feb 6, 2023, 10:34 PM

#

halcyon wraith Whisper-UI Update: You can now bulk-transcribe, save & search transcriptions wit...

wow

autumn bolt Feb 7, 2023, 12:10 AM

#

shrewd rose wow

Hmm

plucky palm Feb 7, 2023, 1:09 AM

#

can i use whisper AI to transcribe livestreams on twitch?

#

@dim viper

void egret Feb 7, 2023, 1:53 AM

#

plucky palm can i use whisper AI to transcribe livestreams on twitch?

Yup, the whisper.cpp fork actually has an example for transcribing twitch streams in the repo

plucky palm Feb 7, 2023, 2:11 AM

#

void egret Yup, the whisper.cpp fork actually has an example for transcribing twitch stream...

bro that is a phone transcription

void egret Feb 7, 2023, 2:13 AM

#

plucky palm bro that is a phone transcription

I have no idea what you're trying to say here?

plucky palm Feb 7, 2023, 2:13 AM

#

void egret I have no idea what you're trying to say here?

the example in what you linked has the guy holding a phone to record?

void egret Feb 7, 2023, 2:14 AM

#

Read what it says right above that. The phone is just an example to show that it can be run on whatever hardware

Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications. As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: whisper.objc

#

Since the original implementation of whisper is meant to run on gpu and is very vram heavy

plucky palm Feb 7, 2023, 2:15 AM

#

void egret Read what it says right above that. The phone is just an example to show that it...

how would i get it to listen to desktop audio?

void egret Feb 7, 2023, 2:17 AM

#

plucky palm how would i get it to listen to desktop audio?

Did you actually read the readme? It has a bunch of examples. You can also look in the examples folder and twitch.sh in there to see how to use it in the use case you asked about

plucky palm Feb 7, 2023, 2:17 AM

#

void egret Did you actually read the readme? It has a bunch of examples. You can also look ...

no im asking you questions because you're helpful and nice xD

willow cobalt Feb 7, 2023, 2:19 AM

#

Has anyone built an asynchronous communication app or add-on with Whisper?

halcyon wraith Feb 7, 2023, 3:38 AM

#

willow cobalt Has anyone built an asynchronous communication app or add-on with Whisper?

I had hacked a live translation bit a few months ago (desktop only) and will add to the UI soon (using streamlit-webrtc)

limpid prawn Feb 7, 2023, 1:40 PM

#

What is whisper? I work with the GPT-3 API and stuff but what is whisper?

void egret Feb 7, 2023, 3:43 PM

#

Whisper is a speech to text transcription model published by openai

shrewd rose Feb 7, 2023, 11:09 PM

#

limpid prawn What is whisper? I work gpt 3 api and stuff but what is whisper?

https://github.com/openai/whisper

gilded palm Feb 8, 2023, 7:51 AM

#

Hey I'm having some trouble installing whisper via my macbook terminal. I'm running:
pip install -U openai-whisper

and it gives me:
ERROR: Cannot install openai-whisper==20230117 and openai-whisper==20230124 because these package versions have conflicting dependencies.

The conflict is caused by:
openai-whisper 20230124 depends on torch
openai-whisper 20230117 depends on torch

I had installed torch, now when I try to run "pip install torch" it just says:
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

#

anyone around to help out?

void egret Feb 8, 2023, 7:59 AM

#

gilded palm Hey I'm having some trouble installing whisper via my macbook terminal. I'm runn...

Are you using an M1 mac? Iirc there are still some pytorch issues on M1

gilded palm Feb 8, 2023, 8:01 AM

#

void egret Are you using an M1 mac? Iirc there are still some pytorch issues on M1

it's a 2018 i7

void egret Feb 8, 2023, 8:03 AM

#

Hm, that's odd then. Try running pip install git+https://github.com/openai/whisper.git

gilded palm Feb 8, 2023, 8:06 AM

#

void egret Hm, that's odd then. Try running `pip install git+https://github.com/openai/whis...

Different message but still an error haha

ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement torch (from openai-whisper) (from versions: none)
ERROR: No matching distribution found for torch

#

My python version is 3.11.1 tho

void egret Feb 8, 2023, 8:08 AM

#

Well that issue seems easier to approach at least

#

Try setting up a venv with a python version in the range and trying again?
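A small sketch of the check being suggested here: confirm that the interpreter you are about to build the venv from sits inside the range printed in the error. The bounds are copied from the "Requires-Python >=3.7,<3.11" text above; `in_supported_range` is a hypothetical helper name.

```python
import sys

# Range taken from the "Requires-Python >=3.7,<3.11" text in the error output above.
MIN_VERSION = (3, 7)    # inclusive
MAX_VERSION = (3, 11)   # exclusive


def in_supported_range(version=None) -> bool:
    """True when (major, minor) falls inside [MIN_VERSION, MAX_VERSION)."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if in_supported_range():
        print(f"Python {current} is inside the range from the error message")
    else:
        print(f"Python {current} is outside the range; build the venv from another interpreter")
```

With a 3.10 interpreter installed, the venv itself would typically be created with `python3.10 -m venv .venv`, activated, and then `pip install -U openai-whisper` run inside it.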

#

@dim viper mind trying to help if that doesn't fix his issue? I'm heading to bed

dim viper Feb 8, 2023, 8:24 AM

#

👍 go to bed, night man

gilded palm Feb 8, 2023, 8:26 AM

#

dim viper 👍 go to bed, night man

It seems I can't install any older versions of python now...

#

pip install python==3.10

gives me:
ERROR: Could not find a version that satisfies the requirement python==3.10 (from versions: none)
ERROR: No matching distribution found for python==3.10

gilded palm Feb 8, 2023, 8:26 AM

#

void egret @dim viper mind trying to help if that doesn't fix his issue? I'm hea...

Cheers thank you for your time!

dim viper Feb 8, 2023, 8:27 AM

#

gilded palm It seems I can't install any older versions of python now...

one sec

#

what shows up when u type python --version

#

is it 3.11.1?

gilded palm Feb 8, 2023, 8:29 AM

#

yeah 3.11.1

dim viper Feb 8, 2023, 8:31 AM

#

hmm u can try installing something like 3.10, u can find it on Python's website under /downloads

#

but like void egret said

#

create a venv

#

google venv python and you can find the information on how to set up virtual environments

#

brb gotta go take dog out

#

im having internet outage rn .... if i disappear its cause of the internet ill reply as soon as im back online

muted trench Feb 8, 2023, 2:04 PM

#

What would be the expected processing time? I have a 4 hour long audio file, and im guessing that will take a while?

dim viper Feb 8, 2023, 5:03 PM

#

gilded palm yeah 3.11.1

did u end up figuring it out

gilded palm Feb 8, 2023, 7:00 PM

#

dim viper did u end up figuring it out

Hey thanks for checking in, I downloaded MacWhisper and it wasn’t working on my version of MacOS so I upgraded to Ventura and now the app works so that’s a start I guess, I’m going to try and reinstall from repo this evening and will report back 🫡

dim viper Feb 8, 2023, 7:05 PM

#

gilded palm Hey thanks for checking in, I downloaded MacWhisper and it wasn’t working on my ...

Sounds good! hope everything goes well 🙏

finite mirage Feb 9, 2023, 3:01 AM

#

is youtube for kids

autumn bolt Feb 9, 2023, 3:31 AM

#

technically, no, but i think almost all kids watch it

amber vortex Feb 9, 2023, 5:14 AM

#

👍

woeful tapir Feb 9, 2023, 10:23 AM

#

hlo

olive monolith Feb 9, 2023, 12:21 PM

#

Mainland and Hong Kong credit card payments are not supported

#

How should I pay?

raven escarp Feb 9, 2023, 4:43 PM

#

Just passing by to give a big shoutout to the Whisper project!

#

I’m working on an accessibility/quality of life tool and Whisper is being INVALUABLE for it

void egret Feb 9, 2023, 5:21 PM

#

Whisper really is overlooked, it's incredibly powerful and the fact that it's open source only makes it better

dim viper Feb 9, 2023, 5:32 PM

#

olive monolith Mainland and Hong Kong credit card payments are not supported

wait till its supported or find someone to pay for you thats in the US?

rotund leaf Feb 9, 2023, 5:34 PM

#

Hi guys

dreamy palm Feb 9, 2023, 6:37 PM

#

Hi all, quick question: Has anyone tried to create an AWS Lambda function that runs Whisper's transcribe on a file from S3? I can't think of a reason it wouldn't work, but when I search Google, I cant find anyone else that's done it. Which makes me think I'm missing something.

frozen pendant Feb 9, 2023, 6:54 PM

#

I share with you the project I did with Whisper, Embedding and GPT-3

allows you to load any youtube video and start getting information through a chat

plucky palm Feb 10, 2023, 12:37 AM

#

how can i get whisper to record twitch streams in real time on windows 10?

patent steppe Feb 10, 2023, 1:08 AM

#

IDK

random prism Feb 11, 2023, 2:02 PM

#

frozen pendant I share with you the project I did with Whisper, Embedding and GPT-3 allows you...

can I try it?

thorny ledge Feb 11, 2023, 3:18 PM

#

Where are u from?

celest blade Feb 11, 2023, 3:47 PM

#

HI

iron oar Feb 11, 2023, 4:29 PM

#

frozen pendant I share with you the project I did with Whisper, Embedding and GPT-3 allows you...

Did he use chat gpt api to make it or is there a free option of doing it y’all?

frozen pendant Feb 11, 2023, 4:32 PM

#

I can't share link in this channel, but you can go to the api-projects channel

#

and you have to search for: YoutubeGPT: Satya Nadella interview

#

and you will have all the information and the project repo

#

I'm using OpenAI Whisper, Embedding and GPT-3 API

mystic dune Feb 11, 2023, 4:34 PM

#

Could you introduce the technology you used to create ChatGPT?

frozen pendant Feb 11, 2023, 4:34 PM

#

#1073315227371851818 message

iron oar Feb 11, 2023, 4:34 PM

#

damn chatgpt3 api I wish there was a free option

#

I like the idea of searching via vid

frozen pendant Feb 11, 2023, 4:36 PM

#

I'm building something so that it can be used for free

iron oar Feb 11, 2023, 4:37 PM

#

yh I respect that

#

I mean it’s nice

#

I’m working on something more complex

frozen pendant Feb 11, 2023, 4:37 PM

#

I will be posting on twitter, if you want to follow me my username is dani_avila7

iron oar Feb 11, 2023, 4:38 PM

#

But very similar

#

I gotcha

#

Let’s trade a follow I just made a Twitter yesterday 😁

#

SullyBillions is my Twitter

daring cloud Feb 11, 2023, 5:04 PM

#

egg incubator research study

white mauve Feb 11, 2023, 7:09 PM

#

tes

compact marten Feb 12, 2023, 12:00 AM

#

mortal plover Feb 12, 2023, 12:15 AM

#

compact marten

Damn bro I feel bad 💀💀

silent basin Feb 12, 2023, 4:23 AM

#

it's a pity that things like ChatGPT are being used for fraud.

glossy moth Feb 12, 2023, 12:01 PM

#

hey guys, do you also have this problem? "An error occurred. If this issue persists please contact us through our help center at help.openai.com."

delicate bridge Feb 12, 2023, 12:44 PM

#

I need help regarding streamlit cloud
No such file or directory: 'ffmpeg'
I have tried everything since 3 days nothing working for me
Please help
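A quick diagnostic sketch for this error: whisper shells out to the `ffmpeg` executable to decode audio files, so the first thing to confirm is whether the binary is on PATH in the environment where the app runs. `ffmpeg_path` is a hypothetical helper name.

```python
import shutil


def ffmpeg_path():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = ffmpeg_path()
    if path is None:
        print("ffmpeg not found on PATH; whisper cannot decode audio files without it")
    else:
        print(f"ffmpeg found at {path}")
```

As far as I know, Streamlit's hosted service installs system packages listed in a `packages.txt` file at the repo root (one apt package name per line, here `ffmpeg`). Installing a Python package with "ffmpeg" in its name does not necessarily provide the binary itself.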

stable nimbus Feb 12, 2023, 1:50 PM

#

compact marten

Accusing someone basing your evidence on what AI believes is even worse than using AI to write assignments. Change my mind

fickle python Feb 12, 2023, 3:49 PM

#

Hii ALL

#

How're you?

tropic wyvern Feb 12, 2023, 4:45 PM

#

When a conversation starts, a time log is necessary. If the dialogue continues for a while, it may be necessary to recall when a specific question was asked or when an answer was received. Although AI cannot hold real-time information, time stamps for the conversation would be helpful.

#

hello, where is the real input area?

marble apex Feb 12, 2023, 5:08 PM

#

how to make the ai write phd thesis

compact marten Feb 12, 2023, 9:08 PM

#

mortal plover Damn bro I feel bad 💀💀

Fr it’s gonna b tuff out here

mellow laurel Feb 12, 2023, 9:09 PM

#

Hi all new Friend

round shore Feb 12, 2023, 9:54 PM

#

hello from norway

charred barn Feb 13, 2023, 4:37 AM

#

I'm using whisper in my project, and I'm having some weird behavior sometimes. Here's a couple errors i've saved from a recent test. Because I'm testing, the audio it's receiving is basically the same each time. just sometimes it gives errors like these, and most of the time it doesnt. I'm probably just gonna wrap some of this in a retry, but wondering if you guys have had similar experiences.

RuntimeError: The size of tensor a (18) must match the size of tensor b (10) at non-singleton dimension 3
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, 6, -1] because the unspecified dimension size -1 can be any value and is ambiguous

void egret Feb 13, 2023, 4:50 AM

#

charred barn I'm using whisper in my project, and I'm having some weird behavior sometimes. H...

Haven't seen that error before, sorry.

frozen axle Feb 13, 2023, 8:08 AM

#

How do I start ?

indigo wasp Feb 13, 2023, 8:33 AM

#

hi from myanmar

sweet drum Feb 13, 2023, 2:19 PM

#

Write a letter to the witcher

harsh wigeon Feb 13, 2023, 10:14 PM

#

You can use it as HTTP API on deepinfra.com

winter wraith Feb 13, 2023, 10:21 PM

#

how do i use whisper?

void egret Feb 13, 2023, 10:59 PM

#

winter wraith how do i use whisper?

Here's the github repo: https://github.com/openai/whisper
Once you actually install it with pip implementing it in your code is really simple

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

charred barn Feb 14, 2023, 2:13 AM

#

trying to mess around with cuda

#

found out our server has an ancient GPU that isn't supported. So I set up port forwarding to do some testing from my own computer

#

I'm not sure I'm seeing any improvement from doing transcribe().cuda()

#

I'm wondering if I might be going about this the wrong way

#

I'm transcribing audio samples from an IVR

#

so 5-10 second long clips grabbed from the IVR over http

#

I based it on the examples I saw on GitHub of other people's projects, specifically the one that functions as a REST API, and I throw a Lock on the transcribe method.

#

The issue I had yesterday with those runtime exceptions was because I didn't put a Lock on the transcribe method
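A minimal sketch of the Lock approach described here, assuming a single shared model object that is called from several request threads. `LockedTranscriber` is a hypothetical wrapper, not part of whisper, and the `FakeModel` stand-in only exists so the sketch runs without whisper installed.

```python
import threading


class LockedTranscriber:
    """Serialize calls to a model's transcribe() so only one thread runs it at a time.

    `model` is assumed to be anything with a transcribe(audio, **options) method,
    such as the object returned by whisper.load_model().
    """

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def transcribe(self, audio, **options):
        # Only one request at a time reaches the model; the others wait here.
        with self._lock:
            return self._model.transcribe(audio, **options)


if __name__ == "__main__":
    class FakeModel:
        """Stand-in so the sketch runs without whisper installed."""

        def transcribe(self, audio, **options):
            return {"text": f"transcript of {audio}"}

    shared = LockedTranscriber(FakeModel())
    threads = [
        threading.Thread(target=shared.transcribe, args=(f"clip{i}.wav",))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared.transcribe("clip.wav")["text"])
```

The lock trades throughput for safety: requests queue behind each other, so average latency grows with the number of simultaneous calls, which matches the slowdown reported further down.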

#

Guess I'm just real confused about how to get better performance out of it. I have an rtx 3090TI

#

How can I get better concurrency while working with Whisper? Any advice?

void egret Feb 14, 2023, 2:18 AM

#

First off is it actually hitting your gpu?

charred barn Feb 14, 2023, 2:18 AM

#

When we just have a couple calls coming in, it works alright with an average response of like 4 seconds. but when we got like 50 it goes upwards of 10

#

How can I check that? I'm looking at task manager, and i dont see any GPU% coming from python

#

When i had an earlier version of python installed, it told me it wasnt compatible with my gpu and i updated to 3.10

#

so now i get no such error

#

makes me think it is using it

#

but not putting a big load on it

void egret Feb 14, 2023, 2:19 AM

#

Is your gpu usage increasing when you run it?

charred barn Feb 14, 2023, 2:19 AM

#

no, not in a noticeable way

void egret Feb 14, 2023, 2:19 AM

#

If you don't end up installing the CUDA Development Tools it will hit your CPU without any logs
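One way to answer "is it actually hitting the GPU" from inside Python, sketched under the assumption that whisper is running on PyTorch (which it is in the reference package). `pick_device` and `load_on_best_device` are hypothetical helper names; `torch.cuda.is_available()` and the `device` argument of `whisper.load_model` are the real calls.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check itself never crashes
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_on_best_device(name: str = "base"):
    """Load a whisper model explicitly on the detected device."""
    import whisper  # requires `pip install -U openai-whisper`

    device = pick_device()
    print(f"loading {name!r} on {device}")
    return whisper.load_model(name, device=device)


if __name__ == "__main__":
    print(pick_device())
```

If this prints "cpu" on a machine with an NVIDIA card, the likely causes are a CPU-only PyTorch build or a driver/CUDA setup that PyTorch cannot see. Moving the model to the GPU happens at load time (or with `model.cuda()`), not on the result of `transcribe()`.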

charred barn Feb 14, 2023, 2:20 AM

#

this cuda toolkit?

#

I can check that out now

void egret Feb 14, 2023, 2:20 AM

#

ye, I don't think I can link here but let me try?

#

Just google windows cuda development tools it's the first link

charred barn Feb 14, 2023, 2:21 AM

#

cool

#

I really appreciate your help. I think whisper is gonna help save us some decent money on call processing

void egret Feb 14, 2023, 2:22 AM

#

I think whisper is amazing and just got overlooked by their completion models upgrading shortly after

#

What size model were you using btw?

civic minnow Feb 14, 2023, 2:23 AM

#

a_skull

charred barn Feb 14, 2023, 2:23 AM

#

tiny

#

Haven't had too many inaccuracies. gonna see if i can bump it up with the cuda help

#

most issues I get are like numbers sometimes dont process correctly

#

zero dollars and zero cents transcribes to $0.00 and $0.00

#

lol

void egret Feb 14, 2023, 2:25 AM

#

switching to gpu shouldn't change anything with what it outputs, mainly just performance

#

Only real option for increased accuracy is using a larger model

charred barn Feb 14, 2023, 2:25 AM

#

Yeah, but the model should improve accuracy right?

#

yeah

#

If the cuda can handle 100 calls at once and keep responses under 3 seconds i can try a larger model

void egret Feb 14, 2023, 2:26 AM

#

👍 Let me know how it goes

undone bear Feb 14, 2023, 2:40 AM

#

void egret Here's the github repo: https://github.com/openai/whisper Once you actually inst...

how do i install it then?

charred barn Feb 14, 2023, 2:41 AM

#

scroll under setup on that page

void egret Feb 14, 2023, 2:41 AM

#

^ At the end of the day it's a Python library, so you just use pip: pip install -U openai-whisper

undone bear Feb 14, 2023, 2:42 AM

#

ive never installed pip either how do i do that?

void egret Feb 14, 2023, 2:43 AM

#

Have you used python before?

undone bear Feb 14, 2023, 2:43 AM

#

no...

void egret Feb 14, 2023, 2:43 AM

#

Probably not a great starting point then. Whisper is a library that you use in code, not a standalone app

undone bear Feb 14, 2023, 2:43 AM

#

ah

charred barn Feb 14, 2023, 2:46 AM

#

I think my PC froze in the middle of that install and now my graphics drivers aren't working lol

void egret Feb 14, 2023, 2:46 AM

#

sad

charred barn Feb 14, 2023, 3:27 AM

#

lol well. i couldnt really tell if it was using cuda or not

#

then i think our server started to refuse all my requests out of nowhere lulw i think i pissed off some firewall

#

its not making much of a dent in anything

#

like im doing 40 calls at once, and it's grabbing a bunch of audio samples from the ivr, and my cpu is just going between 3 and 20%. gpu going between 1 and 7%

void egret Feb 14, 2023, 3:33 AM

#

try going to a larger model just to see where the load goes?

charred barn Feb 14, 2023, 3:34 AM

#

I'm running into a networking issue cause im not on the same network. it doesnt like all these requests for audio samples lol. after a while it starts failing when ffmpeg goes to load. I'm gonna bump it up a couple models

charred barn Feb 14, 2023, 3:57 AM

#

I set it to medium and I still can't tell lol. it says python is using like 1% GPU or lower. and like 15% CPU. with like 40 calls, over the course of 2 minutes, it transcribes 80 5 second clips.

#

its cool tho, with tiny we would have hands full of little error, medium is pretty perfect

#

it looks like when it first starts getting requests, it takes like 5 seconds average, but once it gets going, it goes down to under 2 seconds each

#

is there any kinda verbose mode we can call? I'm fairly certain it should be using the cuda now but its barely noticeable

void egret Feb 14, 2023, 4:03 AM

#

Not really, you could chuck it something longer and see?

heady estuary Feb 14, 2023, 4:03 AM

#

Omg

#

This is the best

#

Thing i ever saw

#

Thank you creators

#

Ur a blessing

#

@outer scarab i lov y

charred barn Feb 14, 2023, 4:06 AM

#

Gonna try tomorrow. It's a little late over here. Do you know of any benchmarks people may have done with GPU performance with different cards? Gotta make a purchase decision for work

void egret Feb 14, 2023, 4:18 AM

#

Purely for whisper you might find it in one of the discussions on the repo otherwise I got nothing. :/

outer scarab Feb 14, 2023, 4:20 AM

#

heady estuary @outer scarab i lov y

thank u but i am not a creator!

modest flax Feb 14, 2023, 11:46 AM

#

Thank you very much for whisper and the github repo !
I try it on windows with a Shadow (GPU Cloud, NVIDIA RTX A4500) and it's soooo Amazing on the large model 🙂

#

I am trying to separate the speakers in the rendered text, do you have any ideas?

heady estuary Feb 14, 2023, 12:26 PM

#

outer scarab thank u but i am not a creator!

Who is creator

charred barn Feb 14, 2023, 1:18 PM

#

modest flax I am trying to separate the speakers in the rendered text, do you have any ideas...

Look up speaker diarization openai whisper

#

Lots of results on Google

worn bison Feb 14, 2023, 4:17 PM

#

MuseNet

scarlet frost Feb 14, 2023, 4:42 PM

#

Guys

#

rancid nimbus Feb 14, 2023, 8:49 PM

#

has anyone made improvements to real time transcription?

void egret Feb 14, 2023, 8:59 PM

#

There's been a few approaches to realtime transcription in the discussions tab of the repo. I know the cpp fork of it has a few examples that do realtime as well

charred barn Feb 14, 2023, 9:48 PM

#

That's basically what I'm doing

#

I can make a guide later if you guys want.

#

The key to doing transcription live is to store your audio in a buffer, and send samples to whisper. You can do this without files by using a rest API and serving the samples as bytes. ffmpeg works fine with urls and that's the first step in whisper. I'll share examples later
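The buffering idea can be sketched roughly as below, assuming the stream delivers 16-bit mono PCM at 16 kHz (the sample rate whisper works with). `AudioChunker` and its method names are hypothetical; this is only the buffering half, not the author's actual service.

```python
SAMPLE_RATE = 16_000      # whisper works on 16 kHz audio
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM coming off the stream


class AudioChunker:
    """Collect raw PCM bytes and hand them back in fixed-length chunks."""

    def __init__(self, chunk_seconds: float = 5.0):
        self.chunk_bytes = int(chunk_seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list:
        """Append incoming bytes; return every complete chunk now available."""
        self._buffer.extend(data)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            del self._buffer[: self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left (for example at end of call) and reset."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest


if __name__ == "__main__":
    chunker = AudioChunker(chunk_seconds=1.0)
    one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
    ready = chunker.feed(one_second + one_second[:100])
    print(len(ready), len(chunker.flush()))
```

Each chunk returned by `feed` would then be handed to the transcription side, whether that is a REST endpoint serving the bytes to ffmpeg by URL, as described here, or a direct call into the model.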

inland reef Feb 14, 2023, 10:48 PM

#

open ai is epic

livid copper Feb 14, 2023, 11:50 PM

#

Any great full stack engineer (with iOS and web expertise) who want to build something useful that can positively impact the society, must haves, passion, drive, curiosity, work ethic...please DM me with your resume or your friends resume !

full plover Feb 15, 2023, 12:28 AM

#

livid copper Any great full stack engineer (with iOS and web expertise) who want to build som...

id love to help but im not full stack

rancid nimbus Feb 15, 2023, 12:31 AM

#

charred barn That's basically what I'm doing

I have seen many people do this already and it still results in seconds of wait time until it is done.

charred barn Feb 15, 2023, 12:32 AM

#

We're doing this approach to transcribe live calls for an IVR.

#

When you say seconds, I feel like you think that matters more than it really does

#

TV subtitles take seconds. And the IVR takes time to do stuff. But when we were sending 100 x 5s samples to it in 2 minutes, it would solve and respond in under 2 seconds consistently

#

it's more than acceptable for industry use for live applications imo

#

this audio is over websocket btw, so its a continuous stream coming into the ivr

rancid nimbus Feb 15, 2023, 12:34 AM

#

@charred barn If you just have a look at whisper-cpp it has a streaming project and it solves the words but has a delay. Each word would need to be returned within 300 ms after it is done.

#

This is what I am referring to.

#

The best they can do is 1.2 s from what i saw

charred barn Feb 15, 2023, 12:34 AM

#

so what's a sub 1 second delay an issue for? if we were just doing 1 call, it solves stuff real quick

#

and im also pinging from the opposite side of the world right now cause our server doesnt have a gpu suitable

rancid nimbus Feb 15, 2023, 12:36 AM

#

1.2 second delay on a phrase will compound over time

charred barn Feb 15, 2023, 12:36 AM

#

no it wont

rancid nimbus Feb 15, 2023, 12:36 AM

#

will lag over time

charred barn Feb 15, 2023, 12:36 AM

#

nope

#

i disagree with you fully

#

we are doing 50 calls at once

#

in testing

harsh wing Feb 15, 2023, 12:36 AM

#

@muted axle how do I use this

rancid nimbus Feb 15, 2023, 12:37 AM

#

charred barn we are doing 50 calls at once

I would think you would have an issue over time with the slight delay. Are you interacting with the tokenizer?

charred barn Feb 15, 2023, 12:43 AM

#

I got timed out

#

I'm not interacting with whisper beyond just calling transcribe.cuda

#

you mentioned that delay was an issue. But in our application we have a suitable delay threshold

#

i was looking at this version of whisper that allows batch processing

#

i was gonna see if i could get an improvement overall from delaying 1 second to collect audio before sending to whisper

rancid nimbus Feb 15, 2023, 12:45 AM

#

Whisper is really good, but this limitation has prevented me from trying to implement it.

charred barn Feb 15, 2023, 12:46 AM

#

you can dm your link if you want

rancid nimbus Feb 15, 2023, 12:46 AM

#

Especially when other solutions are so close and are fully developed.

charred barn Feb 15, 2023, 12:46 AM

#

i wish they'd add a leveling system here so we could get trusted enough to react to stuff or add links to discussion lol

rancid nimbus Feb 15, 2023, 12:46 AM

#

ya would be good

charred barn Feb 15, 2023, 12:47 AM

#

what are you trying to use it for ?

rancid nimbus Feb 15, 2023, 12:47 AM

#

Being able to react instantly from incoming audio in conversational style

charred barn Feb 15, 2023, 12:48 AM

#

but that doesnt really tell me what your project is. like, what are you trying to achieve? I dont see what a very short delay would prevent

rancid nimbus Feb 15, 2023, 12:48 AM

#

The fastest solution I have used so far is riva

#

or triton from nvidia

#

people talk fast

charred barn Feb 15, 2023, 12:49 AM

#

what kind of response times are you getting with those?

rancid nimbus Feb 15, 2023, 12:49 AM

#

reaction times and delays are what makes it feel clunky or not

#

just how you would talk with some one

#

if some one takes 1.5 seconds to respond to you every time during a live conversation that isn't natural feeling

#

not to mention any other latency added for other computation needed.

#

That is why it needs to be 300 ms or faster

charred barn Feb 15, 2023, 12:50 AM

#

That makes sense

rancid nimbus Feb 15, 2023, 12:51 AM

#

Seems like whisper isn't targeted to this use case, but it is too good not to try

charred barn Feb 15, 2023, 12:51 AM

#

it'd be nice if they added native support for websocket audio

#

just send me back all the words 1 at a time lol

rancid nimbus Feb 15, 2023, 12:52 AM

#

the whisperX

charred barn Feb 15, 2023, 12:52 AM

#

you know what'd be even nicer? if i didnt need to use ffmpeg at all

rancid nimbus Feb 15, 2023, 12:52 AM

#

oh ya

charred barn Feb 15, 2023, 12:52 AM

#

i already can serve the audio in the exact format that its encoding it to

#

but it still calls it every time

rancid nimbus Feb 15, 2023, 12:53 AM

#

the model is good but the rest of the software isn't as useful

#

i love streams like the unix/linux way

#

so ffmpeg isn't a big deal if it isn't slow but if its slow then it needs to be removed.

charred barn Feb 15, 2023, 12:54 AM

#

every fraction of a second helps

rancid nimbus Feb 15, 2023, 12:56 AM

#

yep

waxen grove Feb 15, 2023, 6:07 PM

#

How to enter whisper?

void egret Feb 15, 2023, 6:11 PM

#

Whisper is a model used for speech to text transcription. You can find instructions on how to use it here: https://github.com/openai/whisper. There isn't a website or anything of that sort though if that's what you are asking.

modest glacier Feb 15, 2023, 9:37 PM

#

Thanks

spiral tartan Feb 16, 2023, 7:16 AM

#

"I hope this message finds you well. I wanted to discuss the topic of WHISPER, a rapidly growing field that combines engineering and technology to solve complex problems. Learning more about it can be a valuable investment for personal and professional growth with numerous career opportunities. I'm happy to provide information and resources through discussion, online resources, or workshops/conferences. Let me know if you have questions or would like to chat further. Best regards."

radiant sapphire Feb 16, 2023, 9:37 AM

#

Can anyone help me get the phone code?

hazy dagger Feb 16, 2023, 11:53 AM

#

now what should i do

charred barn Feb 16, 2023, 3:10 PM

#

radiant sapphire Can anyone help me get the phone code?

You talking about my code?

pseudo prawn Feb 16, 2023, 3:44 PM

#

i wanna know whats whisper?

polar pewter Feb 16, 2023, 4:41 PM

#

can anyone help me with how to use this gpt bot??

rancid nimbus Feb 16, 2023, 4:52 PM

#

This is the whisper channel. Probably want a gpt channel.

autumn bolt Feb 16, 2023, 5:06 PM

#

polar pewter can anyone help me with how to use this gpt bot??

#chatgpt-discussions

rain flame Feb 16, 2023, 6:04 PM

#

pseudo prawn i wanna know whats whisper?

Whisper is a model used for speech to text transcription

timid lintel Feb 17, 2023, 12:57 AM

#

Has anyone tried integrating Whisper into Audacity, maybe via plugin?

hollow turtle Feb 17, 2023, 2:44 PM

#

Has anyone tried integrating Whisper into React App?

celest shell Feb 17, 2023, 11:31 PM

#

How can I use whisper

autumn turret Feb 18, 2023, 1:26 AM

#

celest shell How can I use whisper

https://github.com/openai/whisper

fathom timber Feb 18, 2023, 4:20 AM

#

Dude thats crazyQ!!!

iron oar Feb 18, 2023, 5:09 AM

#

thx 🙏

autumn bolt Feb 18, 2023, 9:18 AM

#

So

#

I can send messages here

#

Wow

#

Can anyone see ma messages

#

Nop ?

vast wharf Feb 18, 2023, 1:26 PM

#

Wow, that's crazy !

autumn bolt Feb 18, 2023, 1:32 PM

#

Is whisper available as a Discord bot?

tawny light Feb 18, 2023, 3:06 PM

#

autumn bolt Can anyone see ma messages

yes

autumn bolt Feb 18, 2023, 4:06 PM

#

So cool

blazing valley Feb 18, 2023, 6:23 PM

#

whisper - open source is the way

clear current Feb 18, 2023, 7:09 PM

#

hii

empty onyx Feb 19, 2023, 2:43 AM

#

dreamy palm Hi all, quick question: Has anyone tried to create an AWS Lambda function that r...

AWS Lambda functions can’t run continuously for more than 15 mins.

dreamy palm Feb 19, 2023, 2:49 AM

#

empty onyx AWS Lambda functions can’t run continuously for more than 15 mins.

Thanks for the response! We will be breaking up the inferences so that they are short enough to be run on lambda.

slow dew Feb 19, 2023, 9:30 AM

#

hollow turtle Has anyone tried integrating Whisper into React App?

yes, but only to upload the audio file and return the results via API

copper swan Feb 19, 2023, 12:37 PM

#

Helmo

lilac sleet Feb 20, 2023, 11:09 AM

#

If I am correct Whisper is what was used to give ChatGPT human like responses correct?

marsh breach Feb 20, 2023, 12:06 PM

#

lilac sleet If I am correct Whisper is what was used to give ChatGPT human like responses co...

No, whisper is for voice to text

lilac sleet Feb 20, 2023, 12:30 PM

#

marsh breach No, whisper is for voice to text

oh

sonic mango Feb 20, 2023, 2:58 PM

#

Hello everyone, hope you are well

#

I have a question regarding whisper, what's the best way today to modify the formatting of the text output ?

tidal canopy Feb 20, 2023, 11:02 PM

#

sonic mango I have a question regarding whisper, what's the best way today to modify the for...

I personally just parse the srt file by hand but pretty sure there are also libs for that
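
For reference, parsing an SRT by hand can be quite short. A minimal sketch, assuming the standard SRT layout (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text, with blank lines between cues). `srt_to_text` is an illustrative name, not something Whisper provides:

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timestamps, keeping one line of text per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # The first line of a cue is its number, the second its timing.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            cues.append(" ".join(lines))
    return "\n".join(cues)


sample = """1
00:00:00,000 --> 00:00:02,500
Hello there.

2
00:00:02,500 --> 00:00:05,000
How are
you today?
"""
print(srt_to_text(sample))
```

From there the plain text can be written to a file or post-processed further.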

spiral hawk Feb 21, 2023, 1:14 PM

#

sonic mango I have a question regarding whisper, what's the best way today to modify the for...

To choose right output format is the best way. There are plenty of them. I am using it from cli, not (yet) from python script.

sonic mango Feb 21, 2023, 9:38 PM

#

tidal canopy I personally just parse the srt file by hand but pretty sure there are also libs...

Thanks for the response, I'm using Python but I'm unable to find a command to help with that

tidal canopy Feb 21, 2023, 9:39 PM

#

what do you want to transfer it to?

sonic mango Feb 21, 2023, 9:39 PM

#

spiral hawk To choose right output format is the best way. There are plenty of them. I am us...

Thanks for the response, CLI means Command-line interface right ? what command could help with that ?

sonic mango Feb 21, 2023, 9:41 PM

#

tidal canopy what do you want to transfer it to?

to a text or a Word document. I just want to remove the labels "speaker 1" and "speaker 2" and instead have whisper put the text from speaker 1 in italics, leave speaker 2 unchanged, and add a line break between speakers

#

I would also like to remove the timestamp

tidal canopy Feb 21, 2023, 9:41 PM

#

wait since when does whisper support multiple speakers lul

sonic mango Feb 21, 2023, 9:43 PM

#

sorry, I did not give an answer with the full context and I was lost in my own head. I'm tinkering with another module called diarization which helps by identifying speakers

#

I should probably seek help on that specific module rather than with whisper itself

tidal canopy Feb 21, 2023, 9:44 PM

#

can you give a sample output and we can move to DM if you like since this doesn't really have much to do with whisper itself

sonic mango Feb 21, 2023, 9:44 PM

#

sure

candid junco Feb 22, 2023, 4:12 AM

#

Hey guys, I am interested in building with Whisper.

I am trying to transcribe my calls. Would you guys say it's best to build it on top of something like Twilio flex or is there a way for OpenAI to listen in and transcribe calls based on a Chrome extension or something like that?

still monolith Feb 22, 2023, 12:56 PM

#

ohio radio is down radio.garden/visit/columbus-oh/oHcdAaW1

fickle merlin Feb 22, 2023, 4:19 PM

#

Shh

#

Whisper

buoyant cairn Feb 22, 2023, 4:21 PM

#

HEY

uncut venture Feb 22, 2023, 6:34 PM

#

AdamAI: The first AI-powered video search engine uses Whisper 🙂 Check it out in api-projects page

uncut venture Feb 22, 2023, 7:47 PM

#

Demo

tidal canopy Feb 22, 2023, 8:06 PM

#

candid junco Hey guys, I am interested in building with Whisper. I am trying to transcribe m...

you could probably grab stuff with an extension, but you still need to spawn a local webserver running whisper
not really familiar with twilio flex

frigid ferry Feb 22, 2023, 9:06 PM

#

uncut venture Demo

Looks cool

candid junco Feb 22, 2023, 9:21 PM

#

uncut venture Demo

thats freakin EPIC

uncut venture Feb 22, 2023, 9:21 PM

#

thx 🙂

#

try it out

#

in projects section the link is there

tidal canopy Feb 22, 2023, 10:58 PM

#

Sounds like a pyramid scheme and I'm not sure if that's the right channel

void lotus Feb 22, 2023, 11:27 PM

#

lol

autumn bolt Feb 23, 2023, 1:07 PM

#

hmmm any one interested in my gaming server?

autumn bolt Feb 23, 2023, 1:11 PM

#

autumn bolt hmmm any one interested in my gaming server?

what game

autumn bolt Feb 23, 2023, 1:11 PM

#

autumn bolt what game

fortnite, fallguys, minecraft, rocket league and roblox

#

yes minecraft

#

wanna join then click on my profile and about me

#

done'

elfin acorn Feb 23, 2023, 5:22 PM

#

hey guys im new here

#

but td I saw lots of cool projects

#

but for this Reverse Video Search project AdamAI did any of yall experience an issue where sometimes results dont show?

#

#

but after trying again a refreshin it works

#

but why is that?

frozen field Feb 23, 2023, 9:20 PM

#

Wow very cool

mighty marlin Feb 24, 2023, 11:19 AM

#

turkey

#

süüü

ruby grail Feb 24, 2023, 8:14 PM

#

anyone can invite BlueWillow&

errant inlet Feb 25, 2023, 7:39 AM

#

Hi how do I start using Whisper and where is it located?

#

Do I have to be a tech person to install it?

void sail Feb 25, 2023, 9:44 PM

#

errant inlet Hi how do I start using Whisper and where is it located?

you can download the necessary stuff off OpenAI's github for Whisper

void sail Feb 25, 2023, 9:45 PM

#

errant inlet Do I have to be a tech person to install it?

nobody and everybody is a tech person, it's just a matter of googling the instructions to set it up

torpid kraken Mar 1, 2023, 3:14 AM

#

How can I use whisper?

autumn bolt Mar 1, 2023, 7:22 AM

#

Download it from the GitHub

#

github.com/openai/whisper

plain gyro Mar 1, 2023, 1:20 PM

#

autumn bolt Download it from the GitHub

I aint gonna use whisper just cause u told me to.

native ledge Mar 1, 2023, 4:53 PM

#

Any idea when GPT 4 is releasing or the newBing

plucky palm Mar 1, 2023, 5:16 PM

#

[mp3 @ 000001ed70287cc0] Format mp3 detected only with low score of 1, misdetection possible!
[mp3 @ 000001ed70287cc0] Failed to read frame size: Could not seek to 1026.
C:\Users\18186\AppData\Local\Temp\Vocaroo 1fhNXMiDDXG6_gchawfnh2335b7b68aaefaa9608d380451ceb5d05a9f5109.mp3: Invalid argument

#

anyone encounter anything like this?

#

i cant get whisper to read my mp3.

muted axleBOT Mar 1, 2023, 6:20 PM

#

Introducing ChatGPT and Whisper APIs

Developers can now integrate ChatGPT and Whisper models into their apps and products through our API.

cursive torrent Mar 1, 2023, 6:24 PM

#

is there a website where i can try whisper?

pine sun Mar 1, 2023, 6:35 PM

#

InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

If I am not mistaken local Whisper supported .ogg sound format
If it is possible can you please add .ogg to support? For the example, .ogg is the default format for the Telegram audiomessages.

||Yes, I know that I can convert file, however that's tough||

timid lintel Mar 1, 2023, 6:41 PM

#

iirc, Whisper uses FFmpeg to convert the input audio anyway, since it has to be in a very specific format before converting

near dawn Mar 1, 2023, 8:20 PM

#

pine sun `InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'we...

with .ogg I think you can simply just change the file name to .wav

blissful dove Mar 1, 2023, 8:22 PM

#

Hooray! Very happy to hear OpenAI's announcement about Whisper API.

valid cradle Mar 1, 2023, 8:47 PM

#

anyone play around with the new API yet? wondering if its possible to get the "enable_word_time_offsets" param working?

void egret Mar 1, 2023, 8:57 PM

#

Haven't messed with it but I don't see a way from the docs :/

valid cradle Mar 1, 2023, 8:58 PM

#

the docs don't mention ANY params at all, though I think it has some

#

getting a very basic, text-only response

void egret Mar 1, 2023, 8:59 PM

#

Could try setting response_format to verbose_json or vtt?

valid cradle Mar 1, 2023, 9:08 PM

#

therrrre it is thank you!

#

where did you even see that as an option btw?

void egret Mar 1, 2023, 9:10 PM

#

I gotta set aside some time to mess with it, my gpu could barely run the medium model. Happy to hear verbose_json has decent info

valid cradle Mar 1, 2023, 9:10 PM

#

its phrase-level offsets but its something
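
A minimal sketch of pulling those phrase-level offsets out of a `verbose_json` response. It assumes the response carries a `segments` list whose items have `start`, `end` (in seconds) and `text`. The sample payload is hand-written for illustration, not real API output:

```python
import json


def format_segments(verbose_json: str) -> list[str]:
    """Turn each segment into a '[start - end] text' line."""
    payload = json.loads(verbose_json)
    lines = []
    for seg in payload.get("segments", []):
        lines.append(f"[{seg['start']:.2f} - {seg['end']:.2f}] {seg['text'].strip()}")
    return lines


# Hand-written sample in the assumed shape.
sample = json.dumps({
    "text": "Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
})
for line in format_segments(sample):
    print(line)
```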

void egret Mar 1, 2023, 9:10 PM

#

https://platform.openai.com/docs/api-reference/audio/create#audio/create-response_format

valid cradle Mar 1, 2023, 9:11 PM

#

gahh. yea miscommunication, re: translate and transcribe being the same endpoint

#

the good info is in the translate part of the docs

#

oh no wait, there it is in transcribe too.

#

damn ok, guess I missed this

#

anyway thx!

sleek pebble Mar 1, 2023, 9:12 PM

#

Congrats on launching the Whisper API. Maybe I can ditch the 10 AWS-T4 server instances I've been running. lol

rotund furnace Mar 1, 2023, 9:20 PM

#

Haha same here. A whisper API is... actually quite industry changing. It's pretty damn fast.

sleek pebble Mar 1, 2023, 9:28 PM

#

I haven't benchmarked the API yet but I'm currently averaging about 4.7 seconds of audio transcribed per second on a Nvidia T4 GPU.

indigo totem Mar 1, 2023, 10:21 PM

#

Im getting

const openai = new OpenAIApi(configuration);


const completion = await openai.createTranscription(fs.readFileSync("Fears.mp3"), "whisper-1","this is a youtube comentary","text",1);

sleek pebble Mar 1, 2023, 10:24 PM

#

The whisper API service is 3.3x faster and 50-75% cheaper for me.

autumn bolt Mar 1, 2023, 10:55 PM

#

plain gyro I aint gonna use whisper just cause u told me to.

Did it look like i was talking to you??

lavish ferry Mar 1, 2023, 11:18 PM

#

sleek pebble The whisper API service is 3.3x faster and 50-75% cheaper for me.

Thanks for sharing this. I was looking at hosting yesterday and the API popped up today. Seemed like the whisper api was way cheaper but I'm glad to have confirmation.

#

I hope they add a model parameter. I find the medium model is the ultimate bang for buck in terms of quality and speed.

plain gyro Mar 1, 2023, 11:54 PM

#

autumn bolt Did it look like i was talking to you??

Huuh???

lavish ferry Mar 1, 2023, 11:56 PM

#

PSA: transcode your files to mp3 before uploading them to the transcription endpoint. In my testing, mp3 transcription is 100-125% faster than WAV, 80% faster than webm.

#

It takes 4 seconds to transcode 1 min of audio. Blazing.
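
A minimal sketch of that transcode step, assuming `ffmpeg` is installed and on the PATH. The function names and the 64k bitrate are illustrative choices, not values taken from the API docs:

```python
import subprocess


def mp3_command(src: str, dst: str, bitrate: str = "64k") -> list[str]:
    """Build an ffmpeg command that re-encodes src as a mono MP3."""
    return [
        "ffmpeg",
        "-y",             # overwrite dst if it already exists
        "-i", src,        # input file
        "-vn",            # ignore any video stream
        "-ac", "1",       # downmix to mono
        "-b:a", bitrate,  # target audio bitrate (assumed value)
        dst,
    ]


def transcode(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mp3_command(src, dst), check=True)


print(" ".join(mp3_command("talk.wav", "talk.mp3")))
```

Calling `transcode("talk.wav", "talk.mp3")` would run the printed command. The main saving is likely the smaller upload rather than faster recognition.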

shut spoke Mar 2, 2023, 12:02 AM

#

Hey, does the api also have a translation function or is it limited to transcribe?

#

is the translate better than google/aws translate?

sleek pebble Mar 2, 2023, 12:09 AM

#

shut spoke Hey, does the api also have a translation function or is it limited to transcrib...

Use the translations endpoint.

sleek pebble Mar 2, 2023, 12:11 AM

#

lavish ferry PSA: transcode your files to mp3 before uploading them to the transcription endp...

Could be your internet connection. Internally at least with whisper-asr-webservice, ffmpeg decompresses to uncompressed pcm_s16le

lavish ferry Mar 2, 2023, 12:12 AM

#

sleek pebble Could be your internet connection. Internally at least with whisper-asr-webservi...

oof i am a dummy. internet connection is probably a major factor - the wav is 4x the mp3

solid tapir Mar 2, 2023, 12:20 AM

#

Guys, how do I get access to GPT3.5 model in the API playground?

lavish ferry Mar 2, 2023, 12:21 AM

#

sleek pebble Could be your internet connection. Internally at least with whisper-asr-webservi...

Hmm, even taking that into account it seems like mp3s run a little faster. For reference, a 5:33 min mp3 (5.5mb) takes 22 seconds to transcribe. The same file in wav format (21mb) takes 52 seconds to transcribe. On my current (terrible hotspot) connection, internet speeds account for roughly 19 seconds of overhead. That's still a 11 second delta. Would appreciate seeing results from others here.

sleek pebble Mar 2, 2023, 12:35 AM

#

lavish ferry Hmm, even taking that into account it seems like mp3s run a little faster. For ...

I'm getting the same performance +/- 0.1 seconds

lavish ferry Mar 2, 2023, 12:43 AM

#

sleek pebble I'm getting the same performance +/- 0.1 seconds

Thanks a ton.

radiant solstice Mar 2, 2023, 1:55 AM

#

lavish ferry PSA: transcode your files to mp3 before uploading them to the transcription endp...

Great tip

marble night Mar 2, 2023, 4:55 AM

#

when will have a tts model?

night niche Mar 2, 2023, 5:26 AM

#

hey there folks, i'm working on a project that requires me to transcribe audio live from the user input and store it in a variable. i'm a noob regarding this and would like to know if there's any way to achieve this

peak elm Mar 2, 2023, 7:37 AM

#

hi all, I feel like I'm missing something really obvious. How am I supposed to structure my request using generic fetch?

```js
const form = new FormData();
form.append("file", fs.readFileSync("audio.mp3"));
form.append("model", "whisper-1");

const response = await fetch(
    "https://api.openai.com/v1/audio/transcriptions",
    {
        method: "POST",
        headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: form,
    }
);
```

#

I keep getting

```
error: {
    message: '1 validation error for Request\n' +
      'body -> file\n' +
      "  Expected UploadFile, received: <class 'str'> (type=value_error)",
    type: 'invalid_request_error',
    param: null,
    code: null
  }
```

wheat kindle Mar 2, 2023, 7:47 AM

#

hi guys, for transcribing 21seconds speech it's taking 5 seconds. Is there any way to decrease the latency?

#

it's making it unusable

#

5 seconds for 21 seconds is way too much

quartz cove Mar 2, 2023, 7:51 AM

#

Is live transcription with the new API possible and if yes then what’s the best resource to learn about it? Has anyone done it yet?

peak elm Mar 2, 2023, 7:59 AM

#

wheat kindle hi guys, for transcribing 21seconds speech it's taking 5 seconds. Is there any w...

What type of file are you uploading? MP3 seems to be the most performant

wheat kindle Mar 2, 2023, 7:59 AM

#

peak elm What type of file are you uploading? MP3 seems to be the most performant

mp3 only

peak elm Mar 2, 2023, 8:00 AM

#

wheat kindle mp3 only

For 90s clip of audio, it takes me roughly 5s of processing

wheat kindle Mar 2, 2023, 8:01 AM

#

well I guess it can process fast in terms of maximum time but the minimum time is not low enough

peak elm Mar 2, 2023, 8:01 AM

#

I'm processing a 5s clip in roughly 900ms

wheat kindle Mar 2, 2023, 8:02 AM

#

i guess we can only expect it to improve over time as I presume it's an issue with the model itself

peak elm Mar 2, 2023, 8:03 AM

#

I think it's just due to the nature of how the system is set up. It processes faster with longer audio files since overhead plays less of a role

#

There's probably a good ~300ms or so for a job to be created and assigned to a ready GPU
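
A rough way to sanity-check that overhead guess is to fit a straight line, latency = overhead + rate × duration, through the two timings above (a 5 s clip in about 0.9 s, a 90 s clip in about 5 s). This is only a two-point sketch and assumes latency grows linearly with clip length:

```python
def fit_latency(p1: tuple[float, float], p2: tuple[float, float]) -> tuple[float, float]:
    """Return (fixed overhead in s, processing seconds per second of audio)."""
    (d1, t1), (d2, t2) = p1, p2
    rate = (t2 - t1) / (d2 - d1)
    overhead = t1 - rate * d1
    return overhead, rate


# (clip duration in s, observed latency in s), taken from the messages above.
overhead, rate = fit_latency((5.0, 0.9), (90.0, 5.0))
print(f"fixed overhead ~ {overhead:.2f} s, ~{rate * 1000:.0f} ms per second of audio")
```

With those two points the fit comes out at roughly 0.66 s of fixed overhead and about 48 ms of processing per second of audio. The 21 s clip that took 5 s does not fit this line, so treat it as a rough estimate only.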

wheat kindle Mar 2, 2023, 8:05 AM

#

yes that's what i was thinking

#

but it rather defeats the purpose of STT

#

it should ideally be near real-time

peak elm Mar 2, 2023, 8:06 AM

#

i'd love that too

wheat kindle Mar 2, 2023, 8:06 AM

#

I hope OpenAI will do it eventually

peak elm Mar 2, 2023, 8:06 AM

#

right now for my project I've been sending requests to whisper every 500ms or so and then taking the latest response when it's needed

#

but it's pretty unideal since it can lose words at the end

wheat kindle Mar 2, 2023, 8:06 AM

#

yeah

peak elm Mar 2, 2023, 8:07 AM

#

potentially thinking of using the prompt with the current transcription to let me send shorter segments (rather than the whole built up buffer)
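
A minimal sketch of that idea: send only the newest chunk and pass the tail of the running transcript as the prompt. The `transcribe` callable is a hypothetical stand-in for whatever wraps the real API call, and `prompt_chars` is an arbitrary illustrative cutoff:

```python
from typing import Callable, Iterable


def transcribe_incrementally(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes, str], str],
    prompt_chars: int = 200,
) -> str:
    """Transcribe short chunks one by one, passing the tail of the
    running transcript as the prompt for the next chunk."""
    transcript = ""
    for chunk in chunks:
        prompt = transcript[-prompt_chars:]
        text = transcribe(chunk, prompt).strip()
        if text:
            transcript = f"{transcript} {text}".strip()
    return transcript


# Stand-in for a real API call, used only to show the control flow.
def fake_transcribe(audio: bytes, prompt: str) -> str:
    return audio.decode()


print(transcribe_incrementally([b"hello", b"how are", b"you"], fake_transcribe))
```

Words cut in half at a chunk boundary would still need handling, for example by overlapping chunks slightly.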

plain gyro Mar 2, 2023, 9:56 AM

#

marble night when will have a tts model?

On your tatas

azure chasm Mar 2, 2023, 12:08 PM

#

Hi all
I want to use whisper api but I get this error when I send form data to the api, anyone knows what's the issue?

azure chasm Mar 2, 2023, 12:12 PM

#

wheat kindle hi guys, for transcribing 21seconds speech it's taking 5 seconds. Is there any w...

Hi
Can you share you code? I have some issues with the api

still wave Mar 2, 2023, 1:21 PM

#

Is it possible to Train the chatGpt and get the trained data from ChatGpt by Api Hit

sleek pebble Mar 2, 2023, 3:32 PM

#

azure chasm Hi all I want to use whisper api but I get this error when I send form data to t...

Try omitting the content-type.

azure chasm Mar 2, 2023, 3:32 PM

#

sleek pebble Try omitting the content-type.

which content type?

#

https://platform.openai.com/docs/api-reference/audio
in here they mentioned about multipart/form-data

sleek pebble Mar 2, 2023, 3:33 PM

#

azure chasm which content type?

Content-Type: multipart/form-data in your js

azure chasm Mar 2, 2023, 3:34 PM

#

sleek pebble Content-Type: multipart/form-data in your js

I have this

#

you can see in the second image

sleek pebble Mar 2, 2023, 3:36 PM

#

azure chasm I have this

formData will automatically add the content type and the mutipart boundaries required to do a form post.
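
A minimal sketch (in Python, standard library only) of why a hand-written header breaks the request: the Content-Type of a multipart body has to name the boundary string that separates the parts, so the two must be generated together. `build_multipart` is an illustrative helper, not part of any OpenAI client:

```python
import uuid


def build_multipart(fields: dict[str, str], file_name: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part carries a filename; a part without one is typically
    # treated by the server as a plain text field.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = build_multipart({"model": "whisper-1"}, "audio.mp3", b"\x00\x01")
print(content_type)
```

A header of just `multipart/form-data` has no `boundary=` part, so the server cannot split the body into fields.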

azure chasm Mar 2, 2023, 3:36 PM

#

sleek pebble formData will automatically add the content type and the mutipart boundaries req...

#

what's the problem in here?

sleek pebble Mar 2, 2023, 3:37 PM

#

Remove "Content-Type": "mutipart/form-data"

azure chasm Mar 2, 2023, 3:40 PM

#

sleek pebble Remove `"Content-Type": "mutipart/form-data"`

Thank you so much, it's working now

deep bane Mar 2, 2023, 5:06 PM

#

Having an issue where I'm only getting the first sentence of a transcript back...

INPUT
"key": "file",
"data": "IMTBuffer(369427, binary, THE BINARY DATA",
"fileName": "file.m4a",
"fieldType": "file"

OUTPUT

"data": { "text": "Scientific Advertising, by Claude C. Hopkins." }, "fileSize": 56

#

More context: Totally Noob way to run whisper i know - (via make.com http module) -- Seems that OpenAI is requesting the file in binary, so I'm using an HTTP module to convert the file into binary, then sending via the api... if I just put the file URL in the http request it does not seem to work at all

however, this only returns the first sentence of the file for whatever reason

#

an alternative approach - does not work

error i receive: 1 validation error for Request
body -> file
Expected UploadFile, received: <class 'str'> (type=value_error)

#

Trying in postman - no luck

#

must have been an issue with the file - works fine with an MP3

deep bane Mar 2, 2023, 5:44 PM

#

Random question - if it's an interview between two people, can you instruct via the prompt to label the speakers?

wheat kindle Mar 2, 2023, 5:47 PM

#

deep bane Random question - if it's an interview between two people, can you instruct via ...

no there's no speaker diarization functionality offered so far

deep bane Mar 2, 2023, 5:50 PM

#

Thanks, makes sense - would be nice to instruct it to output in Markdown, though that could be done via GPT of course

hidden summit Mar 2, 2023, 5:59 PM

#

anyone else getting a huge error when using the nodejs library to get a transcript?

#

looks like its just printing out the request

#

trying to use fetch just gives me 400 bad request

#

stiff leaf Mar 2, 2023, 6:12 PM

#

yeah, the nodejs usage of createTranscription seems to be broken

hidden summit Mar 2, 2023, 6:12 PM

#

is the entire api broken or am i doing something wrong?

#


```js
formData.append("file", fs.createReadStream("/home/simon/Projects/resident/mpthreetest.mp3"));
formData.append("model", "whisper-1");
formData.append("language", "en");

const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    headers : {
        Authorization: `Bearer ${apiKey}`,
    },
    method: "POST",
    body: formData,
});

console.log(res)
let data = await res.json();
console.log(data)
```

stiff leaf Mar 2, 2023, 6:15 PM

#

try adding a third argument to the first append, that includes the filename: { filename: "test.mp3" }

hidden summit Mar 2, 2023, 6:16 PM

#

nope

#

#

anyone have working fetch code i could see as a reference?

#

adding content-type gives a different error

stiff leaf Mar 2, 2023, 6:24 PM

#

i got things working client-side only, still trying to translate it to server-side code, hitting similar errors as you @hidden summit

hidden summit Mar 2, 2023, 6:29 PM

#

stiff leaf i got things working client-side only, still trying to translate it to server-si...

what does buffer equal in this case?

#

got it

#

wet herald Mar 2, 2023, 9:37 PM

#

How can I access the Whisper AI through the OpenAI API? The documentation isn't too clear

sour ermine Mar 2, 2023, 10:00 PM

#

wet herald How can I access the Whisper AI through the OpenAI API? The documentation isn't ...


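import { Configuration, OpenAIApi } from "openai";
import fs from "fs";

// Create the client first; the API key is read from the environment.
const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);

const mp3 = fs.createReadStream("audio.mp3");
const resp = await openai.createTranscription(mp3, "whisper-1");
// The client returns an Axios response; the transcript is in resp.data.text.
console.log(resp.data.text);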

#

Has anyone managed to pass a .mp3 from an external host?

marble abyss Mar 2, 2023, 10:11 PM

#

Hi is there an app like otter.ai that uses whisper?

sterile apex Mar 2, 2023, 10:46 PM

#

Hi, how do I pass response_format="srt" in python?

import openai
import unicodedata
audio_file= open("./audio.mp3", "rb")

transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript)

subtle radish Mar 2, 2023, 10:57 PM

#

anyone experiencing the below error when using the whisper API?

"type": "server_error",
"param": null,
"code": null
}
} 500 {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 6fa27ee8db754a9596de625d704367ce in your email.)', 'type': 'server_error', 'param': None, 'code': None}} {'Date': 'Thu, 02 Mar 2023 22:43:44 GMT', 'Content-Type': 'application/json', 'Content-Length': '365', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'X-Request-Id': '6fa27ee8db754a9596de625d704367ce'}

quaint jay Mar 3, 2023, 12:21 AM

#

subtle radish anyone experiencing the below error when using the whisper API? "type": "server...

contact our help center at help.openai.com and include the request ID (6fa27ee8db754a9596de625d704367ce) in your email.

hidden summit Mar 3, 2023, 12:48 AM

#

sour ermine Has anyone managed to pass a .mp3 from an external host?

im trying to now'

#

it gives an odd error though

#

sometimes it works, but sometimes it gives the error TypeError [ERR_INVALID_STATE]: Invalid state: chunk ArrayBuffer is zero-length or detached

storm thorn Mar 3, 2023, 2:41 AM

#

anyone else getting this error message (401) using the whisper api : You didn't provide an API key. You need to provide your API key..., I get this error using curl requests or with the axios library (trying to get it browser compatible without nodejs-specific code)

#

would like to talk to anyone that has managed to make a whisper api curl request, the 401 message above does not go away...

unique anchor Mar 3, 2023, 5:48 AM

#

does Whisper support text-to-speech synthesis ?

native ruin Mar 3, 2023, 6:59 AM

#

Gm. So i have a customer who has possible multiple use for the project. Will the api alone provide access to various use cases?

errant narwhal Mar 3, 2023, 7:06 AM

#

both .webm and .wav being created by the standard Chrome MediaRecorder as audio/webm are not working. Whisper says invalid file format. Is it due to the codec used on the MediaRecorder side?

errant narwhal Mar 3, 2023, 7:18 AM

#

errant narwhal both .webm and .wav being created by the standard Chrome MediaRecorder as audio/...

Got it. Issue resolved by setting the filetype and filename. curl_file_create($voiceFile, $fileType, "audio.wav")

storm thorn Mar 3, 2023, 9:13 AM

#

errant narwhal Got it. Issue resolved by setting the filetype and filename. curl_file_create($v...

What are you using to make the request? Running into the same issue with little success, seeing a code snippet that works would be great

amber sage Mar 3, 2023, 9:39 AM

#

sterile apex Hi, how do I pass response_format="srt" in python? ```python import openai impo...

You could try the following code.

import requests

headers = {
    'Authorization': 'Bearer sk-***',
}

files = {
    'file': open('./audio.mp3', 'rb'),
    'model': (None, 'whisper-1'),
    'response_format': (None, 'srt'),
}

response = requests.post('https://api.openai.com/v1/audio/transcriptions', headers=headers, files=files)

print(response.text)

ionic valley Mar 3, 2023, 9:45 AM

#

How do I send prompt to whisper api?

proper kayak Mar 3, 2023, 9:51 AM

#

Where can I get the secret key for whisper model?

ionic valley Mar 3, 2023, 9:56 AM

#

proper kayak Where can I get the secret key for whisper model?

You can use same secret key that is available for your openai account

proper kayak Mar 3, 2023, 9:57 AM

#

@ionic valley OK, thank you, I go to try it now

indigo flame Mar 3, 2023, 10:15 AM

#

Hi, anyone had any success in using power automate to call the whisper api?

hearty prism Mar 3, 2023, 12:28 PM

#

Does Whisper support timestamps for words as well?

inland epoch Mar 3, 2023, 1:13 PM

#

Hey guys! I'm trying to do a POST request to OpenAI Whisper in React Native with Expo. I get an audio recording in an .m4a file, but the response I get from the API is

{"error": {"code": null, "message": "The audio file could not be decoded or its format is not supported.", "param": null, "type": "invalid_request_error"}}

Any ideas? I don't think this is related to the format, but maybe I'm missing something?

inland epoch Mar 3, 2023, 1:33 PM

#

Okay figured it out. Will leave it here for someone else having troubles.

When I was recording the audio, I had passed these options to the Expo Audio object:

ios: {
    extension: ".m4a",
    audioQuality: Audio.IOSAudioQuality.HIGH,
    sampleRate: 44100,
    numberOfChannels: 2,
    bitRate: 128000,
},

Somehow this doesn't work well with OpenAI. Changed it to this one (as in the Expo docs):

ios: {
    extension: ".m4a",
    outputFormat: Audio.IOSOutputFormat.MPEG4AAC,
    audioQuality: Audio.IOSAudioQuality.MAX,
    sampleRate: 44100,
    numberOfChannels: 2,
    bitRate: 128000,
    linearPCMBitDepth: 16,
    linearPCMIsBigEndian: false,
    linearPCMIsFloat: false,
},

Working good now!

hidden summit Mar 3, 2023, 1:39 PM

#

Anyone have code that can feed a remote url into whisper?

void egret Mar 3, 2023, 2:00 PM

#

hidden summit Anyone have code that can feed a remote url into whisper?

I don't believe that's an option as they require you to send the data. You'll have to fetch it yourself and send it

hidden summit Mar 3, 2023, 2:00 PM

#

Yeah

#

That would be a step

vivid sparrow Mar 3, 2023, 2:45 PM

#

hearty prism Does Whisper support timestamps for words as well?

check out whisperX on github

hidden summit Mar 3, 2023, 3:22 PM

#

is there a way to separate speakers with the api?

patent shale Mar 3, 2023, 3:26 PM

#

hidden summit is there a way to separate speakers with the api?

There is not an option for speaker identification at the moment.

hidden summit Mar 3, 2023, 3:26 PM

#

Darn

void egret Mar 3, 2023, 3:27 PM

#

hidden summit is there a way to separate speakers with the api?

No, speaker diarization is not supported by whisper. You'll find a few projects on the discussions tab of the repo that handle it but that means running it locally

patent shale Mar 3, 2023, 3:27 PM

#

hearty prism Does Whisper support timestamps for words as well?

the response_format can be used to generate a SRT file which should have timestamps

void egret Mar 3, 2023, 3:28 PM

#

Iirc it's phrase-level timestamps, not word-level, sadly

hidden summit Mar 3, 2023, 3:28 PM

#

hidden summit Anyone have code that can feed a remote url into whisper?

    let file = await response.blob();

    formData.append("file", file, "audio.m4a");

btw

void egret Mar 3, 2023, 3:30 PM

#

Yup that should work, probably also add logic to segment it if the file is too large

patent shale Mar 3, 2023, 4:33 PM

#

Wondering if we can co-opt this thread that was meant for the open source version of Whisper and include the API version of Whisper? Or start another thread to avoid confusion...

void egret Mar 3, 2023, 4:43 PM

#

co-opting it should be fine, it was one of the least used channels before

faint latch Mar 3, 2023, 5:16 PM

#

Does anyone know how to contact OpenAI sales?

timber shard Mar 4, 2023, 1:30 AM

#

faint latch Does anyone know how to contact OpenAI sales?

Yes https://openai.com/contact-sales

Contact sales

We’re happy to answer questions and get you acquainted with OpenAI, including connecting you with helpful resources, exploring use cases for your team, and discussing pricing options.

#

Business api is like 250k I think

hushed crystal Mar 4, 2023, 2:41 AM

#

Anyone has got a way to search for common-password.txt

#

just looking for a common word less than 16 characters

fickle coyote Mar 4, 2023, 5:03 AM

#

Using playground's whisper just gives me whitespace when uploading MP3s

unkempt wolf Mar 4, 2023, 5:48 AM

#

ionic valley How do I send prompt to whisper api?

have you found out yet?

ionic valley Mar 4, 2023, 9:05 AM

#

unkempt wolf have you found out yet?

Nope

potent niche Mar 4, 2023, 9:58 AM

#

I'm getting this error while trying to use the nodeJS library

RequiredError: Required parameter model was null or undefined when calling createTranscription.

Code:

const openai = new OpenAIApi(configuration);
  openai.createTranscription({
    file: mp3Content,
    model: 'whisper-1',
    responseFormat: 'text',
  }).then((response) => {
    console.log(response.data);
    message.reply(response.data);
  }).catch((error) => {
    console.error(error);
  });

#

I'm using the latest library so idk what could be causing the issue

ionic valley Mar 4, 2023, 11:10 AM

#

Replace above code with this code and try:

const openai = new OpenAIApi(configuration);
  openai.createTranscription({
    file: mp3Content,
    engine: 'whisper-1',
    responseFormat: 'text',
  }).then((response) => {
    console.log(response.data);
    message.reply(response.data);
  }).catch((error) => {
    console.error(error);
  });

red zephyr Mar 4, 2023, 11:28 AM

#

Getting error 429 @ionic valley

dim jacinth Mar 4, 2023, 3:02 PM

#

Hey, I've been using the Whisper API for a bit now, but for me none of the calls to the transcription endpoint seem to be logged in the Usage panel 🤔 (although I can see API calls to other endpoints just fine), is anyone else experiencing the same?

remote gazelle Mar 4, 2023, 8:49 PM

#

Liquefaction-induced failure of shallow foundations

spiral hawk Mar 4, 2023, 8:58 PM

#

anyone experimented with temperature and beam_size? I have various repetitive gaps in my transcribes, and I want to "fix" it. Anyone had the same problem ?

lavish jungle Mar 5, 2023, 4:42 AM

#

Code bot chat GPT

amber sage Mar 5, 2023, 7:06 AM

#

ionic valley How do I send prompt to whisper api?

You could specify prompt in the following way.

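import requests

headers = {
    'Authorization': 'Bearer sk-***',
}

# Non-file form fields are passed as (None, value) tuples so that requests
# encodes them as plain multipart fields alongside the file.
with open('test.mp3', 'rb') as audio:
    files = {
        'file': audio,
        'model': (None, 'whisper-1'),
        'response_format': (None, 'srt'),
        'language': (None, 'en'),
        # Adjacent string literals are joined without the indentation that a
        # backslash continuation would embed in the prompt text.
        'prompt': (None, 'The transcript is about a message '
                         'sent From Sarah Cranmer To Laila Alizadeh '
                         'regarding travel arrangements.'),
    }

    response = requests.post('https://api.openai.com/v1/audio/transcriptions',
                             headers=headers,
                             files=files)
print(response.text)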

peak elm Mar 5, 2023, 7:45 AM

#

I think the types of the nodejs library are messed up for whisper

#

const { Configuration, OpenAIApi } = require("openai");
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);
const resp = await openai.createTranscription(
  fs.createReadStream("audio.mp3"),
  "whisper-1"
);

The example given gives me a typing error:

Argument of type 'ReadStream' is not assignable to parameter of type 'File'.

#

https://github.com/openai/openai-node/issues/77 ah ok it's a known issue

GitHub

[Whisper] cannot call `createTranscription` function from Node.js d...

Describe the bug Cannot call createTranscription function like below: ... const audio = await fs.readFile('path/to/audio.mp4'); // Compile Error at the first argument const response...

rain bloom Mar 5, 2023, 7:59 AM

#

There is a way to stream audio?

pallid slate Mar 5, 2023, 7:59 AM

#

That would be insane

sonic cloud Mar 5, 2023, 8:35 AM

#

guys i had a question and i can't seem to find the answer anywhere: i used my phone number multiple times for other email ids, but i don't have those email ids anymore and i can't use my phone number anymore, so if anyone could help me with this, please help me

shut spoke Mar 5, 2023, 10:40 AM

#

potent niche I'm getting this error while trying to use the nodeJS library ```js RequiredErr...

were you able to fix it

shut spoke Mar 5, 2023, 10:57 AM

#

does whisper have a max file size on the server side?

restive beacon Mar 5, 2023, 1:14 PM

#

I'm sending a request to the OpenAI Whisper API and it says to provide a model, but I wrote the model in my code.
Here is my code:


    fetch("https://api.openai.com/v1/audio/transcriptions", {
      method: "POST",

      body: {
        contentType: "multipart/form-data",

        filename: file,

        model: "whisper-1",
      },

      headers: {
        Authorization: `Bearer ${this.token}`,

        model: "whisper-1",
      },
    })
      .then((res) => res.json())
      .then((json) => {
        if(this.debug === true) console.log(json);

        return json;
      });

unkempt wolf Mar 5, 2023, 1:16 PM

#

How long does it take for whisper to transcribe something for you guys? For me it's taking like 3 seconds (using mp3), which for my purposes is pretty slow (would need it to be halved or so)

sonic mango Mar 5, 2023, 1:36 PM

#

What's the length of your audio file?

rare furnace Mar 5, 2023, 1:39 PM

#

Does it support dual channel transcribing?

restive beacon Mar 5, 2023, 1:43 PM

#

restive beacon I'm sending a request to the OpenAI Whisper API and it says to provide a model, but I wrote the model...

.

sonic mango Mar 5, 2023, 1:44 PM

#

unkempt wolf How long does it take for whisper to transcribe something for you guys? For me i...

I'm dealing with large audio files only, so not sure it helps you. For an audio file of 1 hour it takes me 10 min.

sonic mango Mar 5, 2023, 1:47 PM

#

spiral hawk anyone experimented with temperature and beam_size? I have various repetitive ga...

what's the use of "temperature" and "beam_size"?

sonic mango Mar 5, 2023, 1:48 PM

#

amber sage You could specify prompt in the following way. ``` import requests headers = {...

you can use Prompt for whisper ? Does it impact the integrity of what you are transcribing ?

#

is the prompt usage only through API or does it work with Whisper natively ?

restive beacon Mar 5, 2023, 1:53 PM

#

restive beacon I'm sending a request to the OpenAI Whisper API and it says to provide a model, but I wrote the model...

@inland epoch can you check this?

unkempt wolf Mar 5, 2023, 2:14 PM

#

sonic mango What's the length of your audio file?

It's for a conversation, so anywhere between 1 to 15 seconds

#

The file in question was like 3 sec

unkempt bolt Mar 5, 2023, 4:44 PM

#

new here, so this is just my first reading... to me it looks like your command is trying to use a lib that is looking for modern cpu features (for example AVX2) that perhaps your cpu/gpu does not support. Have you installed an nvidia card and its drivers/libraries? Might check that the path mentioned, /usr/lib64-nvidia, is populated with the expected stuff and is part of your path

#

you running that with gpu or tpu accelerator?

potent niche Mar 5, 2023, 5:08 PM

#

shut spoke were you able to fix it

yes thx

restive beacon Mar 5, 2023, 6:20 PM

#

How to fix could not parse multipart form error?

sour stag Mar 5, 2023, 7:05 PM

#

@restive beacon Depends on what you're using to send the request and how you're formatting it.

restive beacon Mar 5, 2023, 7:13 PM

#

sour stag @restive beacon Depends on what you're using to send the request and how y...

Okay

sour stag Mar 5, 2023, 7:21 PM

#

@restive beacon I see you posted code up above, seems like you're using JS and Fetch. Assuming you are pulling file from an HTML Input element, you could try something like this:

var inputElement = document.getElementById("myHTMLInputElement");
var file = inputElement.files[0];
var data = new FormData();
data.append("file", file);

var params = {
    model: "whisper-1",
};

data.append(JSON.stringify(params));

fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    body: data,
    headers: {
      Authorization: `Bearer ${this.token}`,
    },
  }).then((res) => res.json())
    .then((json) => {
      if(this.debug === true) console.log(json);

      return json;
    });

restive beacon Mar 5, 2023, 7:22 PM

#

@sour stag I'm not using html

#

I will create whisper npm package

#

But

sour stag Mar 5, 2023, 7:23 PM

#

Node JS then?

restive beacon Mar 5, 2023, 7:23 PM

#

Yeah

#

And now I'm getting this error

#

But idk how to fix

sour stag Mar 5, 2023, 7:27 PM

#

nodejs should be something like this:

const fs = require('fs');
const fetch = require('node-fetch');
const FormData = require('form-data');

var data = new FormData();
data.append("file", fs.createReadStream('example-file.json'));

var params = {
    model: "whisper-1",
};

data.append(JSON.stringify(params));

fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    body: data,
    headers: {
      Authorization: `Bearer ${this.token}`,
    },
  }).then((res) => res.json())
    .then((json) => {
      if(this.debug === true) console.log(json);

      return json;
    });

sour stag Mar 5, 2023, 7:43 PM

#

@restive beacon Actually, node's version of fetch and FormData require some special treatment, try this:

import fs from 'fs';
import fetch from 'node-fetch'
import FormData from 'form-data';

var testfile = fs.createReadStream('./audio_only_spanish.wav');

var data = new FormData();
data.append("file", testfile);
data.append("model", "whisper-1");

fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    body: data,
    headers: {
      Authorization: `Bearer ${this.token}`,
    },
  }).then((res) => res.json())
    .then((json) => {
        console.log(json);
    });

restive beacon Mar 5, 2023, 7:48 PM

#

sour stag nodejs should be something like this: ```javascript const fs = require('fs'); co...

This is commonjs

sour stag Mar 5, 2023, 7:49 PM

#

the second one is with "type": "module", in the package.json

#

@restive beacon I just tried the second one I posted with my own API-Key and I got a proper response:

{
  text: 'por tenernos la confianza de venir hasta acá, de invertir en su viaje del español. Nosotros aprendimos de ustedes y creo que ustedes aprendieron un poquito de español de nuestra ciudad y de Querétaro. Así es, y bueno, Querétaro es una ciudad muy bonita. De hecho, vamos a hablar un poquito del estado, no sólo de la ciudad en este episodio, pero la verdad es que hay muchos otros lugares en México que vale la pena conocer. ¡Gracias por ver el video!'
}

restive beacon Mar 5, 2023, 7:59 PM

#

sour stag the second one is with `"type": "module",` in the package.json

I know

#

And

#

Thanks so much 🙏

unkempt bolt Mar 5, 2023, 8:21 PM

#

hmm, sorry sergio, I got same initial errors about tensorflow libs, in colab with gpu, but they were not blocking me. the invalid argument about ae.mp3 and the low confidence that it was an mp3, and the failed to read the expected frame size says ae.mp3 might be corrupt.

spiral hawk Mar 5, 2023, 9:12 PM

#

sonic mango what's the use of "temperature" and "beam_size"?

It's for setting prediction window, I want to narrow it a little bit to have more frequent phrase transcribed

north relic Mar 6, 2023, 5:49 AM

#

1

pearl schooner Mar 6, 2023, 9:26 AM

#

isn't this just midjourney?

#

also, why advertise in a voice ai channel

normal ravine Mar 6, 2023, 4:16 PM

#

Do you guys know if it's possible to use the "Whisper model" (openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt) as an orthographic corrector?

void egret Mar 6, 2023, 5:14 PM

#

normal ravine Do you guys know if it's possible to use the "Whisper model" (openaipublic.azure...

What exactly do you mean by orthographic corrector? It won't help with sentence structure but could help with spelling? You'll probably want to use medium or large though

heady star Mar 6, 2023, 9:01 PM

#

Folks, is there a way to just detect the language with Whisper API?

warm nymph Mar 6, 2023, 9:17 PM

#

There was about 50 seconds of silence at the ending in my recording, the only audio was "Do you like computers" and this was what it transcribed. "Do you like to use computers? Well, yes. I'd like to show you... the product. First of all it offers scans easy. It scans to a specified area. First of all, it is physically novel. There is no woman. If you had the opportunity, you could Project a virtual church scholar. or actually a university student... to engage in friends with one another... "

dense mountain Mar 6, 2023, 11:25 PM

#

Is it possible to pass url to whisper so that it download the audio there instead of uploading? My workflow is that someone upload to cloud storage. Downloading from cloud storage and then uploading it to OpenAI feels like a bandwidth wasted. 🤔

void egret Mar 6, 2023, 11:29 PM

#

dense mountain Is it possible to pass url to whisper so that it download the audio there instea...

You have to pass the file in, there's no way to pass in a url.

noble crystal Mar 7, 2023, 2:14 AM

#

Anyone have an idea when ChatGPT will be aware of whisper api so it can provide help with it ?

void egret Mar 7, 2023, 2:17 AM

#

Likely not for a while. There's been zero mention of them updating the knowledge base so far

tepid escarp Mar 7, 2023, 3:58 AM

#

hii

fluid locust Mar 7, 2023, 1:52 PM

#

Hi all. i found that the Whisper API doesn’t work properly when returning anything other than JSON, ie. srt, vtt. An OpenAI.error is returned with a HTTP code of 200 (the correct output is actually returned, just in the error). Seems like the API isn’t able to handle the other accepted formats

#

Anyone find similar problem?

patent shale Mar 7, 2023, 3:08 PM

#

I have not found that problem with the Whisper API. It returned both SRT and VTT correctly during testing last Friday.

#

Confirmed that I am seeing the same today.

sonic mango Mar 7, 2023, 4:33 PM

#

Is there some kind of roadmap for whisper functionality ?

#

functionalities*

patent shale Mar 7, 2023, 6:09 PM

#

sonic mango functionalities*

I haven't seen a roadmap yet, but I would assume they'd expand the number of languages that can be transcribed at some point.

hoary surge Mar 7, 2023, 8:16 PM

#

playing around with the whisper api that openai released a week or so ago. Wondering how I can get timestamps working as in the selfhosted Whisper?

merry junco Mar 7, 2023, 8:47 PM

#

does anyone know why the recognize_whisper function in the speech_recognition python lib is not working? it seems like it's not generating any output

flat horizon Mar 7, 2023, 9:25 PM

#

Can the Whisper API handle video input, or only audio?

patent shale Mar 7, 2023, 9:26 PM

#

audio only

#

File types supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm

flat horizon Mar 7, 2023, 9:34 PM

#

Thanks. Maybe I can use Transloadit's audio extraction API in the middle.

toxic ingot Mar 7, 2023, 9:35 PM

#

Hi, Do you know if there is any company that offers the services of a chatbot but with integrated chatgpt technology?

Thanks

flat horizon Mar 7, 2023, 9:39 PM

#

What sort of turnaround times per minute are people getting with the Whisper API?

toxic ingot Mar 7, 2023, 9:39 PM

#

or some way that I can use chatgpt technology to train it with my website and other pdf documents, and integrate it into my website like search engine

upbeat totem Mar 7, 2023, 10:43 PM

#

Hi guys, a question: how would you make whisper create a vtt file if the audio file is larger than 25MB?

void egret Mar 8, 2023, 12:22 AM

#

upbeat totem Hi guys, a question, how would you make whisper create a vtt file if said file w...

You'll need to split the file into chunks under 25mb and then adjust the times in each response based on the total duration of the previous chunks
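
The timestamp adjustment can be sketched roughly like this: each chunk's subtitles start again at zero, so every cue in a later chunk gets shifted by the total duration of the chunks before it. `shift_timestamps` is a hypothetical helper, not part of any Whisper library. It accepts `HH:MM:SS,mmm` and `HH:MM:SS.mmm`, also handles timestamps that leave out the hours field, and always writes hours in its output.

```python
import re

# Optional hours, then MM:SS, a "," (SRT) or "." (VTT) separator, then millis.
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})([,.])(\d{3})")


def shift_timestamps(subtitle_text, offset_seconds):
    """Shift every cue timing line in SRT/VTT text by offset_seconds."""
    offset_ms = round(offset_seconds * 1000)

    def _shift(match):
        hours, minutes, seconds, sep, millis = match.groups()
        total = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
        total = total * 1000 + int(millis) + offset_ms
        hours, rest = divmod(total, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"

    shifted = []
    for line in subtitle_text.splitlines():
        # Only cue timing lines contain "-->"; the spoken text is left alone.
        shifted.append(TIMESTAMP.sub(_shift, line) if "-->" in line else line)
    return "\n".join(shifted)


# Second chunk of a recording that was cut every 25 seconds.
print(shift_timestamps("1\n00:00:01,500 --> 00:00:03,000\nHello", 25))
```

When joining SRT chunks you would also need to renumber the cues, and for VTT keep only the first `WEBVTT` header. Neither is handled here.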

upbeat totem Mar 8, 2023, 12:48 AM

#

void egret You'll need to split the file into 25mb chunks and then adjust the times in the ...

so, no prompt usage ?

void egret Mar 8, 2023, 12:50 AM

#

I haven't tried messing around with prompt but I'm pretty sure it won't affect the format style. It should affect the actual text

upbeat totem Mar 8, 2023, 12:51 AM

#

i see, ty

fluid locust Mar 8, 2023, 5:37 AM

#

Anyone tried combining whisper api and pyannote library to get speaker diarization?

autumn bolt Mar 8, 2023, 5:36 PM

#

Anyone following the Windows 11 voice command preview? Pretty cool the way it works, the show numbers/grid thing is really nice, I imagine you'll be able to relabel these numbers with some text once you know the few you're looking for, it would be great if it could just monitor what you're doing for the day and give you the voice commands, then let you tailor the keyword mapping etc, it could even calculate the amount of time it took you to complete certain tasks and suggest where voice commands would have been faster.

Should be able to build something similar with Whisper... also handy if you want to talk to ChatGPT 😀

final mica Mar 8, 2023, 8:42 PM

#

Has anyone else noticed Whisper get stuck in loops when run on GPU? I am running it on the gpu on my local system, and it gives basically perfect results when run on cpu, and on gpu I'm seeing the expected device system usage and compute-time speedups, but the output is filled with stuff like this, as one example:
[03:33.500 --> 03:34.500] Oh man, no way.
[03:34.500 --> 03:35.500] Oh man, no way.
[03:35.500 --> 03:36.500] Oh man, no way.
[03:36.500 --> 03:37.500] Oh man, no way.
[03:37.500 --> 03:38.500] Oh man, no way.
(x12)
The above text does actually appear in the audio, but only once and about a minute earlier in the clip -- the above timestamps have been pushed later because correctly transcribed text occurred in the interim.

dim viper Mar 9, 2023, 12:32 AM

#

fluid locust Anyone tried combining whisper api and pyannote library to get speaker diarizati...

whisper api doesn't have the ability to do speaker diarization
instead u can use something like pyannote.audio and then pipe the broken-down chunks to whisper

opaque oak Mar 9, 2023, 2:10 AM

#

Is Whisper able to translate from English to other languages? What would be the best way to turn existing English subtitles into another language? Would GPT be better?

void egret Mar 9, 2023, 2:43 AM

#

opaque oak Is Whisper able to translate from English to other languages? What would be the ...

Pretty sure whisper can do other languages to English, not the other way around. If you already have subtitles then you wouldn't need whisper as it just transcribes. You'd probably want to use gpt or a translations service like deepl or google translate

opaque oak Mar 9, 2023, 2:58 AM

#

void egret Pretty sure whisper can do other languages to English, not the other way around....

got it, thank you! I've found Google Translate to not really be the best in terms of accuracy, so maybe I'll give GPT a shot. Thanks!

gloomy ice Mar 9, 2023, 3:20 AM

#

Hi all, does the Whisper API support word level timestamps? If not, is it on the roadmap?

round osprey Mar 9, 2023, 6:38 AM

#

is whisper as good as azure STT combined with azure NR for avg quality audio ?

wooden torrent Mar 9, 2023, 11:09 AM

#

JOE

lean girder Mar 9, 2023, 12:15 PM

#

I've been struggling with this for the last few hours. This keeps returning "message": "you must provide a model parameter". Anyone know where I'm going wrong?

$url = 'https://api.openai.com/v1/audio/transcriptions';

$file_path = $_FILES['file']['tmp_name'];
$file_name = basename($_FILES['file']['name']);
$file_type = mime_content_type($file_path);
$file_data = file_get_contents($file_path);

$body = array(
'model' => 'whisper-1',
'file' => array(
'name' => $file_name,
'type' => $file_type,
'bits' => $file_data,
),
);

$headers = array(
'Authorization' => 'Bearer ' . $OPENAI_API_KEY,
);

$request_args = array(
'method' => 'POST',
'headers' => $headers,
'body' => $body,
'timeout' => 0,
);

$response = wp_remote_post($url, $request_args);

summer void Mar 9, 2023, 12:22 PM

#

Any clue what might be going wrong when i try to use the createTranscription function?

Whenever I use the createTranscription function i get a "TypeError: localVarFormParams.getHeaders is not a function".

I'm in a react native project, recording the microphone, so the code to trigger whisper is simply:

  const recordingURI = recording.getURI()
  const response = await fetch(recordingURI)

  openai
  .createTranslation(response.body, 'whisper-1')
  .then((response) => {
    //do something here
  });


What am I doing wrong here?

final wolf Mar 9, 2023, 1:34 PM

#

ChatGPT has helped me ok with Whisper. I was able to create all the scripts I wanted today
It was a bit wrestling. But with python-openai lib it is really simple. So apart from the request itself, chatgpt can help you with everything else

final wolf Mar 9, 2023, 1:36 PM

#

summer void Any clue what might be going wrong when i try to use the createTranscription fun...

What do you want to do with the whisper response? I mean, like where do you want it to go?

stiff leaf Mar 9, 2023, 4:14 PM

#

is there any way to explicitly tell the whisper API the language spoken in the audio, rather than rely on it being auto-detected?

void egret Mar 9, 2023, 4:24 PM

#

Yup it's a param you can supply https://platform.openai.com/docs/api-reference/audio/create#audio/create-language

stiff leaf Mar 9, 2023, 4:26 PM

#

ah nice, thanks @void egret! i kept looking for something that looked like a proper API reference page and couldn't find it on https://platform.openai.com/docs/guides/speech-to-text - guess i didn't look hard enough. cheers!

OpenAI API

An API for accessing new AI models developed by OpenAI

fair sierra Mar 9, 2023, 4:54 PM

#

what is whisper?

void egret Mar 9, 2023, 4:57 PM

#

Whisper is a speech to text model developed by openai. Given audio it generates a transcription of it

round osprey Mar 9, 2023, 5:25 PM

#

round osprey is whisper as good as azure STT combined with azure NR for avg quality audio ?

I’ll reiterate

#

Did anyone test whisper against for instance azure noise reduction + STT?

#

Calibrated audio likely boost STT performance

void egret Mar 9, 2023, 6:05 PM

#

I've exclusively used whisper, you might find some performance tests on the discussions tab of the repo though

boreal hill Mar 10, 2023, 12:24 AM

#

anyone know how to solve this error? error': {'message': 'Maximum content size limit (26214400) exceeded (68638790 bytes read)', 'type': 'server_error', 'param': None, 'code': None

void egret Mar 10, 2023, 12:33 AM

#

How large was the file?

fluid locust Mar 10, 2023, 12:00 PM

#

dim viper whisper api doesn't have the ability to do speaker diarization instead u can use...

tried this before, haven't been able to get it right, the speaker just got messed up when I combined both

#

anyone knows how to circumvent the 25 MB file size limit?

subtle radish Mar 10, 2023, 12:48 PM

#

fluid locust anyone knows how to circumvent the 25 MB file size limit?

Trying to work it out right now as well by splitting the file into segments of 24mb. Haven’t figured it out yet.
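
A rough way to plan the split: the error message quoted earlier in this thread gives the limit as 26214400 bytes, so you can work out how many chunks you need and how long each one should be. This is only a sketch. `plan_chunks` is a hypothetical helper, the 90% safety margin is an arbitrary assumption, and the maths assumes a roughly constant bitrate so that bytes scale with duration.

```python
import math

MAX_UPLOAD_BYTES = 26_214_400  # limit quoted in the API error message (25 MiB)


def plan_chunks(file_size_bytes, duration_seconds, safety_percent=90):
    """Return (chunk_count, seconds_per_chunk) that keeps each chunk under the limit."""
    # Integer arithmetic keeps the budget exact; the margin leaves headroom
    # for container overhead when each chunk is re-encoded.
    budget = MAX_UPLOAD_BYTES * safety_percent // 100
    if file_size_bytes <= budget:
        return 1, duration_seconds
    count = math.ceil(file_size_bytes / budget)
    return count, duration_seconds / count


# The 68638790-byte file from the error above, assuming it is one hour long.
print(plan_chunks(68_638_790, 3600))
```

Cutting by time rather than by raw bytes matters because slicing an encoded file at arbitrary byte offsets usually leaves pieces that will not decode.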

upbeat totem Mar 10, 2023, 1:13 PM

#

for some reason the transcriptions in vtt arrive truncated in my heroku deploy but perfect on my local pc, any idea what it could be?

solemn dirge Mar 10, 2023, 1:18 PM

#

When I try to turn an English audio file into Chinese, this is the command I entered: "!whisper "test.mp3" --language Chinese", but the result is only a few Chinese words and the rest is English. Can anyone help? BTW I don't know python at all

fluid locust Mar 10, 2023, 1:20 PM

#

subtle radish Trying to work it out right now as well by splitting the file into segments of 2...

just got it working, split into chunks of 25 seconds, haven't really looked at accuracy, but it works on files larger than 25 MB

subtle radish Mar 10, 2023, 1:24 PM

#

fluid locust just got it working, split into chunks of 25 seconds, haven't really looked at accur...

Do you mind sharing code?

fluid locust Mar 10, 2023, 1:25 PM

#

subtle radish Do you mind sharing code?

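# Assumed imports (the original snippet left them out); request.files
# suggests a Flask view, and AudioSegment comes from pydub.
from io import BytesIO

import openai
from flask import request
from pydub import AudioSegment


def transcribe():
    file = request.files['file']
    audio = file.read()

    try:
        audio_file = BytesIO(audio)
        audio_segment = AudioSegment.from_file(audio_file, format="mp3")

        # Split the audio file into chunks of 25 seconds (pydub slices in ms)
        chunks = audio_segment[::25000]

        # Transcribe each chunk and concatenate the results
        results = []
        for chunk in chunks:
            with BytesIO() as chunk_file:
                chunk.export(chunk_file, format="mp3")
                chunk_file.seek(0)
                chunk_file.name = "audio.mp3"
                transcription = openai.Audio.transcribe("whisper-1", chunk_file)
                text = transcription['text']
                results.append(text)

        # The pasted snippet was cut off here; joining the chunk texts and
        # returning an error string are assumed completions.
        return " ".join(results)
    except Exception as error:
        return str(error), 500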

#

shoutout gpt3.5 turbo for that code

upbeat totem Mar 10, 2023, 1:32 PM

#

upbeat totem for some reason the transcriptions in vtt arrive truncated in my heroku deploy b...

I'm doing multiple requests at the same time; I think some limit truncated the response, but why doesn't it happen on my local pc? 🤔

dim viper Mar 10, 2023, 8:35 PM

#

fluid locust tried this before, haven't been able to get it right, the speaker just got messe...

search on google github Majdoddin nlp (cant post link)

its a project built with pyannote audio and whisper might be able to help you figure out whats going on with yours by digging in theirs

stone chasm Mar 11, 2023, 1:54 AM

#

hey guys. who do i speak to for feedback on whisper?

stone chasm Mar 11, 2023, 2:43 AM

#

okay. what is this? i got muted for reporting a bug with the API?!?!? this is absolute bullocks! LOL

fluid locust Mar 11, 2023, 2:55 AM

#

Anyone experience whisper hallucinating on empty sections? is there anyway to prevent this? maybe with vad filter?
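
One thing that might help with trailing silence is cutting it off before the audio is sent, so there is no empty section to transcribe. The sketch below is a crude energy gate, not a real VAD, and it is not confirmed to stop the hallucinations. `frame_rms` and `trim_trailing_silence` are hypothetical helpers, the input is assumed to be a list of 16-bit PCM sample values, and the frame size and threshold are guesses you would need to tune.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of a list of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def trim_trailing_silence(samples, frame_size=480, threshold=500.0):
    """Drop frames from the end until one is at least as loud as threshold."""
    end = len(samples)
    while end > 0:
        start = max(0, end - frame_size)
        if frame_rms(samples[start:end]) >= threshold:
            break
        end = start
    return samples[:end]


speech = [1000, -1000] * 480   # 960 loud samples
silence = [0] * 4800           # silent tail
print(len(trim_trailing_silence(speech + silence)))
```

Background noise above the threshold will defeat this, and silence in the middle of a clip is left untouched.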

stone chasm Mar 11, 2023, 2:57 AM

#

fluid locust Anyone experience whisper hallucinating on empty sections? is there anyway to pr...

i literally just posted the same question, including all the hallucinated responses... but i got automodded.... lol... because one of the the responses was a 'high risk word'

fluid locust Mar 11, 2023, 2:57 AM

#

sorry to hear that
you got any headway on that?

stone chasm Mar 11, 2023, 2:59 AM

#

i haven't. but most of my hallucinations are due to short audio segments. i'm thinking of either combining them into longer segments or just dropping them altogether

#

do you get things like www dot globalonenessproject dot org ? i got lots of those...

stone chasm Mar 11, 2023, 4:13 AM

#

can i please create a thread to share all the hallucinations? they are so funny... i had another one here: "Produced & Uploaded by Houthi Movies"

#

and another.... "For more information on Geography Now, visit geography(.)nsw(.)ca"

fiery warren Mar 11, 2023, 8:47 AM

#

Anyone know if there is only one or more whisper models available via api?

amber dune Mar 11, 2023, 9:35 AM

#

Afaik only whisper-1 is available rn

peak vergeBOT Mar 11, 2023, 10:39 AM

#

Please speak English.

@stone chasm, your message was removed. We do not currently have capacity to support other languages.

stone chasm Mar 11, 2023, 10:41 AM

#

come on mods... who can i chat to for feedback? lol

fluid locust Mar 11, 2023, 4:02 PM

#

stone chasm i haven't. but most of my hallucinations are due to short audio segments. i'm th...

I get things like "subs by broth3rmax"

uncut cipher Mar 11, 2023, 5:46 PM

#

Hi, does anyone know of a project/algorithm that could reliably detect when someone has finished speaking, using real-time transcription with whisper?
I want to make an API call as soon as the user finished speaking/at the end of his sentence

dusty willow Mar 11, 2023, 5:48 PM

#

uncut cipher Hi, does anyone know of a project/algorithm that could reliably detect when someon...

no

#

But I am leaving I have had to change too many things And then they want my phone number which I'm not giving so I'm leaving this server id dumb

stone pine Mar 11, 2023, 5:54 PM

#

Does anyone have language detection examples to use?

autumn bolt Mar 11, 2023, 6:33 PM

#

can somebody help me with the whisper thing I want to to take audio from the microphone and turn it into text for python btw

stone chasm Mar 11, 2023, 7:53 PM

#

uncut cipher Hi, does anyone know of a project/algorithm that could reliably detect when someon...

you want to google diarization

#

there are a few libraries out there. i've tried a few and they work pretty well, but my computer does not have the capacity to process large files. it took 1hr for pyannote to diarize 20mins of audio!!! so i now diarize using google api remotely and then pass the segments to whisper API for processing.
one thing i noted tho, the whisper API (unlike the whisper library which you process locally) tends to make things up when the audio is not very clear, or when the audio segment is too short.
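
Since very short segments seem to be where things go wrong, one option is to merge neighbouring diarization turns from the same speaker before cutting the audio and sending it for transcription. A minimal sketch, assuming the diarization output has already been converted to `(start, end, speaker)` tuples in seconds. That shape is made up for illustration and is not the pyannote or Google API format, and `merge_turns` and the `max_gap` default are hypothetical.

```python
def merge_turns(turns, max_gap=0.5):
    """Merge consecutive turns by the same speaker separated by at most max_gap seconds."""
    merged = []
    for start, end, speaker in sorted(turns):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            # Same speaker carries on after a short pause: extend the last turn.
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


turns = [(0.0, 1.0, "A"), (1.2, 2.0, "A"), (2.1, 3.0, "B"), (5.0, 6.0, "B")]
print(merge_turns(turns))
```

Each merged turn can then be sliced out of the audio and sent as one request, which gives the model longer stretches with complete sentences.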

stone chasm Mar 11, 2023, 8:16 PM

#

fluid locust anyone knows how to circumvent the 25 MB file size limit?

you can also convert the files to mp3, then if you're performing diarization, a longer audio file with complete sentences will give you better results

hearty fable Mar 11, 2023, 9:13 PM

#

I have a MP3 player, maybe I can help??

stone chasm Mar 11, 2023, 9:49 PM

#

insyaallah brother

stone chasm Mar 11, 2023, 11:10 PM

#

"please like and subscribe.... bye bye bye bye bye......." this is so bad.....
actual transcript in the video / audio during the 17.3 seconds were:
[speaker 1] okay, let's have a look. share screen.
[speaker 2] it's not popping up for me.
[speaker 1] you haven't got that?
[speaker2] nope
[speaker 1] okay let's try again.
[speaker 2] and if anybody's listening, audio to the recording afterwards i will walk and talk you through this.
[speaker 1] have you got it now?
[speaker 2] okay, we've got it, we've got it, we've got it.

earnest horizon Mar 12, 2023, 12:22 AM

#

I'm trying to hit the Whisper API from javascript. Is the correct order of endpoints to hit the files endpoint first to upload my audio file, then to pass the uploaded path to the transcription endpoint?

If so, what value should I pass for the purpose parameter to the files endpoint? The examples for the files endpoint all have purpose="fine-tune" in them, but that doesn't sound right for the purpose of transcription.

void egret Mar 12, 2023, 12:53 AM

#

earnest horizon I'm trying to hit the Whisper API from javascript. Is the correct order of endpo...

You shouldn't be using files at all

fair coyote Mar 12, 2023, 4:12 AM

#

earnest horizon I'm trying to hit the Whisper API from javascript. Is the correct order of endpo...

Yeah I think you just send the file directly from your computer to the whisper endpoint.

slow urchin Mar 12, 2023, 7:31 AM

#

Can you hear me

gentle sinew Mar 12, 2023, 1:32 PM

#

Hello, I'm a novice with a few good computer skills and I've started a project to dub an English video into French. It is a mentorship accessible on YouTube.
I used Whisper to transcribe the 3 hours long video.
I had a few mistakes, like for example, "How" which became "So" according to the model.
The voice is that of a man who speaks American English with a strong accent. I have the impression that the "tiny.en" model gave the best results. I didn't try to tune the settings, but do you think it could be improved for American English?
The next part of the project will be the translation, with either ChatGPT or DeepL, I don't know yet, and I wanted to know if you had any advice on which tools to use.
After that comes text-to-speech. Can the output be time-stamped so that I can then integrate it into the video? Do you think it is possible to automate this using the timestamps?
I hope I'm not going off topic. Thanks in advance, Rom

gentle sinew Mar 12, 2023, 2:14 PM

#

My project is to make accessible to people who have difficulty with English, a series of videos with always the same person, the same voice.
Is it possible to train Whisper on a particular voice?
For example, to translate perfectly several passages or a complete video and to provide it to Whisper as an example?

final wolf Mar 12, 2023, 3:33 PM

#

Does anyone have a github repo recommendation for generating subtitles?
I wonder how you get the lines to be properly timestamped. Is that something you do during the transcription, or afterwards?

Both seem complicated

gentle sinew Mar 12, 2023, 3:42 PM

#

whisper does it perfectly

#

Vtt or srt or json

final wolf Mar 12, 2023, 4:32 PM

#

But not the whisper API, I think. I'm not sure if my laptop can run the whisper locally

void egret Mar 12, 2023, 6:17 PM

#

final wolf But not the whisper API, I think. I'm not sure if my laptop can run the whisper ...

Specify the response_format in the request to the api, it supports json, text, srt, verbose_json, or vtt

bold cipher Mar 12, 2023, 8:10 PM

#

Hello!!!

#

Timestamps how ?????????????

bold cipher Mar 12, 2023, 8:11 PM

#

final wolf Does anyone have a github repo recommendation for generating subtitles? I wonde...

lol just saw you were asking the same

gentle sinew Mar 12, 2023, 10:18 PM

#

final wolf But not the whisper API, I think. I'm not sure if my laptop can run the whisper ...

Run it on Google colab for example

earnest horizon Mar 13, 2023, 12:50 AM

#

fair coyote Yeah I think you just send the file directly from your computer to the whisper e...

Ahh so just send the pure audio data, without needing any concept of something that exists on disk. Ok! Thanks

autumn bolt Mar 13, 2023, 1:07 AM

#

if any1 has good experience in tkinter module pls hit me up, its a basic help ( im trying to build a GUI for tic tac toe)

fair coyote Mar 13, 2023, 1:34 AM

#

earnest horizon Ahh so just send the pure audio data, without needing any concept of something t...

Idk if you're using curl or an SDK but here's the endpoint I think:

curl https://api.openai.com/v1/audio/transcriptions \
  -X POST \
  -H 'Authorization: Bearer TOKEN' \
  -H 'Content-Type: multipart/form-data' \
  -F file=@/path/to/file/audio.mp3 \
  -F model=whisper-1

From https://platform.openai.com/docs/api-reference/audio/create

OpenAI API

An API for accessing new AI models developed by OpenAI

earnest horizon Mar 13, 2023, 2:39 AM

#

Was hoping someone could give some insight into an error I keep getting:

Whisper data from response:  {"error":{"message":"Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']","type":"invalid_request_error","param":null,"code":null}}
iPhone:
topicsResult:  null

void egret Mar 13, 2023, 2:46 AM

#

Are you uploading data that was in one of those formats?

earnest horizon Mar 13, 2023, 2:49 AM

#

Sorry, was having trouble posting the rest - this is a very strict server!

#

Here is the full description. I think I am uploading an mp4. My blob's type is "video/mp4"
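A quick check that might help narrow this down, sketched in Python. Browsers give a Blob that is appended to FormData without an explicit filename the default name `blob`, which has no extension. Assuming the endpoint looks at the extension of the uploaded filename (an assumption, the error message does not say so), a name like that could be rejected even when the MIME type is video/mp4. `upload_name_ok` is a hypothetical helper, and the format list is copied from the error above.

```python
from pathlib import Path

# Formats copied from the error message quoted above.
SUPPORTED = {"m4a", "mp3", "webm", "mp4", "mpga", "wav", "mpeg"}


def upload_name_ok(filename: str) -> bool:
    """Return True if the filename's extension is one of the listed formats."""
    ext = Path(filename).suffix.lstrip(".").lower()
    return ext in SUPPORTED


for name in ("recording.mp4", "blob", "voice-note.MP3", "clip.ogg"):
    print(name, upload_name_ok(name))
```

If the name turns out to be the problem, passing a filename with a listed extension as the third argument to `formData.append` would be the thing to try.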

outer summit Mar 13, 2023, 4:39 AM

#

Hi is it possible to do real time transcription using Whisper opensource model?

dense pulsar Mar 13, 2023, 4:43 AM

#

I heard somewhere whisper only natively supports 30 second or less audio clips, meaning it can provide the best accuracy as long as your clip is around that length. Is this 100% true? I'm trying to transcribe lengthy audio clips (some being hours in length), and I'd rather split them into smaller clips larger than 30s each to make the most of the 50 requests per minute limit. Would splitting the clips into 60 or 120 second clips (or longer) be fine to do as well? Or would it be less accurate than 30s for each clip?
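The request-count side of this question is simple arithmetic, sketched below in Python. It says nothing about accuracy at different chunk lengths; it only shows how chunk length trades off against the 50 requests per minute limit mentioned above. `plan_requests` is a hypothetical helper, and the returned minutes are a lower bound that assumes the rate limit is the only constraint.

```python
import math


def plan_requests(total_seconds: float, chunk_seconds: float,
                  requests_per_minute: int = 50):
    """Return (number of chunks, minimum minutes needed to submit them all)."""
    chunks = math.ceil(total_seconds / chunk_seconds)
    return chunks, chunks / requests_per_minute


three_hours = 3 * 60 * 60
for chunk in (30, 60, 120):
    print(chunk, plan_requests(three_hours, chunk))
```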

gentle sinew Mar 13, 2023, 9:41 AM

#

search for a project named openai_whisper_stt
on a site called huggingface.co

#

you can search in a database with many project ideas

#

you can also search on Google Colab or Jupyter

gentle sinew Mar 13, 2023, 9:43 AM

#

dense pulsar I heard somewhere whisper only natively supports 30 second or less audio clips, ...

good question, i transcribed a 3h long video and i found the result good

keen gulch Mar 13, 2023, 4:49 PM

#

whispering

#

whisper is honestly cool

final wolf Mar 13, 2023, 6:15 PM

#

void egret Specify the response_format in the request to the api, it supports `json, text, ...

These are from the openai-python library?

#

I wonder what the verbose json includes. I'll have to try it out I guess

void egret Mar 13, 2023, 6:28 PM

#

Yup, they're specified here in the docs https://platform.openai.com/docs/api-reference/audio/create#audio/create-response_format

remote jasper Mar 13, 2023, 10:13 PM

#

I've got a collection of phone calls made to 911 in the middle of a flood from almost 30 years ago, and I'd like to use Whisper to transcribe them. The audio quality is generally poor; they were originally recorded on giant clunky reel-to-reel tape systems. Whisper seems to do a generally decent job on them -- the medium.en model works best based on my testing.

I did a test batch of three calls using the following PowerShell command:

ls -file | % { & Write-Host "Now working on $_" -ForegroundColor green; whisper $_.FullName --model medium.en --language en }

It did the first call fine, then got stuck on the second one. After it spent two hours transcribing a call that only lasted 90 seconds, I canceled the job. That particular call file ends in about seven seconds of silence. My theory is that Whisper is failing to detect that, and gets stuck in a loop analyzing the same period of silence over and over looking for speech that isn't there. There are probably quite a few calls like that out of the roughly 2,000 calls in the collection.

So my question is: how does the --no_speech_threshold parameter work? My googling has led me to Github tickets discussing things, but so far none of them have helped. It defaults to 0.6. Should the value be higher or lower in order to make Whisper less sensitive to chunks of silence?
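On the direction of the threshold, the Python sketch below is a simplified reading of the skip rule in the open-source repo's transcribe.py, so treat it as an approximation and check the source for your installed version. A window is skipped when the model's no-speech probability is above --no_speech_threshold, unless the average log-probability of the decoded text is above --logprob_threshold (default -1.0). Under that reading, a lower value makes Whisper more willing to treat a window as silence. `should_skip` is a hypothetical name, not a function from the library.

```python
def should_skip(no_speech_prob: float, avg_logprob: float,
                no_speech_threshold: float = 0.6,
                logprob_threshold: float = -1.0) -> bool:
    """Approximate the decision to treat a 30-second window as silence."""
    skip = no_speech_prob > no_speech_threshold
    if avg_logprob > logprob_threshold:
        # The decoder is confident about the text, so keep the window.
        skip = False
    return skip


# A window the model rates 40% likely to be silence, decoded with low confidence:
print(should_skip(0.4, -1.5))                            # default 0.6 keeps it
print(should_skip(0.4, -1.5, no_speech_threshold=0.25))  # 0.25 skips it
```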

tacit hill Mar 13, 2023, 11:52 PM

#

Did Microsoft buy OpenAI?

stone chasm Mar 14, 2023, 1:08 AM

#

tacit hill Did Microsoft buy OpenAI?

https://openai.com/blog/openai-and-microsoft-extend-partnership

OpenAI and Microsoft extend partnership

We’re happy to announce that OpenAI and Microsoft are extending our partnership.

stone chasm Mar 14, 2023, 1:09 AM

#

remote jasper I've got a collection of phone calls made to 911 in the middle of a flood from a...

I think your answer is to perform preprocessing, i.e. getting rid of the chunks of silent audio is easier than messing around with whisper's settings.
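A minimal sketch of that preprocessing step in Python, using only the standard library. It assumes 16-bit PCM WAV input (other formats would need converting first), and the `threshold` and `keep_ms` values are illustrative guesses, not tuned settings. `trim_trailing_silence` is a hypothetical helper that only removes silence at the end of a file, which is the case described above.

```python
import array
import sys
import wave


def trim_trailing_silence(src_path, dst_path, threshold=500, keep_ms=200):
    """Copy a 16-bit PCM WAV file without its silent tail.

    Returns the duration of the output file in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Walk backwards to the last sample louder than the threshold.
    last_loud = -1
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            last_loud = i
            break

    if last_loud < 0:
        keep_frames = 0  # the whole file is below the threshold
    else:
        padding = int(params.framerate * keep_ms / 1000)
        keep_frames = min(params.nframes,
                          last_loud // params.nchannels + 1 + padding)

    kept = samples[: keep_frames * params.nchannels]
    if sys.byteorder == "big":
        kept.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed on close
        dst.writeframes(kept.tobytes())
    return keep_frames / params.framerate
```

Running this over a folder before transcription avoids checking each call by hand; files that are silent throughout come out empty and can be set aside.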

noble snow Mar 14, 2023, 1:51 AM

#

does removing the silent audio make whisper cheaper / faster to run?

noble snow Mar 14, 2023, 1:53 AM

#

earnest horizon Here is the full description. I think I am uploading an mp4. My blob's type is "...

I'm not sure where you're pulling the video/audio from, but you might need to turn the data into a stream buffer and then turn that into a blob

#

This is how you do it via a S3 link

#

import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { Readable } from "stream";

// getS3Blob is a placeholder name; the original snippet showed only the function body
const s3 = new S3Client({});

const getS3Blob = async (bucketName: string, fileKey: string): Promise<Blob> => {
  // Set the parameters for the S3 getObject operation
  const params = {
    Bucket: bucketName,
    Key: fileKey,
  };

  // Call the S3 getObject operation with the specified parameters
  const getObjectOutput = await s3.send(new GetObjectCommand(params));
  // @ts-ignore
  const body = getObjectOutput.Body;

  // Extract the Readable stream from the SdkStream object
  // @ts-ignore
  const readableStream = Readable.from(body);

  // Convert the stream data to a buffer
  const chunks = [];
  for await (const chunk of readableStream) {
    chunks.push(chunk);
  }
  const buffer = Buffer.concat(chunks);

  // Create a new Blob object from the buffer
  return new Blob([buffer], { type: "application/octet-stream" });
};

#

And then transcribe it:

const transcribe = async (fileBlob: Blob, fileKey: string) => {
  // Attach the audio blob as the "file" field; fileKey doubles as the uploaded filename
  const formData = new FormData();
  formData.append("file", fileBlob, fileKey);
  formData.append("model", "whisper-1");

  const response = await fetch(
    "https://api.openai.com/v1/audio/transcriptions",
    {
      method: "POST",
      body: formData,
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
    }
  );
  return await response.json();
};

stone chasm Mar 14, 2023, 3:12 AM

#

noble snow does removing the silent audio make whisper cheaper / faster to run?

I would say the accuracy goes up. Whisper often just makes things up when there is no audio.

noble snow Mar 14, 2023, 3:14 AM

#

stone chasm I would say the accuracy goes up. Whisper often just makes things up when there i...

Ya it returned Russian-looking text when it was an audio file with a ton of blank noise lol

slow urchin Mar 14, 2023, 10:16 AM

#

Where my people at?

gentle sinew Mar 14, 2023, 12:17 PM

#

stone chasm I think your answer is to perform preprocessing. Ie getting rid of the chunks of...

I had the same idea

gentle sinew Mar 14, 2023, 12:24 PM

#

gentle sinew Hello, I'm a novice with a few good computer skills and I've started a project t...

No one seems to have any idea about the question I asked above. I found this Python script but couldn't get it to work.
It's on GitHub:
End-to-end-Youtube-audio-translation-aws-serverless

tepid plover Mar 14, 2023, 2:02 PM

#

Hi, i’m trying to work with the whisper api for the first time. i wanted to know if there’s any way to get the transcription of a youtube video without downloading its audio.
i tried to use YouTubeDL to extract the info but i’m stuck at the file parameter

small juniper Mar 14, 2023, 2:09 PM

#

stone chasm I would say the accuracy goes up. Whisper often just makes things up when there i...

Can we fine-tune it to detect silence?

remote jasper Mar 14, 2023, 3:25 PM

#

I was afraid of that. Manually checking all 2000+ calls for areas of silence is probably not practical. I think I'll write a Python script to handle the batch job and build in a timer on each job so I can kill the process and move on to the next one if something takes too long.
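The batch-with-a-timer idea could look roughly like this in Python. It is a sketch, assuming the `whisper` command is on the PATH and reusing the flags quoted in this thread; the 30-minute default and the helper names `run_with_timeout` and `transcribe_folder` are illustrative choices.

```python
import subprocess
from pathlib import Path


def run_with_timeout(cmd, timeout_s):
    """Run a command; return 'ok', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def transcribe_folder(folder, timeout_s=1800):
    """Transcribe every file in a folder, moving on when one gets stuck."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        cmd = ["whisper", str(path), "--model", "medium.en", "--language", "en"]
        results[path.name] = run_with_timeout(cmd, timeout_s)
    return results
```

Files that come back as "timeout" can then be collected and cleaned up separately instead of blocking the rest of the batch.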

remote jasper Mar 14, 2023, 6:39 PM

#

Actually I did some more testing, and setting --no_speech_threshold 0.25 on the command seems to have made it work fine. I think I'll try running the whole batch with that. I'll just have to keep an eye on it, and if it gets stuck I'll move the finished files to some other folder, move the problem file to quarantine until I can go through and remove the silence, and then restart the loop on the remaining files.

noble snow Mar 14, 2023, 6:51 PM

#

Anybody else keep getting: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

#

the video file I am sending is indeed webm, and this was working yesterday

dense pulsar Mar 14, 2023, 7:11 PM

#

anyone else getting invalid responses and error statuses when making a post request to whisper?

#

got status code 100 before, now it's always 0

stone chasm Mar 14, 2023, 7:12 PM

#

tepid plover Hi, i’m trying to work with the whisper api for the first time. i wanted to know...

How do you transcribe the audio without downloading the audio?

#

Do you mean video?

#

You can use the ytdl library to download only the audio stream from YT. Then pass that audio stream to Whisper to transcribe it.

#

If you really don’t want to use the audio, then maybe try to use ytdl to download the subtitles? I haven’t tried but I think that’s a function.

autumn bolt Mar 14, 2023, 10:57 PM

#

are there any more whisper apps or programs that can aid in language learning?

#

such as the one mentioned on the open AI website?

#

also, are there any extensions to have whisper integrated into chatgpt?

modest yacht Mar 15, 2023, 12:48 AM

#

how to make navbar aesthetic

carmine tree Mar 15, 2023, 1:37 PM

#

Man whisper combined with other tools really makes transcription embarrassingly easy - I wanted to learn about a popular Java library that somehow had very few videos so I took a video in Hebrew and ran it through Whisper
22 lines of code, including all the setup, and I have an mp4 and also something that's decently generic for future similar needs
(the subtitle utility in Whisper changed a bit so I did have to lock in on an older version of the whisper python library so I could use some code I had already used in the past)

high mountain Mar 15, 2023, 1:42 PM

#

is there whisper for personal usage (not api) ?

void egret Mar 15, 2023, 1:44 PM

#

high mountain is there whisper for personal usage (not api) ?

Yup, it's open source so you can run it locally https://github.com/openai/whisper

high mountain Mar 15, 2023, 1:45 PM

#

But is there any web version so people with no coding skills use it as well?

carmine tree Mar 15, 2023, 1:56 PM

#

there are lots of hosted versions to do various different things on hugging face, but your mileage may vary and for anything of significant length they will likely time out

#

what are you trying to do?

high mountain Mar 15, 2023, 2:00 PM

#

I want to send a mp3 file and want whisper to transcribe it.

#

Ok I found this model on a website called Replicate. The best thing is that I wasn't even asked for an API key 🙂

small juniper Mar 15, 2023, 2:23 PM

#

high mountain is there whisper for personal usage (not api) ?

You can import Whisper into python from HuggingFace's transformers package.

high mountain Mar 15, 2023, 2:25 PM

#

I mentioned that I am not very good at programming

carmine tree Mar 15, 2023, 2:31 PM

#

if it's just a single mp3 you're probably best off just using a service and you can probably do it for free with a trial although a lot of these AI transcription services are probably using outdated technology at this point (I may be entirely wrong on this) so your mileage may vary

#

otherwise I could probably help you with a Google Colab notebook that you could run for free, but then again - you probably shouldn't be running code from random people on the internet if you don't know what you're doing 😆

main salmon Mar 15, 2023, 7:51 PM

#

Hi everyone. Just want to know how to improve the inferencing speed of Whisper if I run in my local PC (not using the Whisper API service). The Whisper model that was released in September 2022.

severe spruce Mar 15, 2023, 8:14 PM

#

high mountain Ok I found this model on the website called repliacate. The best thing is that I...

Yeah Replicate is probably your best bet, first 30 minutes of compute free (~2hrs of audio content), then ~$0.30 per hour of audio (on the large model; smaller will be cheaper but worse, esp for non-English)

gentle sinew Mar 15, 2023, 8:58 PM

#

high mountain I mentioned that I am not very good at programming

You can use it in Google colab, I know a script if you need it

#

It's on GitHub :
Youtube Videos Transcription with OpenAI's Whisper

remote jasper Mar 15, 2023, 9:00 PM

#

In the first 24 hours of execution, Whisper transcribed 204 phone calls for me, running on a local machine. 1,744 to go! I estimate another eight and a half days to finish the entire collection.
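The estimate checks out if the pace holds. A small Python sketch of the arithmetic, with `days_remaining` as a hypothetical helper:

```python
def days_remaining(done: int, elapsed_hours: float, remaining: int) -> float:
    """Project the remaining time in days from the pace so far."""
    per_day = done / (elapsed_hours / 24)
    return remaining / per_day


# 204 calls in the first 24 hours, 1,744 still to go
print(round(days_remaining(204, 24, 1744), 2))  # 8.55
```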

gentle sinew Mar 15, 2023, 9:05 PM

#

gentle sinew It's on GitHub : Youtube Videos Transcription with OpenAI's Whisper

gentle sinew Mar 15, 2023, 9:12 PM

#

tepid plover Hi, i’m trying to work with the whisper api for the first time. i wanted to know...

On GitHub, search and launch on gcolab

#

And read the code.
Or install it on your own machine and launch it, it's a few lines in a shell

wintry cobalt Mar 15, 2023, 9:37 PM

#

is there any example project for real-time transcription from mic using whisper and pyaudio?

rare raft Mar 16, 2023, 12:28 AM

#

remote jasper In the first 24 hours of execution, Whisper transcribed 204 phone calls for me, ...

specs? i wanna host my own whisper model on a vps and wondering what i would need. i will only be using english

unborn pecan Mar 16, 2023, 5:34 AM

#

could you help me to learn python

wise sparrow Mar 16, 2023, 7:17 AM

#

Does the openai whisper api host the audio

#

or we have to host it first and then transcribe

dry harbor Mar 16, 2023, 9:08 AM

#

Can I ask it to code a website for me from here?

trim sinew Mar 16, 2023, 9:40 AM

#

Does anyone know if there is documentation available for the 'prompt' flag and how to use it? I need Whisper to create more sequences, preferably after each comma. Currently, Whisper can put 2 sentences into one sequence. There is also an issue with synchronizing the end time of one sequence with the start time of the next one. Is it possible to generate, for example, a JSON file with the start time of each word?

stone chasm Mar 16, 2023, 11:27 AM

#

trim sinew Does anyone know if there is documentation available for the 'prompt' flag and h...

https://platform.openai.com/docs/guides/speech-to-text

mild basin Mar 16, 2023, 11:39 AM

#

a few more examples on how to use the prompt most effectively (especially for assisting translation) would be nice. I'm doing some testing and will post if I find anything useful

mild basin Mar 16, 2023, 12:16 PM

#

According to this graphic from the Github page, there is timing data being used. No idea how to access it and may need to be run locally and modified

mild basin Mar 16, 2023, 12:41 PM

#

trim sinew Does anyone know if there is documentation available for the 'prompt' flag and h...

If you set response_format to verbose_json in your requests, it will return timing information like this. See the API reference for more info: https://platform.openai.com/docs/api-reference/audio

trim sinew Mar 16, 2023, 12:46 PM

#

mild basin If you set response_format to verbose_json in your requests, it will return timi...

Thank you for your response. I have read those details carefully and I am using that JSON verbose format exactly. However, the problem is that some sentences have 20-25 words and should be split into at least 4-5 parts. These captions are unreadable if they cover half of the video player's screen. If it were possible to obtain the start time for each word, I would easily split them myself.

mild basin Mar 16, 2023, 12:50 PM

#

trim sinew Thank you for your response. I have read those details carefully and I am using ...

You could check the output when using the srt and vtt response formats. I haven't looked at them yet but they might be different
Another thing you could try is breaking your audio into smaller few-second chunks to restrict how long they are

trim sinew Mar 16, 2023, 12:53 PM

#

mild basin You could check the output when using the srt and vtt response formats. I haven'...

I have checked srt and vtt formats and encountered the same problem. Dividing the audio into several seconds-long chunks is not a good solution. How can I find out where the spoken words end in, let's say, a one-hour long audio file so that the splitting does not cut off any words? Perhaps the new API will bring some sensible solutions.
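One workaround, sketched in Python under a loud assumption: since only segment-level times are available here, word positions inside a segment are estimated by spreading the segment's duration across its words in proportion to their length. That is an approximation, not real word timing, so cuts can land slightly early or late. `split_segment` and `srt_time` are hypothetical helpers; the `start`, `end` and `text` keys are the ones found on each segment in a verbose_json response.

```python
def split_segment(segment, max_words=6):
    """Split one segment into shorter captions with estimated timings."""
    words = segment["text"].split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = segment["end"] - segment["start"]
    captions = []
    cursor = segment["start"]
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        # Give each caption a share of the time proportional to its characters.
        share = sum(len(w) for w in group) / total_chars
        end = cursor + duration * share
        captions.append({"start": cursor, "end": end, "text": " ".join(group)})
        cursor = end
    captions[-1]["end"] = segment["end"]  # absorb floating-point drift
    return captions


def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


segment = {"start": 10.0, "end": 20.0, "text": "aa bb cc dd"}
for n, cap in enumerate(split_segment(segment, max_words=2), start=1):
    print(n)
    print(f"{srt_time(cap['start'])} --> {srt_time(cap['end'])}")
    print(cap["text"])
    print()
```

Because start and end times are chained, each caption begins exactly where the previous one ends, which also sidesteps the synchronization gap between consecutive sequences.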

tropic hare Mar 16, 2023, 2:08 PM

#

Has anyone made an easy to use version of Whisper yet? And how would you say it compares to machine translating that was done earlier for media?

#

Also, how would i ever start learning all this stuff, i don't think standard cs degree comes even close to this, and I'm 18 now, by the time i get through all the procedural learning, the field will already have become saturated like the rest of innovations

patent shale Mar 16, 2023, 2:45 PM

#

wise sparrow Does the openai whisper api host the audio

No, you send the audio (an MP3, for example) to the API and its response is the transcript. The file does not persist unless you save it somewhere else.

warped sedge Mar 16, 2023, 6:19 PM

#

is there an ability to differentiate between people?

wise sparrow Mar 16, 2023, 6:21 PM

#

patent shale No, you send it to the API, its response is a MP3. That file does not persist un...

Oh so you dont need to upload to a storage first?

#

Then send the link to api

void egret Mar 16, 2023, 6:21 PM

#

warped sedge is there an ability to differentiate between people?

Nothing official, you might find some projects that manage to do it, but they would require running it locally

void egret Mar 16, 2023, 6:21 PM

#

wise sparrow Then send the link to api

You have to send the audio data, not a link

patent shale Mar 16, 2023, 6:21 PM

#

wise sparrow Then send the link to api

Correct. You are posting the audio file to the API.

wise sparrow Mar 16, 2023, 6:22 PM

#

Ahh thats so nicee other platforms require you to upload to s3 or sum first

void egret Mar 16, 2023, 6:22 PM

#

It has a 25mb max so you'll need to split the data into chunks if it's too much
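A rough way to pick chunk lengths that stay under the limit, sketched in Python. It assumes constant-bitrate audio and treats the limit as 25 * 1024 * 1024 bytes, which is a guess at how the 25 MB figure is counted, so leave some headroom. `max_chunk_seconds` and `chunk_ranges` are hypothetical helpers; the actual cutting would be done with an audio tool.

```python
def max_chunk_seconds(bitrate_kbps: float,
                      limit_bytes: int = 25 * 1024 * 1024) -> float:
    """Longest duration of constant-bitrate audio that fits in limit_bytes."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)


def chunk_ranges(total_seconds: float, chunk_seconds: float):
    """Return consecutive (start, end) pairs that cover the whole recording."""
    ranges = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


print(max_chunk_seconds(128))      # seconds of 128 kbps audio per upload
print(chunk_ranges(3600, 1500))    # a one-hour file cut into 25-minute pieces
```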

wise sparrow Mar 16, 2023, 6:23 PM

#

I wish they have voice cloning aswell

warped sedge Mar 16, 2023, 6:23 PM

#

void egret Nothing official, you might find some projects that manage to do but they would ...

neat, will try and find one. thank you.

feral cedar Mar 16, 2023, 7:36 PM

#

Trying to use it to transcribe audio with the language set to English: it works well, and even though the audio was in Ukrainian, it output an English translation.
But when I use the parameter language=uk to get Ukrainian, it returns code 0, yet I still get charged for the request)

#

Has anybody tried working with a language other than English?

magic lance Mar 16, 2023, 7:42 PM

#

how do you get timestamps out of .transcribe?

void egret Mar 16, 2023, 9:03 PM

#

magic lance how do you get timestamps out of .transcribe?

Whisper doesn't provide word level timestamps

magic lance Mar 16, 2023, 9:04 PM

#

void egret Whisper doesn't provide word level timestamps

i dont need word level, I just need phrase-level

void egret Mar 16, 2023, 9:05 PM

#

magic lance i dont need word level, I just need phrase-level

Set response_format to verbose_json

magic lance Mar 16, 2023, 9:06 PM

#

void egret Set response_format to verbose_json

can I do this with the python lib or do I have to use the rest api to add response_format?

#

I tried in the params arg, but didn't work

void egret Mar 16, 2023, 9:07 PM

#

Python lib as in running it locally or their api library?

magic lance Mar 16, 2023, 9:08 PM

#

void egret Python lib as in running it locally or their api library?

yeah their api library

#

void egret Mar 16, 2023, 9:09 PM

#

Add response_format="verbose_json"

#

In that function call
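Once the response comes back as verbose_json, the phrase-level timestamps sit in its `segments` list. A small Python sketch of reading them; the sample payload is made up for illustration and trimmed to the fields used here, and `phrase_timestamps` is a hypothetical helper.

```python
import json


def phrase_timestamps(verbose_json_text: str):
    """Return (start, end, text) for each segment in a verbose_json response."""
    data = json.loads(verbose_json_text)
    return [(seg["start"], seg["end"], seg["text"].strip())
            for seg in data.get("segments", [])]


# Hypothetical response body, shortened to the relevant keys.
sample = (
    '{"text": " Hello there. How are you?", "segments": ['
    '{"start": 0.0, "end": 1.2, "text": " Hello there."}, '
    '{"start": 1.2, "end": 2.5, "text": " How are you?"}]}'
)

for start, end, text in phrase_timestamps(sample):
    print(f"[{start:6.2f} -> {end:6.2f}] {text}")
```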

serene plume Mar 16, 2023, 9:10 PM

#

Has anyone done a comparison of whisper to Google speech to text?

magic lance Mar 16, 2023, 9:10 PM

#

void egret Add `response_format="verbose_json"`

awesome that worked, cheers

void egret Mar 16, 2023, 9:10 PM

#

serene plume Has anyone done a comparison of whisper to Google speech to text?

You'll probably find it in the discussion tab or readme of the github repo

magic lance Mar 16, 2023, 9:10 PM

#

wonder why it wasn't showing up in the type hints in vs code

serene plume Mar 16, 2023, 9:10 PM

#

I wrote a python script that lets me record from my mic and it sends it to both. Pretty cool to see the differences.

void egret Mar 16, 2023, 9:11 PM

#

How well does google speech to text handle named entity recognition?

serene plume Mar 16, 2023, 9:11 PM

#

Really good, think Google Home Assistant good.

void egret Mar 16, 2023, 9:12 PM

#

~~I don't use any home assistant type of devices~~

serene plume Mar 16, 2023, 9:12 PM

#

Or maybe I misunderstood your question

void egret Mar 16, 2023, 9:12 PM

#

Picking up when to capitalize the names of people or products

serene plume Mar 16, 2023, 9:13 PM

#

I'm going to see right now actually

patent shale Mar 16, 2023, 9:17 PM

#

serene plume Has anyone done a comparison of whisper to Google speech to text?

I've compared both and it seemed that Whisper was more accurate, identified punctuation and capitalization better than Google.

void egret Mar 16, 2023, 9:18 PM

#

That tracks with what I've heard as they leverage their llm for whisper

serene plume Mar 16, 2023, 10:36 PM

#

Whisper is more accurate with sentence structure and capitalization. But its spelling is more likely to be wrong. It's more phonetic, but you can understand what it meant. Models are down for me so I can't continue testing

slim hull Mar 16, 2023, 10:36 PM

#

is there a way to get a word from embedding vector with node.js ?

void egret Mar 16, 2023, 10:39 PM

#

You can't go from embeddings back to text, you need to store what the text maps to

slim hull Mar 16, 2023, 10:40 PM

#

so

#

how can I get the distance between that embeddings? I'm a little confused

void egret Mar 16, 2023, 10:41 PM

#

You'll want to use cosine similarity, if you use a vector database like weaviate or pinecone they do the math for you. #dev-chat is the channel for this
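For small experiments the math is short enough to do by hand. A Python sketch, assuming the text for each embedding is stored next to its vector so the best match can be mapped back to content; `cosine_similarity` and `most_similar` are hypothetical helpers and the two-dimensional vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, store):
    """Return the stored text whose vector is closest to the query vector.

    store is a list of (text, vector) pairs kept alongside the embeddings.
    """
    return max(store, key=lambda item: cosine_similarity(query, item[1]))[0]


store = [("cats purr", [1.0, 0.0]), ("dogs bark", [0.0, 1.0])]
print(most_similar([0.9, 0.1], store))
```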

slim hull Mar 16, 2023, 10:42 PM

#

I need to know which portion of the content each embedding vector corresponds to

#

I was trying with Supabase pgvector dallef . So I will try with pinecone, ty!

long matrix Mar 17, 2023, 12:54 AM

#

I want to translate english audio into other languages, so far it looks like the translate flag is only for converting to english.. how can i specify another language for that?

void egret Mar 17, 2023, 1:00 AM

#

They don't currently translate into other languages

slim hull Mar 17, 2023, 1:02 AM

#

long matrix I want to translate english audio into other languages, so far it looks like the...

try this https://github.com/openai/openai-node/issues/93

gleaming meadow Mar 17, 2023, 5:52 AM

#

can we transcribe in realtime? like text streaming from a live recording

raven gate Mar 17, 2023, 11:12 AM

#

Can someone tell me why this code isn't returning a response when doing a post request?:

// Next.js API route support: https://nextjs.org/docs/api-routes/introduction
const dotenv = require("dotenv").config()
const axios = require("axios")
const fs = require("fs")
const FormData = require("form-data")
const formidable = require("formidable")

const key = process.env.OPEN_AI_KEY
const model = "whisper-1"

export default async function handler(req, res) {
    if (req.method !== "POST") {
        res.status(405).send({ message: "Only POST requests allowed" })
        return
    }

    return new Promise((resolve, reject) => {
        let formObj = new formidable.IncomingForm()

        const formData = new FormData()
        formData.append("model", model)

        formObj.parse(req, function (error, fields, file) {
            let filepath = file.fileupload.filepath
            formData.append("file", fs.createReadStream(filepath))
            axios
                .post("https://api.openai.com/v1/audio/transcriptions", formData, {
                    headers: {
                        Authorization: `Bearer ${key}`,
                        "Content-Type": `multipart/form-data; boundary=${formData._boundary}`,
                    },
                })
                .then((res) => {
                    return res.status(200).send({ res: res.data })
                })
                .catch((err) => {
                    return res.status(500).send({ error: err })
                })
                .finally(() => res.status(204).end())
        })
    })
}

#

Is it because of the boundary of the form data?

lost thistle Mar 17, 2023, 11:24 AM

#

It seems the issue is not with the boundary of the form data but with how you are handling the response from the axios post request. In your .then block, you are using the res variable as both the response from your Next.js API route and as the axios response, causing a conflict.

#

To fix this issue, you can rename the axios response variable to avoid conflicts with the Next.js API route's res. Here's the updated code:

#

const dotenv = require("dotenv").config()
const axios = require("axios")
const fs = require("fs")
const FormData = require("form-data")
const formidable = require("formidable")

const key = process.env.OPEN_AI_KEY
const model = "whisper-1"

export default async function handler(req, res) {
    if (req.method !== "POST") {
        res.status(405).send({ message: "Only POST requests allowed" })
        return
    }

    return new Promise((resolve, reject) => {
        let formObj = new formidable.IncomingForm()

        const formData = new FormData()
        formData.append("model", model)

        formObj.parse(req, function (error, fields, file) {
            let filepath = file.fileupload.filepath
            formData.append("file", fs.createReadStream(filepath))
            axios
                .post("https://api.openai.com/v1/audio/transcriptions", formData, {
                    headers: {
                        Authorization: `Bearer ${key}`,
                        "Content-Type": `multipart/form-data; boundary=${formData._boundary}`,
                    },
                })
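                .then((axiosRes) => {
                    // axiosRes is the OpenAI response; res is still the Next.js response
                    res.status(200).send({ res: axiosRes.data })
                })
                .catch((err) => {
                    res.status(500).send({ error: err })
                })
                // settle the outer promise instead of sending a second (204) response
                .finally(() => resolve())
        })
    })
}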

#

In this updated code, I changed the axios response variable from res to axiosRes to avoid the conflict. This should resolve the issue and return a response as expected.

raven gate Mar 17, 2023, 11:40 AM

#

Alright thanks for pointing that out

#

However, still not getting a response. This is what I have in the frontend:

const submitHandler = (event) => {
        event.preventDefault()

        const data = new FormData(event.target)
        data.set("fileupload", data.get("fileupload"))

        const config = {
            headers: { "content-type": "multipart/form-data" },
        }

        axios
            .post("/api/whisper", data, config)
            .then((res) => console.log(res))
            .catch((err) => console.log(err))
    }

    return (
        <>
            <Head>
                <title>Lirical App</title>
                <meta name='viewport' content='width=device-width, initial-scale=1' />
                <link rel='icon' href='/favicon.ico' />
            </Head>
            <main className={styles.main}>
                <div className={styles.center}>
                    <form onSubmit={submitHandler}>
                        <label htmlFor='fileupload'>Upload Audio file</label>
                        <input required type='file' name='fileupload' accept='audio/*' />
                        <button type='submit'>Submit</button>
                    </form>
                </div>
            </main>
        </>
    )

#

Are there any py or open ai dependencies I need installed?

upbeat elm Mar 17, 2023, 1:07 PM

#

Hello, I'm trying to find open source code for whisper in python

void egret Mar 17, 2023, 1:10 PM

#

upbeat elm Hello, I'm trying to find open source code for whisper in python

You can find info on running the model locally here: https://github.com/openai/whisper

upbeat elm Mar 17, 2023, 1:10 PM

#

Okay thanks, I'm trying to use it to make a chatbot, just for the STT

raven gate Mar 17, 2023, 8:53 PM

#

Ok I got my file to post to my whisper endpoint correctly, but how should I append to formData if the file doesn't include a file path from client?

cobalt pivot Mar 17, 2023, 10:34 PM

#

hey, how can I avoid unsupported chars ? ```

{
"text": "Dis bonjour \u00e0 ma m\u00e8re."
}```
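
Those \u00e0 sequences look like ordinary JSON escapes rather than unsupported characters, so parsing the body with a JSON parser should give the accented letters back. A minimal Python sketch, assuming the response arrives as the raw string shown above (the variable names are just for illustration):

```python
import json

# Raw response body as posted above; \u00e0 and \u00e8 are JSON escapes.
raw_body = '{"text": "Dis bonjour \\u00e0 ma m\\u00e8re."}'

# Parsing the JSON decodes the escapes into the real characters.
text = json.loads(raw_body)["text"]
print(text)

# When serialising again, ensure_ascii=False keeps the accents unescaped.
print(json.dumps({"text": text}, ensure_ascii=False))
```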

autumn bolt Mar 17, 2023, 11:41 PM

#

has anyone tried sending a recorded audio file from safari straight to the whisper v1 endpoint?

#

it always complains

"message": "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']",

#

i have tested my code on chrome, it works, but not on safari

#

even chatgpt4 didnt help hehe

raven gate Mar 17, 2023, 11:45 PM

#

Strange things indeed

#

Safari is kinda like IE imo

autumn bolt Mar 17, 2023, 11:48 PM

#

const speechToText = async (blob: Blob) => {
    const formData = new FormData();
    // The file is always labelled as MP3, whatever the recorder actually produced
    formData.append('file', new File([blob], 'audio.mp3', { type: 'audio/mp3' }));
    formData.append('model', 'whisper-1');

    return axios({
        method: 'POST',
        url: 'https://api.openai.com/v1/audio/transcriptions',
        data: formData,
        headers: {
            'Authorization': 'Bearer <TOKEN>',
            'Content-Type': 'multipart/form-data',
        },
    })
        .then((response) => {
            // The transcription comes back in the "text" field
            return response.data.text;
        });
}
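
One possible cause, untested here: the snippet labels every recording as audio.mp3, while browsers record into different containers (assumption, not confirmed in this thread: Chrome's recorder reports audio/webm and Safari's reports audio/mp4). A rough Python sketch of deriving the filename from the blob's reported MIME type, for example on a server that relays the upload. `upload_name` and `MIME_TO_EXT` are made-up names, and the mapping is only a guess at common recorder types:

```python
# Extensions copied from the "Supported formats" error message above.
SUPPORTED = {"m4a", "mp3", "webm", "mp4", "mpga", "wav", "mpeg"}

# Hypothetical mapping from recorder MIME types to a matching extension.
MIME_TO_EXT = {
    "audio/webm": "webm",
    "audio/mp4": "mp4",
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/x-m4a": "m4a",
}


def upload_name(mime_type, stem="audio"):
    """Return a filename whose extension matches the reported MIME type."""
    # Drop parameters such as ";codecs=opus" before the lookup.
    base = mime_type.split(";")[0].strip().lower()
    ext = MIME_TO_EXT.get(base)
    if ext is None or ext not in SUPPORTED:
        raise ValueError(f"unsupported recording type: {mime_type!r}")
    return f"{stem}.{ext}"
```

With this, a Safari recording reported as audio/mp4 would be uploaded as audio.mp4 instead of audio.mp3. Whether that alone satisfies the endpoint is not something this thread confirms.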

#

it is, safari is becoming a headache.

raven gate Mar 18, 2023, 12:08 AM

#

autumn bolt const speechToText = async (blob: Blob) => { const formData = new FormData(); ...

What backend are you using?

#

I used formidable in the backend to intercept the incoming form and get the file from the files object

#

I’ll also try my code on safari soon

#

Also, after I think I got everything working right, how would I be hitting the quota limit already? Can we not use it with free trial?

idle basin Mar 18, 2023, 12:30 AM

#

NOOOO

raven gate Mar 18, 2023, 12:41 AM

#

I find some youtube vids misleading then when they say it's free

odd heron Mar 18, 2023, 6:05 AM

#

raven gate However, still not getting a response. This is what I have in the frontend: ```j...

I think you need to put await

const submitHandler = async (event) => {
        event.preventDefault()

        const data = new FormData(event.target)
        data.set("fileupload", data.get("fileupload"))

        const config = {
            headers: { "content-type": "multipart/form-data" },
        }

        await axios
            .post("/api/whisper", data, config)
            .then((res) => console.log(res))
            .catch((err) => console.log(err))
    }

forest rover Mar 18, 2023, 11:45 AM

#

trying to understand this

Cell In [49], line 1
----> 1 result = whisper.transcribe(audio='talking.wav', model="Whisper")

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/whisper/transcribe.py:75, in transcribe(model, audio, verbose, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, condition_on_previous_text, **decode_options)
     32 """
     33 Transcribe an audio file using Whisper
     34
   (...)
     72 the spoken language ("language"), which is detected when `decode_options["language"]` is None.
     73 """
     74 dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
---> 75 if model.device == torch.device("cpu"):
     76     if torch.cuda.is_available():
     77         warnings.warn("Performing inference on CPU when CUDA is available")

AttributeError: 'str' object has no attribute 'device'

#

on macOS, M1 processor

#

it seems in transcribe.py model is a string by default, but it is expecting a property model.device

#

hmm, seems to be a bug: they probably wanted to load the model when a string is provided, but didn't write that part. you can get past it by providing a model instance instead of a string, for example model = whisper.load_model("base") then ... model=model)
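
Spelled out, the workaround could look like the sketch below. `transcribe_file` is a made-up helper name; `whisper.load_model` and the model's `transcribe` method are the calls from the openai/whisper README. The signature in the traceback lists `model` first, so passing a loaded model may simply be the intended usage rather than a bug.

```python
def transcribe_file(audio_path, model=None, model_name="base"):
    """Transcribe audio_path with a loaded Whisper model, not a name string."""
    if model is None:
        # Imported lazily so the helper can be defined without whisper installed.
        import whisper

        # load_model returns a model object (the thing that has .device).
        model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]
```

Calling `transcribe_file("talking.wav")` loads the base model once and returns the text. Pass an already loaded model to avoid reloading it for every file.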

#

so, question answered I guess. might file a bug, not sure if this is even the latest code, will look into it 🙂

gaunt plank Mar 18, 2023, 11:53 AM

#

How to get rid of your toxic thoughts

fickle cave Mar 18, 2023, 11:54 AM

#

What r you guys talking about

strong pendant Mar 18, 2023, 12:41 PM

#

^^^i am also curious

void egret Mar 18, 2023, 4:42 PM

#

Whisper is an open source speech to text transcription model developed by openai.

dense pulsar Mar 18, 2023, 9:05 PM

#

anyone know how to significantly speed up the offline models in whisper? like medium/large?

short rover Mar 19, 2023, 12:37 AM

#

Need to use GPU inference and viable resources.

#

If you’re asking about details beyond that: quantisation of the inference data types, for example lower-precision float types or the AI-specific data types, which seem to be a thing now. (Off the top of my head.)
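
A small sketch of the "GPU plus lower-precision floats" idea for the open source package, under some assumptions: `decode_settings` and `transcribe_fast` are made-up names, `fp16` is the decode option visible in the traceback earlier in the channel, and `device` is an argument of `whisper.load_model`. Proper quantisation (int8 and the like) is not shown.

```python
def decode_settings(cuda_available):
    """Pick a device and float precision for Whisper inference."""
    if cuda_available:
        # Half precision is the lower-resolution float type mentioned above.
        return {"device": "cuda", "fp16": True}
    # On CPU, ask for full precision instead.
    return {"device": "cpu", "fp16": False}


def transcribe_fast(audio_path, model_name="medium"):
    """Load the model on the best available device and transcribe."""
    # Imported lazily; both packages are needed only when this is called.
    import torch
    import whisper

    settings = decode_settings(torch.cuda.is_available())
    model = whisper.load_model(model_name, device=settings["device"])
    return model.transcribe(audio_path, fp16=settings["fp16"])["text"]
```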

exotic cairn Mar 19, 2023, 6:06 AM

#

GPU >>

#

such a big difference for sure

left mist Mar 19, 2023, 11:10 AM

#

@dense pulsar gladia is a service that claims to have massively sped up the large model, haven't had the time to test it out yet.

#

also haven't tested out the official openai whisper model on the platform, but I assume it's quite fast.

#

Otherwise you could have a look at the huggingface transformers versions, since they might be easier to optimize. There are also a bunch of discussions on the whisper github that you could check out.

craggy heart Mar 19, 2023, 11:14 AM

#

hello rvry one

#

hello evry one

mellow nacelle Mar 19, 2023, 4:52 PM

#

👽

autumn bolt Mar 19, 2023, 5:57 PM

#

x

lapis flicker Mar 19, 2023, 6:01 PM

#

is anyone able to look at my post and tell me if its possible to do this https://discord.com/channels/974519864045756446/1087070780615045180

torpid coral Mar 20, 2023, 3:00 AM

#

any way to use the whisper API just using fetch / curl ?

#

Passing in a link to a mp3 or audio file

barren compass Mar 20, 2023, 8:23 AM

#

no, you need to send the file data

#

fetch it yourself from the url, then send it
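
A rough sketch of that two-step flow using only the Python standard library, not tested against the live API: download the audio, then post the bytes as multipart form data. The endpoint URL and the whisper-1 model name come from code posted earlier in this channel. `build_multipart` and `transcribe_url` are made-up helpers, and the default file name is an assumption.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    chunks.append(head + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe_url(audio_url, api_key, file_name="audio.mp3"):
    """Fetch the audio from audio_url, then send the bytes to the API."""
    with urllib.request.urlopen(audio_url) as resp:
        audio = resp.read()
    body, content_type = build_multipart(
        {"model": "whisper-1"}, "file", file_name, audio
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["text"]
```

The file name passed along should carry the real extension of the audio, since the endpoint complains about unsupported formats.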

hybrid flint Mar 20, 2023, 8:31 AM

#

#gpt-realtime make a game in python

zealous cedar Mar 20, 2023, 9:01 AM

#

give me a two direction stepwise regression code

#

#gpt-realtime give me a two direction stepwise regression code

willow gale Mar 20, 2023, 10:22 AM

#

#gpt-realtime make a game in python

wise lily Mar 20, 2023, 10:23 AM

#

are you there?

#

#gpt-realtime are you there?

wicked saffron Mar 20, 2023, 11:26 AM

#

#gpt-realtime sorry

stone chasm Mar 20, 2023, 12:10 PM

#

Lol, is this your real key?

uneven kayak Mar 20, 2023, 12:19 PM

#

I was thinking the exact same thing. If so, you need to make a couple of changes and quite urgent.

static pagoda Mar 20, 2023, 1:31 PM

#

🤣

mortal plover Mar 20, 2023, 3:58 PM

#

I have a question about whisper: can it translate from english to another language?

neon zinc Mar 20, 2023, 4:15 PM

#

I paid for the gpt plus chat subscription to the account registered by mail maksim543761@gmail.com. What should I do? I didn't even get an email

raven gate Mar 20, 2023, 11:40 PM

#

@dense zodiac I hope you aren’t under the paid plan with that key

#

Or else someone gonna spam up to 120 usd with your key

peak saffron Mar 21, 2023, 2:52 AM

#

Has anyone gotten whisper to work with audio recorded in Safari?

#

(I mean the whisper api - not sure which channel is the correct location)

tacit hedge Mar 21, 2023, 4:31 AM

#

Hi, I need to transcribe an audio file in a non-English language and translate the transcribed output to another language with actual timestamps, so the TTS can be in sync with the original file.

I have tried WhisperX, but the results were not that great. Please help me here.

regal dagger Mar 21, 2023, 5:33 AM

#

the tricky part is synchronization

dark parrot Mar 21, 2023, 6:18 AM

#

Yumi Karahashi, the princess of my heart, has gotten married. I am stunned.

mortal plover Mar 21, 2023, 7:47 AM

#

tacit hedge Hi, I need to transcribe an audio file in a non-English language and translate th...

how do you translate to another language than English?
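
As far as I know, Whisper's built-in translate task only goes from the spoken language into English. For any other target language you would transcribe first and hand the text to a separate translation step, which is not shown here. A tiny sketch, assuming `model` is something returned by `whisper.load_model`; `translate_to_english` is a made-up name:

```python
def translate_to_english(audio_path, model):
    """Run Whisper's translate task, whose output language is English."""
    # task="translate" switches from transcription to translation.
    result = model.transcribe(audio_path, task="translate")
    return result["text"]
```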

high mountain Mar 21, 2023, 8:18 AM

#

Hello, does Whisper work 'on the fly' like Google Speech-to-Text? I mean, is it as fast as Google? I would like to make a conversational bot using the Whisper API, the ChatGPT API and the ElevenLabs API. I worry that Whisper is too slow and the user experience won't be fluid.

mortal plover Mar 21, 2023, 8:57 AM

#

high mountain Hello, does Whisper work 'on the fly' like Google Speech-to-Text? I mean, is it as ...

no, it's not that fast at all. it's a bummer, as for example a Pixel 6/7 with Tensor can do this immediately on the phone. for online use it's still far from perfect in my opinion

high mountain Mar 21, 2023, 9:03 AM

#

What API do you recommend then? Is Google Speech-to-Text the best on the market?

mortal plover Mar 21, 2023, 9:21 AM

#

I do not know exactly how quick the Whisper API's responses can be, so perhaps you should still give it a try. My results are based on my own hardware, and so far it's always slower than 1x speed (slower than real time),

#

which makes it not the best fit for the path you are looking at. My assumption is you want to make your own Google Assistant/Siri-like bot. Thumbs up for this + integration with the Home Assistant open platform 🙂

storm oak Mar 21, 2023, 11:52 AM

#

what should the initial prompt look like?
just the start of the audio?
random list of important words?

gray thunder Mar 21, 2023, 1:33 PM

#

What's whisper

small juniper Mar 21, 2023, 2:02 PM

#

What's whisper

analog willow Mar 21, 2023, 2:20 PM

#

Hello everyone, I have a program on Github that will run whisper in batch mode if anyone wants it? Message me directly because OpenAI won't let me post a simple link to it on Github!

autumn bolt Mar 21, 2023, 10:18 PM

#

guys

#

who uses the microsoft chatgpt, like the chat in microsoft edge

#

@cosmic lanternone

grim hound Mar 22, 2023, 2:26 AM

#

autumn bolt who uses the microsoft chatgpt, like the chat in microsoft edge

Well, other question, why should I?

barren compass Mar 22, 2023, 7:35 AM

#

autumn bolt who uses the microsoft chatgpt, like the chat in microsoft edge

i do

covert shuttle Mar 22, 2023, 9:05 AM

#

neon zinc I paid for the gpt plus chat subscription to the account registered by mail maks...

I’d recommend that you remove the email from discord messages

#

Unless you want someone to send you phishing mails or scam mails, it is just a friendly reminder since some people are “not nice”

#

As far as your question goes, I’d recommend contacting OpenAI through their support mail

trim rampart Mar 22, 2023, 9:39 AM

#

I was trying Whisper with Bengali. It doesn't seem to figure out that it is Bengali and just translates to Hindi. I had more luck with Kannada. Does Whisper support Bengali?

#

Whisper is quite good with Hindi BTW.

mortal plover Mar 22, 2023, 11:05 AM

#

trim rampart I was trying Whisper with Bengali. It doesn't seem to figure out that it is Beng...

You were translating from Hindi to English or from English to Hindi?

still briar Mar 22, 2023, 11:22 AM

#

so i have a few long audio files, around 10-15 minutes each. what I want to do is split the audio files at the end of each sentence; for that I have a text document separating them. can whisper tell when the portion of audio matching a piece of text starts and ends, so that I can use that data to split the large audio files into small audio files, one per sentence?
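
The result of the open source package's `transcribe` call includes a "segments" list whose entries carry "start" and "end" times in seconds plus the "text". A sketch of grouping those segments into sentence-level spans that could then be used as cut points; `sentence_spans` is a made-up helper. Segment boundaries do not always line up with sentence ends, so treat the spans as approximate. Cutting the audio itself (with ffmpeg or similar) is not shown.

```python
def sentence_spans(segments, enders=(".", "?", "!")):
    """Merge Whisper-style segments into (start, end, text) per sentence."""
    spans = []
    start = None
    end = None
    texts = []
    for seg in segments:
        if start is None:
            start = seg["start"]  # first segment of a new sentence
        end = seg["end"]
        texts.append(seg["text"].strip())
        # A segment ending in terminal punctuation closes the sentence.
        if texts[-1].endswith(enders):
            spans.append((start, end, " ".join(texts)))
            start, texts = None, []
    if texts:
        # Trailing text without closing punctuation still gets a span.
        spans.append((start, end, " ".join(texts)))
    return spans
```

Typical use would be `sentence_spans(model.transcribe("long.wav")["segments"])`, then matching the spans against the sentences in the text document.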

trim rampart Mar 22, 2023, 11:31 AM

#

@mortal plover I was trying to transcribe Bangla.

trim rampart Mar 22, 2023, 12:10 PM

#

Sorry for saying translate. 🤦‍♂️

covert skiff Mar 22, 2023, 3:21 PM

#

Can this do anything? Per say math?

#

And………. Does it work while I don’t have safari going

pure veldt Mar 22, 2023, 4:27 PM

#

Hello, is it possible to implement a feature like ':on-progress' (which calls a function) in Whisper ASR, similar to the ':verbose true' functionality? I would like to use it in the backend, but I can only capture the stdout at the end. Any ideas? (In the transcribe fn.)

autumn bolt Mar 22, 2023, 5:31 PM

#

hey i am trying to install whisper following this video https:// www. youtube .com/ watch?v=XX-ET_-onYU

#

but i have to install a ''git'' , what is it? ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

the video didn't explain this

#

post the link with space pls

autumn bolt Mar 22, 2023, 5:35 PM

#

autumn bolt but i have to install a ''git'' , what is it? ERROR: Cannot find command 'git' ...

On OSX just "brew install git" and solved.

autumn bolt Mar 22, 2023, 5:35 PM

#

autumn bolt On OSX just "brew install git" and solved.

?

lilac karma Mar 22, 2023, 5:35 PM

#

autumn bolt but i have to install a ''git'' , what is it? ERROR: Cannot find command 'git' ...

Git is a version control tool for developers; we use it to manage our projects. You can upload the source code of your program to GitHub and use git to make changes there

#

To install it, we first need to know: what is your distribution?

autumn bolt Mar 22, 2023, 5:36 PM

#

autumn bolt ?

in the terminal console. I don't know what you use..

autumn bolt Mar 22, 2023, 5:37 PM

#

autumn bolt On OSX just "brew install git" and solved.

brew?

lilac karma Mar 22, 2023, 5:37 PM

#

Do you use linux distributions? @autumn bolt

autumn bolt Mar 22, 2023, 5:37 PM

#

windows

#

1

#

0

autumn bolt Mar 22, 2023, 5:37 PM

#

autumn bolt brew?

brew .sh ok, sorry, that is Mac

#

https:// hub. tcno.co/ai/whisper/install/ i was following this made it all the way to the last line

#

but then got the error

#

pip3 install git+https: //github.com/openai/whisper.git
(the last line)

#

because the tutorial is incomplete

#

what is the error message?

#

ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

#

i didnt install this git

#

pip3 install git

#

ok

#

wait.. I check it

#

C:\Users\gusta>pip3 install git
ERROR: Could not find a version that satisfies the requirement git (from versions: none)
ERROR: No matching distribution found for git

#

pip3 install GitPython

lilac karma Mar 22, 2023, 5:40 PM

#

autumn bolt C:\Users\gusta>pip3 install git ERROR: Could not find a version that satisfies t...

I didn't chk this command but try it in PowerShell:
winget install git

autumn bolt Mar 22, 2023, 5:41 PM

#

Yes, maybe that is Windows specific

autumn bolt Mar 22, 2023, 5:41 PM

#

lilac karma I didn't chk this command but try it in PowerShell: winget install git

he is thinking now

#

downloaded it

#

now installing but quite slowly

#

''redifine the entry''

#

ops ''refine the entry''

#

the message is: Several packages matched entry criteria. Refine the entry.

#

now what do i do? @autumn bolt

lilac karma Mar 22, 2023, 5:51 PM

#

autumn bolt

What is your language? We need to translate to understand what is it saying...

autumn bolt Mar 22, 2023, 5:51 PM

#

its portuguese

#

i translated

#

it

autumn bolt Mar 22, 2023, 5:51 PM

#

autumn bolt the message is: Several packages matched entry criteria. Refine the entry.

here

#

first he asked if i agree with terms of contract i said y (YES)

lilac karma Mar 22, 2023, 5:52 PM

#

autumn bolt here

Ah OK tnx

autumn bolt Mar 22, 2023, 5:52 PM

#

autumn bolt the message is: Several packages matched entry criteria. Refine the entry.

then this

lilac karma Mar 22, 2023, 5:53 PM

#

www.atlassian.com/git/tutorials/install-git

#

Visit this website

autumn bolt Mar 22, 2023, 5:54 PM

#

ok i will try thanks

lilac karma Mar 22, 2023, 5:56 PM

#

Okay, if it is not successful, tell us

autumn bolt Mar 22, 2023, 5:58 PM

#

autumn bolt now what do i do? @autumn bolt

I use Mac, so that sounds like a Windows issue. I don't have experience with this.

lilac karma Mar 22, 2023, 5:59 PM

#

autumn bolt I use Mac, so that sounds like a Windows issue. I don't have experience with thi...

If you have an issue too, you can visit the link in my message above.
I mean this: www.atlassian.com/git/tutorials/install-git

autumn bolt Mar 22, 2023, 5:59 PM

#

.

#

is this normal right

#

My issue is here.. #gpt-realtime message .. is this place.. the OpenAI support or just community?