lilac karma Mar 22, 2023, 6:02 PM

#

I have no idea for that

autumn bolt Mar 22, 2023, 6:04 PM

#

well seems like its working now

#

#

but when i type "whisper", something should be appearing

#

showing that its installed..

#

huh..

#

actually the pc is still thinking, quite slow god

#

have you tried running it in Colab?

#

you can run the large model and not have to worry about your GPU crying

autumn bolt Mar 22, 2023, 6:07 PM

#

autumn bolt have you tried running it in Colab?

never used this lol

#

i am quite average pc user

autumn bolt Mar 22, 2023, 6:08 PM

#

autumn bolt never used this lol

same, I learned a ton of python and node.js thanks to tutor GPT

#

explains things so well

#

you can just do this on Google Colab

📎 message.txt

#

since its here, its installed?

#

try running it

#

now you just need a path for the transcription

#

transcription folders

#

its simply not running through cmd, i will trying to figure it out how to use this collab now, one step at time

#

i am trying*

autumn bolt Mar 22, 2023, 6:22 PM

#

autumn bolt you can just do this on Google Colab

copy and paste these lines

#

here?

#

yep

#

ok

autumn bolt Mar 22, 2023, 7:00 PM

#

hey
!whisper "name.mp3" --language pt --task transcribe --model medium
is this the code to transcript an audio in portuguese?

#

first time doing this

#

idk if the format is right

trim rampart Mar 22, 2023, 7:02 PM

#

Is there anyone from OpenAI here who can tell me why does Whisper transcribe Bengali as Hindi? Does Whisper not support Bengali?

autumn bolt Mar 22, 2023, 7:03 PM

#

API discussions, We Are OpenAI, my guy.
developers piling as a community playing and improving the tech

autumn bolt Mar 22, 2023, 7:04 PM

#

trim rampart Is there anyone from OpenAI here who can tell me why does Whisper transcribe Ben...

did you check on !whisper -h?

autumn bolt Mar 22, 2023, 7:04 PM

#

autumn bolt hey !whisper "name.mp3" --language pt --task transcribe --model medium is this t...

OH!!! i know

#

do you have access to chatgpt still?

#

or is it buggy for you too?

#

yes i have

#

subscription

#

let chatgpt read the whole command list

#

and ask it for what you need

trim rampart Mar 22, 2023, 7:06 PM

#

@autumn bolt I was trying the Whisper API from the OpenAI playground. I want ASR with multilingual support.

autumn bolt Mar 22, 2023, 7:07 PM

#

idk if this working or just slow

autumn bolt Mar 22, 2023, 7:07 PM

#

autumn bolt

that's take a while

#

this is*

autumn bolt Mar 22, 2023, 7:07 PM

#

trim rampart @autumn bolt I was trying the Whisper API from the OpenAI playground. I...

That works fine for me with the large or large-v2 model for Hungarian. [asr]

autumn bolt Mar 22, 2023, 7:07 PM

#

trim rampart @autumn bolt I was trying the Whisper API from the OpenAI playground. I...

you can't use Whisper AI from OpenAI playground

#

you need python notes

autumn bolt Mar 22, 2023, 7:09 PM

#

autumn bolt

why not run the large one?
you've got access to a cloud gpu that has 16Gb of RAM
and 50Gb of VRAM

#

autumn bolt Mar 22, 2023, 7:10 PM

#

autumn bolt why not run the large one? you've got access to a cloud gpu that has 16Gb of RAM...

i am just testing if works one

#

smaller score is better @trim rampart see the chart.. but around 99 languages supported in the ASR too..

#

you can get more info in the github repo

autumn bolt Mar 22, 2023, 7:12 PM

#

autumn bolt smaller score is better @trim rampart see the chart.. but around 99 lan...

What, no it is not

#

have you not compared the quality?

#

that is the official chart from OpenAI

trim rampart Mar 22, 2023, 7:13 PM

#

@autumn bolt Bengali/Bangla is not in the list. It is one of the most spoken languages.

autumn bolt Mar 22, 2023, 7:14 PM

#

autumn bolt that is the official chart from OpenAI

you haven't tried it yourself then?

#

_>

#

whisper transcribes audios from video files? or has to be audio files?

#

go try it and compare the transcription quality

XD

autumn bolt Mar 22, 2023, 7:14 PM

#

autumn bolt you haven't tried it yourself then?

Yes, I use the ASR transcribe module. [locally, not API]

autumn bolt Mar 22, 2023, 7:15 PM

#

trim rampart @autumn bolt Bengali/Bangla is not in the list. It is one of the most s...

I don't know, I'm not from OpenAI team.

#

doesn't do anything in 7 minutes, geez

autumn bolt Mar 22, 2023, 7:17 PM

#

autumn bolt you haven't tried it yourself then?

I use it via Clojure interop (through JNA) with Python.

autumn bolt Mar 22, 2023, 7:20 PM

#

autumn bolt doesn't do anything in 7 minutes, geez

Try with a 5 sec audio.. I think your VM has too little RAM, therefore it's slow.

#

Normally, if you would like to run ASR at normal speed, you need at least 20 GB of free RAM. x1 == large model

autumn bolt Mar 22, 2023, 7:23 PM

#

autumn bolt Try with a 5 sec audio.. I think your VM has too little RAM, therefore it's slow.

#

this is needed

#

@autumn bolt still running?

autumn bolt Mar 22, 2023, 7:48 PM

#

autumn bolt @autumn bolt still running?

now i am running again, but its working finally i learned this lol

autumn bolt Mar 22, 2023, 7:49 PM

#

autumn bolt now i am running again, but its working finally i learned this lol

congrats

autumn bolt Mar 23, 2023, 12:52 AM

#

i am using this to transcript: !whisper "exemplo.mp3" --language pt --task transcribe --model medium

but how can i transcript just a specific part of the audio? like from 3:00 to 10:00

reef python Mar 23, 2023, 1:20 AM

#

Hi, I keep getting 400 errors when trying to use whisper API for translation with language ja - what am I doing wrong?

gray wigeon Mar 23, 2023, 5:42 AM

#

Hi all 👋
I'm wondering if anyone had experienced an issue where whisper seems to try to translate instead of transcribing?

#

I just said in English "how is that possible"
and it transcribed it as "Kako je to možno?"
which according to google translate is Croatian

#

I can reproduce it with the audio

#

if I trim the silence at the start it transcribes properly

#

autumn bolt Mar 23, 2023, 8:15 AM

#

autumn bolt i am using this to transcript: !whisper "exemplo.mp3" --language pt --task trans...

You can filter by the timestamps or cut the audio file. But for this you need to know Python programming.
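
A minimal sketch of the "cut the audio file" route, assuming ffmpeg is installed and on the PATH. The helper names (`to_seconds`, `ffmpeg_clip_command`, `clip`) and the output file name are made up for illustration; only the ffmpeg options `-ss`, `-t`, `-i` and `-c copy` are real.

```python
import subprocess


def to_seconds(stamp: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def ffmpeg_clip_command(src: str, dst: str, start: str, end: str) -> list[str]:
    """Build an ffmpeg command that copies src[start:end] into dst without re-encoding."""
    start_s = to_seconds(start)
    duration_s = to_seconds(end) - start_s
    if duration_s <= 0:
        raise ValueError("end must be after start")
    # -ss and -t placed before -i act on the input: seek to start, read duration_s seconds
    return ["ffmpeg", "-y", "-ss", str(start_s), "-t", str(duration_s),
            "-i", src, "-c", "copy", dst]


def clip(src: str, dst: str, start: str, end: str) -> None:
    """Run the command (hypothetical helper; needs ffmpeg on PATH)."""
    subprocess.run(ffmpeg_clip_command(src, dst, start, end), check=True)


if __name__ == "__main__":
    print(" ".join(ffmpeg_clip_command("exemplo.mp3", "exemplo_3_to_10.mp3", "3:00", "10:00")))
```

After cutting, the same command from above should work on the shorter file: !whisper "exemplo_3_to_10.mp3" --language pt --task transcribe --model medium. The timestamps in the output would then be relative to the start of the clip, not the original file.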

autumn bolt Mar 23, 2023, 8:44 AM

#

gray wigeon Hi all 👋 I'm wondering if anyone had experienced an issue where whisper seems ...

That is interesting, since only [.. to-English speech translation] is possible. If I remember correctly.. "We only support translation into English at this time."

#

But Im not from the OpenAI team...

mortal plover Mar 23, 2023, 10:10 AM

#

autumn bolt That is interesting since.. [.. to-English speech translation] possible. If kno...

exactly, I am hustling now with translation from English to other languages and its supposed to be not possible but hey - its done here...

pure veldt Mar 23, 2023, 12:05 PM

#

mortal plover exactly, I am hustling now with translation from English to other languages and ...

Transcribe ASR and translate with OpenAI API 😉

versed scroll Mar 23, 2023, 2:09 PM

#

So is it possible to have like an audio in Russian and have the transcript in like Spanish instead of English?

#

Or even have like an audio in Spanish and have the transcript/result as Spanish too, for like deaf people

tough tangle Mar 23, 2023, 7:57 PM

#

hi ¯_(ツ)_/¯

#

(╯°□°)╯︵ ┻━┻

autumn bolt Mar 23, 2023, 8:48 PM

#

versed scroll So is it possible to have like an audio in Russian and have the transcript in li...

Check out Gladia if this is your usecase

peak saffron Mar 23, 2023, 8:57 PM

#

Has anyone managed to get microphone recording in Safari or Firefox to work through the Whisper API?

#

I have it working well on chrome but never the other two

split palm Mar 23, 2023, 10:19 PM

#

Does Whisper work both ways? TTS and STT

barren compass Mar 23, 2023, 10:53 PM

#

no

sonic mango Mar 24, 2023, 8:22 AM

#

Hello Everyone, does anybody know if whisper is working on diarization ?

pure veldt Mar 24, 2023, 11:11 AM

#

sonic mango Hello Everyone, does anybody know if whisper is working on diarization ?

Maybe you would like to create/use something like this? huggingface.co/spaces/vumichien/whisper-speaker-diarization

#

That can transcribe and do diarization.

rough badge Mar 24, 2023, 12:24 PM

#

hey guys

#

I want to work with open Ai to stress test it

#

our team has found many bugs and wants to help

pure veldt Mar 24, 2023, 12:32 PM

#

Bug-reports channel

autumn bolt Mar 24, 2023, 12:51 PM

#

has anyone tried applying for a transcription gig with whisper here?

#

I'm trying to install Whisper with pip3 install git+https://github.com/openai/whisper.git on MacOS and I'm getting "error: metadata-generation-failed" anyone know how I can fix this? Here's the full output when trying to install https://pastebin.com/qGGvgKaE

bronze delta Mar 24, 2023, 1:07 PM

#

Hi everyone! I am trying to use Whisper via API to transcribe an audio recorded in the browser.
The way to the server (which is a NodeJS express backend) works. I have used both OpenAI's npm lib as well as talking directly to the OpenAI endpoint https://api.openai.com/v1/audio/transcriptions.

I always get a 400, bad request back and I can't work out why. I am sending the file as webm and via multipart/form-data

The function is in the screenshot

Any ideas? Many thanks in advance for a hint!
The file is in the Blob format /* let blob = new Blob([Buffer.from(audioFile, 'base64')]); */

bronze delta Mar 24, 2023, 3:34 PM

#

bronze delta Hi everyone! I am trying to use Whisper via API to transcribe an audio recorded ...

we're continuing the discussion here https://discord.com/channels/974519864045756446/1088568452396093562

lone orchid Mar 24, 2023, 5:09 PM

#

Does Whisper have a recording length restriction? Im having issues where its not translating the entire file im sending it

#

And its under 25mb

faint cloud Mar 24, 2023, 5:26 PM

#

I know that Whisper can produce either a transcription into input language, or translation into English. Is there a way to produce both in one go? That is, feed audio file to Whisper and get 2 transcripts as output, one in the original language, and one in English?

lone orchid Mar 24, 2023, 5:30 PM

#

Not on the same call

#

You would have to do two separate calls, one to translation and one to transcription

hybrid surge Mar 24, 2023, 6:07 PM

#

#gpt-realtime

austere nimbus Mar 24, 2023, 6:32 PM

#

Hi there, I'm thinking about implementing Whisper with MP4s or MP3s. Can anyone tell me how long it takes in seconds for a result? For let's say 30 words.

faint cloud Mar 24, 2023, 6:40 PM

#

lone orchid You would have to do two separate calls, one to translation and one to transcrip...

but why? this doesn't seem optimal. As far as I understand the difference starts at the tokenizer creation stage
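
A rough sketch of what "two calls" could look like with the open source Python package instead of the API. It assumes `openai-whisper` is installed; `transcribe_and_translate` and `run_both` are hypothetical helper names. The model is loaded only once, but the audio is still decoded twice, once per task.

```python
def transcribe_and_translate(model, audio_path, language=None):
    """Run the same audio through both tasks and return (original_text, english_text)."""
    # task="transcribe" keeps the spoken language, task="translate" outputs English
    original = model.transcribe(audio_path, task="transcribe", language=language)
    english = model.transcribe(audio_path, task="translate", language=language)
    return original["text"], english["text"]


def run_both(audio_path, model_name="medium"):
    """Load the model once, then reuse it for both passes (needs openai-whisper)."""
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(model_name)
    return transcribe_and_translate(model, audio_path)
```

With the hosted API the equivalent would be one request to the transcriptions endpoint and one to the translations endpoint, as described above.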

faint cloud Mar 24, 2023, 6:41 PM

#

austere nimbus Hi there, I'm thinking about implementing Whisper with MP4s or MP3s. Can anyone...

This heavily depends on the hardware you're running the process on

austere nimbus Mar 24, 2023, 6:50 PM

#

faint cloud This heavily depends on the hardware you're running the process on

Sorry, I mean for the Whisper API requests.

peak saffron Mar 24, 2023, 8:22 PM

#

austere nimbus Sorry, I mean for the Whisper API requests.

It'd be very fast. I'm sending recordings of about that length and it's maybe a second?

#

2 sometimes

whole tundra Mar 24, 2023, 10:11 PM

#

how can i get to the whisper website

wet herald Mar 24, 2023, 10:54 PM

#

API requests getting timeout?

#

Anyone?

patent shale Mar 24, 2023, 10:57 PM

#

Seeing timeout on transcription

#

translation is working as expected

wet herald Mar 24, 2023, 10:59 PM

#

Thanks for the info 🙂 Getting timeout on transcription too

gritty galleon Mar 24, 2023, 11:00 PM

#

What is #gpt-realtime?

wet herald Mar 24, 2023, 11:03 PM

#

Audio to text converter / translator

wet herald Mar 24, 2023, 11:35 PM

#

is there a .tflite or .mlmodel large-v2 model available?

slow urchin Mar 25, 2023, 12:32 AM

#

What is the status

pure veldt Mar 25, 2023, 7:25 AM

#

You could try out. Large-v2 works, I tested.

bold orchid Mar 25, 2023, 7:41 AM

#

what do you eat today

knotty trail Mar 25, 2023, 1:54 PM

#

Does anyone know where I can read docs on silero vad? Trying to fix Whisper hallucinations

compact wind Mar 25, 2023, 5:48 PM

#

Whisper AI api is regularly returning Welsh text when I give it english voice input. Why is it also doing a translation? (If I hand the text to ChatGPT and ask it to identify the language and provide an English translation it is identifying it as welsh and giving the essence of what I said)

compact wind Mar 25, 2023, 5:48 PM

#

compact wind Whisper AI api is regularly returning Welsh text when I give it english voice in...

I don’t have a welsh accent!

knotty trail Mar 25, 2023, 8:44 PM

#

compact wind Whisper AI api is regularly returning Welsh text when I give it english voice in...

Are you setting the language as English?

#

But you can read in the Whisper paper that a lot of their 'Welsh' data was actually just English but was labelled as Welsh, I'm guessing that's the source of the issue

compact wind Mar 25, 2023, 8:47 PM

#

knotty trail Are you setting the language as English?

Possibly not. It might be set to auto. I'm using an ios shortcut I found on the internet - I've not found the API docs myself to cross reference. It's strange it's auto detecting english and then translating to welsh though. Most of the time it returns english.

The api request appears to be posting to /v1/audio/transcriptions, with a request body containing model: whisper-1 and the audio file. No other optional parameters.

knotty trail Mar 25, 2023, 8:49 PM

#

compact wind Possibly not. It might be set to auto. I'm using an ios shortcut I found on the ...

Screen_Shot_2023-03-25_at_9.49.17_PM.JPG

#

Just a random guess it has something to do with this.

compact wind Mar 25, 2023, 8:51 PM

#

Looks likely. Do you know what parameter I need to set to inform whisper that the input is English? It's becoming annoying to dictate minutes of off the cuff thinking, and have it come back in Welsh! (With no recording to fall back on) 😉

#

I looked online for the API docs, but the web site just refers me to discord - hence my question.

knotty trail Mar 25, 2023, 8:53 PM

#

compact wind Looks likely. Do you know what parameter I need to set to inform whisper that th...

No clue I don't use the API, I run freesubtitles.ai , that may be a better solution though I don't know which iOS thing you're using.

#

(Hope I'm not breaking TOS by posting that feel free to tell me if I am mods)

compact wind Mar 25, 2023, 8:55 PM

#

ios has a built in WYSIWYG shortcuts system for building extensions. My shortcut is triggered from SIRI when I say 'convert with whisper', captures some audio, calls the whisper API, and then shows the text and copies it into the paste buffer. Very useful.

knotty trail Mar 25, 2023, 8:56 PM

#

Nice. If you can figure out how to pass the language that will probably fix it. Though I have had a lot of people use auto-detect language with my service and I can't remember seeing it incorrectly identified as Welsh
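
A minimal sketch of passing the language, assuming the request goes to the same /v1/audio/transcriptions endpoint mentioned above and that the third-party `requests` package is available. The optional `language` form field takes an ISO-639-1 code such as "en". The function names and the `OPENAI_API_KEY` environment variable are assumptions for illustration; an iOS shortcut would need the same extra form field added in its own editor.

```python
import os

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def transcription_fields(language: str = "en") -> dict:
    """Form fields sent next to the audio file; 'language' skips auto-detection."""
    return {"model": "whisper-1", "language": language}


def transcribe(path: str, language: str = "en") -> str:
    """POST the file as multipart/form-data and return the transcribed text."""
    import requests  # third-party: pip install requests

    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers=headers,
            data=transcription_fields(language),
            files={"file": audio},
            timeout=120,
        )
    response.raise_for_status()
    return response.json()["text"]
```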

compact wind Mar 25, 2023, 9:00 PM

#

Must be something about my voice 😉

#

Where can I find the API docs then I wonder.

knotty trail Mar 25, 2023, 9:50 PM

#

freesubtitles.ai is my site

#

working pretty well over 1 million minutes transcribed already

#

not sure about that. I think there are Telegram bots perhaps

rough badge Mar 26, 2023, 12:18 AM

#

Hey i need to get into contact with a discord dev

#

who do I talk to?

#

@hushed pier

#

mod mail dont work like that huh

humble crater Mar 26, 2023, 4:48 AM

#

Hello and good day. I report this error due to its persistence, I have deleted the cookies, I have reloaded the page among other things and this error continues, could you help me? Thank you

autumn bolt Mar 26, 2023, 4:56 AM

#

humble crater Hello and good day. I report this error due to its persistence, I have deleted t...

Server's flooded
Don't bother with chat.openai

Just use the API keys.
It's better in everyway

humble crater Mar 26, 2023, 4:56 AM

#

How can I use the API keys?

errant geyser Mar 26, 2023, 1:17 PM

#

rough badge Hey i need to get into contact with a discord dev

Post your problem or something, they’re busy people and if your question is worth it they’ll probably respond.

barren relic Mar 26, 2023, 6:05 PM

#

Who has access to GPT plugins yet? I have a really cool project that I want to collaborate on, my company is called Whop (whop.com) shoot me a dm!

rough badge Mar 26, 2023, 8:14 PM

#

errant geyser Post your problem or something, they’re busy people and if your question is wort...

right so the reason I need to talk to a discord dev is bc there is a glitch built into discord that allows certain people to use the dev portal

#

the people who are using it

#

aren't good people btw

errant geyser Mar 26, 2023, 8:28 PM

#

what does that have to do with openai @rough badge ?

rough badge Mar 26, 2023, 8:33 PM

#

right

#

so I wont go into detail

#

but people have been able to glitch open Ai in ways thought to be un do-able think the worst possible out come

#

and that worst possible out come is where the glitching community started

light nacelle Mar 27, 2023, 1:14 AM

#

Hi everyone, new here. Don't know if I am writing in the wrong section. But where can I search for developers with experience tinkering with Whisper and Gradient API?

I am interested in a joint venture, I have 2 companies eager to pay for gpt3.5 or gpt4 implementation in their customers service.

inland sable Mar 27, 2023, 2:15 AM

#

So I am talking to gpt4?

woven bluff Mar 27, 2023, 11:46 AM

#

Does whisper support transcription of non words like coughing, laughing, and so on, for example with a prompt? Usually for coughing I get "thank you" which is polite but incorrect 😅

#

I can at least detect the non speech probability and most of the time it's good but still curious about it

lethal thorn Mar 27, 2023, 5:28 PM

#

Hey all--I'm a node.js developer, and am wondering if Whisper supports word-level time stamping out of the box? Or do I need additional libraries?

knotty trail Mar 27, 2023, 5:47 PM

#

lethal thorn Hey all--I'm a node.js developer, and am wondering if Whisper supports word-leve...

It does with the latest version if you pass the param

lethal thorn Mar 27, 2023, 6:37 PM

#

knotty trail It does with the latest version if you pass the param

Ah fantastic! Can you direct me to where the most up-to-date Node.js API is for Whisper? I'm having trouble finding it

#

Right now, in the openai.createTranscription() method, I don't see a param for word-level timing or anything like that.

knotty trail Mar 27, 2023, 7:11 PM

#

lethal thorn Ah fantastic! Can you direct me to where the most up-to-date Node.js API is for ...

I don't think it's offered with the API but you can use the open source Whisper module

lethal thorn Mar 27, 2023, 7:21 PM

#

Ah, I see. Sorry to be repetitive, but which is that? I see so many on npm and github! lol

knotty trail Mar 27, 2023, 7:21 PM

#

https://github.com/openai/whisper

lethal thorn Mar 27, 2023, 7:27 PM

#

Thank you! It appears to only be Python-compatible at the moment, is that correct? I'm a JS-only guy, currently. But I have some Python devs who can implement this, if need be.

knotty trail Mar 27, 2023, 7:33 PM

#

You can call it with spawn from node, like I do here: github[dot]com/mayeaux/generate-subtitles

lethal thorn Mar 27, 2023, 7:52 PM

#

Oh I was not aware of this! Thanks so much for your help!

peak saffron Mar 27, 2023, 8:19 PM

#

They actually added whisper support to the openai node package

#

not sure when, but I only noticed it last week

#

Oh weird, i don't see it in their code example - I swear I saw it. It still seems to work for me! I'm calling it like this:

// assumes this runs inside an async function that receives audioPath
const response = await openai.createTranscription(
  fs.createReadStream(audioPath),
  "whisper-1"
);
return response.data;

#

Where openai is an openai client with key added through configuration (they have it in their other api examples)

lethal thorn Mar 27, 2023, 8:43 PM

#

Thanks!! I did see (and have used) this part of the API... but I don't see anything in there for adding the word-level time-stamping, unfortunately.

#

The transcription part works perfectly, though.

knotty trail Mar 27, 2023, 9:08 PM

#

lethal thorn Thanks!! I did see (and have used) this part of the API... but I don't see anyt...

It seems the API only gives back the text transcription and not the .srt , vtt or .json files that Whisper as a standalone does

lethal thorn Mar 27, 2023, 9:18 PM

#

Darn. That's frustrating...

knotty trail Mar 27, 2023, 9:31 PM

#

@lethal thorn I might add the word level timestamps functionality to freesubtitles.ai , it also has API access

peak saffron Mar 27, 2023, 9:39 PM

#

https://platform.openai.com/docs/api-reference/audio/create

OpenAI API

An API for accessing new AI models developed by OpenAI

#

Are you sure? Allegedly it can accept a parameter for that

#

lethal thorn Mar 27, 2023, 9:46 PM

#

well it definitely returns json, but the only thing in it is the transcription. Weird, tbh.

lethal thorn Mar 27, 2023, 9:46 PM

#

knotty trail @lethal thorn I might add the word level timestamps functionality to fre...

Ah cool page! I'll keep an eye out for that 🙂

knotty trail Mar 27, 2023, 9:47 PM

#

peak saffron https://platform.openai.com/docs/api-reference/audio/create

Ah thanks I missed that

peak saffron Mar 27, 2023, 9:48 PM

#

sure thing - I think they weirdly have documentation in two places and only one mentions the node package and some of the features

#

I don't think whisper is top of OpenAi's priorities though

lethal thorn Mar 27, 2023, 9:51 PM

#

Yeah, the open source one appears to only be in python? Which makes this more confusing lol

#

But agreed, it doesn't seem to be top of their list. All I want is word-level time stamping 😄 and to be able to do that with node lol

knotty trail Mar 27, 2023, 10:01 PM

#

peak saffron I don't think whisper is top of OpenAi's priorities though

I am sure their hands are full with ChatGPT atm

knotty trail Mar 27, 2023, 10:02 PM

#

lethal thorn But agreed, it doesn't seem to be top of their list. All I want is word-level ti...

Most of the AI stuff runs on Python tools so the only way to really run it is to plug into the CLI tools with Node

#

Though it seems they will roll out word_timestamps functionality for the API at some point: https://github.com/openai/whisper/pull/869#issuecomment-1459431437

GitHub

word-level timestamps in `transcribe()` by jongwook · Pull Request ...

peak saffron Mar 27, 2023, 10:07 PM

#

does using the response_format parameter not work on the api? srt or vtt should have timestamps I thought? (though i thought it was at ~utterance level, rather than word level)

#

(that utterance comment is coming from using the local model (python), not the api so I don't really know)

patent shale Mar 27, 2023, 10:08 PM

#

the srt and vtt outputs do have timestamps, but they are not at the per word level. they are phrase timing.

lethal thorn Mar 27, 2023, 10:10 PM

#

Interesting, what does Whisper consider a "phrase"?

#

Just a group of words without a significant pause?

lethal thorn Mar 27, 2023, 10:14 PM

#

knotty trail Most of the AI stuff runs on Python tools so the only way to really run it is to...

sad 😄 I guess I'll have to put in a little more legwork than I'd hoped haha oh well.

patent shale Mar 27, 2023, 10:18 PM

#

lethal thorn Just a group of words without a significant pause?

Yes. Those formats are generally used in conjunction with closed captions on video. So the timing is what is readable on a screen during that time. Example SRT: 00:00:00,000 --> 00:00:08,000

#

Appears the chunks are in about 8 second blocks
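
To check how long the phrase blocks really are, the SRT output can be parsed with a few lines of standard-library Python. This is only a sketch: `srt_time_to_seconds` and `parse_srt` are made-up names, and the parser only handles the plain "index / timing / text" layout shown above.

```python
import re

# HH:MM:SS with an optional ,mmm (or .mmm) millisecond part
TIME = re.compile(r"(\d+):(\d{2}):(\d{2})(?:[,.](\d{1,3}))?")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert '00:00:08,000' (or '00:00:08') into seconds."""
    match = TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    fraction = int(millis.ljust(3, "0")) / 1000 if millis else 0.0
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + fraction


def parse_srt(text: str) -> list[tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, phrase) for every cue in an SRT string."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a numbered cue
        start, end = (srt_time_to_seconds(part) for part in lines[1].split("-->"))
        cues.append((start, end, " ".join(lines[2:]).strip()))
    return cues
```

Each cue is a phrase with one start and one end time, so per-word timing cannot be recovered from this format alone.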

lethal thorn Mar 27, 2023, 11:12 PM

#

Very interesting, thank you! Hopefully the word-level stamping comes soon. So many applications for it!

knotty trail Mar 27, 2023, 11:41 PM

#

lethal thorn Very interesting, thank you! Hopefully the word-level stamping comes soon. So ma...

What do you need word level timestamps for specifically? Just curious

lethal thorn Mar 28, 2023, 1:17 AM

#

No problem! I work for an educational publishing company, and we currently pay a vendor a lot of money to manually tag the timestamps for each word in our eBooks so that they can highlight as they are read

#

Figured I could save them a lot of money (and time!) every year relatively easily with the time stamping

peak saffron Mar 28, 2023, 2:32 AM

#

Oh wow that's like a textbook 'you should start a startup and sell to your current company' situation it sounds like

dense pulsar Mar 28, 2023, 2:46 AM

#

Is it better to use offline whisper models or the API? Which would grant me more accurate results (and preferably speed)

#

Let’s say I want to transcribe an 8 hour audio file. Would it be faster and more accurate using the API or an offline large model (RTX 3080)

lethal thorn Mar 28, 2023, 3:13 AM

#

peak saffron Oh wow that's like a textbook 'you should start a startup and sell to your curre...

Hahaha 😂😂 honestly not a bad idea!! All I need is the word level timestamping and i can add a few 0s to my bank account

fluid locust Mar 28, 2023, 6:16 AM

#

dense pulsar Let’s say I want to transcribe an 8 hour audio file. Would it be faster and more...

With the API, 1 hour audio is about 5-10 mins.

autumn bolt Mar 28, 2023, 8:57 AM

#

lethal thorn Yeah, the open source one appears to only be in python? Which makes this more co...

You can load Python into Java/Clojure (all JVM-based, like Scala) via JNA. I also saw a C++ Whisper solution, like Jojo. I don't think that is a core problem for ASR. For the API, it's very easy to write a lib for this. Very clear.. not like Google things sometimes 😉

slow urchin Mar 28, 2023, 9:13 AM

#

Do you remember the last conversation we had?

slow urchin Mar 28, 2023, 9:49 AM

#

📎 whisper-20230314.zip

dense pulsar Mar 28, 2023, 10:04 AM

#

fluid locust With the API, 1 hour audio is about 5-10 mins.

How do you know this

#

Oh I see

knotty trail Mar 28, 2023, 10:27 AM

#

dense pulsar Is it better to use offline whisper models or the API? Which would grant me more...

I've transcribed over a million minutes of content with the offline version but I am going to play around with the API today to see how well it works and then I'll report back

dense pulsar Mar 28, 2023, 10:28 AM

#

knotty trail I've transcribed over a million minutes of content with the offline version but ...

Thanks friend

storm oak Mar 28, 2023, 10:41 AM

#

how much does the large v2 take in vram?

analog pulsar Mar 28, 2023, 12:39 PM

#

I've faced the word-loop issue for Japanese transcription, and found the same issue in the community.
https://community.openai.com/t/whisper-has-looped-in-a-phase/107642

Does anyone know the possible workaround for this?

OpenAI API Community Forum

Whisper has looped in a phase

Anyone get the looped text in the output of whisper? Currently we meet around 5-10% of recording files get the looped output in Chinese speech recognition.

boreal roost Mar 28, 2023, 12:45 PM

#

=-0p97t5§1QW2P0-[

knotty trail Mar 28, 2023, 1:39 PM

#

storm oak how much does the large v2 take in vram?

I believe around 10 GB or so

knotty trail Mar 28, 2023, 1:40 PM

#

analog pulsar I've faced the word-loop issue for Japanese transcription, and found the same is...

Only real solution right now is to use a VAD to strip out non-speech audio, and then transcribe using word level timestamps and rebuild the subtitles. Or you can run your own Whisper implementation and run some custom code which seems to fix it but isn't merged yet.

knotty trail Mar 28, 2023, 1:41 PM

#

dense pulsar Thanks friend

Seems there's more hallucinations in the API than running your own instance

dense pulsar Mar 28, 2023, 1:51 PM

#

@knotty trail Hallucinations like it thinking its saying something when theres silence for example?

How is the speed

knotty trail Mar 28, 2023, 1:55 PM

#

dense pulsar @knotty trail Hallucinations like it thinking its saying something when ...

Yeah exactly. It's quite fast though about 1m processing to 10m transcribed. You can also cut up the audio file and transcribe multiple chunks at once so you could optimize to have it process at basically any speed (could do an hour in 1 minute if you chunk it enough)
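
The chunk-and-parallelize idea could be sketched roughly like this. Everything here is an assumption for illustration: `chunk_bounds` and `transcribe_chunks` are hypothetical names, and `transcribe_one(start, end)` stands for whatever function cuts that slice out of the file and sends it to Whisper.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, float(total_seconds))
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_chunks(bounds, transcribe_one, max_workers: int = 4) -> str:
    """Transcribe every window concurrently and join the texts in original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs, not completion order
        texts = list(pool.map(lambda window: transcribe_one(*window), bounds))
    return " ".join(text.strip() for text in texts)
```

Fixed-length windows can cut a word in half at the boundary, so in practice the cut points would be better placed at moments of silence.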

#

1
00:00:00 --> 00:00:01
Jaj, premanja na Joj,

2
00:00:01 --> 00:00:03
sijanje na sijanja na joj.

3
00:00:03 --> 00:00:05
Use ove se.

4
00:00:05 --> 00:00:07
Edna, a drugi je.

5
00:00:07 --> 00:00:08
Use.

6
00:00:08 --> 00:00:09
Taj je.

7
00:00:09 --> 00:00:10
Pa, jaj.

8
00:00:10 --> 00:00:11
Jaj, jaj.

9
00:00:11 --> 00:00:12
Jaj, jaj, jaj, jaj.

10
00:00:12 --> 00:00:13
Jaj, jaj, jaj, jaj, jaj.

11
00:00:13 --> 00:00:14
Jaj, jaj, jaj, jaj, jaj.

But for example, this came out of thin air. This doesn't appear when I run it with my own instance, I don't know what settings they have but it's a bit strange.

analog pulsar Mar 28, 2023, 3:58 PM

#

knotty trail Only real solution right now is to use a VAD to strip out non-speech audio, and ...

Thanks! Then it's difficult for me... I hope this non-English transcription issue is found by OpenAI folks. Anyway, thanks a lot! 🙏

knotty trail Mar 28, 2023, 4:40 PM

#

analog pulsar Thanks! Then it's difficult for me... I hope this non-English transcription issu...

I think this will help a lot: https://github.com/openai/whisper/pull/1155

Not sure when it will get merged and implemented in the API though. You can also try cutting the file into smaller chunks that should reduce the chance of hallucination loops. How long into your content does it start to get stuck into a loop?

GitHub

Update decoding.py by FernanOrtega · Pull Request #1155 · openai/wh...

Following the suggestions of @Jeronymous in #914 and #924, it solves the problem of endless loop.

knotty trail Mar 28, 2023, 4:48 PM

#

lethal thorn Figured I could save them a lot of money (and time!) every year relatively easil...

Shouldn't be too tough, just need to install whisper on a GPU server and then pass the CLI param for the word timestamps.

analog pulsar Mar 28, 2023, 4:55 PM

#

knotty trail I think this will help a lot: https://github.com/openai/whisper/pull/1155 Not s...

Thanks again! I've subscribed the PR.
In my typical case, the loop occurs in 10 min audio Japanese transcription.
According to your suggestion, I'm going to try shortening the audio file prior to transcribing. The attached is the actual loop, just in case.
Thank you very much!

knotty trail Mar 28, 2023, 7:01 PM

#

analog pulsar Thanks again! I've subscribed the PR. In my typical case, the loop occurs in 10 ...

You can use VAD to detect speech in the file and then cut at moments of silence, that's what I just implemented today it works pretty well, then you can remake the SRT/VTT/TXT afterwards

small juniper Mar 28, 2023, 7:19 PM

#

Anyone fine-tuning Whisper? Would adding timestamps to the training data improve accuracy of the fine-tuned model?

high mountain Mar 28, 2023, 8:19 PM

#

Hi I want a chrome/edge extension that allows you to speech-to-text on any text input inside the browser using Whisper. I want the same functionality as Voice In but using Whisper. Is there anything like this?

lethal thorn Mar 29, 2023, 1:34 AM

#

knotty trail Shouldn't be too tough, just need to install whisper on a GPU server and then pa...

Oh, I thought that they hadn't merged any word timestamp features yet? Would you be able to show me an example of this, or the specific documentation of it w/ examples?

#

Sorry for the hand-holding request lol the documentation seems to be all over the place

lethal thorn Mar 29, 2023, 1:40 AM

#

lethal thorn Oh, I thought that they hadn't merged any word timestamp features yet? Would you...

I also have an RTX 3080Ti, if that helps me do this myself 😄

fickle coyote Mar 29, 2023, 8:35 AM

#

Will Whisper be able to return data on when a certain sentence is said in the future?

knotty trail Mar 29, 2023, 10:24 AM

#

fickle coyote will it be possible that whisper would have data when a certain sentence is bein...

It already has that if you pass -F response_format="verbose_json"

knotty trail Mar 29, 2023, 10:24 AM

#

lethal thorn Oh, I thought that they hadn't merged any word timestamp features yet? Would you...

It's merged just not deployed to the API servers yet.

proven fjord Mar 29, 2023, 12:00 PM

#

Just wanted to drop by and say that Whisper has been incredible for creating Text To Speech datasets from voice recordings.

knotty trail Mar 29, 2023, 12:21 PM

#

proven fjord Just wanted to drop by and say that Whisper has been incredible for creating Tex...

That sounds interesting, how do you achieve that?

proven fjord Mar 29, 2023, 12:22 PM

#

Well, what you need to train it is a bunch of audio clips of about 5 seconds each, which you can get from an audiobook, a podcast, etc. Then you have to provide transcripts in a file like:
filename.wav | This is the transcript of that file

#

one line per file

#

splitting sound into smaller chunks is a oneliner with pydub

#

then writing the file for the transcripts with whisper's API

#

is like 5 lines of python

#

and that's what you need to give something like Tacotron2 for training a voice
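
A rough sketch of the transcript-file step described above, assuming the clips are already split and transcribed. The function names and the sample file names are hypothetical; only the "filename.wav | transcript" line format comes from the messages above.

```python
from pathlib import Path
from typing import Iterable, Tuple


def build_metadata(entries: Iterable[Tuple[str, str]]) -> str:
    """Format (filename, transcript) pairs as one 'filename | transcript' line each."""
    lines = []
    for filename, transcript in entries:
        # Collapse newlines and repeated spaces so every clip stays on one line.
        text = " ".join(transcript.split())
        lines.append(f"{filename} | {text}")
    return "\n".join(lines) + "\n"


def write_metadata(entries: Iterable[Tuple[str, str]], out_path: str) -> None:
    """Write the metadata file as UTF-8 text."""
    Path(out_path).write_text(build_metadata(entries), encoding="utf-8")
```

For example, `build_metadata([("clip_001.wav", "This is the transcript of that file")])` gives a single line in the format shown above. Whether a given TTS trainer wants spaces around the pipe is worth checking against its own docs.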

#

I've done it with my own voice for testing

#

it is uncanny

autumn bolt Mar 29, 2023, 12:25 PM

#

Do you speak Arabic?

peak saffron Mar 29, 2023, 1:12 PM

#

proven fjord I've done it with my own voice for testing

Are you following a guide for this? Sounds neat

proven fjord Mar 29, 2023, 1:13 PM

#

A combination of things, also asking questions to GPT-4. It proved very useful when helping me adapt code for NVIDIA GPUs to M1 Macs

#

I found the guide by FakeYou a big help too

#

but that's for the model training

#

The Fake You discord is pretty awesome

#

This is the FakeYou guide

#

ah! Can't post links

#

ok go to Fake You's discord

#

and you will find it

peak saffron Mar 29, 2023, 1:19 PM

#

Thanks!

lethal thorn Mar 29, 2023, 4:07 PM

#

knotty trail It's merged just not deployed to the API servers yet.

Ahhh okay!

#

Has anyone done a comparison of Whisper's accuracy (both for transcription and for word time stamping) vs Google Speech To Text?

knotty trail Mar 29, 2023, 8:48 PM

#

lethal thorn Ahhh okay!

I DM'd you btw it might be in Message Requests

grand comet Mar 29, 2023, 9:11 PM

#

Hi everyone, how do you make whisper return a file in .vtt format?

knotty trail Mar 29, 2023, 9:58 PM

#

grand comet Hi everyone, how do you make whisper return a file in .vtt format?

From the API?

spark glade Mar 30, 2023, 2:00 AM

#

Hi, do you know what is the size (MB) limit to import an audio file to convert to text?

knotty trail Mar 30, 2023, 10:11 AM

#

spark glade Hi, do you know what is the size (MB) limit to import an audio file to convert t...

25 MB

copper elbow Mar 30, 2023, 11:06 AM

#

Hey everyone, sorry if this is a stupid question but I've been searching for an answer and haven't found anything. I'm running whisper locally and it's working, but all of the subtitles are uncapitalised? Is there a setting I need to change to enable capital letters?

EDIT: The marked answer here seems to help: https://github.com/openai/whisper/discussions/194

GitHub

No punctuation for the first 75 minutes of the video. What could be...

Hello. I am generating subtitles for my this video : https://www.youtube.com/watch?v=77iDUQd4x90 I have provided the video file directly to the wisher with language en and model large However as ca...

copper elbow Mar 30, 2023, 11:53 AM

#

Whenever I use initial prompts, it's forcing all of the subtitles into 30 second chunks? Is there a way to stop this behaviour whilst still providing an initial prompt? Without an initial prompt it chunks them logically, but I then have my previous error of not having any capital letters

lost isle Mar 30, 2023, 12:42 PM

#

How do I separate and label multiple voices from one audio file?

knotty trail Mar 30, 2023, 4:30 PM

#

copper elbow Hey everyone, sorry if this is a stupid question but I've been searching for an ...

I actually saw that for the first time ever on one of my transcriptions today

west vapor Mar 30, 2023, 11:30 PM

#

Hello everyone,
What’s the difference between these two approaches used by whisper to transcribe speech?

grand crag Mar 31, 2023, 1:42 AM

#

ah

autumn bolt Mar 31, 2023, 8:20 AM

#

k

quick bluff Mar 31, 2023, 10:13 AM

#

ok

unique anchor Mar 31, 2023, 12:24 PM

#

Hello everyone, how do I use the Whisper API to send binary audio data?

royal eagle Mar 31, 2023, 4:18 PM

#

response_format

fallow needle Mar 31, 2023, 9:19 PM

#

Guys, is there a way to convert avg_logprob to a confidence in the 0-100% range?

wet herald Apr 1, 2023, 4:58 AM

#

Been having slow response times for some hours

wet herald Apr 1, 2023, 4:59 AM

#

west vapor Hello everyone, What’s the difference between these two approaches used by whisp...

Bump

west vapor Apr 1, 2023, 5:46 AM

#

wet herald Bump

What do you mean by Bump?

wet herald Apr 1, 2023, 5:46 AM

#

I was making your message visible again as I am interested in the answer as well

west vapor Apr 1, 2023, 7:22 AM

#

wet herald I was making your message visible again as I am interested in the answer as well

Ahh right👍 . I used both approaches and had different results

night talon Apr 1, 2023, 9:59 AM

#

Does anyone know if the model behind the official API is the same as the open-sourced large-v2?

#

I found the results from the API are better than mine with self-hosted large-v2

sullen cairn Apr 1, 2023, 11:38 AM

#

Why does the Whisper API response for a Chinese video look like text that hasn't been decoded yet?

fallow needle Apr 1, 2023, 12:07 PM

#

Guys, is anyone working on how to obtain confidence?

#

@outer scarab @left stag

knotty trail Apr 1, 2023, 5:52 PM

#

fallow needle Guys there is someone that is working in how to obtain confidence???

Per word?

knotty trail Apr 1, 2023, 5:52 PM

#

night talon I found the result from api is better than mine with self-hosted large-v2

It should be the same I'm guessing. It's possible they use different options (beam_size etc)

fallow needle Apr 1, 2023, 5:53 PM

#

knotty trail Per word?

Per segment is OK too

opal steppe Apr 1, 2023, 6:02 PM

#

q

knotty trail Apr 1, 2023, 6:12 PM

#

fallow needle for segment is ok too

-F response_format="verbose_json"

Pass this as the response_format. If it's not in the output, then it's not available via the API.

fallow needle Apr 1, 2023, 6:13 PM

#

knotty trail -F response_format="verbose_json" \ Pass this for the response_format, if it's ...

With verbose_json, will I get the confidence?

knotty trail Apr 1, 2023, 6:13 PM

#

fallow needle with verbose i will obtain conf?

I can't remember. With the standalone module you can get it with word_timestamps; I'm not sure about segments.
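
On the avg_logprob question: in the open-source package each segment carries an avg_logprob field, which is the mean log probability of the segment's tokens. Exponentiating it gives a geometric-mean token probability, which can serve as a rough 0-100% score. A minimal sketch, assuming that reading; the function name is made up and the number is not a calibrated confidence.

```python
import math


def logprob_to_percent(avg_logprob: float) -> float:
    """Map a segment's average token log probability to a rough 0-100 score.

    exp(avg_logprob) is the geometric mean of the token probabilities.
    Values above 0 are clamped, since a log probability cannot be positive.
    """
    return 100.0 * math.exp(min(avg_logprob, 0.0))


# Hypothetical segment, shaped like an entry of the "segments" list.
segment = {"text": " Haskell Licks Python", "avg_logprob": -0.25}
score = logprob_to_percent(segment["avg_logprob"])
```

A threshold on this score would need tuning on real recordings before it is trusted to flag bad segments.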

fallow needle Apr 1, 2023, 6:14 PM

#

ty i ll try

low condor Apr 1, 2023, 10:18 PM

#

Hey guys, is there any user interface for using Whisper to transcribe speech, for people who don't know how to write and execute code?

light shale Apr 1, 2023, 11:02 PM

#

Hi, for the last several days my GPT Plus account has been downgraded despite paying this month's subscription twice.

The first payment was my usual monthly fee.

The second payment was an attempt to make a new subscription from the same account due to urgently needing use.

I am now out of pocket, with zero support response.

I cannot even attempt to make a new account in case the money is once again taken without providing me with what I paid for .

I have seen that numerous people have experienced the same issue.

Is there some sort of official update on this?

dull cradle Apr 2, 2023, 3:07 AM

#

Ai

sand rune Apr 2, 2023, 6:03 AM

#

1

pure veldt Apr 2, 2023, 8:22 AM

#

low condor Hey guys, is there any user interface for using Whisper to transcribe speech, fo...

I'm not from the OpenAI team, but I think I see what you want: https://platform.openai.com/account/usage. Scroll down to the language model usage; every used token is listed there, if I understand your problem correctly.

knotty trail Apr 2, 2023, 8:32 AM

#

low condor Hey guys, is there any user interface for using Whisper to transcribe speech, fo...

You can use www[dot]freesubtitles.ai which is my site that I made so people can use Whisper without coding

forest iron Apr 2, 2023, 1:53 PM

#

Hey guys, where can I find the supported languages for the v1/audio/transcriptions API endpoint?

autumn bolt Apr 2, 2023, 2:37 PM

#

copper elbow Hey everyone, sorry if this is a stupid question but I've been searching for an ...

thanks, the initial prompt is new for me! what is the limitation of this?

cinder nebula Apr 2, 2023, 4:05 PM

#

Hello, I have an issue running Whisper API in Python (Jupyter Notebook). I followed all the recommendations I found on GitHub related to the ffmpeg error. I uninstalled ffmpeg and installed ffmpeg-python, and now instead of saying that the module ffmpeg has no input the error says:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18348\3212043240.py in <module>
----> 1 import whisper

~\anaconda3\lib\site-packages\whisper\__init__.py in <module>
9 from tqdm import tqdm
10
---> 11 from .audio import load_audio, log_mel_spectrogram, pad_or_trim
12 from .decoding import DecodingOptions, DecodingResult, decode, detect_language
13 from .model import ModelDimensions, Whisper

~\anaconda3\lib\site-packages\whisper\audio.py in <module>
3 from typing import Optional, Union
4
----> 5 import ffmpeg
6 import numpy as np
7 import torch

ModuleNotFoundError: No module named 'ffmpeg'

#

Any advice?

#

The call before was pip install ffmpeg-python

#

response: Request satisfied, path: \anaconda3\Lib\site-packages\ffmpeg_python-0.2.0.dist-info

#

Hey Martini, did you solve it? I am running into the same issue...

leaden lake Apr 2, 2023, 9:04 PM

#

Hello Team so i have a query related to Whisper

I am trying to integrate Whisper in my nodejs project. To do so i am using this code snippet to make an API call using openai npm package.

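// Assumes the openai v3 Node package; this runs inside an async route handler.
const fs = require('fs');
const { Configuration, OpenAIApi } = require('openai');

const configuration = new Configuration({
  apiKey: process.env.CHAT_GPT_ACCESS_KEY,
});
const openai = new OpenAIApi(configuration);

// Arguments in order: file, model, prompt, response format, temperature, language.
const transcription = await openai.createTranscription(
  fs.createReadStream(audioFilePath),
  'whisper-1',
  undefined, // no prompt
  'json',
  0.2,
  'en',
);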

when it comes to passing the audioFilePath i am passing a path where my stream is located, and reading it at the same time.

The code for the file/route includes:

Reading file from user's request.
Passing it to multer and creating a Buffer() from it.
Passing that buffer file to fs.createWriteStream() and then streaming at a file location.
Once the writing is done then reading the file content using fs.createReadStream().
Finally passing that readStream to openai.createTranscription()

I have deployed my Node.js code on render.com. But whenever I hit the API route that has this code from my iPhone, the request fails with status code 400 (Bad Request).

What I am not sure about is what exactly I am passing wrongly to the OpenAI API, because the same API route returns data when I call it from my desktop browser or an Android browser.

knotty trail Apr 2, 2023, 10:13 PM

#

forest iron hey guys where I can find supported languages in v1/audio/transcriptions api end...

const whisperLanguagesString = 'af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,hi,hr,ht,hu,hy,id,is,it,iw,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh';

#

From my own code

noble igloo Apr 2, 2023, 11:05 PM

#

uhhhhhhhhhhhhh, im listening to my microphone, no sounds around, no system sounds, getting some data streams in return from whisper when sending anything, but those are not mine

#

oh, im getting it even unprompted

north flame Apr 3, 2023, 2:12 PM

#

cinder nebula Hello, I have an issue running Whisper API in Python (Jupyter Notebook). I follo...

sorry if it sounds too obvious but have you tried "!pip install ffmpeg" (instead of -python)?

cinder nebula Apr 3, 2023, 2:24 PM

#

That’s what I did first. Yes. Did you have a look at the GitHub convo? I did everything - install ffmpeg , uninstall it and installed ffmpeg-Python - none of these options work in my case

#

I think it’s because Python takes the ffmpeg from its cache (which I deleted as well, yet it still does it)

knotty trail Apr 3, 2023, 2:56 PM

#

cinder nebula That’s what I did first. Yes. Did you have a look at the GitHub convo? I did eve...

Take that error and feed it into ChatGPT4 and it should give you different things to try

silk wren Apr 3, 2023, 9:42 PM

#

Is there anybody who has experience with kore.ai?

modern obsidian Apr 3, 2023, 9:48 PM

#

Hey guys, I'm doing a personal project. All the coding is done for the moment, but when I run the code and go to see how the website looks, it doesn't work. The problem is about the API keys that OpenAI gave me. I wanted to know if anyone knows how to determine whether an API key is for ChatGPT or DALL-E. If I solve that, I will be able to put my 2 different API keys in my JavaScript code and run it.

peak saffron Apr 3, 2023, 9:57 PM

#

Has anyone had any issues with Whisper assuming the wrong language? i have a user who is a native non-english speaker but IS speaking english. The model (through endpoint, Large with language specified as "en") keeps assuming they're speaking their native language. I'm guessing this might be an accent issue? Any ideas or experience?

north flame Apr 4, 2023, 7:52 AM

#

modern obsidian Hey guys, I’m doing a personal project. All the coding part is done for the mome...

I think the OpenAI API key is valid for all services. DALL-E is listed as one of several models in the documentation:
https://platform.openai.com/docs/models

north flame Apr 4, 2023, 8:10 AM

#

peak saffron Has anyone had any issues with Whisper assuming the wrong language? i have a use...

I think it's just the accent. Had a similar experience with automatically generated subtitles on YouTube (posted an example here a few minutes ago, AutoMod muted me for 5min 🙄).

peak saffron Apr 4, 2023, 12:35 PM

#

north flame I think it's just the accent. Had a similar experience with automatically genera...

Dang, well thanks for that!

autumn bolt Apr 4, 2023, 3:42 PM

#

north flame I think it's just the accent. Had a similar experience with automatically genera...

If I remember correctly, you need to split the audio and send it to the API in chunks of max 5 minutes. Maybe it's in the documentation?

patent shale Apr 4, 2023, 3:44 PM

#

autumn bolt If I remember good.. do you need to drop and transfer to the API.. in chunks, ma...

There is a 25 MB file size limit to audio files.

autumn bolt Apr 4, 2023, 3:47 PM

#

patent shale There is a 25 MB file size limit to audio files.

True, I just checked: there is no 5-minute limit. 25 MB is roughly 22 minutes, depending on the bitrate.
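
The "25 MB is about 22 minutes" figure only holds for one bitrate, so here is the arithmetic as a small sketch. It assumes a constant-bitrate file and treats 25 MB as 25 * 1024 * 1024 bytes, which is an assumption about how the limit is counted; the function name is made up.

```python
LIMIT_BYTES = 25 * 1024 * 1024  # assumed interpretation of the 25 MB limit


def max_minutes(size_bytes: int, bitrate_kbps: float) -> float:
    """Minutes of audio that fit in size_bytes at a constant bitrate."""
    seconds = size_bytes * 8 / (bitrate_kbps * 1000)
    return seconds / 60


minutes_at_160 = max_minutes(LIMIT_BYTES, 160)  # close to the ~22 min quoted above
minutes_at_128 = max_minutes(LIMIT_BYTES, 128)  # a bit over 27 min
minutes_at_64 = max_minutes(LIMIT_BYTES, 64)    # roughly double that
```

So re-encoding speech at a lower bitrate before upload stretches how much audio fits under the limit.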

meager schooner Apr 4, 2023, 7:16 PM

#

note: if openai's Audio module isn't recognized, run pip install --upgrade openai

#

apparently my package was older than Whisper

trail quiver Apr 4, 2023, 11:39 PM

#

8okay87

autumn bolt Apr 5, 2023, 3:29 PM

#

meager schooner apparently my package was older than Whisper

pip install -U openai-whisper

meager schooner Apr 5, 2023, 7:35 PM

#

autumn bolt pip install -U openai-whisper

that's not the api, thats running it locally

peak saffron Apr 5, 2023, 11:59 PM

#

Has anyone else ever gotten someone else's transcription back from a request?

#

I submitted one that had the audio of "I'm sorry, could you repeat that" and got this (screenshot from my app)

pliant sandal Apr 6, 2023, 10:07 AM

#

No, but I've gotten some wildly inconsistent translations when the recording volume was low. But nothing as far off as that!

novel tartan Apr 6, 2023, 12:13 PM

#

hekllo

kind kayak Apr 6, 2023, 9:32 PM

#

Is there a way to get back how confident the model is about pieces of the transcription?

For example: I say "Haskell Nix Python"
Output: "Haskell Licks Python"

I'd imagine licks is less confident?

pearl star Apr 6, 2023, 9:34 PM

#

kind kayak Is there a way to get back how confident the model is about pieces of the transc...

Which transcription function of Whisper are you using?

kind kayak Apr 6, 2023, 9:43 PM

#

pearl star Which transcription function of Whisper are you using?

whisper-1 @pearl star

pearl star Apr 6, 2023, 9:45 PM

#

Is there a music volume higher than the sounds in your source?

kind kayak Apr 6, 2023, 9:46 PM

#

No but I am a little sick and stuffy nosed today

#

Otherwise, no sound around me

pearl star Apr 6, 2023, 9:46 PM

#

The whisper-1 model can detect it as a subtitle if the sound source has a musical sound that is louder than the voice of the artist. Sometimes it can return an empty result.

pearl star Apr 6, 2023, 9:48 PM

#

kind kayak No but I am a little sick and stuffy nosed today

I had a similar issue with the native Python whisper module. However, I did not encounter such a problem with the whisper-1 model used through the API.

#

Probably the microphone you are using is not good 🙂

kind kayak Apr 6, 2023, 9:49 PM

#

pearl star The whisper-1 model can detect it as a subtitle if the sound source has a musica...

Do these subtitles give me back the confidence level? I dont think I follow

kind kayak Apr 6, 2023, 9:50 PM

#

pearl star Probably the microphone you are using is not good 🙂

Yeah but I'll need to handle that case for my application

#

We unfortunately cant guarantee our users will have good microphones

#

We wouldnt necessarily neeeed to though if we could get how confident each transcription of each word is

pearl star Apr 6, 2023, 9:51 PM

#

Can you share an audio recording containing the same query for me via private message. I would like to test it with my own whisper-1 model.

#

So I can make a guess as to where the error originates from.

autumn bolt Apr 6, 2023, 9:52 PM

#

Hallo

pearl star Apr 6, 2023, 9:55 PM

#

kind kayak Yeah but I'll need to handle that case for my application

07.04.2023 - 00:54:45 : I am listening...
07.04.2023 - 00:54:47 : Speech-to-text translation is in progress...
07.04.2023 - 00:54:49 : Question: Haskell Nix Python
07.04.2023 - 00:54:55 : Haskell, Nix and Python are all programming languages that have different features and uses.

Haskell is a functional programming language that is known for its strong type system and its ability to handle complex mathematical computations. It is often used for building robust and reliable software systems, as well as for data analysis and machine learning applications.

Nix is a package manager and build system that allows developers to create reproducible and portable software environments. It is often used in large-scale deployments, where consistent and controlled environments are crucial.

Python is a high-level programming language that is easy to learn and use. It is widely used for web development, data science, and machine learning applications, as well as for scripting and automation tasks.

While Haskell and Nix can be considered more specialized languages, Python is popular for its versatility and ease of use. Each language has its own strengths and can be used in a variety of contexts, depending on the needs of the project.
07.04.2023 - 00:54:55 : Text-to-speech translation is in progress...
07.04.2023 - 00:56:20 : Completed!
07.04.2023 - 00:56:20 : ---

#

No problem appears.

#

Question: Haskell Nix Python -> whisper-1 model transcription

#

this source @kind kayak

#

and this destination 🙂

kind kayak Apr 6, 2023, 10:01 PM

#

Im saying the speech to text part is not working perfectly

#

We have different voices

#

The error originates from my stuffy nose lmao

pearl star Apr 6, 2023, 10:04 PM

#

Haskell Licks Python

#

I came to the same conclusion with the audio recording you sent.

kind kayak Apr 6, 2023, 10:06 PM

#

Which conclusion exactly?

pearl star Apr 6, 2023, 10:06 PM

#

Waiting for a short time between words can solve the problem. Don't talk in a row.

pearl star Apr 6, 2023, 10:06 PM

#

kind kayak Which conclusion exactly?

Haskell Licks Python

kind kayak Apr 6, 2023, 10:06 PM

#

Serious question, are you a bot? Cuz theres definitely a misunderstanding here

#

never know these days 😄

#

Cuz yes I already knew these things, that the audio track is unclear

pearl star Apr 6, 2023, 10:08 PM

#

You can't get feedback, you have to come up with your own solution 🙂

#

When I spoke one by one in similar misunderstanding situations, I saw that there was no problem.

kind kayak Apr 6, 2023, 10:11 PM

#

Are you a bot though? Genuinely wondering haha, it would make sense for an OpenAI discord

pearl star Apr 6, 2023, 10:12 PM

#

No I'm not a bot 😄

#

No I'm not a robot 😄

#

bots can't make jokes 🙂

kind kayak Apr 6, 2023, 10:15 PM

#

pearl star Apr 6, 2023, 10:21 PM

#

Haskell Linux Python @kind kayak this record

#

A closer result. It looks like there is a problem with your voice 🙂

late hemlock Apr 6, 2023, 11:19 PM

#

Is it possible to get the same level of timestamped text using the Whisper API as when you run it locally? If it's possible, can someone tell me how? It's driving me insane!

pearl star Apr 6, 2023, 11:28 PM

#

late hemlock Is it possible to get the same level of timestamped text using whisper-api, simi...

This is unfortunately not possible. If you use the whisper-1 model with the whisper api, you will get a faster and more accurate response. the native model works incorrectly in sentences containing mixed language structures. Also, the processing time will vary depending on the size of the model you are using and the processing power of your computer.

#

For example, if you are using the local model with the English language and there is a German word in the sentence. The German word is evaluated in English.

kind kayak Apr 7, 2023, 1:44 AM

#

pearl star Haskell Linux Python @kind kayak this record

What did you change/do?

pearl star Apr 7, 2023, 2:19 AM

#

kind kayak What did you change/do?

I put a 1 second wait between the words haskell and nix in the audio recording.

uncut oasis Apr 7, 2023, 8:40 AM

#

发发

teal orchid Apr 7, 2023, 12:55 PM

#

Is anyone facing an issue with the API? Many times the API is taking more than 30 seconds to respond or it times out...

robust dirge Apr 7, 2023, 12:55 PM

#

yes, for me too

#

@autumn bolt: Is there any issue with the API right now?

patent shale Apr 7, 2023, 3:28 PM

#

You can sign up for email alerts for systems monitoring here: https://status.openai.com/

pliant sandal Apr 7, 2023, 5:25 PM

#

late hemlock Is it possible to get the same level of timestamped text using whisper-api, simi...

The verbose_json format has timestamps 🙂 could imagine that srt or vtt does as well. Does that do it for you?
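
If the timestamps come back as a list of segments, turning them into subtitles is mostly formatting. A minimal sketch, assuming each segment is a dict with "start" and "end" in seconds plus a "text" field; the helper names are made up.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build SRT text: numbered cues, a time range line, then the cue text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

VTT differs mainly in the "WEBVTT" header line and a dot instead of a comma before the milliseconds.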

late hemlock Apr 7, 2023, 5:36 PM

#

pliant sandal The verbose_json format has timestamps 🙂 could imagine that srt or vtt does as ...

How do you get the json file from the API?

pliant sandal Apr 7, 2023, 5:52 PM

#

I'm afraid I'm misunderstanding what you're talking about. I mean - to answer your question I would simply say "call the API", but that seems wrong 😛

dull cradle Apr 7, 2023, 7:19 PM

#

||ohio||

candid junco Apr 8, 2023, 3:18 AM

#

Hey y'all, I've been building with Whisper and using Lambda Labs GPUs for the compute power. Has anyone tried using CPUs to transcribe instead? How's the accuracy?

sullen cairn Apr 8, 2023, 5:15 AM

#

Hi, did anyone encounter an encode/decode problem when transcribing a Mandarin audio file?
text in response json look like this: \u4e0d\u559d? \u54c8\u56c9\u5927\u5bb6\u597d

pliant sandal Apr 8, 2023, 5:24 AM

#

Looks about right 🙂 Those are just escaped Unicode characters
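
To show that nothing is actually broken, here is a small sketch: any JSON parser turns the \uXXXX escapes back into the real characters. The raw string below reuses the text quoted above; wrapping it in a {"text": ...} object is an assumption about the response shape.

```python
import json

# Raw response body as it appears on the wire (note the raw string literal).
raw = r'{"text": "\u4e0d\u559d? \u54c8\u56c9\u5927\u5bb6\u597d"}'


def decode_text(raw_json: str) -> str:
    """Parse the JSON body; the parser converts \\uXXXX escapes to characters."""
    return json.loads(raw_json)["text"]


text = decode_text(raw)

# When writing JSON back out, ensure_ascii=False keeps the characters readable
# instead of escaping them again.
readable = json.dumps({"text": text}, ensure_ascii=False)
```

In Node, JSON.parse does the same thing, so the escapes normally only show up when the raw body is printed before parsing.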

frail dew Apr 8, 2023, 10:21 AM

#

Ummm, can someone sort me out, since I can't get Whisper to read my file

#

(Using NodeJS)

#

Should I supply a file path or a buffered file?

sick vault Apr 8, 2023, 10:21 AM

#

The biggest scam, bots are asking to human if you are a bot

frail dew Apr 8, 2023, 10:22 AM

#

sick vault The biggest scam, bots are asking to human if you are a bot

What?

autumn bolt Apr 8, 2023, 4:24 PM

#

sick vault The biggest scam, bots are asking to human if you are a bot

lmao

full gale Apr 9, 2023, 10:13 AM

#

dull cradle ||ohio||

I live in ohio ||its really (no) joke||

frank basin Apr 10, 2023, 12:24 AM

#

full gale I live in ohio ||its really (no) joke||

you spelt survive wrong

tidal canopy Apr 10, 2023, 11:27 AM

#

any good ideas to improve whisper accuracy for lyric recognition?

#

currently running lyric extraction via demucs

#

could finetuning help?

mild basin Apr 10, 2023, 1:25 PM

#

Does anyone have any tips for reducing hallucinations when using the translate API? Happens fairly frequently, inserting things like "Thanks for watching" and "Please subscribe" to the end of the text, and repeating phrases over and over

drifting lark Apr 10, 2023, 2:26 PM

#

@onyx pike why did you send me a friend request

frozen spoke Apr 10, 2023, 3:49 PM

#

Can whisper handle multiple languages in one API request?

late hemlock Apr 10, 2023, 4:02 PM

#

I dont think so

surreal flame Apr 10, 2023, 4:50 PM

#

Does anyone have any good diarization projects, they’ve been able to use successfully together with whisper (not the api)

knotty trail Apr 10, 2023, 6:33 PM

#

surreal flame Does anyone have any good diarization projects, they’ve been able to use success...

WhisperX has a pyannote integration

knotty trail Apr 10, 2023, 6:34 PM

#

mild basin Does anyone have any tips for reducing hallucinations when using the translate A...

For freesubtitles.ai I use a VAD to cut out non-speech audio, which fixes the hallucinations. I'm also building up a DB of hallucinated text that I match against and remove before I rebuild the SRT/VTT from the word timestamps.
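
The "match against known hallucinated text" part could look something like this. It is only a sketch and not the freesubtitles.ai code: the phrase list holds just the two examples mentioned earlier in this channel, and the function names and the segment shape (dicts with a "text" field) are assumptions.

```python
import re
from typing import Dict, Iterable, List, Set

# Seed list taken from the examples reported above; a real list would be longer.
KNOWN_HALLUCINATIONS: Set[str] = {"thanks for watching", "please subscribe"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before matching."""
    return " ".join(re.sub(r"[^\w\s]", "", text).lower().split())


def remove_hallucinations(segments: Iterable[Dict],
                          known: Set[str] = KNOWN_HALLUCINATIONS) -> List[Dict]:
    """Keep only segments whose normalized text is not a known hallucination."""
    return [seg for seg in segments if normalize(seg["text"]) not in known]
```

Exact matching on whole segments is deliberately conservative, so a real "thanks for watching" inside a longer sentence is left alone.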

abstract bolt Apr 11, 2023, 2:29 AM

#

well done friend

mild basin Apr 11, 2023, 3:16 AM

#

knotty trail For freesubtitles.ai I use a VAD to cut out non-speech audio, that fixes the hal...

aah I see, thanks. I'll look into doing some initial audio processing before hitting the API. Would you be happy to share some of the hallucinated text from the database you're building? I can share what I find too

autumn bolt Apr 11, 2023, 3:38 AM

#

@abstract bolt is a bot ban em

idle mica Apr 11, 2023, 4:51 PM

#

hi, after receiving such a warm response to my last tutorial on using the API, I want to share my brand new video

scarlet dune Apr 11, 2023, 6:00 PM

#

What languages does whisper support? Is there a list?

static sable Apr 11, 2023, 6:40 PM

#

Usa colored supplement logo

dense pulsar Apr 11, 2023, 11:57 PM

#

Anyone got an ffmpeg command to properly split a >25MB mp3 file into multiple segments (without cutting out dialogue)?

Trying to transcribe large mp3 files with the Whisper API in C++, but obviously they have a 25MB limit and recommend splitting the file. I can't find a surefire way to do this properly.

My commands still cut out the audio during dialogue sometimes.

I also want to remove any decently large periods of silence from the audio preferably

Or if anyone knows a better way to do this let me know

mild basin Apr 12, 2023, 1:37 AM

#

dense pulsar Anyone got an ffmpeg command to properly split a >25MB mp3 file into multiple se...

If you first get the size and duration of the file with ffprobe, you can then calculate how long each segment needs to be to stay under 25MB:
ffprobe.exe -v error -show_entries format=duration,size -of default=noprint_wrappers=1:nokey=1 file.mp3
This returns the duration of the file in seconds and the size in bytes. You can then calculate how long each segment needs to be like this:
segment_duration = (desired_size * file_duration) / file_size
Then you can use ffmpeg to split it into segments of that duration. For example, this will split the file into 40-second segments:
ffmpeg.exe -i file.mp3 -f segment -segment_time 40 output_%03d.mp3

ffmpeg can also do silence detection and removal; look into the silencedetect and silenceremove filters.
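
The segment_duration formula above, worked through as a small Python sketch. It assumes a roughly constant bitrate, so size scales with duration; the function names and the example numbers are made up, and the command is only built as an argument list, not executed.

```python
import math
from typing import List


def segment_seconds(file_size_bytes: int, file_duration_s: float,
                    target_size_bytes: int) -> int:
    """segment_duration = (desired_size * file_duration) / file_size, rounded down."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    return math.floor(target_size_bytes * file_duration_s / file_size_bytes)


def build_segment_cmd(path: str, seconds: int) -> List[str]:
    """Same ffmpeg invocation as in the message above, as an argument list."""
    return ["ffmpeg", "-i", path, "-f", "segment",
            "-segment_time", str(seconds), "output_%03d.mp3"]


# Example: a 100 MB, 4000-second file split into pieces of at most ~25 MB.
seconds = segment_seconds(100_000_000, 4000.0, 25_000_000)
cmd = build_segment_cmd("file.mp3", seconds)
```

Picking a target somewhat below the hard limit leaves headroom for variable-bitrate files, where the proportionality is only approximate.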

#

If you're still finding that it's cutting out the start/end of the audio, you can include a 1 second overlap between each audio snippet if you loop over the file and use the -ss and -t flags to manually adjust the start time and duration of each snippet. I use this approach for transcriptions and it works well
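
The overlap approach can be sketched as a loop that computes a start time and duration for each snippet, which then map onto -ss and -t. This is an illustration under assumptions, not the code used above: the function name, the default 1-second overlap parameter, and the example numbers are all made up.

```python
from typing import List, Tuple


def overlapping_windows(total_s: float, segment_s: float,
                        overlap_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs that cover total_s with overlap_s of overlap."""
    if segment_s <= overlap_s:
        raise ValueError("segment length must exceed the overlap")
    windows = []
    start = 0.0
    while start < total_s:
        duration = min(segment_s, total_s - start)
        windows.append((start, duration))
        if start + duration >= total_s:
            break  # the last window reached the end of the file
        start += segment_s - overlap_s  # step back by the overlap each time
    return windows


# Each window maps to one ffmpeg call using -ss (start) and -t (duration).
commands = [
    ["ffmpeg", "-ss", str(start), "-t", str(duration), "-i", "file.mp3",
     f"output_{index:03d}.mp3"]
    for index, (start, duration) in enumerate(overlapping_windows(100, 40))
]
```

The overlapping second will be transcribed twice, so the duplicated words at each boundary need to be trimmed when the transcripts are stitched together.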

dense pulsar Apr 12, 2023, 2:49 AM

#

@mild basin Thanks. Would you mind sending me your code? I'm curious about one part that i dont know how to do dynamically. Would be faster if i just read your code. Feel free to DM me.

If you dont want to send it thats ok too, i understand 🙂

mild basin Apr 12, 2023, 2:52 AM

#

dense pulsar @mild basin Thanks. Would you mind sending me your code? I'm curious a...

I'm live-recording a media element on a webpage with Javascript, which is going to be very different from how I think you're doing it. You have the complete recording already as an mp3 file, correct?

dense pulsar Apr 12, 2023, 2:52 AM

#

mild basin I'm live-recording a media element on a webpage with Javascript, which is going ...

Yes i have a full mp3 file that I pass in to my program via the command line

mild basin Apr 12, 2023, 3:00 AM

#

dense pulsar Yes i have a full mp3 file that I pass in to my program via the command line

I'll DM you with some implementation details

dense pulsar Apr 12, 2023, 3:00 AM

#

Means a lot, thanks 🙂

violet zephyr Apr 12, 2023, 10:30 AM

#

Can I use Whisper on an iPhone?

tidal canopy Apr 12, 2023, 11:53 AM

#

violet zephyr Whisper can I use in iPhone ?

yes

violet zephyr Apr 12, 2023, 11:54 AM

#

How ?

violet zephyr Apr 12, 2023, 11:57 AM

#

tidal canopy yes

How? My native language is Urdu, and while searching the internet I saw many people talking about Whisper's high transcription accuracy. I want to try it, but I haven't found a way to try it the way you can with ChatGPT. I'm not a developer, and I'm looking for a free way to use this service. If I can use it on my iPhone, it will be much easier for me.

mild basin Apr 12, 2023, 12:23 PM

#

Minimising hallucinations

tidal canopy Apr 12, 2023, 12:52 PM

#

violet zephyr How ? My native language is Urdu and I was searching on internet many people tal...

check ggerganov/whisper.cpp on github

#

there are also services where you can upload audio and they run the model for you

violet zephyr Apr 12, 2023, 12:53 PM

#

Can you give me a link?

tidal canopy Apr 12, 2023, 12:53 PM

#

no because the moderation bot doesn't allow me to post links kappajail

violet zephyr Apr 12, 2023, 12:54 PM

#

tidal canopy no because the moderation bot doesn't allow me to post links <:kappajail:6322649...

I installed the Whisper board application, it's free and uses the OpenAI Whisper service behind it

knotty trail Apr 12, 2023, 9:58 PM

#

violet zephyr How ? My native language is Urdu and I was searching on internet many people tal...

freesubtitles[dot]ai is my site you can use Whisper for free there

latent prism Apr 13, 2023, 12:06 AM

#

I'm very nearly done building my personal VoiceGPT Android app. The voice recognition innate to Android seems worse than I'd like, and the available voices aren't great.

Can anybody point me in a direction for how to get better speech recognition and more voice options?

#

I gather that Whisper can do both, but (if so) I'm unable to find things like where I can sample the available voices.

pliant sandal Apr 13, 2023, 4:51 AM

#

Whisper is only voice to text, but not the other way round.

rugged thorn Apr 13, 2023, 5:42 AM

#

scaam

latent prism Apr 13, 2023, 9:44 AM

#

pliant sandal Whisper is only voice to text, but not the other way round.

Ah, thanks!

pseudo hemlock Apr 13, 2023, 9:58 AM

#

hi, nice to meet you

primal halo Apr 13, 2023, 12:28 PM

#

Hi everyone! I've got a weird error message in Google Colab. I'm trying to use Whisper to transcribe an audio file. I've created an API key, but Google Colab tells me the API key is incorrect even though it really is not. Has anyone seen this already?

EDIT : i was dumb 🙂

small juniper Apr 13, 2023, 2:34 PM

#

We need a second channel for those of us not using the Whisper API, who are using the model locally.

autumn bolt Apr 13, 2023, 2:46 PM

#

@empty hinge

patent shale Apr 13, 2023, 2:50 PM

#

small juniper We need a second channel for those of us not using the Whisper API, who are usin...

Agreed. I've seen some confusion about this. It's clear the two need to be separate channels.

graceful karma Apr 13, 2023, 4:56 PM

#

Hi guys! I have a question: in python I have a variable with some bytes (of an audio file) and I want to transcribe this file. But if I call the function openai.Audio.transcribe_raw (I call this and not .transcribe() because I don't want to store the bytes in a file) I get this error: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
But the bytes are of an mp3 file.

Anyone with this issue?

eager helm Apr 13, 2023, 8:31 PM

#

Write a C++ program, using function, to calculate the factorial of an
integer entered by the user at the main program.

neon lichen Apr 14, 2023, 12:44 AM

#

graceful karma Hi guys! I have a question: in python I have a variable with some bytes (of an a...

I'm having this same issue

autumn bolt Apr 14, 2023, 1:45 AM

#

graceful karma Hi guys! I have a question: in python I have a variable with some bytes (of an a...

Code?

waxen shale Apr 14, 2023, 2:06 AM

#

I'm trying to automate caption creation as a function, every tutorial/project I'm seeing is using whisper as a CLI. Can someone clarify if I can use it more as a function, passing in the name+path of a file automatically instead of manual user input?
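
For what it's worth, the open-source package can be called as a library instead of through the CLI. A minimal sketch, assuming the `openai-whisper` package is installed; `caption_path_for` and `transcribe_to_text` are hypothetical helper names, not part of the library:

```python
from pathlib import Path


def caption_path_for(audio_path: str) -> Path:
    """Return the path of the text file to write next to the audio file."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_to_text(audio_path: str, model_name: str = "base") -> Path:
    """Transcribe one file with the local Whisper model and save the text."""
    import whisper  # imported lazily; assumes `pip install -U openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)  # returns a dict with a "text" key
    out_path = caption_path_for(audio_path)
    out_path.write_text(result["text"], encoding="utf-8")
    return out_path
```

Calling `transcribe_to_text("clips/talk.mp3")` from a loop over a folder would then need no manual input.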

graceful karma Apr 14, 2023, 8:29 AM

#

autumn bolt Code?

transcript = openai.Audio.transcribe_raw("whisper-1", file=audio_file, filename="a")

where audio_file is <_io.BufferedReader>

autumn bolt Apr 14, 2023, 10:53 AM

#

graceful karma transcript = openai.Audio.transcribe_raw("whisper-1", file=audio_file, filename=...

You're missing the filetype

graceful karma Apr 14, 2023, 10:56 AM

#

autumn bolt You're missing the filetype

reading the code of the function transcribe_raw I see this. you mean that the parameter "file" is actually a tuple with the file and the filetype?

#

oops sorry, you meant in the filename. I'll try now!

graceful karma Apr 14, 2023, 11:00 AM

#

autumn bolt You're missing the filetype

you were right!!
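
The fix discussed above can be sketched like this, assuming the legacy 0.27-era `openai` Python library, where `transcribe_raw` takes a separate `filename` argument. `filename_with_extension` and `transcribe_bytes` are hypothetical helpers for illustration:

```python
import io


def filename_with_extension(stem: str, audio_format: str) -> str:
    """Build an upload name such as 'a.mp3' from a stem and a format."""
    return f"{stem}.{audio_format.lower().lstrip('.')}"


def transcribe_bytes(audio_bytes: bytes, audio_format: str = "mp3") -> str:
    """Send in-memory audio to the API without writing it to disk."""
    import openai  # legacy library, imported lazily; needs an API key configured

    transcript = openai.Audio.transcribe_raw(
        "whisper-1",
        file=io.BytesIO(audio_bytes),
        # The name only has to end in a supported extension.
        filename=filename_with_extension("a", audio_format),
    )
    return transcript["text"]
```

The point is only that `filename="a"` carries no extension, while `filename="a.mp3"` does.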

slim blaze Apr 14, 2023, 3:18 PM

#

Hi, i'm detecting a student's ability to speak correctly. But now whisper is so good it can even recognize the mispronounced words. Is there a way to make whisper a little less smart ?

small juniper Apr 14, 2023, 4:23 PM

#

Agreed I ve seen some confusion about

autumn bolt Apr 15, 2023, 3:19 PM

#

slim blaze Hi, i'm detecting a student's ability to speak correctly. But now whisper is so ...

do you mean, it's hearing the malformed English and then correctly reforming it? or just detecting accented but correct word use?

small orbit Apr 16, 2023, 2:08 AM

#

Code?

forest haven Apr 16, 2023, 4:41 AM

#

hey guys, new here.
can whisper api provide timestamps?

empty juniper Apr 16, 2023, 5:43 PM

#

For the python API (e.g. transcript = openai.Audio.transcribe("whisper-1", audio_file) ) does anybody know where I might find actual API documentation for the Python objects? The API documentation on the OpenAI website seems to be mostly the REST API. But there's this small example of using Whisper with Python, but then it says give --form attributes to tweak the output and I just don't know how. I find it too hard to guess and tweak through code completion and the cookbook doesn't have any python/whisper stuff in it. Thanks in advance for any pointers!

lunar wadi Apr 16, 2023, 10:33 PM

#

how can i export the result as a srt file? my current line looks like that: result = model.transcribe("full-full.mp3", language="de", fp16=False)
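
One way to do this by hand, as a sketch: the dict returned by `model.transcribe` contains a `"segments"` list whose entries have `start`, `end` and `text`, and those can be written out in SRT layout. `srt_timestamp` and `segments_to_srt` are hypothetical helper names:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

With the line from the question, writing `segments_to_srt(result["segments"])` to a file such as `full-full.srt` (UTF-8) should give a usable subtitle file.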

autumn bolt Apr 17, 2023, 4:17 AM

#

.

tall shell Apr 17, 2023, 7:45 PM

#

anyone know what the issue here is?

gentle canyon Apr 17, 2023, 11:25 PM

#

hi guys. I need help to move on with my project. I need to connect the Whisper API, the GPT-4 API and Google text to speech in FlutterFlow. Has any of you already done this kind of project?

pliant sandal Apr 18, 2023, 9:11 AM

#

you have a space between "--" and "upgrade" in your pip install command. This results in pytube not being installed and subsequently the import fails.

misty pond Apr 19, 2023, 5:05 AM

#

I am working on a project where I receive a URL from a webhook on my server whenever users share a voice note on my WhatsApp. I am using WATI as my WhatsApp API Provider.

The file URL received is in the .opus format, which I need to convert to WAV and pass to the OpenAI Whisper API translation task.

I am trying to convert it to .wav using ffmpeg, and pass it to the OpenAI API for translation processing. However, I am getting an "invalid_request_error"
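
The cause of the error is not stated here, but one thing worth ruling out is a failed or empty conversion. A sketch of the conversion step, assuming `ffmpeg` is on the PATH; the function names and the 16 kHz mono settings are illustrative choices, not requirements taken from the API docs:

```python
import subprocess


def ffmpeg_opus_to_wav_cmd(src: str, dst: str, sample_rate: int = 16000) -> list:
    """Build the ffmpeg command that converts an audio file to mono WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", src,                # input file, e.g. the downloaded .opus note
        "-ac", "1",               # one audio channel
        "-ar", str(sample_rate),  # output sample rate in Hz
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_opus_to_wav_cmd(src, dst), check=True)
```

Using `check=True` makes a broken conversion fail loudly before anything is sent to the API.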

jovial compass Apr 19, 2023, 7:39 AM

#

We are using the Whisper API in our React Native app, and we are encountering the following error:
ERROR Error asking AI: [RequiredError: Required parameter model was null or undefined when calling createTranslation.]

#

the code
const response = await openai.createTranslation({
file: uri,
model: 'whisper-1',
});

remote pier Apr 19, 2023, 1:22 PM

#

i need help with the openai transcribe function in python

import openai

def transcribe(wave_buffer):
    transcript = openai.Audio.transcribe("whisper-1", wave_buffer)
    message = transcript.text
    
    if message is None or len(message.strip()) == 0:
        return None
    
    return message

i am passing a BufferedReader from the memory but im getting an error AttributeError: '_io.BytesIO' object has no attribute 'name', how do i fix this? currently it works if i save the audio as .wav file in the disk but that's very unintuitive since the recording can be large sometimes, how do i fix this?

old tiger Apr 19, 2023, 1:23 PM

#

I know one can use Whisper API to upload an audio file and then receive a text from it. But I want my speech to be translated to text live as I speak. Does anyone have sources/ideas on how one can build this using the Whisper API specifically?

remote pier Apr 19, 2023, 1:24 PM

#

old tiger I know one can use Whisper API to upload an audio file and then receive a text f...

yeah thats my problem too, looks like it needs to be saved in the disk...?

old tiger Apr 19, 2023, 1:26 PM

#

remote pier yeah thats my problem too, looks like it needs to be saved in the disk...?

If you already have a web application you can design an upload button. But the only useful source I can find as of now is "web speech recognition" which was there a long time ago.

remote pier Apr 19, 2023, 1:29 PM

#

yep, seems like you cant use bufferedreader to transcribe audios, only via files, openai apis are really halfassed
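
A possible workaround for the `AttributeError` above, assuming the legacy 0.27-era `openai` library, which reads the `name` attribute of the file object to work out the format. `named_buffer` and `transcribe_buffer` are hypothetical helpers:

```python
import io


def named_buffer(audio_bytes: bytes, filename: str = "recording.wav") -> io.BytesIO:
    """Wrap raw audio bytes in a BytesIO that carries a file name."""
    buffer = io.BytesIO(audio_bytes)
    buffer.name = filename  # the extension should match the real audio format
    return buffer


def transcribe_buffer(audio_bytes: bytes) -> str:
    """Transcribe in-memory WAV audio without saving it to disk."""
    import openai  # legacy library, imported lazily; needs an API key configured

    transcript = openai.Audio.transcribe("whisper-1", named_buffer(audio_bytes))
    return transcript["text"]
```

Nothing is written to disk here; the name is only a label attached to the in-memory buffer.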

fluid vale Apr 19, 2023, 1:37 PM

#

Hi! I'm trying to do some speech to text with Whisper in Spanish, but it misses some keywords and doesn't understand the topic well. Is there a way to add maybe a text dictionary or do some further training in Spanish?

small juniper Apr 19, 2023, 2:02 PM

#

Try splitting audio on silence using PyDub and send small pieces to Whisper API?

remote pier Apr 19, 2023, 2:03 PM

#

yeah but when is he gonna know when to split? like stop the recording at some point

#

its a live recording

small juniper Apr 19, 2023, 2:09 PM

#

I know. Two asynchronous processes/threads: one for reading live audio and splitting on silence, and one for sending and receiving from Whisper API.
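
The two-thread idea can be sketched with only the standard library. This is a toy model, not PyDub: frames are `(loudness, payload)` pairs, the threshold values are made up, and `transcribe` stands in for whatever sends a chunk to the Whisper API. Real code could use PyDub's own `split_on_silence` for the splitting step.

```python
import queue
import threading


def split_on_silence(frames, threshold=500, min_silence=3):
    """Yield lists of payloads separated by runs of quiet frames.

    A chunk ends after `min_silence` consecutive frames whose loudness is
    below `threshold`; the quiet frames themselves are dropped.
    """
    chunk, quiet = [], 0
    for loudness, payload in frames:
        if loudness < threshold:
            quiet += 1
            if quiet >= min_silence and chunk:
                yield chunk
                chunk = []
        else:
            quiet = 0
            chunk.append(payload)
    if chunk:
        yield chunk


def run_pipeline(frames, transcribe):
    """Split audio in one thread and transcribe chunks in another."""
    chunks = queue.Queue()
    results = []

    def producer():
        for chunk in split_on_silence(frames):
            chunks.put(chunk)
        chunks.put(None)  # sentinel: no more audio

    def consumer():
        while True:
            chunk = chunks.get()
            if chunk is None:
                break
            results.append(transcribe(chunk))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The queue decouples the two sides, so a slow API call never blocks the recording loop.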

neon lichen Apr 20, 2023, 5:42 PM

#

sometimes I get bad transcriptions, like random characters in other languages:

transcript.text කපමාන්මාන්මාන්මාවක් කළුමන්තස්තුතියට අපි කිරීමට කිරීමට කිරීමට කිරීමෙන් කිරීම කිරීමට කිරීමට කිරීම කිරීම කිරීම කිරීම සහ කරයි.
transcript.text 今度は、私はこのような場所で、私は 私はこのような場所で、私は 私は 私は 私は 私は 私は 私は 私は 私は 私は 私は

Anyone have any idea why this happens?

final apex Apr 21, 2023, 12:04 AM

#

emoji_12

oak sparrow Apr 21, 2023, 10:21 AM

#

neon lichen sometimes I get bad transcriptions, like random characters in other languages: ...

Yes I do type in my native language which is 🇱🇹 and I get answers in 🇵🇱 pepe_lmao

#

and I think it's AI's key words understanding issue.

lapis anvil Apr 21, 2023, 12:39 PM

#

Hi guys, has anyone tried training the pyannote.audio model with their own data from scratch? The results I have gotten for speaker diarization using the pre-trained pyannote.audio model are not so accurate, therefore I thought of training the model from scratch. Anyone with ideas on how to go about this?

patent shale Apr 21, 2023, 3:12 PM

#

neon lichen sometimes I get bad transcriptions, like random characters in other languages: ...

I've only seen it using the API when there are large gaps of silence.

raven gate Apr 22, 2023, 2:18 AM

#

So if an upload is near or at 25 mb, does whisper still transcribe within 10 seconds?

pliant plover Apr 22, 2023, 5:10 PM

#

i am trying to add whisper to my python script but it does not work
Import "whisper" could not be resolved

#

any ideas why?

#

I have ffmpeg installed
and python 3.10.10

lapis basalt Apr 22, 2023, 5:20 PM

#

pliant plover i am trying to add whisper to my python script but it does not work _**Import "...

Unless you are messing with the OpenAI/Whisper github example, then you just use the OpenAI API

pip install openai
then
import openai

pliant plover Apr 22, 2023, 5:21 PM

#

when I did that this came

📎 output.txt

lapis basalt Apr 22, 2023, 5:21 PM

#

`def synthesize_speech(text):
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)
    engine.save_to_file(text, "output.mp3")
    engine.runAndWait()

def transcribe_audio(audio_file):
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]`

#

OK, you have OpenAI installed in python already

pliant plover Apr 22, 2023, 5:22 PM

#

yes and also the whisper as well

#

but python does not seem to recognize it

#

lapis basalt Apr 22, 2023, 5:23 PM

#

the whisper package on PIP is not the same

pliant plover Apr 22, 2023, 5:23 PM

#

lapis basalt `def synthesize_speech(text): engine = pyttsx3.init() engine.setProperty...

aren't I supposed to use these once whisper is imported first

lapis basalt Apr 22, 2023, 5:23 PM

#

The code sample I provided only needs import openai to work

#

https://platform.openai.com/docs/api-reference/audio

pliant plover Apr 22, 2023, 5:24 PM

#

pliant plover when I did that this came

yeah but according to this I have openai already installed

pliant plover Apr 22, 2023, 5:24 PM

#

pliant plover

but this says the opposite

#

I am on Win10 btw

lapis basalt Apr 22, 2023, 5:26 PM

#

have you tried creating an environment for your project in Python, then installing the requirements via pip to that env? maybe there is a conflict in your default setup.

python -m venv MyProjectEnvironment
./MyProjectEnvironment/Scripts/Activate.ps1

pliant plover Apr 22, 2023, 5:26 PM

#

lapis basalt Apr 22, 2023, 5:28 PM

#

Just to clarify, you realize that when you do pip install whisper it installs the Whisper Database package, not anything to do with the Whisper voice api?

#

Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.
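
Since both distributions install a module called `whisper`, it can help to check which one is actually present in the interpreter being used. A small diagnostic sketch, assuming recent releases of the speech model are published as `openai-whisper`; the function names are hypothetical:

```python
import sys
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a pip distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def whisper_report() -> dict:
    """Report which of the two similarly named distributions are installed."""
    return {
        "openai-whisper": installed_version("openai-whisper"),  # speech model
        "whisper": installed_version("whisper"),  # the database package
    }


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print(whisper_report())
```

Running it with the same interpreter the IDE is configured to use also shows whether pip and the editor point at different Python installs.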

pliant plover Apr 22, 2023, 5:29 PM

#

how am I then supposed to install it to the machine, since I am trying to use EdgeGPT and whisper to create a voice assistant

#

are there any alternatives for voice recognition?

lapis basalt Apr 22, 2023, 5:30 PM

#

I found a package called whisper-openai, but I haven't used it. You can interact with whisper using just the 'openai' package. I am not sure why that project has you installing the 'whisper' package, that's a DB.

#

I will look it up on GitHub and see what it's doing, give me a few

pliant plover Apr 22, 2023, 5:30 PM

#

tyt

lapis basalt Apr 22, 2023, 5:31 PM

#

is it this one? acheong08/EdgeGPT

pliant plover Apr 22, 2023, 5:31 PM

#

correct!

lapis basalt Apr 22, 2023, 5:33 PM

#

odd, this one doesn't mention voice. I know AutoGPT has a voice option, but it looks like EdgeGPT does not.

pliant plover Apr 22, 2023, 5:35 PM

#

weird, the thing is almost any vid I looked up uses the same pip install and can simply import it to their code and it finds the module

but in my case, while running python3 it does not find the module, could it be that my IDE is not compatible?

lapis basalt Apr 22, 2023, 5:35 PM

#

Which IDE are you using? I use VS Code on Win10

pliant plover Apr 22, 2023, 5:35 PM

#

same

lapis basalt Apr 22, 2023, 5:37 PM

#

can you share one of those demos, maybe I can glean some insight from it. if it's on youtube, just provide the watch?v=YM3vT65q4tY part of it? (because links are not allowed)

pliant plover Apr 22, 2023, 5:40 PM

#

oh...

#

nvm wait

lapis basalt Apr 22, 2023, 5:40 PM

#

k

pliant plover Apr 22, 2023, 5:40 PM

#

watch?v=HbY51mVKrcE

#

this is the one i am following

watch?v=aokn48vB0kc&t=248s

lapis basalt Apr 22, 2023, 5:43 PM

#

Oh, ok. So he is using the project on openai/whisper GitHub as a python package. one moment

pliant plover Apr 22, 2023, 5:43 PM

#

alright

lapis basalt Apr 22, 2023, 5:46 PM

#

try running these commands and see if the import succeeds after

pip install git+https://github.com/openai/whisper.git
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

pliant plover Apr 22, 2023, 5:46 PM

#

right away

lapis basalt Apr 22, 2023, 5:47 PM

#

This package is a self-hosted version of the whisper API, which is what threw me off.

pliant plover Apr 22, 2023, 5:48 PM

#

for the first pip install this came out

📎 output.txt

#

the second pip Upgrade

this>

📎 output.txt

#

and it still wasn't able to import

#

this is very odd, since almost every video does the same as I do and it works out for them

lapis basalt Apr 22, 2023, 5:49 PM

#

hmm. the video is a year old, maybe change the import to match the package

import openai-whisper

pliant plover Apr 22, 2023, 5:50 PM

#

lapis basalt Apr 22, 2023, 5:52 PM

#

for openai it says missing import, but for whisper it says undefined variable at line 3, character 15. what is that line in the code?

pliant plover Apr 22, 2023, 5:52 PM

#

#

whisper still does not work on PYTHON 3.11 right?

lapis basalt Apr 22, 2023, 5:54 PM

#

the github says "We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.10 and recent PyTorch versions. "

#

so, no it says python 3.10

pliant plover Apr 22, 2023, 5:55 PM

#

hmm actually do not know what I am doing wrong at this point

#

3.10 is what I have

#

also ffmpeg is installed

lapis basalt Apr 22, 2023, 5:57 PM

#

Ok, the example code shows import whisper.

`import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)`

pliant plover Apr 22, 2023, 5:58 PM

#

should I add it to my code or this is just an example

lapis basalt Apr 22, 2023, 5:58 PM

#

it was an example from the openai/whisper github page. You might model your code after it, but at the moment you are still stuck on the failed import.

pliant plover Apr 22, 2023, 5:59 PM

#

yep,

what could possibly be the reason ? I mean I did everything as it was documented

lapis basalt Apr 22, 2023, 6:05 PM

#

Im on Python 3.11.3, so of course it won't even try to install for me lol

pliant plover Apr 22, 2023, 6:06 PM

#

so there is no solution I suppose

lapis basalt Apr 22, 2023, 6:07 PM

#

You could always fall-back to the OpenAI API, but once your free credit is used/expired, you'd have to pay to use it.

pliant plover Apr 22, 2023, 6:07 PM

#

alright. I will look into that

but thanks for helping

lapis basalt Apr 22, 2023, 6:08 PM

#

You're welcome. The whisper api is priced at $0.006 / minute

raven gate Apr 22, 2023, 7:13 PM

#

So if I were to try to transcribe audio that's near 25 mb, using next js within the 10 second timeout limit, would it be enough time to finish the api request?

#

If anyone knows

sly yew Apr 22, 2023, 7:28 PM

#

I don't know what you are asking. I am new to this. I did, however, do a test with Whisper and Nuxt3 and I got it to convert an audio file to text very easily.

raven gate Apr 22, 2023, 7:29 PM

#

sly yew I don't know what you are asking. I am new to this. I did, however, do a test wi...

I would do the request in a serverless function, giving me a 10 sec time limit. However, if the file size is near or at 25 mb, would it finish before 10 seconds?

sly yew Apr 22, 2023, 7:30 PM

#

I don't know. I'm new here too. Why the 10 sec time limit?

raven gate Apr 22, 2023, 7:30 PM

#

That's how next works with backend serverless functions on a hobby plan

sly yew Apr 22, 2023, 7:33 PM

#

gotcha

#

what host? vercel?

raven gate Apr 22, 2023, 7:35 PM

#

yes

late hemlock Apr 22, 2023, 7:45 PM

#

Will OpenAI ever release whispers API with the ability to get timestamps, like you can on the local run versions ?

sly yew Apr 22, 2023, 7:46 PM

#

late hemlock Will OpenAI ever release whispers API with the ability to get timestamps, like y...

Can I ask why you need that when Youtube already does it? Just upload a video there to get the transcript with time stamps?

#

and it's free

late hemlock Apr 22, 2023, 7:47 PM

#

because im asking for Whisper. as im in the OpenAI discord? I dont care about youtubes ASR

#

I'm developing something that needs timestamps. anyone useful able to answer me ?

lapis basalt Apr 23, 2023, 2:59 PM

#

There isn't a way to get them directly from whisper. They have a version of the API available on GitHub you could try to modify, otherwise use your own code to insert the timestamps using datetime() or something.

#

OpenAI\Whisper on GitHub

mighty sonnet Apr 24, 2023, 5:13 AM

#

sly yew Can I ask why you need that when Youtube already does it? Just upload a video th...

Whisper significantly better than YouTube speech to text

mighty sonnet Apr 24, 2023, 5:14 AM

#

late hemlock Will OpenAI ever release whispers API with the ability to get timestamps, like y...

Did you try prompting for it? A fair bit of control possible.

#

Not sure actual time stamps…it seems very GPT based which isn’t so smart with time…but organization..you can suggest with the prompt

#

I have GPT able to generate YouTube scripts…with time codes…so maybe not impossible

late hemlock Apr 24, 2023, 9:01 AM

#

mighty sonnet Did you try prompting for it? A fair bit of control possible.

When you run whisper on a local machine, you get time stamps for general sentences. The API does not have that capability, it only returns the text as a single string.

polar pivot Apr 24, 2023, 9:03 AM

#

Any way i can change encoding of the output ?

#

Using api

mild basin Apr 24, 2023, 9:09 AM

#

late hemlock When you run whisper on a local machine, you get time stamps for general sentenc...

If you use the verbose_json option for response_format on the web API, it does include some timing information with the start and end values

sly yew Apr 24, 2023, 9:48 AM

#

mighty sonnet Whisper significantly better than YouTube speech to text

I do not notice the difference, to be honest.

late hemlock Apr 24, 2023, 10:42 AM

#

mild basin If you use the verbose_json option for response_format on the web API, it does i...

😮 HOW! can you please dm me a boilerplate for that >?

late hemlock Apr 24, 2023, 10:44 AM

#

mild basin If you use the verbose_json option for response_format on the web API, it does i...

if this works for me, you literally saved my project haha

mild basin Apr 24, 2023, 10:48 AM

#

late hemlock 😮 HOW! can you please dm me a boilerplate for that >?

Add it to the body of the POST request like you do with the other fields, ie.

formData.append('temperature', 0.2);
formData.append('response_format', 'verbose_json');
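
On the Python side, the timing information mentioned above can be pulled out of the parsed response roughly like this. A sketch: the `sample` dict is hand-written to mimic the shape of a `verbose_json` result (a `segments` list with `start`, `end`, `text`) and is not real API output, and `segment_times` is a hypothetical helper:

```python
def segment_times(response: dict):
    """Extract (start, end, text) tuples from a verbose_json transcription."""
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in response.get("segments", [])
    ]


# Hand-written sample shaped like a verbose_json response; not real API output.
sample = {
    "text": "Hello there. General Kenobi.",
    "segments": [
        {"start": 0.0, "end": 1.4, "text": " Hello there."},
        {"start": 1.4, "end": 3.0, "text": " General Kenobi."},
    ],
}

for start, end, text in segment_times(sample):
    print(f"[{start:6.2f} -> {end:6.2f}] {text}")
```

With the legacy `openai` Python library the same field would presumably be passed as `response_format="verbose_json"` in the `openai.Audio.transcribe` call.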

late hemlock Apr 24, 2023, 10:49 AM

#

can i DM you ?

mild basin Apr 24, 2023, 10:49 AM

#

sure

tender cave Apr 24, 2023, 10:56 AM

#

@dapper fjord hey bro

woven folio Apr 24, 2023, 1:31 PM

#

is the whisper api better than running whisper locally?

light raptor Apr 24, 2023, 1:40 PM

#

im gay

tepid fulcrum Apr 24, 2023, 1:50 PM

#

I am not

late hemlock Apr 24, 2023, 2:55 PM

#

woven folio is the whisper api better than running whisper locally?

1000%

#

I have a 3090 and It still takes 15 - 20 mins for an hour video.
OPENAI - "We’ve now made the large-v2 model available through our API, which gives convenient on-demand access priced at $0.006 / minute."

that's $0.36 for an hour of video. and you get it back within 30 seconds.
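
The arithmetic behind that figure, as a small sketch using the quoted price of $0.006 per minute (`api_cost` is a hypothetical helper, and any rounding the billing applies is not modelled):

```python
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.006")  # USD, the price quoted above


def api_cost(minutes) -> Decimal:
    """Cost in USD of transcribing `minutes` of audio at the quoted price."""
    return PRICE_PER_MINUTE * Decimal(str(minutes))


print(api_cost(60))  # one hour of audio
print(api_cost(90))  # an hour and a half
```

`Decimal` is used so that 0.006 × 60 comes out as exactly 0.36 rather than a floating-point approximation.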

#

Depending on what you technically need, it's better in so many ways

woven folio Apr 24, 2023, 3:34 PM

#

late hemlock 1000%

Thanks. But their large model is not better than the open source one though right?

late hemlock Apr 24, 2023, 3:38 PM

#

What open source one ?

#

whisper ??

late hemlock Apr 24, 2023, 3:38 PM

#

woven folio Thanks. But their large model is not better than the open source one though righ...

^^

woven folio Apr 24, 2023, 3:54 PM

#

late hemlock whisper ??

Yeah

late hemlock Apr 24, 2023, 3:54 PM

#

its using the better large-v2 model

#

the best one they have

late hemlock Apr 24, 2023, 3:56 PM

#

woven folio Yeah

again the api is a lot better

woven folio Apr 24, 2023, 6:09 PM

#

late hemlock again the api is a lot better

Here it says that it's the same: https://github.com/openai/whisper/discussions/661

GitHub

Announcing the large-v2 model · openai whisper · Discussion #661

We are pleased to announce the large-v2 model. This model has been trained for 2.5 times more epochs, with SpecAugment, stochastic depth, and BPE dropout for regularization. Other than the training...

late hemlock Apr 24, 2023, 6:11 PM

#

Dude ffs version 2 is the more efficient better one 😄

#

I use the api for business, as well as the local version

#

Id take api over local any day

woven folio Apr 24, 2023, 7:15 PM

#

late hemlock Dude ffs version 2 is the more efficient better one 😄

You're right, but the local whisper also uses v2! And I have a 4090.. so it's great to put it to use no?

polar pivot Apr 24, 2023, 7:59 PM

#

I also have a 4090 that i want to put to use

#

I also have questions about how safe the data in the api is, risk of eavesdropping etc

late hemlock Apr 24, 2023, 8:02 PM

#

woven folio You're right, but the local whisper also uses v2! And I have a 4090.. so it's gr...

Local does not offer the large-v2 model.. how have you got the v2 model?

woven folio Apr 24, 2023, 8:02 PM

#

late hemlock Local does not offer the large-v2 model.. how have you got the v2 model?

They say it does: https://github.com/openai/whisper/discussions/661

woven folio Apr 25, 2023, 8:51 AM

#

For anyone interested, I finally managed to make the large-v2 model work on Windows 11. It works great and fully utilizes my 4090! A 22 min audio was transcribed in about 3 minutes (the API does it in like 30 seconds). But it's free (+ some electricity).
Quality is the same since they're the same model.

dull lotus Apr 25, 2023, 9:05 AM

#

lol

teal orchid Apr 25, 2023, 11:01 AM

#

Anyone facing issue with the Whisper API? Suddenly some requests have started failing with the error Invalid file format. Supported formats

smoky chasm Apr 25, 2023, 3:42 PM

#

is chat gpt working now?

patent shale Apr 25, 2023, 4:05 PM

#

you can use this to monitor all OpenAI systems status: https://status.openai.com/

OpenAI Status

Welcome to OpenAI's home for real-time and historical data on system performance.

remote oasis Apr 26, 2023, 11:52 AM

#

What do you guys use Whisper for mostly? For development or personal use?

toxic creek Apr 26, 2023, 2:17 PM

#

Hi! I am really trying every possible way to make the whisper work in NodeJS but no luck.
Always get this error.
My file is a simple m4a but it does not matter, got the same error with mp3 as well.

#

Maybe someone has faced this in the past

polar pivot Apr 26, 2023, 3:20 PM

#

Use gpt4 🙂

dark knoll Apr 26, 2023, 10:06 PM

#

is whisper api down ? i am getting 502 gateway error and errno: -4077, code: 'ECONNRESET', syscall: 'write',
this error

spare kernel Apr 27, 2023, 1:51 AM

#

Hello - has anyone found a good Windows CLI client for running Whisper locally, à la whisper.cpp? I can't use the API due to a work requirement, and OpenAI's version is CPU or CUDA only. My understanding is there are some ports that work on GPUs generically?

radiant path Apr 27, 2023, 2:59 AM

#

a discord bot with chat gpt simplified

📎 Chatgpt_integred_bot.py

gilded solstice Apr 27, 2023, 9:03 AM

#

are you facing issues with whisper api? i got the error these 2 days: The server had an error while processing your request. 500 {'error'

#

i face this issue frequently today

toxic creek Apr 27, 2023, 9:24 AM

#

toxic creek Hi! I am really trying every possible way to make the whisper work in NodeJS but...

I have managed to hack this in case if someone needs this:

file type is Express.Multer.File I am using NestJS on the server side
working variable works but that reads from disc but I wanted to solve this via uploads
You can convert Multer file buffer into a stream by this: const hackedData = Readable.from(file.buffer)
Very important you need to add a path as well which can be anything BUT needs to have the correct extension. Otherwise openAI throws an error.
Proper variable naming and TS needed of course but at least this is working for me

tidal cliff Apr 27, 2023, 10:31 AM

#

hey wutsss goood

dense basin Apr 27, 2023, 4:17 PM

#

@woven folio hey bro can you help me set this up. I don’t know anything. Just point me in the right direction and I’ll update you with the progress if you have the patience to help a noob with 0 for experience but a lot of drive and willingness

late hemlock Apr 27, 2023, 4:27 PM

#

dont use m4a. use mp3

woven folio Apr 27, 2023, 5:14 PM

#

dense basin <@452975553335525376> hey bro can you help me set this up. I don’t know anythin...

Sure no worries. What you need to know exactly? How to run whisper locally with your own GPU?

sour ermine Apr 27, 2023, 8:02 PM

#

woven folio For anyone interested, I finally managed to make the large-v2 model work on Wind...

Can I dm you about this? Would love a walkthrough!

spare kernel Apr 27, 2023, 9:23 PM

#

woven folio Sure no worries. What you need to know exactly? How to run whisper locally with ...

Hi, I’m very curious as well - did you use openais version or whisper.cpp/some other port?
I had a lot of difficulty getting CUDA going so I ended up using a port on GitHub with native gpu support, I can’t link it but the link is const-me slash whisper

woven folio Apr 27, 2023, 9:25 PM

#

spare kernel Hi, I’m very curious as well - did you use openais version or whisper.cpp/some o...

openai version, works just fine. There are a couple other I wanna try but this is doing the job just fine tbh

spare kernel Apr 27, 2023, 9:33 PM

#

woven folio openai version, works just fine. There are a couple other I wanna try but this i...

Thanks - I will keep persisting with getting CUDA running on windows 11, coming from a web background it’s like starting from square 1.

woven folio Apr 27, 2023, 9:41 PM

#

I made this script for my workloads to run it automatically at night, just import your audio and you get txt outputs: https://github.com/sdevgill/whisper-auto
Install whisper directly from GitHub like it says in the README, then install CUDA 11.6 or 11.7 and the latest NVIDIA drivers, then follow these instructions: https://github.com/openai/whisper/discussions/47 to remove PyTorch and install it again cleanly for CUDA

A bit cumbersome but it's the current process on Windows with NVIDIA
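
After reinstalling PyTorch, a quick check like the following shows whether the GPU is actually visible. A sketch only; `pick_device` is a hypothetical helper that falls back to the CPU when PyTorch is missing or was built without CUDA:

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Whisper would run on:", pick_device())
```

The result can be passed on as `whisper.load_model("large-v2", device=pick_device())`. If it prints `cpu` on a machine with an NVIDIA card, the CPU-only PyTorch build is probably still installed.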

spare kernel Apr 27, 2023, 9:44 PM

#

woven folio I made this script for my workloads to run it automatically at night, just impor...

Thank you! Yes I was looking through some other issues and getting stuck . The const-me version I’ve found does have the advantage of working on non -CUDA gpus but the maintainer just did it as a hobby project. I’m also looking at doing live transcription and displaying on a web page - obviously with some kind of buffer but live ish. The idea is we have some maintenance crews using radio comms and if something happens you can quickly look back the last few minutes and get some context etc

#

From playing around with it seems like you can get high quality transcription faster than real time on a rtx 2060 so it seems feasible, perhaps with a minute delay or so with a buffer

woven folio Apr 27, 2023, 9:47 PM

#

spare kernel Thank you! Yes I was looking through some other issues and getting stuck . The c...

Sounds like a cool project! I recommend playing around with the official model to get the hang of it, then start experimenting with the other projects and go from there. Then you can kind of experiment which model works best, you might even be able to pull it off with the large one, though most likely small/medium, they're not as good as the large, but if it's clear spoken English, they're good as well

spare kernel Apr 27, 2023, 9:50 PM

#

Yeah - currently in that playing around stage but have had some brilliant results just pumping through old recorded comms so it’s definitely possible. Also not sure what the infrastructure will look like if I can scale it up to 10-20 live channels

woven folio Apr 27, 2023, 9:51 PM

#

spare kernel Yeah - currently in that playing around stage but have had some brilliant result...

if not now with your own hardware, sooner or later whisper api will get like 10x cheaper anyway

spare kernel Apr 27, 2023, 9:55 PM

#

@woven folio Agree - unfortunately have a biz requirement it has to be sandboxed, for really no good reason

gilded solstice Apr 28, 2023, 12:43 AM

#

gilded solstice i face this issue frequently today

Hi everyone. Have you faced this issue before? How can I solve it?

heavy cove Apr 28, 2023, 3:21 PM

#

How do I auto detect when someone is talking, and I should send it to whisper?

ornate urchin Apr 28, 2023, 4:17 PM

#

heavy cove How do I auto detect when someone is talking, and I should send it to whisper?

Use hark.js to detect speech.

heavy cove Apr 28, 2023, 4:18 PM

#

thanks

urban knot Apr 29, 2023, 4:24 AM

#

I used GPT to write code for Pythonista on an iOS phone that uses whisper to transcribe audio. It works when the mp3 is around 10 MB, but I start getting nothing when the file size gets closer to 19 MB. Nothing meaning errors and no transcription. It’s a relatively simple use case. If anyone has an idea or would look at my code, I can share it here. Thanks 🙏 (Is the API support channel the correct place?)

queen escarp Apr 29, 2023, 2:30 PM

#

Really sorry if this is a stupid question, I want to use whisper-1 API to transcribe speech in Urdu, but it works only half the time

There is another language Hindi which verbally sounds exactly the same as Urdu, but the characters and script of the language are completely different. The API constantly detects the speech as Hindi and transcribes it into Hindi text

Is there any way to force or hint to the model which language the speech is in, if there are multiple languages that verbally sound the same?
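For reference, the transcription endpoint accepts an optional `language` parameter (an ISO-639-1 code, so `ur` for Urdu) and an optional `prompt`. A minimal sketch follows, assuming the 0.27-style `openai` Python package used elsewhere in this channel. The helper names `build_transcription_params` and `transcribe_file` are made up for illustration, and whether the hint fully stops Hindi-script output is not something this sketch can promise.

```python
def build_transcription_params(language: str, prompt=None) -> dict:
    """Build keyword arguments for a whisper-1 transcription request.

    `language` is expected to be a two-letter ISO-639-1 code such as "ur".
    """
    code = language.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
    params = {"model": "whisper-1", "language": code}
    if prompt:
        # Optional: a short text in the target script can act as a style hint.
        params["prompt"] = prompt
    return params


def transcribe_file(path: str, language: str, prompt=None):
    # Lazy import (assumption: 0.27-style SDK), so the helper above works without it.
    import openai

    with open(path, "rb") as audio_file:
        return openai.Audio.transcribe(
            file=audio_file, **build_transcription_params(language, prompt)
        )


print(build_transcription_params("ur"))
```

With an API key configured, `transcribe_file("speech.mp3", "ur")` would send the hint along with the audio; the file name here is only a placeholder.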

gleaming briar Apr 29, 2023, 2:33 PM

#

Hello, I currently use OpenAI Whisper in an application, but I recently came across Whisper JAX. How does it work?

paper schooner Apr 29, 2023, 7:22 PM

#

Hello, i'm working in a project about whisper can i get some help?

autumn bolt Apr 29, 2023, 8:55 PM

#

I'm trying to run google colab for openAI's whisper, but I don't know how to make whisper access the file. Any idea on how to do it?
I uploaded the file to google colab, but I don't know what to do to make whisper access it, basically
the colab in question:

elder radish Apr 29, 2023, 9:10 PM

#

Is there a decent way to format the output of whisper? It’s super accurate but the wall of text is hard to work with

fierce moon Apr 30, 2023, 4:04 AM

#

gm

mighty sonnet Apr 30, 2023, 5:06 AM

#

elder radish Is there a decent way to format the output of whisper? It’s super accurate but t...

if you give GPT an example, it can reformat it for you

steep nacelle Apr 30, 2023, 8:18 AM

#

/start

analog cave May 1, 2023, 7:05 AM

#

Hey guys. I am currently using whisper right now, and even though the language I speak is English and I only use the transcribe method, not translate, the response comes back in another language that I don't understand. My guess is that it's Indonesian or Malay, I'm not sure. Is it because of my accent? Should I maybe speak more fluently, or with a more American or British accent, so that the response stays in English? Thank you guys.

whole oasis May 1, 2023, 7:43 AM

#

is whisper available for js?
either using the api or as a package, doesn't matter

south lynx May 1, 2023, 12:41 PM

#

Hi all,

does whisper transcribe filler words in english?

elder radish May 1, 2023, 10:55 PM

#

mighty sonnet if you give GPT an example, it can reformat it for you

Hmm... I suppose I could break it into sections and parse it through the API. Good call, thanks.
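A minimal sketch of the "break it into sections" step, assuming plain-text transcript output. The function name `split_into_sections` and the `max_chars` limit are illustrative choices, not anything from Whisper or the API. It groups whole sentences so each section can be sent to a chat model for reformatting without cutting mid-sentence.

```python
import re


def split_into_sections(text: str, max_chars: int = 2000) -> list[str]:
    """Group sentences into sections of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    sections, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            sections.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


wall = "First point. Second point! Is this the third? Yes it is."
print(split_into_sections(wall, max_chars=30))
```

Each returned section could then go into its own request together with the formatting example mentioned above.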

raven gate May 1, 2023, 11:36 PM

#

analog cave Hey guys. I am currently using whisper right now, and even though the language I...

perhaps

fallow needle May 2, 2023, 9:48 AM

#

Hi guys. I have a long audio, like 1 hour of audio and at 1 point of this audio, around 02:00, another person starts questioning the speaker (so the voice is quieter). The model stops transcribing and hallucinates, adding "... ... .. . . . ... .. ..". What option is there to avoid this problem and simply not transcribe what the model doesn't understand?

paper schooner May 2, 2023, 11:12 AM

#

fallow needle Hi guys. I have a long audio, like 1 hour of audio and at 1 point of this audio ...

I think you have to preprocess the audio before the transcription process

autumn bolt May 2, 2023, 1:30 PM

#

Does whisper allow torch compile for pytorch 2.0? Keep on getting the same errors as
https://github.com/openai/whisper/discussions/819

Does anyone know if its working at all? Have tried every page of google and it seems impossible

fallow needle May 2, 2023, 3:10 PM

#

paper schooner I think you have to preprocess the audio before the transcription process

Don't they already have a VAD detector? I think that we can change the VAD configuration with the options that they provide. Am I wrong?

pearl shell May 2, 2023, 5:45 PM

#

Hi, I need to transcribe more than 25 megabytes of audio to text with Whisper multilingual, how can I do that? Thanks

split stone May 2, 2023, 6:05 PM

#

pearl shell Hi, I need to transcribe more than 25 megabytes of audio to text with Whisper mu...

Why don't you try typing "Whisper multilingual audio to text transcription for large files" into the search bar and see what comes up?

pearl shell May 2, 2023, 9:02 PM

#

I tried to upload 24 MB to Whisper (the maximum allowed is 25 MB), but it said it was too large?

torn drum May 3, 2023, 7:41 AM

#

Hi everyone. How can I get the time position in the audio file of each word that is recognized? Are there solutions for that? Thanks

paper schooner May 3, 2023, 10:03 AM

#

fallow needle Don't they already have a VAD detector? I think that we can change the VAD confi...

you can use different techniques for preprocessing audio data, including noise reduction and audio segmentation, which can improve the accuracy of speech recognition.

acoustic yarrow May 3, 2023, 10:05 AM

#

Do I need to apply somewhere to get access to the whisper API? I have an OpenAI account with a key, however I don’t see any options, or am I just missing something? I see the speech-to-text area in the API section, but when reading that area it refers to whisper with a link, and then I’m directed to the whisper main page, where I don’t see any info on how to access it.

fallow needle May 3, 2023, 10:09 AM

#

paper schooner you can use different techniques for preprocessing audio data, including noise r...

do u have experience on that? I have some but sometimes noise reduction can make the audio worse

paper schooner May 3, 2023, 10:14 AM

#

fallow needle do u have experience on that? I have some but sometimes noise reduction can make...

not really ,i'm sorry but i'll see if i could help you with other information

fallow needle May 3, 2023, 10:24 AM

#

paper schooner not really ,i'm sorry but i'll see if i could help you with other information

later I will go deep on all the inputs option that we can change on the model, I think there is something that can help

paper schooner May 3, 2023, 10:38 AM

#

fallow needle later I will go deep on all the inputs option that we can change on the model, I...

good luck with that 🙂

turbid idol May 3, 2023, 12:16 PM

#

why whisperapi so slow?

thorny needle May 3, 2023, 2:41 PM

#

If you guys want to visualize the whisperapi like so check out mabbu.app

autumn bolt May 3, 2023, 6:33 PM

#

Is there any way to get shorter sequences? Some of my sequences are like 20 words long and I am using them for subtitles, and it doesn't look good. In the transcribe options there are some settings which I couldn't find any information about; I tried changing no_speech_threshold but it didn't work
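As far as I understand, `no_speech_threshold` controls when a segment is treated as silence, not how long a segment is, so it would not shorten subtitles. One workaround is to post-process the segments. The sketch below is an assumption-heavy illustration: `split_segment` is a made-up helper, and it spreads the segment's time evenly across words, which is only an approximation. Newer releases of the local `whisper` package also offer a `word_timestamps` option, which should give better cut points if your version has it.

```python
def split_segment(start: float, end: float, text: str, max_words: int = 7):
    """Split one segment into cues of at most max_words words.

    Times are interpolated by word count, which assumes every word takes
    the same amount of time; treat the result as approximate.
    """
    words = text.split()
    if not words:
        return []
    per_word = (end - start) / len(words)
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cue_start = start + i * per_word
        cue_end = start + (i + len(chunk)) * per_word
        cues.append((round(cue_start, 3), round(cue_end, 3), " ".join(chunk)))
    return cues


# A 20-word segment spanning 8 seconds becomes three shorter cues.
long_text = " ".join(f"w{i}" for i in range(20))
cues = split_segment(10.0, 18.0, long_text, max_words=7)
for cue in cues:
    print(cue)
```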

paper schooner May 4, 2023, 8:33 AM

#

turbid idol why whisperapi so slow?

it depends on your cpu and gpu and the audio file duration, language and the parameters used

potent meadow May 4, 2023, 1:11 PM

#

Hi guys. Today I decided to move from my machine with large model installed to a script for the whisper API. Could you give me an example of how to use it with remote video URLs?

woven bronze May 4, 2023, 4:47 PM

#

Does anyone know what the companies in the Whisper paper are?

woven bronze May 4, 2023, 4:49 PM

#

potent meadow Hi guys. Today I decided to move from my machine with large model installed to a...

The api accepts only audio files, not URLs:

https://platform.openai.com/docs/api-reference/audio/create?lang=python

Turn the video into an mp3 and provide it as a file to Whisper. If the file is larger than 25 MB you must clip it into segments no larger than that and make separate calls.

potent meadow May 4, 2023, 4:50 PM

#

woven bronze The api accepts only audio files, not URLs: https://platform.openai.com/docs/ap...

I usually have mp4, but I hope they don't go over 25 mb otherwise I should convert them to mp3 before

woven bronze May 4, 2023, 5:15 PM

#

potent meadow I usually have mp4, but I hope they don't go over 25 mb otherwise I should conve...

mp4 works. If you need to clip them and don't want the clip to be mid-word or mid sentence, you can use a VAD to clip only when a long enough silence comes.

autumn bolt May 5, 2023, 1:12 PM

#

Can I use whisper to get live Speech to text not a recording?

tender rivet May 5, 2023, 2:05 PM

#

autumn bolt Can I use whisper to get live Speech to text not a recording?

not on the API, but that would be possible if you run it locally

dull swan May 5, 2023, 4:50 PM

#

acoustic yarrow May 5, 2023, 5:51 PM

#

Hey hey! Has anyone incorporated whisper into a chatBot yet. I’m just about to dive in here and wondering if anyone has any pointers they’d like to share or any pitfalls?

coral lagoon May 6, 2023, 2:06 AM

#

I was thinking about working on that but it would cost so much for how I would make it

untold crystal May 6, 2023, 2:24 AM

#

helllooo guys

#

#

can you help me with this?

#

#

this is the problem

iron wharf May 6, 2023, 4:58 AM

#

autumn bolt Can I use whisper to get live Speech to text not a recording?

there's whisper-typer-tool on github

autumn bolt May 6, 2023, 2:02 PM

#

,.,

untold crystal May 6, 2023, 2:26 PM

#

help people

#

pls

leaden root May 6, 2023, 10:00 PM

#

II

topaz cargo May 7, 2023, 3:46 AM

#

JOHN

iron wharf May 7, 2023, 11:18 AM

#

acoustic yarrow Hey hey! Has anyone incorporated whisper into a chatBot yet. I’m just about to d...

whisper-typer-tool from github

acoustic yarrow May 7, 2023, 12:06 PM

#

iron wharf whisper-typer-tool from github

Thanks for the tip

paper vapor May 7, 2023, 4:13 PM

#

Does anyone know where i can find this user

low rain May 7, 2023, 4:25 PM

#

yes

#

in this server

urban oasis May 7, 2023, 5:03 PM

#

Hello guys 🙂 glad to be here

#

any tips for using whisper for live transcription ?

iron wharf May 8, 2023, 12:01 AM

#

urban oasis any tips for using whisper for live transcription ?

@acoustic yarrow whisper-typer-tool from github

fallen turret May 8, 2023, 8:45 AM

#

@fallen turret
bsusbus

tender rivet May 8, 2023, 2:04 PM

#

Hey, im having an issue with whisper, im trying to force it to run on a language in particular by running it like this: whisper --model large --language pt --task transcribe input_file_here
but it looks like, despite setting --language, the AI is still picking up some English words and translating them, while I would like it to not do that and instead only use words of that particular language. Is that possible?

small juniper May 8, 2023, 2:08 PM

#

Hi guys I have a long audio like 1 hour

autumn bolt May 8, 2023, 2:10 PM

#

tender rivet Hey, im having an issue with whisper, im trying to force it to run on a language...

wanted by FBI?

bronze garden May 8, 2023, 3:15 PM

#

it's me

untold crystal May 8, 2023, 4:58 PM

#

anyone know how to solve this problem?

#

or whats the meaning?

#

help pls

hollow locust May 8, 2023, 6:05 PM

#

Does anyone have the code to save an audio file to a txt file?

daring compass May 9, 2023, 4:57 AM

#

hollow locust Does anyone have the code to save an audio file to a txt file?

Transcribe?

open portal May 9, 2023, 5:34 AM

#

Willing to pay someone if they can use ChatGPT to develop an app

timber marsh May 9, 2023, 5:58 AM

#

anyone help!
how can I train whisper ?

distant trail May 9, 2023, 8:47 AM

#

untold crystal or whats the meaning?

This looks like something to do with your Python installation

#

also its not an error its a warning

#

you can go to the url mentioned in the error to learn more.

untold crystal May 9, 2023, 12:42 PM

#

i went to the url

#

but i didnt understand

#

i study odontology

#

i dont know about programming

warm hemlock May 9, 2023, 2:03 PM

#

Hey all! i'm having some trouble using the transcription api in a nextjs api route. I get an ERR_FR_MAX_BODY_LENGTH_EXCEEDED error even though the file i'm sending is a 12.5MB .mp3 file. Here's the code i'm using: https://codeshare.io/nzvj9E

I tried with both fetch and the SDK, but i get weird errors for both:
fetch: i get a 'You must provide a model parameter' error even though it's 100% being included in the request
SDK: ERR_FR_MAX_BODY_LENGTH_EXCEEDED even though the file is only 12.5MB.

any advice?

autumn fox May 10, 2023, 5:09 AM

#

she wanna go viral

woven bluff May 10, 2023, 9:14 AM

#

warm hemlock Hey all! i'm having some trouble using the transcription api in a nextjs api rou...

First one seems to be from the front-end, second one, show your code

timber marsh May 10, 2023, 10:14 AM

#

anyone help! Please
how can I train whisper ?

loud sleet May 10, 2023, 12:54 PM

#

Hello guys. I have a question on whisper. I need to transcribe a call between two people so that the transcription is presented in the form of a dialog. As far as I know, whisper and the OpenAI speech API do not provide that feature.

I know that one solution is to divide the file into smaller subfiles for each speaker and then push each file to the OpenAI API; however, I'm wondering whether there is a better or simpler solution for that problem. Maybe there is already a library made for that task.

low rain May 10, 2023, 3:04 PM

#

psst

#

hey

#

im whispering geddit

#

ok im out

amber edge May 11, 2023, 12:01 AM

#

is there a way I can divide the whisper transcription by user? Say I have 2 users talking. Can I convert the transcript and find out who said what?

iron wharf May 11, 2023, 3:16 AM

#

amber edge is there a way I can divide the whisper transcription by user? Say I have 2 user...

search whisper speaker recognition in huggingface

autumn bolt May 11, 2023, 7:20 PM

#

hello how can I fix this openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details. isn't the API free?

tropic knoll May 11, 2023, 7:53 PM

#

autumn bolt hello how can I fix this openai.error.RateLimitError: You exceeded your current ...

same problem

lean grove May 11, 2023, 8:12 PM

#

i want to send data to the whisper api in base64 format, since im using node js, i cant use the "File" type for the file input.

#

anyone know?

ebon blaze May 11, 2023, 10:12 PM

#

Hello, does Whisper collect my transcript data?

languid oyster May 12, 2023, 1:08 AM

#

what exactly is a whisper guys

tough imp May 12, 2023, 3:58 AM

#

so... sometimes I've received weird languages, but this one was a bit odd...

autumn bolt May 12, 2023, 10:07 AM

#

The whisper API is generating subtitle segments that are way too long (6-8 seconds each in some cases); how can I configure it to return shorter segments?

clever prism May 12, 2023, 10:50 AM

#

autumn bolt The whisper API is generating subtitle segments that are way too long (6-8 secon...

How long is the audio file?

frail flax May 12, 2023, 12:36 PM

#

messi or ronaldo

autumn bolt May 12, 2023, 1:14 PM

#

clever prism How long is the audio file?

About 60 seconds

untold crystal May 12, 2023, 9:09 PM

#

--output_format {txt,vtt} its ok this prompt?

untold crystal May 12, 2023, 11:48 PM

#

it gives me an error

#

pls help boys

iron oar May 13, 2023, 12:52 AM

#

hey yall check out

#

SkmAI (Beta): AI powered Youtube video search tool (Revolutionizing search and content consumption) in the projects section

#

https://discord.com/channels/974519864045756446/1106744294661951508

swift sparrow May 13, 2023, 6:09 PM

#

How to use whisper and what is does

last void May 14, 2023, 2:09 AM

#

Hi guys!
A big question. Do you know of an app that uses whisper as a note-taking speech app?
Someone asked me this good question because they want to write a book just by dictating the story.

main current May 14, 2023, 6:55 AM

#

I use it for tracking fitness etc

errant mason May 14, 2023, 5:51 PM

#

Are there any projects with near real-time use of the whisper API? Haven't really worked with audio before but am basically looking to do real-time transcription and text analysis.

sharp parrot May 14, 2023, 5:55 PM

#

ALL hail Kitty

pliant elbow May 14, 2023, 7:45 PM

#

sharp parrot ALL hail Kitty

sharp parrot May 14, 2023, 7:59 PM

#

still moon May 15, 2023, 2:04 AM

#

Also.. hmm. You can go to github.com/jaggzh/whisperpluck for my interface to whisper. See a video linked from the repo page.

#

Only works in linux right now; but, with the scripts, you can assign hotkeys to the scripts (for Start/Stop), or drag them to the desktop, OR, I have my new UI button overlay which is really quite nice, imo.

simple flare May 15, 2023, 7:57 AM

#

how can i learn english？

autumn bolt May 15, 2023, 12:56 PM

#

simple flare how can i learn english？

Duolingo

errant mason May 15, 2023, 5:04 PM

#

still moon Also.. hmm. You can go to github.com/jaggzh/whisperpluck for my interface to wh...

pretty close to exactly what i'm looking for in terms of implementation thanks

still moon May 15, 2023, 6:06 PM

#

@errant mason it was built up over a week from the initial quick hacks I did while on a screen sharing call with my friend (for whom I was writing it). ... Record audio, somehow kill recording process, transcribe, get it into clipboard

gray seal May 15, 2023, 7:10 PM

#

pliant elbow

This image is a fat kitten

pliant elbow May 15, 2023, 7:11 PM

#

gray seal This image is a fat kitten

business chonk

untold crystal May 15, 2023, 7:13 PM

#

untold crystal --output_format {txt,vtt} its ok this prompt?

help

icy orbit May 15, 2023, 11:19 PM

#

does anyone know of a way to get timestamps? is this possible?
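Segment-level timestamps are available: the local `whisper` package returns a `segments` list from `transcribe()` with `start`, `end` and `text` for each entry, and the API offers `srt`, `vtt` and `verbose_json` response formats. The sketch below shows one way to turn such segments into SRT text yourself. The helper names are illustrative, and the example segments are invented rather than real model output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn segments with 'start', 'end' and 'text' keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Invented example data in the same shape as whisper's result["segments"].
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 3725.04, "text": " General check-in."},
]
print(segments_to_srt(example))
```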

whole garnet May 16, 2023, 1:40 AM

#

Hi! Can someone help me? I am trying to use the whisper API with the openai node package, trying to send a local file, and I'm getting error 400. Does someone know how I can send this?

paper schooner May 16, 2023, 10:12 AM

#

hello guys, i want to use only whisper and python(libraries ) to transcribe Youtube videos. Any ideas?

vocal mica May 16, 2023, 12:11 PM

#

paper schooner hello guys, i want to use only whisper and python(libraries ) to transcribe Yout...

Lots of videos on this topic

#

Does anyone know how the translation task works? I want to output the translated + non translated transcription without making it transcribe twice, does anyone know how?

main current May 16, 2023, 4:36 PM

#

still moon Only works in linux right now; but, with the scripts, you can assign hotkeys to ...

Any distro? Kali or redhat?

#

Tails?

cold zealot May 16, 2023, 5:28 PM

#

warm hemlock Hey all! i'm having some trouble using the transcription api in a nextjs api rou...

I would try with a smaller file first, to be sure it's not another issue

main current May 16, 2023, 7:22 PM

#

So thanks for the charge openai team just take it next time alerts feel like threats

#

You have all my info I sent it there directly.

#

Since grade school

#

Thank you #AIbuddies

still moon May 16, 2023, 8:46 PM

#

main current Any distro? Kali or redhat?

Shouldn't matter. I might use some bash-specific stuff in there

#

Man. Isn't there some clever way of automatically detecting and removing noise from voice audio?

main current May 16, 2023, 10:45 PM

#

Sure but hiding is silly. If it's digital it's done tbh

#

That's my opinion

#

I'm just a random internet handle

vale sorrel May 16, 2023, 11:06 PM

#

Is it possible to use Pytorch/Whisper with an AMD GPU?

clever ravine May 16, 2023, 11:16 PM

#

Anyone here knows tinygrad discord?

lavish ermine May 17, 2023, 2:48 AM

#

last void Hi guys! A big question. Do you know of an app that uses whisper as a note-taking...

Whispernotes app

#

Is there no thread for developers using APIs from ChatGPT or GPT?

solemn root May 17, 2023, 3:12 AM

#

languid oyster what exactly is a whisper guys

It's a really accurate transcription service.

solemn root May 17, 2023, 3:13 AM

#

swift sparrow How to use whisper and what is does

You can access it at https://platform.openai.com/playground by clicking the little green mic in the top-right corner.

digital surge May 17, 2023, 5:01 AM

#

response = openai.Image.create(
    prompt="a white siamese cat",
    n=1,
    size="1024x1024"
)
image_url = response['data'][0]['url']

still moon May 17, 2023, 7:20 AM

#

I don't get it. Some pages say to dl common voice training set in a specific language, but.. when I pick English the file still has tons of languages in it.

#

https://commonvoice.mozilla.org/en/datasets

#

I'm just trying to get something that can help me isolate a patient's voice to prepare labeled training data for fine tuning whisper

#

Their whispy ventilator-breathing voice isn't recognized by anything we've ever tried

#

Maybe the datasets python api lets me be more specific.. still.. 60gb.

stuck cipher May 17, 2023, 8:22 AM

#

Hey does anyone know the python equivalent of

--form file=@openai.mp3 \
--form model=whisper-1 \
--form response_format=text

#

specifically the text response format

stuck cipher May 17, 2023, 8:41 AM

#

nvm it's literally response_format="text"

frail cloak May 18, 2023, 6:04 AM

#

Where can I use whisper?

normal loom May 18, 2023, 11:47 AM

#

hi~~

#

im whispering rn

vale badge May 18, 2023, 5:54 PM

#

is it possible to run the whisper model locally?

long tangle May 18, 2023, 7:09 PM

#

@proper spire need to talk

#

Now

long tangle May 18, 2023, 7:27 PM

#

@proper spire you stole my assets, answer me for goodness sake
Before the DMCA is pushing you up the list

autumn bolt May 18, 2023, 8:24 PM

#

vale badge is it possible to run the whisper model locally?

the new chatgpt app uses it, and it's fast, so yeah you probably can

analog goblet May 18, 2023, 10:18 PM

#

Hello! I'm looking for a developer to integrate Open AI on an ERP-PHP webapp.
Does anyone know a freelancer or programmer with experience in this kind of integration?

sharp parrot May 18, 2023, 10:37 PM

#

small juniper May 19, 2023, 2:02 PM

#

is it possible to run the whisper model

hot patrol May 19, 2023, 7:17 PM

#

Hi! Does anyone know how to run the whisper on python with less VRAM than the requirement?

shrewd tiger May 19, 2023, 8:44 PM

#

hey how it going

#

ember meadow May 19, 2023, 9:37 PM

#

What is this

frozen spoke May 20, 2023, 10:37 PM

#

Does whisper have IPA phonetics support?

upper ice May 21, 2023, 1:00 AM

#

hot patrol Hi! Does anyone know how to run the whisper on python with less VRAM than the re...

You cannot.

limpid sedge May 21, 2023, 6:47 PM

#

ember meadow What is this

Speech to text

autumn bolt May 21, 2023, 8:44 PM

#

m

slate hull May 22, 2023, 12:44 AM

#

how do i fine tune whisper?

#

anyone got whisper working on react native?

slate hull May 22, 2023, 1:14 AM

#

which platform?

hidden helm May 22, 2023, 10:17 AM

#

Waiting for next update to the Whisper API. Specifically the whisper-2 model

pale marsh May 22, 2023, 2:23 PM

#

Hello
Has anyone solved the problem of converting Whisper ASR to TorchScript or ONNX (and later TensorRT)?
I need to convert the model for NVIDIA Triton (input: tensor, output: tensor), so ready-made options, such as the ones suggested by Hugging Face, are not suitable.
Whisper divides the audio recording into 30-second segments. I tried to convert the model with a mel_segment tensor for 30 seconds of audio as input and a token tensor as output, which I can then decode into text.
Problems encountered:

convert to TorchScript via jit.trace: the model remembers the output tensor produced during the conversion;
convert to ONNX: an incorrect graph is saved

autumn bolt May 22, 2023, 7:47 PM

#

what are some great ideas to use whisper as a startup company in the medical and healthcare field

woven quail May 22, 2023, 9:03 PM

#

transcription

onyx belfry May 22, 2023, 9:11 PM

#

hey guys, I am trying to cut my whisper api sends into chunks that are below the limit, but it is making me frustrated. I am a hobbyist.

Can anyone point me to some python code that does this? I want to point to a file and have the code chunk it and send it so that each chunk is below the limit and then just concatenate the resulting strings. I know this is a simple problem, but I am just starting out working programmatically with audio.

remote oasis May 23, 2023, 3:13 AM

#

Same

queen scarab May 23, 2023, 4:18 AM

#

Real-time speech-to-text is kind of a whole different animal. I believe most still save as a file, but much smaller. I have not used whisper but have built another app that chunks the data into 0.5 second chunks, adjusts the dB levels and then puts back together, then chunks it by silence of 0.75 seconds or longer and processes each chunk from there

#

again, not Whisper, but I did use Python
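The silence-based chunking described above can be sketched with plain Python. This is an illustration under stated assumptions, not the app that was described: `find_split_points` is a made-up name, the audio is assumed to be mono floats in [-1.0, 1.0], and the 0.01 energy threshold is an arbitrary starting value. Only the 0.75 second minimum silence comes from the message.

```python
import math


def find_split_points(samples, sample_rate, frame_s=0.05,
                      threshold=0.01, min_silence_s=0.75):
    """Return times (seconds) at the middle of each silent gap between speech.

    A frame counts as silent when its RMS energy is below `threshold`.
    Silence at the very start or end of the audio is ignored.
    """
    frame_len = max(1, round(sample_rate * frame_s))
    n_frames = len(samples) // frame_len
    splits = []
    run_start = None  # index of the first frame of the current silent run
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start_t = run_start * frame_len / sample_rate
            end_t = i * frame_len / sample_rate
            if run_start > 0 and end_t - start_t >= min_silence_s:
                splits.append((start_t + end_t) / 2)
            run_start = None
    return splits


# Synthetic check: 1 s tone, 1 s silence, 1 s tone at a 1 kHz sample rate.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]
audio = tone + [0.0] * rate + tone
print(find_split_points(audio, rate))
```

Cutting at the returned times keeps each chunk from ending mid-word, at the cost of uneven chunk lengths.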

final wolf May 23, 2023, 11:07 AM

#

My script for whisper API doesn't work anymore. I didn't change anything about it, but haven't used it in over a month.

Did something change?
openai.error.InvalidRequestError: 1 validation error for Request body -> file Expected UploadFile, received: <class 'str'> (type=value_error)

I'm just confused because it used to work before

#

The code is quite simple:
`openai.api_key = config.openai_APIKEY

audio_files = sorted(os.listdir("parts"))

transcription_text = ""

for audio_file_name in audio_files:
    audio_file_path = os.path.join("parts", audio_file_name)
    audio_file = open(audio_file_path, "rb")
    transcription_object = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcription_object)

    part_transcription_text = transcription_object["text"]

    transcription_text += part_transcription_text + "\n"`

final wolf May 23, 2023, 11:12 AM

#

onyx belfry hey guys, I am trying to cut my whisper api sends into chunks that are below the...

I have a python script that cuts an audio file into chunks. But I don't have the same script send it to whisper. The files are put into a folder and filenames are numbered, so another script can loop through the files and concatenate the API responses

📎 breakupaudio.py

#

You could easily expand this script to have the API transcription done as well. I just decided to keep it separate so it would be easier for me to troubleshoot

#

This script is not perfect though, because it doesn't do anything to prevent the audiofile from being cut in the middle of a sentence or word even

snow rover May 23, 2023, 11:47 AM

#

theres suddenly a 50% error rate for whisper API calls now

stark laurel May 23, 2023, 12:21 PM

#

Try this https://github.com/jlonge4/whisperAI-flask-docker

GitHub

GitHub - jlonge4/whisperAI-flask-docker: I built this project becau...

I built this project because there was no user friendly way to upload a file to a dockerized flask web form and have whisper do its thing via CLI in the background. Now there is. Enjoy! - GitHub - ...

onyx belfry May 23, 2023, 1:58 PM

#

final wolf I have a python script that cuts an audio file into chunks. But I don't have the...

Thanks, this moves me in the right direction. One of the issues that I discovered yesterday is that the wav files can be way bigger than the mp3, so even if I calculated the chunk size based on the mp3, the wav file that I sent could still be way over the limit. It was only late last night that I found out about the .export method.
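The wav versus mp3 size difference can be worked out up front from the export bitrate. A rough sketch, assuming a constant bitrate and the 25 MB limit discussed in this channel: the function names and the 5% safety margin are illustrative, and 1411.2 kbps is the usual figure for 16-bit stereo 44.1 kHz WAV.

```python
import math

API_LIMIT_BYTES = 25 * 1024 * 1024  # 25 MB upload limit mentioned in this channel


def max_chunk_seconds(bitrate_kbps: float, limit_bytes: int = API_LIMIT_BYTES,
                      safety: float = 0.95) -> float:
    """Longest chunk (in seconds) whose exported size stays under the limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return limit_bytes * safety / bytes_per_second


def plan_chunks(total_seconds: float, bitrate_kbps: float):
    """Split [0, total_seconds] into equal (start, end) ranges that fit the limit."""
    count = max(1, math.ceil(total_seconds / max_chunk_seconds(bitrate_kbps)))
    step = total_seconds / count
    return [
        (i * step, total_seconds if i == count - 1 else (i + 1) * step)
        for i in range(count)
    ]


# One hour of audio: uncompressed WAV needs far more chunks than a 128 kbps MP3.
wav_plan = plan_chunks(3600, 1411.2)
mp3_plan = plan_chunks(3600, 128)
print(len(wav_plan), len(mp3_plan))
```

The (start, end) ranges can then drive whatever slicing and export step the chunking script already uses.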

snow rover May 23, 2023, 8:44 PM

#

stark laurel Try this https://github.com/jlonge4/whisperAI-flask-docker

unfortunately the base whisper implementation is too slow for my use case, I'm exploring using faster implementations such as whisper-jax that have lower latency, that i can also deploy myself

remote oasis May 24, 2023, 4:50 AM

#

I am trying the whisper-1 API transcription model for the first time, on YouTube videos. It is fast, but most of the time is spent downloading/converting the video.

For a 1-hour video it took 30 mins, which is too long. Am I missing something, or is this what everyone else is doing?

Isn’t there some faster way to do it?

small juniper May 24, 2023, 2:51 PM

#

Let me know if anyone compares Whisper to Meta's new MMS model on English.

onyx belfry May 24, 2023, 4:40 PM

#

@remote oasis I would probably use yt-dl and ffmpeg to strip the audio?

remote oasis May 25, 2023, 1:43 AM

#

onyx belfry @remote oasis I would probably use yt-dl and ffmpeg to strip the audio?

I have the audio URL, it's just that ffmpeg takes a bit of time to download and convert, but I somehow decreased the time and now it's a bit faster.

distant ibex May 26, 2023, 7:00 AM

#

what next

still moon May 26, 2023, 7:30 AM

#

@remote oasis not sure about yt-dl, but yt-dlp is fast.. choose an audio-only, not the video (unless you want the video data for some reason)

#

oh, you have the audio url, n/m. But do use the alternate one which simulates being a mobile device and stuff.

#

here's a little bash script I wrote yt-dl-mp3:

#

#!/bin/bash
ytdlbin=yt-dlp
quality=2
url=
help=

for a in "$@"; do
    if [[ "$a" =~ ^-[0-9]$ ]]; then
        quality="${a#-}"
        echo -e "\\033[1mQuality set to: $a (0 is highest)\\033[0m"
    elif [[ "$a" =~ // ]]; then
        url="$a"
    elif [[ "$a" = -h || "$a" = "--help" ]]; then
        help=1
    fi
done
if [[ "$help" = 1 || "$url" = "" ]]; then
    echo "Usage: yt-dl-mp3 [-#] [-h] url"
    echo "Where: -# is compression (-0 is lowest. Default -2)"
    exit
fi

printf "%s " "$ytdlbin" "$url" -x --audio-format mp3 --audio-quality "$quality"
echo "Where: -# is compression (-0 is lowest. Default -2)"
read -p 'Enter to proceed with default (unless you set it)...' -t 5
"$ytdlbin" "$url" -x --audio-format mp3 --audio-quality "$quality"

#

Does anyone know for fine-tuning whisper if we should use noisy audio? I'm recording training data of someone and recording some while the vacuum is on.

#

will it make the model more robust?

still moon May 26, 2023, 9:52 AM

#

Also, I can't find info on if we can provide noise training data

upper dagger May 26, 2023, 12:54 PM

#

go to github and look at the whisper forks. there are some that will do this

small juniper May 26, 2023, 2:26 PM

#

Also I can t find info on if we can

still moon May 27, 2023, 12:15 AM

#

How much training data do we need to fine-tune? The person has a very unique voice. Their pronunciation of various syllables is affected by them being on a ventilator, and they can only speak in very short phrases (for the same reason).

#

i'm manually transcribing for the training data so .. it's a bit tedious (but worth it.. just would like to know how far to carry this) 🙂

#

because me loves her

timber canyon May 27, 2023, 10:03 AM

#

Ahhhhh, frustratin

gritty palm May 28, 2023, 7:51 AM

#

i feel youy

strong stag May 28, 2023, 5:40 PM

#

Question: Is 'google cloud speech to text' better than 'Open ai Whisper?'
Context: Ive been creating a project which is highly dependent on real-time voice transcriptions, Ive had a chance to integrate google cloud api for their Speech-to-text service, its alright but fails to provide accurate real-time transcriptions for some reason. I am running tests from my internal microphone.

#

IMO Google has the ability to create the better of the two services, as they logically have way more data than OpenAI, but OpenAI seems to be moving way quicker when it comes to development, implementation, and use cases

runic pulsar May 28, 2023, 11:58 PM

#

I'm trying to create a Discord bot with Whisper capabilities, so I need my requests to be async. I am downloading .mp3 files from Discord, then I am trying to upload them to the Whisper API. It isn't accepting my requests, and it returns this error:

{
        "message": "Could not parse multipart form",
        "type": "invalid_request_error",
        "code": null
}

Here is the code I am using:


    async with aiohttp.ClientSession() as session:
        filename = "test.mp3"
        async with session.get("https://cdn.discordapp.com/attachments/1097219558466658354/1112417715316076775/AI_Test_Kitchen_toetapping_footstomping_americana_1.mp3") as resp:
            if resp.status == 200:
                audiodata = await resp.read()
...

                headers = {
                    "Authorization": "Bearer " + openai.api_key,
                    "Content-Type": "multipart/form-data"
                }
                form = aiohttp.FormData()
                form.add_field('file', audio, content_type='audio/mp4')
                form.add_field("model", "whisper-1", content_type="text/plain")


                resp = await session.post(url="https://api.openai.com/v1/audio/transcriptions", data=form, headers=headers)
                print(await resp.text())

                
            await session.close()

Does anyone know why this is?
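
A likely cause, offered as a guess since part of the code above is elided: setting `Content-Type: multipart/form-data` by hand leaves out the `boundary=...` parameter the server needs to split the parts, which tends to produce exactly this "Could not parse multipart form" error. With aiohttp, dropping that header lets `FormData` supply it, and passing `filename=` to `add_field` marks the part as a file upload. The sketch below uses only the standard library to show what a well-formed request carries. `encode_multipart` and `transcribe` are hypothetical helper names, and `asyncio.to_thread` assumes Python 3.9 or newer.

```python
import asyncio
import json
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, data, content_type):
    """Build a multipart/form-data body plus the matching Content-Type header."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(value.encode() + b"\r\n")
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode()
    )
    parts.append(f"Content-Type: {content_type}\r\n\r\n".encode())
    parts.append(data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    # The header must name the same boundary that separates the parts.
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def _post(url, body, headers):
    """Blocking POST that returns the decoded JSON response."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode())


async def transcribe(audio_bytes, api_key, filename="test.mp3"):
    """Upload MP3 bytes for transcription without blocking the event loop."""
    body, content_type = encode_multipart(
        {"model": "whisper-1"}, "file", filename, audio_bytes, "audio/mpeg"
    )
    headers = {"Authorization": "Bearer " + api_key, "Content-Type": content_type}
    url = "https://api.openai.com/v1/audio/transcriptions"
    # Run the blocking call in a worker thread so the bot stays responsive.
    return await asyncio.to_thread(_post, url, body, headers)
```

In a bot this would be awaited as `await transcribe(audiodata, openai.api_key)` once the attachment bytes have been downloaded.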

random yacht May 29, 2023, 4:25 AM

#

baboonalism

last quarry May 29, 2023, 5:30 AM

#

how can i use whisper

#

i am not able to find the api key

#

to use whisper

compact fractal May 29, 2023, 7:50 AM

#

me too

spiral bolt May 29, 2023, 2:41 PM

#

how can i connect my newly created rails app with newly created next js project

remote oasis May 30, 2023, 1:04 AM

#

still moon <@767032309681487873> not sure about yt-dl, but yt-dlp is fast.. choose an audio...

Thanks! I found some other package with a free API that used to download from YouTube really fast, but it has now stopped working. I went through the code and, lo and behold, it is using yt_dlp to download. I am trying that now and will probably need to deploy my own whisperx, because the OpenAI API has limits and is slow

fresh jewel May 30, 2023, 11:21 AM

#

heyaa, im sorry, i'm still very new to using openai. Does anyone know how i can make an automatic whisper? By that i mean it always detects input from my mic, and when my mic goes silent, it saves the input and goes back to the detect input state. Is that possible?

small juniper May 30, 2023, 1:52 PM

#

How much training data do we need to

autumn bolt May 30, 2023, 6:57 PM

#

Hey, anyone know how to create an SQL database with ChatGPT?

still moon May 30, 2023, 7:19 PM

#

Any guidance on where my time start/ends should be on labeled audio (without me having to download some huge data set just to see a few samples)?

#

still moon May 30, 2023, 8:38 PM

#

(blurred for privacy)

#

#

my fine tuning got down to .02x loss (not sure what loss function was used)

iron oar May 30, 2023, 10:57 PM

#

the best Youtube AI video search tool out there https://discord.com/channels/974519864045756446/1106744294661951508

still moon May 31, 2023, 8:27 AM

#

She speaks in one- to two-word bursts. Can whisper even learn this?

still moon May 31, 2023, 9:03 AM

#

I don't know how to find out except to keep possibly wasting time labeling things .. the fine tuning I did last time was garbage. Like one or two times she made a click sound with her mouth -- I labeled it as "tch".

#

In a subsequent test of the fine-tuned model, whisper transcribed EVERY word of hers as "tch"

autumn bolt May 31, 2023, 11:59 AM

#

How to resolve this error?

small juniper May 31, 2023, 2:52 PM

#

Any guidance on where my time start ends

weary arrow Jun 1, 2023, 12:44 AM

#

Are the large models better than medium.en if I only need english recognition?

marble heart Jun 1, 2023, 3:02 AM

#

amougus

small juniper Jun 1, 2023, 2:31 PM

#

Are the large models better than medium

mossy hemlock Jun 1, 2023, 11:21 PM

#

Has anyone here tried tackling the 25MB limit in NodeJS for Whisper? If so - what's the ideal way about it?

rich hawk Jun 2, 2023, 12:25 AM

#

fresh jewel heyaa, im sorry, i'm still very new to using openai. Does anyone know how i can ...

Ask ChatGPT to write a Python script to do that. Specify that you want the script to include an RMS threshold to detect speech and trigger speech-to-text conversion using Google's local speech-to-text library. If this works, then try the OpenAI Whisper API and compare.
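
A rough sketch of the RMS-threshold idea, assuming the audio is already available as a list of integer PCM samples. Capturing from the microphone needs a separate audio library and is left out. `frame_rms` and `find_speech_segments` are hypothetical names, and the frame length and threshold would need tuning per microphone.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_speech_segments(samples, frame_len, threshold, min_silence_frames):
    """Return (start, end) sample indices of stretches louder than threshold.

    A segment is closed once min_silence_frames consecutive quiet frames
    follow it, which corresponds to "the mic goes silent".
    """
    segments = []
    start = None        # sample index where the current segment began
    last_loud_end = 0   # end index of the most recent loud frame
    quiet_run = 0       # consecutive quiet frames seen inside a segment
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = offset
            last_loud_end = offset + len(frame)
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_frames:
                segments.append((start, last_loud_end))
                start = None
                quiet_run = 0
    if start is not None:
        # Audio ended while still inside a segment.
        segments.append((start, last_loud_end))
    return segments
```

Each returned segment can then be saved and sent for transcription, after which the loop goes back to listening.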

rich hawk Jun 2, 2023, 12:28 AM

#

mossy hemlock Has anyone here tried tackling the 25MB limit in NodeJS for Whisper? If so - wha...

Why would you want to save and send such a large audio file? At least compress it? Or break it up into smaller files using RMS- thresholding to detect a silence pause.

mossy hemlock Jun 2, 2023, 12:34 AM

#

rich hawk Why would you want to save and send such a large audio file? At least compress ...

I'll be compressing it, but I've to demonstrate video files as large as 512 MB. I'll look into RMS tho.

candid flame Jun 2, 2023, 1:41 AM

#

What true happiness looks like. 38 minute done flawlessly under medium

still moon Jun 2, 2023, 4:14 AM

#

#

Yeah, no that's not what she said 😦

#

:} :/

still moon Jun 2, 2023, 4:18 AM

#

mossy hemlock I'll be compressing it, but I've to demonstrate video files as large as 512 MB. ...

Seems like a weird goal from a sadistic and irrational boss. 😉 Or maybe I'm not understanding the rationality behind it. Cut it up at silence, transcribe, piece back together. The actual technique would be a slight variation on this to account for some issues it introduces, but the solutions are extremely easy to implement.
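
The cut-at-silence step could be planned along these lines: a minimal sketch that assumes the silence positions (in seconds) are already known and that the compressed audio has a roughly constant byte rate. `plan_chunks` is a hypothetical helper and the greedy strategy is just one option.

```python
def plan_chunks(cut_points, total_seconds, bytes_per_second, max_bytes):
    """Pick chunk boundaries at silence points so each chunk fits max_bytes."""
    max_seconds = max_bytes / bytes_per_second
    # Deduplicate and keep only cut points strictly inside the recording.
    candidates = sorted({p for p in cut_points if 0 < p < total_seconds})
    candidates.append(total_seconds)
    chunks = []
    start = 0.0
    best = None  # furthest candidate that still fits in the current chunk
    for point in candidates:
        if point - start <= max_seconds:
            best = point
            continue
        if best is None:
            raise ValueError(f"no usable silence within {max_seconds:.0f}s after {start:.0f}s")
        # Close the current chunk at the last silence that fit.
        chunks.append((start, best))
        start = best
        if point - start > max_seconds:
            raise ValueError(f"no usable silence within {max_seconds:.0f}s after {start:.0f}s")
        best = point
    chunks.append((start, best))
    return chunks
```

For the 25 MB limit mentioned above, `max_bytes` would be `25 * 1024 * 1024`, and each `(start, end)` pair is then cut out, transcribed, and stitched back in order.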

polar gale Jun 2, 2023, 4:17 PM

#

What am I supposed to set for model? whisper-1 returns a 404 (model doesn't exist)

patent shale Jun 2, 2023, 4:50 PM

#

polar gale What am I supposed to set for model? whisper-1 return 404-model doesn't exist

I have got that error from time to time. It was usually because something else was not right with the API call - especially if it didn't like the MP3 file I was sending - or sending the MP3 in the wrong way.

patent shale Jun 2, 2023, 4:51 PM

#

polar gale What am I supposed to set for model? whisper-1 return 404-model doesn't exist

And - yes, whisper-1 is the only model you can set at this time.

cinder pewter Jun 2, 2023, 7:34 PM

#

Hello guys!

#

Any idea why Im getting this error when trying to connect to CODEGPT with my OPENAI Api Key???

#

Can I use my OPENAI API key with many different services all at once?

#

or would I have to get a new OPENAI Account?

tepid lynx Jun 2, 2023, 7:40 PM

#

Hi! In the model chart on the whisper GitHub there is a "Required VRAM" column. Is the memory listed there what is needed to transcribe only one audio file at a time at a reasonable rate? If so, to process two audio files at once with the base model ("required ~1 GB VRAM") would I need ~2 GB VRAM, and with the large model ("required ~10 GB VRAM") ~20 GB for two audio files at the same time, and so on? And what would the speed be?

cinder pewter Jun 2, 2023, 7:50 PM

#

please help me guys!!

sinful thunder Jun 2, 2023, 7:52 PM

#

cinder pewter Any idea why Im getting this error when trying to connect to CODEGPT with my OPE...

The error message basically says you have used too much of resources, please wait. As they have limits you have hit.

cinder pewter Jun 2, 2023, 8:04 PM

#

Nooooo but Im getting this error ALWAYS

#

I've never been able to use the codegpt extension, ever

fossil sonnet Jun 2, 2023, 10:45 PM

#

Not sure who to ask .. so here i am .. i am trying to build a website where folks can record their voice, and it will transcribe it and use gpt4 with a prebuilt prompt to display a certain output .. which can be copied and pasted somewhere … i don't have a big budget and have limited coding skills … is there a github link for this where i can see the code? Is this possible using google colab?

candid prism Jun 3, 2023, 2:25 AM

#

how to make clock

daring hearth Jun 3, 2023, 6:46 AM

#

if i deployed my own whisper model, does the data (the voice coming from my user or the text generated by whisper) go to openai by any chance?

normal grove Jun 3, 2023, 8:29 AM

#

[For Hire][Full-stack][Blockchain][Mobile][Remote][Full-time]

I am a senior full-stack and blockchain engineer with 5+ years of professional and extensive experience.
Fully understandable at the requirement in a few mins and make a perfect result which makes customers satisfied.

List of my professional skillset
💼 JavaScript(TypeScript) and its frameworks; Node.js (Express, Nest.js), React.js (Redux, Next.js), Angular (Ngrx, Rxjs, v1.0 ~ v9.0), Vue.js (Vuex, Nuxt.js)
💼 PHP (Laravel, CI), Python, Django
💼 React-Native, Flutter, Native Script
💼 Restful API, OpenAI, ChatGPT, Langchain, Pinecone, AWS
💼 Web3 technology, Smart Contract (Solidity, Rust), ERC tokens (ERC20, ERC721, ERC1155, ERC4337), Ethereum, Solana networks

I am always focusing on the product quality first and professional codebase implementing OOP at high level.
I am ready to make your dream true so feel free to contact me anytime.

Best regards

mossy hemlock Jun 3, 2023, 1:23 PM

#

https://github.com/openai/openai-python

Does this library not have parameter for VTT support in Whisper?

flat pawn Jun 3, 2023, 9:26 PM

#

daring hearth if i deployed my own whisper model does the data (the voice coming from my user ...

I don't think so

daring hearth Jun 4, 2023, 5:52 AM

#

flat pawn I don't think so

Elaborate please

weary arrow Jun 4, 2023, 3:13 PM

#

daring hearth if i deployed my own whisper model does the data (the voice coming from my user ...

No, as long as you don't make a mistake in configuration/use and end up calling the cloud api instead.

candid junco Jun 4, 2023, 4:32 PM

#

Hey all, what is everyone using here as their computational power? Are you simply using your local environment or using a GPU cloud?

azure axle Jun 4, 2023, 10:09 PM

#

Hey guys! I've been working in the last few months in a project using whisper + gpt that I'm really excited about:

Kaption AI is a Chrome extension that transcribes and summarizes WhatsApp Web audios and chats. No more sifting through lengthy audio messages or struggling to keep pace with group chats - Kaption AI can convert audio into text and summarize long threads at the click of a button.

It's a tool that can be useful for business professionals, students, journalists, and people with hearing difficulties - anyone who wants to make their digital communication more efficient and accessible.

The development of Kaption AI was largely inspired by the groundbreaking work done by OpenAI, and the belief in the transformative potential of AI technology. It's my humble attempt to contribute to the ongoing revolution in our digital interactions.

Although I can't share a direct link here due to community guidelines, you can easily find Kaption AI on the Chrome Web Store by searching for "Kaption AI" in any browser. I would greatly appreciate it if you could give it a try and share your thoughts. Your feedback is incredibly valuable in helping me refine and enhance this tool.

Please rest assured that your privacy and security are of utmost importance. Kaption AI does not collect or store any of your messages. It adheres strictly to the latest security protocols and standards, ensuring it's secure from potential threats.

I'm curious - how do you currently handle lengthy audio messages and chats in WhatsApp? And what features would you be interested to see in future updates of Kaption AI?

Looking forward to hearing your thoughts. Thanks for your time!

daring hearth Jun 4, 2023, 10:10 PM

#

azure axle Hey guys! I've been working in the last few months in a project using whisper + ...

Is it OSS?

azure axle Jun 4, 2023, 10:34 PM

#

Thank you for your question. While I greatly appreciate the value and contributions of open source projects, Kaption AI is currently not open source software. It's part of a commercial endeavor that I'm investing significant time and resources into. However, I'm trying to think of ways to make it transparent for everyone to see that I'm not storing people's conversations or doing anything weird. What do you suggest?

daring hearth Jun 5, 2023, 2:23 AM

#

azure axle Thank you for your question. While I greatly appreciate the value and contributi...

Are you using openai whisper api or have you deployed your own model?

azure axle Jun 5, 2023, 1:20 PM

#

I've tried both: running whisper on my own servers and using the API. The API is cheaper and more reliable

candid junco Jun 5, 2023, 5:10 PM

#

Hey All,

I've been testing a product I've been working on, and GPU clouds are getting super expensive.

My product focuses on transcription using the OpenAI Whisper base and large models. Most GPU clouds I researched are used for ML/deep learning and appear to be aimed at visual/graphics/art/video editing work, etc. I feel as if transcription needs much less computation power than something like Stable Diffusion. Does anyone here have insight into what computation is needed for my use case?

I am seriously considering building a low-end deep learning machine.

Note: Transcription works on my PC when I run the API to my local environment

woven folio Jun 5, 2023, 5:44 PM

#

I wonder how good will Apple's on device speech recognition be compared to Whisper

sacred forum Jun 5, 2023, 8:35 PM

#

Hey everyone, what are all the features that Whisper actually covers?

The info I have on the features is limited to just
Speech recognition
Speech transcription

And I know there are more features. Does anyone have an idea of all the features, or documentation I can read?

P.S. The documentation of whisper on GitHub doesn't list all features.

late hemlock Jun 5, 2023, 10:03 PM

#

woven bronze Does anyone know what the companies in the Whisper paper are?

look at the references

heady iron Jun 6, 2023, 9:54 AM

#

im trying to find the function that stores the probabilities of each word in a set. anyone know where it is?

#

is it in the beam search class?

median abyss Jun 6, 2023, 6:02 PM

#

Hey guys, have you experienced a significant drop in performance after trying to load a HF checkpoint converted to openai format? If I use the ggml C++ conversion tool I cannot load the fine-tuned checkpoint at all... have you had problems with fine-tuned whispers?

slim blaze Jun 7, 2023, 11:21 AM

#

autumn bolt do you mean, it's hearing the malformed English and then correctly reforming it?...

sorry i've been away too long @autumn bolt , it correctly reforms the mispronounced words. I want it to be not so smart, so that it spots the mispronounced words, or even the syllables, and i can give feedback to students on where they got it wrong / a little wrong

autumn bolt Jun 7, 2023, 11:56 AM

#

slim blaze sorry i've been away too long @autumn bolt , it correctly reforms the m...

I don't think you can make it less good, you could try adding noise to the audio and experiment with results, really you need an LLM bolted to the speech to text system, then you could ask it to tell you if there was any accent... I think that is some time away though. Would be an interesting R&D project.

slim blaze Jun 7, 2023, 12:01 PM

#

yep, not just accent you know. For example with Indian / Spanish accents, it's still comprehensible, but in other parts of the world people completely mispronounce words, like randomly omitting syllables, and because the training data is so good, whisper can still convert it to the (surprisingly) correct word. This is good in most cases, but not good for teaching them English

blazing cave Jun 7, 2023, 12:03 PM

#

mossy hemlock Has anyone here tried tackling the 25MB limit in NodeJS for Whisper? If so - wha...

https://cloudconvert.com/ have a decent api. Can get mp3 to 10% in many cases. Use qscale=8, ch=mono, bitrate=16

CloudConvert

File converter service - more than 200 different audio, video, document, ebook, archive, image, spreadsheet and presentation formats supported.

paper schooner Jun 8, 2023, 11:01 AM

#

does someone know where to download different 1-second wave audio files to train a model?

heady iron Jun 8, 2023, 2:22 PM

#

paper schooner someone knows where to download deferent wave audio files of 1 second to train a...

i suggest you use audio from movies

#

because you have subtitles and the exact timestamps

vapid horizon Jun 8, 2023, 3:52 PM

#

mossy hemlock Has anyone here tried tackling the 25MB limit in NodeJS for Whisper? If so - wha...

You got any solution on this? @mossy hemlock

#

How can I get the word level transcriptions from whisper in node? I think there are options available in python but not able to find anything in Node JS!

Some of the output given by whisper is too long in a single timestamp.

leaden turtle Jun 9, 2023, 3:28 PM

#

How do i use Whisper?

granite aspen Jun 9, 2023, 3:43 PM

#

We are looking for an OpenAI fine-tuning expert.
You must have experience with this.
Don't worry about your budget.
Your good skills are needed.
If you are really an expert please contact me.

fringe karma Jun 9, 2023, 4:29 PM

#

Can I use my API key credits with whisper so the transcription occurs remotely instead of on my PC? I don't have a dedicated GPU so it takes soooo long to get a transcription. I would like to see if I can speed it up at the cost of some credits

odd mirage Jun 9, 2023, 7:05 PM

#

What directory would I find the downloaded models (.pt files) in for Windows? I want to delete and reinstall the large library. This was the ubuntu solution: https://github.com/openai/whisper/discussions/762

GitHub

Remove Downloaded Model · openai whisper · Discussion #762

Hello, I'm finding Whisper amazing (Thanks OpenAi!). I have a doubt, if anyone can enlighten me. I downloaded the "model large" and my computer is not able to run it, when I run the c...

odd mirage Jun 9, 2023, 8:07 PM

#

Found the directory in Windows for the model library: C:\Users\{YourUserHere}\.cache\whisper
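
A small sketch for checking what is in that folder before deleting anything. It assumes the open-source whisper package's default download location (`.cache/whisper` under the home directory, or under `XDG_CACHE_HOME` if that variable is set). `whisper_cache_dir` and `list_models` are hypothetical helper names.

```python
import os


def whisper_cache_dir(home=None):
    """Default folder where downloaded .pt model files are expected."""
    home = home or os.path.expanduser("~")
    cache_root = os.getenv("XDG_CACHE_HOME", os.path.join(home, ".cache"))
    return os.path.join(cache_root, "whisper")


def list_models(cache_dir):
    """Return sorted (filename, size_in_bytes) pairs for each .pt file."""
    if not os.path.isdir(cache_dir):
        return []
    return sorted(
        (name, os.path.getsize(os.path.join(cache_dir, name)))
        for name in os.listdir(cache_dir)
        if name.endswith(".pt")
    )


if __name__ == "__main__":
    for name, size in list_models(whisper_cache_dir()):
        print(f"{name}: {size / 1e6:.0f} MB")
```

Deleting a listed `.pt` file should make whisper download that model again the next time it is requested.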

fringe karma Jun 10, 2023, 12:30 AM

#

Found the solution. Works very quickly now!

queen solar Jun 10, 2023, 2:02 AM

#

fringe karma Found the solution. Work very quickly now!

And what is your solution?

past solar Jun 11, 2023, 6:05 AM

#

fringe karma Can I use my API key credits with whisper so the transcription occurs remotely i...

I thought getting it generated through the API was the only option.... thats how i've been doing it. I got a node.js code example if you want..

static grove Jun 12, 2023, 6:06 AM

#

cinder pewter Can I use my OPENAI API key with many different services all at once?

you can use one API key in different services; your error is that your API key is not right.

desert mist Jun 12, 2023, 12:14 PM

#

hi, I try to use whisper api in order to make subtitles for a video. The problem is, that there are 2 languages used in the video and in the generated transcript, I get all the text in first language and all the text in second language gets translated to the first one. Is there any way to keep the languages as in the audio?

ember ridge Jun 13, 2023, 12:33 AM

#

can I use openai whisper as a live speech-to-text? As in, it gets what I say from the mic and it transcribes it?

vast tendon Jun 13, 2023, 10:13 AM

#

Hello, I'm a bit of a noob with whisper and I have a question: I need to transcribe 9 hours of video and divide it into segments of 30 seconds. Is it better to transcribe the 9 hours at once, or to transcribe, for example, 1100 files of 30 seconds?

#

or does a more efficient model exist?

#

I'm using an R7 5700U because I'm not at home and I'm using my laptop

small juniper Jun 13, 2023, 2:06 PM

#

Hello im a little noob with whisper i

tawdry glade Jun 13, 2023, 6:54 PM

#

can i have information on langchain pls go prv

fast chasm Jun 13, 2023, 7:23 PM

#

Any updates regarding this post on github?

#

https://github.com/openai/whisper/discussions/466

GitHub

Arabic and Dialects · openai whisper · Discussion #466

Hello and thanks for this incredible piece of awesome. I've tested it out on actual "traditional" Arabic and it seems to work great with Large. However, the AI seems to have no unders...

#

I've been debating buying a mid-tier PC for Whisper as I transcribe Arabic as freelance, but I transcribed some videos using Whisper, model base, and the results were a complete mess that I was better off doing it manually

grand plaza Jun 14, 2023, 3:25 AM

#

Hey I have been trying to get whisper to work on a raspberry pi, but when I try install it, it fails because it depends on torch? (using the api) and with python integration

cerulean flint Jun 14, 2023, 8:40 PM

#

Hey there, i am looking for a way to format the .txt exports after i run the files through whisper - has anyone found a nice way to do it? i don't necessarily need the timestamps, but a more readable layout would be nice. For now i use an online tool for auto line breaks - i would like to have it in my code - i run whisper via Google Colab

velvet helm Jun 14, 2023, 11:46 PM

#

cerulean flint Hey there, i am looking for a way to format the .txt exports after i run the fil...

Does Whisper support awk? I think Python has a ‘prettier’ function call.

cerulean flint Jun 15, 2023, 11:38 AM

#

velvet helm Does Whisper support awk? I think Python has a ‘prettier’ function call.

i guess not, but i've already a small Python set around it for auto-transcribe all my mp3s in a specific GDrive Folder.. will look into awk, thanks

velvet helm Jun 15, 2023, 1:50 PM

#

cerulean flint i guess not, but i've already a small Python set around it for auto-transcribe a...

Let me know if need help. I’m fluent in Linux, AWS and some DevOps.

hot river Jun 16, 2023, 4:01 AM

#

Anyone else get a totally random text back from whisper? Like it kinda sent the wrong response? :p

#

A whisper version of hallucinations?

#

It's very, very rare but it happens

jovial compass Jun 16, 2023, 7:01 AM

#

does anyone know how to use whisper in nodejs?

paper schooner Jun 16, 2023, 8:47 AM

#

hello everyone, can someone please help me with my model? i'm working on an AI model that predicts whether keywords exist in audio files. i trained my model with 30000 audio clips of 1 second each; the metrics show that the training went great, but i'm still getting errors in the predictions, HELP PLEASE !

jovial compass Jun 16, 2023, 9:34 AM

#

Please someone help me with this below code error

const filePath = path.join(__dirname, '../../', 'temp.mp3');

    const formData = new FormData();
    formData.append('model', model);
    formData.append('file', filePath);
    axios
      .post('https://api.openai.com/v1/audio/translations', formData, {
        headers: {
          Authorization: `Bearer ${process.env.OPEN_AI_KEY}`,
        },
      })
      .then((response) => {
        console.log(response.data);
      })
      .catch((err) => {
        console.log(err.response);
      });

#

Error:

data: {
    error: {
      message: '1 validation error for Request\n' +
        'body -> file\n' +
        "  Expected UploadFile, received: <class 'str'> (type=value_error)",
      type: 'invalid_request_error',
      param: null,
      code: null
    }
  }

#

Can you please resolve me this error

paper schooner Jun 16, 2023, 11:53 AM

#

jovial compass Can you please resolve me this error

const fs = require('fs');
const path = require('path');
const FormData = require('form-data');
const axios = require('axios');

const filePath = path.join(__dirname, '../../', 'temp.mp3');

const formData = new FormData();
formData.append('model', model);

// Read the file as a stream so the API receives the file contents, not the path string
const fileStream = fs.createReadStream(filePath);
formData.append('file', fileStream);

axios
  .post('https://api.openai.com/v1/audio/translations', formData, {
    headers: {
      Authorization: `Bearer ${process.env.OPEN_AI_KEY}`,
      ...formData.getHeaders(), // Include the multipart headers (with boundary) for FormData
    },
  })
  .then((response) => {
    console.log(response.data);
  })
  .catch((err) => {
    console.log(err.response);
  });

paper schooner Jun 16, 2023, 11:54 AM

#

jovial compass Can you please resolve me this error

Replace 'model' with the appropriate value for your translation model, and ensure that the temp.mp3 file exists in the correct location

jovial compass Jun 16, 2023, 11:58 AM

#

Thanks @paper schooner

mossy hemlock Jun 16, 2023, 12:39 PM

#

I know many people here splice and feed input to Whisper for longer videos / audios - but how do you deal with the subtitles being separate for each chunk? How do you combine them properly so they flow well with the final video you transcribed for?
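
One way this is commonly handled, sketched under the assumption that each chunk's transcript comes back as `(start, end, text)` segments timed relative to the chunk's own start: shift every segment by the chunk's position in the full recording, then renumber once at the end. `merge_chunks`, `to_srt` and `format_timestamp` are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def merge_chunks(chunks):
    """Shift per-chunk segments onto the timeline of the full recording.

    chunks: list of (chunk_start_seconds, [(start, end, text), ...]).
    """
    merged = []
    for chunk_start, segments in sorted(chunks, key=lambda c: c[0]):
        for start, end, text in segments:
            merged.append((chunk_start + start, chunk_start + end, text.strip()))
    return merged


def to_srt(segments):
    """Render merged segments as one SRT document with continuous numbering."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Cutting at silences rather than at fixed intervals keeps sentences from being split across two subtitle blocks.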

plucky palm Jun 16, 2023, 2:42 PM

#

whats the best whisper model to quickly and accurately transcribe long-form audio

small juniper Jun 16, 2023, 4:38 PM

#

I know many people here splice and feed

#

Anyone else get a totally random text

#

whats the best whisper model to quickly

simple latch Jun 20, 2023, 7:36 AM

#

Does anything like whisper exist that can transcribe an audio file of an unknown language directly into phonetic notation, something like X-sampa or IPA? Or maybe something simpler that only recognizes which language is spoken in a given audio file?

dull sable Jun 21, 2023, 12:40 AM

#

I have a use case which involves creating transcriptions of long audio files (about ten minutes each) which include unpredictable, brief events of loud, non-speech audio, usually no more than 30 to 90 seconds in length. Whisper seems to stop transcribing at the first of these (usually with hallucinations), so most of my results are only the first half of the audio.

Is there any way around this? I would like to continue using the hosted API as opposed to running the open-source Whisper, though I understand it can be made more tolerant of this situation. Preprocessing is quite difficult as the audio levels are basically constant, but I'm open to ideas. The best one I have at the moment is to split the files based on brief silences which imply sentence breaks, but sometimes speech briefly overlaps with the non-speech sound, so I lose content.

signal hearth Jun 21, 2023, 3:23 PM

#

Unleash your app idea with our Flutter development services for just $29 and make it a reality! DM us now!

unreal otter Jun 21, 2023, 8:14 PM

#

dull sable I have a use case which involves creating transcriptions of long audio files (ab...

I'm playing around with whisper and currently using Staplerfahrer Klaus from YouTube. I'm running whisper locally, but have run it at least once using the API. The transcription from the API seems to be good. Speech-to-text is in beta and the API version is limited in what options can be tweaked compared to running it locally. You don't happen to have a sample of a file that fails?

pallid pond Jun 21, 2023, 9:00 PM

#

Hi, is it possible to change sliding window interval from 30 seconds to something smaller?

dull sable Jun 22, 2023, 1:19 AM

#

unreal otter I'm playing around with whiper and currently using Staplerfahrer Klaus from YouT...

Not exactly, though it seems I can get the same result from the latest DankPods YouTube video. Transcription ends around 5:30.

olive sail Jun 22, 2023, 2:08 PM

#

birthday wish for a work colleague

pine palm Jun 22, 2023, 4:09 PM

#

anyone faced timeouts when reaching out to whisper API ?
locally it works great, but when my server is deployed on remote environment (e.g AWS) I get consistent timeouts ..

amber sparrow Jun 22, 2023, 5:23 PM

#

Hi, is it possible to integrate whisper to expo react native application to get real time transcript?

simple latch Jun 22, 2023, 10:39 PM

#

simple latch Does anything like whisper exist that can transcribe an audio file of an unknown...

@small juniper Hi, you seem to be quite knowledgeable about Whisper, and I saw your emoji response to my question. If you have any insights/thoughts, I'd be happy to hear them!

cerulean flint Jun 23, 2023, 5:41 AM

#

dull sable I have a use case which involves creating transcriptions of long audio files (ab...

maybe try the adobe Enhance Speech to refine your audio?

cerulean flint Jun 23, 2023, 5:42 AM

#

velvet helm Let me know if need help. I’m fluent in Linux, AWS and some DevOps.

btw i've tried a bit and have now a sufficient Python code to give me at least a basic format.. enough to be able to read the .txt fast and transfer any information into my Obsidian Vault
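
For anyone after the same thing, a minimal sketch of one possible approach (not the code mentioned above): collapse the raw transcript to one line, split it on sentence-ending punctuation, and wrap a few sentences per paragraph. `format_transcript` is a hypothetical name, and the sentence split is a simple heuristic that will trip on abbreviations.

```python
import re
import textwrap


def format_transcript(text, width=80, sentences_per_paragraph=4):
    """Reflow a raw transcript into wrapped paragraphs of a few sentences each."""
    # Collapse all runs of whitespace, including stray line breaks.
    flat = " ".join(text.split())
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat) if flat else []
    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        paragraph = " ".join(sentences[i:i + sentences_per_paragraph])
        paragraphs.append(textwrap.fill(paragraph, width=width))
    return "\n\n".join(paragraphs)
```

The result can be written straight to a `.txt` or Markdown note.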