#gpt-realtime

lilac karma
#

I have no idea for that

autumn bolt
#

well seems like its working now

#

but when i type ''whisper'' should be appearing something

#

showing that its installed..

#

huh..

#

actually the pc is still thinking, quite slow god

#

have you tried running it in Colab?

#

you can run the large model and not have to worry about your GPU crying

autumn bolt
#

i am quite average pc user

autumn bolt
#

explains things so well

#

since its here, its installed?

#

try running it

#

now you just need a path for the transcription

#

transcription folders

#

its simply not running through cmd, i will trying to figure it out how to use this Colab now, one step at time

#

i am trying*

autumn bolt
#

here?

#

yep

#

ok

autumn bolt
#

hey
!whisper "name.mp3" --language pt --task transcribe --model medium
is this the code to transcribe an audio in portuguese?

#

first time doing this

#

idk if the format is right

trim rampart
#

Is there anyone from OpenAI here who can tell me why does Whisper transcribe Bengali as Hindi? Does Whisper not support Bengali?

autumn bolt
#

API discussions, We Are OpenAI, my guy.
developers piling as a community playing and improving the tech

autumn bolt
#

do you have access to chatgpt still?

#

or is it buggy for you too?

#

yes i have

#

subscription

#

let chatgpt read the whole command list

#

and ask it for what you need

trim rampart
#

@autumn bolt I was trying the Whisper API from the OpenAI playground. I want ASR with multilingual support.

autumn bolt
#

idk if this working or just slow

autumn bolt
#

this is*

autumn bolt
autumn bolt
#

you need python notes

autumn bolt
# autumn bolt

why not run the large one?
you've got access to a cloud gpu that has 16Gb of RAM
and 50Gb of VRAM

autumn bolt
#

smaller score is better @trim rampart see the chart.. but around 99 languages supported in the ASR too..

#

you can get more info in the github repo

autumn bolt
#

have you not compared the quality?

#

that is the official chart from OpenAI

trim rampart
#

@autumn bolt Bengali/Bangla is not in the list. It is one of the most spoken languages.

autumn bolt
#

_>

#

whisper transcribes audios from video files? or has to be audio files?

#

go try it and compare the transcription quality

XD

autumn bolt
autumn bolt
#

doesnt do anything in 7 minutes, geez

autumn bolt
autumn bolt
#

Normally, if you'd like to run ASR at normal speed, you need at least 20 GB of RAM free. x1 == large model

autumn bolt
#

this is needed

#

@autumn bolt still running?

autumn bolt
autumn bolt
#

i am using this to transcript: !whisper "exemplo.mp3" --language pt --task transcribe --model medium

but how can i transcript just a specific part of the audio? like from 3:00 to 10:00
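
One way to transcribe only part of a file is to cut that part out with ffmpeg first and then run Whisper on the clip. This is a minimal sketch, assuming ffmpeg and the `whisper` command are both on the PATH. The helper names (`to_seconds`, `build_trim_command`, `transcribe_clip`) and the output name `clip.mp3` are made up for illustration.

```python
import subprocess


def to_seconds(timestamp: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def build_trim_command(src: str, dst: str, start: str, end: str) -> list[str]:
    """Build an ffmpeg command that copies the audio between start and end."""
    return [
        "ffmpeg", "-i", src,
        "-ss", str(to_seconds(start)),  # where the clip begins, in seconds
        "-to", str(to_seconds(end)),    # where the clip ends, in seconds
        "-c", "copy",                   # copy the stream instead of re-encoding
        dst,
    ]


def transcribe_clip(src: str, start: str, end: str, clip: str = "clip.mp3") -> None:
    """Cut the clip, then run the same whisper command on it (hypothetical helper)."""
    subprocess.run(build_trim_command(src, clip, start, end), check=True)
    subprocess.run(
        ["whisper", clip, "--language", "pt", "--task", "transcribe", "--model", "medium"],
        check=True,
    )
```

Calling `transcribe_clip("exemplo.mp3", "3:00", "10:00")` would transcribe only those seven minutes. The timestamps in the output then start at zero, so they are offset by 180 seconds relative to the original file.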

reef python
#

Hi, I keep getting 400 errors when trying to use whisper API for translation with language ja - what am I doing wrong?

gray wigeon
#

Hi all 👋
I'm wondering if anyone had experienced an issue where whisper seems to try to translate instead of transcribing?

#

I just said in English "how is that possible"
and it transcribed it as "Kako je to možno?"
which according to google translate is Croatian

#

I can reproduce it with the audio

#

if I trim the silence at the start it transcribes properly

autumn bolt
autumn bolt
#

But Im not from the OpenAI team...

mortal plover
pure veldt
versed scroll
#

So is it possible to have like an audio in Russian and have the transcript in like Spanish instead of English?

#

Or even have like an audio in Spanish and have the transcript/result as Spanish too, for like deaf people

tough tangle
#

hi ¯_(ツ)_/¯

#

(╯°□°)╯︵ ┻━┻

autumn bolt
peak saffron
#

Has anyone managed to get microphone recording in Safari or Firefox to work through the Whisper API?

#

I have it working well on chrome but never the other two

split palm
#

Does Whisper work both ways? TTS and STT

barren compass
#

no

sonic mango
#

Hello Everyone, does anybody know if whisper is working on diarization ?

pure veldt
#

That can transcribe and do diarization.

rough badge
#

hey guys

#

I want to work with open Ai to stress test it

#

our team has found many bugs and wants to help

pure veldt
#

Bug-reports channel

autumn bolt
#

has anyone tried applying for a transcription gig with whisper here?

#

I'm trying to install Whisper with pip3 install git+https://github.com/openai/whisper.git on MacOS and I'm getting "error: metadata-generation-failed" anyone know how I can fix this? Here's the full output when trying to install https://pastebin.com/qGGvgKaE

bronze delta
#

Hi everyone! I am trying to use Whisper via API to transcribe an audio recorded in the browser.
The way to the server (which is a NodeJS express backend) works. I have used both OpenAI's npm lib as well as talking directly to the OpenAI endpoint https://api.openai.com/v1/audio/transcriptions.

I always get a 400, bad request back and I can't work out why. I am sending the file as webm and via multipart/form-data

The function is in the screenshot

Any ideas? Many thanks in advance for a hint!
The file is in the Blob format /* let blob = new Blob([Buffer.from(audioFile, 'base64')]); */

lone orchid
#

Does Whisper have a recording length restriction? Im having issues where its not translating the entire file im sending it

#

And its under 25mb

faint cloud
#

I know that Whisper can produce either a transcription into input language, or translation into English. Is there a way to produce both in one go? That is, feed audio file to Whisper and get 2 transcripts as output, one in the original language, and one in English?

lone orchid
#

Not on the same call

#

You would have to do two separate calls, one to translation and one to transcription
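
The two-call approach could look roughly like this. It is a sketch, assuming `client` is an `openai.OpenAI()` instance from version 1.x of the `openai` Python package. The function name `transcribe_and_translate` is made up for illustration.

```python
def transcribe_and_translate(client, audio_path: str, model: str = "whisper-1") -> dict:
    """Return the original-language transcript and the English translation.

    `client` is assumed to be an `openai.OpenAI()` instance (openai>=1.0).
    """
    with open(audio_path, "rb") as audio:
        original = client.audio.transcriptions.create(model=model, file=audio)
    # Reopen the file so the second upload starts from the first byte.
    with open(audio_path, "rb") as audio:
        english = client.audio.translations.create(model=model, file=audio)
    return {"original": original.text, "english": english.text}
```

Each call uploads and processes the audio separately, so this takes two requests for the same file.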

hybrid surge
austere nimbus
#

Hi there, I'm thinking about implementing Whisper with MP4s or MP3s. Can anyone tell me how long it takes in seconds for a result? For let's say 30 words.

faint cloud
faint cloud
austere nimbus
peak saffron
#

2 sometimes

whole tundra
#

how can i get to the whisper website

wet herald
#

API requests getting timeout?

#

Anyone?

patent shale
#

Seeing timeout on transcription

#

translation is working as expected

wet herald
#

Thanks for the info 🙂 Getting timeout on transcription too

gritty galleon
wet herald
#

Audio to text converter / translator

wet herald
#

is there a .tflite or .mlmodel large-v2 model available?

slow urchin
#

What is the status

pure veldt
#

You could try out. Large-v2 works, I tested.

bold orchid
#

what do you eat today

knotty trail
#

Does anyone know where I can read docs on silero vad? Trying to fix Whisper hallucinations

compact wind
#

Whisper AI api is regularly returning Welsh text when I give it english voice input. Why is it also doing a translation? (If I hand the text to ChatGPT and ask it to identify the language and provide an English translation it is identifying it as welsh and giving the essence of what I said)

compact wind
knotty trail
#

But you can read in the Whisper paper that a lot of their 'Welsh' data was actually just English but was labelled as Welsh, I'm guessing that's the source of the issue

compact wind
# knotty trail Are you setting the language as English?

Possibly not. It might be set to auto. I'm using an ios shortcut I found on the internet - I've not found the API docs myself to cross reference. It's strange it's auto detecting english and then translating to welsh though. Most of the time it returns english.

The api request appears to be posting to /v1/audio/transcriptions, with a request body containing model: whisper-1 and the audio file. No other optional parameters.

knotty trail
#

Just a random guess it has something to do with this.

compact wind
#

Looks likely. Do you know what parameter I need to set to inform whisper that the input is English? It's becoming annoying to dictate minutes of off the cuff thinking, and have it come back in Welsh! (With no recording to fall back on) 😉

#

I looked online for the API docs, but the web site just refers me to discord - hence my question.

knotty trail
#

(Hope I'm not breaking TOS by posting that feel free to tell me if I am mods)

compact wind
#

ios has a built in WYSIWYG shortcuts system for building extensions. My shortcut is triggered from SIRI when I say 'convert with whisper', captures some audio, calls the whisper API, and then shows the text and copies it into the paste buffer. Very useful.

knotty trail
#

Nice. If you can figure out how to pass the language that will probably fix it. Though I have had a lot of people use auto-detect language with my service and I can't remember seeing it incorrectly identified as Welsh
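
Passing the language could look like the sketch below, assuming the `openai` Python package (version 1.x) and a `client` created with `openai.OpenAI()`. In a raw multipart request to /v1/audio/transcriptions, such as the one the shortcut sends, the equivalent should be one extra form field named `language`. The function name is made up for illustration.

```python
def transcribe_in_language(client, audio_path: str, language: str = "en") -> str:
    """Transcribe with an explicit language hint instead of auto-detection."""
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio,
            language=language,  # ISO-639-1 code such as "en", "pt" or "ja"
        )
    return result.text
```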

compact wind
#

Must be something about my voice 😉

#

Where can I find the API docs then I wonder.

knotty trail
#

working pretty well over 1 million minutes transcribed already

#

not sure about that. I think there are Telegram bots perhaps

rough badge
#

Hey i need to get into contact with a discord dev

#

who do I talk to?

#

@hushed pier

#

mod mail dont work like that huh

humble crater
#

Hello and good day. I report this error due to its persistence, I have deleted the cookies, I have reloaded the page among other things and this error continues, could you help me? Thank you

autumn bolt
humble crater
#

How can I use the API keys?

errant geyser
barren relic
#

Who has access to GPT plugins yet? I have a really cool project that I want to collaborate on, my company is called Whop (whop.com) shoot me a dm!

rough badge
#

the people who are using it

#

aren't good people btw

errant geyser
#

what does that have to do with openai @rough badge ?

rough badge
#

right

#

so I wont go into detail

#

but people have been able to glitch open Ai in ways thought to be undoable, think the worst possible outcome

#

and that worst possible outcome is where the glitching community started

light nacelle
#

Hi everyone, new here. Don't know if I am writing in the wrong section. But where can I search for developers with experience tinkering with Whisper and Gradient API?

I am interested in a joint venture, I have 2 companies eager to pay for gpt3. 5 or gpt4 implementation in their customers service.

inland sable
#

So I am talking to gpt4?

woven bluff
#

Does whisper support transcription of non words like coughing, laughing, and so on, for example with a prompt? Usually for coughing I get "thank you" which is polite but incorrect 😅

#

I can at least detect the non speech probability and most of the time it's good but still curious about it

lethal thorn
#

Hey all--I'm a node.js developer, and am wondering if Whisper supports word-level time stamping out of the box? Or do I need additional libraries?

knotty trail
lethal thorn
#

Right now, in the openai.createTranscription() method, I don't see a param for word-level timing or anything like that.

knotty trail
lethal thorn
#

Ah, I see. Sorry to be repetitive, but which is that? I see so many on npm and github! lol

knotty trail
lethal thorn
#

Thank you! It appears to only be Python-compatible at the moment, is that correct? I'm a JS-only guy, currently. But I have some Python devs who can implement this, if need be.

knotty trail
#

You can call it with spawn from node, like I do here: github[dot]com/mayeaux/generate-subtitles

lethal thorn
#

Oh I was not aware of this! Thanks so much for your help!

peak saffron
#

They actually added whisper support to the openai node package

#

not sure when, but I only noticed it last week

#

Oh weird, i don't see it in their code example - I swear I saw it. It still seems to work for me! I'm calling it like this:

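async function transcribe(audioPath) {
  // fs is Node's built-in file system module; openai is the configured client
  const response = await openai.createTranscription(
    fs.createReadStream(audioPath),
    "whisper-1"
  );
  return response.data;
}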
#

Where openai is an openai client with key added through configuration (they have it in their other api examples)

lethal thorn
#

Thanks!! I did see (and have used) this part of the API... but I don't see anything in there for adding the word-level time-stamping, unfortunately.

#

The transcription part works perfectly, though.

knotty trail
lethal thorn
#

Darn. That's frustrating...

knotty trail
#

@lethal thorn I might add the word level timestamps functionality to freesubtitles.ai , it also has API access

peak saffron
#

Are you sure? Allegedly it can accept a parameter for that

lethal thorn
#

well it definitely returns json, but the only thing in it is the transcription. Weird, tbh.

lethal thorn
knotty trail
peak saffron
#

sure thing - I think they weirdly have documentation in two places and only one mentions the node package and some of the features

#

I don't think whisper is top of OpenAi's priorities though

lethal thorn
#

Yeah, the open source one appears to only be in python? Which makes this more confusing lol

#

But agreed, it doesn't seem to be top of their list. All I want is word-level time stamping 😄 and to be able to do that with node lol

knotty trail
knotty trail
peak saffron
#

does using the response_format parameter not work on the api? srt or vtt should have timestamps I thought? (though i thought it was at ~utterance level, rather than word level)

#

(that utterance comment is coming from using the local model (python), not the api so I don't really know)

patent shale
#

the srt and vtt outputs do have timestamps, but they are not at the per word level. they are phrase timing.
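
To work with that phrase timing in code, the SRT text can be parsed into start/end/text entries. This is a small sketch using only the standard library. The function name `parse_srt` and the dictionary keys are illustrative choices, not anything Whisper defines.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm"; the millisecond part is optional.
_TIMING = re.compile(
    r"(\d+):(\d+):(\d+)(?:[,.](\d+))?\s*-->\s*(\d+):(\d+):(\d+)(?:[,.](\d+))?"
)


def _to_seconds(h, m, s, ms):
    fraction = int(ms.ljust(3, "0")[:3]) / 1000 if ms else 0.0
    return int(h) * 3600 + int(m) * 60 + int(s) + fraction


def parse_srt(srt_text: str) -> list[dict]:
    """Turn SRT text into a list of {'start', 'end', 'text'} phrase entries."""
    phrases = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                groups = match.groups()
                phrases.append({
                    "start": _to_seconds(*groups[:4]),
                    "end": _to_seconds(*groups[4:]),
                    "text": " ".join(lines[i + 1:]).strip(),
                })
                break
    return phrases
```

Each entry covers a whole phrase, so this gives phrase timing only and not per-word timing.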

lethal thorn
#

Interesting, what does Whisper consider a "phrase"?

#

Just a group of words without a significant pause?

lethal thorn
patent shale
#

Appears the chunks are in about 8 second blocks

lethal thorn
#

Very interesting, thank you! Hopefully the word-level stamping comes soon. So many applications for it!

knotty trail
lethal thorn
#

No problem! I work for an educational publishing company, and we currently pay a vendor a lot of money to manually tag the timestamps for each word in our eBooks so that they can highlight as they are read

#

Figured I could save them a lot of money (and time!) every year relatively easily with the time stamping

peak saffron
#

Oh wow that's like a textbook 'you should start a startup and sell to your current company' situation it sounds like

dense pulsar
#

Is it better to use offline whisper models or the API? Which would grant me more accurate results (and preferably speed)

#

Let’s say I want to transcribe an 8 hour audio file. Would it be faster and more accurate using the API or an offline large model (RTX 3080)

lethal thorn
fluid locust
autumn bolt
slow urchin
#

Do you remember the last conversation we had?

slow urchin
dense pulsar
#

Oh I see

knotty trail
storm oak
#

how much does the large v2 take in vram?

analog pulsar
boreal roost
#

=-0p97t5§1QW2P0-[

knotty trail
knotty trail
knotty trail
dense pulsar
#

@knotty trail Hallucinations like it thinking its saying something when theres silence for example?

How is the speed

knotty trail
#

1
00:00:00 --> 00:00:01
Jaj, premanja na Joj,

2
00:00:01 --> 00:00:03
sijanje na sijanja na joj.

3
00:00:03 --> 00:00:05
Use ove se.

4
00:00:05 --> 00:00:07
Edna, a drugi je.

5
00:00:07 --> 00:00:08
Use.

6
00:00:08 --> 00:00:09
Taj je.

7
00:00:09 --> 00:00:10
Pa, jaj.

8
00:00:10 --> 00:00:11
Jaj, jaj.

9
00:00:11 --> 00:00:12
Jaj, jaj, jaj, jaj.

10
00:00:12 --> 00:00:13
Jaj, jaj, jaj, jaj, jaj.

11
00:00:13 --> 00:00:14
Jaj, jaj, jaj, jaj, jaj.

But for example, this came out of thin air. This doesn't appear when I run it with my own instance, I don't know what settings they have but it's a bit strange.

analog pulsar
knotty trail
# analog pulsar Thanks! Then it's difficult for me... I hope this non-English transcription issu...

I think this will help a lot: https://github.com/openai/whisper/pull/1155

Not sure when it will get merged and implemented in the API though. You can also try cutting the file into smaller chunks that should reduce the chance of hallucination loops. How long into your content does it start to get stuck into a loop?

GitHub

Following the suggestions of @Jeronymous in #914 and #924, it solves the problem of endless loop.

knotty trail
analog pulsar
knotty trail
small juniper
#

Anyone fine-tuning Whisper? Would adding timestamps to the training data improve accuracy of the fine-tuned model?

high mountain
#

Hi I want a chrome/edge extension that allows you to speech-to-text on any text input inside the browser using Whisper. I want the same functionality as Voice In but using Whisper. Is there anything like this?

lethal thorn
#

Sorry for the hand-holding request lol the documentation seems to be all over the place

lethal thorn
fickle coyote
#

will it be possible that whisper would have data when a certain sentence is being said in the future?

knotty trail
knotty trail
proven fjord
#

Just wanted to drop by and say that Whisper has been incredible for creating Text To Speech datasets from voice recordings.

knotty trail
proven fjord
#

Well, what you need to train it is a bunch of ca 5 sec audios, which you can get from an audiobook or a podcast, etc. Then you have to provide transcripts in a file like:
filename.wav | This is the transcript of that file

#

one line per file

#

splitting sound into smaller chunks is a oneliner with pydub

#

then writing the file for the transcripts with whisper's API

#

is like 5 lines of python

#

and that's what you need to give something like Tacotron2 for training a voice
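
The transcript-file step described above can be sketched as follows. The `transcribe` argument is any function that maps an audio path to text, which is an assumption made here so the sketch does not depend on one particular Whisper client. The name `write_metadata` is made up, and the exact separator a given TTS trainer expects may differ from the `filename.wav | text` layout shown in the message.

```python
from pathlib import Path
from typing import Callable, Iterable


def write_metadata(
    wav_paths: Iterable[str],
    transcribe: Callable[[str], str],
    out_path: str,
) -> int:
    """Write one 'filename.wav | transcript' line per clip; return the line count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for wav in wav_paths:
            # Collapse newlines and repeated spaces so each clip stays on one line.
            text = " ".join(transcribe(wav).split())
            out.write(f"{Path(wav).name} | {text}\n")
            count += 1
    return count
```

With the open-source package, `transcribe` could be something like `lambda p: model.transcribe(p)["text"]`, where `model = whisper.load_model("base")`.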

#

I've done it with my own voice for testing

#

it is uncanny

autumn bolt
#

Do you speak Arabic?

peak saffron
proven fjord
#

A combination of things, also asking questions to GPT-4. It proved very useful when helping me adapt code for NVIDIA GPUs to M1 Macs

#

I found the guide by FakeYou of big help too

#

but that's for the model training

#

The Fake You discord is pretty awesome

#

This is the FakeYou guide

#

ah! Can't post links

#

ok go to Fake You's discord

#

and you will find it

peak saffron
#

Thanks!

lethal thorn
#

Has anyone done a comparison of Whisper's accuracy (both for transcription and for word time stamping) vs Google Speech To Text?

knotty trail
grand comet
#

Hi everyone, how do you make whisper return a file in .vtt format?
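
For reference, the local command line accepts `--output_format vtt`, and the hosted API accepts `response_format="vtt"`. If only segment data is at hand (a list of items with `start`, `end` and `text`, as in the local model's result), a VTT file can also be written by hand. This is a sketch with illustrative function names.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS.mmm timestamp used by WebVTT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments) -> str:
    """Build WebVTT text from items that have 'start', 'end' and 'text' keys."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)
```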

spark glade
#

Hi, do you know what is the size (MB) limit to import an audio file to convert to text?

copper elbow
#

Hey everyone, sorry if this is a stupid question but I've been searching for an answer and haven't found anything. I'm running whisper locally and it's working, but all of the subtitles are uncapitalised? Is there a setting I need to change to enable capital letters?

EDIT: The marked answer here seems to help: https://github.com/openai/whisper/discussions/194

GitHub

Hello. I am generating subtitles for my this video : https://www.youtube.com/watch?v=77iDUQd4x90 I have provided the video file directly to the wisher with language en and model large However as ca...

copper elbow
#

Whenever I use initial prompts, it's forcing all of the subtitles into 30 second chunks? Is there a way to stop this behaviour whilst still providing an initial prompt? Without an initial prompt it chunks them logically, but I then have my previous error of not having any capital letters

lost isle
#

How to separate & label multiple voices from one audio file?

knotty trail
west vapor
#

Hello everyone,
What’s the difference between these two approaches used by whisper to transcribe speech?

grand crag
#

ah

autumn bolt
#

k

quick bluff
#

ok

unique anchor
#

Hello everyone, how to use the Whisper API to send binary audio data.
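
One common pattern for audio that only exists in memory is to wrap the bytes in a file-like object and give it a filename. This is a sketch under the assumption that the client library reads the `name` attribute and that the server uses the extension to pick the audio format. Neither assumption is confirmed in this thread, and `as_upload` is a made-up helper name.

```python
import io


def as_upload(audio_bytes: bytes, filename: str = "audio.wav") -> io.BytesIO:
    """Wrap raw bytes in a file-like object that carries a filename."""
    buffer = io.BytesIO(audio_bytes)
    # Assumption: the upload code uses this name, including its extension.
    buffer.name = filename
    return buffer
```

The returned object can then be passed wherever an opened file would be passed, for example as the `file` argument of a transcription call.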

royal eagle
#

response_format

fallow needle
#

guys, is there a way to convert avg_logprob to a confidence that is 0-100%?
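
A rough heuristic is to exponentiate the value: `avg_logprob` is an average log-probability per token, so `exp(avg_logprob)` is the geometric mean of the token probabilities. The sketch below scales that to 0-100. It is not a calibrated confidence, and the function name is made up for illustration.

```python
import math


def logprob_to_percent(avg_logprob: float) -> float:
    """Map an average token log-probability to a rough 0-100 score."""
    # Clamp at 0.0 so a stray positive value cannot exceed 100%.
    probability = math.exp(min(avg_logprob, 0.0))
    return round(probability * 100, 1)
```

An `avg_logprob` of -1.0 maps to about 36.8 on this scale. Whether a given score counts as "good" depends on the audio and has to be judged against real examples.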

wet herald
#

Been having slow response times for some hours

west vapor
wet herald
#

I was making your message visible again as I am interested in the answer as well

west vapor
night talon
#

Does anyone know if the model from the official API is the same as the open-sourced large-v2?

#

I found the result from api is better than mine with self-hosted large-v2

sullen cairn
#

Why does the whisper API response for a Chinese video look like text before decoding?

fallow needle
#

Guys, is anyone working on how to obtain confidence???

#

@outer scarab @left stag

knotty trail
fallow needle
opal steppe
#

q

knotty trail
fallow needle
knotty trail
fallow needle
#

ty i ll try

low condor
#

Hey guys, is there any user interface for using Whisper to transcribe speech, for people who don't know how to write and execute code?

light shale
#

Hi for the last several days, my GPT Plus account has been downgraded despite paying this months subscription twice.

The first payment was my usual monthly fee.

The second payment was an attempt to make a new subscription from the same account due to urgently needing use.

I am now out of pocket, with zero support response.

I cannot even attempt to make a new account in case the money is once again taken without providing me with what I paid for .

I have seen that numerous people have experienced the same issue.

Is there some sort of official update to this?

dull cradle
#

Ai

sand rune
#

1

pure veldt
knotty trail
forest iron
#

hey guys where I can find supported languages in v1/audio/transcriptions api endpoint?

autumn bolt
cinder nebula
#

Hello, I have an issue running Whisper API in Python (Jupyter Notebook). I followed all the recommendations I found on GitHub related to the ffmpeg error. I uninstalled ffmpeg and installed ffmpeg-python, and now instead of saying that the module ffmpeg has no input the error says:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18348\3212043240.py in <module>
----> 1 import whisper

~\anaconda3\lib\site-packages\whisper\__init__.py in <module>
      9 from tqdm import tqdm
     10
---> 11 from .audio import load_audio, log_mel_spectrogram, pad_or_trim
     12 from .decoding import DecodingOptions, DecodingResult, decode, detect_language
     13 from .model import ModelDimensions, Whisper

~\anaconda3\lib\site-packages\whisper\audio.py in <module>
      3 from typing import Optional, Union
      4
----> 5 import ffmpeg
      6 import numpy as np
      7 import torch

ModuleNotFoundError: No module named 'ffmpeg'

#

Any advice?

#

The call before was pip install ffmpeg-python

#

response: Request satisfied, path: \anaconda3\Lib\site-packages\ffmpeg_python-0.2.0.dist-info
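
One thing worth checking in a case like this is whether the notebook kernel and the `pip` command are using the same Python installation. The sketch below, run inside the notebook, reports which interpreter is active and whether it can see a given module. The function name `module_status` is made up for illustration, and a mismatch is only one possible cause of the error above.

```python
import importlib.util
import sys


def module_status(name: str) -> dict:
    """Report which interpreter is running and whether it can import `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,          # the Python the kernel runs on
        "found": spec is not None,              # True if `import name` would resolve
        "location": getattr(spec, "origin", None),
    }
```

If `module_status("ffmpeg")["found"]` is false, installing with that exact interpreter (the path in `"interpreter"`, followed by `-m pip install ffmpeg-python`) targets the environment the kernel actually uses.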

#

Hey Martini, did you solve it? I am running into the same issue...

leaden lake
#

Hello Team so i have a query related to Whisper

I am trying to integrate Whisper in my nodejs project. To do so i am using this code snippet to make an API call using openai npm package.

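const fs = require('fs');
const { Configuration, OpenAIApi } = require('openai');

const configuration = new Configuration({
  apiKey: process.env.CHAT_GPT_ACCESS_KEY,
});
const openai = new OpenAIApi(configuration);

const transcription = await openai.createTranscription(
  fs.createReadStream(audioFilePath), // file
  'whisper-1',                        // model
  undefined,                          // prompt
  'json',                             // response format
  0.2,                                // temperature
  'en',                               // language
);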

when it comes to passing the audioFilePath i am passing a path where my stream is located, and reading it at the same time.

The code for the file/route includes:

  1. Reading file from user's request.
  2. Passing it to multer and creating a Buffer() from it.
  3. Passing that buffer file to fs.createWriteStream() and then streaming at a file location.
  4. Once the writing is done then reading the file content using fs.createReadStream().
  5. Finally passing that readStream to openai.createTranscription()

I have deployed my nodejs code on render.com. But whenever i try to hit the API route that has this code from my iPhone, then the request is failing with status-code: 400 (BAD REQUEST).

What i am not sure is what exactly am i passing wrongly in the openai API. Because the same API route returns data when i call it from my Desktop browser/ Android Browser.

knotty trail
#

From my own code

noble igloo
#

uhhhhhhhhhhhhh, im listening to my microphone, no sounds around, no system sounds, getting some data streams in return from whisper when sending anything, but those are not mine

#

oh, im getting it even unprompted

north flame
cinder nebula
#

That’s what I did first. Yes. Did you have a look at the GitHub convo? I did everything - install ffmpeg , uninstall it and installed ffmpeg-Python - none of these options work in my case

#

I think it’s because Python takes the ffmpeg from its cache (which I deleted as well, yet it still does it)

knotty trail
silk wren
#

Is there anybody who has experience with kore.ai?

modern obsidian
#

Hey guys, I’m doing a personal project. All the coding part is done for the moment but when I want to run the code and go and see how the website looks, it doesn’t work. The problem is about the API keys that openai gave me. I wanted to know if anyone knows how to determine if an API key is for chatgpt or DALL-E. If I solve that I will be able to put my 2 different API keys in my JavaScript code and run the code

peak saffron
#

Has anyone had any issues with Whisper assuming the wrong language? i have a user who is a native non-english speaker but IS speaking english. The model (through endpoint, Large with language specified as "en") keeps assuming they're speaking their native language. I'm guessing this might be an accent issue? Any ideas or experience?

north flame
north flame
autumn bolt
patent shale
autumn bolt
meager schooner
#

note: if openai's Audio module isn't recognized, run pip install --upgrade openai

#

apparently my package was older than Whisper

trail quiver
#

8okay87

autumn bolt
meager schooner
peak saffron
#

Has anyone else ever gotten someone else's transcription back from a request?

#

I submitted one that had the audio of "I'm sorry, could you repeat that" and got this (screenshot from my app)

pliant sandal
#

No, but I've got some wildly inconsistent translations if the recording has been low. But not as far off as that!

novel tartan
#

hello

kind kayak
#

Is there a way to get back how confident the model is about pieces of the transcription?

For example: I say "Haskell Nix Python"
Output: "Haskell Licks Python"

I'd imagine licks is less confident?
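
With the open-source `openai-whisper` package (not the hosted API), recent versions can return a per-word probability when `model.transcribe(path, word_timestamps=True)` is used: each segment then carries a `words` list. The sketch below picks out the shaky words from such a result. The function name and the 0.5 threshold are arbitrary choices for illustration.

```python
def low_confidence_words(result: dict, threshold: float = 0.5) -> list[tuple[str, float]]:
    """List (word, probability) pairs whose probability is below the threshold."""
    flagged = []
    for segment in result.get("segments", []):
        # Segments have no "words" key unless word timestamps were requested.
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append((word["word"].strip(), word["probability"]))
    return flagged
```

An application could use the flagged words to ask the user for confirmation instead of trusting the whole transcript.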

pearl star
kind kayak
pearl star
#

Is there a music volume higher than the sounds in your source?

kind kayak
#

No but I am a little sick and stuffy nosed today

#

Otherwise, no sound around me

pearl star
#

The whisper-1 model can detect it as a subtitle if the sound source has a musical sound that is louder than the voice of the artist. Sometimes it can return an empty result.

pearl star
#

Probably the microphone you are using is not good 🙂

kind kayak
kind kayak
#

We unfortunately cant guarantee our users will have good microphones

#

We wouldnt necessarily neeeed to though if we could get how confident each transcription of each word is

pearl star
#

Can you share an audio recording containing the same query for me via private message. I would like to test it with my own whisper-1 model.

#

So I can make a guess as to where the error originates from.

autumn bolt
#

Hallo

pearl star
# kind kayak Yeah but I'll need to handle that case for my application

07.04.2023 - 00:54:45 : I am listening...
07.04.2023 - 00:54:47 : Speech-to-text translation is in progress...
07.04.2023 - 00:54:49 : Question: Haskell Nix Python
07.04.2023 - 00:54:55 : Haskell, Nix and Python are all programming languages that have different features and uses.

Haskell is a functional programming language that is known for its strong type system and its ability to handle complex mathematical computations. It is often used for building robust and reliable software systems, as well as for data analysis and machine learning applications.

Nix is a package manager and build system that allows developers to create reproducible and portable software environments. It is often used in large-scale deployments, where consistent and controlled environments are crucial.

Python is a high-level programming language that is easy to learn and use. It is widely used for web development, data science, and machine learning applications, as well as for scripting and automation tasks.

While Haskell and Nix can be considered more specialized languages, Python is popular for its versatility and ease of use. Each language has its own strengths and can be used in a variety of contexts, depending on the needs of the project.
07.04.2023 - 00:54:55 : Text-to-speech translation is in progress...
07.04.2023 - 00:56:20 : Completed!
07.04.2023 - 00:56:20 : ---

#

No problem appears.

#

Question: Haskell Nix Python -> whisper-1 model transcription

kind kayak
#

Im saying the speech to text part is not working perfectly

#

We have different voices

#

The error originates from my stuffy nose lmao

pearl star
#

Haskell Licks Python

#

I came to the same conclusion with the audio recording you sent.

kind kayak
#

Which conclusion exactly?

pearl star
#

Waiting for a short time between words can solve the problem. Don't talk in a row.

pearl star
kind kayak
#

Serious question, are you a bot? Cuz theres definitely a misunderstanding here

#

never know these days 😄

#

Cuz yes I already knew these things, that the audio track is unclear

pearl star
#

You can't get feedback, you have to come up with your own solution 🙂

#

When I spoke one by one in similar misunderstanding situations, I saw that there was no problem.

kind kayak
#

Are you a bot though? Genuinely wondering haha, it would make sense for an OpenAI discord

pearl star
#

No I'm not a bot 😄

#

No I'm not a robot 😄

#

bots can't make jokes 🙂

kind kayak
pearl star
#

A closer result. It looks like there is a problem with your voice 🙂

late hemlock
#

Is it possible to get the same level of timestamped text using whisper-api, similar to if you run it locally? If its possible can someone tell me how, its driving me insane!?

pearl star
#

For example, if you are using the local model with the English language and there is a German word in the sentence, the German word is evaluated as English.

kind kayak
pearl star
uncut oasis
#

发发

teal orchid
#

Is anyone facing an issue with the API? Many times the API is taking more than 30 seconds to respond or it times out...

robust dirge
#

yes, for me too

#

@autumn bolt: Is there any issue with the API right now?

patent shale
pliant sandal
late hemlock
pliant sandal
#

I'm afraid I'm misunderstanding what you're talking about. I mean - to answer your question I would simply say "call the API", but that seems wrong 😛

dull cradle
#

||ohio||

candid junco
#

Hey y’all, I’ve been building with Whisper and using Lambda Labs GPUs for the computation power. Has anyone tried using CPUs instead to transcribe? How’s the accuracy?

sullen cairn
#

Hi, did anyone encounter an encode/decode problem when transcribing a Mandarin audio file?
text in response json look like this: \u4e0d\u559d? \u54c8\u56c9\u5927\u5bb6\u597d

pliant sandal
#

Looks about right 🙂 That's just unicode characters
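
Those `\uXXXX` sequences are standard JSON escapes, so parsing the body with a JSON parser turns them back into the actual characters. This is a short sketch using only the standard library. The function names are illustrative, and the `text` field is the one mentioned in the message above.

```python
import json


def decode_response_text(raw_body: str) -> str:
    """Parse a JSON response body and return its 'text' field as real characters."""
    return json.loads(raw_body)["text"]


def dump_readable(payload: dict) -> str:
    """Serialize to JSON while keeping non-ASCII characters unescaped."""
    return json.dumps(payload, ensure_ascii=False)
```

The escapes only show up when the raw body is printed or saved without parsing it first.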

frail dew
#

Ummm, can someone sort me out, since I can't get Whisper to read my file

#

(Using NodeJS)

#

Should I supply a file path or a buffered file?

sick vault
#

The biggest scam, bots are asking to human if you are a bot

full gale
frank basin
tidal canopy
#

any good ideas to improve whisper acc for lyric recognition?

#

currently running lyric extraction via demucs

#

could finetuning help?

mild basin
#

Does anyone have any tips for reducing hallucinations when using the translate API? Happens fairly frequently, inserting things like "Thanks for watching" and "Please subscribe" to the end of the text, and repeating phrases over and over

drifting lark
#

@onyx pike why did you send me a friend request

frozen spoke
#

Can whisper handle multiple languages in one API request?

late hemlock
#

I dont think so

surreal flame
#

Does anyone have any good diarization projects, they’ve been able to use successfully together with whisper (not the api)

knotty trail
knotty trail
abstract bolt
#

well done friend

mild basin
autumn bolt
#

@abstract bolt is a bot ban em

idle mica
#

hi, after receiving such a warm response to my last tutorial on using the API, I want to share my brand new video

scarlet dune
#

What languages does whisper support? Is there a list?

static sable
#

Usa colored supplement logo

dense pulsar
#

Anyone got an ffmpeg command to properly split a >25MB mp3 file into multiple segments (without cutting out dialogue).

Trying to transcribe large mp3 files with the whisper API in c++ but obviously they have a 25MB limit, and recommend to split it. I can’t find a sure fire way to do this properly.

My commands still cut out the audio during dialogue sometimes.

I also want to remove any decently large periods of silence from the audio preferably

Or if anyone knows a better way to do this let me know

mild basin
# dense pulsar Anyone got an ffmpeg command to properly split a >25MB mp3 file into multiple se...

if you first get the duration and size of the file with ffprobe, you can then calculate how long each segment needs to be to stay under 25MB
ffprobe.exe -v error -show_entries format=duration,size -of default=noprint_wrappers=1:nokey=1 file.mp3
will return the duration of the file in seconds and the size in bytes, then you can calculate how long each segment needs to be like this:
segment_duration = (desired_size * file_duration) / file_size
Then you can use ffmpeg to split it into segments of that duration, for example this will split this file into 40 second segments
ffmpeg.exe -i file.mp3 -f segment -segment_time 40 output_%03d.mp3

ffmpeg also can do silence detection and removal, look into the silencedetect filter
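
The calculation above could be sketched in Python like this. It is only a rough estimate that assumes a roughly constant bitrate; the function names and the 24 MiB default (chosen to leave some headroom under the limit) are assumptions for illustration.

```python
import math

def segment_seconds(file_duration_s: float, file_size_bytes: int,
                    desired_size_bytes: int = 24 * 1024 * 1024) -> int:
    """Longest whole-second segment expected to stay under desired_size_bytes."""
    if file_size_bytes <= 0 or file_duration_s <= 0:
        raise ValueError("duration and size must be positive")
    # Same formula as in the message: desired_size * file_duration / file_size
    seconds = desired_size_bytes * file_duration_s / file_size_bytes
    return max(1, math.floor(seconds))

def segment_count(file_duration_s: float, seconds_per_segment: int) -> int:
    """Number of segments needed to cover the whole file."""
    return math.ceil(file_duration_s / seconds_per_segment)

# Example: a 60-minute file of 100 MiB (values as ffprobe would report them)
secs = segment_seconds(3600.0, 100 * 1024 * 1024)
print(secs, segment_count(3600.0, secs))
```

The resulting number of seconds is what would go into `-segment_time` in the ffmpeg command.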

#

If you're still finding that it's cutting out the start/end of the audio, you can include a 1 second overlap between each audio snippet if you loop over the file and use the -ss and -t flags to manually adjust the start time and duration of each snippet. I use this approach for transcriptions and it works well
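
A minimal sketch of that overlap loop, assuming fixed-length windows. The helper names are made up for illustration; only `-ss`, `-t` and `-i` are actual ffmpeg flags, and the command list is built here but not executed.

```python
def overlapping_windows(total_s: float, window_s: float, overlap_s: float = 1.0):
    """Return (start, duration) pairs covering [0, total_s] with overlapping neighbours."""
    if window_s <= overlap_s:
        raise ValueError("window must be longer than the overlap")
    windows = []
    start = 0.0
    step = window_s - overlap_s
    while start < total_s:
        duration = min(window_s, total_s - start)
        windows.append((start, duration))
        if start + duration >= total_s:
            break  # the last window reached the end of the file
        start += step
    return windows

def ffmpeg_args(src: str, start: float, duration: float, dst: str):
    # -ss sets the start time and -t the duration of the snippet.
    return ["ffmpeg", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}", "-i", src, dst]

for index, (start, duration) in enumerate(overlapping_windows(100.0, 40.0, 1.0)):
    print(ffmpeg_args("file.mp3", start, duration, f"output_{index:03d}.mp3"))
```

Each command list could be passed to `subprocess.run`. The overlapping second will be transcribed twice, so the duplicated words need trimming when the transcripts are stitched together.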

dense pulsar
#

@mild basin Thanks. Would you mind sending me your code? I'm curious about one part that i dont know how to do dynamically. Would be faster if i just read your code. Feel free to DM me.

If you dont want to send it thats ok too, i understand 🙂

mild basin
dense pulsar
mild basin
dense pulsar
#

Means a lot, thanks 🙂

lost isle
#

Does anyone have any good diarization

violet zephyr
#

Can I use Whisper on iPhone?

tidal canopy
violet zephyr
#

How ?

violet zephyr
# tidal canopy yes

How? My native language is Urdu, and while searching the internet I saw many people talking about Whisper's high transcription accuracy. I want to try it, but I couldn't find a way to try it like ChatGPT. I'm not a developer and am looking for a free way to use this service; if I could use it on my iPhone it would be much easier for me.

mild basin
#

Minimising hallucinations

tidal canopy
#

there are also services where you can upload audio and they run the model for you

violet zephyr
#

Can you give me a link?

tidal canopy
#

no because the moderation bot doesn't allow me to post links kappajail

violet zephyr
knotty trail
latent prism
#

I'm very nearly done building my personal VoiceGPT Android app. The voice recognition innate to Android seems worse than I'd like, and the available voices aren't great.

Can anybody point me in a direction for how to get better speech recognition and more voice options?

#

I gather that Whisper can do both, but (if so) I'm unable to find things like where I can sample the available voices.

pliant sandal
#

Whisper is only voice to text, but not the other way round.

rugged thorn
#

scaam

pseudo hemlock
#

hi, nice to meet you

primal halo
#

Hi everyone! I've got a weird error message in Google Colab. I'm trying to use Whisper to transcribe an audio file. I've created an API key, but Google Colab tells me the API key is incorrect even though it really is not. Has anyone seen this before?

EDIT : i was dumb 🙂

small juniper
#

We need a second channel for those of us not using the Whisper API, who are using the model locally.

autumn bolt
#

@empty hinge

patent shale
graceful karma
#

Hi guys! I have a question: in python I have a variable with some bytes (of an audio file) and I want to transcribe this file. But if I call the function openai.Audio.transcribe_raw (I call this and not .transcribe() because I don't want to store the bytes in a file) I get this error: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
But the bytes are of an mp3 file.

Anyone with this issue?

eager helm
#

Write a C++ program, using function, to calculate the factorial of an
integer entered by the user at the main program.

waxen shale
#

I'm trying to automate caption creation as a function, every tutorial/project I'm seeing is using whisper as a CLI. Can someone clarify if I can use it more as a function, passing in the name+path of a file automatically instead of manual user input?

graceful karma
# autumn bolt Code?

transcript = openai.Audio.transcribe_raw("whisper-1", file=audio_file, filename="a")

where audio_file is <_io.BufferedReader>

graceful karma
#

ops sorry, you meant in the filename. I'll try now!

graceful karma
slim blaze
#

Hi, i'm detecting a student's ability to speak correctly. But now whisper is so good it can even recognize the mispronounced words. Is there a way to make whisper a little less smart ?

small juniper
#

Agreed I ve seen some confusion about

autumn bolt
small orbit
#

Code?

forest haven
#

hey guys, new here.
can whisper api provide timestamps?

empty juniper
#

For the python API (e.g. transcript = openai.Audio.transcribe("whisper-1", audio_file) ) does anybody know where I might find actual API documentation for the Python objects? The API documentation on the OpenAI website seems to be mostly the REST API. But there's this small example of using Whisper with Python, but then it says give --form attributes to tweak the output and I just don't know how. I find it too hard to guess and tweak through code completion and the cookbook doesn't have any python/whisper stuff in it. Thanks in advance for any pointers!

lunar wadi
#

how can i export the result as a srt file? my current line looks like that: result = model.transcribe("full-full.mp3", language="de", fp16=False)
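
One way this could be done by hand, as a sketch: the dictionary returned by `model.transcribe(...)` contains a `"segments"` list whose entries have `start`, `end` and `text`, and those can be formatted as SRT blocks. The helper names and the sample segments below are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def segments_to_srt(segments) -> str:
    """Turn a list of {start, end, text} dicts into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Stand-in for result["segments"]; real values come from model.transcribe(...)
example = [
    {"start": 0.0, "end": 2.5, "text": " Hallo zusammen."},
    {"start": 2.5, "end": 3661.04, "text": " Bis bald."},
]
srt_text = segments_to_srt(example)
print(srt_text)
```

With a real result this would be `segments_to_srt(result["segments"])`, written out with `open("full-full.srt", "w", encoding="utf-8")`.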

autumn bolt
#

.

tall shell
#

anyone know what the issue here is?

gentle canyon
#

hi guys. i need help to move on my project. I need to connect Whisper API ; GPT-4 API and Google text to speech in flutterflow. Guys, any of you already did this kind of project?

pliant sandal
#

you have a space between "--" and "upgrade" in your pip install command. This results in pytube not being installed and subsequently the import fails.

misty pond
#

I am working on a project where I receive a URL from a webhook on my server whenever users share a voice note on my WhatsApp. I am using WATI as my WhatsApp API Provder

The file URL received is in the .opus format, which I need to convert to WAV and pass to the OpenAI Whisper API translation task.

I am trying to convert it to .wav using ffmpeg, and pass it to the OpenAI API for translation processing. However, I am getting an "invalid_request_error"

jovial compass
#

We are using the Whisper API in our React Native app, and we are encountering the following error:
ERROR Error asking AI: [RequiredError: Required parameter model was null or undefined when calling createTranslation.]

#

the code
const response = await openai.createTranslation({
file: uri,
model: 'whisper-1',
});

remote pier
#

i need help with the openai transcribe function in python

import openai

def transcribe(wave_buffer):
    transcript = openai.Audio.transcribe("whisper-1", wave_buffer)
    message = transcript.text
    
    if message is None or len(message.strip()) == 0:
        return None
    
    return message

i am passing a BufferedReader from the memory but im getting an error AttributeError: '_io.BytesIO' object has no attribute 'name', how do i fix this? currently it works if i save the audio as .wav file in the disk but that's very unintuitive since the recording can be large sometimes, how do i fix this?
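
A commonly reported workaround, sketched here under the assumption that the client only needs a `.name` attribute on the file object to build the upload: give the in-memory buffer a made-up filename with the right extension. The helper name and the placeholder bytes are hypothetical, and the API call itself is left commented out.

```python
import io

def named_buffer(data: bytes, filename: str = "audio.wav") -> io.BytesIO:
    """Wrap raw bytes in a file-like object that carries a .name attribute."""
    buffer = io.BytesIO(data)
    # The extension of this (made-up) name is what hints the format to the server.
    buffer.name = filename
    return buffer

# Placeholder bytes; in practice this would be the recorded WAV data.
wave_buffer = named_buffer(b"RIFF....WAVEfmt ", "recording.wav")
print(wave_buffer.name)

# Assumed usage with the call from the message above (not executed here):
# transcript = openai.Audio.transcribe("whisper-1", wave_buffer)
```

The same idea would apply to the `filename` argument of `transcribe_raw` mentioned earlier in the channel: a name without an extension such as "a" gives the server nothing to detect the format from.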

old tiger
#

I know one can use the Whisper API to upload an audio file and then receive text from it. But I want my speech to be transcribed to text live as I speak. Does anyone have sources/ideas on how one can build this using the Whisper API specifically?

remote pier
old tiger
remote pier
#

yep, seems like you cant use bufferedreader to transcribe audios, only via files, openai apis are really halfassed

fluid vale
#

Hi!, im trying to do some speech to text with whisper in Spanish language, but it misses some keywords, and doesn't understand well the topic. is there a way to add maybe a text dictionary or do some further training in Spanish?

small juniper
#

Try splitting audio on silence using PyDub and send small pieces to Whisper API?

remote pier
#

yeah but when is he gonna know when to split? like stop the recording at some point

#

its a live recording

small juniper
#

I know. Two asynchronous processes/threads: one for reading live audio and splitting on silence, and one for sending and receiving from Whisper API.
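
The two-thread layout described above could look roughly like this. It is a sketch only: the microphone capture and the silence splitting are replaced by a fixed list of fake chunks, and `transcribe_stub` is a placeholder for the real API call.

```python
import queue
import threading

def transcribe_stub(chunk: bytes) -> str:
    # Placeholder for the real Whisper API call; returns a fake transcript.
    return f"<{len(chunk)} bytes transcribed>"

def producer(chunks, work: queue.Queue) -> None:
    # In a real app this would read the microphone and cut on silence.
    for chunk in chunks:
        work.put(chunk)
    work.put(None)  # sentinel: no more audio

def consumer(work: queue.Queue, results: list) -> None:
    # Takes chunks off the queue in order and transcribes them one by one.
    while True:
        chunk = work.get()
        if chunk is None:
            break
        results.append(transcribe_stub(chunk))

work_queue: queue.Queue = queue.Queue()
transcripts: list = []
fake_chunks = [b"\x00" * 10, b"\x00" * 20, b"\x00" * 5]

threads = [
    threading.Thread(target=producer, args=(fake_chunks, work_queue)),
    threading.Thread(target=consumer, args=(work_queue, transcripts)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(transcripts)
```

The queue decouples recording from the network round trip, so a slow response delays the text but does not drop audio.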

neon lichen
#

sometimes I get bad transcriptions, like random characters in other languages:

transcript.text කපමාන්මාන්මාන්මාවක් කළුමන්තස්තුතියට අපි කිරීමට කිරීමට කිරීමට කිරීමෙන් කිරීම කිරීමට කිරීමට කිරීම කිරීම කිරීම කිරීම සහ කරයි.
transcript.text 今度は、私はこのような場所で、私は 私はこのような場所で、私は 私は 私は 私は 私は 私は 私は 私は 私は 私は 私は

Anyone have any idea why this happens?
final apex
oak sparrow
#

and I think it's AI's key words understanding issue.

lapis anvil
#

Hi guys, has anyone tried training the pyannote.audio model with their own data from scratch? The results I have gotten for speaker diarization using the pre-trained pyannote.audio model are not so accurate, therefore I thought of training the model from scratch. Anyone with ideas on how to go about this?

patent shale
raven gate
#

So if an upload is near or at 25 mb, does whisper still transcribe within 10 seconds?

pliant plover
#

i am trying to add whisper to my python script but it does not work
Import "whisper" could not be resolved

#

any ideas why?

#

I have ffmpeg installed
and python 3.10.10

lapis basalt
pliant plover
lapis basalt
#

`def synthesize_speech(text):
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)
    engine.save_to_file(text, "output.mp3")
    engine.runAndWait()

def transcribe_audio(audio_file):
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]`

#

OK, you have OpenAI installed in python already

pliant plover
#

yes and also the whisper as well

#

but python does not seem to recognize it

lapis basalt
#

the whisper package on PIP is not the same

pliant plover
lapis basalt
#

The code sample I provided only needs import openai to work

pliant plover
pliant plover
#

I am on Win10 btw

lapis basalt
#

have you tried creating an environment for your project in Python, then installing the requirements via pip to that env? maybe there is a conflict in your default setup.

python -m venv MyProjectEnvironment
./MyProjectEnvironment/Scripts/Activate.ps1

pliant plover
lapis basalt
#

Just to clarify, you realize that when you do pip install whisper it installs the Whisper Database package, not anything to do with the Whisper voice api?

#

Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.

pliant plover
#

how am I then supposed to install it to the machine, since I am trying to use EdgeGPT and whisper to create a voice assistant

#

are there any alternatives for voice recognition?

lapis basalt
#

I found a package called whisper-openai, but I haven't used it. You can interact with whisper using just the 'openai' package. I am not sure why that project has you installing the 'whisper' package, that's a DB.
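
A quick way to check which `whisper` the interpreter actually sees, as a sketch: the speech model from the openai/whisper repository exposes `load_model`, while an unrelated package of the same import name would not. The `diagnose` helper is hypothetical, and the demo call uses the standard-library `json` module so that it runs anywhere.

```python
import importlib
import importlib.util

def diagnose(module_name: str, expected_attr: str) -> str:
    """Report whether module_name imports and exposes expected_attr."""
    if importlib.util.find_spec(module_name) is None:
        return "not installed in this interpreter"
    module = importlib.import_module(module_name)
    if not hasattr(module, expected_attr):
        location = getattr(module, "__file__", "?")
        return f"imported from {location} but has no {expected_attr}"
    return "ok"

# For the speech model one would call: diagnose("whisper", "load_model")
print(diagnose("json", "loads"))
```

If this reports "not installed" inside the editor but works in a terminal, the editor is most likely using a different Python interpreter than the one pip installed into.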

#

I will look it up on GitHub and see what its doing, give me a few

pliant plover
#

tyt

lapis basalt
#

is it this one? acheong08/EdgeGPT

pliant plover
#

correct!

lapis basalt
#

odd, this one doesn't mention voice. I know AutoGPT has a voice option, but it looks like EdgeGPT does not.

pliant plover
#

weird, the thing is almost any vid I looked up uses the same pip install and can simply import it to their code and it finds the module

but in my case, while running python3 it does not find the module, could it be that my IDE is not compatible?

lapis basalt
#

Which IDE are you using? I use VS Code on Win10

pliant plover
#

same

lapis basalt
#

can you share one of those demos, maybe I can glean some insight from it. if it's on youtube, just provide the watch?v=YM3vT65q4tY part of it? (because links are not allowed)

pliant plover
#

oh...

#

nvm wait

lapis basalt
#

k

pliant plover
#

watch?v=HbY51mVKrcE

#

this is the one i am following

watch?v=aokn48vB0kc&t=248s

lapis basalt
#

Oh, ok. So he is using the project on openai/whisper GitHub as a python package. one moment

pliant plover
#

alright

lapis basalt
#

try running these commands and see if the import succeeds after

pip install git+https://github.com/openai/whisper.git
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

pliant plover
#

right away

lapis basalt
#

This package is a self-hosted version of the whisper API, which is what threw me off.

pliant plover
#

and it still wasn't able to import

#

this is very odd, since almost every video does the same as I do and it works out for them

lapis basalt
#

hmm. the video is a year old, maybe change the import to match the package

import openai-whisper

pliant plover
lapis basalt
#

for openai it says missing import, but for whisper it says undefined variable at line 3, character 15. what is that line in the code?

pliant plover
#

whisper still does not work on PYTHON 3.11 right?

lapis basalt
#

the github says "We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.10 and recent PyTorch versions. "

#

so, no it says python 3.10

pliant plover
#

hmm actually do not know what I am doing wrong at this point

#

3.10 is what I have

#

also ffmpeg is installed

lapis basalt
#

Ok, the example code shows import whisper.

`import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)`

pliant plover
#

should I add it to my code or this is just an example

lapis basalt
#

it was an example from the openai/whisper github page. You might model your code after it, but at the moment you are still stuck on the failed import.

pliant plover
#

yep,

what could possibly be the reason ? I mean I did everything as it was documented

lapis basalt
#

Im on Python 3.11.3, so of course it won't even try to install for me lol

pliant plover
#

so there is no solution I suppose

lapis basalt
#

You could always fall-back to the OpenAI API, but once your free credit is used/expired, you'd have to pay to use it.

pliant plover
#

alright. I will look into that

but thanks for helping

lapis basalt
#

You're welcome. The whisper api is priced at $0.006 / minute

raven gate
#

So if I were to try to transcribe audio that's near 25 mb, using next js within the 10 second timeout limit, would it be enough time to finish the api request?

#

If anyone knows

sly yew
#

I don't know what you are asking. I am new to this. I did, however, do a test with Whisper and Nuxt3 and I got it to convert an audio file to text very easily.

raven gate
sly yew
#

I don't know. I'm new here too. Why the 10 sec time limit?

raven gate
#

That's how next works with backend serverless functions on a hobby plan

sly yew
#

gotcha

#

what host? vercel?

raven gate
#

yes

late hemlock
#

Will OpenAI ever release whispers API with the ability to get timestamps, like you can on the local run versions ?

sly yew
#

and it's free

late hemlock
#

because im asking for Whisper. as im in the OpenAI discord? I dont care about youtubes ASR

#

I'm developing something that needs timestamps. Is anyone useful able to answer me?

lapis basalt
#

There isn't a way to get them directly from whisper. They have a version of the API available on GitHub you could try to modify, otherwise use your own code to insert the timestamps using datetime() or something.
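
For what it's worth, the hosted transcription endpoint does accept a `response_format` parameter, and the `verbose_json` format includes per-segment start and end times. A small sketch of reading them, assuming that response shape; the sample dictionary is hand-written rather than real output, and the actual call is only shown as a comment.

```python
def list_timestamps(response: dict):
    """Return (start, end, text) tuples from a verbose_json-style response."""
    return [
        (round(seg["start"], 2), round(seg["end"], 2), seg["text"].strip())
        for seg in response.get("segments", [])
    ]

# Hand-written sample in the shape of a verbose_json response (not real output).
sample = {
    "text": "Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " Hello there."},
        {"start": 1.2, "end": 2.8, "text": " How are you?"},
    ],
}
for start, end, text in list_timestamps(sample):
    print(f"[{start:6.2f} -> {end:6.2f}] {text}")

# Assumed call with the 0.x Python client used elsewhere in this channel:
# response = openai.Audio.transcribe("whisper-1", audio_file, response_format="verbose_json")
```

These are segment-level times, which is the same granularity the local `transcribe` result gives in its `"segments"` list.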

#

OpenAI\Whisper on GitHub

mighty sonnet
mighty sonnet
#

Not sure actual time stamps…it seems very GPT based which isn’t so smart with time…but organization..you can suggest with the prompt

#

I have GPT able to generate YouTube scripts…with time codes…so maybe not impossible

late hemlock
polar pivot
#

Any way i can change encoding of the output ?

#

Using api

mild basin
sly yew
late hemlock
late hemlock
mild basin
late hemlock
#

can i DM you ?

mild basin
#

sure

tender cave
#

@dapper fjord hey bro

woven folio
#

is the whisper api better than running whisper locally?

light raptor
#

im gay

tepid fulcrum
#

I am not

late hemlock
#

I have a 3090 and It still takes 15 - 20 mins for an hour video.
OPENAI - "We’ve now made the large-v2 model available through our API, which gives convenient on-demand access priced at $0.006 / minute."

that's $0.36 for an hour of video. and you get it back within 30 seconds.
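
The arithmetic, as a tiny sketch using the per-minute price quoted above. The function name is made up, and any rounding the billing system applies is ignored.

```python
PRICE_PER_MINUTE_USD = 0.006  # figure quoted in the message above

def api_cost_usd(duration_seconds: float) -> float:
    """Estimated cost, ignoring any rounding the billing system may apply."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 4)

print(api_cost_usd(3600))       # one hour of audio
print(api_cost_usd(90 * 60))    # a 90-minute recording
```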

#

Depending on what you technically need, it's better in so many ways

woven folio
# late hemlock 1000%

Thanks. But their large model is not better than the open source one though right?

late hemlock
#

What open source one ?

#

whisper ??

woven folio
late hemlock
#

its using the better large-v2 model

#

the best one they have

late hemlock
woven folio
late hemlock
#

Dude ffs version 2 is the more efficient better one 😄

#

I use the api for business, as well as the local version

#

Id take api over local any day

woven folio
polar pivot
#

I also have a 4090 that i want to put to use

#

I also have questions about how safe the data in the api is, risk of eavesdropping etc

late hemlock
woven folio
woven folio
#

For anyone interested, I finally managed to make the large-v2 model work on Windows 11. It works great and fully utilizes my 4090! A 22 min audio was transcribed in about 3 minutes (the API does it in like 30 seconds). But it's free (+ some electricity).
Quality is the same since they're the same model.

dull lotus
#

lol

teal orchid
#

Anyone facing issue with the Whisper API? Suddenly some requests have started failing with the error Invalid file format. Supported formats

smoky chasm
#

is chat gpt working now?

patent shale
remote oasis
#

What do you guys whisper for mostly? For development or personal use

toxic creek
#

Hi! I am really trying every possible way to make the whisper work in NodeJS but no luck.
Always get this error.
My file is a simple m4a but it does not matter, got the same error with mp3 as well.

#

Maybe someone has faced this in the past

polar pivot
#

Use gpt4 🙂

dark knoll
#

is whisper api down ? i am getting 502 gateway error and errno: -4077, code: 'ECONNRESET', syscall: 'write',
this error

spare kernel
#

Hello - anyone found a good windows cli client for whisper local? Ala whisper.cpp. Can’t use api due to a work requirement, and openais version is cpu or cuda only - my understanding is there is some ports that work on gpus generically?

radiant path
gilded solstice
#

are you facing issues with whisper api? i got the error these 2 days: The server had an error while processing your request. 500 {'error'

#

i face this issue frequently today

toxic creek
# toxic creek Hi! I am really trying every possible way to make the whisper work in NodeJS but...

I have managed to hack this in case if someone needs this:

  1. The file type is Express.Multer.File; I am using NestJS on the server side.
  2. The working variable works, but it reads from disk, and I wanted to solve this via uploads.
  3. You can convert the Multer file buffer into a stream like this: const hackedData = Readable.from(file.buffer)
  4. Very important: you also need to add a path, which can be anything BUT needs to have the correct extension. Otherwise OpenAI throws an error.
    Proper variable naming and TS are needed of course, but at least this is working for me
tidal cliff
#

hey wutsss goood

dense basin
#

@woven folio hey bro can you help me set this up. I don’t know anything. Just point me in the right direction and I’ll update you with the progress if you have the patience to help a noob with 0 for experience but a lot of drive and willingness

late hemlock
#

dont use m4a. use mp3

woven folio
sour ermine
spare kernel
woven folio
spare kernel
woven folio
#

I made this script for my workloads to run it automatically at night, just import your audio and you get txt outputs: https://github.com/sdevgill/whisper-auto
Install whisper directly from GitHub like it says in the README, then install CUDA 11.6 or 11.7 and the latest NVIDIA drivers, then follow these instructions: https://github.com/openai/whisper/discussions/47 to remove PyTorch and install it again cleanly for CUDA

A bit cumbersome but it's the current process on Windows with NVIDIA

spare kernel
# woven folio I made this script for my workloads to run it automatically at night, just impor...

Thank you! Yes I was looking through some other issues and getting stuck . The const-me version I’ve found does have the advantage of working on non -CUDA gpus but the maintainer just did it as a hobby project. I’m also looking at doing live transcription and displaying on a web page - obviously with some kind of buffer but live ish. The idea is we have some maintenance crews using radio comms and if something happens you can quickly look back the last few minutes and get some context etc

#

From playing around with it seems like you can get high quality transcription faster than real time on a rtx 2060 so it seems feasible, perhaps with a minute delay or so with a buffer

woven folio
# spare kernel Thank you! Yes I was looking through some other issues and getting stuck . The c...

Sounds like a cool project! I recommend playing around with the official model to get the hang of it, then start experimenting with the other projects and go from there. Then you can kind of experiment which model works best, you might even be able to pull it off with the large one, though most likely small/medium, they're not as good as the large, but if it's clear spoken English, they're good as well

spare kernel
#

Yeah - currently in that playing around stage but have had some brilliant results just pumping through old recorded comms so it’s definitely possible. Also not sure what the infrastructure will look like if I can scale it up to 10-20 live channels

woven folio
spare kernel
#

@woven folio Agree - unfortunately have a biz requirement it has to be sandboxed, for really no good reason a_skull

gilded solstice
heavy cove
#

How do I auto detect when someone is talking, and I should send it to whisper?
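
One simple approach is an energy threshold: treat frames whose RMS level is above some value as speech, and close a region after a few quiet frames. The sketch below assumes 16-bit PCM samples in a plain list; the threshold, frame length and hangover values are arbitrary assumptions that would need tuning, and a dedicated voice-activity-detection library would likely be more robust.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def speech_regions(samples, frame_len=160, threshold=500.0, hangover=2):
    """Return (start_index, end_index) sample ranges whose RMS exceeds threshold.

    hangover = number of quiet frames tolerated before a region is closed.
    """
    regions, start, quiet = [], None, 0
    for pos in range(0, len(samples), frame_len):
        loud = rms(samples[pos:pos + frame_len]) >= threshold
        if loud:
            if start is None:
                start = pos
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                # Close the region at the end of the last loud frame.
                regions.append((start, pos - (quiet - 1) * frame_len))
                start, quiet = None, 0
    if start is not None:
        regions.append((start, len(samples) - quiet * frame_len))
    return regions

# Synthetic signal: 3 quiet frames, 4 loud frames, 5 quiet frames.
signal = [0] * 480 + [2000, -2000] * 320 + [0] * 800
print(speech_regions(signal))
```

Each returned range could then be cut out of the recording and sent for transcription.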

ornate urchin
heavy cove
#

thanks

urban knot
#

I used gpt to write code for Pythonista on iOS phone to leverage whisper for transcribing audio. It works when mp3 is around 10 mb but I start getting nothing when file size gets closer to 19 mb. Nothing meaning errors and no transcription. It’s relatively simple use case. If anyone has idea or would look at my code, I can share it here. Thanks 🙏 ((is api support channel correct path?))

queen escarp
#

Really sorry if this is a stupid question, I want to use whisper-1 API to transcribe speech in Urdu, but it works only half the time

There is another language Hindi which verbally sounds exactly the same as Urdu, but the characters and script of the language are completely different. The API constantly detects the speech as Hindi and transcribes it into Hindi text

Is there any way to force or hint to the model which language the speech is in, if there are multiple languages that verbally sound the same?

gleaming briar
#

Hello, I currently use OpenAI Whisper in an application, but I randomly came across Whisper JAX. How does it work?

paper schooner
#

Hello, i'm working in a project about whisper can i get some help?

autumn bolt
#

I'm trying to run google colab for openAI's whisper, but I don't know how to make whisper access the file. Any idea on how to do it?
I uploaded the file to google colab, but I don't know what to do to make whisper access it, basically
the colab in question:

elder radish
#

Is there a decent way to format the output of whisper? It’s super accurate but the wall of text is hard to work with
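
A minimal post-processing sketch, assuming the transcript is one long string with normal punctuation: split it into sentences, group a few per paragraph, and wrap the lines. The function name and the defaults are made up for illustration.

```python
import re
import textwrap

def to_paragraphs(text: str, sentences_per_paragraph: int = 3, width: int = 80) -> str:
    """Split on sentence-ending punctuation, then group and wrap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        chunk = " ".join(sentences[i:i + sentences_per_paragraph])
        paragraphs.append(textwrap.fill(chunk, width=width))
    return "\n\n".join(paragraphs)

wall = "First point. Second point! Third? Fourth one here. Fifth."
print(to_paragraphs(wall, sentences_per_paragraph=2))
```

This is purely mechanical; breaking paragraphs at topic changes would need something smarter, such as the pauses between segment timestamps.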

fierce moon
#

gm

mighty sonnet
steep nacelle
#

/start

analog cave
#

Hey guys. I am currently using whisper right now, and even though the language I speak is in english and I only use transcribe method not translate, but in response it becomes another foreign language that I don't understand, my guess is that it sounds like Indonesian language or Malaysian I'm not sure. Is it because of my accent? is it maybe I should speak more fluently or more American or British accent so that the response will still be in english? Thank you guys.

whole oasis
#

is whisper available for js?
either using the api or as I package doesn't matter

south lynx
#

Hi all,

does whisper transcribe filler words in english?

elder radish
fallow needle
#

Hi guys. I have a long audio file, about 1 hour, and at one point (around 02:00) another person asks the speaker a question, so the voice is quieter. The model stops transcribing and hallucinates, adding "... ... .. . . . ... .. ..". Is there an option to avoid this problem and simply not transcribe what the model doesn't understand?

paper schooner
autumn bolt
fallow needle
pearl shell
#

Hi, I need to transcribe more than 25 megabytes of audio to text with Whisper multilingual, how can I do that? Thanks

split stone
pearl shell
#

I tried to upload 24 MB to Whisper (the maximum allowed is 25 MB), but it said it is too large?

torn drum
#

Hi everyone. How to have the time positions in the audio file for each word that is recognized. Are there solutions for that? Thanks

paper schooner
acoustic yarrow
#

Do I need to apply somewhere to get access to the Whisper API? I have an OpenAI account with a key, but I don't see any options, or am I just missing something? I see speech to text in the API section, but when reading that area it refers to Whisper with a link, and then I'm directed to the Whisper main page, where I don't see any info on how to access it.

fallow needle
paper schooner
fallow needle
turbid idol
#

why whisperapi so slow?

thorny needle
#

If you guys want to visualize the whisperapi like so check out mabbu.app

autumn bolt
#

Is there any way to have more shorter sequences? Because some of my sequences are like 20 words long and I am using them for subtitles and it doesn't look good. In the transcribe options there are some settings which I couldn't find any information about, I tried changing no_speech_threshold but it didn't work
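
As far as I know, `no_speech_threshold` affects which segments are treated as silence rather than how long segments are. One workaround is to split long segments after the fact. The sketch below assumes segments are dicts with `start`, `end` and `text`, and interpolates the timing linearly by word count, which is only an approximation; the function name and the word limit are made up.

```python
def split_segment(segment: dict, max_words: int = 7):
    """Split one segment into pieces of at most max_words words.

    Timing is interpolated linearly by word count, which is only an approximation.
    """
    words = segment["text"].split()
    if len(words) <= max_words:
        return [dict(segment, text=" ".join(words))]
    span = segment["end"] - segment["start"]
    pieces = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = segment["start"] + span * i / len(words)
        end = segment["start"] + span * (i + len(group)) / len(words)
        pieces.append({"start": round(start, 3), "end": round(end, 3), "text": " ".join(group)})
    return pieces

long_segment = {"start": 10.0, "end": 20.0,
                "text": "one two three four five six seven eight nine ten"}
for piece in split_segment(long_segment, max_words=4):
    print(piece)
```

Because the timing is guessed from word counts, subtitles may drift slightly inside a long segment; word-level timestamps would be needed for exact cuts.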

paper schooner
potent meadow
#

Hi guys. Today I decided to move from my machine with large model installed to a script for whisper API. Can I ask you an example on how to use it with remote videos urls?

woven bronze
#

Does anyone know what the companies in the Whisper paper are?

woven bronze
potent meadow
woven bronze
autumn bolt
#

Can I use whisper to get live Speech to text not a recording?

tender rivet
dull swan
acoustic yarrow
#

Hey hey! Has anyone incorporated whisper into a chatBot yet. I’m just about to dive in here and wondering if anyone has any pointers they’d like to share or any pitfalls?

coral lagoon
#

I was thinking about working on that but it would cost so much for how I would make it

untold crystal
#

helllooo guys

#

can you help me with this?

#

this is the problem

iron wharf
autumn bolt
#

,.,

untold crystal
#

help people

#

pls

leaden root
#

II

topaz cargo
#

JOHN

acoustic yarrow
paper vapor
#

Does anyone know where i can find this user

low rain
#

yes

#

in this server

urban oasis
#

Hello guys 🙂 glad to be here

#

any tips for using whisper for live transcription ?

iron wharf
fallen turret
#

@fallen turret
bsusbus

tender rivet
#

Hey, im having an issue with whisper, im trying to force it to run on a language in particular by running it like this: whisper --model large --language pt --task transcribe input_file_here
but it looks like, despite setting --language, the AI is still picking up some English words and translating, while I would like it to not do that and instead just use words of that particular language. Is that possible?

small juniper
#

Hi guys I have a long audio like 1 hour

bronze garden
#

it's me

untold crystal
#

anyone know how to solve this problem?

#

or whats the meaning?

#

help pls

hollow locust
#

Does anyone have the code to save an audio file to a txt file?

open portal
#

Willing to pay someone if they can use ChatGPT to develop an app

timber marsh
#

anyone help!
how can I train whisper ?

distant trail
#

also its not an error its a warning

#

you can go to the url mentioned in the error to learn more.

untold crystal
#

i went to the url

#

but i didnt understand

#

i study odontology

#

i dont know about programming

warm hemlock
#

Hey all! i'm having some trouble using the transcription api in a nextjs api route. I get an ERR_FR_MAX_BODY_LENGTH_EXCEEDED error even though the file i'm sending is a 12.5MB .mp3 file. Here's the code i'm using: https://codeshare.io/nzvj9E

I tried with both fetch and the SDK, but i get weird errors for both:
fetch: i get a 'You must provide a model parameter' error even though it's 100% being included in the request
SDK: ERR_FR_MAX_BODY_LENGTH_EXCEEDED even though the file is only 12.5MB.

any advice?

autumn fox
#

she wanna go viral

woven bluff
timber marsh
#

anyone help! Please
how can I train whisper ?

loud sleet
#

Hello guys. I have a question on whisper. I need to make transcription of the call between two people so transcription is presented in the form of dialog. As far as I know whisper and Open AI speech api do not provide those features.

I know that one solution is to divide the file into smaller subfiles for each speaker and then push each file to open AI API, however I'm wondering is there any better or simpler solution for that problem. May be there is already library made for that task.
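
An alternative to cutting the audio per speaker, sketched under the assumption that a separate diarization tool (pyannote.audio was mentioned earlier in the channel) has already produced speaker turns with start and end times: transcribe the whole call once, then label each transcript segment with the speaker whose turn overlaps it most. All names and sample values below are made up.

```python
def overlap(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(segments, turns):
    """Attach to each transcript segment the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labelled.append((best_speaker, seg["text"].strip()))
    return labelled

# Made-up inputs: segments in the shape Whisper returns, turns from some diarization tool.
segments = [
    {"start": 0.0, "end": 3.0, "text": " Hi, how can I help?"},
    {"start": 3.2, "end": 6.0, "text": " My order never arrived."},
]
turns = [
    {"start": 0.0, "end": 3.1, "speaker": "SPEAKER_00"},
    {"start": 3.1, "end": 6.5, "speaker": "SPEAKER_01"},
]
for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Segments where two people talk over each other will still be assigned to a single speaker, so the dialog is only as good as the diarization.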

low rain
#

psst

#

hey

#

im whispering geddit

#

ok im out

amber edge
#

is there a way I can divide the whisper transcription by user? Say I have 2 users talking. Can I convert the transcript and find out who said what?

iron wharf
autumn bolt
#

hello, how can I fix this? openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details. Isn't the API free?

lean grove
#

i want to send data to the whisper api in base64 format, since im using node js, i cant use the "File" type for the file input.

#

anyone know?

ebon blaze
#

Hello, does Whisper collect my transcript data?

languid oyster
#

what exactly is a whisper guys

tough imp
#

so... sometimes I've received weird languages, but this one was a bit odd...

autumn bolt
#

The whisper API is generating subtitle segments that are way too long (6-8 seconds each in some cases); how can I configure it to return shorter segments?
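
A rough sketch of one client-side workaround, assuming you can get word-level timestamps from somewhere (for example from a local Whisper run): regroup the words into segments capped at a maximum duration. The `(text, start, end)` input format and the `regroup_words` helper are hypothetical, made up for illustration, and not part of any Whisper API.

```python
def _close(group):
    """Turn a list of (text, start, end) words into one segment dict."""
    return {
        "start": group[0][1],
        "end": group[-1][2],
        "text": " ".join(word[0] for word in group),
    }


def regroup_words(words, max_seconds=3.0):
    """Group (text, start, end) word tuples into segments of at most max_seconds.

    Hypothetical helper: the input format is an assumption, not a Whisper type.
    """
    segments = []
    current = []
    for text, start, end in words:
        # Close the running segment if this word would push it past the cap.
        if current and end - current[0][1] > max_seconds:
            segments.append(_close(current))
            current = []
        current.append((text, start, end))
    if current:
        segments.append(_close(current))
    return segments
```

Each returned dict has `start`, `end` and `text`, so it can be written out as a subtitle cue. A single word longer than `max_seconds` still becomes its own segment.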

frail flax
#

messi or ronaldo

autumn bolt
untold crystal
#

--output_format {txt,vtt} is this option ok?

untold crystal
#

it gives me an error

#

pls help boys

iron oar
#

hey yall check out

#

SkmAI (Beta): AI powered Youtube video search tool (Revolutionizing search and content consumption) in the projects section

swift sparrow
#

How to use whisper and what it does

last void
#

Hi guys!
A big question. Do you know of an app that uses whisper as a speech note-taking app?
Someone asked me because they want to write a book just by dictating the story.

main current
#

I use it for tracking fitness etc

errant mason
#

Are there any projects with near real-time use of the whisper API? Haven't really worked with audio before but am basically looking to do real-time transcription and text analysis.

sharp parrot
#

ALL hail Kitty

pliant elbow
sharp parrot
still moon
#

Only works in linux right now; but, with the scripts, you can assign hotkeys to the scripts (for Start/Stop), or drag them to the desktop, OR, I have my new UI button overlay which is really quite nice, imo.

simple flare
#

how can i learn english?

autumn bolt
errant mason
still moon
#

@errant mason it was built up over a week from the initial quick hacks I did while on a screen sharing call with my friend (for whom I was writing it). ... Record audio, somehow kill recording process, transcribe, get it into clipboard

gray seal
pliant elbow
untold crystal
icy orbit
#

does anyone know of a way to get timestamps? is this possible?

whole garnet
#

Hi! Can someone help me? I am trying to use the whisper API with the openai node package, trying to send a local file, and I'm getting error 400. Does anyone know how I can send this?

paper schooner
#

hello guys, i want to use only whisper and python(libraries ) to transcribe Youtube videos. Any ideas?

vocal mica
#

Does anyone know how the translation task works? I want to output the translated + non translated transcription without making it transcribe twice, does anyone know how?

main current
#

Tails?

cold zealot
main current
#

So thanks for the charge openai team just take it next time alerts feel like threats

#

You have all my info I sent it there directly.

#

Since grade school

#

Thank you #AIbuddies

still moon
#

Man. Isn't there some clever way of automatically detecting and removing noise from voice audio?

main current
#

Sure but hiding is silly. If it's digital it's done tbh

#

That's my opinion

#

I'm just a random internet handle

vale sorrel
#

Is it possible to use Pytorch/Whisper with an AMD GPU?

clever ravine
#

Anyone here knows tinygrad discord?

lavish ermine
#

Is there no thread for developers using APIs from ChatGPT or GPT?

solemn root
solemn root
digital surge
#

response = openai.Image.create(
prompt="a white siamese cat",
n=1,
size="1024x1024"
)
image_url = response['data'][0]['url']

still moon
#

I don't get it. Some pages say to dl common voice training set in a specific language, but.. when I pick English the file still has tons of languages in it.

#

I'm just trying to get something that can help me isolate a patient's voice to prepare labeled training data for fine tuning whisper

#

Their whispy ventilator-breathing voice isn't recognized by anything we've ever tried

#

Maybe the datasets python api lets me be more specific.. still.. 60gb.

stuck cipher
#

Hey does anyone know the python equivalent of

--form file=@openai.mp3 \
--form model=whisper-1 \
--form response_format=text
#

specifically the text response format

stuck cipher
#

nvm it's literally response_format="text"

frail cloak
#

Where can I use whisper?

normal loom
#

hi~~

#

im whispering rn

vale badge
#

is it possible to run the whisper model locally?

long tangle
#

@proper spire need to talk

#

Now

long tangle
#

@proper spire you stole my assets, answer me for goodness sake
Before the DMCA is pushing you up the list

autumn bolt
analog goblet
#

Hello! I'm looking for a developer to integrate Open AI on an ERP-PHP webapp.
Does anyone know a freelancer or programmer with experience in this kind of integration?

sharp parrot
hot patrol
#

Hi! Does anyone know how to run the whisper on python with less VRAM than the requirement?

shrewd tiger
#

hey how it going

ember meadow
#

What is this

frozen spoke
#

Does whisper have IPA phonetics support?

limpid sedge
autumn bolt
#

m

slate hull
#

how do i fine tune whisper?

#

anyone got whisper working on react native?

slate hull
#

which platform?

hidden helm
#

Waiting for next update to the Whisper API. Specifically the whisper-2 model

pale marsh
#

Hello
Has anyone solved the problem of converting Whisper ASR to TorchScript or ONNX (and later TensorRT)?
I need to convert the model for Nvidia Triton (input: tensor, output: tensor), so ready-made options, such as those suggested by huggingface, are not suitable.
Whisper divides the audio recording into segments of 30 seconds. I tried to convert the model with a mel_segment tensor for 30 seconds of audio as input and a token tensor as output, which I can then decode into text.
Problems encountered:

  • convert to TorchScript via jit.trace: the model remembers the output of the tensor used during the conversion;
  • convert to ONNX: an incorrect graph is saved
autumn bolt
#

what are some great ideas to use whisper as a startup company in the medical and healthcare field

woven quail
#

transcription

onyx belfry
#

hey guys, I am trying to cut my whisper api sends into chunks that are below the limit, but it is making me frustrated. I am a hobbyist.

Can anyone point me to some python code that does this? I want to point to a file and have the code chunk it and send it so that each chunk is below the limit and then just concatenate the resulting strings. I know this is a simple problem, but I am just starting out working programmatically with audio.
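
A minimal sketch of the planning half of this, assuming a roughly constant bitrate and the 25 MB limit mentioned elsewhere in this channel. It only works out which time ranges to cut and how to join the resulting text. The actual cutting would be done with a tool of your choice, and `plan_chunks` / `join_transcripts` are hypothetical helper names.

```python
import math


def plan_chunks(duration_s, file_bytes, max_bytes=25 * 1024 * 1024, safety=0.9):
    """Return (start, end) ranges in seconds whose estimated size fits max_bytes.

    Assumes a roughly constant bitrate; `safety` leaves headroom for container
    overhead and bitrate variation.
    """
    bytes_per_second = file_bytes / duration_s
    chunk_seconds = (max_bytes * safety) / bytes_per_second
    count = math.ceil(duration_s / chunk_seconds)
    step = duration_s / count  # equal-length chunks, each <= chunk_seconds
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(count)]


def join_transcripts(parts):
    """Concatenate per-chunk transcripts, skipping empty ones."""
    return " ".join(part.strip() for part in parts if part.strip())
```

For a one hour, 60 MiB file this plans three 20-minute chunks. Cutting at fixed times can still land in the middle of a word, so cutting at a nearby silence instead is worth considering.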

remote oasis
#

Same

queen scarab
#

Real-time speech-to-text is kind of a whole different animal. I believe most still save as a file, but much smaller. I have not used whisper but have built another app that chunks the data into 0.5 second chunks, adjusts the dB levels and then puts back together, then chunks it by silence of 0.75 seconds or longer and processes each chunk from there

#

again, not Whisper, but I did use Python
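
The "chunk by silence of 0.75 seconds or longer" step described above could look roughly like this in plain Python. This is a sketch under assumptions, not the original app's code: audio is taken to be a list of float samples in the range -1.0 to 1.0, the `threshold` value is a guess you would need to tune, and `split_on_silence` is a hypothetical name.

```python
def split_on_silence(samples, sample_rate, min_silence_s=0.75, threshold=0.01):
    """Return (start, end) sample index ranges of non-silent audio.

    A chunk ends once the absolute amplitude has stayed below `threshold`
    for at least `min_silence_s` seconds. `end` is exclusive.
    """
    min_gap = int(min_silence_s * sample_rate)
    chunks = []
    start = None    # index where the current chunk began, None while silent
    quiet_run = 0   # number of consecutive quiet samples seen so far
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # Close the chunk at the first quiet sample of this run.
                chunks.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        chunks.append((start, len(samples) - quiet_run))
    return chunks
```

Each range can then be sliced out and sent for transcription on its own. Pauses shorter than `min_silence_s` stay inside a chunk, so short gaps between words don't split a sentence.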

final wolf
#

My script for whisper API doesn't work anymore. I didn't change anything about it, but haven't used it in over a month.

Did something change?
openai.error.InvalidRequestError: 1 validation error for Request body -> file Expected UploadFile, received: <class 'str'> (type=value_error)

I'm just confused because it used to work before

#

The code is quite simple:
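`import os

import openai

import config  # assumed to be a local module that holds the API key

openai.api_key = config.openai_APIKEY

# Process the parts in name order so the transcript stays in sequence
audio_files = sorted(os.listdir("parts"))

transcription_text = ""

for audio_file_name in audio_files:
    audio_file_path = os.path.join("parts", audio_file_name)
    # Open in binary mode and close the handle once the request is done
    with open(audio_file_path, "rb") as audio_file:
        transcription_object = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcription_object)

    part_transcription_text = transcription_object["text"]

    transcription_text += part_transcription_text + "\n"`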
final wolf
#

You could easily expand this script to have the API transcription done as well. I just decided to keep it separate so it would be easier for me to troubleshoot

#

This script is not perfect though, because it doesn't do anything to prevent the audiofile from being cut in the middle of a sentence or word even

snow rover
#

theres suddenly a 50% error rate for whisper API calls now

stark laurel
onyx belfry
snow rover
remote oasis
#

I am trying the whisper-1 API transcription model for the first time, for YouTube videos. It is fast, but most of the time is spent on downloading/converting the video.

For a 1 hour video it took 30 mins, which is too long. Am I missing something, or is this what everyone else is doing?

Isn’t there some faster way to do it?

small juniper
#

Let me know if anyone compares Whisper to Meta's new MMS model on English.

onyx belfry
#

@remote oasis I would probably use yt-dl and ffmpeg to strip the audio?

remote oasis
distant ibex
#

what next

still moon
#

@remote oasis not sure about yt-dl, but yt-dlp is fast.. choose an audio-only, not the video (unless you want the video data for some reason)

#

oh, you have the audio url, n/m. But do use the alternate one which simulates being a mobile device and stuff.

#

here's a little bash script I wrote yt-dl-mp3:

#
#!/bin/bash
ytdlbin=yt-dlp
quality=2
url=
help=

for a in "$@"; do
    if [[ "$a" =~ ^-[0-9]$ ]]; then
        quality="${a#-}"
        echo -e "\\033[1mQuality set to: $a (0 is highest)\\033[0m"
    elif [[ "$a" =~ // ]]; then
        url="$a"
    elif [[ "$a" = -h || "$a" = "--help" ]]; then
        help=1
    fi
done
if [[ "$help" = 1 || "$url" = "" ]]; then
    echo "Usage: yt-dl-mp3 [-#] [-h] url"
    echo "Where: -# is compression (-0 is lowest. Default -2)"
    exit
fi

printf "%s " "$ytdlbin" "$url" -x --audio-format mp3 --audio-quality "$quality"
echo "Where: -# is compression (-0 is lowest. Default -2)"
read -p 'Enter to proceed with default (unless you set it)...' -t 5
"$ytdlbin" "$url" -x --audio-format mp3 --audio-quality "$quality"
#

Does anyone know for fine-tuning whisper if we should use noisy audio? I'm recording training data of someone and recording some while the vacuum is on.

#

will it make the model more robust?

still moon
#

Also, I can't find info on if we can provide noise training data

upper dagger
#

go to github and look at the whisper forks. there are some that will do this

still moon
#

How much training data do we need to fine-tune? The person has a very unique voice. Their pronunciation of various syllables is affected by them being on a ventilator, and they can only speak in very short phrases (for the same reason).

#

i'm manually transcribing for the training data so .. it's a bit tedious (but worth it.. just would like to know how far to carry this) 🙂

#

because me loves her

timber canyon
#

Ahhhhh, frustratin

gritty palm
#

i feel youy

strong stag
#

Question: Is 'google cloud speech to text' better than 'Open ai Whisper?'
Context: Ive been creating a project which is highly dependent on real-time voice transcriptions, Ive had a chance to integrate google cloud api for their Speech-to-text service, its alright but fails to provide accurate real-time transcriptions for some reason. I am running tests from my internal microphone.

#

IMO Google has the ability to create the better of two services as they logically have way more data than open ai, but open ai seems to moving way quicker when it comes to development, implementation, and use case

runic pulsar
#

I'm trying to create a Discord bot with Whisper capabilities, so I need my requests to be async. I am downloading .mp3 files from Discord, then I am trying to upload them to the Whisper API. It isn't accepting my requests, and it returns this error:

{
        "message": "Could not parse multipart form",
        "type": "invalid_request_error",
        "code": null
}

Here is the code I am using:


    async with aiohttp.ClientSession() as session:
        filename = "test.mp3"
        async with session.get("https://cdn.discordapp.com/attachments/1097219558466658354/1112417715316076775/AI_Test_Kitchen_toetapping_footstomping_americana_1.mp3") as resp:
            if resp.status == 200:
                audiodata = await resp.read()
...

                headers = {
                    "Authorization": "Bearer " + openai.api_key,
                    "Content-Type": "multipart/form-data"
                }
                form = aiohttp.FormData()
                form.add_field('file', audio, content_type='audio/mp4')
                form.add_field("model", "whisper-1", content_type="text/plain")


                resp = await session.post(url="https://api.openai.com/v1/audio/transcriptions", data=form, headers=headers)
                print(await resp.text())

                
            await session.close()

Does anyone know why this is?

random yacht
#

baboonalism

last quarry
#

how can i use whisper

#

i am not able to find the api key

#

to use whisper

compact fractal
#

me too

spiral bolt
#

how can i connect my newly created rails app with newly created next js project

remote oasis
fresh jewel
#

heyaa, im sorry, i'm still very new to using openai. Does anyone know how i can make an automatic whisper? By that i mean it always detects input from my mic, and when my mic goes silent, it saves the input and goes back to the detect input state. Is that possible?

autumn bolt
#

Hey anyone knows how tk create a database sql with chat gpt?

still moon
#

Any guidance on where my time start/ends should be on labeled audio (without me having to download some huge data set just to see a few samples)?

still moon
#

(blurred for privacy)

#

my fine tuning got down to .02x loss (not sure what loss function was used)

iron oar
still moon
#

She speaks in 1 to two words bursts. Can whisper even learn this?

still moon
#

I don't know how to find out except to keep possibly wasting time labeling things .. the fine tuning I did last time was garbage. Like one or two times she made a click sound with her mouth -- I labeled it as "tch".

#

In a subsequent test of the fine-tuned model, whisper transcribed EVERY word of hers as "tch"

autumn bolt
#

How to resolve this error?

weary arrow
#

Are the large models better than medium.en if I only need english recognition?

marble heart
#

amougus

mossy hemlock
#

Has anyone here tried tackling the 25MB limit in NodeJS for Whisper? If so - what's the ideal way about it?

rich hawk
rich hawk
mossy hemlock
candid flame
#

What true happiness looks like. 38 minute done flawlessly under medium

still moon
#

Yeah, no that's not what she said 😦

#

:} :/

still moon
polar gale
#

What am I supposed to set for model? whisper-1 return 404-model doesn't exist

patent shale
patent shale
cinder pewter
#

Hello guys!

#

Any idea why Im getting this error when trying to connect to CODEGPT with my OPENAI Api Key???

#

Can I use my OPENAI API key with many different services all at once?

#

or would I have to get a new OPENAI Account?

tepid lynx
#

Hi! In the model chart in the whisper github there is a "Required VRAM" column. Is the memory listed there what is needed to transcribe only one audio file at a time at a reasonable rate? If so, and I need to process two audio files at once with the base model ("required ~1 GB VRAM"), do I need ~2 GB of VRAM, and with the large model ("required ~10 GB VRAM") ~20 GB for two audio files at the same time, and so on? And what would the speed be?

cinder pewter
#

please help me guys!!

sinful thunder
cinder pewter
#

Nooooo but Im getting this error ALWAYS

#

Ive never been able to use the codegpt extension, ever

fossil sonnet
#

Not sure who to ask .. so here i am .. i am trying to build a website where folks can record their voice and it will transcribe it and use gpt4 with a prebuilt prompt to display a certain output .. which can be copied and pasted somewhere … i dont have a big budget and have limited coding skills … is there a github link for this where i can see the code? Is this possible using google colab?

candid prism
#

how to make clock

daring hearth
#

if i deployed my own whisper model does the data (the voice coming from my user or the text generated by whisper) go to open ai by any chances?

normal grove
#

[For Hire][Full-stack][Blockchain][Mobile][Remote][Full-time]

I am a senior full-stack and blockchain engineer with 5+ years of professional and extensive experience.
Fully understandable at the requirement in a few mins and make a perfect result which makes customers satisfied.

List of my professional skillset
💼 JavaScript(TypeScript) and its frameworks; Node.js (Express, Nest.js), React.js (Redux, Next.js), Angular (Ngrx, Rxjs, v1.0 ~ v9.0), Vue.js (Vuex, Nuxt.js)
💼 PHP (Laravel, CI), Python, Django
💼 React-Native, Flutter, Native Script
💼 Restful API, OpenAI, ChatGPT, Langchain, Pinecone, AWS
💼 Web3 technology, Smart Contract (Solidity, Rust), ERC tokens (ERC20, ERC721, ERC1155, ERC4337), Ethereum, Solana networks

I am always focusing on the product quality first and professional codebase implementing OOP at high level.
I am ready to make your dream true so feel free to contact me anytime.

Best regards

mossy hemlock
daring hearth
weary arrow
candid junco
#

Hey all, what is everyone using here as their computational power? Are you simply using your local environment or using a GPU cloud?

azure axle
#

Hey guys! I've been working in the last few months in a project using whisper + gpt that I'm really excited about:

Kaption AI is a Chrome extension that transcribes and summarizes WhatsApp Web audios and chats. No more sifting through lengthy audio messages or struggling to keep pace with group chats - Kaption AI can convert audio into text and summarize long threads at the click of a button.

It's a tool that can be useful for business professionals, students, journalists, and people with hearing difficulties - anyone who wants to make their digital communication more efficient and accessible.

The development of Kaption AI was largely inspired by the groundbreaking work done by OpenAI, and the belief in the transformative potential of AI technology. It's my humble attempt to contribute to the ongoing revolution in our digital interactions.

Although I can't share a direct link here due to community guidelines, you can easily find Kaption AI on the Chrome Web Store by searching for "Kaption AI" in any browser. I would greatly appreciate it if you could give it a try and share your thoughts. Your feedback is incredibly valuable in helping me refine and enhance this tool.

Please rest assured that your privacy and security are of utmost importance. Kaption AI does not collect or store any of your messages. It adheres strictly to the latest security protocols and standards, ensuring it's secure from potential threats.

I'm curious - how do you currently handle lengthy audio messages and chats in WhatsApp? And what features would you be interested to see in future updates of Kaption AI?

Looking forward to hearing your thoughts. Thanks for your time!

azure axle
#

Thank you for your question. While I greatly appreciate the value and contributions of open source projects, Kaption AI is currently not open source software. It's part of a commercial endeavor that I'm investing significant time and resources into. However, I'm trying to think of ways to make it transparent for everyone to see that I'm not storing people's conversations or doing anything weird. What do you suggest?

daring hearth
azure axle
#

Ive tried both. Using whisper on my own servers and using API. API is cheaper and more reliable

candid junco
#

Hey All,

I've been testing a product I've been working on and GPU clouds are getting super expensive.

My product focuses on transcription using the OpenAI Whisper base and large models. Most GPU clouds I've researched are aimed at ML/deep learning and appear to be geared toward visual/graphics/art/video editing work. I feel as if transcription needs much less computation than something like stable diffusion. Does anyone here have insight into what computation is needed for my use case?

I am seriously considering in building out a low end deep learning machine.

Note: Transcription works on my PC when I run the API to my local environment

woven folio
#

I wonder how good will Apple's on device speech recognition be compared to Whisper

sacred forum
#

Hey everyone, What are all the features that Whisper actually covers.

The info I have on the features is limited and just
Speech recognition
Speech transcription

And I know there are more features. Does anyone have an idea of all the features, or some documentation I can read?

P.S The documentation of whisper on GitHub doesn't list all features.

late hemlock
heady iron
#

im trying to find the function that stores the probabilities of each words in a set. anyone know where is it?

#

is it in the beam search class?

median abyss
#

Hey guys, have you experienced a significant drop in performance after trying to load a HF checkpoint converted to openai format? If I use the ggml c++ conversion tool I cannot load the fine-tuned checkpoint at all... have you had problems with fine-tuned whispers?

slim blaze
autumn bolt
slim blaze
#

yep, not just accent you know. For example with Indian / Spanish accents, it's still comprehensible, but in other parts of the world they completely mispronounce it, like randomly omitting syllables, but because the training data is so good, whisper can still convert it to the (surprisingly) correct word. This is good in most cases, but not good for teaching them English

blazing cave
paper schooner
#

does someone know where to download different 1-second WAV audio files to train a model?

heady iron
#

because you have subtitles and the exact timestamps

vapid horizon
#

How can I get the word level transcriptions from whisper in node? I think there are options available in python but not able to find anything in Node JS!

Some of the output given by whisper is too long in a single timestamp.

leaden turtle
#

How do i use Whisper?

granite aspen
#

We are looking for an OpenAI fine-tuning expert.
You must have experience with this.
Don't worry about your budget.
Your good skills are needed.
If you are really an expert please contact me.

fringe karma
#

Can I use my API key credits with whisper so the transcription occurs remotely instead of on my PC? I don't have a dedicated GPU so it takes soooo long to get a transcription. I would like to see if I can speed it up at the cost of some credits

odd mirage
odd mirage
#

Found the directory in Windows for the model library: C:\Users\{YourUserHere}\.cache\whisper

fringe karma
#

Found the solution. Works very quickly now!

queen solar
past solar
static grove
desert mist
#

hi, I try to use whisper api in order to make subtitles for a video. The problem is, that there are 2 languages used in the video and in the generated transcript, I get all the text in first language and all the text in second language gets translated to the first one. Is there any way to keep the languages as in the audio?

ember ridge
#

can I use openai whisper as a live speech-to-text? As in, it gets what I say from the mic and it transcribes it?

vast tendon
#

Hello, I'm a bit of a noob with whisper and I have a question. I need to transcribe 9 hours of video and divide it into 30-second segments. Is it better to transcribe the 9 hours at once, or to transcribe, for example, 1100 files of 30 seconds each?

#

or exist a more efficient model?

#

I'm using an R7 5700U because I'm not at home and I'm on my laptop

tawdry glade
#

can i have information on langchain pls, DM me

fast chasm
#

Any updates regarding this post on github?

#

I've been debating buying a mid-tier PC for Whisper as I transcribe Arabic as freelance, but I transcribed some videos using Whisper, model base, and the results were a complete mess that I was better off doing it manually

grand plaza
#

Hey I have been trying to get whisper to work on a raspberry pi, but when I try install it, it fails because it depends on torch? (using the api) and with python integration

cerulean flint
#

Hey there, i am looking for a way to format the .txt exports after i run the files through whisper - has anyone found a nice way to do it? i don't necessarily need the timestamps, but a more readable layout would be nice. For now i use an online tool for auto line breaks - i would like to have it in my code - i run whisper via Google Colab
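
One way this could be done in code instead of an online tool, sketched with only the Python standard library: split the transcript into sentences, group a few sentences per paragraph, and wrap each paragraph to a fixed width. `format_transcript` is a hypothetical helper, and the sentence split is a simple punctuation-based guess rather than anything Whisper provides.

```python
import re
import textwrap


def format_transcript(text, width=80, sentences_per_paragraph=4):
    """Reflow one long transcript string into wrapped paragraphs."""
    # Collapse existing whitespace, then split after ., ! or ? followed by a space.
    flat = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        block = " ".join(sentences[i:i + sentences_per_paragraph])
        paragraphs.append(textwrap.fill(block, width=width))
    # A blank line between paragraphs keeps the .txt readable.
    return "\n\n".join(paragraphs)
```

In a notebook you would read the exported .txt, pass its contents through `format_transcript`, and write the result back out. Abbreviations like "e.g." will be treated as sentence ends, which is usually harmless for readability.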

velvet helm
cerulean flint
velvet helm
hot river
#

Anyone else get a totally random text back from whisper? Like it kinda sent the wrong response? :p

#

A whisper version of hallucinations?

#

It's very, very rare but it happens

jovial compass
#

does anyone know how to use whisper in nodejs?

paper schooner
#

hello everyone, can someone please help me with my model? i'm working on an AI model that predicts whether keywords exist in audio files. i trained my model with 30000 audio clips of 1 second each, and the metrics show that the training went well, but i'm still getting errors in the predictions, HELP PLEASE !

jovial compass
#

Please someone help me with this below code error

const filePath = path.join(__dirname, '../../', 'temp.mp3');

    const formData = new FormData();
    formData.append('model', model);
    formData.append('file', filePath);
    axios
      .post('https://api.openai.com/v1/audio/translations', formData, {
        headers: {
          Authorization: `Bearer ${process.env.OPEN_AI_KEY}`,
        },
      })
      .then((response) => {
        console.log(response.data);
      })
      .catch((err) => {
        console.log(err.response);
      });
#

Error:

data: {
    error: {
      message: '1 validation error for Request\n' +
        'body -> file\n' +
        "  Expected UploadFile, received: <class 'str'> (type=value_error)",
      type: 'invalid_request_error',
      param: null,
      code: null
    }
  }

#

Can you please resolve me this error

paper schooner
# jovial compass Can you please resolve me this error

const fs = require('fs');
const path = require('path');
const FormData = require('form-data');
const axios = require('axios');

const filePath = path.join(__dirname, '../../', 'temp.mp3');

const formData = new FormData();
formData.append('model', model);

// Read the file as a stream
const fileStream = fs.createReadStream(filePath);
formData.append('file', fileStream);

axios
  .post('https://api.openai.com/v1/audio/translations', formData, {
    headers: {
      // Template literal needs backticks so the key is interpolated
      Authorization: `Bearer ${process.env.OPEN_AI_KEY}`,
      ...formData.getHeaders(), // Include the necessary headers for FormData
    },
  })
  .then((response) => {
    console.log(response.data);
  })
  .catch((err) => {
    console.log(err.response);
  });

paper schooner
jovial compass
#

Thanks @paper schooner

mossy hemlock
#

I know many people here splice and feed input to Whisper for longer videos / audios - but how do you deal with the subtitles being separate for each chunk? How do you combine them properly so they flow well with the final video you transcribed for?
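
A minimal sketch of one way to stitch the per-chunk subtitles back together, assuming you know where each chunk started in the original audio and that each chunk's segments carry `start`/`end` times relative to that chunk. The segment dict format and all three helper names are assumptions for illustration.

```python
def merge_chunk_segments(chunks):
    """Shift each chunk's segments by the chunk's start time and merge them.

    `chunks` is a list of (chunk_start_seconds, segments) pairs, where each
    segment is a dict with "start", "end" and "text" relative to its chunk.
    """
    merged = []
    for offset, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"].strip(),
            })
    return merged


def to_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render merged segments as SRT text with cues renumbered from 1."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        times = f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}"
        cues.append(f"{index}\n{times}\n{seg['text']}\n")
    return "\n".join(cues)
```

Renumbering the cues after the merge is what makes the final file read as one continuous subtitle track. If chunks were cut mid-sentence, the cue on either side of a boundary may still need a manual touch-up.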

plucky palm
#

whats the best whisper model to quickly and accurately transcribe long-form audio

simple latch
#

Does anything like whisper exist that can transcribe an audio file of an unknown language directly into phonetic notation, something like X-sampa or IPA? Or maybe something simpler that only recognizes which language is spoken in a given audio file?

dull sable
#

I have a use case which involves creating transcriptions of long audio files (about ten minutes each) which include unpredictable, brief events of loud, non-speech audio, usually no more than 30 to 90 seconds in length. Whisper seems to stop transcribing at the first of these (usually with hallucinations), so most of my results are only the first half of the audio.

Is there any way around this? I would like to continue using the hosted API as opposed to running the open-source Whisper, though I understand it can be made more tolerant of this situation. Preprocessing is quite difficult as the audio levels are basically constant, but I'm open to ideas. The best one I have at the moment is to split the files based on brief silences which imply sentence breaks, but sometimes speech briefly overlaps with the non-speech sound, so I lose content.

signal hearth
#

Unleash your app idea with our Flutter development services for just $29 and make it a reality! DM us now!

unreal otter
# dull sable I have a use case which involves creating transcriptions of long audio files (ab...

I'm playing around with whisper and currently using Staplerfahrer Klaus from YouTube. I'm running whisper locally, but have run it at least once using the API. The transcription from the API seems to be good. Speech-to-text is in beta and the API version is limited in what options can be tweaked compared to running it locally. You don't happen to have a sample of a file that fails?

pallid pond
#

Hi, is it possible to change sliding window interval from 30 seconds to something smaller?

dull sable
olive sail
#

birthday wish for a work colleague

pine palm
#

anyone faced timeouts when reaching out to whisper API ?
locally it works great, but when my server is deployed on remote environment (e.g AWS) I get consistent timeouts ..

amber sparrow
#

Hi, is it possible to integrate whisper to expo react native application to get real time transcript?

simple latch
cerulean flint
cerulean flint