#gpt-realtime | OpenAI | Page 4

restive reef Apr 22, 2024, 4:18 PM

#

Just don't use Windows 10 for Whisper.

If you did, I'm interested in how.

unreal kraken Apr 22, 2024, 4:19 PM

#

I just followed this tutorial and everything was working until recently
https://www.youtube.com/watch?v=XX-ET_-onYU&t=488s

YouTube

TroubleChute

FREE & OFFLINE Audio to Text | Whisper: Install Guide | OpenAI Whis...

OpenAI has done some fantastic things. Whisper is a great project open to the public. Transcribe (Turn audio into text) for MANY languages, all completely for free and all from your computer. No subscription, no fees, nothing. Your data, your computer, your free unlimited transcription.

Whisper AI install guide: https://hub.tcno.co/ai/whisper/i...


restive reef Apr 22, 2024, 4:21 PM

#

unreal kraken I just followed this tutorial and everything was working until recently https://...

So you installed it with the pip package manager, then.

How about you start with `pip list`?

And you are using it via Rust?

Well, thanks for the site. I am using it directly on Linux, with a simple command.

#

It looks helpful; I wanted to try out TensorFlow anyway.

unreal kraken Apr 22, 2024, 4:24 PM

#

Yesterday I tried uninstalling Whisper completely and ran this command in PowerShell: iex (irm whisper.tc.ht), but it got worse...

#

I will try doing the whole installation again

#

Btw, are you using Python 3.9.9 or the latest?

restive reef Apr 22, 2024, 4:25 PM

#

unreal kraken Yesterday I tried uninstall all Whisper and I used command in powershell: iex (i...

Well, I thought Whisper was not for Windows, so no clue. Probably a Windows problem.

restive reef Apr 22, 2024, 4:28 PM

#

unreal kraken I will try again do the whole installation

I have 3.12.1 installed

#

My whisper is not in the package list

restive reef Apr 22, 2024, 4:30 PM

#

unreal kraken Yesterday I tried uninstall all Whisper and I used command in powershell: iex (i...

I would try it with an absolute path if I were you.
For example, I first added whisper to the PATH.
Then I wrote a script with the help of ChatGPT.

Example .sh file:

#!/bin/bash

nummer=220
endnummer=233

pfad="/home/Nutzername/Dokumente/whisper_using/jenny/22_folgend/"

datei="${pfad}${nummer}.mp3"

model="base"
output_format="vtt"
target_location=""

echo "File = ${datei}"

whisper "$datei" --model "$model" --output_format "$output_format"

# Loop through the numbers

while [ "$nummer" -le "$endnummer" ]
do
    # Build the file name
    datei="${pfad}${nummer}.mp3"
    # Run the command
    echo "Running whisper for $datei"
    whisper "$datei" --model "$model" --output_format "$output_format" --output_dir "$pfad"

    # Increment the number for the next iteration
    nummer=$((nummer + 1))
done

whisper /home/Nutzername/Dokumente/whisper_using/jenny/12_jenny.mp3 --model base --output_format vtt

unreal kraken Apr 22, 2024, 5:56 PM

#

@restive reef
YES!!!
I have it
It works
I can't have any spaces in the path names
and I have to run it in the Downloads folder
Thanks for the help ❤️

dense jetty Apr 22, 2024, 6:48 PM

#

Deez

drifting echo Apr 22, 2024, 7:23 PM

#

wondering if there's a way to use whisper to transcribe only the primary speaker in a file? Imagine something really long like a lecture with questions from the audience at the end. Primary speaker has way more presence. I'd like to transcribe just the speaker.

anything open in the space capable of doing that?

gilded oasis Apr 23, 2024, 8:23 AM

#

Hello all. I have created a python whisper app for my company to transcribe videos. Right now it works great with given videos on the server, however, I would like the option to re-process specific parts of the videos where the transcription gave no results or hallucinations, for example, in a 1hour long video, I would like to be able to say 'reprocess from 00:05:15 to 00:07:20'.
Is there some way I can give these parameters directly to whisper?

thick agate Apr 23, 2024, 5:46 PM

#

gilded oasis Hello all. I have created a python whisper app for my company to transcribe vide...

Hello. I think the only option is to cut the video yourself (using Python or something else), give the clipped part to Whisper, and then patch the result back into the transcript.
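
A minimal sketch of that cut-and-retranscribe idea, assuming ffmpeg is on the PATH and the open-source `openai-whisper` package is installed. The helper names (`to_seconds`, `build_clip_command`, `shift_segments`, `reprocess`) and the `clip.wav` file name are made up for illustration; only `whisper.load_model` and `model.transcribe` come from the library.

```python
import subprocess


def to_seconds(timecode: str) -> float:
    """Convert 'HH:MM:SS' (or 'MM:SS') into seconds."""
    seconds = 0.0
    for part in timecode.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def build_clip_command(src: str, dst: str, start: str, end: str) -> list[str]:
    """ffmpeg arguments that copy the audio between start and end into dst."""
    return ["ffmpeg", "-y", "-i", src, "-ss", start, "-to", end, "-vn", dst]


def shift_segments(segments, offset):
    """Move clip-relative timestamps back onto the original timeline."""
    return [
        {**seg, "start": seg["start"] + offset, "end": seg["end"] + offset}
        for seg in segments
    ]


def reprocess(src: str, start: str, end: str, clip_path: str = "clip.wav"):
    """Cut [start, end] out of src, transcribe it, return shifted segments."""
    import whisper  # imported lazily; needs `pip install openai-whisper`

    subprocess.run(build_clip_command(src, clip_path, start, end), check=True)
    result = whisper.load_model("base").transcribe(clip_path)
    return shift_segments(result["segments"], to_seconds(start))
```

Something like `reprocess("video.mp4", "00:05:15", "00:07:20")` would then return segments whose timestamps line up with the full video, ready to replace the bad stretch in the existing transcript.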

restive reef Apr 24, 2024, 10:49 AM

#

unreal kraken @restive reef YES!!! I have it It works I can't have any space in this...

Never put spaces in your folder names if you work with programs. That's why I use underscores.

dense jetty Apr 25, 2024, 6:47 PM

#

E

carmine scroll Apr 28, 2024, 7:45 AM

#

Hi, I have an error with my code.

Here is my code:

const openai = new OpenAI({ apiKey: "my-openai-api" });

async function transcription() {
  console.log("Démarrage de la transcription...");
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(`audio_${time}.mp3`), // backticks, so ${time} is actually interpolated
    model: "whisper-1",
  });
  console.log("Fin de la transcription...");
  console.log(transcription.text);
}

But I get this error:

Error: APIConnectionError: Connection error.
    at OpenAI.makeRequest (C:\Users\Eleve\Documents\Videos\node_modules\openai\core.js:292:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async transcription (C:\Users\Eleve\Documents\Videos\index.js:157:25)
    at async C:\Users\Eleve\Documents\Videos\index.js:74:5 {
  status: undefined,
  headers: undefined,
  request_id: undefined,
  error: undefined,
  code: undefined,
  param: undefined,
  type: undefined,
  cause: FetchError: request to https://api.openai.com/v1/audio/transcriptions failed, reason: read ECONNRESET
      at ClientRequest.<anonymous> (C:\Users\Eleve\Documents\Videos\node_modules\node-fetch\lib\index.js:1501:11)
      at ClientRequest.emit (node:events:519:28)
      at TLSSocket.socketErrorListener (node:_http_client:492:9)
      at TLSSocket.emit (node:events:531:35)
      at emitErrorNT (node:internal/streams/destroy:169:8)
      at emitErrorCloseNT (node:internal/streams/destroy:128:3)
      at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
    type: 'system',
    errno: 'ECONNRESET',
    code: 'ECONNRESET'
  }
}

#

I have the free trial API, idk if that's the cause

candid siren Apr 28, 2024, 11:10 AM

#

Friends in the community, we ran into a question while using Whisper and wanted to ask if anyone has experience with it: at what decibel level should the speaker's voice be kept for Whisper to achieve relatively good WER?

hazy grotto May 1, 2024, 10:35 PM

#

carmine scroll Hi, I've an error with my code. There is my code: ```js const openai = new Ope...

shh we need to whisper like the chat name tells us

carmine scroll May 2, 2024, 3:54 PM

#

hazy grotto *shh we need to whisper like the chat name tells us*

👀

karmic raft May 3, 2024, 7:19 PM

#

carmine scroll I've the free trial API idk if its the cause

you're right. you should purchase a paid key

carmine scroll May 3, 2024, 7:48 PM

#

sad... Thanks for the answer

fallow narwhal May 6, 2024, 9:02 AM

#

Hello everyone
I'm currently working on Whisper to specialize it in French railway language. I'm facing some issues with transcribing ambiguous words, as well as recognizing station names. Initially, I tried training it with audio files totaling 2 hours, but the results didn't meet my expectations. I then turned to using prompts, which solved the ambiguity problem; however, since the prompt size is limited to 224 tokens, I can't include all the station names.

Could you please provide me with some tips? I'm new to this field.
Thank you

brave kestrel May 8, 2024, 10:53 PM

#

fallow narwhal Hello everyone I'm currently workin on Whisper to specialize it in French railwa...

imo the transcription of random words is hard to avoid. I'm assuming you mean things like "Thanks for watching!". Maybe run speech detection before Whisper?

#

I haven't used Whisper offline much, but the one time I did, it didn't perform well out of the box. The API has pretty good results but still has the "hallucinations".

fallow narwhal May 9, 2024, 11:27 AM

#

brave kestrel imo the transcription of random words is hard to avoid. I'm assuming you mean th...

Actually, my main problem is transcribing station names, for example "La Défense".
I tried using embeddings to correct the transcription in post-processing but didn't get the results I wanted.
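
One cheap alternative to embeddings is plain fuzzy string matching against the known station list. The sketch below is not what anyone in this thread used; it swaps in Python's standard `difflib` and compares accent-stripped, lowercased text. The function names, the 0.8 cutoff, and every station other than "La Défense" are assumptions for illustration.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Défense' and 'defense' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def correct_station_names(text: str, stations: list[str], cutoff: float = 0.8) -> str:
    """Replace word runs that closely match a known station name."""
    by_norm = {normalize(name): name for name in stations}
    longest = max(len(name.split()) for name in stations)
    words = text.split()
    out, i = [], 0
    while i < len(words):
        for n in range(longest, 0, -1):  # try the longest word run first
            if i + n > len(words):
                continue
            candidate = normalize(" ".join(words[i:i + n]))
            match = difflib.get_close_matches(candidate, list(by_norm), n=1, cutoff=cutoff)
            if match:
                out.append(by_norm[match[0]])
                i += n
                break
        else:  # no station matched at this position, keep the word as-is
            out.append(words[i])
            i += 1
    return " ".join(out)


print(correct_station_names("prochain arrêt la defence", ["La Défense", "Gare de Lyon"]))
```

This only catches near-misses in spelling; names that Whisper replaces with a completely different word would still need the prompt or an LLM pass.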

upbeat totem May 12, 2024, 11:50 PM

#

How are you guys using the Whisper API to create subtitles for 2-hour videos?

fallen bramble May 12, 2024, 11:52 PM

#

fallow narwhal Actuctially my main problem is to transcribe station names for exemple " La Défe...

Have you tried post-processing with an LLM? There’s an example here: https://platform.openai.com/docs/guides/speech-to-text/improving-reliability

rapid spindle May 13, 2024, 1:05 PM

#

Has anyone tried Whisper v3? Does it significantly improve on v2 or not?

ruby agate May 13, 2024, 5:41 PM

#

Guess open source whisper is a dead end, glad openai folded it into their proprietary models

loud plinth May 13, 2024, 10:03 PM

#

I'd like to know if there will be enhancements to Whisper with SSML or a similar markup which will help with enunciation, pronunciation, accentuation, tone, and other features noted today in the presentation.

brave kestrel May 13, 2024, 11:15 PM

#

ruby agate Guess open source whisper is a dead end, glad openai folded it into their propri...

Not clear because they say large-v2 is used for the api

fair pecan May 14, 2024, 8:04 AM

#

Has anyone noticed this with whisper?
When there is no audio input, it sometimes outputs something like "welcome to chatgpt" or "thank you for watching"

#

It also output this in a real-time chat with ChatGPT:

#

This transcript was recorded on October 12, 2021. Thank you for participating in this webinar.

static sage May 14, 2024, 8:43 AM

#

Hi,

I’m using the OpenAI API and trying to get short segments (~4 words) with timings and punctuation. What I went through:

The API doesn’t allow setting the number of words per segment.
I thought I could build it from the word-level transcription → there is no punctuation there, and characters like - and ' are handled oddly.
I thought I could merge text or segments with words (I can get punctuation from text and timing from words).

Until I noticed a few things between text/segments and words:

The text might differ: there are literally words in words that do not exist at all in text/segments.
Timestamps are badly mismatched between words and segments.
There is no punctuation in words.
Words containing ' or - like « it’s » are considered one word in some languages and two words in others.
This makes merging segments and words difficult, since the two sides don't have the same number of words and the rules for specific characters differ by language.
Did anyone succeed in getting a word-based transcription with punctuation and word-level timestamps from the API, or short segments?

Thank you
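
For the "short segments" half of this, a minimal sketch of grouping word-level entries into fixed-size cues. It assumes each word is a dict with `word`, `start` and `end` keys, as in the `verbose_json` word list; `group_words` is a made-up helper, and it does nothing about the missing punctuation or the text/words mismatch described above.

```python
def group_words(words, size=4):
    """Bundle consecutive word entries into cues of at most `size` words."""
    cues = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        cues.append({
            "start": chunk[0]["start"],   # cue starts with its first word
            "end": chunk[-1]["end"],      # and ends with its last word
            "text": " ".join(w["word"].strip() for w in chunk),
        })
    return cues


# Made-up sample values, only to show the shape of the input.
sample = [
    {"word": "this", "start": 0.0, "end": 0.2},
    {"word": "is", "start": 0.2, "end": 0.3},
    {"word": "a", "start": 0.3, "end": 0.4},
    {"word": "short", "start": 0.4, "end": 0.7},
    {"word": "test", "start": 0.7, "end": 1.0},
]
for cue in group_words(sample):
    print(cue)
```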

vocal veldt May 14, 2024, 9:11 AM

#

fair pecan Has anyone noticed this with whisper? When there is no audio input it's sometim...

Yup have had similar issues. It tends to hallucinate quite a lot actually
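
One hedged way to trim these silence hallucinations after the fact is to drop segments the model itself flagged as probably not speech. Segment dicts from open-source Whisper (and the `verbose_json` response) carry a `no_speech_prob` field; the helper name and the 0.6 threshold below are assumptions to tune, not recommended values.

```python
def drop_likely_silence(segments, max_no_speech_prob=0.6):
    """Keep only segments whose no_speech_prob is at or below the threshold."""
    return [seg for seg in segments if seg["no_speech_prob"] <= max_no_speech_prob]


# Made-up sample values, shaped like Whisper segment dicts.
sample = [
    {"text": " Hello everyone.", "no_speech_prob": 0.02},
    {"text": " Thank you for watching!", "no_speech_prob": 0.93},
]
print([seg["text"] for seg in drop_likely_silence(sample)])
```

It will not catch every case, since a hallucinated line can come with a low `no_speech_prob`, so running voice activity detection before Whisper is still worth considering.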

loud plinth May 14, 2024, 3:16 PM

#

Servers are having load issues now. All of the services are getting that in one way or another.

rough sequoia May 14, 2024, 7:32 PM

#

it would be nice to have a date of when the voice engine (or new whisper) is going to be released

crystal yew May 15, 2024, 10:14 AM

#

Hello there !
I am Loïc Founder of the french startup Callifly. I'm looking for our french speaking future CTO to complete our Core Team (possibility of equity entry). Already made up of 4 members (company of 8 people all remotely) I am looking for an AI expert to automate our solution.
We specialize in recovering abandoned shopping carts over the phone for e-retailers. More generally, we want to become the voice of e-retailers by selling, advising and supporting customers in their purchasing journey.
To start, I would like to work on a demonstrator (MVP type) and offer it to our clients for simple campaigns (Promo Code proposal for example). They are already excited about the idea.
First step, meet and discuss. 🙂
Who feels up to the challenge? Dm me.

alpine timber May 15, 2024, 1:46 PM

#

Simple question about Whisper. Does it pad audios shorter than 30s to 30s?

#

Also have the same question as this guy here; although I'm using faster-whisper and I observed the same behavior.
Related discussion: https://github.com/SYSTRAN/faster-whisper/discussions/837

GitHub

Question on transcription time vs. input audio duration · SYSTRAN f...

Hello. Recently I ran a benchmark on a fine-tuned whisper-small model. Most audio files were pretty short, below 10s. The results are as follows, where the y-axis is the time to transcribe using fa...

urban parcel May 20, 2024, 10:16 PM

#

ॐ गं गणपतये नमः – Language bot test

#

चिद्धर्मा सर्वदेहेषु विशेषो नास्ति कुत्रचित् ।
अतश्च तन्मयं सर्वं भावयन्भवजिज्जनः ॥ १०० ॥

#

𑆃𑆤𑆴𑆫𑆾𑆣𑆩𑇀 𑆃𑆤𑆶𑆠𑇀𑆥𑆳𑆢𑆩𑇀 𑆃𑆤𑆶𑆖𑇀𑆗𑆼𑆢𑆩𑇀 𑆃𑆯𑆳𑆯𑇀𑆮𑆠𑆩𑇀 𑇅
𑆃𑆤𑆼𑆑𑆳𑆫𑇀𑆡𑆩𑇀 𑆃𑆤𑆳𑆤𑆳𑆫𑇀𑆡𑆩𑇀 𑆃𑆤𑆳𑆓𑆩𑆩𑇀 𑆃𑆤𑆴𑆫𑇀𑆓𑆩𑆩𑇀 𑇆
𑆪𑆂 𑆥𑇀𑆫𑆠𑆵𑆠𑇀𑆪𑆱𑆩𑆶𑆠𑇀𑆥𑆳𑆢𑆁 𑆥𑇀𑆫𑆥𑆚𑇀𑆖𑆾𑆥𑆯𑆩𑆁 𑆯𑆴𑆮𑆩𑇀 𑇅
𑆢𑆼𑆯𑆪𑆳𑆩𑆳𑆱 𑆱𑆁𑆧𑆶𑆢𑇀𑆣𑆱𑇀𑆠𑆁 𑆮𑆤𑇀𑆢𑆼 𑆮𑆢𑆠𑆳𑆁 𑆮𑆫𑆩𑇀 𑇆

#

𑆃𑆤𑆴𑆫𑆾𑆣𑆩𑇀 𑆃𑆤𑆶𑆠𑇀𑆥𑆳𑆢𑆩𑇀 𑆃𑆤𑆶𑆖𑇀𑆗𑆼𑆢𑆩𑇀 𑆃𑆯𑆳𑆯𑇀𑆮𑆠𑆩𑇀 𑇅
𑆃𑆤𑆼𑆑𑆳𑆫𑇀𑆡𑆩𑇀 𑆃𑆤𑆳𑆤𑆳𑆫𑇀𑆡𑆩𑇀 𑆃𑆤𑆳𑆓𑆩𑆩𑇀 𑆃𑆤𑆴𑆫𑇀𑆓𑆩𑆩𑇀 𑇆
𑆪𑆂 𑆥𑇀𑆫𑆠𑆵𑆠𑇀𑆪𑆱𑆩𑆶𑆠𑇀𑆥𑆳𑆢𑆁 𑆥𑇀𑆫𑆥𑆚𑇀𑆖𑆾𑆥𑆯𑆩𑆁 𑆯𑆴𑆮𑆩𑇀 𑇅
𑆢𑆼𑆯𑆪𑆳𑆩𑆳𑆱 𑆱𑆁𑆧𑆶𑆢𑇀𑆣𑆱𑇀𑆠𑆁 𑆮𑆤𑇀𑆢𑆼 𑆮𑆢𑆠𑆳𑆁 𑆮𑆫𑆩𑇀 𑇆

#

anirodham anutpādam anucchedam aśāśvatam .
anekārtham anānārtham anāgamam anirgamam ..
yaḥ pratītyasamutpādaṃ prapañcopaśamaṃ śivam .
deśayāmāsa saṃbuddhastaṃ vande vadatāṃ varam ..

urban parcel May 21, 2024, 2:59 AM

#

𑆪𑆼 𑆣𑆩𑇀𑆩𑆳 𑆲𑆼𑆠𑆶𑆥𑇀𑆥𑆨𑆮𑆳 𑆠𑆼𑆱𑆁 𑆲𑆼𑆠𑆶𑆁 𑆠𑆡𑆳𑆓𑆠𑆾 𑆄𑆲 𑇅
𑆠𑆼𑆱𑆚𑇀𑆖 𑆪𑆾 𑆤𑆴𑆫𑆾𑆣𑆾 𑆍𑆮𑆁 𑆮𑆳𑆢𑆵 𑆩𑆲𑆳𑆱𑆩𑆟𑆾 𑇆

amber schooner May 22, 2024, 3:34 AM

#

hey Rule 4 is no spamming, you are gonna get kicked for posting that in multiple channels if you aren't careful @arctic plover

steep shadow May 22, 2024, 12:37 PM

#

Hello. I don't know why, but for almost 2 weeks the Whisper API no longer seems able to transcribe lyrics from songs, whereas it was the best at this game a few weeks ago... Why???

#

I've got this on a song like Paradise by Coldplay, where it was able to clearly transcribe the lyrics 2 weeks ago 😕

steep shadow May 24, 2024, 12:59 PM

#

Can someone at OpenAI please explain to us what happened??

weary skiff May 25, 2024, 5:37 PM

#

Hey, does anyone here have experience with insanely-fast-whisper? I've managed to get it working with flash attention 2, but to achieve the best final results, I need to utilize Spacy and the PunctuationModel for optimal quality. However, I'm encountering an issue with obtaining word-level timestamps to complete this task. Any ideas on how to address this?

bright flax May 26, 2024, 2:12 AM

#

So it seems OpenAI is dynamically changing the amount of usage we get with a plus subscription even after we have subscribed... I'm curious how this is okay? You can't change the terms of a contract after both parties have agreed to it?

#

4o has changed from 50 prompts to 40 prompts per 3 hours.. and these limits are not mentioned at all when subscribing for Plus. Not trying to be rude, just curious what a dev would have to say about this?

nimble pumice May 27, 2024, 1:31 PM

#

bright flax 4o has changed from 50 prompts to 40 prompts per 3 hours.. and these limits are ...

What??? Just 40?? How can I speak only 40 sentences? If it's going to be that short, then I'm not buying Plus

#

i just wanna speak to the new voice

dapper bobcat May 27, 2024, 1:38 PM

#

.

bright flax May 27, 2024, 3:39 PM

#

nimble pumice what??? just 40>??> how can i speak 40 sentences only? if its will be so short t...

Every 3 hours yup.

nimble pumice May 27, 2024, 8:01 PM

#

bright flax Every 3 hours yup.

do u have the new advanced voice already?

bright flax May 27, 2024, 8:38 PM

#

nimble pumice do u have the new advanced voice already?

Nope. Only alpha testers have it afaik.

dapper bobcat May 28, 2024, 7:50 AM

#

bright flax Nope. Only alpha testers have it afaik.

thought i read albaik for a moment

gilded fiber May 28, 2024, 7:30 PM

#

I am willing to pay for someone to help get the whisper API working for IOS and Mac OS
It only gets the first few seconds using Mobile IOS
Please DM ❤️ I am so tired

neat cedar May 28, 2024, 8:02 PM

#

gilded fiber I am willing to pay for someone to help get the whisper API working for IOS and ...

dumb question, but are you calling the api correctly?

gilded fiber May 28, 2024, 8:12 PM

#

Yes @neat cedar I am

#

The problem is with the Audio file itself

#

I found a workaround using CloudConvert, but my application needs a straight through change and I really don’t want to use FFmpeg

neat cedar May 28, 2024, 8:22 PM

#

I'm going to guess you're streaming audio directly to the Whisper APIs and you're probably only transcribing the first segment?

gilded fiber May 28, 2024, 8:28 PM

#

@neat cedar So I am using an audio file

#

It will pick up the first few seconds then poof

#

like it grabs the first word then breaks

#

I vow to open source a fix

#

Pls @ me ❤️ actively working on this

gilded fiber May 28, 2024, 8:52 PM

#

I figured it out

gilded fiber May 28, 2024, 9:37 PM

#

Ok so fluent-ffmpeg

#

IF YOU GOT STUCK WHERE I GOT STUCK

YOU NEED TO USE fluent-ffmpeg

SO YOU CAN USE Whisper IOS, Whisper on IOS, Whisper Mobile, Whisper Mac OS (Tags for those that get stuck)

Convert .wav -> .mp3 then process

No, do not set up the recording as mp3 from the get-go; for some reason you need to convert it into that format afterwards, and then the audio file will process fine

FOR DOCKER USERS

RUN apt-get update && \
    apt-get install -y ffmpeg

ALTERNATIVELY, USE CLOUDCONVERT'S API TO DO THE AUDIO FORMAT CONVERSION

const ffmpeg = require('fluent-ffmpeg');

function convertWavToMp3(inputFile, outputFile) {
    // Set the PATH environment variable to include FFmpeg bin before conversion
    process.env.PATH = `C:\\FFmpeg\\bin;${process.env.PATH}`;

    ffmpeg(inputFile)
        .toFormat('mp3')
        .on('end', () => {
            console.log('File has been converted successfully');
        })
        .on('error', (err) => {
            console.error('An error occurred: ' + err.message);
        })
        .saveToFile(outputFile);
}

// Example usage
convertWavToMp3('audio.wav', 'output.mp3');

still moon May 29, 2024, 6:10 PM

#

lame foo.wav 🙂

#

(the requirement "straight through change" is not exactly clear, imo) @gilded fiber

gilded fiber May 29, 2024, 6:13 PM

#

@still moon wym

gilded fiber May 29, 2024, 6:14 PM

#

still moon `lame foo.wav` 🙂

Use the node-lame?

#

Or conversion

still moon May 29, 2024, 6:14 PM

#

i was just throwing 'lame' out there.. i use the binary program from cli

#

but not for whisper.. haven't had the issue you're having with wav

#

@gilded fiber i wonder why yours stops.. wav file format issue? max size issue?

gilded fiber May 29, 2024, 6:17 PM

#

It’s all formats

#

I just figured out the best way to fix it is just to convert the file type

#

Works from wav to mp3

still moon May 29, 2024, 6:18 PM

#

i wrote (cgpt'ed up) a script to split my files into N mb chunks..

#

haven't yet done the part where re-writing timestamp offsets is done for reassembly though [haven't needed to]
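
The missing reassembly step could look roughly like this: a sketch that assumes you know each chunk's duration in seconds and have its segment list in playback order. `merge_chunks` is a made-up name and the segment dicts only need `start` and `end` keys.

```python
def merge_chunks(chunks):
    """Merge per-chunk segments into one timeline.

    `chunks` is a list of (duration_seconds, segments) pairs in playback order.
    """
    merged, offset = [], 0.0
    for duration, segments in chunks:
        for seg in segments:
            # Shift chunk-relative times by everything that played before.
            merged.append({**seg, "start": seg["start"] + offset, "end": seg["end"] + offset})
        offset += duration
    return merged
```

Using the real chunk durations (for example from ffprobe) matters more than the last segment's end time, because a chunk can end in silence that Whisper never timestamps.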

#

from the sound of it, it still could be a size issue you hit? (mp3 being much smaller than wav)

rare summit May 30, 2024, 5:33 PM

#

hello, I am new to OpenAI and am trying to use Whisper for speech to text translations. I've been trying to use it on the Magic Leap, and I'm finding some target architecture issues. It seems Whisper requires arm64 but the Magic Leap requires x86_64. Does anyone know if I can re-target Whisper or if there's any way I can still use it? Or if there are other speech to text tools I can use?

lofty aurora May 30, 2024, 5:44 PM

#

rare summit May 30, 2024, 6:08 PM

#

lofty aurora

Not helpful—if you don’t think the question is worthy of answering please don’t reply

lofty aurora May 30, 2024, 6:08 PM

#

rare summit Not helpful—if you don’t think the question is worthy of answering please don’t ...

ChatGPT may very well provide the answer

brave kestrel May 30, 2024, 6:59 PM

#

If you're the one building stuff chatgpt almost always will steer you wrong on the details. It's not bad if you already know how to make something. maybe also if you're brain storming without needing a viable program

brave kestrel May 30, 2024, 7:01 PM

#

rare summit hello, I am new to openai and am trying to use whisper for speech to text transl...

what is the actual processor architecture ? x86_64 or arm

#

you can def run whisper on x86_64. I've done it on my system. can you tell me more about your setup

rare summit May 30, 2024, 8:51 PM

#

brave kestrel you can def run whisper on x86_64. I've done it on my system. can you tell me mo...

Yeah magic leap is an x86_64 device. My desktop is windows 11. I’m building from Unity and to build to the headset I have to use Android platform. I tried using a whisper Unity package, but it looks like for Android it only supports arm64

brave kestrel May 30, 2024, 8:56 PM

#

wut

#

x86_64 is the standard processor architecture on intel and amds

#

windows is an operating system that is often x86_64, but sometimes they have an arm version

#

android is almost always arm64

#

I might be missing something

rare summit May 30, 2024, 8:59 PM

#

I think the issue is that I have to use Android platform to build to the headset and specifically if I use Android I have to use arm64

brave kestrel May 30, 2024, 8:59 PM

#

gotcha

#

Have you looked at the android's apis for voice

rare summit May 30, 2024, 8:59 PM

#

But also I’m really new to this so I could be confused about something else or maybe there’s a better way to do this

#

Not yet—I’ll look into it. Thanks for your help!

brave kestrel May 30, 2024, 9:00 PM

#

https://developer.android.com/reference/android/speech/SpeechRecognizer#createSpeechRecognizer(android.content.Context)

#

you could probably find a way to run whisper on arm64 but depending on the hardware it may not run great

#

there's always the openai whisper api but not sure if that's going to work for your implementation

#

https://github.com/ggerganov/whisper.cpp mentions android support

rare summit May 30, 2024, 9:08 PM

#

Thanks I’ll try those out!

opaque willow Jun 3, 2024, 10:17 AM

#

model = WhisperModel('tiny', compute_type="int8")
segments, _ = model.transcribe("input.wav")
text = ''.join(segment.text for segment in segments)
return text

I am trying to use the Whisper model and get it to work on CPU. I think the code I have written should work fine on CPU, but it says that it needs cuDNN to work. How can I fix this without needing cuDNN?

#

tag me if u answer thanks
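
A likely cause, offered as an assumption rather than a diagnosis: faster-whisper's `WhisperModel` picks its device automatically unless told otherwise, so on a machine with an NVIDIA GPU it may try CUDA and then ask for cuDNN. A minimal sketch that pins it to the CPU (`CPU_SETTINGS` and `transcribe_on_cpu` are made-up names):

```python
# Keyword arguments that keep faster-whisper off the GPU entirely.
CPU_SETTINGS = {"device": "cpu", "compute_type": "int8"}


def transcribe_on_cpu(path: str, model_size: str = "tiny") -> str:
    """Transcribe `path` with faster-whisper using only the CPU."""
    from faster_whisper import WhisperModel  # needs `pip install faster-whisper`

    model = WhisperModel(model_size, **CPU_SETTINGS)
    segments, _info = model.transcribe(path)
    return "".join(segment.text for segment in segments)
```

Calling `transcribe_on_cpu("input.wav")` should then run without any CUDA libraries installed.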

gilded fiber Jun 3, 2024, 8:49 PM

#

https://github.com/ricky0123/vad?tab=readme-ov-file

GitHub

GitHub - ricky0123/vad: Voice activity detector (VAD) for the brows...

Voice activity detector (VAD) for the browser with a simple API - ricky0123/vad

clever blaze Jun 5, 2024, 2:29 AM

#

whats this channel for

livid mauve Jun 5, 2024, 4:28 PM

#

clever blaze whats this channel for

This is for OpenAI's speech to text model called Whisper

gentle furnace Jun 6, 2024, 11:02 PM

#

Does anyone know a Windows app for speech-to-text like the built-in one (Win+H), but with an option to use an external STT API like Whisper?

I want to put the transcription into any text field immediately, without copy-paste actions.

static stirrup Jun 10, 2024, 1:03 PM

#

tldr: whisper won't use CUDA despite torch detecting the graphics card and CUDA in the same file

Soooo I have Cuda installed

and these in my venv:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
pip install -U openai-whisper

My code checks if cuda is available and it prints yes and my GPU:

import whisper
import torch
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(f"CUDA is available. Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA is not available. Using CPU.")

model = whisper.load_model("medium")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("test.opus")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)

(venv) ➜  mssp-transcript python test.py
True
CUDA is available. Using GPU: NVIDIA GeForce GTX 1070

But it uses my CPU???? The source code of the whisper module also checks torch.cuda.is_available() and runs on CUDA if it is, so I have NO idea why it's running on the CPU
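
A small diagnostic sketch, not a confirmed fix: pass the device to `whisper.load_model` explicitly and print `model.device`, so you can see where the weights really ended up instead of inferring it from CPU load. `choose_device` and `load_and_report` are made-up helper names.

```python
def choose_device(cuda_available: bool) -> str:
    """Pick the device string to request from Whisper."""
    return "cuda" if cuda_available else "cpu"


def load_and_report(name: str = "medium"):
    """Load a Whisper model on an explicit device and print where it lives."""
    import torch    # imported lazily so the helper above works without them
    import whisper

    device = choose_device(torch.cuda.is_available())
    model = whisper.load_model(name, device=device)
    print(f"requested {device}, model weights are on {model.device}")
    return model
```

If this prints `cuda` but the CPU is still busy, note that the script above also spends CPU time loading the audio and building the spectrogram, and it only decodes a single 30-second window, so GPU activity may be brief.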

lunar mirage Jun 11, 2024, 9:21 PM

#

I think hamburgers and cheeseburgers taste good when eaten with French fries.

lone island Jun 16, 2024, 2:37 AM

#

I have a Korean show that has English subtitles files, but I would like to generate Korean subtitles using whisper.

Is there a simple way to use the english subtitle timings as a guide for the whisper generated transcription subtitles?

I would like to have the generated native Korean subtitles match the existing timings of the English subtitles.

#

I have ideas for how to hack together some stuff but I'd like to check first if there's an easy way

vocal veldt Jun 16, 2024, 8:34 AM

#

lone island I have a Korean show that has English subtitles files, but I would like to gener...

I think you can tinker with the timestamp_granularities parameter as demonstrated here
https://platform.openai.com/docs/guides/speech-to-text/timestamps
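
One hack along those lines, as a sketch: request word-level timestamps, then drop each Korean word into whichever English cue its midpoint falls in (or is closest to). It assumes the cues are `(start, end)` pairs in seconds parsed from the English .srt and that words are dicts with `word`, `start` and `end`; all function names are made up.

```python
def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp like '00:01:02,500' into seconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def distance_to_cue(cue, t: float) -> float:
    """0 if t lies inside the cue, otherwise the gap to its nearest edge."""
    start, end = cue
    if t < start:
        return start - t
    if t > end:
        return t - end
    return 0.0


def assign_words(cues, words):
    """Return one text string per cue, built from the words nearest to it."""
    texts = [[] for _ in cues]
    for word in words:
        midpoint = (word["start"] + word["end"]) / 2
        nearest = min(range(len(cues)), key=lambda i: distance_to_cue(cues[i], midpoint))
        texts[nearest].append(word["word"].strip())
    return [" ".join(parts) for parts in texts]
```

Cues that end up with no words come back as empty strings, which is a useful hint that the English timing covers something other than speech.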

outer musk Jun 16, 2024, 3:09 PM

#

rare summit Thanks I’ll try those out!

Friend, you can also emulate/use a VM to get Whisper running. Set the emulation/VM to ARM and run the model.

outer musk Jun 16, 2024, 3:10 PM

#

rare summit Thanks I’ll try those out!

oh, the person above links a cpp, so its in c++/c and you can just run that native lol

halcyon trail Jun 17, 2024, 5:29 PM

#

Hello, I'm looking for a TTS model that can handle Caribbean Creole or Indian Ocean Creole.

torpid pine Jun 18, 2024, 8:59 PM

#

Shhh ya'll too loud

#

https://tenor.com/view/whisper-gif-20936283

Tenor

glad seal Jun 28, 2024, 2:33 PM

#

brave kestrel you could probably find a way to run whisper on arm64 but depending on the hardw...

Whisper.cpp uses quantized 4 bit whisper and works great on CPU only on my Pixel 6 up to the base model. There are also two Android apps I know implement it (Whisper Journal) and the same dev did a keyboard app that simply uses 10-30 second clips to use whisper as keyboard. Hasn't been updated in a bit but works great and is fast even in this Pixel 6.

brave kestrel Jun 28, 2024, 2:39 PM

#

Sweet

dapper bridge Jun 29, 2024, 7:17 PM

#

Hey - I have a problem with my subtitles. I use large-v2 model and whenever there's a pause in audio (no one is talking), timecodes desync.
It's really annoying to manually resync them, especially on videos that are 30+ minutes long. Did any of you figure out what to do in that situation? Some kind of fix?

torn bramble Jun 29, 2024, 10:35 PM

#

dapper bridge Hey - I have a problem with my subtitles. I use large-v2 model and whenever ther...

Do you have timecodes for your transcription, like a specific time in the recorded audio where that word was recognized? If so, how did you do it?

dapper bridge Jun 30, 2024, 2:43 PM

#

Okay, to be honest - I've found another solution. Switched to WhisperX, set my model to large-v3 and align_model to WAV2VEC2_ASR_LARGE_LV60K_960H and it works wonders. It's much better than before.
BUT - for some reason I can't force it to work with CUDA (float16), it only works with CPU (int8) - and it's slow as hell. And I have no idea why it doesn't want to use it.

worn vector Jun 30, 2024, 9:34 PM

#

hey so i have some code meant to put subtitles on a video, but i only want the subtitles to be of a very short length (e.g. like 9 characters). this is my python code using whisper. it works perfectly fine except for the fact that my max_line_width and max_line_count parameters arent respected (the subtitles still are multi-lined and much longer than i would like). does anyone know why?

def generate_subtitles(audio_path):
    subtitle_filename = audio_path.replace('.mp3', '.srt')
    # max_line_width = 9
    # max_line_count = 1

    model = whisper.load_model("base")
    result = model.transcribe(audio_path)

    srt_writer = WriteSRT(".")
    srt_writer.write_result(
        result,
        file=open(os.path.join(".", os.path.splitext(os.path.basename(audio_path))[0] + ".srt"), "w",
                  encoding="utf-8"),
        options={"max_line_width": 9, "max_line_count": 1, "highlight_words": False}   # Why aren't the options working?
    )
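
A possible explanation, stated as an assumption from reading `whisper/utils.py`: the subtitle writers only re-wrap lines when the result contains word-level timings, so without `word_timestamps=True` in `transcribe` the segments are written unchanged. A sketch of the same function with that flag set (`subtitle_options` is a made-up helper; `get_writer` is from `whisper.utils`):

```python
def subtitle_options(max_line_width: int = 9, max_line_count: int = 1) -> dict:
    """Options dict in the shape the Whisper subtitle writers expect."""
    return {
        "max_line_width": max_line_width,
        "max_line_count": max_line_count,
        "highlight_words": False,
    }


def generate_subtitles(audio_path: str, output_dir: str = ".") -> None:
    """Transcribe with word timings, then write an .srt with short lines."""
    import whisper  # needs `pip install openai-whisper`
    from whisper.utils import get_writer

    model = whisper.load_model("base")
    # Word-level timestamps are what lets the writer split long segments.
    result = model.transcribe(audio_path, word_timestamps=True)

    writer = get_writer("srt", output_dir)
    writer(result, audio_path, subtitle_options())
```

The writer names the output after the audio file, so `generate_subtitles("talk.mp3")` should produce `talk.srt` in the current directory. A single word longer than 9 characters would still get a line of its own.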

agile niche Jul 1, 2024, 10:20 AM

#

Hi guys 👋
I’m a newbie here, I’m currently using the Whisper transcription API to transcribe audio files within a python app.
My question is : is it mandatory to read the audio file prior to sending it to the transcription API ? I’m scared RAM wise since I may have a lot of concurrent requests to handle with 9MBs for each audio files, and my app is hosted on Heroku with only 512MB of RAM.

#

Is there a way to send the transcription request without loading the audio file in RAM ?

agile niche Jul 1, 2024, 10:57 AM

#

Is this approach a risk of memory exhaustion ?

dapper bridge Jul 1, 2024, 12:23 PM

#

dapper bridge Okay, to be honest - I've found another solution. Switched to WhisperX, set my m...

it throws me that float16 (cuda) is not supported by my pc, but... it is and it's installed
it worked previously with faster-whisper
any ideas what might be the cause and fix?

agile niche Jul 1, 2024, 1:22 PM

#

Would this approach be correct ? :

async with aiofiles.open(audio_file_path, 'wb') as f:
    while True:
        chunk = await response.content.read(8192)
        if not chunk:
            break
        await f.write(chunk)

transcription_response = await client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file_path
)

agile niche Jul 1, 2024, 7:52 PM

#

Anyone ?

tender rivet Jul 1, 2024, 9:10 PM

#

the API request has to contain the audio file, it will be loaded in memory at some point even if you decide to not use the lib and do the HTTP request yourself

agile niche Jul 1, 2024, 9:49 PM

#

tender rivet the API request has to contain the audio file, it will be loaded in memory at so...

Thanks for your answer ! Does it mean that if my app needs to be able to handle 100 transcriptions calls concurrently I’ll need to have a beefy instance full of ram to avoid performance issues ?

lofty aurora Jul 1, 2024, 9:50 PM

#

agile niche Thanks for your answer ! Does it mean that if my app needs to be able to handle ...

yes, you will need system resources to match the load

agile niche Jul 1, 2024, 9:50 PM

#

lofty aurora yes, you will need system resources to match the load

Thanks Robert !

tender rivet Jul 1, 2024, 9:55 PM

#

agile niche Thanks for your answer ! Does it mean that if my app needs to be able to handle ...

processing that amount of audio at the same time will surely require a beefy machine, I would go with something with a good amount of ram and also a nice SSD if you plan to read all of those audio files from disk

#

even if it is all being done on the cloud, the machine will need some good amount of IO to keep uploading all of that consistently

agile niche Jul 1, 2024, 9:57 PM

#

Yup makes sense, I guess my temporary solution is to define a limit of concurrent calls to 10 for example to lower my expectations

#

Ram is extremely expensive on Heroku, my instance only has 512MB lol

tender rivet Jul 2, 2024, 12:11 AM

#

for sure you will need a queueing process to be able to control that
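
A minimal sketch of capping concurrency with the standard library, assuming each transcription is wrapped in a zero-argument function that returns a coroutine. `run_limited` is a made-up name and the limit of 10 is just the number floated above.

```python
import asyncio


async def run_limited(job_factories, limit: int = 10):
    """Run the jobs concurrently, but never more than `limit` at once.

    Each item in `job_factories` is a zero-argument callable returning a
    coroutine, so no job starts before it acquires a semaphore slot.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with semaphore:
            return await factory()

    # gather() preserves input order in the returned list.
    return await asyncio.gather(*(guarded(factory) for factory in job_factories))
```

With at most `limit` uploads in flight, peak memory is bounded by roughly `limit` audio files instead of one per incoming request; requests beyond that simply wait their turn.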

dapper bridge Jul 2, 2024, 11:00 AM

#

dapper bridge it throws me that float16 (cuda) is not supported by my pc, but... it is and it'...

any help with that, please?

eager ember Jul 5, 2024, 6:29 PM

#

There is also a Speech to Text API that works from a local file. You can pick a file from the device and of course use it through the API
https://rapidapi.com/swift-api-swift-api-default/api/ai-speech-to-text/playground/apiendpoint_e98a440e-6f3d-473a-8d13-56afe020f179

AI Speech To Text

Empower your applications with cutting-edge speech recognition technology, crafted to meet the highest standards of excellence. Equipped with the essential tools and resources, developers can confidently create and deploy their projects swiftly and efficiently. Our solution ensures unparalleled performance, delivering a seamless experience that ...

eager ember Jul 5, 2024, 6:33 PM

#

eager ember There is also Speech to Text API from local file. You can pick a file from the d...

frigid shale Jul 5, 2024, 10:50 PM

#

I know I'm pulling up an old thread here, but where can I find info on these more specific parameters such as vad or even compression_ratio_threshold and temperature_increment_on_fallback? I'm having the same issue a lot of people have mentioned on here (hallucinations), but with the API, and I can't find relevant docs in either OpenAI's platform docs or Azure's OpenAI services.

#

This seems to be common in audio clips longer than 15 minutes or so, and it happens pretty often. I've tried setting my parameters based on some online answers (I'm using node and axios fetch) but none seem to get whisper any closer to a coherent response on this audio clip

const audioResponse = await axios.get(audioUrl, { responseType: 'stream' });
const form = new FormData();
form.append('file', audioResponse.data, {
  filename: 'audio.mp3',
  contentType: 'audio/mpeg'
});

// THESE ARE MY WHISPER API PARAMS
form.append('response_format', 'verbose_json');
form.append('timestamp_granularities', 'segment');
form.append('temperature_increment_on_fallback', 'None');
form.append('compression_ratio_threshold', 1.2);
form.append('temperature', 0.1);
form.append('vad', 'True');
// ===============================

const azureResponse = await axios.post(
  `${AZURE_OPENAI_ENDPOINT}/openai/deployments/${deploymentName}/audio/translations?api-version=2024-05-01-preview`,
  form,
  {
    headers: {
      'api-key': AZURE_OPENAI_API_KEY,
      ...form.getHeaders()
    }
  }
);

return azureResponse.data;

#

yes I'm using the Azure instance; and no I have no idea whether these params can actually be passed to the API but it should be the same as the OpenAI whisper api

shrewd rose Jul 6, 2024, 7:39 AM

#

frigid shale I know I'm pulling up an old thread here, but where can I find info on these mor...

Unfortunately I don't believe the API gives you access to all of those options...
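For comparison, a hedged sketch of where those knobs live when Whisper is run locally with the open-source `whisper` package (`pip install openai-whisper`). There, `compression_ratio_threshold`, `logprob_threshold`, `no_speech_threshold` and `condition_on_previous_text` are arguments of `transcribe()`, and `temperature_increment_on_fallback` is a command-line flag whose Python equivalent is passing a tuple of temperatures. The values below are the package defaults, except `condition_on_previous_text=False`, which is an assumption here: it is often suggested for repetition loops but is not a guaranteed fix. The helper names are made up for illustration.

```python
def fallback_options():
    """Decoding options for the open-source whisper package's transcribe()."""
    return {
        # Temperatures tried in order when a segment fails the checks below
        "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
        # Treat a segment as failed if its text compresses too well (repetition)
        "compression_ratio_threshold": 2.4,
        # Treat a segment as failed if the average log probability is too low
        "logprob_threshold": -1.0,
        # Skip a segment as silence if the no-speech probability is this high
        "no_speech_threshold": 0.6,
        # Assumption: do not feed previous output back in as a prompt
        "condition_on_previous_text": False,
    }


def transcribe_locally(path, model_name="base"):
    """Run a local transcription; needs the whisper package and ffmpeg."""
    import whisper  # imported lazily so the options can be inspected without it

    model = whisper.load_model(model_name)
    return model.transcribe(path, **fallback_options())
```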

icy phoenix Jul 7, 2024, 1:35 PM

#

Has anyone tested this model and can tell me whether it's better than large v2 japanese 5k or not?
link: https://huggingface.co/drewschaub/whisper-large-v3-japanese-4k-steps/tree/main

drewschaub/whisper-large-v3-japanese-4k-steps at main

real zealot Jul 9, 2024, 10:45 AM

#

yes

manic plover Jul 9, 2024, 11:14 AM

#

real zealot yes

what?

woven pier Jul 11, 2024, 1:34 PM

#

Has whisper been abandoned by OpenAI?

austere halo Jul 12, 2024, 7:26 PM

#

they released v3 large less than a year ago

woven pier Jul 13, 2024, 12:15 AM

#

austere halo they released v3 large less than a year ago

but they also said that gpt-4o does a better job than whisper, and haven't touched it since last year

#

they did edit the readme recently

#

but they haven't accepted any PRs, even spelling mistake fixes. I know I submitted one that removed an error when run on Windows

surreal plover Jul 13, 2024, 4:51 AM

#

Pull requests have always taken literal years if they're not, like, large security issues.

autumn bolt Jul 14, 2024, 4:15 PM

#

how to use whisper to detect sound effects in a video?

woven pier Jul 15, 2024, 3:11 PM

#

autumn bolt how to use whisper to detect sound effects in a video?

that is definitely not something it could do

#

it'll try and put a word or letters to the sound

surreal plover Jul 16, 2024, 7:31 AM

#

when i use whisper through command prompt it works flawlessly, however when i try to run it through a python script it says:

'whisper' has no attribute 'load_model'

i tried to debug and used: print(dir(whisper))
which gave me:

['AudioTranscriber', 'QApplication', 'QColor', 'QMainWindow', 'QPalette', 'QPushButton', 'QTextCursor', 'QTextEdit', 'QTimer', 'QVBoxLayout', 'QWidget', 'Qt', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'np', 'queue', 'sd', 'sf', 'sys', 'tempfile', 'threading', 'whisper']

and now im stumped, ive updated everything, reinstalled everything, restarted everything 10 times over and this still happens.

#

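edit: you cannot name any of your own files "whisper.py" in the folder you run your script from (or anywhere else on Python's import path), or `import whisper` picks up that file instead of the real package and everything breaks. i'd put in a pull req to fix this but im sure there is an open one already.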
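A small sketch for checking this kind of name clash, using only the standard library. `module_origin` is a made-up helper name; it reports which file Python would load for a given module name, so a local `whisper.py` shows up immediately instead of the installed package.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this prints a path to your own script rather than something under
    # site-packages, a local file is shadowing the installed package.
    print(module_origin("whisper"))
```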

willow current Jul 17, 2024, 7:00 AM

#

hi

#

I'm new here. I have some questions about training and fine-tuning the whisper model

#

is there any possibility to fine-tune the model with longer audio files?

#

I have various audio files, from a minimum of 10 mins to a maximum of 30 mins, and I also have their transcriptions in my dataset

#

I need to train the model with my dataset, is that possible?

#

can anyone help me with this?

surreal plover Jul 17, 2024, 11:43 AM

#

You just have to convert the audio files to the correct format for the model you want to tune (e.g. 16kHz), then tokenize the transcriptions, set up your training parameters, and run the training function. You can then use the trainer.evaluate() function to check the results and change the parameters as needed.

Here is an example:
https://colab.research.google.com/drive/1P4ClLkPmfsaKn2tBbRp0nVjGMRKR-EWz

Google Colab

willow current Jul 17, 2024, 1:19 PM

#

surreal plover You just have to convert the audio files to the correct format for the model yo...

ok thanks, @surreal plover I'll try this

willow current Jul 17, 2024, 2:13 PM

#

@surreal plover is it possible to fine-tune the model with longer audio files, more than 20 mins of audio plus its transcription? (note: I need to fine-tune the model with the full-size audio, I don't want to split it into 30 sec chunks)

surreal plover Jul 18, 2024, 2:24 AM

#

willow current @surreal plover is it possible with finetuning the model with larger aud...

I don't believe there is any limit on the length of audio you can use to fine tune the model. You should be able to use any length file, it's just hardware constraints.

willow current Jul 18, 2024, 5:09 AM

#

surreal plover I don't believe there is any limit on the length of audio you can use to fine tu...

but whenever I try to train on my dataset (long-duration audio with transcriptions) I get the following error.

ERROR : RuntimeError: The size of tensor a (1719) must match the size of tensor b (448) at non-singleton dimension 1
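For what it's worth, 448 is also the maximum number of text tokens Whisper's decoder handles, so one possible reading of this error (an assumption, not confirmed in the thread) is that a long transcript tokenized to 1719 label tokens. The model also works on 30-second audio windows, so long recordings generally need to be segmented together with their transcripts. A minimal sketch for spotting such examples before training, assuming each example is a dict with a `labels` list of token ids (the key name is an assumption):

```python
MAX_LABEL_TOKENS = 448  # Whisper's decoder context length


def split_by_label_length(examples, max_tokens=MAX_LABEL_TOKENS):
    """Separate examples whose tokenized transcript fits the decoder from those that do not."""
    keep, too_long = [], []
    for example in examples:
        if len(example["labels"]) <= max_tokens:
            keep.append(example)
        else:
            too_long.append(example)
    return keep, too_long
```

Anything that lands in `too_long` would need its audio and transcript cut into shorter aligned pieces before it can be used.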

surreal plover Jul 18, 2024, 5:13 AM

#

Your current model is set up expecting all files in the batch to have the same dimensions between both audio files. The simplest fix would be to either pad or truncate the files. A simple python script could automate this if you don't want to get into changing what the batch expects.

willow current Jul 18, 2024, 5:17 AM

#

surreal plover Your current model is set up expecting all files in the batch to have the same d...

I already tried both methods, padding and truncating, and I still get the same error

surreal plover Jul 18, 2024, 5:18 AM

#

Sounds like there is an error in how you're padding and/or truncating. I'd add some logging and check the length of the files after they have been padded, etc.

willow current Jul 18, 2024, 5:20 AM

#

@surreal plover could you give me some sample code for your padding and truncation logic? I'll go through it and follow your method. Would you help me with this?

covert kraken Jul 19, 2024, 4:31 AM

#

Hi,
I want to optimize a whisper model to run locally on an Android device.
What is the best approach for optimizing whisper?
Which model is the best to use as the base?
How much can it be optimized while maintaining accuracy?

surreal wing Jul 21, 2024, 4:02 PM

#

Hi, I have problems with whisper. Whisper works well this way:


```import whisper
model = whisper.load_model("base")
result = model.transcribe("audio2.wav")
print(result["text"])```

But when I want to use it in my code, I get errors. Here a part of my code : 
```import whisper
import sounddevice as sd
import numpy as np
import torch  # Assurez-vous que PyTorch est importé

def voice_model(language='en-EN', mic_index=0,
                voice_id='HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\MSTTS_V110_enGB_HazelM'):
    working = True
    model = whisper.load_model("base")
    engine = pyttsx3.init()
    engine.setProperty('voice', voice_id)

    if language == 'fr-FR':
        print("Apuuyez sur * sur votre clavier pour mettre en pause la conversation")
    else:
        print("Press * on your keyboard to pause the conversation")

    def talk(text):
        engine.say(text)
        engine.runAndWait()

    def listen():
        command = ''
        try:
            # Définir la durée de l'enregistrement en secondes
            duration = 5  # Durée de l'enregistrement en secondes
            samplerate = 16000  # Fréquence d'échantillonnage de l'audio

            # Enregistrer l'audio du microphone
            print('Assistant :')
            audio = sd.rec(int(samplerate * duration), samplerate=samplerate, channels=1, dtype='int16')
            sd.wait()  # Attendre que l'enregistrement soit terminé
            audio = np.squeeze(audio)  # Assurez-vous que l'audio est en mono

            # Convertir l'audio en tensor PyTorch et en type flottant
            audio_tensor = torch.tensor(audio).float()  # Convertir en tensor flottant

            # Charger le modèle whisper et transcrire l'audio
            model = whisper.load_model("base")
            result = model.transcribe(audio_tensor)
            command = result["text"]
        except Exception as e:
            print(f"Erreur lors de la transcription Whisper: {e}")
        return command```

#

In English: ```import whisper
import sounddevice as sd
import numpy as np
import torch  # Make sure PyTorch is imported

def voice_model(language='en-EN', mic_index=0,
                voice_id='HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\MSTTS_V110_enGB_HazelM'):
    working = True
    model = whisper.load_model("base")
    engine = pyttsx3.init()
    engine.setProperty('voice', voice_id)

    if language == 'fr-FR':
        print("Press * on your keyboard to pause the conversation")
    else:
        print("Press * on your keyboard to pause the conversation")

    def talk(text):
        engine.say(text)
        engine.runAndWait()

    def listen():
        command = ''
        try:
            # Set recording duration in seconds
            duration = 5  # Recording duration in seconds
            samplerate = 16000  # Audio sampling frequency

            # Record microphone audio
            print('Assistant:')
            audio = sd.rec(int(samplerate * duration), samplerate=samplerate, channels=1, dtype='int16')
            sd.wait()  # Wait for recording to finish
            audio = np.squeeze(audio)  # Make sure audio is mono

            # Convert audio to PyTorch tensor and floating type```

#


```audio_tensor = torch.tensor(audio).float()  # Convert to floating tensor

            # Load whisper model and transcribe audio
            model = whisper.load_model("base")
            result = model.transcribe(audio_tensor)
            command = result["text"]
        except Exception as e:
            print(f"Whisper transcription error: {e}")
        return command```

rugged narwhal Jul 21, 2024, 7:16 PM

#

surreal wing ```audio_tensor = torch.tensor(audio).float() # Convert to floating tensor ...

And what is the actual error here?

#

Please post a traceback

#

Anyone here with the same problem? It all started when I included a prompt for the API, something like this:

prompt = (
                "Please provide a clean and formatted transcription of the following audio:\n"
                "The transcription should include punctuation and capitalization to make the text readable."
            )

The API then goes wild sometimes and returns "If you have any questions or comments", instead of transcribing the audio submitted. Sometimes it works, sometimes it does not. The fact that there's also no "If you have any questions or comments" in the MP3 file at all makes me question it even more

gilded oasis Jul 24, 2024, 12:32 PM

#

any tips on improving Greek language recognition on whisper and faster-whisper?

celest wolf Jul 25, 2024, 9:30 AM

#

rugged narwhal Anyone here with the same problem? It all started when I included a prompt for t...

Check the audio levels and overall quality. Whisper behaves this way when it hears silence, likely because it was trained on videos that ended with subtitles containing such expressions but no audible audio.

rugged narwhal Jul 25, 2024, 11:31 AM

#

celest wolf Check the audio levels and overall quality. Whisper behaves this way when it hea...

Yeah, that's what I thought. However, if I submit the file multiple times, it gets the lyrics right. The audio is also pretty clean/clear if you ask me. It did only happen more often when giving a prompt alongside the initial audio

#

Thanks for responding though! Love

celest wolf Jul 25, 2024, 11:59 AM

#

rugged narwhal Yeah, that's what I thought. However, if I submit the file multiple times, it ge...

That's correct because it's something like confabulation. It doesn't happen every time.

I have a Voice UI that transcribes as "thank you for watching," "please subscribe to my channel," or even something in a language I don't understand when the user doesn't speak for a moment.
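One way to sidestep this is to skip the transcription call when a clip is essentially silent. A rough sketch, assuming the clip is available as 16-bit PCM samples in a plain list; the helper names are made up and the 0.01 threshold is an arbitrary placeholder that would need tuning against real recordings.

```python
import math


def rms_int16(samples):
    """Root-mean-square level of 16-bit PCM samples, scaled to the range 0.0-1.0."""
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def is_probably_silent(samples, threshold=0.01):
    """True if the clip is quieter than `threshold` (placeholder value)."""
    return rms_int16(samples) < threshold
```

A proper voice activity detector would likely be more robust than a plain level check, since background noise can be loud without containing speech.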

rugged narwhal Jul 25, 2024, 12:57 PM

#

celest wolf That's correct because it's something like confabulation. It doesn't happen ever...

Interesting to know, thank you! I mean, it's not something big so for my use cases that's fine, but I was just confused for a second lul

celest wolf Jul 25, 2024, 1:52 PM

#

rugged narwhal Interesting to know, thank you! I mean, it's not something big so for my use cas...

sure!

I have a picture representing this behavior. I activated the mic, and it deactivated after three seconds of detected silence, yet I still received a transcription.

split acorn Jul 25, 2024, 6:23 PM

#

Love it.

I've been playing with whisper and Google voice synthesis using SSML. It's pretty awesome with some of the new voices Google has for SSML. You can create a group conversation as a discussion using made-up actors: take the transcription, reprocess it with the OpenAI API to generate the conversation and SSML code, send it to Google and get the audio back. We've layered it on top of other videos as commentary and it was pretty cool.

Here is a sample of a completely unscripted conversation that used Whisper to take the conversation from the existing marketing video and create the dialog. The app allows you to add as many actors as you want and their roles. One person knows everything, one person is asking questions, and we've added a third who provides joke commentary to add levity to what was a boring video before it was processed. https://www.robsoninc.com/wp-content/uploads/2024/06/420OTU6uCnA-final.mp4

split acorn Jul 25, 2024, 6:44 PM

#

Best part is, it can run off any existing YouTube video. I've tried publicly available music videos (to test), which was just odd when you remove the existing voices, keep the music and add in a multi-person conversation talking about the lyrics. Next version is going to overlay pop-ups on the video to provide contextual information. It's all about increasing user engagement and creating new unique content.

sudden aspen Jul 27, 2024, 10:23 PM

#

frigid shale

set repetition penalty and use default temperature. why did you change it to 0.1 ?

subtle vortex Jul 28, 2024, 7:10 PM

#

hey
does whisper accept .mp4 video files directly for transcription?

late wave Jul 29, 2024, 5:45 PM

#

subtle vortex hey whisper does accept .mp4 video files directly to transcript?

yes!

#

https://platform.openai.com/docs/guides/speech-to-text
"File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm."
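A quick pre-flight check based on the limits quoted above, as a sketch only. It assumes "25 MB" means 25 × 1024 × 1024 bytes, and it only looks at the file extension, not the actual container or codec. `check_upload` is a made-up helper name.

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def check_upload(path):
    """Return a list of problems that would likely make the upload fail."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 25 MB")
    return problems
```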

alpine lion Aug 1, 2024, 11:23 AM

#

Hey guys, I have some issues when transcribing to WORD LEVEL, I get some words with the same start and end time, how can I solve that ?

#

Would love your help !

abstract viper Aug 7, 2024, 4:49 AM

#

You are best to run Whisper locally on your own GPU

abstract viper Aug 7, 2024, 4:51 AM

#

split acorn Best part is, It can run off any existing youtube video. I've tried publicly ava...

interesting. ..

split acorn Aug 7, 2024, 3:52 PM

#

abstract viper You are best to run Whisper locally on your own GPU

I use the GPU in the Mac M3. Been working great. Also tried a Nvidia card in a dedicated server, I like the Mac for dev better.

Check out Torchx / TensorFlow

https://medium.com/bluetuple-ai/how-to-enable-gpu-support-for-tensorflow-or-pytorch-on-macos-4aaaad057e74

Medium

How to enable GPU support for TensorFlow or PyTorch on MacOS

Enable GPU support on MacOs

feral haven Aug 8, 2024, 8:53 AM

#

that is so cool

alpine lion Aug 8, 2024, 9:12 AM

#

abstract viper You are best to run Whisper locally on your own GPU

well I can't, I'm hosting an app on Vercel so I prefer to use the API

turbid bay Aug 11, 2024, 9:18 PM

#

Did I mess up installing something?

stuck blaze Aug 16, 2024, 8:23 PM

#

Have you tried forcing a reinstall/update for Whisper? That should help make sure any dependencies are installed.

honest rampart Aug 17, 2024, 7:21 AM

#

i think try updating your pytorch module or go to the git repo and pip install -r requirements.txt

#

mate at this point i just tell people to upload pip list

#

dependency conflict is so freaking annoying honestly

sudden aspen Aug 22, 2024, 10:30 PM

#

whats the best tts ai ?

#

tts-1-hd not so good

tender rivet Aug 23, 2024, 4:45 PM

#

sudden aspen whats the best tts ai ?

GPT-4o 😆 sadly we can't use that via the API yet

sudden aspen Aug 23, 2024, 4:47 PM

#

tender rivet GPT-4o 😆 sadly we can't use that via the API yet

lol i'll record it

#

😛

tender rivet Aug 23, 2024, 4:47 PM

#

hopefully that ages badly and we get that sweet sweet voice mode on the API soon =P

buoyant olive Aug 26, 2024, 1:57 AM

#

what is whisper

tender rivet Aug 26, 2024, 4:38 PM

#

buoyant olive what is whisper

speech to text AI model made by OpenAI

buoyant olive Aug 26, 2024, 4:39 PM

#

speech to text? isnt that just on every phone/youtube?

agile niche Aug 28, 2024, 10:42 AM

#

Is there any workaround to transcribe a multilingual audio file ? I have french and english in the same audio file

latent sand Aug 28, 2024, 10:56 AM

#

trying to use whisper to subtitle an episode and running into an issue

#

whisper via the api works great for picking up silence and properly segmenting the text, but the timings are all off

#

and so now I'm trying to use whisperx, where the timings are correct, but it does a much poorer job with breaking up the text by speaker

#

anyone have any familiarity w this and/or recommendations?

latent sand Aug 28, 2024, 12:07 PM

#

actually I'll just write out the srt myself based on the word timings...
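A minimal sketch of that approach: group word timings into cues and write them out in SRT format. It assumes each word is a dict with `word`, `start` and `end` (seconds), which is how word-level timestamps are commonly shaped; the `max_gap` and `max_words` values are arbitrary starting points, not tuned settings.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_gap=0.6, max_words=10):
    """Start a new cue after a pause longer than `max_gap` or after `max_words` words."""
    cues, current = [], []
    for word in words:
        if current:
            pause = word["start"] - current[-1]["end"]
            if pause > max_gap or len(current) >= max_words:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w["word"].strip() for w in cue)
        start = srt_timestamp(cue[0]["start"])
        end = srt_timestamp(cue[-1]["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Splitting on pauses keeps the segmentation tied to the word timings themselves, so the cue boundaries can be adjusted without re-running the transcription.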

split acorn Aug 28, 2024, 5:41 PM

#

alpine lion well I can't, I'm hosting an app on Vercel so I prefer to use the API

You can use the API locally, you just need the *nix environment.

split acorn Aug 28, 2024, 5:42 PM

#

tender rivet GPT-4o 😆 sadly we can't use that via the API yet

You can... if you're a paying subscriber.

split acorn Aug 28, 2024, 5:44 PM

#

latent sand and so now I'm trying to use whisperx, where the timings are correct, but it doe...

I could help, I've done a lot of Whisper -> Transcription -> OpenAI API to process and generate SSML -> process with OpenAI TTS and add back over the existing video. In any language. (More with TTS using Google Speech.) You just need a Google Dev API account with the service enabled to use that.

tender rivet Aug 28, 2024, 5:59 PM

#

split acorn You can... if you're a paying subscriber.

GPT-4o voice features are not available, there is no API endpoint for it, even if you are a subscriber of ChatGPT, the ChatGPT subscription is a completely separated service to the API and a subscription on ChatGPT does not grant any API usage

split acorn Aug 28, 2024, 6:12 PM

#

tender rivet GPT-4o voice features are not available, there is no API endpoint for it, even i...

You don't need GPT4o to use OpenAI TTS - I mean there would be zero point or benefit.

If you want to create new, better content off the transcript you would use GPT4o to generate either the SSML (if you want to use Google Voice) or you can just send the text to OpenAI TTS and let it do its thing. Which I found to be much more human sounding.

https://platform.openai.com/docs/api-reference/audio

I used tts-1-hd

tender rivet Aug 28, 2024, 6:20 PM

#

split acorn You don't need GPT4o to use OpenAI TTS - I mean there would be zero point or ben...

In the context of the message you replied to, it was talking about the upcoming voice features that have been showcased by OpenAI on ChatGPT. These features are not available on the API. Those features are in fact not even available to ChatGPT subscribers on ChatGPT except for a small amount of beta testers.

split acorn Aug 28, 2024, 6:34 PM

#

How do you see the features being introduced not programmatically available via the API now? Genuine question.. The issues as I see it is just speed for realtime conversation.

You need an OpenAI subscription to access the API, not ChatGPT Plus. - right?

TTS is included with the OpenAI subscription.

latent sand Aug 28, 2024, 9:49 PM

#

split acorn I could help, I've done a lot of Whisper -> Transcription -> to OpenAI API to Pr...

any tips would b appreciated, can show you what I'm looking at

#

left is from openai's API, right is from whisperx run locally

#

the api did a better job with breaking up sentences with silence/no speech between them

#

whisperx kinda just groups them together if they were close enough

#

i know the latter is basically just whisper under the hood, but I'm not sure which parameters were used to achieve the former (for whisper, i.e. past the api)

#

a clear difference is the punctuation obv but I have no idea how to control that hehe

#

toyed with params like chunk_size no_speech_threshold max_line_count and only the first one of those made a significant difference, though its effect isn't exactly what I'm looking for

split acorn Aug 29, 2024, 7:41 PM

#

I found OpenAI TTS-HD to be more realistic than Google Voice with SSML.

#

But Google has a lot more languages/accents and there are some new - pretty great sounding voices but the SSML can be tricky. I had issues when adding in two GenAI people in to a conversation.

woven pier Aug 29, 2024, 8:54 PM

#

thinktate

prime prism Sep 2, 2024, 6:57 AM

#

woven pier :thinktate:

ah yeah, seems like it's obsessed with Amara.org, it always says that 😄

junior briar Sep 2, 2024, 6:10 PM

#

woven pier :thinktate:

it hallucinates when there is silence

woven pier Sep 2, 2024, 6:12 PM

#

junior briar it hallucinates when there is silence

I had never seen that one before

pastel cradle Sep 2, 2024, 6:31 PM

#

I had never seen the "Amara" message before either.

junior briar Sep 3, 2024, 12:54 AM

#

woven pier I had never seen that one before

send an empty audio clip

#

i noticed it when i was building a STT/TTS agent and sometimes i’d accidentally send an empty message to the agent

deep cradle Sep 3, 2024, 7:55 AM

#

hi, I am trying to transcribe some audio, but whisper seems to skip a big chunk of it every time. Other audio works fine, but in this one it skips a big chunk.

the audio is about 56 sec

if you listen to the audio, at the start the lady says "anyway okay let's move on so what I'm doing right now", and after that the whole chunk is skipped until "did you figure it out"

https://filebin.net/mnk9e6qmy5nfm9oa

TimeStamp

00:05 - 00:30 audio is skipped every time; I transcribed it 3 times and got the same response

transcriptions

anyway okay let's move on so what I'm doing right now did you figure out what I was saying there or you just bought the test like pretty much everybody does because now we have some groups that are supposed to help people guide people but actually what happens is that sometimes people go into their small private discussions and when somebody wants to do the test they just ask someone from that group

Filebin | mnk9e6qmy5nfm9oa

Convenient file sharing. Registration is not required. Large files are supported.

flint temple Sep 13, 2024, 5:59 AM

#

Hey, trying to get whisper.cpp working on windows. anyone been able to compile with cmake? for whatever reason manually compiling the files doesn't work as it doesn't seem to use any backend and therefore doesn't work. i can't compile any example with cmake other than the main one and i wanna do more than that. any help?

gilded oasis Sep 17, 2024, 11:55 AM

#

I have a self hosted whisper server used to transcribe Greek news videos (Large-v3 model), however I am experiencing an issue where sometimes, after the speaker changes, the transcription is lost.
Sometimes it comes back after a sentence or two, other times it does not understand the 2nd speaker at all.
Audio quality is excellent for both speakers and neither has a thick accent or is unintelligible.
Is there anything I can do to improve this?

rustic ginkgo Sep 25, 2024, 4:28 PM

#

Hey everyone!

Does anyone else here find the 25MB max file size for whisper to be a problem?
If it is an issue for anyone else, I thought I'd share with you a simple api for transcoding videos / audios in a variety of formats into opus ogg, which greatly reduces the file size.

Some tests I ran were able to reduce 1GB (video) mp4 file into a ~15 MBs audio, and 50MBs mp3 audios into ~ 2MB files, and the transcription with whisper worked perfectly!

It's a very simple api (only one file), that can be run/deployed with a docker container.
You can find all instructions (including, deploying it to fly.io) at the repo:
https://github.com/vfssantos/ffmpeg-deno-microservice

GitHub

GitHub - vfssantos/ffmpeg-deno-microservice

Contribute to vfssantos/ffmpeg-deno-microservice development by creating an account on GitHub.

sleek valve Sep 26, 2024, 4:35 PM

#

Is whisper-1 the only whisper model?

rocky ruin Oct 3, 2024, 5:05 PM

#

Hi I'm trying to generate translated subtitles for a non-english video using whisper but using --task translate doesn't seem to do anything what am I missing

rocky ruin Oct 3, 2024, 5:34 PM

#

i found the issue turbo doesn't have the translation capabilities

neon silo Oct 4, 2024, 1:21 AM

#

sleek valve Is whisper-1 the only whisper model?

https://github.com/openai/whisper/discussions/2363 large-v3-turbo is the latest

GitHub

`turbo` model release · openai whisper · Discussion #2363

We’re releasing a new Whisper model named large-v3-turbo, or turbo for short. It is an optimized version of Whisper large-v3 and has only 4 decoder layers—just like the tiny model—down from the 32 ...

unique veldt Oct 7, 2024, 6:10 AM

#

the api docs says ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

https://platform.openai.com/docs/api-reference/audio/createTranscription

#

looks like large-v3-turbo is not yet available :/

manic cliff Oct 11, 2024, 4:52 PM

#

How does one transcribe an audio file with multiple speakers, and have the AI distinguish between them?? Examples: podcasts, meeting call recordings

hollow orchid Oct 12, 2024, 8:07 PM

#

You will need to do speaker diarization. Whisper doesn't support this itself but you can use libraries like pyannote https://github.com/pyannote/pyannote-audio or if you are okay with paying you can check assembly.ai they have this available via api and its fairly cheap if your usage is limited.
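Once a diarization tool has produced speaker turns, they still have to be matched to the transcript. A minimal sketch of one common approach, assigning each transcript segment to the speaker whose turn overlaps it most. The input shapes are assumptions for illustration: segments as dicts with `start`, `end` and `text`, turns as `(start, end, speaker)` tuples. This is not the API of pyannote or any other library.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for segment in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(segment["start"], segment["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Segments with no overlapping turn keep speaker=None
        labeled.append({**segment, "speaker": best_speaker})
    return labeled
```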

GitHub

GitHub - pyannote/pyannote-audio: Neural building blocks for speake...

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding - GitHub - pyannote/pyannote-audio: Neural build...

dapper dirge Oct 22, 2024, 8:01 PM

#

unique veldt the api docs says `ID of the model to use. Only whisper-1 (which is powered by o...

Whisper large-v3-turbo is available. The model is available on huggingface

Here is the link - https://huggingface.co/openai/whisper-large-v3-turbo

openai/whisper-large-v3-turbo · Hugging Face

rapid spindle Nov 2, 2024, 1:08 PM

#

hollow orchid You will need to do speaker diarization. Whisper doesn't support this itself but...

thanks...that' s interesting.

#

Does anyone know if there will be a new version or a complete remake of whisper when the audio version of chatgpt becomes available?

#

i'm waiting for that info before i make an investment....it would really hurt to invest only for a completely new version of whisper to come out a few days later....it's for an STT project

#

if anyone has any news, it would be great, thanks!

upbeat marten Nov 3, 2024, 6:43 AM

#

Hi all. Any tips or resources for generating an srt, given an mp3 file and its lyrics?

I thought about using whisper, but sometimes the lyrics it hears are wrong -- eg when the song audio isnt too clear or words are in other languages --

upbeat marten Nov 6, 2024, 11:22 PM

#

upbeat marten Hi all. Any tips or resources for generating an srt, given an mp3 file and its l...

So.... many... crickets...

neat cave Nov 9, 2024, 3:24 PM

#

What is the best wrapper you have seen to run whisper locally on mac?

Atm I use flowvoice.ai but I would prefer running something open source if possible

rocky dawn Nov 13, 2024, 8:27 AM

#

neat cave What is the best wrapper your have seen to run whisper locally on mac? Atm I us...

simplest would be to do a single http request with curl, or if you install the OpenAI Python library, a simple Python script:

```from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

THE_PATH = "[path goes here]"
THE_NAME = "[file name goes here]"

# Send the audio to the API and ask for SRT-formatted output
with open(THE_PATH+THE_NAME+".wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt"
    )

print(transcription)
# Save the transcript next to the audio file
with open(THE_PATH+THE_NAME+'.txt', 'w') as f:
    print(transcription, file=f)```

#

(i just added THE_PATH and THE_NAME to make this example clear)

#

also you can install ffmpeg and convert the audio to .wav first which can prevent weirdness with server side conversion:
ffmpeg -i <audio> -ar 16000 -ac 1 -c:a pcm_s16le <output>.wav

#

if you divide it into multiple audio files you can use a GPT to combine the transcript files smoothly

rocky dawn Nov 13, 2024, 9:10 AM

#

Also you could create your own Whisper app for Mac by telling o1-preview:
Please write me a Swift app for Mac that uses SwiftUI for the UI, and Combine for asynchronous data and UI. The app should let me select an audio file, convert it to a transcript with an OpenAI Whisper http request, and then put it into a text editor window so i can edit it. Please add whatever other features you think will make it amazing. Please write out all source files 100% completely three files at a time, and ask permission to continue.

#

There are many ways

neat cave Nov 14, 2024, 8:45 AM

#

rocky dawn simplest would be to do a single http request with curl, or if you install the O...

Thanks! I still think the simple way would be an existing open source solution. It seems like a pretty straightforward case and whisper is not really new so I guess there should be something already out there that is plug and play

lethal shoal Nov 27, 2024, 7:37 PM

#

Hey everyone! I have a short question (maybe not so short, haha) about fine-tuning Whisper models on my own labelled data. Following the general Jupyter notebook on Hugging Face, there seem to be no preprocessing steps for the transcriptions that are generated during training and used for evaluation over the course of training. Is it necessary to add this step to the training somehow?
What I mean:
For example, in my own fine-tuning dataset all letters are lowercase and there are no punctuation marks, only letters and whitespace. After fine-tuning starts, the current model gets evaluated, say, every 200 steps. Isn't it possible that the model will generate output containing uppercase and punctuation characters, so that the WER comes out higher than it would if the output were preprocessed after being generated?
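
To make the concern above concrete, here is a minimal sketch of normalizing both the reference and the generated text before computing WER. The `normalize` and `wer` helpers are hypothetical names written for this example, not part of the Hugging Face notebook; in a real run something like them would be called inside whatever metric function the training loop uses.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes a space
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(ref), 1)


reference = "hello how are you"      # lowercase, no punctuation, like the dataset
prediction = "Hello, how are you?"   # what the model might generate

raw_wer = wer(reference, prediction)
normalized_wer = wer(normalize(reference), normalize(prediction))
print(raw_wer, normalized_wer)
```

In this toy case the casing and punctuation alone account for half of the words being counted as errors, which is the effect described in the question.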

naive plinth Dec 5, 2024, 10:51 AM

#

hi I am new to the channel, so forgive me if it has been answered before. Has anybody tried to use GPT-4o's native audio-to-text to transcribe audio? How does it compare with Whisper? How good is it for long context? Can it somehow do speaker identification?

crystal jetty Dec 7, 2024, 3:28 PM

#

hi is it possible to directly summarize an audio with whisper without a previous transcription?

dapper bridge Dec 8, 2024, 5:34 PM

#

Hey, I've recently discovered a problem with using two different versions of Whisper with CUDA.
I have installed whisperx and whisper-ctranslate2 in two different conda environments.

CUDA v12.5 and v11.8 are installed in Windows.
CUDNN v8.9.7 and v9.6 are installed in Windows.
Both of them are included in environment variable PATH.
Additionally I have two CUDA_PATH and CUDA_PATH_V11_8 set up.

whisperx works correctly with CUDA and spits out transcripts perfectly
whisper-ctranslate2 recognizes the file but refuses to transcribe it - it crashes with no error. It only works with the --device CPU argument on the command line.
So, it looks like whisper-ctranslate2 has some problems with reading CUDA and CUDNN files, but I have no idea how to force it to recognize files correctly.

(I remember that on my previous Windows installation, the problem was reversed - whisper-ctranslate2 was working with CUDA and whisperx was not having any of it)

Any ideas how to fix it?

misty cove Dec 8, 2024, 8:26 PM

#

crystal jetty hi is it possible to directly summarize an audio with whisper without a previous...

probably not but you can use any python package for this summarization

golden sage Dec 12, 2024, 5:07 PM

#

Hey,

We've recently added Intel® Gaudi® support to the Whisper repo! 🎉
Check out the GitHub discussion for more details - https://github.com/openai/whisper/discussions/2463

Let us know your thoughts or any feedback in the thread!

GitHub

Introducing Intel® Gaudi® Support · openai whisper · Discussion #24...

Introduction We are excited to announce that we have opened a pull request (#2450) on the Whisper GitHub repository to add support for Intel Gaudi. This enhancement aims to improve performance and ...

whole tangle Dec 19, 2024, 9:45 AM

#

I have seen that midjourney allow to generate img in relaxed mode for free users! Is that true

crystal copper Dec 19, 2024, 8:47 PM

#

Is there documentation on how to tune the ServerVAD API for PSTN calls? The default seems to be really sensitive

worldly lantern Dec 28, 2024, 7:20 PM

#

Hello guys, I need some help. I don't know any coding, but through ChatGPT I somehow got Whisper integrated in my terminal. When I ran the prompt to get a speech-to-text transcription, I got "Error calling OpenAI Audio API: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.", even though I have the ChatGPT $20/month subscription

ornate river Dec 28, 2024, 8:20 PM

#

worldly lantern Hello guys, I need some help. I don't know any coding but through chatgpt, I som...

The API does not use the ChatGPT account. It is separate, with separate billing.

worldly lantern Dec 28, 2024, 8:24 PM

#

Ohh okay

frozen spoke Jan 4, 2025, 6:52 PM

#

hey, i'm currently trying to integrate whisper locally to save on some api call costs, and i've followed the github repo's instructions on how to load and set up the module and model.

this is my code:

import whisper
model = whisper.load_model("tiny")

This is the error it raises:
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1000)>

Anyone know how to fix?

#

Oh - running python 3.12.8 in a venv using VSCode

shrewd rose Jan 7, 2025, 2:23 PM

#

frozen spoke hey, i'm currently trying to integrate whisper locally to save on some api call ...

Are you running this from a public or a school or work computer ?

#

Or public wifi ?

frozen spoke Jan 9, 2025, 3:13 AM

#

shrewd rose Are you running this from a public or a school or work computer ?

Personal device on personal wifi. I tried on python 3.9 and it worked.

shrewd rose Jan 9, 2025, 7:33 AM

#

frozen spoke Personal device on personal wifi. I tried on python 3.9 and it worked.

Oh good then. Lmk if there's anything else

umbral palm Jan 10, 2025, 10:06 PM

#

Can whisper detect um, ah and other filler words? Could you help me with what I should be putting in as a prompt so that I can detect it. I need it for the app I am building.

surreal dragon Jan 12, 2025, 7:22 PM

#

umbral palm Can whisper detect um, ah and other filler words? Could you help me with what I ...

I suggest setting normalize=False, and yes! If you prompt with something like "Umm, let me think like, hmm... Okay, here's what I'm, like, thinking." It's at least somewhat more likely to include those sorts of filler words, but maybe not all of them

#

if you search for "allow whisper to detect filler words" there are a few good conversations on various online forums for tricks and methods of doing so :)

umbral palm Jan 12, 2025, 7:46 PM

#

surreal dragon I suggest setting normalize=False, and yes! If you prompt with something like "U...

This is not working for me, even the prompt is not working.

#

I don't think there is a parameter for normalisation
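
Whatever ends up getting the fillers into the transcript, the detection step itself can be plain text matching. A minimal sketch, assuming the transcript already contains the filler words; the `FILLERS` list and `count_fillers` name are made up for this example:

```python
import re
from collections import Counter

# Hypothetical filler list; extend it for your own speakers and language.
FILLERS = ("um", "umm", "uh", "ah", "er", "hmm")


def count_fillers(transcript: str) -> Counter:
    """Count whole-word filler occurrences, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w in FILLERS)


counts = count_fillers("Umm, let me think like, hmm... Okay, um, here's what I'm thinking.")
print(counts)
```

This does nothing about Whisper dropping fillers in the first place; it only counts the ones that survive.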

surreal dragon Jan 13, 2025, 6:43 AM

#

umbral palm This is not working for me, even the prompt is not working.

This video goes in depth on how you can do something like you're trying to achieve: https://youtu.be/pUzBuwjvH9E

YouTube

Jeffrey Chupp

Stop saying “um” - Building an AI-powered Filler Word Detector

Let's build an AI-powered filler-word detector to play an air horn when we say "um" or "uh." We'll use powerful off-the-shelf tools and wire them together with minimal code.

00:00 - Intro
01:17 - Getting Started
03:17 - Recording in chunks
06:15 - Roughing out recognition
07:16 - Detecting filler words
15:16 - Adding the air horn
16:56 - Faster...

mighty beacon Jan 15, 2025, 11:23 PM

#

I am using whisper on Windows 10 and have set it up the way I like. Now I just drop the files I want to transcribe onto it, and it does so, dropping the translations into the folder of the file it's transcribing.

I wanted to ask if there's a command I can add to the *.bat file that would do the following: I want the audio file I drop onto the .bat to be moved to another folder after it's completed.

Right now I just do it manually, but I wanted to automate it.
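
One way to sketch this, in Python rather than batch: run the transcription, and only if it succeeds move the audio file. This assumes the `whisper` command is on PATH; `DONE_DIR`, `move_when_done` and `transcribe_then_move` are hypothetical names for this example.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical destination folder; change it to wherever finished audio should go.
DONE_DIR = Path("transcribed")


def move_when_done(audio: Path, done_dir: Path = DONE_DIR) -> Path:
    """Move an audio file into done_dir and return its new path."""
    done_dir.mkdir(parents=True, exist_ok=True)
    target = done_dir / audio.name
    shutil.move(str(audio), str(target))
    return target


def transcribe_then_move(audio: Path) -> Path:
    """Run whisper on the file, then move it only if whisper exited cleanly."""
    # Assumes the `whisper` command is on PATH; add your usual options here.
    subprocess.run(["whisper", str(audio)], check=True)  # raises if whisper fails
    return move_when_done(audio)
```

Because of `check=True`, a failed transcription raises an error and the file stays where it was, so nothing gets moved away untranscribed.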

paper obsidian Jan 17, 2025, 3:51 AM

#

Is there any way I can have a taxonomy included in whisper? I have some banking terms like TMRW (pronounced as tomorrow) and UOB, and I don't get these terms when they are spoken. Can whisper identify these terms based on the context? I tried with an initial prompt but that doesn't work out really well. Any suggestions or ideas?

prime escarp Jan 20, 2025, 3:30 PM

#

What's the best way to do speaker diarization via the API for a react native app?

surreal dragon Jan 21, 2025, 3:59 PM

#

prime escarp What's the best way to do speaker diarization via the API for a react native app...

This isn't directly offered via the whisper API and you will likely need a middleware or additional step for diarization

surreal dragon Jan 21, 2025, 4:00 PM

#

paper obsidian Is there anyway I can have taxonomy included in the whisper? I have some banking...

A very strict prompt structure may help here, I find with whisper specifically putting directions and examples in a format similar to XML works pretty well

prime escarp Jan 21, 2025, 4:13 PM

#

surreal dragon This isn't directly offered via the whisper API and you will likely need a middl...

Thanks, any recommendations?

magic forge Jan 25, 2025, 3:33 PM

#

Any best practices to avoid hallucination with whisper? Nothing I try works for a specific audio file

quiet condor Jan 27, 2025, 11:38 PM

#

what is whisper?

#

i dont think ive heard of this

#

or maybe just not realised

tardy sonnet Jan 28, 2025, 12:04 AM

#

quiet condor what is whisper?

Whisper is a speech-to-text model

quiet condor Jan 28, 2025, 12:06 AM

#

tardy sonnet Whisper is a speech-to-text model

ohhhh

#

ty

quaint spire Jan 28, 2025, 1:05 PM

#

Guys is there an app that lets you run the Whisper Large V3 model for free on MacOS?

queen valve Jan 28, 2025, 4:44 PM

#

quaint spire Guys is there an app that lets you run the Whisper Large V3 model for free on Ma...

```
brew install python
pip install openai-whisper
whisper --model=large-v3 whatever.mp3
```

queen valve Jan 28, 2025, 4:48 PM

#

queen valve ``` brew install python pip install openai-whisper whisper --model=large-v3 what...

If you do whisper --help it says the default is --model=turbo:

```
MODELS = {
...
    "large-v3": "https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt",
...
    "turbo": "https://openaipublic.azureedge.net/main/whisper/models/aff26ae408abcba5fbf8813c21e62b0941638c5f6eebfb145be0c9839262a19a/large-v3-turbo.pt",
}
```

so it defaults to large-v3-turbo

quaint spire Jan 28, 2025, 4:48 PM

#

Thank you!

queen valve Jan 28, 2025, 4:49 PM

#

I wrote a thing for transcribing to the terminal from the default mic pip install catvox

#

you can pass --model to that, it uses whisper. Currently working on getting it to recognise who is talking, so it can just show me my own speech, so I can send that to an assistant.

#

https://bitplane.net/dev/python/catvox/
if anyone is interested in helping make it work, current pipeline redesign plan is here: https://github.com/bitplane/catvox/tree/pipeline

GitHub

GitHub - bitplane/catvox at pipeline

transcribe and pipe your voice to stdout. Contribute to bitplane/catvox development by creating an account on GitHub.

quaint spire Jan 28, 2025, 4:55 PM

#

That's awesome

queen valve Jan 28, 2025, 4:57 PM

#

Thanks. Building an automatic pipeline builder is a headache. Been scratching my head over it for ages

agile niche Jan 28, 2025, 10:51 PM

#

Are there any best practices to avoid having wrong timestamps via Whisper API when transcribing in SRT format ? I notice huge offsets on the timestamps...

queen valve Jan 30, 2025, 5:18 AM

#

you can get word-level timestamps and then consolidate. that gives me more accurate results
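
A minimal sketch of that consolidation step, assuming the words arrive as dicts with `word`, `start` and `end` keys (seconds). The grouping rule (split on a pause longer than `max_gap` or after `max_words` words) and both thresholds are assumptions for illustration, not anything Whisper prescribes.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def words_to_srt(words, max_words=8, max_gap=0.6):
    """Group word-level timestamps into cues, splitting on long pauses or cue length."""
    cues, current = [], []
    for w in words:
        too_long = len(current) >= max_words
        paused = bool(current) and w["start"] - current[-1]["end"] > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w["word"].strip() for w in cue)
        timing = f"{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}"
        blocks.append(f"{i}\n{timing}\n{text}\n")
    return "\n".join(blocks)


words = [
    {"word": "hi", "start": 0.0, "end": 0.3},
    {"word": "there", "start": 0.35, "end": 0.7},
    {"word": "welcome", "start": 2.0, "end": 2.5},
]
srt = words_to_srt(words)
print(srt)
```

Each cue's start and end come from its first and last word, so a cue can no longer stretch across a long silence the way the segment-level timestamps sometimes do.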

agile niche Feb 1, 2025, 9:44 PM

#

Are you guys using Silero VAD to exclude silences of the audio chunks sent to Whisper API ?

#

Currently doing the following : Silero VAD --> Whisper API --> Aeneas

last ridge Feb 4, 2025, 10:11 PM

#

what do i have to do ?

Capture_decran_2025-02-04_a_18.02.35.png

fast drift Feb 5, 2025, 7:26 AM

#

I am using the OpenAI whisper model and have observed the following issues:

- When I don't say anything, it generates random words instead of giving an empty response
- Sometimes it detects incorrect words even where the audio is clear

Can anyone please advise me on how to improve this?

agile niche Feb 5, 2025, 2:21 PM

#

fast drift Am using openai whisper model and observed following issues - When I don't speak...

Nobody answered me on this before. Here is why it happens: Whisper does not handle silences properly, which induces hallucinations such as repetitions, triple dots and so on (very funky stuff).
What you need to do is use a VAD tool like Silero-VAD to cut your audio into voice-only chunks and send these to be transcribed.
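
The cutting step can be sketched independently of the VAD itself. The example below assumes the VAD returns a list of dicts with `start` and `end` sample indices (the shape Silero-VAD's speech timestamps are usually shown in); `merge_segments`, `cut_chunks` and the padding and gap values are hypothetical choices for illustration.

```python
def merge_segments(segments, sample_rate=16000, max_gap_s=0.5, pad_s=0.2):
    """Pad speech segments, then merge any that overlap or sit closer than max_gap_s."""
    gap = round(max_gap_s * sample_rate)
    pad = round(pad_s * sample_rate)
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        start = max(seg["start"] - pad, 0)
        end = seg["end"] + pad
        if merged and start - merged[-1]["end"] <= gap:
            merged[-1]["end"] = max(merged[-1]["end"], end)
        else:
            merged.append({"start": start, "end": end})
    return merged


def cut_chunks(samples, segments):
    """Return one slice of samples per speech segment, dropping everything in between."""
    return [samples[s["start"]:s["end"]] for s in segments]


# Made-up VAD output for a 12.5 s clip at 16 kHz
vad_output = [
    {"start": 16000, "end": 32000},
    {"start": 35000, "end": 48000},
    {"start": 160000, "end": 176000},
]
merged = merge_segments(vad_output)
chunks = cut_chunks(list(range(200000)), merged)
print(merged)
```

Merging nearby segments keeps short pauses inside a sentence from splitting it into many tiny requests, while the long silence between the second and third segment is still dropped.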

amber schooner Feb 5, 2025, 3:48 PM

#

last ridge what i've to do ?

was this the whole error message?

fast drift Feb 5, 2025, 4:36 PM

#

agile niche Nobody answered me on this before. Here is why it happens : Whisper is not handl...

Thanks @agile niche for providing suggestion on how to handle silence
Have you faced any challenge where it detected wrong words

eg:
What I asked "hi can you assist me "
What it assumed "Hi Kenyas speak to me"

agile niche Feb 5, 2025, 4:37 PM

#

fast drift Thanks @agile niche for providing suggestion on how to handle silence H...

Yup, so a good practice is to ask ChatGPT to correct your whisper transcript by giving it guidelines

#

An efficient transcription pipeline is : Audio --> VAD --> transcription job --> correction job

#

Do you also need time stamps ?

fast drift Feb 5, 2025, 4:38 PM

#

I don't need timestamps

jaunty loom Feb 10, 2025, 1:47 PM

#

Hey folks,
My colleague raised a cool PR for whisper that boosts transcription performance and I was wondering if I could find anyone here who might be able to review the PR and give her some feedback? Apologies if this is the wrong place, here's a link if not https://github.com/openai/whisper/pull/2516

GitHub

Performance improvements for transcription (up to 20% faster transc...

Implements a suite of optimizations focusing on memory efficiency, tensor initialization, and model loading functionality. These changes improve performance, code clarity, and model handling flexib...

hollow silo Feb 15, 2025, 7:04 AM

#

hi, I just got the command example running using whisper. this is using ggml and the 'base' model. it's great, but only when i have no real ambient music on. something as simple as a kind of hangar background or something makes it not detect anything. do i adjust 'temperature' or anything?

#

i would train it based on how well a voice seems in context with any detected music, in the sample. that way you can have anything on, and as long as you arent nailing the right vocal tones with the music it would think its the speaker

paper obsidian Feb 20, 2025, 12:31 PM

#

Hi Folks, can the whisper base model be used for languages other than English for the real-time use case? Also, can it be used for non-English batch processing? Has anyone tried, and how is the accuracy?

paper obsidian Feb 25, 2025, 5:40 AM

#

Hi Guys, we are working on a real-time STT using whisper base model, this is for a conversation between agent and customer. 2 websockets connection to the backend STT engine. The latency is little high as I understand whisper is not meant for real-time, but is there any way we can enhance this?

Also, the WER is around 35-40%, any ways to reduce this? The language is mostly Singapore English.

wanton girder Feb 25, 2025, 3:09 PM

#

hii i was wondering if whisper could run on any microcontrollers?

#

im planning to use it in a project with an esp32, but idk if itll run on it and theres barely anything on the internet about it

manic cliff Mar 4, 2025, 3:02 AM

#

So I keep getting these markers for blankness or silence. I'll get stuff like this:

00:26:56.599 --> 00:27:07.599
Silence.

00:27:08.599 --> 00:27:35.599
Silence.

00:27:36.599 --> 00:27:49.599
Silence.

Or

00:26:56.599 --> 00:27:07.599
...

00:27:08.599 --> 00:27:35.599
...

00:27:36.599 --> 00:27:49.599
...

But it's not the same. It varies by each output file. Each one has its own idea of conveying "silence" or blankness. But the thing is, I really don't want this in my stuff. Is there a way to get it to not do this? Can I just put it into the prompt? Will that work?
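
Since the placeholder text varies between files, one option is to filter it out after the fact. A minimal sketch for VTT-style output like the samples above; the `DROP` set and `strip_silence_cues` name are hypothetical, and the sample dialogue line is made up.

```python
import re

# Hypothetical placeholder texts to drop (compared after removing punctuation);
# extend it with whatever your own output files contain.
DROP = {"", "silence", "blank", "no audio"}


def strip_silence_cues(vtt: str) -> str:
    """Remove cues whose text is only a silence placeholder such as 'Silence.' or '...'."""
    kept = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        # The cue text is everything after the line holding the '-->' timing
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is not None:
            text = " ".join(lines[timing + 1:])
            if re.sub(r"[^\w\s]", "", text).strip().lower() in DROP:
                continue
        kept.append(block)
    return "\n\n".join(kept) + "\n"


vtt = (
    "WEBVTT\n\n"
    "00:26:50.000 --> 00:26:56.000\nRoll for initiative.\n\n"
    "00:26:56.599 --> 00:27:07.599\nSilence.\n\n"
    "00:27:08.599 --> 00:27:35.599\n...\n"
)
cleaned = strip_silence_cues(vtt)
print(cleaned)
```

This only removes cues that announce silence. It will not catch a real word hallucinated over dead air, which needs the silence removed before transcription.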

#

Also what it does with dead space on a track is actually hilarious. Here's what I got on dead space on the track of one of my players:

00:06:14.000 --> 00:06:32.000
Yeah.

00:06:32.000 --> 00:06:38.000
Yeah.

00:07:02.000 --> 00:07:12.000
Yeah.

00:07:12.000 --> 00:07:22.000
Yeah.

00:07:22.000 --> 00:07:40.000
Yeah.

00:07:40.000 --> 00:07:50.000
Yeah.

00:07:50.000 --> 00:08:00.000
Yeah.

00:08:00.000 --> 00:08:10.000
Yeah.

00:08:10.000 --> 00:08:20.000
Yeah.

00:08:20.000 --> 00:08:30.000
Yeah.

00:08:30.000 --> 00:08:40.000
Yeah.

00:08:40.000 --> 00:08:50.000
Yeah.

00:08:50.000 --> 00:09:00.000
Yeah.

00:09:00.000 --> 00:09:10.000
Yeah.

00:09:10.000 --> 00:09:20.000
Yeah.

00:09:20.000 --> 00:09:44.000
Yeah.

He said literally nothing in this ~4min period. At all. I listened to the whole thing to make certain of it

inner quarry Mar 13, 2025, 2:02 AM

#

manic cliff Also what it does with dead space on a track is actually hilarious. Here's what ...

I think that if you consider it to be trained on live stream data from twitch. Brief silences followed by a yeah could be someone reading chat lol.

sleek bane Mar 20, 2025, 1:05 AM

#

manic cliff So I keep getting these markers for blankness or silence. I'll get stuff like th...

You should remove all silence so it doesn't attempt to transcribe it with something like VAD preprocessing.

manic cliff Mar 20, 2025, 1:08 AM

#

sleek bane You should remove all silence so it doesn't attempt to transcribe it with someth...

That's what I ended up doing and it worked quite well. I chopped up the file based on inverse silence detection using ffmpeg
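
A rough sketch of the "inverse silence detection" idea: ffmpeg's `silencedetect` filter (for example `ffmpeg -i in.wav -af silencedetect=noise=-30dB:d=0.5 -f null -`, where the threshold and duration are assumptions to tune) logs `silence_start` and `silence_end` lines, and the gaps between them are the parts worth transcribing. `speech_intervals` is a hypothetical helper, and the log text is a made-up sample in that format.

```python
import re


def speech_intervals(ffmpeg_log: str, duration: float):
    """Invert silencedetect output into (start, end) intervals that contain sound."""
    starts = [float(x) for x in re.findall(r"silence_start: (-?[\d.]+)", ffmpeg_log)]
    ends = [float(x) for x in re.findall(r"silence_end: ([\d.]+)", ffmpeg_log)]
    intervals, cursor = [], 0.0
    for i, s in enumerate(starts):
        s = max(s, 0.0)
        if s > cursor:
            intervals.append((cursor, s))
        # A trailing silence may have no silence_end line; treat it as running to the end
        cursor = ends[i] if i < len(ends) else duration
    if cursor < duration:
        intervals.append((cursor, duration))
    return intervals


log = """
[silencedetect @ 0x5581] silence_start: 0
[silencedetect @ 0x5581] silence_end: 2.5 | silence_duration: 2.5
[silencedetect @ 0x5581] silence_start: 10.25
[silencedetect @ 0x5581] silence_end: 14 | silence_duration: 3.75
[silencedetect @ 0x5581] silence_start: 20
"""
intervals = speech_intervals(log, duration=30.0)
print(intervals)
```

Each interval can then be cut out with ffmpeg's `-ss` and `-to` options and transcribed on its own, adding the interval's start time back onto the resulting timestamps.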

sleek bane Mar 20, 2025, 1:11 AM

#

open source GitHub projects like faster-whisper and whisperx have VAD preprocessing built-in that works rather well

versed slate Mar 20, 2025, 8:36 PM

#

So, will it be possible to download gpt-4o-transcribe? I want to run it locally

sleek bane Mar 20, 2025, 8:59 PM

#

I’ve only seen it announced on API and seems quite expensive unless I’m reading their chart incorrectly.

hot belfry Mar 20, 2025, 11:06 PM

#

so RIP this channel?

sleek bane Mar 21, 2025, 12:01 AM

#

Whisper isn't being removed and I doubt the new models will be much more (if at all) popular considering they're API only.

#

Much cheaper and easier to run Whisper locally.

finite jetty Mar 21, 2025, 2:52 AM

#

whisper is open source also

safe apex Mar 23, 2025, 9:46 AM

#

Hello, I am having an issue with a whisper related plugin. WhisperAttack is a plugin for VoiceAttack that changes its language recognition to whisper's STT. But whisper never loads its model when I'm using GPU; on CPU it loads fine. It doesn't throw any error codes or anything similar, it just doesn't get past the loading stage. I've done some admittedly meager testing, because this is the first time I've actually stepped into AI territory, but I've found that my system can run whisper itself through python+pytorch using a GPU. Or at least that's what the code ChatGPT gave me said. I'd appreciate it if anyone knew a more concrete way I could test it, or if it's something else.
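
For a more concrete check of the PyTorch side, a small report like the sketch below can help. It only says whether PyTorch can see and use a CUDA GPU; the plugin may well use a different runtime, in which case a good result here rules out the driver but not the plugin's own setup. `gpu_report` is a hypothetical helper name.

```python
def gpu_report() -> dict:
    """Collect basic facts about whether PyTorch can see and use a CUDA GPU."""
    try:
        import torch
    except (ImportError, OSError):
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None means a CPU-only PyTorch build
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Run one small operation on the GPU to confirm it actually executes
        ones = torch.ones(2, 2, device="cuda")
        report["gpu_matmul_ok"] = (ones @ ones).sum().item() == 8.0
    return report


report = gpu_report()
print(report)
```

If `cuda_build` is `None`, the installed PyTorch is a CPU-only build and no GPU will ever be used, regardless of the driver.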

#

#

Sorry that i've asked here, but WhisperAttack has no forums or discord servers i can find. So this was my best option, tied with the VoiceAttack discord.

sleek bane Mar 23, 2025, 2:30 PM

#

I have no clue about the software, never heard of it or used it.

If you created the code, simply add an insane amount of logging so you know what’s actually going on and paste it back into ChatGPT.

safe apex Mar 24, 2025, 12:34 AM

#

sleek bane I have no clue about the software, never heard of it or used it. If you create...

I didn't create the code, and I'm almost entirely code illiterate unfortunately.

sleek bane Mar 24, 2025, 1:32 AM

#

safe apex I didn't create the code, and I'm almost entirely code illiterate unfortunately.

That isn't what I meant. You said you made ChatGPT create the code. So you make ChatGPT add extensive logging throughout the file. Then you paste all the new output you've got back into ChatGPT to solve your problem.

#

Though you shouldn't just trust that any tools can do what ChatGPT says they can do. You should actually ensure it has the ability to do whatever it is you're wanting to do first. If you can't verify that, then ya, you're definitely possibly gonna spend many hours on it to achieve nothing.

safe apex Mar 24, 2025, 1:52 AM

#

sleek bane That isn't what I meant. You said you made ChatGPT create the code. So you make ...

Oh sorry, chatgpt didn't make whisper attack's code either, it just made some code for me to run on my computer to test if I could even run whisper without the plugin. I'll see if I can find out where to edit the plugins code and try what you said. Sorry if I misunderstand you again.

safe apex Mar 24, 2025, 1:53 AM

#

sleek bane Though you shouldn't just trust that any tools can do what ChatGPT says they can...

It should be able to run on GPU, that's what the plugin comes on natively (and recommends you to use). I was only able to make it work by changing the config file to use CPU

gleaming lake Mar 30, 2025, 5:41 PM

#

hello i have a question, how do you use whisper word timestamps?

sleek bane Apr 1, 2025, 12:59 AM

#

however you want to utilize them?

paper obsidian Apr 8, 2025, 3:33 AM

#

How do whisper prompts work? Does it use the prompt for every chunk of audio that we send to the model?

harsh merlin Apr 8, 2025, 5:43 AM

#

paper obsidian How does whisper prompts work? Does it uses the prompt for every chunk of audio ...

https://github.com/openai/whisper/discussions/117#discussioncomment-3727051

finite hawk Apr 8, 2025, 6:06 PM

#

Hi all, can we get pronunciation feedback while the audio is being transcribed?

willow delta Apr 10, 2025, 2:25 PM

#

Heyhey. I am trying to get whisper to work in a java environment but am not sure what the best approach is. I am targeting a live transcript so I was using whisper live for a bit but am getting a bit tied up with the library. Is there someone that has a bit of experience or idea towards this? If this is a space to even ask this

harsh merlin Apr 11, 2025, 1:30 AM

#

willow delta Heyhey. I am trying to get whisper to work in a java environment but am not sure...

whisper cpp is very actively maintained, and has Java bindings: https://github.com/ggml-org/whisper.cpp/tree/master/bindings/java

willow delta Apr 11, 2025, 7:58 AM

#

harsh merlin whisper cpp is very actively maintained, and has Java bindings: https://github.c...

Oh hell Yeh. I will look it up. I assume its with JNI?

harsh merlin Apr 11, 2025, 8:27 AM

#

willow delta Oh hell Yeh. I will look it up. I assume its with JNI?

README said it is JNI!

willow delta Apr 11, 2025, 9:06 AM

#

harsh merlin README said it is JNI!

Thanks. Got some time soon, I shall dig in and see.

harsh merlin Apr 12, 2025, 1:58 AM

#

For anyone interested in running Whisper locally, I am working on Open-source AI notepad for meetings: https://github.com/fastrepl/hyprnote

GitHub

GitHub - fastrepl/hyprnote: AI notepad for meetings. Local-first & ...

AI notepad for meetings. Local-first & Extensible. - fastrepl/hyprnote

vestal belfry Apr 12, 2025, 5:39 AM

#

@calm finch

calm finch Apr 12, 2025, 5:39 AM

#

vestal belfry @calm finch

alright, i think i'll use the API then

vestal belfry Apr 12, 2025, 5:40 AM

#

yujonglee seems to have something meant for your meeting stuff

#

above me

calm finch Apr 12, 2025, 5:41 AM

#

i think i'll just use the API it's not too bad

#

i'm gonna test 4o transcribe to see how good it is

harsh merlin Apr 12, 2025, 10:26 AM

#

yeah mine(Hyprnote) is for using local whisper(for now)

native storm Apr 13, 2025, 3:05 PM

#

4o transcription isn't working for me, does anyone know what to do?

cerulean tusk Apr 15, 2025, 9:40 PM

#

Is there anyone here who might be able to assist me with better understanding OpenAI whisper pricing? We are testing using them for transcription of call recordings from a telephony server, and I cannot reconcile their charges with our usage.

serene gull Apr 18, 2025, 10:01 AM

#

You listening

strange mauve Apr 18, 2025, 2:28 PM

#

What is whisper ?

hushed forge Apr 19, 2025, 1:28 AM

#

Why is ChatGPT whisper not even working

left bluff Apr 19, 2025, 3:37 PM

#

hushed forge Why is ChatGPT whisper not even working

elaborate

cedar moss Apr 20, 2025, 9:43 AM

#

what's up with whisper? i get The server had an error while processing your request. Sorry about that! Please contact us through our help center at help.openai.com if the error persists.

quaint patrol Apr 24, 2025, 7:17 PM

#

for the life of me I cannot get whisper to translate from english to another language. Any help appreciated

pure swift Apr 26, 2025, 2:33 AM

#

If I need accurate timestamps so I can show the current location in a transcript while playing the audio, is there any way to use the new transcribe models or do I have to use whisper ?

distant gazelle Apr 26, 2025, 4:25 PM

#

hello im new here so guys can you help me

bitter bough Apr 30, 2025, 11:59 PM

#

serene gull You listening

Mewheart

#

Not the discord user for clarification.

lone edge May 4, 2025, 6:03 PM

#

Hoi, is whisper capable of making non verbal sounds by using symbols to activate X non-verbal action?
Or is it capable of processing a voice shouting? As i'm trying to make some nonsense automation in home assistant, but can't seem to figure out how to make my VA sneeze for instance lol. Or make it phonetically make the noise i want by using only letters., as if there's too many letters and whatnot, it will start a spelling-bee competition lol

uneven herald May 13, 2025, 9:07 PM

#

https://youtube.com/shorts/MODrtSdWMDM?si=V9PTFofj_RCtxKXo

YouTube

Resuvonia

Get ChatGPT & Choose How Much Money You Want To Use Yourself #ai #c...


atomic totem May 20, 2025, 4:34 PM

#

hello

placid falcon May 22, 2025, 3:56 AM

#

I need to be able to get language realtime captured off of a PC livestream and then auto translated. I heard that whisper is the best captioning software but it doesnt have realtime captioning?? I also heard there has been a lot of things added to this software through huggingface etc. Just wondering if what im asking is possible

placid falcon May 23, 2025, 9:45 AM

#

So apparently thats a yes but the actual good model large v2 requires a nvidia cuda gpu.... Surely you can get this service as a cheap api key that doesnt run locally?

verbal walrus May 24, 2025, 6:07 PM

#

Shhhhhhh

placid falcon May 25, 2025, 9:30 PM

#

WhisperAI is one of the most interesting things that OpenAI has built and apparently no one uses or engineers it.

plucky dune May 27, 2025, 3:06 PM

#

I am Poison Apple

flint crow May 30, 2025, 3:27 PM

#

🚀 New Project Alert!
Hey everyone! I just fine-tuned OpenAI's Whisper-Tiny model to translate Bengali-English code-switched speech into English, especially for healthcare use cases like doctor-patient conversations. An easy fine-tune script for translation is open-sourced.🎙️🏥

🧠 It’s called MediBeng-Whisper-Tiny — perfect for building better clinical transcriptions and exploring speech translation tasks!

Check it out here:
🔗 GitHub: https://github.com/pr0mila/MediBeng-Whisper-Tiny
🔗 Hugging Face: https://huggingface.co/pr0mila-gh0sh/MediBeng-Whisper-Tiny

Let me know what you think or if you try it out! 😊
#SpeechTranslation #gpt-realtime #HealthcareAI #Bengali #OpenAI

GitHub

GitHub - pr0mila/MediBeng-Whisper-Tiny: MediBeng Whisper Tiny impr...

MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping...

pr0mila-gh0sh/MediBeng-Whisper-Tiny · Hugging Face

fast drift Jun 1, 2025, 7:08 AM

#

Hi,
I have video which has only music/ no audio
i need to show video response for the user when they ask the queries related to video

what approach should i follow please advise

fast drift Jun 3, 2025, 1:45 PM

#

Hi,

shut quartz Jun 4, 2025, 6:35 PM

#

Hey there! is anyone here experienced with using faster-whisper while multithreading?

I’m running Faster-Whisper in a Python app that transcribes multiple audios (.wav) in parallel. I’m using ProcessPoolExecutor

The issue: sometimes the transcription just hangs silently — no errors, no CPU or GPU usage, no output. Eventually I have to kill it. When I add logging inside the transcription subprocess, it often doesn’t get past WhisperModel(...).transcribe(...).

Still stuck.

Has anyone run into this kind of silent freeze or dead GPU thread in Faster-Whisper? Any known workarounds or tips?

Thanks 🙏

rich idol Jun 8, 2025, 11:01 PM

#

the voice is better indeed now

#

its the best thing for foreign language learners

hollow sandal Jun 13, 2025, 10:43 PM

#

#openai-chatter

#

#status

pallid bay Jun 18, 2025, 2:55 AM

#

hollow sandal

was this made with whisper??

hollow sandal Jun 18, 2025, 3:16 AM

#

pallid bay was this made with whisper??

That was verbatim responses from gpt in one of my threads on chatGPT app

hollow sandal Jun 18, 2025, 3:16 AM

#

pallid bay was this made with whisper??

What is whisper lmao

pallid bay Jun 18, 2025, 4:02 AM

#

hollow sandal What is whisper lmao

You’re in the channel for whisper lol. It’s their api for speech to text

hollow sandal Jun 18, 2025, 7:44 AM

#

Oh i know it well

toxic moss Jun 21, 2025, 10:33 PM

#

Hello! Anyone here use an automated whisper transcription workflow for better note taking?

sturdy spire Jun 23, 2025, 4:49 AM

#

Anyone else having very delayed Whisper API responses over this last week?

velvet perch Jun 25, 2025, 1:25 AM

#

@narrow thorn

fair pecan Jun 30, 2025, 8:21 AM

#

does anybody know a platform or app that can do audio/video file transcription with translation support?

#

and export to srt

#

preferably able to run locally on windows

#

and optionally on mac

stone drift Jun 30, 2025, 11:24 AM

#

👋🏽

pale hatch Jul 1, 2025, 10:41 PM

#

Whisper not working in chatgpt?

stone drift Jul 5, 2025, 7:18 AM

#

pale hatch Whisper not working in chatgpt?

Draw a playing card backside

ember steppe Jul 7, 2025, 7:53 AM

#

why are we whispering?

slow storm Jul 11, 2025, 11:38 PM

#

shhh!

hollow wigeon Jul 13, 2025, 2:05 PM

#

has anyone figured out how to avoid large v3 going into an endless loop of repeating the same sentence thousands of times?

#

makes v3 essentially useless for longer videos, even with no background noise

quiet terrace Jul 20, 2025, 2:47 PM

#

hollow wigeon has anyone figured out how to avoid large v3 going into an endless loop of repea...

Late reply, but for everyone experiencing the same; do you use Whisper locally? If you do, give condition_on_previous_text = False. This solved all my repetition, spamming, nonsense output problem.

However, I can't find the way to give this option in API...
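
For reference, a minimal sketch of passing that option when running locally with the `openai-whisper` Python package. The `OPTIONS` dict and `transcribe` wrapper are names made up for this example; `condition_on_previous_text` is the option from the message above.

```python
# Decoding options passed through to openai-whisper's model.transcribe()
OPTIONS = {
    "condition_on_previous_text": False,  # do not feed the previous window's text back in
}


def transcribe(path: str, model_name: str = "large-v3") -> str:
    """Transcribe one file locally with previous-text conditioning disabled."""
    import whisper  # pip install openai-whisper; imported lazily because it is heavy

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **OPTIONS)
    return result["text"]
```

On the command line the same setting is `--condition_on_previous_text False`. Third-party front ends may or may not forward it, so it is worth checking what actually reaches the transcribe call.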

hollow wigeon Jul 20, 2025, 3:03 PM

#

doesnt help

#

yea I am using it locally

#

using this ui

#

https://github.com/jhj0517/Whisper-WebUI/tree/master

GitHub

GitHub - jhj0517/Whisper-WebUI: A Web UI for easy subtitle using wh...

A Web UI for easy subtitle using whisper model. Contribute to jhj0517/Whisper-WebUI development by creating an account on GitHub.

#

https://www.youtube.com/watch?v=bcIgITbt3cg
This video for example

YouTube

ANNnewsCH

都心大冠水　帰宅ラッシュ混乱　相次ぐ「記録的...

10日は関東甲信地方を中心に各地で猛烈な雨となり、都内でも道路が冠水するなどの被害が相次いだ。横浜では、大雨の影響でマンホールが吹き飛び道路が陥没するという被害も出た。

■都心大冠水　帰宅ラッシュ混乱

打ちつける雨に鳴り響く雷鳴。10...


#

around 4 minutes in large v3 gets stuck even with condition on previous text disabled

quiet terrace Jul 20, 2025, 3:25 PM

#

I'm using a pure Python script myself, and got this result with large-v3. You mean the model gets stuck around here?

hollow wigeon Jul 20, 2025, 3:25 PM

#

indeed

#

might be a bug in the gui that doesnt pass on the parameter

#

so its the previous text conditioning that breaks v3

quiet terrace Jul 20, 2025, 3:28 PM

#

I think so. I mostly use Whisper with Korean, and every single transcription had that kind of problem. When I turn off previous text conditioning, every problem is gone; repetition still appears on rare occasions, but re-transcribing fixes it.

#

This is with previous text conditioning. It indeed causes problems...

hollow wigeon Jul 20, 2025, 3:29 PM

#

okay but it doesnt get permanently stuck

#

with previous text conditioning i get the same sentence repeated from around 4 mins until the end of the video

#

📎 2025711.srt

quiet terrace Jul 20, 2025, 3:34 PM

#

Hmm... that's strange. I only get partial repetition with conditioning. Maybe because WebUI uses its own setting for transcription?
I randomly got that kind of thing, but not every time.

hollow wigeon Jul 20, 2025, 3:36 PM

#

okay ran the app in debug mode, first of all it doesnt pass through the condition_on_previous_text to the actual transcribe call

quiet terrace Jul 20, 2025, 3:37 PM

#

It uses various default settings...

whisper:
  model_size: "large-v2"
  file_format: "SRT"
  lang: "Automatic Detection"
  is_translate: false
  beam_size: 5
  log_prob_threshold: -1
  no_speech_threshold: 0.6
  best_of: 5
  patience: 1
  condition_on_previous_text: true
  prompt_reset_on_temperature: 0.5
  initial_prompt: null
  temperature: 0
  compression_ratio_threshold: 2.4
  chunk_length: 30
  batch_size: 24
  length_penalty: 1
  repetition_penalty: 1
  no_repeat_ngram_size: 0
  prefix: null
  suppress_blank: true
  suppress_tokens: "[-1]"
  max_initial_timestamp: 1
  word_timestamps: false
  prepend_punctuations: "\"'“¿([{-"
  append_punctuations: "\"'.。,，!！?？:：”)]}、"
  max_new_tokens: null
  hallucination_silence_threshold: null
  hotwords: null
  language_detection_threshold: 0.5
  language_detection_segments: 1
  add_timestamp: false
  enable_offload: true
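
To rule out the UI dropping the parameter, one option is to call faster-whisper directly with a copy of these defaults and a single override. This is a sketch under assumptions: `WEBUI_DEFAULTS` copies only the subset of the values above that faster-whisper's `transcribe()` accepts as keyword arguments, and `with_overrides` / `transcribe_direct` are illustrative helper names.

```python
# Subset of the defaults listed above, using faster-whisper's parameter names.
WEBUI_DEFAULTS = {
    "beam_size": 5,
    "best_of": 5,
    "patience": 1,
    "log_prob_threshold": -1,
    "no_speech_threshold": 0.6,
    "compression_ratio_threshold": 2.4,
    "condition_on_previous_text": True,
    "repetition_penalty": 1,
    "no_repeat_ngram_size": 0,
    "temperature": 0,
}


def with_overrides(defaults: dict, **overrides) -> dict:
    """Return a copy of the defaults with selected options replaced."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    return {**defaults, **overrides}


def transcribe_direct(audio_path: str, model_size: str = "large-v3"):
    """Bypass the UI and pass the options straight to faster-whisper."""
    from faster_whisper import WhisperModel  # assumption: pip install faster-whisper

    options = with_overrides(WEBUI_DEFAULTS, condition_on_previous_text=False)
    model = WhisperModel(model_size)
    segments, _info = model.transcribe(audio_path, **options)
    # segments is a lazy generator; iterating it runs the actual decoding.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

If the loop disappears here but not in the UI, that would point at the parameter not being passed through. Raising `repetition_penalty` above 1 or setting `no_repeat_ngram_size` are further knobs that may be worth trying.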

hollow wigeon Jul 20, 2025, 3:37 PM

#

second, it uses the faster_whisper library

#

isnt faster_whisper being deprecated?

quiet terrace Jul 20, 2025, 3:39 PM

#

https://github.com/SYSTRAN/faster-whisper/releases
Still getting updates, I suppose.

GitHub

Releases · SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2. Contribute to SYSTRAN/faster-whisper development by creating an account on GitHub.

hollow wigeon Jul 20, 2025, 3:41 PM

#

okay well then I guess I know where to file a bug report at least, thanks

snow dirge Jul 22, 2025, 2:42 AM

#

Critical user face bug

#

@snow dirge

#

Saw expected behavior, then it became critical behavior. Need to talk to someone privately.

gilded oasis Jul 23, 2025, 12:45 PM

#

Hello everyone, I'm having some difficulties training whisper (specifically large_v2) with a Greek dataset.
If anyone is available to let me pick their mind it would be greatly appreciated.
Specifically, doing a full finetune with Transformers using the common_voice_11 dataset gives me very high WER% even if loss stays low
Doing a LoRA finetune on the same dataset does not produce better results
Finetuning on a smaller dataset I have created (~1 hour) appears to be overfitting
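
Since the discussion hinges on WER, here is a small reference implementation of the metric, WER = (substitutions + deletions + insertions) / number of reference words, with CER as the same ratio over characters. It is a plain-Python sketch, not the evaluation code used in the fine-tuning run above. Comparing scores with and without text normalization (casing, punctuation, diacritics) may be worth doing, since unnormalized references can inflate WER even when the loss looks fine.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)
```

Because insertions count as errors, WER can exceed 100% when the model hallucinates or repeats text.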

loud plinth Jul 23, 2025, 10:44 PM

#

snow dirge Saw expected Behavior then became critical behavior need to talk to someone priv...

How about starting here. What happened?
And/Or take it to #1070006915414900886
And/Or OpenAI Help Center > Support

snow dirge Jul 24, 2025, 1:57 AM

#

The app on iOS locked up my phone and I had to do a hard restart multiple times. Are you part of the AI team?

loud plinth Jul 24, 2025, 5:38 AM

#

No, guides are community members just like you, helping people to find things and get what they're looking for.

#

Staff doesn't respond here. Our (we, the community) resources include the channels here or Help Center on the OpenAI site.

#

But I am a career developer, and your question seemed to involve Whisper, so ... I'm offering what I can.

#

I get the impression that you're talking about ChatGPT and not Whisper?

snow dirge Jul 25, 2025, 2:12 AM

#

Yes, I'm sorry, it's about ChatGPT. I thought this Whisper channel was for whispering to a developer. I can't get through; their help stuff is driving me nuts.

loud plinth Jul 25, 2025, 4:34 AM

#

Ahhh, try #chatgpt-discussions

paper obsidian Jul 25, 2025, 9:08 AM

#

Is there any benchmarking available for translation of data using Whisper? (WER / CER)

green wolf Jul 26, 2025, 12:37 PM

#

What is whisper

loud plinth Jul 26, 2025, 8:46 PM

#

green wolf What is whisper

https://openai.com/index/whisper/

green wolf Jul 26, 2025, 8:49 PM

#

loud plinth https://openai.com/index/whisper/

Okay thanks

pastel sparrow Jul 26, 2025, 11:25 PM

#

Hi, I need some help debugging this

📎 audio_listener.py 📎 main.py 📎 pyproject.toml

#

I've tried the Whisper base and large models and I get nonsensical results. I've gotten Japanese, English (but far off from what I said), and, in my most recent attempt, Vietnamese.

#

I only speak English and I've never been to the eastern part of the world, so my accent isn't influenced by it.

#

Testing on this video (from my mic) for the first 30 ish seconds

https://youtu.be/ZNYmK19-d0U?si=rZROMHQkI07kTeJc

#

Results:

Good morning. Today, I hope you are more compacted and to the right. Today, I hope you are more compacted and to the right. Madison land, London, and Cleveland. It was no less than 10 hours ago that a black September owl was talking about the worst attack that had happened to a black and a white and black owl. That was a very, very rough shock and a terrible shock.

hollow wigeon Jul 29, 2025, 12:55 PM

#

very interesting, this video is only transcribable if no preprocessing is done, any attempt to remove background noise and do VAD will make whisper large (v2 and v3) completely give up

https://www.youtube.com/watch?v=LJGpG-dDPJ0

YouTube

散歩するアンドロイド

ラスベガスで全財産を全投入💵勝負！


#

anyone got any idea whats going on?

strong mango Jul 30, 2025, 11:46 AM

#

pastel sparrow Hi, I need some help debugging this

Here's a full transcript for the video.

#

📎 Transcript.txt

pastel sparrow Jul 31, 2025, 4:08 PM

#

strong mango

it's not about the transcript. I need help debugging why my code produces an incredibly wrong transcript

strong mango Jul 31, 2025, 4:08 PM

#

Ah.

#

I could probably help, then.

#

Either that, or you could use ChatGPT.

pastel sparrow Jul 31, 2025, 4:09 PM

#

I've tried, no luck

strong mango Jul 31, 2025, 4:10 PM

#

What about Claude?

pastel sparrow Jul 31, 2025, 4:10 PM

#

did so as well

strong mango Jul 31, 2025, 4:11 PM

#

Grok 4, if you have access?

pastel sparrow Jul 31, 2025, 4:11 PM

#

Haven

#

Haven't tried it

strong mango Jul 31, 2025, 4:11 PM

#

Well, there you go.

#

It's very smart, so it should be able to easily help.

muted axleBOT Aug 6, 2025, 2:56 PM

#

<:book_icon:1363314738255364126> Rule 3: Stay on topic.

-# Be mindful of what other users in a channel might find helpful or interesting when posting. Stay on topic in order to keep conversations focused and productive.

-# Consider posting in #off-topic or an appropriate channel.

tender rivet Aug 13, 2025, 3:02 PM

#

I just came across this awesome news: https://www.phoronix.com/news/FFmpeg-Lands-Whisper

FFmpeg 8.0 Merges OpenAI Whisper Filter For Automatic Speech Recogn...

The upcoming FFmpeg 8.0 multimedia library release continues to get more exciting almost by the day

#

this is so cool

#

transcriptions directly in ffmpeg with the open source model

#

I love ffmpeg.. really, I couldn't think of a software that had a better impact in the world..
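
A rough idea of what calling the new filter could look like from Python. Treat this as a sketch under assumptions: the option names (`model`, `language`, `destination`, `format`) are taken from recollection of the FFmpeg 8.0 filter documentation and should be checked with `ffmpeg -h filter=whisper`; the build must include whisper.cpp support, and the model is a whisper.cpp ggml file. `ffmpeg_whisper_command` is an illustrative helper name.

```python
import subprocess


def ffmpeg_whisper_command(input_path: str, model_path: str,
                           srt_path: str, language: str = "auto") -> list:
    """Build an ffmpeg argv that runs the whisper audio filter and writes SRT."""
    # Assumed option names; verify against your build before relying on them.
    filter_spec = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":destination={srt_path}"
        f":format=srt"
    )
    # -vn drops video; "-f null -" discards the processed audio because the
    # subtitles are written by the filter itself.
    return ["ffmpeg", "-i", input_path, "-vn", "-af", filter_spec, "-f", "null", "-"]


def run_ffmpeg_whisper(input_path: str, model_path: str, srt_path: str) -> None:
    """Run the command and raise if ffmpeg exits with an error."""
    subprocess.run(ffmpeg_whisper_command(input_path, model_path, srt_path), check=True)
```

Paths containing `:` or `\` (typical on Windows) need escaping inside a filter string, so relative paths are the easier starting point.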

viscid kite Aug 16, 2025, 12:25 AM

#

hey all, i was wondering if this could be something that you could all make use of.
it's called owhisper - basically ollama for realtime speech-to-text. more in docs.
https://docs.hyprnote.com/owhisper/what-is-this

Hyprnote Docs

What is OWhisper? - Hyprnote Docs

ebon cosmos Aug 26, 2025, 12:58 AM

#

viscid kite hey all, i was wondering if this could be something that you could all make use ...

This!! I’ve been looking for something like this!! Thank you!!

distant dome Aug 31, 2025, 1:40 PM

#

Whisper is no longer working properly because OpenAI is shutting down the Whisper voice Aug 8.

#

And I am not going to use the new one 😒

distant dome Aug 31, 2025, 6:35 PM

#

They retired the #

#

They will

wintry heron Sep 2, 2025, 12:23 AM

#

tender rivet I just came across this awesome news: https://www.phoronix.com/news/FFmpeg-Lands...

That's next level, I'm definitely going to be using this. My little web app project has FFMPEG installed in the docker container, will be super handy to be able to call AI-based subtitle generation, if I'm reading this correctly!

wheat monolith Sep 2, 2025, 7:15 AM

#

https://chatgpt.com/canvas/shared/68b582a3ab9c8191bbb984cac7c3b077

ChatGPT

Meta

A conversational AI system that listens, learns, and challenges

#

Meta meta meta ...... game inside VM inside VM loool :3 open for co-creation 😉 prompt exchange, link the sun and send the moons as func(open moon = "{" and closed moon = "}" end this with a shadow or full moon 😉 :3);

#

https://chatgpt.com/canvas/shared/68b69c30f9988191b99a331ef8e6f58b new fork

#

So fun... T_T

hollow wigeon Sep 12, 2025, 4:41 PM

#

https://www.youtube.com/watch?v=pAH2z27NlWI

YouTube

日テレNEWS

【マダム・タッソー】ロンドンの観光名所・ろう...

イギリス・ロンドンの観光名所となっている、ろう人形館「マダム・タッソー」に5日、ある“新メンバー”が登場しました。新たに展示されたのは「人」ではなく…人気の軽食「ソーセージロール」。そのワケとは。

この動画の記事を読む＞
https://news.ntv.co...


#

another video that whisper really struggles with

#

are there some optimal set of parameters for japanese?

#

it really struggles with clear, low-background noise language

#

if the chunk size is above 5s it skips whole sentences, and even then it doesn't translate what's literally being said but often omits or changes parts of the grammar

#

1
00:00:00,000 --> 00:00:05,000
ロンドンの観光名所になっている老人業館マダム

2
00:00:05,000 --> 00:00:10,000
タッソーにはイギリスの王室のメンバーや世界の著名人の老人

3
00:00:10,000 --> 00:00:14,000
業が数多く展示されています。

4
00:00:15,000 --> 00:00:20,000
ロンドンのマダムタッソーに新たな仲間が増えました。

5
00:00:20,000 --> 00:00:23,000
ソーセージロールです。

6
00:00:25,000 --> 00:00:30,000
今回新たに展示されたのは人ではなく、

7
00:00:30,000 --> 00:00:33,000
イギリスで人気の軽食ソーセージロールです。

8
00:00:35,000 --> 00:00:38,000
6月5日がナショナルソーセージロールデーとされていて、

9
00:00:39,000 --> 00:00:42,000
マダムタッソーにも参加しました。

10
00:00:40,000 --> 00:00:43,000
マダムタッソーではイギリスのチェーン店グレックス社の

11
00:00:44,000 --> 00:00:47,000
ソーセージロールそっくりに作っています。

12
00:00:45,000 --> 00:00:49,000
今月末まで展示されることになりました。

13
00:00:50,000 --> 00:00:53,000
食べ物の老人業が飾られるのは初めてです。

14
00:00:55,000 --> 00:00:58,000
グレックス社のソーセージロールはイギリスで人気のスナックです。

15
00:01:00,000 --> 00:01:03,000
およそ100万個が販売されているということです。

16
00:01:05,000 --> 00:01:07,000
ソーセージロールはカリカリで柔らかく、

17
00:01:08,000 --> 00:01:10,000
柔らかくて柔らかく、

18
00:01:10,000 --> 00:01:13,000
調味料もとても良いです。

19
00:01:15,000 --> 00:01:17,000
老人業の製作チームは、

20
00:01:18,000 --> 00:01:20,000
ソーセージロールのパリッとしたパイの層と、

21
00:01:20,000 --> 00:01:23,000
サクサク感を再現するために試行錯誤を重ね、

22
00:01:24,000 --> 00:01:26,000
数か月をかけたものです。

23
00:01:25,000 --> 00:01:28,000
すべて作品を完成させたということです。
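
In the output above, some consecutive cues overlap in time (9/10, 11/12 and 22/23), which is one sign that chunk boundaries are cutting sentences. A small stdlib sketch for flagging such overlaps in any SRT file; the function names are illustrative and the parser assumes a numeric index line followed by a timing line.

```python
import re

CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_cues(srt_text: str) -> list:
    """Return (index, start_ms, end_ms) for every well-formed cue."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = CUE_TIME.fullmatch(lines[1].strip())
        if not match:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
        end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
        cues.append((int(lines[0]), start, end))
    return cues


def overlapping_pairs(cues: list) -> list:
    """Pairs of consecutive cue indices where the next cue starts too early."""
    pairs = []
    for (a, _a_start, a_end), (b, b_start, _b_end) in zip(cues, cues[1:]):
        if b_start < a_end:
            pairs.append((a, b))
    return pairs
```

Cues that merely touch (one ends exactly where the next begins) are not reported.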

ruby night Sep 16, 2025, 9:53 PM

#

What's up with the Whisper being so horribly broken in the ChatGPT SVM for the last 3 or 4 weeks? It hallucinates all the time... Is it deliberate action by OpenAI to discourage users from SVM? I tested it on several different devices including the browser with high quality microphone and results are basically the same... Because of that we (I mean, me and ChatGPT) started calling it the Careless Whisper...

rocky light Sep 18, 2025, 4:53 PM

#

Which model are you using?

ruby night Sep 20, 2025, 12:40 PM

#

rocky light Which model are you using?

ChatGPT with gpt-4o and SVM, but the STT hallucinations persist when using gpt-5 too. AVM doesn't seem to have these issues, but being AVM it's horrible when it comes to conversation being significant.

wind herald Sep 23, 2025, 12:09 PM

#

HyprNote looks great but looking for something like that for Windows. Does anyone have any suggestions?

abstract orbit Sep 25, 2025, 4:37 PM

#

Gaymongus

ruby night Sep 25, 2025, 9:12 PM

#

Since the last weekend the Whisper-related issue in ChatGPT I reported in previous msg seems to be fixed!

trim rune Sep 26, 2025, 5:26 PM

#

Hey guys, I am having some trouble using the transcription method. For some reason, some audio files sometimes don't get all of their content transcribed.
Did you guys ever get that problem?
How did you solve it?

visual crypt Sep 27, 2025, 9:55 AM

#

Hi @trim rune
Yeah, that happens sometimes. A few reasons could be background noise, overlapping voices, or the model hitting length limits and cutting off. I usually fix it by either cleaning up the audio first (noise reduction, splitting long files into chunks) or using a different transcription tool/model. Breaking the file into smaller segments tends to help the most.
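
The "split long files into chunks" idea can be sketched as a helper that computes chunk windows with a little overlap, so words at a boundary appear in both neighbouring chunks. The 30-second chunk length and 2-second overlap are assumptions to tune, not recommended values; each window can then be cut out with a tool such as ffmpeg (`-ss` for the start, `-t` for the duration) and transcribed separately.

```python
def chunk_windows(duration_s: float, chunk_s: float = 30.0,
                  overlap_s: float = 2.0) -> list:
    """Return (start, end) windows in seconds that cover the whole duration."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    windows = []
    start = 0.0
    step = chunk_s - overlap_s  # how far each new window advances
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows
```

When merging the per-chunk transcripts, the overlapping seconds need de-duplicating, for example by dropping text whose timestamps fall in the first half of each overlap.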

hollow wigeon Sep 28, 2025, 11:03 AM

#

I have a lot of "almost correct" subtitles in Polish. I.e. not CC but regular TV subtitles. Would it be possible to use Whisper to correct each of the lines in this subtitle given an audio segment and the original almost correct line?
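
One possible approach, offered as an untested idea rather than a known recipe: transcribe each audio segment with Whisper (optionally passing the existing line as `initial_prompt`), then accept Whisper's text only when it is a close variant of the original line, so hallucinated output cannot overwrite a good subtitle. The comparison step could be sketched like this; the 0.6 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two lines, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold().split(), b.casefold().split()).ratio()


def pick_line(original: str, whisper_text: str, min_similarity: float = 0.6) -> str:
    """Keep Whisper's text only if it stays close to the existing subtitle."""
    if similarity(original, whisper_text) >= min_similarity:
        return whisper_text.strip()
    return original
```

For example, `pick_line("To jest dobry dzien", "To jest dobry dzień")` accepts the corrected line, while an unrelated Whisper output falls back to the original.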

analog remnant Sep 28, 2025, 2:43 PM

#

Hi everyone, I just want to know how I can use Whisper, and is it free for ChatGPT Plus users? Thanks for any response.

dense imp Oct 10, 2025, 9:30 AM

#

Has anyone had a lot of success with diarization using things like pyannote? I always seem to have problems

worldly laurel Oct 10, 2025, 4:23 PM

#

how can i start using sora

ripe stag Oct 10, 2025, 5:04 PM

#

hollow wigeon I have a lot of "almost correct" subtitles in Polish. I.e. not CC but regular TV...

All you need is the audio files (wav, mp3, or mp4), feed those into Whisper and it should transcribe into text.

ripe stag Oct 10, 2025, 5:08 PM

#

analog remnant hi everyone i just want to know how can i use whisper and is it free for the cha...

You don't even need ChatGPT Plus; you can use Whisper for free as a free user. After getting it set up and running on your computer, just open the Windows terminal/command prompt and run whisper "path of your audio file". Don't forget to include the file extension at the end of the path and to keep the quotes.
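
The same command can be driven from a script. The sketch below builds the argument list for the `whisper` command-line tool that ships with the `openai-whisper` package; the helper names and the `small` / `srt` defaults are illustrative choices. Passing the arguments as a list avoids quoting problems with paths that contain spaces.

```python
import subprocess


def whisper_cli_args(audio_path: str, model: str = "small",
                     output_format: str = "srt", language: str = None,
                     translate: bool = False, output_dir: str = ".") -> list:
    """Build the argv for the whisper command-line tool."""
    args = ["whisper", audio_path, "--model", model,
            "--output_format", output_format, "--output_dir", output_dir]
    if language:
        args += ["--language", language]
    if translate:
        # Translate speech to English instead of transcribing it.
        args += ["--task", "translate"]
    return args


def run_whisper_cli(audio_path: str, **kwargs) -> None:
    """Run the CLI and raise if it exits with an error."""
    subprocess.run(whisper_cli_args(audio_path, **kwargs), check=True)
```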

ripe stag Oct 10, 2025, 5:09 PM

#

dense imp Has anyone had a lot of success with diarization using things like pyannote? I s...

I haven’t used pyannote but you should just be able to use WhisperX as that is basic Whisper plus speech diarization all in one, no need for a separate software.

dense imp Oct 10, 2025, 5:38 PM

#

ripe stag I haven’t used pyannote but you should just be able to use WhisperX as that is b...

👯‍♂️ Multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
https://github.com/m-bain/whisperX

WhisperX is built on pyannote. I tried it a few years back without much luck. I'll try again to see if it's better.

GitHub

GitHub - m-bain/whisperX: WhisperX: Automatic Speech Recognition w...

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) - m-bain/whisperX

ripe stag Oct 10, 2025, 5:40 PM

#

Ok yeah that’s a while ago, definitely try it again and see how it goes.

vocal sierra Oct 11, 2025, 2:15 AM

#

what can whisper do

#

in ChatGPT once i uploaded the audio file

#

but it's a concern if ChatGPT puts out an error due to some dependency issue iirc

#

i wonder if they fixed it now

#

because i tried it earlier it starts to complain 😤

ripe stag Oct 11, 2025, 2:29 AM

#

@dense imp I was working on getting WhisperX to work on my windows pc, and I finally got it to work. Here's what you need to do:

I take it you already have the basic Whisper installed (without speech diarization), if so you already have dependencies needed for WhisperX (PyTorch, ffmpeg, etc.). Regarding Python specifically, I recommend version 3.11, it worked the best for me, so in your environment system variables on your Windows PC make sure your paths for both user and system variables has only the python 3.11 version path, as other python version paths might cause conflicts.
Download Git for HuggingFace models: https://git-scm.com/downloads/win
Install required Python packages in Windows Command Prompt:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install whisperx pyannote.audio pydub
pip install ffmpeg-python
If you already have a later version of CUDA installed on your computer (e.g., 13.0), that's fine, as it is backward-compatible. If you don’t have a CUDA GPU, just install the CPU version of PyTorch with --index-url https://download.pytorch.org/whl/cpu.
Get Hugging Face token: https://huggingface.co/join
Sign up, create a new token, make it Read access, and give it any name you'd like.
Accept each of these model's conditions while still logged in your Hugging Face account:
https://huggingface.co/pyannote/speaker-diarization-3.1
https://huggingface.co/pyannote/segmentation-3.0
Run the script file attached in Python. Fill in yourhuggingfacetoken with your own token and pathtoaudiofile in mp4_file and wav_file with the path to your own audio file.

Hope this helps.

📎 Script.txt

pyannote/speaker-diarization-3.1 · Hugging Face
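
Before running the attached script, a quick sanity check of the prerequisites from the steps above can save some debugging. This is a stdlib-only sketch; `check_environment` and `module_available` are illustrative names, and the module list simply mirrors the packages installed in the steps.

```python
import importlib.util
import shutil
import sys


def module_available(name: str) -> bool:
    """True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A dotted name whose parent package is missing raises instead of
        # returning None.
        return False


def check_environment(modules=("torch", "whisperx", "pyannote.audio", "pydub"),
                      expected_python=(3, 11)) -> dict:
    """Report which prerequisites for the WhisperX setup are present."""
    report = {
        "python_ok": sys.version_info[:2] == expected_python,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "git_on_path": shutil.which("git") is not None,
    }
    for name in modules:
        report[name] = module_available(name)
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

It does not check the Hugging Face token or whether the model conditions were accepted; those failures only show up when the diarization pipeline is loaded.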

#

@vocal sierra Whisper transcribes audio files to text, and if you have WhisperX it will add speech diarization. Regarding your concern with ChatGPT, you don't upload the audio file directly into ChatGPT, you have to install Whisper first, and then once that's running then you can take an audio file of yours in wav, mp3, or mp4 audio file format, then run the following one-line command to test: whisper "pathtoaudiofile.wav", replace pathtoaudiofile with your actual path for your audio file in File Explorer and replace .wav with whatever your audio file format is. If you want to add diarization, you can go through the process I outlined above to get WhisperX running.

dense imp Oct 11, 2025, 3:05 AM

#

ripe stag @dense imp I was working on getting WhisperX to work on my windows pc...

Yeah, I'm familiar. I got it to compile last time. It was the results that were the main issue. The diarization just had incredibly poor accuracy. Not really sure why.

ripe stag Oct 11, 2025, 3:13 AM

#

Oh gotcha, yeah that’s interesting because my diarization also wasn’t very accurate, but it did compile, just like yours. Hopefully it gets improved soon.

manic holly Oct 21, 2025, 8:54 AM

#

I've just released a full benchmark on a STT model trained for one of our customer. 9-page full of value to compare models + GitHub code to be able to compare models by yourself ! Send me a DM and I send you the PDF 🙂

marble crater Oct 24, 2025, 10:27 PM

#

its called whisper guys so shhh we have to whisper

glossy bone Oct 25, 2025, 1:49 PM

#

It includes whispering

slow hound Oct 27, 2025, 12:47 PM

#

I used OpenAI whisper-large-v3 to build this. Give me your feedback, guys - https://aitranscript.in/

AI Transcriber

AI Video Transcriber - Transform Your Videos Instantly | 99% Accuracy

Transform your videos into accurate transcripts instantly with AI. Upload, transcribe, and analyze your content with our powerful video transcription tool. Free trial available.

solar dew Oct 27, 2025, 11:01 PM

#

What do u guys think about the new gpt 4o diarize

ripe stag Oct 28, 2025, 1:22 PM

#

solar dew What do u guys think about the new gpt 4o diarize

I haven’t tried it as I’m not sure if I want to have to pay for the API tokens, but I heard it’s a lot more accurate than Whisper. However, you’re restricted to what OpenAI allows you to do in the API so you won’t get as much customization and options as Whisper being open-source.

solar dew Oct 31, 2025, 7:25 PM

#

They should release whisper diarize open

marsh canopy Nov 3, 2025, 10:05 PM

#

When we say whisper are we talking about the translation model?

worthy fable Nov 4, 2025, 11:38 AM

#

solar dew What do u guys think about the new gpt 4o diarize

The model seems pretty powerful but I still find a lot of bugs, and it still doesn't support many of the old features, like prompting and real time. Did you have success with it?

#

Mostly I find that it always tries to translate audio into English, even if the language parameter is set and no one speaks English in the original audio

slow hound Nov 7, 2025, 8:14 AM

#

🎥✨ Introducing AITranscript.in
— Free, Fast, and No Sign-Up Needed!

🚀 AITranscript turns any video into clear, actionable insights — instantly.
No login. No limits. No catch.

💡 What you get:
🎧 Automatic Transcription — powered by Whisper
🧠 AI-Generated Action Points — focus on what truly matters
💬 Smart Chat Assistant — ask anything from the video context
📜 Instant View Mode — see full script + key takeaways before download

🆓 100% Free. Forever.
⚡ Just upload a video → get insights → done.

🔗 Try now → https://aitranscript.in

AI Transcriber

AI Video Transcriber - Transform Your Videos Instantly | 99% Accuracy

Transform your videos into accurate transcripts instantly with AI. Upload, transcribe, and analyze your content with our powerful video transcription tool. Free trial available.

outer pawn Nov 8, 2025, 11:58 PM

#

Check your transcriptions! Please read the attachment. Very quick update: I had a song that I wrote in English. I took the English lyrics and translated them into Simplified Chinese to create the song in Chinese. I used AI tools to create the song, but wanted to see how the lyrics looked after the Chinese conversion, so I took the song's .mp3 file and had it transcribed back into text using melobytes (which I discovered later uses Whisper). The song was in Chinese, so it generated text output in Chinese.

I converted the text back into English and was shocked that Whisper had given attribution (credit for the song as writer and composer) to a Li Zongsheng, who IS a real person; he is a well-known musician. I did extensive testing and reported it to OpenAI. I then downloaded Whisper on my home computer and ran the test again. For a 2nd and 3rd time I received the note that the song was written and composed by Li Zongsheng. Running the same tests on a different song, some spam/donation request for some other person's YouTube account was in my transcription.

Please look at the attachment for a lot more information and check your transcriptions/conversions, as it is likely that the output contains credit and spam injected into your work. The results differ between languages: I did not see any errors in English conversion/text, and Chinese was the most problematic. But millions of people have used Whisper to transcribe audio into text, so this is a real problem. Look at the attachment and GitHub: https://github.com/openai/whisper/discussions/2685

📎 message.txt

GitHub

Critical Bug: Whisper Fabricates False Copyright Attribution (Li Zo...

Critical Bug: Whisper Fabricates False Copyright Attribution (Li Zongsheng) on Original Chinese Music ### 2. In the "Add a body" field (the large text box), paste this entire text: ST...

wheat pond Nov 9, 2025, 3:59 PM

#

I created a script that allows you to transcribe videos and add stylish subtitles using Whisper and .A S S files for your shorts videos (9:16)
https://github.com/revanflp/Frot
⭐ it !

GitHub

GitHub - revanflp/Frot: A simple tool to transcribe videos and add ...

A simple tool to transcribe videos and add stylish subtitles using Whisper and .ASS files for your shorts videos (9:16) - revanflp/Frot

main current Nov 13, 2025, 6:07 AM

#

So I've been running into an issue when trying to upload a file to transcribe using Whisper. It tells me that it cannot perform this operation because it's using a path instead of being uploaded, but I am dragging the file into the chat box and creating the prompt

#

I decided to try using the API, but it won't allow me to attach my card for billing (separate issue lol). But is this issue I'm having with uploading the file because it's too large?

jolly forge Nov 16, 2025, 6:38 AM

#

Can someone send me the most accurate settings and model I can use for max accuracy? I'm not too concerned with time.

storm blaze Nov 16, 2025, 8:13 PM

#

wheat pond I create a script that allow you to to transcribe videos and add stylish subtitl...

.a#s is crazy

wheat pond Nov 16, 2025, 8:14 PM

#

storm blaze .a#s is crazy

yep

spring umbra Nov 17, 2025, 6:53 AM

#

oblique thunder Nov 20, 2025, 12:05 AM

#

I used Whisper as the transcription engine for a new project I’ve been building called ContentRob. It takes any informational video and turns it into high-quality written content, SEO articles, tutorials, case studies, and even share-ready infographics.

Whisper handles the speech-to-text layer, and the output is then processed into different content formats. You can also export as PDF or DOCX, repurpose the same video into multiple formats, and publish or schedule posts directly.

If you want to try the demo, it’s available here:
https://contentrob.com/

Open to connect, collaborate, or discuss the implementation details.

ContentRob - Transform Videos Into High-Quality Content

AI-powered platform that converts videos into blog posts, tutorials, and articles. Export, repurpose, and publish across platforms effortlessly.

amber marsh Nov 20, 2025, 7:03 PM

#

Hello

remote crow Nov 21, 2025, 4:07 AM

#

Um

#

Yur

proud venture Nov 24, 2025, 10:17 PM

#

main current So Ive been running into an issue when trying to upload a file to transcribe usi...

Check your local browser settings. I remember on older versions of some browsers it would enter a "fakepath" as a dummy which would simulate the upload, but not release the files. It could also be filesize, but I'd expect you'd get an error to that effect.

brisk flicker Nov 28, 2025, 9:30 AM

#

I am now looking for a dev with rich experience in OpenAI, Whisper, Electron, and ffmpeg. I am trying to build an AI interview assistant app, so we need to implement voice-to-text from the mic and from the speaker output of a live meeting.

#

tacit tree Dec 1, 2025, 6:19 AM

#

Voice meeter potato

tacit tree Dec 1, 2025, 6:20 AM

#

brisk flicker

The standard

#

D bus the mixer then speech to text

brisk flicker Dec 1, 2025, 6:34 AM

#

tacit tree The standard

do you have any experience with this ? @tacit tree

brisk flicker Dec 1, 2025, 6:36 AM

#

brisk flicker now I am looking for dev who have rich experience with openai, whisper, electron...

please help me.. !!!

tacit tree Dec 1, 2025, 6:36 AM

#

brisk flicker do you have any experience with this ? @tacit tree

None

#

But i do produce music

#

In Ableton

#

Basically

#

Potato lets u route audio

#

Over ports

#

Same like pulse audio

#

But u have 8 channels

#

https://vb-audio.com/Voicemeeter/potato.htm

VB-Audio VoiceMeeter Potato

VoiceMeeter Potato, the Ultimate Virtual Audio Mixer for Windows

#

#

This is bsnnsna i think verdion eithb4

#

But the point is the vbsn

#

Vban

hollow dock Dec 9, 2025, 4:28 AM

#

wet wren Dec 17, 2025, 9:29 PM

#

brisk flicker now I am looking for dev who have rich experience with openai, whisper, electron...

Oh that’s doable for me

#

I’ve already been working on similar pipelines necessary for my own project actually lol

#

Takes live audio streams, handles audio transcription, uses context and information to determine various important variables. Long story short it is for a NVR system that leverages ai workflows to streamline alerting and monitoring of security systems via audio and video data.

#

You could possibly use a heavily modified form of this pipeline that essentially just has the basic parts and systemization already together anyways

severe solar Dec 25, 2025, 11:51 AM

#

would it be legal/ethical to manually port whisper to windows

#

I mean I think chatgpt could do it if I gave enough time

#

WRONG THING not whisper Ignore my message

worn latch Dec 27, 2025, 10:43 PM

#

severe solar would it be legal/ethical to manually port whisper to windows

What do you mean? Can't you already run Whisper using its Python library on Windows? Worked for me

severe solar Dec 28, 2025, 4:07 AM

#

worn latch What do you mean? Can’t you already run whisper using it’s python library on win...

Look below my message, I accidentally said whisper, I meant to say atlas.

worn latch Dec 28, 2025, 7:40 AM

#

severe solar Look below my message, I accidentally said whisper, I meant to say atlas.

oh my bad

sour cloud Dec 29, 2025, 2:31 PM

#

I'd love to connect AI engineers who love learning

sonic cedar Dec 29, 2025, 3:21 PM

#

sour cloud I'd love to connect AI engineers who love learning

Let's connect

sour cloud Dec 30, 2025, 5:23 AM

#

sonic cedar Let's connect

Ok. msg me

swift karma Jan 12, 2026, 12:41 AM

#

I am running into problems with whisper-v3-large: it just performs so much worse than v2. Refusing to translate, looping over sentences. Is there anything I can do?

lofty mesa Jan 15, 2026, 10:10 PM

#

Ok

small swallow Jan 17, 2026, 10:56 AM

#

Wsp

tacit cairn Jan 17, 2026, 7:17 PM

#

Hola mi gente

marsh canopy Jan 19, 2026, 3:56 AM

#

severe solar I mean I think chatgpt could do it if I gave enough time

Whisper can be run offline but Hugging Face also has some stuff. Kinda depends on what the beam you need is and a few other factors. Regardless both need API access

severe solar Jan 19, 2026, 3:56 AM

#

marsh canopy Whisper can be run offline but Hugging Face also has some stuff. Kinda depends on...

lol I meant to say atlas

marsh canopy Jan 19, 2026, 3:57 AM

#

hollow dock

That's fun. I wonder how it does with silence

marsh canopy Jan 19, 2026, 3:59 AM

#

worthy fable The model seems pretty powerful but I still find a lot of bug, and it still does...

You got to set the language filter or it almost always defaults to English

worthy fable Jan 19, 2026, 7:28 AM

#

marsh canopy You got to set the language filter or it almost always defaults to English

At the time of my post, the language filter was not an available option and was ignored by the api if provided. Has something changed ? I can't find any changelog

hollow dock Jan 20, 2026, 1:40 PM

#

marsh canopy That's fun. I wonder how it does with silence

I actually have a minor bug in that silence comes out as [Blank Audio] occasionally, but it is an edge case; normally it's fine

#

I'm just too lazy to fix it right now.

#

😄
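
A small post-processing filter is one way to handle that edge case. The sketch below strips bracketed blank-audio markers from a transcript; the exact marker spelling is an assumption (whisper.cpp-based tools typically emit `[BLANK_AUDIO]`), so the pattern is deliberately loose about case, separator and closing bracket.

```python
import re

# Matches markers such as "[BLANK_AUDIO]", "[Blank Audio]" or "(blank audio)".
BLANK_MARKER = re.compile(r"[\[(]\s*blank[ _]audio\s*[\])}]", re.IGNORECASE)


def strip_blank_markers(text: str) -> str:
    """Remove blank-audio markers and tidy the leftover whitespace."""
    cleaned = BLANK_MARKER.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

Segments that become empty after stripping can simply be dropped before writing subtitles.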

crude axle Jan 20, 2026, 7:47 PM

#

ₛₕₕₕ...

lilac cairn Jan 21, 2026, 12:36 PM

#

Hello, how do I fix wrong transcriptions from the Whisper small model on Serbian/Croatian/Bosnian?

exotic cape Jan 21, 2026, 7:51 PM

#

New whisper coming out never

oak sapphire Jan 22, 2026, 6:11 PM

#

Hi

glacial ferry Jan 22, 2026, 9:40 PM

#

hi

mellow ingot Jan 25, 2026, 3:11 PM

#

Is the Whisper project still being actively developed, or has it gone quiet? It’s a great model, but it’s starting to fall behind the competition. It would be awesome to see a Whisper v4.

dusk notch Jan 26, 2026, 5:17 AM

#

s

keen magnet Feb 1, 2026, 5:51 AM

#

mellow ingot Is the Whisper project still being actively developed, or has it gone quiet? It’...

I’m also looking forward to the new version. Will OpenAI launch a new version this year?

keen magnet Feb 5, 2026, 12:48 AM

#

Mistral’s Voxtral Transcribe 2 just launched. Really wish Whisper gets a new version soon.

left matrix Feb 5, 2026, 11:56 AM

#

yes, many guys need a speech to text service!!!!

#

fast, and precise speech to text model

visual crypt Feb 7, 2026, 7:40 PM

#

what is the difference between whisper and elevenlabs?

blissful parrot Feb 7, 2026, 7:42 PM

#

Higgisfiead code

visual crypt Feb 7, 2026, 7:43 PM

#

?

gloomy sable Feb 7, 2026, 10:23 PM

#

visual crypt what is the difference between whisper and elevenlabs?

Whisper is open sourced

visual crypt Feb 7, 2026, 10:23 PM

#

I mean in terms of how to use them, or their capabilities. I wanna know

visual crypt Feb 8, 2026, 9:07 AM

#

@gloomy sable

#

Can you hear me now @gloomy sable

vocal moth Feb 17, 2026, 7:46 AM

#

What is the best STT or TTS model?

wary night Feb 19, 2026, 6:03 AM

#

@vocal moth For STT you may use Azure Voice API and for TTS you may use Cartesia.

vocal moth Feb 19, 2026, 2:47 PM

#

wary night @vocal moth For STT you may use Azure Voice API and for TTS you may ...

thank you.
Should I choose between Azure Voice API and Deepgram?

wary night Feb 19, 2026, 2:51 PM

#

The Azure voice API lets you switch languages in real time.

pine shuttle Feb 22, 2026, 2:24 PM

#

help my whisper-large-v3 is tweaking

#

#

why is it doing that?

vocal moth Feb 23, 2026, 5:46 PM

#

wary night The Azure voice API lets you switch languages in real time.

What does that mean?

#

I think Deepgram also provides that function.

wary night Feb 23, 2026, 6:15 PM

#

I recently did a PoC on it.

maiden sentinel Feb 25, 2026, 7:37 AM

#

bruh

whole venture Feb 26, 2026, 3:47 AM

#

ai

#

am i right guys

gusty fjord Feb 26, 2026, 3:52 AM

#

pine shuttle

That happens when it runs out of memory ):

visual crypt Feb 28, 2026, 8:02 PM

#

wary night I recently did a PoC on it.

So, you need help right?

wary night Feb 28, 2026, 8:11 PM

#

No

visual crypt Mar 3, 2026, 8:11 PM

#

What is your POC?

keen magnet Mar 8, 2026, 8:40 AM

#

pine shuttle

We all need a new S2T model.

hybrid robin Mar 8, 2026, 3:50 PM

#

pine shuttle why is it doing that?

Stuck Emotional over run.

wary night Mar 10, 2026, 12:53 PM

#

visual crypt What is your POC?

An AI-based sales agent.

dark shore Mar 28, 2026, 3:36 PM

#

hello

silver wraith Mar 30, 2026, 4:23 PM

#

vocal moth What is the best STT or TTS model?

TTS? Qwen 3 TTS: best cloning, best prosody, and lightweight if you are low on resources (1.7B or 0.6B).
Works like a charm on reds openwebui with a FastAPI wrapper.

lean patrol Apr 9, 2026, 12:29 AM

#

What’s whispering?

void bramble Apr 10, 2026, 11:56 PM

#

lean patrol What’s whispering?

I think it's an open-source tool for LLMs to hear speech or something.

tawny horizon Apr 12, 2026, 1:23 AM

#

I still can't find an open-source speech-to-text model that goes beyond Whisper in Japanese.

narrow viper Apr 18, 2026, 7:30 AM

#

hii

visual crypt Apr 20, 2026, 3:40 AM

#

hi

keen magnet Apr 20, 2026, 3:06 PM

#

Can we expect a newer Whisper model this year?

keen magnet May 2, 2026, 3:21 AM

#

Can we expect a better open-weight whisper model this year?

scenic condor May 7, 2026, 8:41 AM

#

no

strong orchid May 13, 2026, 8:51 PM

#

Since today's incident (5/13) https://status.openai.com/incidents/01KRG0AZKH41DV4D9SNJSXM33Q#01KRG0AZKHH37CKBST5E3WBQW6 our Realtime API integration has been down. Is anyone else having the same issue?

In our case, the flow is now:

SIP INVITE reaches OpenAI.
OpenAI dispatches the realtime.call.incoming webhook to our server.
We call /v1/realtime/calls/{call_id}/accept.
The accept request returns HTTP 200.
Immediately after that, connecting to wss://api.openai.com/v1/realtime?call_id={call_id} returns:

{
  "error": {
    "message": "No session found for the provided call_id",
    "type": "invalid_request_error",
    "code": "call_id_not_found",
    "param": ""
  }
}
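The flow above can be sketched as a few small helpers, which makes it easier to log exactly which step fails. This is a minimal sketch, not OpenAI's client: the helper names are hypothetical, and the `https://api.openai.com` host for the accept call is an assumption taken from the WebSocket URL in the message (the post only gives the path).

```python
import json
from urllib.parse import quote

API_HOST = "api.openai.com"  # assumption: same host as the wss:// URL in the post


def accept_url(call_id: str) -> str:
    """URL for the accept step of the flow (path as quoted in the post)."""
    return f"https://{API_HOST}/v1/realtime/calls/{quote(call_id, safe='')}/accept"


def websocket_url(call_id: str) -> str:
    """WebSocket URL opened right after the accept call returns HTTP 200."""
    return f"wss://{API_HOST}/v1/realtime?call_id={quote(call_id, safe='')}"


def is_call_id_not_found(body: str) -> bool:
    """True if a response body carries the call_id_not_found error code."""
    try:
        error = json.loads(body).get("error") or {}
    except (json.JSONDecodeError, AttributeError):
        return False
    return isinstance(error, dict) and error.get("code") == "call_id_not_found"
```

Checking `is_call_id_not_found` on the WebSocket handshake response lets you separate this failure from ordinary authentication or network errors when reporting the incident.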

OpenAI Status

Realtime API - SIP/WebRTC flow are down - OpenAI Status

All impacted services have now fully recovered.

tender escarp May 15, 2026, 10:02 PM

#

Since the Realtime API went down, my application has been getting the following error:

#

Realtime call failed {
status: 400,
statusText: 'Bad Request',
body: '{\n' +
' "error": {\n' +
' "message": "The Realtime Beta API is no longer supported. Please use /v1/realtime for the GA API.",\n' +
' "type": "invalid_request_error",\n' +
' "code": "beta_api_shape_disabled",\n' +
' "param": ""\n' +
' }\n' +
'}'
}
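A hedged sketch for narrowing this down: detect the `beta_api_shape_disabled` code in the response body, and drop the beta opt-in header before retrying. That a leftover `OpenAI-Beta` header is the cause here is an assumption, not something the log confirms; the helper names are hypothetical, and other parts of the request may also need updating for the GA API.

```python
import json


def is_beta_shape_error(body: str) -> bool:
    """True if a response body carries the beta_api_shape_disabled error code."""
    try:
        error = json.loads(body).get("error") or {}
    except (json.JSONDecodeError, AttributeError):
        return False
    return isinstance(error, dict) and error.get("code") == "beta_api_shape_disabled"


def without_beta_header(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers without the beta opt-in header.

    Assumption: the client still sends an `OpenAI-Beta` header left over
    from the beta integration. HTTP header names are case-insensitive.
    """
    return {k: v for k, v in headers.items() if k.lower() != "openai-beta"}
```

Logging the outgoing headers once, before and after `without_beta_header`, is a quick way to confirm whether the request still looks like a beta call.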

#

I made some changes and now I can't escape this error.

#

hardy snow May 17, 2026, 11:51 AM

#

hi

worthy glen May 18, 2026, 6:31 PM

#

Ok