dull sable Jun 23, 2023, 5:47 AM

#

Thanks for the suggestion! I don't see a direct/legitimate way to programmatically use this tool, so probably not (I guess I never specified that's one of my requirements, but I'm mostly hoping for general advice). I'm also not sure if this can actually remove the offending sections rather than just make them sound like speech, since that seems to be what it is doing.

cerulean flint Jun 23, 2023, 5:48 AM

#

dull sable Thanks for the suggestion! I don't see a direct/legitimate way to programmatical...

well it is in Beta right now.. maybe there will be an API later. So far I use it for really bad interviews with a lot of external noise.. works fine for me in my case

small juniper Jun 23, 2023, 3:49 PM

#

tpacker Hi seem to be quite

normal bloom Jun 23, 2023, 9:26 PM

#

i made a little tool so that my mom can use whisper at translate.mom (hopefully i'm not breaking any rules!)

tight stirrup Jun 24, 2023, 1:01 AM

#

Hey, I'm using the local version of whisper and was wondering if I compress an audio file from 1,411 to 96 Kbps, in general would there be much of a speedup in transcription time, and how much of a decrease in accuracy would I see?

fathom escarp Jun 24, 2023, 3:41 PM

#

I don't think there would be much speedup
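A rough way to see why, as a sketch rather than a benchmark: 1,411 kbps is what uncompressed CD-quality PCM works out to (44.1 kHz, 16-bit, stereo), and the local whisper package decodes whatever it is given to 16 kHz mono before the model sees it. The helper names below are made up for illustration; the point is that the amount of audio the model processes depends on duration, not on the file's bitrate. How much accuracy a 96 kbps encode costs is not something this arithmetic can answer.

```python
WHISPER_SAMPLE_RATE = 16_000  # Hz, mono: the rate whisper resamples audio to on load


def pcm_bitrate_kbps(sample_rate: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


def samples_seen_by_whisper(duration_seconds: float) -> int:
    """Number of samples the model receives, whatever the source bitrate was."""
    return int(duration_seconds * WHISPER_SAMPLE_RATE)


# CD-quality PCM: 44.1 kHz, 16-bit, 2 channels
print(pcm_bitrate_kbps(44_100, 16, 2))
# A 10-minute file yields the same sample count at 1,411 kbps or 96 kbps
print(samples_seen_by_whisper(600))
```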

rugged pasture Jun 24, 2023, 6:24 PM

#

translate.mom then voila what's happening?

sonic mango Jun 25, 2023, 1:53 PM

#

Hello everyone. Wanted to know if whisper was handling diarization ?

sonic mango Jun 25, 2023, 2:00 PM

#

rugged pasture translate.mom then voila what's happening?

looks pretty good

autumn bolt Jun 26, 2023, 6:11 AM

#

hello question how can you have whisper

eternal osprey Jun 26, 2023, 3:04 PM

#

I played around with the Lex Fridman interview with Mark Zuckerberg using Whisper and ChatGPT. This could be a really cool use case for processing interviews: https://lastmileai.dev/workbooks/clj9c2dxw01uzr0gvlk8rbx57

🔊 Lex Fridman interviews... | LastMile AI

In this workbook, we'll do some cool things with Lex Fridman's most recent interview of Mark Zuckerberg about Meta's next AI model release (the next version of LLaMA)! We hope this inspires you to explore workbooks with Whisper, an audio-to-text model.

autumn bolt Jun 26, 2023, 3:50 PM

#

Hello everyone. I would like to know how or where to contact the marketing team.

simple latch Jun 26, 2023, 8:39 PM

#

autumn bolt hello question how can you have whisper

https://www.youtube.com/watch?v=ABFqbY_rmEk

YouTube

Kevin Stratvert

How to Install & Use Whisper AI Voice to Text

In this step-by-step tutorial, learn how to transcribe speech into text using OpenAI's Whisper AI. Whisper AI is an AI speech recognition system that can transcribe and translate audio files in approximately 100 different languages.

📚 RESOURCES

Install Python: https://www.python.org/
Install PyTorch: https://pytorch.org/get-started/locally/...


hearty shore Jun 27, 2023, 12:58 PM

#

eternal osprey I played around with the Lex Fridman interview with Mark Zuckerberg using Whispe...

Thanks for pre-emptively answering a question I had about using whisper to identify multiple voices (e.g. in a podcast or interview). Nice lateral thinking as well, I wouldn't have considered ChatGPT for picking up on the flow of communication! 🤗

autumn bolt Jun 28, 2023, 8:06 AM

#

If anyone is into AI development on a beginner scale hmu I got a project I need help on

static thunder Jun 28, 2023, 9:28 AM

#

hihi

real river Jun 28, 2023, 9:31 AM

#

Too loud.
Can you whisper?

desert dust Jun 28, 2023, 10:21 AM

#

desert dust Jun 28, 2023, 10:45 AM

#

Why

plain swift Jun 28, 2023, 1:17 PM

#

Can you pull the captions from video websites for training?

chrome sail Jun 28, 2023, 2:22 PM

#

Hey, what's the required specification to run the Whisper Model on a VPS?

steel fern Jun 30, 2023, 2:53 PM

#

Should like to check, is anyone interested in - or has there been any discussion - on audio-to-text transcription that captures details of speaker identities (i.e. "speaker diarization")?

#

I'm following the guide here https://github.com/m-bain/whisperX and referring to https://github.com/m-bain/whisperX/blob/main/whisperx/transcribe.py for command-line options

#

I'm having some limited success, but seem to be constrained by sample sizes no more than a couple of minutes long ..

cerulean flint Jun 30, 2023, 6:20 PM

#

autumn bolt hello question how can you have whisper

It is a GitHub Project

cerulean flint Jun 30, 2023, 6:21 PM

#

steel fern Should like to check, is anyone interested in - or has there been any discussion...

there is a whisper fork that tries speaker identification.. i am not on my computer rn but maybe you can google it

steel fern Jun 30, 2023, 7:16 PM

#

cerulean flint there is a whisper fork that tries speaker identification.. i am not on my compu...

Hi @cerulean flint many thanks for your suggestion, I was intending to give more detail earlier as I'm working on sample code from WhisperX and actually I think it looks promising

#

Not sure if this is a fork from Whisper .. if there are other projects I'm interested

#

I managed earlier to break down a 2-minute audio sample by speakers, labelled SPEAKER_00, SPEAKER_01, .. this seemed like a good start, except on inspection it wasn't very accurate

steel fern Jun 30, 2023, 7:38 PM

#

regarding the 2-minute sample (it was an export from the leading 2 minutes of a longer 30 minute segment) it produced a 15 line transcript which in terms of word accuracy was quite good, I printed the speaker labels next to the text segments like so

SPEAKER_02 And he is now suspended, ready for receiving his travel next week.
SPEAKER_02 And [name] and [name] are available.
SPEAKER_02 So the donor blood group is outposed, the recipient is outposed.
SPEAKER_02 It's a 0-1-1 mismatch.
SPEAKER_02 They had full cross-match on the 18th of the 5th, which was negative.
SPEAKER_05 Do you want any other antibody samples?
SPEAKER_02 No.
...

#

When I ran a 10 minute sample though against the same code, I only managed to get the first 70 lines of transcript, which corresponded to about 2/3 of the sample

steel fern Jun 30, 2023, 8:17 PM

#

here is the full source code I was using

import whisperx
device = "cpu"
language = "en"
audio_filename = "audio-sample.mp3"
# -- transcription
model = whisperx.load_model("large-v2", device, compute_type="int8", language=language)
audio = whisperx.load_audio(audio_filename)
result = model.transcribe(audio, batch_size=16)
# -- alignment
model_a, metadata = whisperx.load_align_model(language_code=language, device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
# -- diarization
YOUR_HF_TOKEN = "hf_xxx"
diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)
diarize_segments = diarize_model(audio_filename, min_speakers=4, max_speakers=7)
# -- speaker assignment
result = whisperx.assign_word_speakers(diarize_segments, result)
# -- print result
for i in range(len(result["segments"])):
    print("{} {}".format(result["segments"][i]['speaker'], result["segments"][i]['text']))

steel fern Jul 1, 2023, 1:01 AM

#

Can update, the problem I had earlier - with audio length - is because the print method is broken .. !

#

sometimes a speaker isn't assigned to a text segment and if that happens then result["segments"][i]['speaker'] terminates the script early

#

I rewrote the printout, i.e. the part following result = whisperx.assign_word_speakers(diarize_segments, result) as

# -- print result
for i in range(len(result["segments"])):
    speaker = result["segments"][i].get('speaker', 'SPEAKER-UNIDENTIFIED')
    text = result["segments"][i]['text']
    print("{} {}".format(speaker, text))

scarlet crest Jul 1, 2023, 6:46 PM

#

anyone might know why

const transcription = await openai.createTranscription({
  file: buffer,
  model: 'whisper-1',
  response_format: 'json'
});

returns an error?: Error creating transcription: RequiredError: Required parameter model was null or undefined when calling createTranscription.

this is the openai configuration:

```js
const configuration = new Configuration({
  apiKey: OPENAI_API_KEY,
  organization: OPENAI_ORGANIZATION
});

export const openai = new OpenAIApi(configuration);
```

untold sparrow Jul 4, 2023, 6:01 AM

#

Why medium.en require more VRAM but have less speed than tiny.en ?

cerulean flint Jul 4, 2023, 6:56 AM

#

untold sparrow Why *medium.en* require more VRAM but have less speed than *tiny.en* ?

?

#

that is exactly how it should be?

#

large is bigger and the slowest

untold sparrow Jul 4, 2023, 7:14 AM

#

cerulean flint ?

ohhh mb i thought it was better in terms of performance and recognition

cerulean flint Jul 4, 2023, 7:33 AM

#

untold sparrow ohhh mb i thought it was better in terms of performance and recognition

it is, but that takes more time 😉

slate geode Jul 5, 2023, 4:37 AM

#

How does using the api compare to running whisper.cpp locally? I'm using the api for my app but I'm wondering if it would be worth it to generate the transcripts locally for a performance boost.

slate geode Jul 5, 2023, 6:15 AM

#

Also, I'm using zh (chinese) for the language and getting errors for my requests. Works for french and english though.

grand gull Jul 5, 2023, 8:03 AM

#

Is this only for the whisper api?

#

I wanted to know other people's solution on how to get "real time" voice transcription with microphone

stiff remnant Jul 5, 2023, 9:13 AM

#

does the sample rate of an mp3 affect the usage cost?

tacit wave Jul 5, 2023, 9:36 AM

#

Has anyone used fast-whisper?

woven bluff Jul 5, 2023, 10:56 AM

#

grand gull I wanted to know other people's solution on how to get "real time" voice transcr...

I use Deepgram, it has Whisper model as well as their own models, it has fast accurate real time transcription

small juniper Jul 5, 2023, 3:39 PM

#

I wanted to know other people s solution

untold sparrow Jul 6, 2023, 2:22 AM

#

#

Hi, why im getting nothing when i enter the commands ?
whisper --model tiny.en "test.mp3"
like i get

devout shuttle Jul 6, 2023, 2:25 AM

#

hello

untold sparrow Jul 6, 2023, 5:15 PM

#

is it possible to translate the output file ?

#

into another language

cerulean flint Jul 6, 2023, 6:06 PM

#

untold sparrow into another language

at least not with whisper.. but google translate etc. can do it for sure?

untold sparrow Jul 6, 2023, 6:27 PM

#

cerulean flint at least not with whisper.. but google translate etc. can do it for sure?

yes it doing great but it would be good to automate

sly apex Jul 7, 2023, 4:49 AM

#

Anyone able to get half decent results with small whisper models running locally on OrangePi or RK3588 boards?

valid kernel Jul 7, 2023, 7:17 AM

#

ллллллллллллллллллллл

#

єєєєєєєєєєєєєєєєєєєєєєєждлорнепаквіфівссапрролджє.дбьотипсіячсмитьбю.

clear gazelle Jul 7, 2023, 2:02 PM

#

Hello, I have a question about Whisper. I want to try incorporating it into a small Python program for a voice assistant. Can someone please help me?

untold sparrow Jul 10, 2023, 4:44 AM

#

whisper --model medium.en "audio.mp3" --output_format txt srt

#

I use this command and I only want the **txt** and **srt** formats, but it doesn't work

#

anyone can help me ?

autumn bolt Jul 10, 2023, 6:21 AM

#

open ai is a company that makes ai models and stuff like that

tardy kettle Jul 10, 2023, 1:53 PM

#

Apple should use this tech for their dictation function on the keyboard.

#

Prove me wrong

echo ridge Jul 10, 2023, 3:37 PM

#

When does whisper come to code interpreter ?

#

So I can ask for transcript of my audio files directly from chatgpt

fair parrot Jul 11, 2023, 4:33 AM

#

echo ridge When does whisper come to code interpreter ?

That’s a great idea. Maybe there will be a plug-in or something

#

Idk if that is even possible though

proud kite Jul 12, 2023, 3:11 AM

#

When I submit my recording , the output is error, instead of "transafjkl" that I typed.

How to solve this, Thank you.

untold crystal Jul 12, 2023, 4:35 PM

#

Hey! My teacher told me to use this command line arguments from whisper, is he correct?

whisper --model medium --language Spanish --output_format {txt,vtt}

small juniper Jul 12, 2023, 7:16 PM

#

Hey My teacher told me to use this

#

I tried to post a link, but cannot. You can google for this: "Audio Course from HuggingFace". It uses Whisper.

autumn bolt Jul 13, 2023, 3:18 AM

#

openai_logo

golden summit Jul 13, 2023, 1:38 PM

#

how possible is it to have a video who you can talk to? sort of chatgpt in the form of a video?

small juniper Jul 13, 2023, 3:04 PM

#

how possible is it to have a video who

clever barn Jul 13, 2023, 10:20 PM

#

is ti possible to touch a stray cat

#

it*

copper ridge Jul 16, 2023, 6:18 AM

#

What is whisper?

cerulean flint Jul 16, 2023, 7:27 AM

#

untold sparrow ``whisper --model medium.en "audio.mp3" --output_format txt srt``

try only one

frail cloak Jul 16, 2023, 11:08 AM

#

Where do I access whisper?

fierce marsh Jul 16, 2023, 11:11 AM

#

Hello 👋

#

How to create image's in using to chat Gpt ?

frail cloak Jul 16, 2023, 11:11 AM

#

fierce marsh Hello 👋

Do you know Where do I access whisper?

fierce marsh Jul 16, 2023, 11:12 AM

#

frail cloak Do you know Where do I access whisper?

I don't know 😔

#

What is whisper?

frail cloak Jul 16, 2023, 11:12 AM

#

fierce marsh I don't know 😔

Ok thx for your help by the way you can make images on Dall E 2 website or bing

frail cloak Jul 16, 2023, 11:12 AM

#

fierce marsh What is whisper?

An ai voice generator I think

fierce marsh Jul 16, 2023, 11:14 AM

#

frail cloak Ok thx for your help by the way you can make images on Dall E 2 website or bing

Ok there is not available to create automatic With AI ?

fierce marsh Jul 16, 2023, 11:14 AM

#

frail cloak Ok thx for your help by the way you can make images on Dall E 2 website or bing

Thanks 👍

winged bolt Jul 16, 2023, 2:47 PM

#

/what was the beginning of the ford motors company

manic coral Jul 16, 2023, 10:21 PM

#

fierce marsh How to create image's in using to chat Gpt ?

You don't, That's Dall-E

manic coral Jul 16, 2023, 10:21 PM

#

fierce marsh What is whisper?

Whisper is voice to text, it's an API

manic coral Jul 16, 2023, 10:22 PM

#

frail cloak An ai voice generator I think

Nope.

manic coral Jul 16, 2023, 10:22 PM

#

winged bolt /what was the beginning of the ford motors company

Might want to use google as there are no bots on this server for you to make a command to (Except I guess Modmail counts in a way)

frail cloak Jul 16, 2023, 10:24 PM

#

manic coral Nope.

Ok

fierce marsh Jul 17, 2023, 1:35 AM

#

Hello hello how are you

autumn bolt Jul 17, 2023, 8:36 AM

#

hey guys im starting a community based server VORA-AI for AI development/showcasing and i need mods/staff

small juniper Jul 17, 2023, 3:08 PM

#

Let's reduce the noise in this channel, shall we? Suggestions:

Questions that Google and ChatGPT can easily answer, I expect to be ignored here.
Nonsensical and off-topic statements like "/what was the beginning of the ford motors company" should not happen.
Ask a complete question so the topic is clear. When answering a question, start a new thread on that question.

autumn bolt Jul 17, 2023, 6:00 PM

#

small juniper Let's reduce the noise in this channel, shall we? Suggestions: - Questions tha...

Are u a mod? Smh

quartz jungle Jul 17, 2023, 9:02 PM

#

Hi
So I have a folder with 45 audio files in MP3 format. I want to transcribe them to text using OpenAI Whisper API. I have an API key. After transcribing them, I want to take the entire output, and put it into a text file.

Please tell me the best way to do this.

livid mauve Jul 17, 2023, 11:34 PM

#

autumn bolt Are u a mod? Smh

server staff will have the "community volunteer" role

queen scarab Jul 17, 2023, 11:41 PM

#

quartz jungle Hi So I have a folder with 45 audio files in MP3 format. I want to transcribe t...

ChatGPT should be able to help you with, right?

manic coral Jul 17, 2023, 11:58 PM

#

quartz jungle Hi So I have a folder with 45 audio files in MP3 format. I want to transcribe t...

Here's one of my whisper programs

```python
import os
from pprint import pprint

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Prompt for user input
file_path = input("Enter the audio file path: ")
prompt = input("Enter the prompt text (optional): ")
response_format = input("Enter the response format (optional, defaults to json): ") or "json"
language = input("Enter the language of the input audio (optional): ")

# Only send the optional parameters the user actually filled in
options = {"response_format": response_format, "temperature": 0}
if prompt:
    options["prompt"] = prompt
if language:
    options["language"] = language

# Set the model ID
model_id = "whisper-1"

# Open the audio file in binary mode and call the API
with open(file_path, "rb") as audio_file:
    transcript = openai.Audio.transcribe(model_id, audio_file, **options)

pprint(transcript)
```
This just prints the transcription to the screen; you can modify it to save to a file instead.
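For the 45-file case, a minimal sketch of that modification might look like the following. It assumes the same `openai` 0.27.x package as the script above, with the API key already configured; the function names, the `audio` folder, and `transcripts.txt` are placeholders, not anything from the original question. The API call is passed in as a function so the folder loop can be tried without sending any requests.

```python
from pathlib import Path
from typing import Callable


def transcribe_with_openai(path: Path) -> str:
    """Send one file to the hosted Whisper API (openai 0.27.x style call)."""
    import openai  # imported lazily so the folder loop works without the package

    with path.open("rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]


def transcribe_folder(folder: str, out_path: str,
                      transcribe: Callable[[Path], str] = transcribe_with_openai) -> int:
    """Transcribe every .mp3 in `folder` and write all transcripts to `out_path`."""
    files = sorted(Path(folder).glob("*.mp3"))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in files:
            text = transcribe(path)
            # One labelled block per file so the combined output stays readable
            out.write(f"=== {path.name} ===\n{text.strip()}\n\n")
    return len(files)


# Example call (hypothetical folder and output names):
# transcribe_folder("audio", "transcripts.txt")
```

Files are processed in name order and one at a time, so a failure partway through leaves the earlier transcripts already written.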

quartz jungle Jul 18, 2023, 10:14 AM

#

queen scarab ChatGPT should be able to help you with, right?

I tried to make it write a script in code interpreter, but it didnt work. And im not a coder so I dunno how to

quartz jungle Jul 18, 2023, 10:15 AM

#

manic coral Here's one of my whisper programs ```import os import openai from pprint import ...

Is there any GUI app that can do this in bulk?

cerulean flint Jul 18, 2023, 11:15 AM

#

quartz jungle Is there any GUI app that can do this in bulk?

you can use Google Colab for it

manic coral Jul 18, 2023, 2:02 PM

#

Not to my knowledge but if you want, I can write one this afternoon.

quartz jungle Jul 18, 2023, 4:12 PM

#

Not to my knowledge but if you want I

plucky palm Jul 18, 2023, 5:40 PM

#

how do i output as timestamps?

unique flare Jul 18, 2023, 9:05 PM

#

Is this available for plus subscribers yet,?

manic coral Jul 18, 2023, 9:08 PM

#

unique flare Is this available for plus subscribers yet,?

Whisper is an API, there is no waitlist, but you must write your own program to use it

calm ravine Jul 19, 2023, 6:05 AM

#

please advice about community role please.

livid mauve Jul 19, 2023, 6:06 AM

#

what roles are you asking about specifically?

alpine nacelle Jul 19, 2023, 6:29 AM

#

psst

calm ravine Jul 19, 2023, 9:41 AM

#

i ask the same part of the question, how do they look like and who are they..? showing exploration of neurological netwrking system.

#

which role Admin want to gave ..because i am Business information system and an engineer alieas plant manager.

rapid glen Jul 19, 2023, 10:15 AM

#

I hate you

untold crystal Jul 19, 2023, 2:12 PM

#

manic coral Here's one of my whisper programs ```import os import openai from pprint import ...

Hey! My teacher told me to use this command line arguments from whisper, is he correct?

whisper --model medium --language Spanish --output_format {txt,vtt}

wanton mica Jul 19, 2023, 3:40 PM

#

Gay

manic coral Jul 19, 2023, 5:39 PM

#

untold crystal Hey! My teacher told me to use this command line arguments from whisper, is he c...

Absolutely not, whisper is an api, it does not work that way.

#

(There is no whisper exec file)

untold crystal Jul 19, 2023, 5:45 PM

#

manic coral (There is no whisper exec file)

what is exec file?

remote hedge Jul 19, 2023, 6:35 PM

#

Working on my first python project. I put the code and the output in here: https://justpaste.it/a6j5p
The goal is to run Whisper to transcript the audio and save it as a .txt with the same name. Currently there is no txt file, even though the output says so. According to the error messages there might be something wrong with ffmpeg but when I run Whisper in CLI the transcription works so I don't think that's the problem

remote hedge Jul 19, 2023, 7:43 PM

#

could it be a permission issue?

thorny phoenix Jul 19, 2023, 11:04 PM

#

is whisper better than elevenlabs?

thorny phoenix Jul 19, 2023, 11:04 PM

#

remote hedge could it be a permission issue?

hi, ubuntu user

clever basin Jul 19, 2023, 11:37 PM

#

whisper is really good, definitely better than all other speech to text things

alpine tendon Jul 19, 2023, 11:55 PM

#

remote hedge could it be a permission issue?

Why whisper when you can say it out loud and be proud?

cerulean flint Jul 20, 2023, 1:44 AM

#

remote hedge Working on my first python project. I put the code and the output in here: http...

try this if possible:
(change folder etc. ofc)

autumn bolt Jul 20, 2023, 3:04 AM

#

Hey so, I have never used AI before. However, just watched the Iron Man again movies and honestly I'm convinced I need to spend time and try and make my own for around my home. However, I genuinely have no idea where to start. I have experience in some coding, java and javascript. But, if a new language is needed I'm willing to invest. I think what I'd want the AI to be able to do proceeds as:

Control lights
Control Music
Regular ChatGPT for questions/talk
Book events
Control TV
Control Camera (not really sure how this would work, but, I could say turn on garage camera, and it'd pull up on the screen)

Those are just the ones I can think of right now, but, if AI can learn, could it learn who people are? For example, "Charlie has just walked into the house", or if I asked, where is the dog, they could say "The dog is outside". Because of inspiration, id want it to have a Jarvis voice (but who knows, maybe Ill want it different for my own style).

Another question I was curious about, is if prompted correctly could AI just code this for me? not sure how this works and half of this is even possible. But please lmk!

calm ravine Jul 20, 2023, 4:13 AM

#

A.I is a helping tools, It's depends on the roots they enquired.

cerulean flint Jul 20, 2023, 10:08 AM

#

autumn bolt Hey so, I have never used AI before. However, just watched the Iron Man again mo...

all of this stuff here is no AI per definition, learning algorithms that are very limited in adding knowledge above their initial data. But it is not self-aware and not yet able to "evolve"

thorny phoenix Jul 20, 2023, 11:52 AM

#

autumn bolt Hey so, I have never used AI before. However, just watched the Iron Man again mo...

python

#

also learn about prompt engineering

valid wharf Jul 20, 2023, 10:05 PM

#

Hey, pssst

#

im whispering to you

#

shhhh

valid wharf Jul 20, 2023, 10:07 PM

#

autumn bolt Hey so, I have never used AI before. However, just watched the Iron Man again mo...

Wait you’re planning to have something like Jarvis in your house?

autumn bolt Jul 21, 2023, 7:32 AM

#

valid wharf Wait you’re planning to have something like Jarvis in your house?

Yes

tawdry dagger Jul 21, 2023, 12:56 PM

#

Heyy, just wanted to ask if there is a way to run faster-whisper without the python? Cause I'm using command prompt and am not quite sure on how to convert...

remote hedge Jul 21, 2023, 2:29 PM

#

How does whisper handle multiple languages being spoken in a file? I'm trying to transcribe subs for my wedding video but my family and my wife's family speak different languages

remote hedge Jul 21, 2023, 2:41 PM

#

remote hedge How does whisper handle multiple languages being spoken in a file? I'm trying to...

From my first experiments it handles it rather poorly. Or am I forgetting some setting that improves it?

chrome bloom Jul 21, 2023, 8:38 PM

#

thorny phoenix also learn about prompt engineering

i think you mean proompt

mortal canyon Jul 21, 2023, 11:18 PM

#

guysssss shhhhh whisper🤫🤫🤫🤫

#

no I think they meant prompt not proompt

manic coral Jul 21, 2023, 11:55 PM

#

remote hedge How does whisper handle multiple languages being spoken in a file? I'm trying to...

I would probably try chirp if you have access


```Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. Chirp achieves state-of-the-art Word Error Rate (WER) on a variety of public test sets and languages.```
```Chirp is available through the Cloud Speech-to-Text API. The API lets you do inference for transcription against the Chirp model```

faint dawn Jul 22, 2023, 6:38 PM

#

guys

#

can someone please help

#

it's kinda urgent

#

I am using the Whisper API and trying to simply transcribe text from a voice message that is in English

#

but for a strange reason the text gets transcribed to Greek

#

meanwhile I have put NOWHERE for Greek

#

can someone help?

tawdry dagger Jul 22, 2023, 8:14 PM

#

faint dawn can someone help?

Maybe specify that the language is English through your model?

spice grail Jul 23, 2023, 12:51 PM

#

How we use Whisper I don't understand

manic coral Jul 24, 2023, 2:04 AM

#

```python
import os
from pprint import pprint

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Prompt for user input
file_path = input("Enter the audio file path: ")
prompt = input("Enter the prompt text (optional): ")
response_format = input("Enter the response format (optional, defaults to json): ") or "json"
language = input("Enter the language of the input audio (optional): ")

# Only send the optional parameters the user actually filled in
options = {"response_format": response_format, "temperature": 0}
if prompt:
    options["prompt"] = prompt
if language:
    options["language"] = language

# Set the model ID
model_id = "whisper-1"

# Open the audio file in binary mode and call the API
with open(file_path, "rb") as audio_file:
    transcript = openai.Audio.transcribe(model_id, audio_file, **options)

pprint(transcript)
```

thin fox Jul 24, 2023, 7:59 AM

#

hello all,

are there any Ai researchers here? working on custom models in text or image generation?

prime cradle Jul 24, 2023, 8:50 AM

#

```
Traceback (most recent call last):
  File "docs.py", line 5, in <module>
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
AttributeError: module 'openai' has no attribute 'Audio'
```

I am getting this error when copying and pasting the basic API example from the docs

autumn bolt Jul 25, 2023, 10:18 AM

#

🙂

frozen spoke Jul 25, 2023, 5:36 PM

#

That wasn't a different language... Why did it delete my message?

#

Can whisper gain IPA as a language?

mossy yacht Jul 25, 2023, 7:00 PM

#

fascinate

valid kernel Jul 25, 2023, 7:08 PM

#

я

lapis coyote Jul 26, 2023, 11:32 PM

#

prime cradle ```Traceback (most recent call last): File "docs.py", line 5, in <module> ...

Which version of openai package are you using? I'm on 0.27.0 and dir(openai.Audio) works fine

lapis coyote Jul 26, 2023, 11:52 PM

#

manic coral ```import os import openai from pprint import pprint openai.api_key = os.getenv...

In my testing, the language setting has no effect. The API acted exactly like there is no language parameter. What's your experience?

lapis coyote Jul 26, 2023, 11:53 PM

#

faint dawn but for a strange reason the text gets transcribed to Greek

One way might work is to feed some prompt in English into the call.

manic coral Jul 27, 2023, 12:39 AM

#

lapis coyote In my testing, the language setting has no effect. The API acted exactly like wh...

I have no means to answer that, I've not used it for anything multilingual

quiet mist Jul 27, 2023, 7:56 AM

#

ornate trellis Jul 27, 2023, 7:58 AM

#

I'm facing an issue I'm new to this

import whisper
from typing import Annotated
from fastapi import FastAPI, File

app = FastAPI()

@app.post("/ta")
async def transcribe_audio(audio_file_upload: Annotated[bytes, File()]):
    model = whisper.load_model("base")
    result = model.transcribe(audio_file_upload, word_timestamps=True, fp16=True)
    return {"res": result}

I'm getting this error

 File "N:\audio-to-text\venv\Lib\site-packages\whisper\audio.py", line 131, in log_mel_spectrogram
    audio = torch.from_numpy(audio)
            ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected np.ndarray (got bytes)
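The traceback shows `transcribe()` being handed raw `bytes`, while whisper expects a file path, a NumPy array, or a tensor. One possible workaround, sketched here under the assumption that ffmpeg is installed (the CLI needs it too): write the upload to a temporary file and pass its path. `transcribe_bytes` is a made-up helper name, and `model` is whatever `whisper.load_model(...)` returned.

```python
import os
import tempfile


def transcribe_bytes(model, data: bytes, suffix: str = ".mp3", **options):
    """Write uploaded bytes to a temporary file and transcribe that file by path."""
    # delete=False so the file can be reopened by name (required on Windows)
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    try:
        # whisper's transcribe() accepts a path and decodes the file with ffmpeg
        return model.transcribe(tmp_path, **options)
    finally:
        os.remove(tmp_path)


# Inside the endpoint this would replace the direct call, for example:
# result = transcribe_bytes(model, audio_file_upload, word_timestamps=True)
```

Loading the model once at module level instead of inside the request handler would also avoid reloading it on every upload.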

steep monolith Jul 27, 2023, 8:42 AM

#

I have been using the default free Google speech-to-text tool for a mobile app. Is there any advantage to using Whisper?

ornate trellis Jul 27, 2023, 8:44 AM

#

steep monolith I have been using the default free Google speech-to-text tool for a mobile app. ...

not sure

steep monolith Jul 27, 2023, 8:45 AM

#

I'm only using it for short bits of text, so maybe I would run into issues with more extensive conversions

tacit bolt Jul 27, 2023, 8:45 AM

#

do open ai playground and azure playground give different responses for the same prompt?

steep monolith Jul 27, 2023, 8:46 AM

#

I assume OpenAI think Whisper fills a niche that the Google version doesn't fill... Maybe the niche is other-than-mobile settings.

cerulean flint Jul 27, 2023, 9:30 AM

#

steep monolith I assume OpenAI think Whisper fills a niche that the Google version doesn't fill...

i transcribe multi-hour .mp3 files with whisper, pretty good results in German and English, and i can work through several files one after another - works well for me

steep monolith Jul 27, 2023, 9:57 AM

#

Is it expensive?

fading scroll Jul 27, 2023, 1:24 PM

#

sup

dim vigil Jul 27, 2023, 8:49 PM

#

do i need to pay to use whisper?

prisma sinew Jul 27, 2023, 9:07 PM

#

dim vigil do i need to pay to use whisper?

The api has a cost. Check the openai pricing page. I think it’s like 2 cents a minute? But don’t quote me I forget. Check the page

stone nebula Jul 27, 2023, 11:58 PM

#

so i've been using whisper to get transcriptions from video clips i'm processing, and i've noticed an interesting quirk at the end of some of the transcriptions
[TRANSCRIPTION] It's the summer's biggest sensation. And now it all comes down to this. The Dancing with the Stars Grand Finale. Underdog Kelly Monaco The eye of the tiger, baby. Takes on favorite John O'Hurley. Be afraid. Be very afraid. The judges score and your vote decides the champion. Live. Plus, all the stars reunite for one final encore. It's the night America has been waiting for. Dancing with the Stars Grand Finale. Wednesday, 9, 8 Central. Only on ABC. Subs by www.zeoranger.co.uk

fading scroll Jul 28, 2023, 2:39 AM

#

duh

jaunty canopy Jul 28, 2023, 6:40 AM

#

Can I get the transcript in 30 sec batches, rather than the entire text, via whisper APIs?
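One way to approximate this, as a sketch: request `response_format="verbose_json"` from the API, which returns a `segments` list with `start`, `end`, and `text` for each segment, and then bucket the segments into 30-second windows yourself. `group_segments` is a hypothetical helper, not part of any library, and segments are assigned by their start time, so a segment that straddles a boundary stays in the earlier window.

```python
def group_segments(segments, window=30.0):
    """Group transcript segments into consecutive `window`-second buckets by start time."""
    buckets = {}
    for seg in segments:
        index = int(seg["start"] // window)
        buckets.setdefault(index, []).append(seg["text"].strip())
    return [
        {"start": i * window, "end": (i + 1) * window, "text": " ".join(texts)}
        for i, texts in sorted(buckets.items())
    ]


# Example with hand-written segments shaped like the verbose_json output
example = [
    {"start": 0.0, "end": 4.0, "text": " Hello and welcome."},
    {"start": 28.0, "end": 33.0, "text": " Let's get started."},
    {"start": 61.5, "end": 63.0, "text": " First question."},
]
for chunk in group_segments(example):
    print(chunk)
```

Windows with no speech are simply absent from the result.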

lapis coyote Jul 28, 2023, 4:57 PM

#

stone nebula so i've been using whisper to get transcriptions from video clips i'm processing...

It happened from time to time when I asked it to transcribe short audio clips with a fuzzy voice. I think that's a built-in flaw of the machine learning approach Whisper is using. My understanding is that Whisper uses a variation of generative AI similar to ChatGPT, not the usual voice recognition ML models used by other providers. It's pretty unique. But it can lead to the problem that it's actively trying to figure out "what's next" in the process, which led to the "next" being these somehow popular websites. At least that's my explanation based on my experience. It happens often enough that I built code logic to handle this situation correctly.
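The handling logic itself isn't shown here, so the following is only a guess at what such a filter could look like: a regex that drops a short trailing subtitle credit such as the "Subs by www.zeoranger.co.uk" line in the example above. The pattern and the four-word limit are assumptions to tune against your own output, not a known-complete list.

```python
import re

# Hypothetical pattern: a credit phrase followed by at most four words, at the very end
_TRAILING_CREDIT = re.compile(
    r"\s*\b(?:subs|subtitles|captions) by(?:\s+\S+){1,4}\s*$",
    re.IGNORECASE,
)


def strip_trailing_credit(text: str) -> str:
    """Remove a short subtitle credit from the end of a transcript, if one is present."""
    return _TRAILING_CREDIT.sub("", text)


print(strip_trailing_credit("Only on ABC. Subs by www.zeoranger.co.uk"))
```

Anchoring the match to the end of the text keeps it from deleting a legitimate "subs by" in the middle of a transcript.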

stone nebula Jul 28, 2023, 7:11 PM

#

lapis coyote It happened from time to time when I asked it to transcribe short audio clips w...

Makes sense, especially if it was trained with publicly available subtitles. My code uses the transcription and other data to generate a summary of the video clip, so it hasn't been problematic for me

#

Interestingly enough that doesn't seem to be an active website

untold crystal Jul 28, 2023, 11:00 PM

#

whisper needs actualization?

mighty fossil Jul 29, 2023, 7:09 PM

#

help meee!!!

#

why is my api key not working

#

today i finally made a project that i copied from youtube but my key is not working

#

it says you need tokens

#

whats that??

#

anyone?

young willow Jul 29, 2023, 11:05 PM

#

mighty fossil it says you need tokens

maybe you need to setup the billing plan?

mighty fossil Jul 29, 2023, 11:05 PM

#

How can I do that?

#

It asks for company name and stuff like that

young willow Jul 29, 2023, 11:06 PM

#

mighty fossil How can I do that?

On the OpenAI API section

mighty fossil Jul 29, 2023, 11:06 PM

#

I went there and also went to the pricing part. It asks for my role and the company I work for

young willow Jul 29, 2023, 11:07 PM

#

You have to set up the information

#

And you have to pay to use the API

shadow vapor Jul 30, 2023, 1:33 AM

#

Hey guys, can someone tell me if this is possible with whisper?
I want to use Whisper's language ID on code-switching speech, that is, switching between languages mid-sentence or between sentences. Ultimately, I would like to produce timestamps of when each language is used. For example, (0-1 seconds) English -> (1-2) Spanish ->.... etc
Can this be done?
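Whisper's language identification works on a window of audio rather than per word (the open-source package exposes a `detect_language` call that takes a mel spectrogram), so one approach would be to run detection over short windows and merge the labels afterwards. Detection on windows as short as one second is likely to be unreliable, so treat this as a rough idea. The sketch below covers only the merge step and assumes the per-window labels have already been produced elsewhere; `merge_language_windows` is a hypothetical helper name:

```python
def merge_language_windows(windows):
    """Merge consecutive (start, end, language) windows with the same language."""
    spans = []
    for start, end, lang in windows:
        if spans and spans[-1][2] == lang:
            # Same language as the previous span: extend its end time.
            spans[-1] = (spans[-1][0], end, lang)
        else:
            spans.append((start, end, lang))
    return spans
```

Given windows labelled en, en, es, en, this yields three spans, which matches the "(0-1) English -> (1-2) Spanish" output described above.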

loud plinth Jul 30, 2023, 8:47 PM

#

Might OpenAI remove the Try In Playground link from the product page under Whisper? There is no playground option for Whisper. Why tease people into going to look for something that we know doesn't exist? Thanks.

untold crystal Aug 6, 2023, 2:16 PM

#

Question; "Will Whisper need an update at some point, or does it remain unupdated?"

#

ok mods can someone explain why the timeout? I was writing in english...

#

Do you know why it gives me this error? whisper: error: argument --output_format/-f: invalid choice: '{txt,vtt}' (choose from 'txt', 'vtt', 'srt', 'tsv', 'json', 'all')

#

This is the code:
whisper Histologia_general_teo_sem_1 --model medium --language Spanish --output_format {txt,vtt}

plush mulch Aug 6, 2023, 5:24 PM

#

untold crystal This is the code: `whisper Histologia_general_teo_sem_1 --model medium --languag...

try this: whisper .\Histologia_general_teo_sem_1 --model medium --language Spanish --output_format txt --output_format vtt --task transcribe

fierce surge Aug 6, 2023, 7:35 PM

#

Do i have to upgrade to gptplus in order to use whisper?

#

Ok. I got the answer by reading previous chat. I have to update.

fierce surge Aug 6, 2023, 7:40 PM

#

prisma sinew The api has a cost. Check the openai pricing page. I think it’s like 2 cents a m...

0.006 dollars/minute
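At that rate an hour of audio comes to about 36 cents. A quick sketch of the arithmetic, using the per-minute figure quoted in this chat (check the pricing page for the current rate); `estimate_cost` is a hypothetical helper name:

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted in the chat, may be out of date


def estimate_cost(duration_seconds, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate the API cost in USD for audio of the given length."""
    return duration_seconds / 60.0 * price_per_minute


if __name__ == "__main__":
    print(f"1 hour of audio: ${estimate_cost(3600):.2f}")
```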

untold crystal Aug 6, 2023, 9:10 PM

#

plush mulch try this: ```whisper .\Histologia_general_teo_sem_1 --model medium --language Sp...

yessss

#

too late to see it

#

i resolved it later

#

but thanks man

#

is like

#

1 month

#

i didn't know how to do it

untold crystal Aug 6, 2023, 9:11 PM

#

plush mulch try this: ```whisper .\Histologia_general_teo_sem_1 --model medium --language Sp...

i only use; whisper .\Histologia_general_teo_sem_1 --model medium --language Spanish --output_format txt --output_format vtt

#

i didnt use --task transcribe and it worked

#

why should i need to use --task transcribe?

#

pd: sorry for the bad english

magic pollen Aug 7, 2023, 4:57 AM

#

Is it possible for Whisper to distinguish different speakers that appear in the same audio file?

weary skiff Aug 7, 2023, 11:26 AM

#

magic pollen Is it possible for Whisper to distinguish different speakers that appear in the ...

No. you need to use another model to do this task.

plush mulch Aug 7, 2023, 3:17 PM

#

untold crystal why should i need to use --task transcribe?

transcribe is to transcribe the video with the given language or auto detected language.
translate is to translate it to english.

seems like the default setting is transcribe

untold crystal Aug 7, 2023, 10:32 PM

#

plush mulch transcribe is to transcribe the video with the given language or auto detected l...

yes it's the default

#

do you think it's necessary?

#

would it improve the transcription?

#

or not?

#

btw

#

do you use --patience PATIENCE?

willow geyser Aug 9, 2023, 6:10 PM

#

has anyone used Whisper to live-transcribe and translate a panel discussion IRL?

storm geyser Aug 9, 2023, 11:48 PM

#

When whisper can identify different speakers is when I'll pay for it

untold crystal Aug 10, 2023, 1:16 PM

#

storm geyser When whisper can identify different speakers is when I'll pay for it

bro it's free 🤣

storm geyser Aug 10, 2023, 1:19 PM

#

untold crystal bro is free 🤣

I suspect they'll bill for it when it can do more than translate voice to text

untold crystal Aug 10, 2023, 1:24 PM

#

untold crystal ok mods can someone explain why the timeout? I was writing in english...

well that's a possible future

glass mirage Aug 10, 2023, 9:57 PM

#

willow geyser has anyone used Whisper to live-transcribe and translate a panel discussion IRL?

im wondering over the same thing

#

you could technically cut the live recorded mp3 in word pieces and upload it to the server to be transcribed

#

and do these couple of things simultaneously using multithreading

#

however i dont see it being efficient enough to be a viable option especially on mobile

#

so no clue what could be used instead

#

for example AssemblyAI is an amazing api but also pricey as hell

#

does anyone know any relatively good apis for this but without some ridiculous prices that no client would stand?

lethal narwhal Aug 12, 2023, 2:19 AM

#

I found a whisper app on the Google Play Store to dictate anything using a custom keyboard, is there a similar app on the App Store for Apple phones?

autumn bolt Aug 13, 2023, 7:38 PM

#

Anyone familiar with CoreML?

true lotus Aug 14, 2023, 10:49 AM

#

Hey so does anyone here know where the bot channel is?

tender rivet Aug 14, 2023, 12:22 PM

#

true lotus Hey so does anyone here know where the bot channel is?

there are no chatbots on this discord

untold crystal Aug 14, 2023, 2:46 PM

#

whisper --model medium --language Spanish --output_format txt --output_format vtt

#

i used this and whisper only gave me format vtt, do you know why?

tender rivet Aug 14, 2023, 2:59 PM

#

--output_format vtt 🤔

untold crystal Aug 15, 2023, 1:03 PM

#

tender rivet ``--output_format vtt`` 🤔

@tender rivet i think i didn't make it clear, but i wanted it to give me --output_format txt and --output_format vtt, and it only gave me --output_format vtt. Do you know why it didn't give me --output_format txt?

finite frost Aug 15, 2023, 3:36 PM

#

Hi guys, On iOS, chatgpt transcription feature through whisper translates my English to Russian sometimes automatically, possibly because my accent is Russian. Honestly the fact that it's possible is amazing, but I would prefer to control it. Does anyone know what happens and how to control it? Also, is there a dedicated app using whisper specifically for translation?

bleak marten Aug 16, 2023, 8:09 PM

#

To regain control explore the settings or preferences within the app that uses Whisper for transcription.

finite frost Aug 16, 2023, 10:02 PM

#

it's official chatgpt app

muted axleBOT Aug 18, 2023, 8:21 PM

#

Attention Off-topic chats have been cleared by a moderator.

plush mulch Aug 19, 2023, 7:45 PM

#

untold crystal @tender rivet i think i didnt emphasise, but i want to give me --output_...

seems like only one output_format is available as output.
you can use this GUI to save the output in 5 different file formats without the need to run it multiple times, also you can directly save the subtitle to the video hardcoded or as .mkv file
https://github.com/meeksqueal/OpenAI-Whisper-GUI

GitHub

GitHub - meeksqueal/OpenAI-Whisper-GUI: A modern GUI application th...

A modern GUI application that transcribes and translates audio and video files, offering the option to save the subtitles as separate files, embed the subtitles in a .mkv format, or hardcode them i...

tardy pollen Aug 23, 2023, 1:46 AM

#

I'm running whisper on a GCP Nvidia A100 GPU 40GB. This is a huge instance and the cost is immense. Fine, whatever. So I'm handling my first transcription and the transcription rate is as slow or slower than on my Mac. How can I confirm that the GPU is actually being used? The image I installed on top of the GPU is direct from GCP and is a special PyTorch-CUDA image so I'm extremely confused.

cerulean flint Aug 23, 2023, 1:53 AM

#

tardy pollen I'm running whisper on a GCP Nvidia A100 GPU 40GB. This is a huge instance and ...

as long as you define cuda it should be fine

#

try using Colab free tier

tardy pollen Aug 23, 2023, 1:54 AM

#

I don't care about free tier, this is on the corpo dime.

cerulean flint Aug 23, 2023, 1:54 AM

#

i use it nearly everyday, works

tardy pollen Aug 23, 2023, 1:54 AM

#

How long would it take whisper to run through a 4GB file on an Nvidia A100 40GB?

cerulean flint Aug 23, 2023, 1:54 AM

#

not sure it can handle 4GB all at once

tardy pollen Aug 23, 2023, 1:54 AM

#

Does the GPU actually after its run time?

#

*affect

cerulean flint Aug 23, 2023, 1:55 AM

#

i part my large files into smaller ones

tardy pollen Aug 23, 2023, 1:55 AM

#

So the problem is my file is too big?!

#

that makes no sense to me I guess

cerulean flint Aug 23, 2023, 1:56 AM

#

i would try 1GB parts

#

ok

tardy pollen Aug 23, 2023, 1:56 AM

#

why would that matter, I'm just genuinely lost is all.

cerulean flint Aug 23, 2023, 1:56 AM

#

then i guess you should figure it out

tardy pollen Aug 23, 2023, 1:56 AM

#

no no I hear you -- smaller files. But why?

cerulean flint Aug 23, 2023, 1:57 AM

#

i don't know, all i can say is that i've transcribed hundreds of .mp3 files so far and Whisper has had problems with bigger files (hours of interviews). once i split them into smaller segments, it runs smoothly

tardy pollen Aug 23, 2023, 1:58 AM

#

alright

#

Have you ever run it on a 40GB VRAM GPU instance though lol?

cerulean flint Aug 23, 2023, 1:58 AM

#

nope

tardy pollen Aug 23, 2023, 1:58 AM

#

I mean this is a $10K USD card?! It costs $5/hr to run in the cloud

#

That's why I'm confused. It's like running Whisper on a GPU doesn't matter or something. And the machine image I used was specifically from GCP with PyTorch-CUDA preinstalled for python 3.10.0. I mean pretty straightforward. Unless running it on a GPU doesn't actually do anything for transcription 🤷‍♂️

cerulean flint Aug 23, 2023, 2:00 AM

#

that works for me

tardy pollen Aug 23, 2023, 2:01 AM

#

oh. this is from a notebook?

cerulean flint Aug 23, 2023, 2:01 AM

#

yes

tardy pollen Aug 23, 2023, 2:01 AM

#

can you export the script and upload it here? I'm a low-tech kind of guy that tends to do things from cli's and what not.

#

please

cerulean flint Aug 23, 2023, 2:02 AM

#

maybe later, need to finish here & heading to work

tardy pollen Aug 23, 2023, 2:02 AM

#

allllright

#

that script doesn't partition the file though

#

thanks

tardy pollen Aug 23, 2023, 12:45 PM

#

I'm running whisper on a GCP Nvidia A100 GPU 40GB. This is a huge instance and the cost is immense. Fine, whatever. So I'm handling my first transcription and the transcription rate is as slow or slower than on my Mac. How can I confirm that the GPU is actually being used? The image I installed on top of the GPU is direct from GCP and is a special PyTorch-CUDA image so I'm extremely confused.

lapis jacinth Aug 23, 2023, 2:02 PM

#

🎙️ Whisper's Response to Blank Inputs - A Quirk?

Hi all! 👋

I noticed that when Whisper receives "blank" audio (no spoken words), it transcribes phrases like "Thank you for watching." Even OpenAI's ChatGPT mobile apps do the same.

Has anyone else seen this? Is it a known issue, or is there a workaround? It's intriguing but could be challenging in some scenarios.

Your insights would be super helpful!

unborn gust Aug 23, 2023, 5:20 PM

#

anyone having issues with the API today? I'm having extra errors and also responses that are not the full file.

untold crystal Aug 23, 2023, 9:33 PM

#

tardy pollen I'm running whisper on a GCP Nvidia A100 GPU 40GB. This is a huge instance and ...

good question, i have a 3060 gpu laptop and only keep medium, is that ok?

untold crystal Aug 23, 2023, 9:33 PM

#

lapis jacinth 🎙️ **Whisper's Response to Blank Inputs - A Quirk?** Hi all! 👋 I noticed tha...

yes, not only "Thank you for watching", it uses other phrases too

untold crystal Aug 23, 2023, 9:34 PM

#

lapis jacinth 🎙️ **Whisper's Response to Blank Inputs - A Quirk?** Hi all! 👋 I noticed tha...

and why you talk like chat gpt 😆

lapis jacinth Aug 23, 2023, 11:15 PM

#

untold crystal yes, not only Thank you for watching it use anothers phrases too

Interestingly, they're all "YouTube-isms", hinting at the data on which it was trained. 🍿
I have also seen "Thanks for watching", "Subtitles by...", and so on. I'm just waiting for it to say "Don't forget to like and subscribe!". 🤣
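A common workaround is to drop segments whose entire text is one of these stock phrases. A minimal sketch, assuming segments are dicts with a `text` field; the pattern list only covers phrases reported in this chat and is illustrative rather than exhaustive, and `is_boilerplate` and `drop_boilerplate` are hypothetical helper names:

```python
import re

# Phrases reported in this chat as appearing on silent audio.
# The list is illustrative, not exhaustive.
BOILERPLATE_PATTERNS = [
    r"^thank(s| you) for watching[.!]*$",
    r"^subtitles by\b.*$",
    r"^subs by\b.*$",
]


def is_boilerplate(text):
    """Return True if the whole text matches a known filler phrase."""
    cleaned = text.strip().lower()
    return any(re.match(pattern, cleaned) for pattern in BOILERPLATE_PATTERNS)


def drop_boilerplate(segments):
    """Remove segments whose text is only a known filler phrase."""
    return [seg for seg in segments if not is_boilerplate(seg["text"])]
```

The patterns are anchored at both ends so that a genuine sentence such as "Thank you for watching the game with me" is kept.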

untold crystal Aug 23, 2023, 11:16 PM

#

lapis jacinth Interestingly, they're all "YouTube-isms", hinting at the data on which it was t...

you're not going to believe me, in Spanish it says: suscribete! (subscribe!)
suscribete!
suscribete!
suscribete! jajajja

#

now the single-neuron bot is going to put me in timeout

untold crystal Aug 24, 2023, 12:57 PM

#

?

#

is the bot with a mod

#

or is only a bot?

#

i want to report this to ai

#

guys, can whisper also use shared GPU memory or only dedicated GPU memory?

tardy pollen Aug 24, 2023, 2:44 PM

#

I'm running whisper on a GCP Nvidia A100 GPU 40GB. So I'm handling my first transcription and the transcription rate is as slow or slower than on my Mac. How can I confirm that the GPU is actually being used? The image I installed on top of the GPU is direct from GCP and is a special CUDA image meant precisely for my use-case. What am I missing here?

quartz hull Aug 28, 2023, 2:42 PM

#

Python 3.11.5
:/home/container$ if [[ -d .git ]] && [[ "${AUTO_UPDATE}" == "1" ]]; then git pull; fi; if [[ ! -z "${PY_PACKAGES}" ]]; then pip install -U --prefix .local ${PY_PACKAGES}; fi; if [[ -f /home/container/${REQUIREMENTS_FILE} ]]; then pip install -U --prefix .local -r ${REQUIREMENTS_FILE}; fi; /usr/local/bin/python /home/container/${PY_FILE}
Collecting openai-whisper
Using cached openai-whisper-20230314.tar.gz (792 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting triton==2.0.0 (from openai-whisper)
Downloading triton-2.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━ 46.7/63.3 MB 47.8 MB/s eta 0:00:01ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━ 47.1/63.3 MB 35.0 MB/s eta 0:00:01

How much space does whisper need?

#

I have like 2gbs on my vm

#

But still error

verbal estuary Aug 28, 2023, 8:18 PM

#

[413] Maximum content size limit (26214400) exceeded (26247606 bytes read)

so 25 MB is the max limit? is there a way to upgrade this limit?
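The figure in the error, 26214400 bytes, is exactly 25 × 1024 × 1024, so the limit is 25 MB per request. The usual workaround is the one mentioned earlier in this channel: split the audio into smaller parts (or re-encode at a lower bitrate) before uploading. A rough sketch of how long each part can be at a given bitrate; the 5% safety margin is an assumed value, and `max_chunk_seconds` and `plan_chunks` are hypothetical helper names:

```python
import math

API_LIMIT_BYTES = 26_214_400  # 25 MiB, the figure from the 413 error above


def max_chunk_seconds(bitrate_kbps, limit_bytes=API_LIMIT_BYTES, margin=0.95):
    """Longest chunk (in whole seconds) that stays under the size limit.

    bitrate_kbps is the encoded audio bitrate in kilobits per second;
    margin leaves headroom for container overhead (an assumed value).
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    return math.floor(limit_bytes * margin / bytes_per_second)


def plan_chunks(total_seconds, bitrate_kbps):
    """Return (start, end) second offsets covering the whole file."""
    step = max_chunk_seconds(bitrate_kbps)
    return [(s, min(s + step, total_seconds)) for s in range(0, total_seconds, step)]
```

The offsets can then be handed to whatever tool does the actual cutting, for example ffmpeg. Cutting at fixed offsets can split a word in half, so cutting at pauses is usually kinder to the transcript.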

vast summit Aug 30, 2023, 11:27 AM

#

Hi! Anyone know how I can use Whisper to dictate sms messages on ios?

near yew Aug 31, 2023, 7:07 AM

#

vast summit Hi! Anyone know how I can use Whisper to dictate sms messages on ios?

Whisper is speech to text, not text to speech

tender rivet Sep 1, 2023, 2:23 PM

#

please use #1050184247920562316

waxen dew Sep 2, 2023, 11:04 PM

#

near yew Whisper is speech to text, not text to speech

oh, bummer..

vagrant glade Sep 3, 2023, 9:58 AM

#

anyone here that has used whisper... is it able to interpret background noise? for example, could it delineate between crowd cheering and crowd booing?

autumn bolt Sep 4, 2023, 11:35 PM

#

Hey, any good APIs for using whisper in c#?

wide drift Sep 7, 2023, 11:38 PM

#

Hey guys, I am trying to create a real time voice reader but my code is not working.

cinder saddle Sep 9, 2023, 12:04 PM

#

wide drift Heys guys, I am trying to create a real time voice reader but my code is not wor...

If you use python i can help

autumn bolt Sep 9, 2023, 7:28 PM

#

wide drift Heys guys, I am trying to create a real time voice reader but my code is not wor...

Language?

humble hare Sep 9, 2023, 11:55 PM

#

Hey guys this is a thanks from me and my team to any contributors to whisper. This library basically converted a month's work to a few days.

humble hare Sep 9, 2023, 11:58 PM

#

wide drift Heys guys, I am trying to create a real time voice reader but my code is not wor...

if u are using python i can help u. i have made a similar project

wide drift Sep 10, 2023, 12:03 AM

#

Can you put the repo

humble hare Sep 10, 2023, 12:05 AM

#

sorry but the project is for an upcoming hackathon so i cannot share the repo as of right now

wide drift Sep 10, 2023, 2:57 AM

#

When will it be?

humble hare Sep 10, 2023, 12:24 PM

#

it will take some time bro I believe at the end of this year

surreal hull Sep 14, 2023, 5:32 PM

#

@humble hare bro I need your help with real time voice transcription

humble hare Sep 14, 2023, 5:32 PM

#

surreal hull @humble hare bro I need your help with real time voice transcription

How can i help?

surreal hull Sep 14, 2023, 5:36 PM

#

Can you share the idea?

humble hare Sep 14, 2023, 5:47 PM

#

surreal hull Can you share the idea?

i will send a screenshot in pm

tender rivet Sep 16, 2023, 4:26 PM

#

You are in the right place, check the oai lib for js or python and the code examples on the docs

half anvil Sep 17, 2023, 3:27 PM

#

Hey guys, has anyone tried hosting the whisper large model on firebase cloud functions?

glad nova Sep 17, 2023, 6:33 PM

#

Hmm..

fair sapphire Sep 18, 2023, 11:08 PM

#

Hey so I'm trying to use whisper-1 speech to text to generate a transcription from audio in Portuguese using an audio/ogg file as a NodeJS Buffer, but when hitting the endpoint I get an error message that reads as follows: {"message":"","type":"server_error","param":null,"code":null}. This does not really tell me what the issue is, so I came here to ask for help

fair sapphire Sep 18, 2023, 11:54 PM

#

attempting to use Blob instead of Buffer returns an error saying file parameter does not exist, but it does

#

also fix your automod cuz I got timed out for sending an image of code written in english

fair sapphire Sep 19, 2023, 10:16 PM

#

fair sapphire Hey so I'm trying to use whisper-1 speech to text to generate a transcription fr...

Made the request using built in fetch function instead of the openai library and that fixed it, so idk what the issue could be

astral shell Sep 20, 2023, 7:22 AM

#

Is there any way to get timestamp for each word and not only the segment?
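For the open-source package, `transcribe()` accepts `word_timestamps=True`, which adds a `words` list to each segment; nothing in this chat establishes whether the hosted API offered the same at the time. A minimal sketch that flattens those per-segment lists into one list, assuming that result shape; `flatten_words` is a hypothetical helper name:

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a transcription result.

    Expects the dict shape returned by the open-source package when
    transcribing with word_timestamps=True: result["segments"][i]["words"].
    Segments without a "words" key are skipped.
    """
    words = []
    for seg in result.get("segments", []):
        for w in seg.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words
```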

thorn echo Sep 20, 2023, 9:34 PM

#

is it possible to identify different voices and have each tagged as a different person?

full sage Sep 23, 2023, 3:31 AM

#

thorn echo is possible to identify different voices and have it tagged a different person?

This is possible, but you would need to modify the model because that behavior is not in its origins.

thorn echo Sep 23, 2023, 4:04 AM

#

full sage This is possible, but you would need to modify the model because that behavior i...

Do you have more info on how I can do that?

humble hare Sep 23, 2023, 6:00 AM

#

Hey the latest version of whisper requires option parameter with srt_writer. How do i get this parameter?

full sage Sep 23, 2023, 11:13 PM

#

thorn echo Do you have more info on how I can do that?

Hey so good news, turns out it has already been done by some folks before. Check out this github project https://github.com/MahmoudAshraf97/whisper-diarization

GitHub

GitHub - MahmoudAshraf97/whisper-diarization: Automatic Speech Reco...

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper - GitHub - MahmoudAshraf97/whisper-diarization: Automatic Speech Recognition with Speaker Diarization based on OpenAI W...
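Projects of this kind generally pair Whisper's timestamped text with speaker turns from a separate diarization model. The assignment step can be sketched as below, assuming turns arrive as `(start, end, speaker)` tuples and segments as dicts with `start`, `end`, and `text`. This illustrates the general idea only and is not the linked repository's code; `assign_speakers` is a hypothetical helper name:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

A segment that overlaps no turn at all gets `speaker` set to `None`, which makes gaps in the diarization easy to spot.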

bold canopy Sep 24, 2023, 9:27 AM

#

has anyone used the offline whisper version?
it works but for some reason it refuses to use my gpu and uses my cpu instead

UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")

any idea how to reconfigure it? (edited)

#

because of this it seems to take longer when transcribing

stiff remnant Sep 24, 2023, 11:35 AM

#

Has anybody seen information as to when we might see improvements to Whisper, particularly for smaller languages?

full sage Sep 26, 2023, 3:18 AM

#

Hello are you using Python to run the whisper model? Or just the command line? If you're using Python you can set fp16 to False and it would use FP32 instead

full sage Sep 26, 2023, 3:18 AM

#

bold canopy has anyone used the offline whisper version? it works but for some reason it ref...

here you go!

bold canopy Sep 26, 2023, 6:20 AM

#

thanks!, ill give it a try

lapis jacinth Sep 26, 2023, 1:19 PM

#

Any thoughts as to how OpenAI are getting such fast transcriptions in ChatGPT-V? ⚡
Are they using a new Whisper model not available to the public?

plain creek Sep 26, 2023, 1:20 PM

#

i have used whisper offline, make sure you have the right CUDA stuff installed so that it runs on your GPU (if you want that)

#

otherwise it won't be able to detect your GPU as a device
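The FP16 warning quoted above is a sign that the model is running on the CPU. Setting `fp16=False` only silences the warning; it does not move the work to the GPU. A minimal sketch for checking what PyTorch can see before loading the model, assuming PyTorch is the backend (as it is for the open-source package); `pick_device` and `transcribe_options` are hypothetical helper names:

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe_options(device):
    """Options for transcribe(): FP16 only makes sense on a GPU."""
    return {"fp16": device == "cuda"}


if __name__ == "__main__":
    device = pick_device()
    print("device:", device)
    # With the open-source package one would then do something like:
    #   model = whisper.load_model("medium", device=device)
    #   result = model.transcribe("audio.mp3", **transcribe_options(device))
```

If this prints `cpu` on a machine with an Nvidia card, the usual suspects are a CPU-only PyTorch build or a driver and CUDA version mismatch.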

toxic iris Sep 26, 2023, 6:22 PM

#

lapis jacinth Any thoughts as to how OpenAI are getting such fast transcriptions in ChatGPT-V?...

This is just a guess: They probably do not use Whisper. They probably just use a STT model provided by e.g.: Google

lapis jacinth Sep 26, 2023, 6:26 PM

#

toxic iris This is just a guess: They probably do not use Whisper. They probably just use a...

I would be shocked if they weren't using their own SOTA ASR for STT. 🧐
Should be easy enough to identify the model once we get our hands on it. Whisper has its own idiosyncratic quirks.

Anyone been lucky enough to receive the ChatGPT upgrade yet?

toxic iris Sep 26, 2023, 6:32 PM

#

lapis jacinth I would be shocked if they weren't using their own SOTA ASR for STT. 🧐 Should...

Whisper is anything but fast though. Maybe they just use a private faster model. I doubt that though, cause they would surely commercialise it.

#

Also Whisper would have to run on their backend. (That's extra cost too.) They could just use the STT model integrated in Android/iOS.

lapis jacinth Sep 26, 2023, 6:34 PM

#

toxic iris Whisper is anything but fast though. Maybe they just use a private faster model....

That's what I'm saying, they're doing something differently with ChatGPT's ASR. My guess is it's a tuned or newer model of Whisper.
There have been open source reimplementations of Whisper that are 4x as fast as OpenAI's initial release, so it wouldn't be hard to make it faster.

toxic iris Sep 26, 2023, 6:35 PM

#

Still, a locally running model would be my go-to as a developer.

#

I'd probably use something like this: https://github.com/alphacep/vosk-android-demo

GitHub

GitHub - alphacep/vosk-android-demo: Offline speech recognition for...

Offline speech recognition for Android with Vosk library. - GitHub - alphacep/vosk-android-demo: Offline speech recognition for Android with Vosk library.

lapis jacinth Sep 26, 2023, 6:40 PM

#

For sure, remote Whisper transcriptions would add cost and latency. But the accuracy gains are worth the cost, and they're printing money anyway.
Android/iOS STT doesn't give high enough accuracy for conversational AI, IMO.
And remote transcriptions with Whisper can be close enough to real time to work. I should know, I've done it. 🎤

north sail Sep 27, 2023, 2:11 AM

#

full sage Hey so good news, turns out it has already been done by some folks before. Check...

Hey sorry to ping you but I've been trying to get the colab notebook to work but it's just riddled with errors. Do you happen to know how to fix it?

full sage Sep 27, 2023, 2:19 AM

#

north sail Hey sorry to ping you but I've been trying to get the colab notebook to work but...

What are the errors you're getting?

north sail Sep 27, 2023, 2:23 AM

#

full sage What are the errors you're getting?

So in the 17th block of code you'll get an error at the result_aligned = whisperx.align(... part.

After looking through the repo someone suggested you change the beam_size from 1 to 7.

Then after you fix that error you'll get stuck at the 21st block of code
https://github.com/MahmoudAshraf97/whisper-diarization/issues/69

The owner of the repo suggests adding pip uninstall nvidia-cudnn-cu11 but that's already been done at the very beginning of the notebook

GitHub

CUDNN_STATUS_VERSION_MISMATCH · Issue #69 · MahmoudAshraf97/whisper...

Hello again - I built on my past errors and installed the correct version of CUDA to get past where I was stuck at previously. However, I seem to have run into a new problem, specifically in the wa...

#

Forgive me for my lack of knowledge regarding coding but I've been trying to debug it myself for quite some time now...

full sage Sep 27, 2023, 3:51 AM

#

north sail So in the 17th block of code you'll get an error at the `result_aligned = whispe...

Hey sorry for the late response but someone already made a good suggestion on the github repo

```
!sudo apt upgrade
!sudo apt install cuda-11-8 -y
!sudo apt install nvidia-kernel-common-460 nvidia-driver-460
```

Note: after testing this, it may take a while to work. But it will work regardless in the end

autumn bolt Sep 27, 2023, 9:07 PM

#

I am sending audio recordings to the OpenAI Whisper API and cannot get mobile recordings to accept past a few seconds of data, I have no idea why. Desktop audio recordings function perfectly fine but whenever I try on my phone the transcriptions only get a word or two

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@C:\Users\katra\Desktop\71753801708__8C36058A-E077-4000-B93D-901529FBD0AE (1).mp4' \
  --form model=whisper-1

north sail Sep 28, 2023, 4:46 AM

#

full sage Hey sorry for the late response but someone already made a good suggestion on th...

I haven't had the time to actually test out the fix yet but thank you!

Btw how good is NeMo diarization anyway? I've been trying out the pyannote diarization and it's really janky tbh

full sage Sep 28, 2023, 5:19 AM

#

north sail I haven't had the time to actually test out the fix yet but thank you! Btw how ...

Both NeMo and Pyannote diarization are similar because they use neural networks to segment and cluster audio recordings by speaker label, which you can see in the whisper-diarization colab where they prepare to convert the data for NeMo compatibility.

But between the two, NeMo diarization gets more updates and is backed by a huge corporation (Nvidia) and perhaps a wider community, while Pyannote relies on open source, so many people can fine-tune the model to be better at tasks.

Overall, both are free to use.

north sail Sep 28, 2023, 5:48 AM

#

full sage Both NeMo and Pyannote diarization are similar because they use neural networks ...

That's interesting. Thanks for the explanation!

The best method I've found for transcribing (and translating) Japanese is using silero-VAD. It fixes Whisper's hallucinations over longer recordings and makes the timings far better than without it.

The thing is most repos/open source programs that have Whisper and utilize a VAD all use pyannote for diarization. I wish I knew how to code something with NeMo so I could test and see for myself if it would yield better results

full sage Sep 28, 2023, 6:09 AM

#

north sail That's interesting. Thanks for the explanation! The best method I've found for ...

Good insight, I will be sure to check out silero-VAD. And I agree, we see many open-source projects using the OpenAI Whisper model because it is pretty popular and known by many. Right now I am building a project using GPT-4 and the Whisper Tiny/Base model to transcribe youtube videos and summarize them right on your cpu.

In my opinion it performed pretty well on English videos that were over 2 hours long in length.

north sail Sep 28, 2023, 6:45 AM

#

full sage Good insight, I will be sure to checkout silero-VAD. And I agree, we see many op...

Not sure if I can send links in this server, but there's already some research talking about how much better diarization is when LLMs are added to the equation. I hope we'll get to see more of this stuff easily accessible to the public soon

I hope your project works out! I've seen a small number of repos on Github already try to implement a GPT model to help with transcriptions, but I haven't tested them out myself yet

#

https://news.superagi.com/2023/09/12/speech-processing-new-research-merges-large-language-models-with-acoustic-diarization-for-enhanced-accuracy/

SuperAGI News

SkyAGI

Speech Processing: LLM with Acoustic Diarization for Enhanced Accuracy

In recent developments in speech processing technology, researchers have introduced an innovative method that synergizes Large Language Models (LLMs) with

lapis jacinth Sep 28, 2023, 9:16 AM

#

north sail That's interesting. Thanks for the explanation! The best method I've found for ...

Plus one for silero-VAD with Whisper. It can accurately remove those non-speaking portions from your audio that might otherwise confuse Whisper. You may want to apply some post-processing to combine the resulting shorter transcripts.
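The post-processing mentioned here mostly amounts to shifting each chunk's timestamps back onto the original timeline and concatenating. A minimal sketch, assuming the VAD step recorded where each speech chunk started in the original audio; `merge_chunk_transcripts` is a hypothetical helper name:

```python
def merge_chunk_transcripts(chunks):
    """Map per-chunk segments back onto the original timeline.

    `chunks` is a list of (offset_seconds, segments) pairs, where offset is
    the position of the speech chunk in the original audio and each
    segment has "start"/"end" relative to the chunk plus "text".
    """
    merged = []
    for offset, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"].strip(),
            })
    return merged
```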

north sail Sep 28, 2023, 10:11 PM

#

lapis jacinth Plus one for silero-VAD with Whisper. It can accurately remove those non-speakin...

The stuff I use automatically does it for me! I just assumed the VAD did it for me the whole time lol

autumn bolt Sep 30, 2023, 7:58 PM

#

ik this is prolly less but. Whisper is awesome ive been using it to get notes of my classes

#

i really like it

kind skiff Oct 1, 2023, 4:43 PM

#

it really is excellent, best speech rec i've ever used - plugged it into my ttrpg table blended with eleven labs. https://www.youtube.com/watch?v=CXIrFXDQBFQ . The way it punctuates normal speech is outstanding.

YouTube

TheBwt

Elevenlabs Foundry + Openai Whisper

Testing out a workflow for doing voices adhoc during play.

▶ Play video

weary trail Oct 1, 2023, 5:52 PM

#

I keep getting stuff like "由 Amara.org 社群提供的字幕" ("Subtitles provided by the Amara.org community") when there's silence, and other Chinese text talking about amara.org having transcribed this conversation, even though I have told the AI to transcribe it as such when there's silence in the audio
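When running the open-source package locally, each segment in the result carries a `no_speech_prob` value, which gives another way to discard text produced over silence. A minimal sketch, assuming that field is present; the 0.6 cut-off mirrors what I believe is the package's default `no_speech_threshold` and will probably need tuning, and `drop_probable_silence` is a hypothetical helper name:

```python
def drop_probable_silence(segments, max_no_speech_prob=0.6):
    """Keep only segments the model considers likely to contain speech.

    Segments without a "no_speech_prob" field are kept unchanged.
    """
    return [
        seg for seg in segments
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]
```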

#

I would send a picture but this dumb### ai keeps deleting my image and timeouting me

full root Oct 2, 2023, 7:52 AM

#

has anyone tried transcribing using whisper in a thread? because it just seems to not halt for me which is weird.

#

I'm working on live audio: I have a threaded function that reads mp3 frames and then, given some volume threshold, saves the audio and starts a new thread to transcribe it using whisper

#

but that thread just doesnt do any transcribing until I ctrl+C the program

#

then the rest of that threaded function executes and I get the transcription

autumn bolt Oct 2, 2023, 8:07 AM

#

How do I use Whisper?

#

How do i even access it?

full root Oct 2, 2023, 8:11 AM

#

pip install -U openai-whisper

autumn bolt Oct 2, 2023, 8:15 AM

#

I'm interested in watching a YouTube video that's in another language. While there are English subtitles available through "auto-translate", I'm curious to know if OpenAI offers a tool that can analyze the foreign language audio and convert it into an English voiceover?

#

Can someone do this for me? ^ I have no idea how to use Whisper

full root Oct 2, 2023, 8:25 AM

#

full root but that thread just doesnt do any transcribing until I ctrl+C the program

I figured out the processing was just being slowed down quite a lot

#

especially the call to "FeatureExtractor" (I am using faster-whisper)

#

My plan is to use a process instead

rustic atlas Oct 2, 2023, 7:57 PM

#

Good evening, does anyone know if the Whisper API can be used to generate an .SRT file with the transcripts?

#

I know it can be done by running the model locally with the Whisper package, but I don't have that computational capability and I would like to use the API

brave kestrel Oct 3, 2023, 7:06 PM

#

I'm sure it's possible

#

maybe use ffmpeg to separate the audio to a usable streamtype then use python or whatever to make the api calls, and generate SRT from the transcript
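As far as I know, the transcription endpoint also documents a `response_format` parameter that lists `srt` and `vtt` among its values, which may make client-side conversion unnecessary. If only timestamped segments are available, the SRT text can be built by hand. A minimal sketch, assuming segments are dicts with `start`, `end`, and `text`; `srt_timestamp` and `to_srt` are hypothetical helper names:

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```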

#

Commission somebody if you're not a programmer. Or ask GPT to help your journey

tender rivet Oct 4, 2023, 3:24 AM

#

kind skiff it really is excellent, best speech rec i've ever used - plugged it into my ttrp...

@prisma sinew @worn tundra check this out

kind skiff Oct 4, 2023, 3:25 AM

#

<.<

.>

tender rivet Oct 4, 2023, 3:26 AM

#

oops, sorry for the ping tho =P

prisma sinew Oct 4, 2023, 3:26 AM

#

Neat. I've got something very similar set up, not a parrot so much as a two-way, but similar

tender rivet Oct 4, 2023, 3:26 AM

#

I Loved the fact that it is integrated with foundry

kind skiff Oct 4, 2023, 3:27 AM

#

let me get my macro for ya'll then

prisma sinew Oct 4, 2023, 3:27 AM

#

I still need to decide the best method of natural capture, like how long of a pause, and background noise ignoring

#

Right now it just waits for a 1 sec pause lol

kind skiff Oct 4, 2023, 3:27 AM

#

chatgpt gave me a great technique

#

Okay:
main macro - https://gist.github.com/thebwt/8142c510c2d2ae31f0c8e6bbfb45016a
python fastapi endpoint to do the whisper stuff (I think I don't need this in the end... but haven't refactored yet) -
https://gist.github.com/thebwt/003f58a4454876c4e706b7d5875b6fbb

i'm lazy, so I just ssh port forwarded the running service to localhost.

source chat: https://chat.openai.com/share/8a8313a1-bc4e-4723-9aa5-702ed15b992a


kind skiff Oct 4, 2023, 3:36 AM

#

prisma sinew Right now it just waits for a 1 sec pause lol

okay actually that's literally what I do

#

```
        mediaRecorder.stop();
      }
```
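The same one-second-pause rule, sketched in Python rather than the macro's JavaScript. It assumes the audio has already been reduced to one loudness value (e.g. RMS) per fixed-length frame; the threshold and frame size are illustrative and not taken from the gist.

```python
def utterance_end(levels, frame_ms=20, threshold=0.02, pause_ms=1000):
    """Return the index of the frame where speech has been followed by
    at least `pause_ms` of quiet, or None if that never happens."""
    needed = pause_ms // frame_ms  # consecutive quiet frames required
    quiet = 0
    heard_speech = False
    for i, level in enumerate(levels):
        if level >= threshold:
            heard_speech = True
            quiet = 0
        elif heard_speech:
            quiet += 1
            if quiet >= needed:
                return i
    return None
```

Leading silence is ignored until the first loud frame, so background hush before the speaker starts does not end the recording early. Background noise handling would still need a better threshold than a fixed constant.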

prisma sinew Oct 4, 2023, 3:37 AM

#

kek

autumn bolt Oct 4, 2023, 12:54 PM

#

Can whisper only be installed on Windows?

#

Has anyone installed whisper to a MacBook that has Parallel Desktop installed (Windows VM)

brave kestrel Oct 4, 2023, 3:37 PM

#

do you mean a local installation of the model or the api?

tender rivet Oct 4, 2023, 7:51 PM

#

you can run whisper on anything that can run python (and has good enough hardware)

#

the OS is not a concern

autumn bolt Oct 4, 2023, 9:11 PM

#

brave kestrel do you mean a local installation of the model or the api?

Yeah local install I suppose. Is there an easy web version? It looks quite confusing to install and use.. is there an easy way to use with ?

brave kestrel Oct 4, 2023, 9:28 PM

#

Depends on what you want to use it for

#

it's likely somebody has built an app for that and whisper may not be ideal. It would probably involve installing programming tools and writing some code

#

You have a macbook so you have python. probably somebody could give you a couple terminal commands and it would start spitting out your transcript

brave kestrel Oct 5, 2023, 7:02 PM

#

autumn bolt Has anyone installed whisper to a MacBook that has Parallel Desktop installed (W...

If you have Chrome and all you wanna do is transcribe, I have you. Here's a branch off another project I made. Save the index.html and run it in chrome.
https://github.com/danomation/chrome-gpt-assistant/tree/transcriber

GitHub

GitHub - danomation/chrome-gpt-assistant at transcriber

Simple javascript client side voice assistant for desktop chrome. - GitHub - danomation/chrome-gpt-assistant at transcriber

autumn bolt Oct 5, 2023, 9:07 PM

#

brave kestrel If you have Chrome all you wanna do is transcribe, I have you. Here's a branch o...

Thank you!

brave kestrel Oct 5, 2023, 9:10 PM

#

It wont stay on due to privacy so that limits its use

stoic jewel Oct 6, 2023, 2:26 PM

#

Lol, whisper just came up with an advertisement for an existing webshop by transcribing a silent wav file...

#

Go to Beadaholique.com for all of your beading supplies needs!

#

Super weird

brave kestrel Oct 7, 2023, 5:16 AM

#

yep, it gets hallucinations all the time

#

I'm a novice when it comes to whisper, but the more I use it the more I notice I need some way of handling dubious inputs

lime bobcat Oct 7, 2023, 7:40 PM

#

Hi. Anyone knows of a windows dictation software based on whisper?

hexed tide Oct 8, 2023, 4:50 AM

#

Does anyone know how many minutes ChatGPT can listen and transcribe in the app? Do Plus customers have any advantage here?

pure igloo Oct 9, 2023, 3:54 AM

#

Hello, I'm facing some issues getting Whisper to transcribe acronyms in audios
E.g when a speaker says 'SDR', it should be transcribed as 'SDR', not 'as the are'.
I would like to include a custom dictionary where the user would pre-record how all these acronyms are pronounced + how it should be spelt. Does anyone know how I can do this?

I just need some quick and short fine-tuning so that the model can transcribe a list of 10-20 acronyms accurately for each audio. This list of acronyms varies for each audio, so fine-tuning the model every time (which would take 5-10h according to the post below) is not sustainable.
Thanks in advance!

https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2

Hugging Face Forums

Adding custom vocabularies on Whisper

Hey @sebasarango1180, If you have a corpus of paired audio-text data with examples of such terms/entities/acronyms, you could experiment with fine-tuning the Whisper model on this dataset and seeing whether this improves downstream ASR performance on this distribution of data. To do so, you can follow the blog post at Fine-Tune Whisper For Mult...

potent pier Oct 9, 2023, 2:26 PM

#

pure igloo Hello, I'm facing some issues getting Whisper to transcribe acronyms in audios E...

https://platform.openai.com/docs/guides/speech-to-text/improving-reliability

OpenAI Platform

Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

potent pier Oct 9, 2023, 2:26 PM

#

pure igloo Hello, I'm facing some issues getting Whisper to transcribe acronyms in audios E...

You might also try post-processing text with GPT4 or 3.5

pure igloo Oct 9, 2023, 2:32 PM

#

potent pier You might also try post-processing text with GPT4 or 3.5

oh yes, if its not possible to fine tune whisper with a custom dictionary then I'll go around it with llm post processing

#

thanks for the suggestion!
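A rough sketch of both suggestions without any fine-tuning. The hosted API takes a `prompt` parameter and the open-source package takes `initial_prompt` in `transcribe()`; listing the expected acronyms there can nudge spelling. The glossary wording and the corrections table below are illustrative assumptions, and the regex pass is a cheap stand-in for LLM post-processing.

```python
import re


def build_prompt(acronyms):
    """Build a short glossary string to pass as the prompt / initial_prompt."""
    return "Glossary: " + ", ".join(acronyms) + "."


def fix_acronyms(text, corrections):
    """Replace known mis-transcriptions (case-insensitive, whole words only)."""
    for wrong, right in corrections.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        text = re.sub(pattern, right, text, flags=re.IGNORECASE)
    return text
```

Because the list changes per audio, both inputs can be generated per file. The replacement table can over-correct if a "wrong" phrase also occurs legitimately, so it is worth keeping it small.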

kind badge Oct 9, 2023, 7:03 PM

#

Hey, I'm reporting a very annoying bug when it comes to whisper medium: it captures text great, but sometimes it loops and repeats one word over and over again. Can you fix it please?

trim citrus Oct 10, 2023, 2:08 AM

#

has anyone ever run whisper jax in a production environment on a tpu

autumn bolt Oct 11, 2023, 2:03 AM

#

How can I set up Whisper on my MacBook Pro to utilize its voices when using macOS' "Text to Speech" function for more realistic speech output, similar to the voices in ChatGPT's "Voice chat" feature?

jaunty linden Oct 16, 2023, 3:51 AM

#

Does whisper support Brazilian Portuguese vs Portugal Portuguese?

brave kestrel Oct 16, 2023, 4:16 AM

#

Imagine world scale on device STT data collection using phones

#

Major privacy concerns but possible

#

@autumn bolt Learn some basics with python or node.js. There's a lot of ways you can accomplish that goal. Python is my suggestion. You can use python modules to talk to apple's speech synthesis api.

#

What you'd do is import modules for whisper (bundled into openai) and macos speech. start a python file, create an api call to whisper (or the module for the self hosted model) , do whatever you want to with the code, then send it to macos's tts with this
https://pypi.org/project/macos-speech/

PyPI

macos-speech

Leverage the macOS say command into you scripts

#

Probably possible to do this without any api calls using the opensource whisper models instead of burning api time

#

You dont have to be a hardcore programmer. There's libraries for just about everything with python

brave kestrel Oct 16, 2023, 5:11 AM

#

I asked gpt4 and it thinks you can just use "say" to accomplish the tts. Anyway... here's its code that may or may not work. Good launching point. Get an openai api key to start. Add all this to a speech.py file, edit it, then start it in terminal

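import sounddevice as sd
import scipy.io.wavfile
import requests
from subprocess import call

SAMPLE_RATE = 44100  # Hz

def main():
    # Record 10 seconds of mono 16-bit audio from the default microphone.
    recording = sd.rec(int(10 * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                       channels=1, dtype="int16")
    sd.wait()
    scipy.io.wavfile.write("output.wav", SAMPLE_RATE, recording)

    # Upload the recording as multipart form data to the transcription endpoint.
    with open("output.wav", "rb") as audio:
        response = requests.post("https://api.openai.com/v1/audio/transcriptions",
                                 headers={"Authorization": "Bearer <your OpenAI key>"},
                                 files={"file": ("output.wav", audio, "audio/wav")},
                                 data={"model": "whisper-1"})
    response.raise_for_status()

    # The default JSON response carries the transcript under "text".
    transcribed_text = response.json()["text"]

    # Speak the transcript with the macOS "say" command.
    call(["say", transcribed_text])

if __name__ == "__main__":
    main()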

replace "<your OpenAI key>" with your actual OpenAI key

#

Requirements:

pip install sounddevice
pip install scipy
pip install requests
pip install openai

Use self hosted whisper if you want. Might require some more imports and config

woeful plaza Oct 20, 2023, 12:18 AM

#

Is the API server down?

lapis jacinth Oct 20, 2023, 12:56 AM

#

woeful plaza Is the API server down?

I'm also seeing errors on the Whisper API.

lapis jacinth Oct 20, 2023, 1:18 AM

#

Interesting that ChatGPT Voice is still operational (at least for me) while the public Whisper API is down. hmmm
I would have expected that ChatGPT was dependent on that service too.

sly nymph Oct 20, 2023, 3:22 AM

#

I feel so much better now... released a fun dictation thingy-mo-bobber using whisper today to some friends, tested thoroughly of course - and now I'm getting nothing but error reports and can't even get it to work myself:

```
Whisper Transcription Error: Object reference not set to an instance of an object.
==========================================================================
```

#

So... you're all not alone, it's me too - and my friends online. Whisper has gone completely silent. 🤫

sly nymph Oct 20, 2023, 4:30 AM

#

Seems to be back online and working now:

high mountain Oct 20, 2023, 12:10 PM

#

Hello, I wonder how it is possible that Whisper in ChatGPT during the voice conversation is so fast, but when used through the API it is slow as a turtle.

daring island Oct 20, 2023, 3:28 PM

#

Seems to be back online and working now:

sly nymph Oct 20, 2023, 4:18 PM

#

high mountain Hello, I wonder how is it possible that Whisper in ChatGPT during the voice conv...

The time of return is directly related to the length of the audio file afaik - not sure if when you are using the Whisper API you are sending larger files than you would with a user input pre-processed and transcribed through Whisper for a subsequent ChatGPT API call?

hexed tide Oct 22, 2023, 7:33 AM

#

Does anyone know if ChatGPT plus subscription allows recording for an hour using speech to text?

lofty aurora Oct 22, 2023, 7:40 AM

#

hexed tide Does anyone know if ChatGPT plus subscription allows recording for an hour using...

no, it doesn't

hexed tide Oct 22, 2023, 7:42 AM

#

lofty aurora no, it doesn't

What’s the cutoff time? I get this error now after recording for 10 minutes. Any alternative?

lofty aurora Oct 22, 2023, 7:43 AM

#

hexed tide What’s the cutoff time? I get this error now after recording for 10 minutes. Any...

where are you doing that?

small juniper Oct 23, 2023, 8:54 PM

#

Anyone know if the real-time Whisper API at OpenAI or Azure will allow fine-tuning any time soon?

stone breach Oct 29, 2023, 2:34 PM

#

Heh, neat. Transcribed a Flemish show with medium, then translated the same with large-v2 whisper, and then.. asked chatgpt-4 to translate sections of the english output to flemish and then the original flemish transcription output to english in the same convo -- nearly flawless english subtitles. Interesting how combining the two transcriptions gives just the right context to get a really clean output.

dreamy plover Oct 31, 2023, 12:01 PM

#

I'm planning to host the Hugging Face whisper large-v2 model on Sagemaker. Which instance should I use? I'm looking for transcription output within 5 minutes for audio of length 30 - 45 minutes.

dense spruce Oct 31, 2023, 2:29 PM

#

Hey everyone, I hope it is the correct channel to ask this question. I have a script and use stable-ts.
But it is telling me that the module "stable_whisper" is not found. I already installed stable-ts + whisper

formal bramble Nov 1, 2023, 3:10 AM

#

This is perhaps a newbie observation, but I am used to voice typing and speaking various punctuation. I love whisper, but at first was kind of annoyed it would type out spoken punctuation. But I just realized when I emphasized the word THE, it typed it in all caps. I did not realize it listened to intonation and so I tested it out and depending on if I sound excited, it will put a period or exclamation point. Somehow, that was just very impressive to me. That is all. 🙂

surreal dragon Nov 2, 2023, 4:51 AM

#

formal bramble This is perhaps a newbie observation, but I am used to voice typing and speaking...

I did this exact same thing 😂 we are just too used to how we HAD to do it, the future has caught up with us :)

light marsh Nov 2, 2023, 2:12 PM

#

Hello, everyone.
I am developing an STT model now, but I am confused about how to get started. Could you give me some advice?

elfin bear Nov 3, 2023, 11:02 AM

#

Hi, I'm using Whisper to generate subtitles for a music video. I use the 'max_line_width' and 'max_line_count' flags to format the output the way I want. However, Whisper does not separate lines the way I need: on one line there is the end of a sentence and the beginning of the following sentence. Whisper does seem to detect that it is a different sentence, as it generates an upper case letter. Do you have any insight on how to make whisper break lines at the end of sentences?

dreamy plover Nov 3, 2023, 6:06 PM

#

elfin bear Hi, I'm using Whisper to generate subtitles for a music video. I use the 'max_li...

Can you paste some output example?

elfin bear Nov 3, 2023, 6:27 PM

#

Sure, here are the first 3 lines. It is in french, but I hope this can still illustrate the point:

1
00:00:12,760 --> 00:00:14,900
Les Cyrânes Près du coufre, loin du

2
00:00:14,900 --> 00:00:17,900
ciel Plus je souffre et moins je

3
00:00:17,900 --> 00:00:21,180
sais Si ce que je crois est vrai

The upper case letters are good. What I would want to get:

1
00:00:12,760 --> 00:00:XX,XXX
Les Cyrânes

2
00:00:XX,XXX --> 00:00:XX,XXX
Près du coufre, loin du ciel

3
00:00:XX,XXX --> 00:00:XX,XXX
Plus je souffre et moins je sais

4
00:00:XX,XXX --> 00:00:21,180
Si ce que je crois est vrai

#

This is with max_line_width=35 and max_line_count=1

#

When asking for the output as text, the line breaks are good:

Les Cyrânes
Près du coufre, loin du ciel
Plus je souffre et moins je sais
Si ce que je crois est vrai
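One possible workaround, sketched under assumptions: request word-level timings (the open-source package has a `word_timestamps=True` option for `transcribe()`, which adds a `words` list with `word`, `start` and `end` to each segment) and start a new subtitle cue whenever a word begins with an upper case letter. This is a heuristic, not a Whisper feature, and proper nouns in the middle of a line ("Les Cyrânes") would trigger a false split.

```python
def split_on_capitals(words):
    """Group word dicts ({'word', 'start', 'end'}) into cues that each begin
    at a capitalised word. Returns a list of (start, end, text) tuples."""
    cues, current = [], []
    for w in words:
        token = w["word"].strip()
        if current and token[:1].isupper():
            cues.append(current)
            current = []
        current.append({"word": token, "start": w["start"], "end": w["end"]})
    if current:
        cues.append(current)
    return [
        (cue[0]["start"], cue[-1]["end"], " ".join(x["word"] for x in cue))
        for cue in cues
    ]
```

Each tuple can then be written out as one SRT entry, so the cue timing follows the sentence boundaries instead of the fixed line width.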

solid pollen Nov 3, 2023, 10:45 PM

#

Does anyone have the perfect prompt/workflow to take Whisper output as text (ideally with timestamps), pass it through GPT-4 and get it formatted in paragraphs according to the topic discussed?

still moon Nov 6, 2023, 1:39 AM

#

When training/fine-tuning whisper locally, using:
https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py

..my load_model() fails (as does torch.load() on anything I try). The training doesn't create a .pt file that matches openai-whisper's .pt files (which are ZIP files). The actual files in an official .pt are like:

$ unzip -v tiny.pt | head -7
Archive:  tiny.pt
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
   19363  Stored    19363   0% 1980-00-00 00:00 49a57bdd  archive/data.pkl
  344064  Stored   344064   0% 1980-00-00 00:00 e2b0aff3  archive/data/0
 1152000  Stored  1152000   0% 1980-00-00 00:00 72f960e2  archive/data/1
     768  Stored      768   0% 1980-00-00 00:00 2a87d98d  archive/data/10

Whereas I get a directory created with these files:

$ ls -1s
total 154444
4 README.md
36 added_tokens.json
4 all_results.json
4 config.json
4 eval_results.json
4 generation_config.json
484 merges.txt
147528 model.safetensors
52 normalizer.json
4 preprocessor_config.json
4 runs
2776 small.pt
4 special_tokens_map.json
2424 tokenizer.json
280 tokenizer_config.json
4 train_results.json
4 trainer_state.json
8 training_args.bin
816 vocab.json

GitHub

transformers/examples/pytorch/speech-recognition/run_speech_recogni...

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. - huggingface/transformers

#

anyone know how to get a proper .pt out of this thing? small.pt is not a proper zip:

$ file small.pt
small.pt: Zip archive data, at least v0.0 to extract, compression method=store
$ unzip -v small.pt
...End-of-central-directory signature not found. Either this file is not a zipfile, or...

#

(and obviously that 2mb small.pt file is not the full model anyway)
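For context, the directory listing above has the shape of a Hugging Face checkpoint (`config.json` plus `model.safetensors`), which is normally loaded with `from_pretrained` in transformers (e.g. `WhisperForConditionalGeneration.from_pretrained(output_dir)`) rather than with openai-whisper's `load_model()`. A small stdlib-only helper to tell the two layouts apart might look like this; the function name and return labels are made up for illustration.

```python
import zipfile
from pathlib import Path


def checkpoint_kind(path):
    """Classify a checkpoint path by its on-disk layout."""
    p = Path(path)
    if p.is_dir():
        has_config = (p / "config.json").is_file()
        has_weights = any(
            (p / name).is_file()
            for name in ("model.safetensors", "pytorch_model.bin")
        )
        return "huggingface-dir" if has_config and has_weights else "unknown-dir"
    if p.is_file() and zipfile.is_zipfile(p):
        return "torch-zip"  # the zip layout used by torch.save checkpoints
    return "unknown-file"
```

A result of `"huggingface-dir"` suggests loading through transformers; converting the weights into openai-whisper's format is a separate step that this sketch does not attempt.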

still moon Nov 6, 2023, 2:46 AM

#

https://discuss.huggingface.co/t/cannot-load-fine-tuned-whisper-model/55839

Hugging Face Forums

Cannot load fine-tuned whisper model

I fine-tuned whisper multilingual models for several languages. I have the checkpoints and exports through these: train_result = trainer.train(resume_from_checkpoint=maybe_resume) ... trainer.save_model(output_dir=EXPORT_DIR) Now I want to use these fine-tuned models in another script to test against a test set with whisper.transcribe(...) Wh...

#

boreal island Nov 6, 2023, 2:42 PM

#

Hey, guys. Do you know if it is possible to set prompt to whisperAi on .Net?

quiet delta Nov 6, 2023, 6:48 PM

#

hey everyone, a couple of days ago I built a voice-cloned text to speech assistant capable of answering questions in meetings, or in any peer to peer environment. Check it out:

https://github.com/AndreSlavescu/Meeting-Buddy

GitHub

GitHub - AndreSlavescu/Meeting-Buddy: automate answering in live me...

automate answering in live meetings by generating readable scripts on the fly - GitHub - AndreSlavescu/Meeting-Buddy: automate answering in live meetings by generating readable scripts on the fly

lethal briar Nov 7, 2023, 10:47 AM

#

Does anyone have experience resolving issues with ffmpeg on a Mac? have a Python file that uses whisper to transcribe some audio, but whenever I run it I get:
[Errno 20] Not a directory: 'ffmpeg'

I installed whisper using:
pip3 install -U openai-whisper

tried also using pip3 install ffmpeg-python, successfully installed ffmpeg. but running the file still didn’t work, so to further troubleshoot ran which ffmpeg. this gives ‘ffmpeg not found’

i then tried to add the ffmpeg executable to my .zprofile (as my terminal window is using zsh not bash) but this doesn't seem to resolve anything either, any suggestions?
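A note that may help here: `ffmpeg-python` is only a Python wrapper and does not install the ffmpeg program itself, while openai-whisper runs the `ffmpeg` command-line tool as a subprocess. On a Mac the binary is commonly installed with Homebrew (`brew install ffmpeg`). A quick check from the same Python environment, as a sketch:

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path:
        print(f"ffmpeg found at {path}")
    else:
        print("ffmpeg is not on PATH; install the binary, not just ffmpeg-python")
```

If this prints a path but the script still fails, the script is probably being run with a different PATH (for example from an IDE) than the terminal.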

iron mica Nov 7, 2023, 12:12 PM

#

lethal briar Does anyone have experience resolving issues with ffmpeg on a Mac? have a Python...

📎 message.txt

dusky ibex Nov 7, 2023, 3:40 PM

#

Hi everyone, I'm noticing a small bug with the "client.audio.transcriptions.create" endpoint where I will send a message in french and it will transcribe it in english. Is anyone else experiencing this? I guess maybe the "audio.transcription" API is getting confused with the "audio.translation" API?

muted axleBOT Nov 7, 2023, 5:06 PM

#

Whisper v3

We are releasing Whisper large-v3, the next version of our open source automatic speech recognition model (ASR) which features improved performance across languages. We also plan to support Whisper v3 in our API in the near future.

cunning umbra Nov 7, 2023, 5:50 PM

#

does whisper-1 point to large-v3 now or is it still large-v2

woeful kraken Nov 7, 2023, 7:19 PM

#

cunning umbra does `whisper-1` point to large-v3 now or is it still large-v2

Last I heard and in the online docs, it isn't in the OpenAI API yet

shell thunder Nov 7, 2023, 10:11 PM

#

sly nymph Seems to be back online and working now:

How did you get whisper to work on VoiceAttack, and would it work to trigger commands?

sudden aspen Nov 7, 2023, 10:36 PM

#

lethal briar Does anyone have experience resolving issues with ffmpeg on a Mac? have a Python...

install vlc

#

fork your own ffmpeg and put on whisper folder

true saddle Nov 8, 2023, 12:16 AM

#

TTS question. any voice that sounds decent in spanish?

sudden aspen Nov 8, 2023, 12:30 AM

#

on normal voice i use juniper idk about tts model sorry

prisma sinew Nov 8, 2023, 1:23 AM

#

true saddle TTS question. any voice that sounds decent in spanish?

I was told that Sky is good at Spanish compared to the others by the team

kind berry Nov 8, 2023, 6:00 PM

#

quiet delta hey everyone, a couple of days ago I built a voice-cloned text to speech assista...

Would this also work if you were to feed it voice recordings from someone who passed away (my brother), and have it respond to questions using his voice?

quiet delta Nov 8, 2023, 6:06 PM

#

kind berry Would this also work if you were to feed it voice recordings from someone who pa...

Yes absolutely

#

You just need a short 20 second audio clip, even 2 second will work but it won’t be as good

#

I recommend 20 seconds since that’s what I found to work great

#

And it does a complete voice clone and will read the output texts to your questions in your brother's voice

#

If you have a powerful GPU the processing will be much faster, if you have cpu, it’ll take some time. But it’s completely on the fly

stiff hill Nov 9, 2023, 12:26 AM

#

I'm looking to use whisper to transcribe a podcast, is it capable of identifying speakers?

For example:

Speaker 1: "hello and welcome!"
Speaker 2: "To today's video!"

elder tapir Nov 9, 2023, 1:04 AM

#

stiff hill I'm looking to use whisper to transcript a podcast, is it capable of identifying...

things might have changed since but iirc not natively, although you can mess around with the prompt to make it do something similar

prisma sinew Nov 9, 2023, 1:19 AM

#

stiff hill I'm looking to use whisper to transcript a podcast, is it capable of identifying...

I built a bot that does this. You just need to separate user audio streams and identify them. Then have the results tie per person
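A sketch of the "tie results per person" step, assuming each speaker was recorded on a separate stream, each stream was transcribed separately, and all segment timestamps share one clock. The dict layout is an assumption for illustration.

```python
def merge_speakers(per_speaker):
    """per_speaker maps a speaker label to a list of {'start', 'text'} segments.
    Returns transcript lines ordered by start time."""
    lines = []
    for speaker, segments in per_speaker.items():
        for seg in segments:
            lines.append((seg["start"], speaker, seg["text"].strip()))
    lines.sort(key=lambda item: item[0])
    return [f'{speaker}: "{text}"' for _, speaker, text in lines]
```

This only works when the streams are already separated, as in a voice-chat bot. A single mixed recording such as a finished podcast would need a separate diarization step first.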

dry bough Nov 9, 2023, 6:36 AM

#

Is there any free website where we can use whisper to transcript our audio files using our API?

Or any chrome extension that transcripts videos run from any source?

prisma sinew Nov 9, 2023, 6:56 AM

#

dry bough Is there any free website where we can use whisper to transcript our audio files...

I recommend reading the documentation for whisper. It has some very simple Python code where you input an audio file for transcript as an example. I don’t know of any such services as it would cost them and providing an api key is sketchy at best

#

Building your own little tool with help from ChatGPT could be a great learning experience too

dry bough Nov 9, 2023, 7:27 AM

#

prisma sinew I recommend reading the documentation for whisper. It has some very simple Pytho...

Thanks, yes this is definitely on my todo list.

zenith shore Nov 9, 2023, 7:49 AM

#

Something went wrong. If this issue persists please contact us through our help center at help.openai.com. How can we fix this problem?

#

My ChatGPT account has been experiencing this problem since this afternoon

obtuse wyvern Nov 9, 2023, 8:15 AM

#

Folks, is whisper free to use?

vocal inlet Nov 9, 2023, 8:33 AM

#

Is this server error?

safe swift Nov 9, 2023, 9:16 AM

#

zenith shore Something went wrong. If this issue persists please contact us through our help ...

same for me

clear badger Nov 9, 2023, 10:09 AM

#

I am facing the same problem, I tried to build GPTs and when I tried to upload files all my OpenAI collapsed and now I am unable to chat, create and do anything with my main account. (This is the error I am receiving: Something went wrong. If this issue persists please contact us through our help center at help.openai.com.)

delicate edge Nov 9, 2023, 12:21 PM

#

I've got the same issue: Something went wrong. If this issue persists please contact us through our help center at help.openai.com.

cursive tangle Nov 9, 2023, 12:34 PM

#

same issue here

weak bridge Nov 9, 2023, 1:25 PM

#

Good morning! Are you having trouble accessing it?

mystic swallow Nov 9, 2023, 2:03 PM

#

weak bridge Good morning! Are you having trouble accessing it?

I have been having trouble accessing all day

spare wren Nov 9, 2023, 2:29 PM

#

mystic swallow I have been having trouble accessing all day

me too, no access via browser, however I'm able to access it through the app

analog eagle Nov 9, 2023, 3:23 PM

#

prisma sinew I was told that Sky is good at Spanish compared to the others by the team

Will this voice be in a new release?
These are all the voices available:

Experiment with different voices (alloy, echo, fable, onyx, nova, and shimmer) to find one that matches your desired tone and audience.

prisma sinew Nov 9, 2023, 4:19 PM

#

analog eagle Will this voice be in a new release? This are all this voices available.. > Exp...

Ya the voices are different between the iOS ChatGPT and the api. Not sure about api voices for that sorry

cunning umbra Nov 9, 2023, 5:40 PM

#

prisma sinew I recommend reading the documentation for whisper. It has some very simple Pytho...

i dont think large-v3 is even on the api yet, and the official implementation is really slow :dalle_tired:

#

have been working on getting faster-whisper updated

prisma sinew Nov 9, 2023, 5:40 PM

#

Its not too slow in my experience but if you want instant maybe so. Do smaller batches

weary trail Nov 10, 2023, 12:19 AM

#

cunning umbra i dont think `large-v3` is even on the api yet, and the official implementation ...

Official implementation? faster-whisper?

cunning umbra Nov 10, 2023, 12:50 AM

#

weary trail Official implementation? faster-whisper?

the openai/whisper python repo

#

faster-whisper is a ~4x faster reimplementation

void plume Nov 10, 2023, 2:58 AM

#

is this integrated with ChatGPT?

#

is there a way to play around with this using ChatGPT?

#

Or this must be using the code?

#

Anyone is here?

prisma sinew Nov 10, 2023, 3:46 AM

#

You can use whisper and voice with ChatGPT on the app

#

But it’s not really a thing you can save or build without using api

void plume Nov 10, 2023, 3:48 AM

#

I see. I'm trying to see the process of re-making the Korean song lol

#

I thought it was really damn cool

eternal jacinth Nov 10, 2023, 12:49 PM

#

Hi! Do you know whether the web app is built based on gpt-4 turbo now?

polar pendant Nov 10, 2023, 5:52 PM

#

it is

gray spoke Nov 10, 2023, 8:07 PM

#

I tried to create a script that:

1. Converts MP4 video to MP3 audio using ffmpeg
2. Transcribes the video using whisper-1
3. Translates the video while trying to maintain the timestamps

It does steps 1 and 2 very well, but when it comes to translating while maintaining timestamps it starts bugging a bit.

For example:

1
00:00:00,000 --> 00:00:07,820
se torna difícil.

Well, here we have a samba in 11 by 8...

It has a non-translated phrase before the translated one.
Has anyone successfully created a script that can translate and keep the same timestamps?

#

(I hope that's the right channel to ask)

prisma sinew Nov 10, 2023, 9:05 PM

#

gray spoke I tried to create a script that: 1. Converts MP4 video to MP3 audio using ffmpe...

You probably will want to track requests by timestamp yourself. Have the request file and response thread tied to a timestamp and then have a data object of some kind store timestamp to file to translation
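One way to keep that bookkeeping simple, sketched under assumptions: parse the SRT into cues, translate only the text of each cue, and copy the index and timing lines through untouched so the model never sees or rewrites them. `translate` is a hypothetical callback standing in for whatever translation call is used.

```python
def translate_srt(srt_text, translate):
    """Translate the text of every SRT cue, leaving index and timing lines as-is."""
    out = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            out.append(block)  # not a complete cue; pass it through unchanged
            continue
        index, timing = lines[0], lines[1]
        text = " ".join(lines[2:])  # multi-line cue text is joined before translating
        out.append(f"{index}\n{timing}\n{translate(text)}")
    return "\n\n".join(out) + "\n"
```

Translating cue by cue loses some cross-sentence context, so quality may drop compared with translating the whole transcript at once. The timestamps, however, cannot drift.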

sudden aspen Nov 10, 2023, 10:20 PM

#

whispererr on chatgpt when

gray spoke Nov 10, 2023, 11:16 PM

#

prisma sinew You probably will want to track requests by timestamp yourself. Have the request...

I will try that, thanks 😃👍

teal sphinx Nov 11, 2023, 7:18 AM

#

Anyone have a good js voice detection package/api that can work well with whisper? Not a lot of info online

forest oxide Nov 11, 2023, 2:07 PM

#

Hello, any updates in the official documentation?

misty spear Nov 11, 2023, 3:27 PM

#

Hi, can I report an error here? Will the OpenAI team attend to the issues?

lofty aurora Nov 11, 2023, 3:46 PM

#

misty spear Hi can report a error here in OpenAI will the openAI team attend the issues

you can report bugs in #1070006915414900886

misty spear Nov 11, 2023, 4:37 PM

#

lofty aurora you can report bugs in <#1070006915414900886>

I did that a while ago

lofty aurora Nov 11, 2023, 4:38 PM

#

misty spear I did that a while ago

that's the best I can suggest unfortunately

misty glacier Nov 11, 2023, 9:31 PM

#

ive made music on ableton forever, i never looked at an ai music maker thingy. im really stoned(sedated thinking), how good of results does it give? can i get specific instruments to remix into a daw?

chilly frigate Nov 12, 2023, 2:38 AM

#

Whisper 3.0 amazing

dense bone Nov 12, 2023, 4:54 PM

#

best mac or web app that uses whisper so I can quickly speech to text?

nocturne thistle Nov 12, 2023, 10:04 PM

#

chilly frigate Whisper 3.0 amazing

whats the main difference between 3.0 and 2.0

cunning umbra Nov 12, 2023, 10:06 PM

#

nocturne thistle whats the main difference between 3.0 and 2.0

1

little iron Nov 12, 2023, 11:16 PM

#

Hows it going yall

undone spindle Nov 13, 2023, 3:44 AM

#

cunning umbra 1

😂 😂

normal dirge Nov 13, 2023, 2:31 PM

#

Can I get text of audio file using whisper although that audio file is 250MB ?

#

Thanks in advance for your kind help.

pure veldt Nov 13, 2023, 5:39 PM

#

Is it possible to create near-realtime translation (audio->text <> TTS/text->new-lang)? Or translate recorded meetups to other langs? Have you seen a similar GitHub repo or example?

hallow totem Nov 14, 2023, 12:14 AM

#

Hi everybody. This could be a silly question. I've been researching voice to text for an app I want to make. E.g.

User speaks into mic > Text appears

Most Browsers come with this functionality built in these days.

Is this likely how the ChatGPT app works OR is that one of the things that whisper can do for me but better?

limber flower Nov 14, 2023, 3:34 AM

#

Hi All, based on the doc, it seems Text-to-Speech only supports python, not node, is this right?

vocal oasis Nov 14, 2023, 3:36 AM

#

Hi everybody. I have a question about speech to text. Is there any way to convert voice into text in real time? For example: I enter a voice file as soon as I click start, where will it translate the voice into text?

neon moth Nov 14, 2023, 4:57 AM

#

vocal oasis Hi everybody. I have a question about speech to text. Is there any way to conver...

Chrome has a live transcribe accessibility option. I don’t know if that is what you want though 🤔

#

Is Whisper a real-time speech-to-text AI? I am an online student and looking for ways to attend classes and have subtitles generated from the audio.

vocal oasis Nov 14, 2023, 5:44 AM

#

neon moth Chrome has a live transcribe accessibility option. I don’t know if that is what ...

What I mean is that I'm looking into whether the open ai Whisper model's api has real-time transcription or not.

neon moth Nov 14, 2023, 6:00 AM

#

vocal oasis What I mean is that I'm looking into whether the open ai Whisper model's api has...

Oh, same then

wind peak Nov 14, 2023, 9:56 AM

#

Anyone know what is this "Whisper-1" model? Because i cannot find model in whisper that has this name?

normal dirge Nov 14, 2023, 7:39 PM

#

normal dirge Can I get text of audio file using whisper although that audio file is 250MB ?

Any advice? 🙂

wind peak Nov 15, 2023, 7:26 AM

#

normal dirge Any advice? 🙂

Yes you can split that to chunks

simple hornet Nov 15, 2023, 10:15 PM

#

Any ideas about this? chunks are working mp3 files so they don't seem corrupted

#

transcription works for the first chunk but getting that 400 for the rest

raw vortex Nov 15, 2023, 10:39 PM

#

@simple hornet how large of a file are you working with? And what size are you chunking it to? The “Invalid File Format” is a weird one

simple hornet Nov 15, 2023, 10:40 PM

#

i made the chunks 20mb, the first one works, 2nd and 3rd don't (sizes 20mb, 20mb, 12mb)

raw vortex Nov 15, 2023, 10:47 PM

#

simple hornet i made the chunks 20mb, the first one works, 2nd and 3rd don't (sizes 20mb, 20mb...

Maybe it’s a file integrity issue, but that would also be weird, since I assume you are chunking the first one in the same way as the other ones. It might also just be a Whisper bug and not your code

simple hornet Nov 15, 2023, 10:48 PM

#

hmm weird. thanks tho

vapid lava Nov 16, 2023, 3:42 PM

#

hey there, can someone confirm that only the large model is new in v3 for whisper, right? there is no insane new tiny v3 model?

#

cause i need a tiny one to run on android

remote fog Nov 17, 2023, 2:13 PM

#

Subtitle: Your open-source, self-hosted subtitle generator for seamless language translation
https://github.com/innovatorved/subtitle

simple hornet Nov 19, 2023, 6:26 AM

#

looking for info about transcription for multiple speakers... i feel like it must exist, just not sure where to start

still moon Nov 19, 2023, 6:57 AM

#

gray spoke I tried to create a script that: 1. Converts MP4 video to MP3 audio using ffmpe...

(I'm responding before seeing other potential replies): Have you seen the other whisper projects which attempt to provide more accurate alignment?

#

Got my own q: I'm using hf's seq2seq to fine-tune whisper models, but I'm preparing audio+transcription data for someone who breathes on a ventilator. I'm wondering if I can put "<|nospeech|>" tokens directly in my training text for the pauses amidst her sentences where the ventilator takes a breath, as well as to mark some areas just of ventilator noise so the model can learn, "this is NOT her speaking."

buoyant sonnet Nov 20, 2023, 4:36 PM

#

remote fog Subtitle: Your open-source, self-hosted subtitle generator for seamless language...

➜ subtitle git:(master) python3 subtitle.py example/story.mp4
/Users/ed/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
Model base exists
ERROR:root:/bin/sh: ./binary/whisper: cannot execute binary file
ERROR:root:Error while transcribing
ERROR:app.core.app:An error occurred: Error while transcribing
ERROR:root:An error occurred: cannot unpack non-iterable NoneType object

On my M2 Macbook - it downloaded the model ok (after I pip3 installed ffmpeg and gdown) but then I ran into this

#

guessing I have to chmod whisper or something?

remote fog Nov 20, 2023, 4:37 PM

#

buoyant sonnet ➜ subtitle git:(master) python3 subtitle.py example/story.mp4 /Users/ed/Library...

Currently binary is not compiled for Mac

#

I will add support for Mac within the next 24 hours.

buoyant sonnet Nov 20, 2023, 4:40 PM

#

ok no worries - I'm going through https://medium.com/gimz/how-to-install-whisper-on-mac-openais-speech-to-text-recognition-system-1f6709db6010 to install it globally, then I guess I can just symlink it

remote fog Nov 20, 2023, 4:41 PM

#

Nhh

#

The models used in this project are a bit different

buoyant badge Nov 20, 2023, 4:42 PM

#

Hi! I'm new here. I'm developing a chatbot in Python.

buoyant sonnet Nov 20, 2023, 4:43 PM

#

remote fog The models used in this project are a bit different

ah ok, I'll play around a bit anyway and keep an eye on things 👍

buoyant badge Nov 20, 2023, 4:43 PM

#

hmm. Whisper - just learning about it.. looks interesting

wind veldt Nov 21, 2023, 11:46 PM

#

whisper API question - any luck getting it to ignore silence in longer transcriptions (>30min)? it might be a bad prompt, but i either get something like Silence. Silence. Silence. ... or Yeah. Like. Yeah. Like. ... in between long pauses.
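
In case a post-processing pass helps, here is a rough sketch that collapses back-to-back repeats of the same sentence. It only catches immediate repeats such as Silence. Silence. Silence.; alternating patterns like Yeah. Like. Yeah. Like. would need something smarter, and trimming silence before transcription is probably the more robust fix. `collapse_repeats` is a made-up helper, not part of any Whisper tooling.

```python
import re

def collapse_repeats(text, max_repeats=1):
    """Keep at most max_repeats consecutive copies of the same sentence."""
    sentences = re.findall(r"[^.!?]+[.!?]*", text)
    kept = []
    previous, run = None, 0
    for sentence in sentences:
        key = sentence.strip().lower()
        if not key:
            continue
        if key == previous:
            run += 1
        else:
            previous, run = key, 1
        if run <= max_repeats:
            kept.append(sentence.strip())
    return " ".join(kept)

print(collapse_repeats("Hello there. Silence. Silence. Silence. Back again."))
```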

autumn bolt Nov 22, 2023, 7:12 PM

#

happy_avocado Whisper 3

normal dirge Nov 22, 2023, 8:32 PM

#

Hello, if my PC has a supported GPU, can Whisper generate the text in a shorter time?

With CPU, it took around 924 seconds to get the text from a 29 MB audio file.

Please advise me if you have experience.
https://github.com/openai/whisper/#available-models-and-languages

autumn bolt Nov 22, 2023, 9:40 PM

#

Anyone know how to use Whisper with the openai npm package in Node.js? It worked for me before but no longer works after the update to version 4.20.0

unreal thorn Nov 23, 2023, 12:23 AM

#

SH#

autumn bolt Nov 23, 2023, 4:20 AM

#

autumn bolt Anyone know how to use Whisper with the openai npm package in Node.js? It worked...

Ok I found the info now. There's a node example in this page https://platform.openai.com/docs/api-reference/audio/createTranscription?lang=node

grave eagle Nov 23, 2023, 1:54 PM

#

neon moth Oh, same then

You can take the input of your microphone, split it into chunks, and feed them to the API. "Real-time" is, like "AI", a pretty flexible term. In the strict sense used in embedded systems: no, it's absolutely not real-time. But if you are attending a class and your laptop has enough compute, following along with a short delay is okay. Whisper does not support that out of the box, though; the splitting into chunks has to be done by your software. There is in fact a Python module for that called "whisper_mic" by Blake Mallory, which worked fine for me.

gray spoke Nov 23, 2023, 5:00 PM

#

still moon (I'm responding before seeing other potential replies): Have you seen the other ...

I actually did, but did not find one

brave turret Nov 23, 2023, 7:53 PM

#

Anyone know if it is possible to add prompt to huggingface distil-whisper?

normal dirge Nov 23, 2023, 8:28 PM

#

brave turret Anyone know if it is possible to add prompt to huggingface distil-whisper?

That model focuses on getting text from audio. What kind of prompt are you talking about?

brave turret Nov 23, 2023, 8:38 PM

#

Whisper is a transformer model with a language modeling head on top, which predicts the next token just like GPT, given the context of the previous tokens. Because of that, you can manually append extra tokens for each chunk of audio while decoding. This can be used to provide, or “prompt-engineer”, a context for transcription, e.g. custom vocabularies or proper nouns, to make it more likely to predict those words correctly. I don't want to implement it myself, though, so I was wondering if hf has it out of the box like the openai api does:
https://platform.openai.com/docs/guides/speech-to-text/prompting

normal dirge Nov 23, 2023, 8:40 PM

#

brave turret Anyone know if it is possible to add prompt to huggingface distil-whisper?

I think it would work there as well.

#

model.transcribe(filepath, initial_prompt="")
You can actually try it with your new Whisper model

vapid lava Nov 24, 2023, 11:59 AM

#

anyone knows if they will also release a tiny v3?

cunning umbra Nov 24, 2023, 4:36 PM

#

simple hornet looking for info about transcription for multiple speakers... i feel like it mus...

diarization?

cunning umbra Nov 24, 2023, 4:40 PM

#

grave eagle You can use the input of your microphone and split that into chunks and feed the...

naive chunking breaks transcribed words that are split on the chunk boundary, have to get creative
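
One way to "get creative" here, as a sketch: cut overlapping windows so that a word split at one boundary appears whole in the neighbouring window. The window and overlap lengths are arbitrary assumptions, `overlapping_windows` is a made-up name, and the duplicated text in the overlap still has to be de-duplicated when stitching the transcripts back together.

```python
def overlapping_windows(total_s, window_s=30.0, overlap_s=2.0):
    """Return (start_s, end_s) windows covering [0, total_s] with overlap."""
    if window_s <= overlap_s:
        raise ValueError("window must be longer than the overlap")
    total_s = float(total_s)
    step = window_s - overlap_s
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows

print(overlapping_windows(70, window_s=30.0, overlap_s=5.0))
```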

cunning umbra Nov 24, 2023, 4:48 PM

#

cunning umbra diarization?

https://huggingface.co/pyannote/speaker-diarization-3.1

pyannote/speaker-diarization-3.1 · Hugging Face

#

whisperx integrates everything but hasn't been updated for v3 yet
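
For a do-it-yourself combination, the core idea can be sketched as follows: give each Whisper segment the speaker whose diarization turn overlaps it the most. The `(start, end, speaker)` tuples are an assumed, simplified shape; converting pyannote's actual output into them is left out, and `assign_speakers` is a hypothetical helper.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker of the best-overlapping turn.

    segments: list of dicts with "start", "end" and "text" (seconds).
    turns: list of (start_s, end_s, speaker) tuples from a diarization step.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            overlap = min(seg["end"], turn_end) - max(seg["start"], turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled

segments = [
    {"start": 0.0, "end": 4.0, "text": "Hi"},
    {"start": 4.0, "end": 9.0, "text": "Hello"},
]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]
for seg in assign_speakers(segments, turns):
    print(seg["speaker"], seg["text"])
```

Segments that span a speaker change get a single label here; splitting them would need word-level timestamps.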

split salmon Nov 24, 2023, 5:30 PM

#

in case anyone didn't see, I have a question about the api here https://discord.com/channels/974519864045756446/1177642614795808849

#

it's really weird and i'm not really sure what to do

#

i got stuck for a while on this

unkempt arrow Nov 24, 2023, 6:51 PM

#

Hey guys, can the API return an object with the start time and end time in ms for each word? Is punctuation separated?

simple hornet Nov 25, 2023, 12:18 AM

#

@unkempt arrow doesn't seem like the API supports it from what I've seen, but take a look at WhisperX, whisper-timestamped, and faster-whisper on github

unkempt arrow Nov 25, 2023, 12:56 AM

#

@simple hornet i'll check it thanks. do you know what the json or verbose_json formats return compared to "text"?

split salmon Nov 25, 2023, 2:38 AM

#

split salmon i got stuck for a while on this

figured it out, needed to pass the correct mime type essentially

simple hornet Nov 25, 2023, 4:24 AM

#

unkempt arrow @simple hornet i'll check it thanks. do you know what the json or json-ve...

I think verbose_json gives phrase-level timestamps, maybe? I looked briefly but idk if I saved an example. It was very verbose and not what I was looking for at the time, but I wasn't looking for word-level timestamps, I was looking for diarization.
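
For reference, a verbose_json response carries a `segments` list whose entries include `start`, `end` and `text`. Assuming that shape, turning it into SRT can be sketched like this; `srt_timestamp` and `segments_to_srt` are made-up helper names, and the sample data is invented.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments):
    """Build SRT text from dicts that have "start", "end" and "text" keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Invented sample in the assumed shape.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hi there."},
    {"start": 2.5, "end": 5.0, "text": " How are you?"},
]
print(segments_to_srt(sample))
```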

normal dirge Nov 27, 2023, 9:22 PM

#

Is there anyone who has set up local Whisper using a GPU?
I am having a problem, and it would be helpful if someone could advise me

dapper bridge Nov 28, 2023, 10:54 AM

#

Hey, quick question about Whisper.
I'm working on subtitles for my video. Whisper catches my speech perfectly, but most of the time it leaves a ton of sentences in one block in the SRT.
Is there a way to force Whisper to put only one sentence per line? It would help a ton.

shrewd rose Nov 29, 2023, 10:33 AM

#

dapper bridge Hey, quick question about Whisper. I'm working on a subtitles for my video. Whis...

What I would do is either write some parsing algorithm that separates them evenly, weighting by the number of characters/length of the speech, or use external software to sync the subs with the text (basically you transcribe with Whisper, then you sync with something else). Look up "thio joe youtube subtitles" on YouTube; he made a nice video where he explains how he syncs Whisper with the actual video. The video is titled I Created Another App To REVOLUTIONIZE YouTube
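
The parsing idea could look roughly like this: split a block into sentences and give each one a share of the block's time in proportion to its character count. The timings are only estimates, since nothing here looks at the audio, and `split_segment_by_sentence` is a made-up helper.

```python
import re

def split_segment_by_sentence(start, end, text):
    """Split one subtitle block into per-sentence cues with estimated timings."""
    sentences = [
        s.strip()
        for s in re.findall(r"[^.!?]+[.!?]+|[^.!?]+$", text)
        if s.strip()
    ]
    total_chars = sum(len(s) for s in sentences)
    if total_chars == 0:
        return []
    cues = []
    cursor = start
    duration = end - start
    for sentence in sentences:
        # Longer sentences get proportionally more of the block's duration.
        share = duration * len(sentence) / total_chars
        cues.append((cursor, cursor + share, sentence))
        cursor += share
    return cues

for cue in split_segment_by_sentence(0.0, 10.0, "Hello. World."):
    print(cue)
```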

main flame Nov 29, 2023, 11:59 AM

#

I shared my GPTs publicly. There are 8 comments on them, but I'm not able to see them. How can I see them?

steady needle Nov 30, 2023, 12:46 AM

#

I want to code when whisper TTS changes voices. Basically I want to record a conversation but then have two ai's do the talking to each other for anonymity. Is it possible?

prisma sinew Nov 30, 2023, 5:33 AM

#

steady needle I want to code when whisper TTS changes voices. Basically I want to record a con...

Whisper doesn't differentiate speakers. You could post process with GPT-4 to try to separate them based on semantics but that's not mega reliable.

#

If you have separate audio feeds for each user, then you can do that. I have a system that takes combined audio and then splits based on user before feeding to whisper. You get individual transcription that way and then feeding it to TTS is simple

magic nova Dec 1, 2023, 4:46 AM

#

Hello everyone!

I’m currently running Whisper locally in UE using the Runtime Speech Recognizer plugin. It works quite well, but I’m looking for faster and more accurate recognition. The small language models are very fast but unfortunately, they are extremely inaccurate in Hungarian, almost unusable. Only the large quantized language model begins to be usable, but it’s not precise enough and is incredibly slow. Can anyone help me with how to make a smaller language model more accurate using the available settings? I don’t fully understand what these parameters do. Also, is there a way to use Whisper with only Hungarian language libraries, since I only need Hungarian recognition? I’m guessing that could be smaller and maybe faster. Might be a silly question, but I’d appreciate any help. Thanks in advance!

near yew Dec 1, 2023, 5:34 AM

#

magic nova Hello everyone! I’m currently running Whisper locally in UE using the Runtime S...

Using the OpenAI API you can get faster responses since they have faster PCs

#

I've tried using whisper large on the OpenAI API and it's pretty fast

sand nacelle Dec 1, 2023, 6:22 AM

#

has anyone tried the TTS with non-English languages? i want to know if it's able to read non-English texts well

near yew Dec 1, 2023, 6:26 AM

#

sand nacelle has anyone tried the TTS with non-English languages? i want to know if its able ...

I think the best way to know is to just test it

sand nacelle Dec 1, 2023, 6:52 AM

#

near yew I think the best way to know is to just test it

yeah, but im not fluent in any non-English language :/ wanted to know if anyone has any insights before i start reaching out to people who could test it for me!

magic nova Dec 1, 2023, 11:14 AM

#

near yew I've tried using whisper large on the OpenAI API and it's pretty fast

Thanks for the suggestion, Carrot. However, I need to run Whisper locally, and I'm focusing on optimizing smaller models for speed and accuracy. Do you have any insights in understanding the parameters to improve their performance?

normal dirge Dec 1, 2023, 3:35 PM

#

magic nova Hello everyone! I’m currently running Whisper locally in UE using the Runtime S...

result = model.transcribe(filepath, language="pt", fp16=False, verbose=True)

You can specify it using the language parameter

In your case, Hungarian, you would need to use the language code for that, i.e. "hu", haha

magic nova Dec 2, 2023, 12:04 AM

#

normal dirge result = model.transcribe(filepath, language="pt", fp16=False, verbose=True) Yo...

Prince, I don't quite understand what you wrote. In the settings options, I only have what I showed in the screenshot and of course, I know how to change between different Whisper language models. The large model works well but is very slow, and I need this for a real-time application, hence my question. Based on the parameters visible in the picture, what would you recommend as the most accurate configuration for a smaller language model? And yes, the language is set to Hungarian. 🙂

spark minnow Dec 3, 2023, 3:55 PM

#

hi

gray cosmos Dec 3, 2023, 9:00 PM

#

has anyone figured out how to control (reduce) the duration of each output segment to somewhere between 0.5 and 1 second?
My output on average varies between 2 and 5 seconds

    const res = await openai.audio.transcriptions.create({
      file: fs.createReadStream(filePath),
      model: "whisper-1",
      prompt:
        "the duration of each output segment must be between 1 to 2 seconds",
      response_format: "verbose_json",
    });
    return res;

dusty isle Dec 4, 2023, 11:56 AM

#

Question: Does whisper work reliably with German Audio?

glass barn Dec 5, 2023, 3:19 AM

#

Hey guys, has anyone deployed Whisper on Android with inference on the GPU?

golden fjord Dec 5, 2023, 8:56 AM

#

Hello, I am pretty sure I did everything correctly, but I am getting an error like this

#

Error loading "C:\Users\MSI-NB\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

#

how can I fix this cudnn_cnn_infer64_8.dll thing?

#

I need the AI really fast

wispy thistle Dec 5, 2023, 8:59 AM

#

i believe the file is missing from the directory, can you check ?

#

also make sure you have nVidia GPU Computing Toolkit installed

golden fjord Dec 5, 2023, 9:00 AM

#

It is there let me send a screen shot again

wispy thistle Dec 5, 2023, 9:00 AM

#

ok

golden fjord Dec 5, 2023, 9:00 AM

#

wispy thistle also make sure you have nVidia GPU Computing Toolkit installed

It is

wispy thistle Dec 5, 2023, 9:01 AM

#

WinError 126 means the file is not there or it cannot be loaded due to a missing dependency!

golden fjord Dec 5, 2023, 9:01 AM

#

It is here

golden fjord Dec 5, 2023, 9:02 AM

#

wispy thistle winerror126 means the file is not there or it cannot be loaded due to missing de...

What can I do about it? I really need the AI ://

wispy thistle Dec 5, 2023, 9:03 AM

#

cudnn_cnn_infer64_8.dll is cuDNN v8; you need to check your PyTorch installation (check for compatible CUDA and cuDNN versions)

golden fjord Dec 5, 2023, 9:03 AM

#

wispy thistle cudnn_cnn_infer64_8.dl is cudnn v8 , you need to check your pytorch installation...

I am really new at those things, haha

#

How can I check it?

wispy thistle Dec 5, 2023, 9:05 AM

#

can you tell me how did u install pytorch?

#

exact command you used ?

golden fjord Dec 5, 2023, 9:07 AM

#

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

#

used this code

#

@wispy thistle

wispy thistle Dec 5, 2023, 9:08 AM

#

run this cmd

#

nvcc --version

golden fjord Dec 5, 2023, 9:10 AM

#

here it is @wispy thistle

wispy thistle Dec 5, 2023, 9:11 AM

#

can you install cuda 11.8 and cudnn (https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installdriver-windows) and try again

NVIDIA Docs

Installation Guide

This cuDNN 8.9.6 Installation Guide provides step-by-step instructions on how to install and check for correct operation of NVIDIA cuDNN on Linux and Microsoft Windows systems.

golden fjord Dec 5, 2023, 9:12 AM

#

Yeah thank you

#

I also send you a friend request

#

Oh I can't haha

wispy thistle Dec 5, 2023, 9:13 AM

#

yea, ill send u

woven pier Dec 5, 2023, 9:52 PM

#

where can I download the large model from manually?

#

this is taking forever

versed juniper Dec 7, 2023, 2:11 AM

#

Hey guys I'm working on a whisper transcription app using NextJS however I'm getting:

```
{
  error: {
    message: 'Could not parse multipart form',
    type: 'invalid_request_error',
    param: null,
    code: null
  }
}
```

Can someone please check my code to correct me where I'm wrong?

```
import axios from "axios";
import { useRef, useEffect, useState } from "react";

const model = "whisper-1";

export default function UploadPage() {
  const inputRef = useRef();
  const [file, setFile] = useState();
  const [response, setResponse] = useState(null);

  const onChangeFile = () => {
    setFile(inputRef.current.files[0]);
  };

  useEffect(() => {
    const fetchAudioFile = async () => {
      if (!file) {
        return;
      }
      const formData = new FormData();
      formData.append("model", model);
      formData.append("file", file);
      console.log(formData);
      fetch("https://api.openai.com/v1/audio/transcriptions", {
        method: "POST",
        body: formData,
        headers: {
          "Content-Type": "multipart/form-data",
          Authorization: `Bearer youropenaikey`,
        },
      })
        .then((res) => res.json())
        .then((data) => {
          console.log(data);
          setResponse(data);
        })
        .catch((err) => {
          console.log(err);
        });
    };
    fetchAudioFile();
  }, [file]);

  return (
    <div
      style={{
        backgroundColor: "#f2f2f2",
        padding: "20px",
        borderRadius: "8px",
      }}
    >
      Transcribe
      <input
        type="file"
        ref={inputRef}
        accept=".mp3"
        onChange={onChangeFile}
        style={{ display: "block", marginTop: "20px" }}
      />
      {response && <div>{JSON.stringify(response, null, 2)}</div>}
    </div>
  );
}
```

Any help greatly appreciated!
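
One likely cause, going by the snippet above: the `Content-Type` header is set by hand to `multipart/form-data`, so it carries no `boundary` parameter and the server cannot tell where each form part ends. When `fetch` is given a `FormData` body it normally fills in that header itself, so dropping the manual header is the usual fix. The Python sketch below builds a multipart body by hand purely to show where the boundary lives; `build_multipart` and its arguments are illustrative names, not part of any OpenAI library.

```python
import uuid

def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    # The header must name the same boundary that separates the parts.
    content_type = f"multipart/form-data; boundary={boundary}"
    return b"".join(parts), content_type

body, content_type = build_multipart(
    {"model": "whisper-1"}, "file", "audio.mp3", b"fake-audio-bytes", boundary="XYZ"
)
print(content_type)
```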

queen solar Dec 7, 2023, 11:18 AM

#

Hello! I've got a problem with the whisper API not translating from French. Does anyone have an idea why?

mental grail Dec 7, 2023, 4:59 PM

#

How do I achieve diarization from the output of openAI whisper model?

runic wagon Dec 7, 2023, 8:04 PM

#

Hello, with the new python update for openai, I am having some trouble running some code from the Spyder IDE for transcription. Here is the code:


```
from openai import OpenAI

client = OpenAI(api_key=keyishere)
model_id = 'whisper-1'
audio_file_path = "C:\......."
audio_file = open(audio_file_path, 'rb')

response = client.audio.transcriptions.create(
    model=model_id,
    file=audio_file,
)
transcription_text = response.text
print(transcription_text)
```

The code executes, and gives these errors:

```
  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:1096 in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:856 in request
    return self._request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:894 in _request
    return self._retry_request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:966 in _retry_request
    return self._request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:894 in _request
    return self._retry_request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:966 in _retry_request
    return self._request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:908 in _request
    raise self._make_status_error_from_response(err.response) from None

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}```

Looks like it is attempting three times and then cutting it. I definitely have the space in my plan, I think (I have $18 in my free account). I am running this on the free version of ChatGPT. Anyone have an idea what could be wrong? ChatGPT hasn't figured it out yet lol.
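
Reading the traceback: the status is 429, but the `code` is `insufficient_quota`, which points at billing rather than request rate, so retrying will not help. API usage is billed separately from ChatGPT, so the ChatGPT plan does not count here, and it may be worth checking whether the $18 credit has expired. A minimal sketch of telling the two 429 cases apart, assuming the error body shape shown above; `should_retry` is a hypothetical helper and `rate_limit_exceeded` is used only as a contrasting example code.

```python
def should_retry(status_code, error_body):
    """Decide whether retrying a failed request could plausibly help."""
    error = (error_body or {}).get("error") or {}
    if error.get("code") == "insufficient_quota":
        # Quota or billing problem: waiting and retrying changes nothing.
        return False
    return status_code in (429, 500, 502, 503, 504)

quota_error = {
    "error": {
        "message": "You exceeded your current quota, please check your plan and billing details.",
        "type": "insufficient_quota",
        "param": None,
        "code": "insufficient_quota",
    }
}
print(should_retry(429, quota_error))
```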

kind badge Dec 9, 2023, 10:07 PM

#

I don't know if this is the right place but I have to complain

#

As much as I am delighted with Whisper and the fact that things that were once incomprehensible to me become understandable thanks to this tool, I am equally frustrated

#

I transcribe musicals in Italian and it is a very difficult and unsatisfying job. We know that Whisper has great potential and I know that one day it will be perfect, but why can't it be perfect now...?

near yew Dec 10, 2023, 7:37 PM

#

kind badge I transcribe musicals in Italian and it is a very difficult and unsatisfying job...

Which issues are you experiencing?

kind badge Dec 10, 2023, 7:42 PM

#

various

near yew Dec 10, 2023, 7:53 PM

#

kind badge various

Like?

kind badge Dec 10, 2023, 8:11 PM

#

Not being accurate, simulating, generating various nonsense, I would like to point out that I use tears in the subtitleediting program

spring relic Dec 14, 2023, 6:38 AM

#

has anyone tried out the whisper-large-v3 model? https://huggingface.co/openai/whisper-large-v3 had some questions about how to handle white noise

openai/whisper-large-v3 · Hugging Face

queen solar Dec 15, 2023, 12:14 AM

#

spring relic has anyone tried out the whisper-large-v3 model? https://huggingface.co/opena...

I haven’t but I’m quite interested. Thanks for sharing. That is not an official model tho right?

spring relic Dec 15, 2023, 4:06 AM

#

queen solar I haven’t but I’m quite interested. Thanks for sharing. That is not an official ...

i figured it out. can use this mechanism to figure out if there's voice in the audio clips or not and where they are located: https://github.com/snakers4/silero-vad

#

so i basically wrote a small function using this:

import torch

SAMPLING_RATE = 16000 #48000
torch.set_num_threads(1)
# Download the Silero VAD model and its helper functions from torch.hub
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True, onnx=False)
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

def has_voice(file_name):
    # True if the VAD finds at least one speech region in the file
    audio = read_audio(file_name, sampling_rate=SAMPLING_RATE)
    speech_timestamps = get_speech_timestamps(audio, model, sampling_rate=SAMPLING_RATE)
    return len(speech_timestamps) > 0
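
The same timestamps can also say where the speech is, not just whether there is any. A small sketch, assuming `get_speech_timestamps` returns a list of dicts with `start` and `end` given in samples (its default, as far as I know); `speech_regions` and the 0.5 s merge gap are made up for illustration.

```python
def speech_regions(speech_timestamps, sampling_rate=16000, max_gap_s=0.5):
    """Convert sample-based VAD timestamps to seconds and merge close regions."""
    regions = []
    for ts in speech_timestamps:
        start = ts["start"] / sampling_rate
        end = ts["end"] / sampling_rate
        if regions and start - regions[-1][1] <= max_gap_s:
            # Short pause: extend the previous region instead of starting a new one.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions

# Invented sample values in the assumed shape.
fake = [
    {"start": 16000, "end": 32000},
    {"start": 36000, "end": 48000},
    {"start": 96000, "end": 112000},
]
print(speech_regions(fake))
```

Cutting the audio down to these regions before transcription leaves less silence and noise for the model to fill with invented text.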

weary skiff Dec 15, 2023, 1:04 PM

#

How the hell do I fix the issue with the timestamps starting from 0 even though the speech does not? This issue also seems to be related to a further issue where the transcription afterwards is not aligned with the speech.

chilly sluice Dec 15, 2023, 1:53 PM

#

is there any way for me to figure out if a provider is using whisper as their transcriber? any specific quirks with whisper i can test?

weary skiff Dec 15, 2023, 10:00 PM

#

I really need help from the high-level experts who have already played with it. I finished installing it, ran it, and it all works, but I want to make some adjustments so the output looks more like a professional SRT file and not just a jumbled mess. sooooo who is with me?

#

my goal in the end is maybe to use spaCy to merge segments based on logical cues and such... I am open to ideas

autumn bolt Dec 16, 2023, 11:06 AM

#

Is there an argument I can use to make the timestamps slightly longer? It cuts out too early.
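
I'm not aware of a built-in argument for that, so one workaround is to pad the cue end times after transcription. A rough sketch, assuming cues as `(start, end, text)` tuples in seconds; `pad_cue_ends` and the 0.3 s default are made up, and the padding is capped so a cue never runs into the next one.

```python
def pad_cue_ends(cues, pad_s=0.3):
    """Extend each cue's end time by pad_s without overlapping the next cue."""
    padded = []
    for i, (start, end, text) in enumerate(cues):
        new_end = end + pad_s
        if i + 1 < len(cues):
            next_start = cues[i + 1][0]
            new_end = min(new_end, next_start)
        # Never shorten a cue, even if the input already overlaps.
        padded.append((start, max(new_end, end), text))
    return padded

print(pad_cue_ends([(0.0, 1.0, "a"), (1.2, 2.0, "b")], pad_s=0.5))
```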

spring relic Dec 16, 2023, 6:56 PM

#

not sure what you are trying to do?

full sage Dec 17, 2023, 7:22 PM

#

autumn bolt Is there an argument I can use to make the timestamps slightly longer? It cuts o...

you mean like at 30 seconds?

viscid mauve Dec 19, 2023, 7:50 AM

#

Are we allowed to talk about whisper.cpp in this channel?

spring relic Dec 19, 2023, 10:41 AM

#

only in whispers..

tender rivet Dec 20, 2023, 3:47 PM

#

viscid mauve Are we allowed to talk about whisper.cpp in this channel?

you can ask for help with forks of the original project, but it is just less likely you will get an answer

alpine timber Dec 21, 2023, 2:05 PM

#

Hey. Anyone ever tried to fine-tune Whisper while "boosting" certain words? Not necessarily while training, but could also be helpful during inference.
For example, target is "rotoescoliosis", but prediction ended up being "rot escolosis". I know that on inference time we have access to initial_prompt, but in my use-case, during inference time it's impossible for me to know what will be inserted in the initial_prompt.
tldr; I have a corpus specific to my context, and I want to "boost" the words in it during training or inference, without initial_prompt.
I'd be very thankful if someone could point me in the right direction!

robust basalt Dec 22, 2023, 4:01 AM

#

Hey, is there anyone here who has worked with the whisper API in node.js? I'm trying to understand if the OpenAI node library supports it or not and so far, all I'm finding are a bunch of outdated code examples online. The official API documentation only seems to have a python example for sending audio to it... Am I missing something?

lost thunder Dec 23, 2023, 1:16 PM

#

robust basalt Hey, is there anyone here who has worked with the whisper API in node.js? I'm tr...

near yew Dec 24, 2023, 11:57 AM

#

robust basalt Hey, is there anyone here who has worked with the whisper API in node.js? I'm tr...

Yeah I've got it to work in node.js

#

But I've done it through the HTTP API, not the library

#

Here's how I've done it:

require('dotenv').config()
const axios = require('axios')

const { Blob } = require('buffer')
const buffer = Buffer.from(YOUR_FILE_DATA_HERE)
let data = new FormData()

data.append('file', new Blob([buffer]), 'audio.mp3')
data.append('model', 'whisper-1')
const req = {
    method: 'POST',
    url: 'https://api.openai.com/v1/audio/transcriptions',
    headers: {
        'Authorization': `Bearer ${process.env.API_KEY}`
    },
    data: data
}
let transcription = await axios.request(req)
transcription = transcription.data.text
console.log(transcription)

solar fern Dec 26, 2023, 10:03 PM

#

queen solar I haven’t but I’m quite interested. Thanks for sharing. That is not an official ...

it is

thorn raft Dec 27, 2023, 11:00 AM

#

lost thunder

I'm not sure if an iOS screenshot of Safari is a sufficient response (but I don’t know, @jdo330, what IDE do you use?)

lost thunder Dec 27, 2023, 11:22 AM

#

thorn raft Im not sure if an ios screenshot on safari a sufficient response (but I don’t kn...

It was to show that he can switch the language of the example by clicking on the arrow.

zenith shore Dec 28, 2023, 8:33 AM

#

I am unable to log in using Google Mail

#

why？

ocean arch Dec 28, 2023, 2:15 PM

#

alpine timber Hey. Anyone ever tried to fine-tune Whisper while "boosting" certain words? Not ...

Tamper with the background noise, split it into like thirty sec intervals using a while loop , use gpt api to read it and adjust for grammar

near yew Dec 28, 2023, 7:15 PM

#

alpine timber Hey. Anyone ever tried to fine-tune Whisper while "boosting" certain words? Not ...

Yeah like s.k.6.9 said, you could use a chat model like GPT-3.5-turbo or GPT-4 to correct the grammar and give it certain words to correct
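
A cheaper, deterministic variant of that idea is to fuzzy-match suspicious phrases against the domain word list instead of asking a chat model. This is a different technique from the one suggested above, sketched with the standard library's `difflib`; `correct_term`, the 0.8 cutoff, and the tiny vocabulary are assumptions, and finding which phrases to check is left out.

```python
import difflib

def correct_term(phrase, vocabulary, cutoff=0.8):
    """Return the closest vocabulary entry, or the phrase unchanged if none is close."""
    matches = difflib.get_close_matches(phrase.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else phrase

vocabulary = ["rotoescoliosis", "scoliosis"]
print(correct_term("rot escolosis", vocabulary))
print(correct_term("hello", vocabulary))
```

A high cutoff keeps ordinary words from being rewritten, at the cost of missing badly garbled terms.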

summer timber Dec 29, 2023, 12:11 AM

#

Has anyone here had any experience training a custom whisper model? I'd like to explore some stuff, for example training it on SDH subtitles for [EXPLOSION] type notation, or on music to better pick out lyrics out of songs

#

How would the data preparation work, etc

#

I really hope it's easy like loading audio + transcription in and have it go at it

near yew Dec 29, 2023, 10:16 AM

#

summer timber I really hope it's easy like loading audio + transcription in and have it go at ...

I don't think so, it's not like you can "fine-tune" a whisper model right now, and I don't think that supporting fine tuning for whisper is OpenAI's priority

alpine timber Jan 2, 2024, 6:02 PM

#

alpine timber Hey. Anyone ever tried to fine-tune Whisper while "boosting" certain words? Not ...

Update to this: I'm trying to make whispercpp's "grammar" function to work in my use-case: https://github.com/ggerganov/whisper.cpp/pull/1229
Thanks for all the suggestions you guys gave me. I thought about using a GPT model to correct the words, but I wanna try other options first.

alpine timber Jan 2, 2024, 6:08 PM

#

summer timber Has anyone here had any experience training a custom whisper model? I'd like to ...

It's possible: https://huggingface.co/blog/fine-tune-whisper
Just not "officially" through the OpenAI's API, as far as I know. You can use HuggingFace Transformers. And yeah it's easy.

summer timber Jan 2, 2024, 6:17 PM

#

alpine timber It's possible: https://huggingface.co/blog/fine-tune-whisper Just not "officiall...

Hmm it calls for an ASR dataset, which iirc is time-aligned per word. What I would have, realistically speaking, is an mp3 of a song and plaintext lyrics

#

Still doable?

#

oh nvm, it doesnt call for time aligned stuff

#

but it might still be a challenge to split the songs into manageable chunks, or could i train on the full 3-minute ish audios?

alpine timber Jan 2, 2024, 6:41 PM

#

summer timber but it might still be a challenge to split the songs into manageable chunks, or ...

As far as I know, you need to split the audio into clips of at most 30 seconds. I'm unsure if it does so automatically for longer audio

summer timber Jan 2, 2024, 7:21 PM

#

alpine timber As far as I know, you need to split the audios to at most 30 seconds, I'm unsure...

looks like i'm going to manually prepare the data

#

not sure how much data i need for a decent result though

#

considering i'd just be training it more on english, and its already rather good at picking out lyrics from songs

alpine timber Jan 2, 2024, 7:23 PM

#

summer timber not sure how much data i need for a decent result though

That's usually very hard to determine :/

warm pulsar Jan 3, 2024, 12:05 PM

#

Is it possible to increase the volume (make the agent speak louder) when using text to speech?

#

or change the affect/feeling of the text?

woven pier Jan 5, 2024, 4:12 PM

#

I'm running whisper locally and am wondering how I can get it to segment more. I'm trying to write subtitles for videos, but the segments with text are too grouped up

#

like I got two segments of 30 seconds each

warm pulsar Jan 5, 2024, 5:30 PM

#

woven pier I'm running whisper locally and am wondering how I can get it to segment more. I...

When you say run "whisper locally" you mean running it using the API right? So transcription still happens on OpenAI servers. Or can you do the computations on your computer as well?

woven pier Jan 5, 2024, 6:13 PM

#

warm pulsar When you say run "whisper locally" you mean running it using the API right? So t...

computations are on my computer

#

I've got a 3090 so I can run two instances of large

#

Well, just barely run two

#

I can't do anything else

warm pulsar Jan 5, 2024, 6:51 PM

#

woven pier computations are on my computer

Can you please send a tutorial link that shows how to run whisper offline on your own computer?
Or is it this tutorial: https://github.com/openai/whisper/discussions/1463 ?

woven pier Jan 5, 2024, 6:54 PM

#

warm pulsar Can you please send a tutorial link that shows how to run whisper offline on you...

I just did pip install whisper

warm pulsar Jan 5, 2024, 6:55 PM

#

woven pier I just did pip install whisper

Ok. Thanks

woven pier Jan 5, 2024, 6:55 PM

#

when you use a model for the first time it will download it. that download will be slooooooow

warm pulsar Jan 5, 2024, 6:56 PM

#

woven pier I'm running whisper locally and am wondering how I can get it to segment more. I...

Btw, what I understand is that you need to segment your subtitles but sometimes you end up with a lot of text which doesn't relate to the specific scene of the video you are trying to transcribe?

woven pier Jan 5, 2024, 6:56 PM

#

yeah

#

it'll show up like 5 seconds early

warm pulsar Jan 5, 2024, 7:01 PM

#

I would try this then:

Acquire transcription of the video.
Segment the transcription of the video based on the sentences which you just transcribed. For example, do string split "." to figure out where a sentence starts and ends.
So you will now have segmented sentences. Using these sentences, go back to your audio to find out where in the audio file itself a sentence could possibly be starting and ending (short pauses maybe), or somehow calculate the approximate duration of a sentence using TTS (or maybe text-based ChatGPT itself). Using this duration, split your audio file and send the pieces back to your whisper model.

#

@woven pier

#

I think there would be specific methods to figure out where speech starts and ends.

warm pulsar Jan 5, 2024, 7:07 PM

#

woven pier when you use a model for the first time it will download it. that download will ...

How big is it? I am assuming it will be downloaded to APPDATA (Windows) and I dont have much space in C: But I have plenty of space in other drives.

woven pier Jan 5, 2024, 7:09 PM

#

warm pulsar How big is it? I am assuming it will be downloaded to APPDATA (Windows) and I do...

#

it didn't download faster than like 70 kbps

#

and I have gigabit internet

warm pulsar Jan 5, 2024, 7:10 PM

#

woven pier

Ahh, I miss those 2008 days where I used to have 70kbps internet

woven pier Jan 5, 2024, 7:10 PM

#

don't you worry, they've got your back

#

👏

wheat kindle Jan 7, 2024, 10:17 AM

#

hey do you guys have any news on the opening of the GPT marketplace ?

livid mauve Jan 7, 2024, 4:40 PM

#

wheat kindle hey do you guys have any news on the opening of the GPT marketplace ?

All updates will be posted in #announcements 😄

gray spoke Jan 8, 2024, 11:26 AM

#

Hey guys,
Is there a way to transcribe 1hr+ of audio with the Whisper API without cutting the audio into parts?

#

I already tried to compress btw, but it is still too big

#

(more than 26MB)

gray spoke Jan 9, 2024, 11:33 AM

#

I compressed to 32k, monoaudio and 22khz and I got it to 25MB

#

But when I try to use whisper it loads infinitely

#

I stayed with the PC turned on for one day and it did not finish dalle_tired

brisk nova Jan 9, 2024, 10:33 PM

#

hey, am I allowed to share a link for my GPT here?

livid mauve Jan 10, 2024, 12:06 AM

#

brisk nova hey, am I allowed to share a link for my GPT here?

#1171489862164168774 is the place for it happy_avocado

brisk nova Jan 10, 2024, 12:45 AM

#

Ok, thank you

sudden adder Jan 10, 2024, 8:00 PM

#

Does anyone know how to change the words per line for whisper's speech to text?

brave kestrel Jan 11, 2024, 12:14 AM

#

gray spoke Hey guys, There is a way to transcribe an audio with 1hr+ with whisper API witho...

Split it with ffmpeg, send in batches, and return the text. You could even parallelize it

#

Are you concerned about edge cases where it splits during a word?

fathom tangle Jan 11, 2024, 8:13 AM

#

whisper token usage seems to be fairly expensive compared to generations. I'm struggling to see the use case compared to just native text to speech for the cost. Am I doing something wrong?

brave kestrel Jan 11, 2024, 3:28 PM

#

https://openai.com/pricing

Pricing

Simple and flexible. Only pay for what you use.

#

```
Model    Usage
Whisper    $0.006 / minute (rounded to the nearest second)
```

#

```
Model    Input    Output
gpt-4    $0.03 / 1K tokens    $0.06 / 1K tokens
```

#

in theory it's far cheaper than gpt-4

#

turbo models are super cheap though

```
Model    Input    Output
gpt-3.5-turbo-1106    $0.0010 / 1K tokens    $0.0020 / 1K tokens
```

#

Also whisper is STT not TTS. Openai does have TTS models as well

languid breach Jan 11, 2024, 7:08 PM

#

Can anybody help me out? I'm using FastAPI and I want to transcribe my file upload without writing it to a file. I've spent hours on it but ended up going back to shutil because the in-memory approach kept bugging out on the file type. Does anybody have a snippet I could use?

languid breach Jan 11, 2024, 8:08 PM

#

https://community.openai.com/t/unrecognized-file-format-error-whisper-bytesio-cant-write-to-disk/582893/5
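
A minimal sketch of the in-memory approach asked about above, assuming the official `openai` Python package (v1 or later) and a FastAPI `UploadFile`. The helper names `to_named_buffer` and `transcribe_bytes` are hypothetical. The usual cause of the "unrecognized file format" error appears to be that an anonymous `BytesIO` has no file name to infer the format from, so the sketch attaches one with the original extension.

```python
import io


def to_named_buffer(data: bytes, filename: str) -> io.BytesIO:
    """Wrap raw upload bytes in a file-like object that carries a file name."""
    buffer = io.BytesIO(data)
    buffer.name = filename  # e.g. "clip.mp3"; the extension is the important part
    return buffer


def transcribe_bytes(data: bytes, filename: str) -> str:
    """Send in-memory audio to the transcription endpoint without touching disk."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=to_named_buffer(data, filename),
    )
    return result.text


# Assumed shape of the FastAPI route (not taken from the thread):
#     @app.post("/transcribe")
#     async def transcribe(file: UploadFile):
#         return {"text": transcribe_bytes(await file.read(), file.filename)}
```

The 25 MB upload limit mentioned elsewhere in this channel still applies to in-memory uploads.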

gray spoke Jan 12, 2024, 6:00 PM

#

brave kestrel Are you concerned about edge cases where it splits during a word?

Yes, and the timestamps bug out

#

I could just try to fix it manually, but I would like to do the process completely automatic

brave kestrel Jan 12, 2024, 7:20 PM

#

Probably stuck with splitting it up

#

https://ffmpeg.org/ffmpeg-filters.html#silencedetect

#

If you split the ffmpeg bits into smaller pieces and split off transcription you'll already be closer
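
A rough sketch of how the silencedetect suggestion could be automated, assuming the log comes from a command such as `ffmpeg -i input.mp3 -af silencedetect=noise=-30dB:d=0.5 -f null -` (the filter reports on stderr). The function names and the greedy strategy are illustrative assumptions, not something from the thread. Cutting in the middle of a silence avoids splitting during a word.

```python
import re

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def silence_midpoints(ffmpeg_log: str) -> list[float]:
    """Return the midpoint (in seconds) of every silence that silencedetect reported."""
    starts = [float(value) for value in SILENCE_START.findall(ffmpeg_log)]
    ends = [float(value) for value in SILENCE_END.findall(ffmpeg_log)]
    return [(start + end) / 2 for start, end in zip(starts, ends)]


def pick_cut_points(midpoints, max_chunk, duration=None):
    """Greedily pick silence midpoints so chunks stay under max_chunk seconds
    wherever a silence is available to cut on."""
    cuts = []
    chunk_start = 0.0
    candidate = None
    for point in sorted(midpoints):
        # The current chunk would grow too long: cut at the last silence seen.
        if point - chunk_start > max_chunk and candidate is not None:
            cuts.append(candidate)
            chunk_start = candidate
        candidate = point
    # Optionally make sure the tail of the file is not too long either.
    if duration is not None and candidate is not None:
        if duration - chunk_start > max_chunk and candidate > chunk_start:
            cuts.append(candidate)
    return cuts
```

The resulting times can then be fed to ffmpeg's segment muxer (`-f segment -segment_times 20,50,70`), and adding each chunk's start offset to its timestamps keeps them aligned with the original file.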

weary island Jan 15, 2024, 7:46 PM

#

hey folks - i have a very simple Q - forgive the noobness -

assume I am recording a lecture during a classroom, while the professor is speaking - i want to trigger OpenAI to summarize the lecture so far.

so I want to trigger it by saying something like - "hey {assistant} can you summarize the notes so far?"

1. how can i do so?
2. how do i remove that prompt, as it's not part of the lecture?

small juniper Jan 16, 2024, 4:36 PM

#

hey folks - i have a very simple Q -

dapper bridge Jan 18, 2024, 5:00 PM

#

hey, i'm having a bit of a problem with whisper-ctranslate2 (and normal whisper as well)
in the first minutes of a wav file, it spits out really long statements and after a while it starts to split it for some reason
i can't force it to do it one way or the other, any reason why it's happening and how to fix it?

raw oriole Jan 19, 2024, 4:05 AM

#

Is whisper the same price when converting the audio from an mp4 to text and when converting like an audio recording

lone scaffold Jan 19, 2024, 4:56 PM

#

Does anyone know of a way to use TTS (text to speech) with an SRT subtitle file as input instead of plain text? So essentially dubbing?

Basically I have videos with SRT transcriptions, and I want to generate a dub using OpenAI's text to speech. Given that SRT includes timestamps, I don't want the timestamps to be read out loud, but used as guides for timing.
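
One way to approach the timing part is sketched below, under the assumption that each subtitle cue is synthesized as its own TTS clip and then placed at the cue's start time. `Cue`, `parse_srt` and `dub_plan` are hypothetical names; the TTS call and the audio mixing are deliberately left out.

```python
import re
from dataclasses import dataclass

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text into cues, dropping the index line and the timestamps."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(start, end, text))
                break
    return cues


def dub_plan(cues):
    """Return (silence_before, text, slot_length) per cue, all in seconds.

    Assumes each spoken clip fits inside its slot; a clip that runs long
    would have to be sped up or allowed to overlap the next cue.
    """
    plan = []
    cursor = 0.0
    for cue in cues:
        plan.append((max(0.0, cue.start - cursor), cue.text, cue.end - cue.start))
        cursor = cue.end
    return plan
```

Only the `text` field would be sent to the TTS model, so the timestamps are never read aloud.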

still moon Jan 21, 2024, 2:25 PM

#

@lone scaffold commercial thing?

#

I've done something similar for existing_text + whisper_transcription alignment in a recent project.

#

it worked, well, perfectly. (at least I encountered zero failures). there are timing issues, by the way, with whisper's timestamps, so use the json output with word timings. lmk if you need help.

formal bramble Jan 21, 2024, 5:50 PM

#

Does anyone know if there’s any iOS app, that is basically a keyboard alternative, but with whisper functionality built-in?

lone scaffold Jan 21, 2024, 6:07 PM

#

still moon @lone scaffold commercial thing?

it's for my own use actually, I found a really helpful course but it was in Chinese, but it had proper transcription, but I wanted to see if I could turn the SRT transcription with timing into a speech with OpenAI's TTS. How did you make TTS have delays that correspond with the SRT subtitles?

glad seal Jan 22, 2024, 6:03 AM

#

formal bramble Does anyone know if there’s any iOS app, that is basically a keyboard alternativ...

Sadly I don't but there is one on Android. It's pretty basic stuff and I can't find a GitHub repo but looks like "just some guy" free apps. I have used it for a good few months and also double checked security doing packet capture. Nothing was sent remote. The same guy has a great whisper powered notepad app too.
https://play.google.com/store/apps/dev?id=6674916867778158495

Android Apps by Kai's Soapbox on Google Play

Kai Makes Stuff

#

I'm the only one to have reviewed the note one, which is actually really nice. The keyboard is actually really good too. Just set the timeout to 30 seconds on the recordings because at the very end of each recording before transcription, there will be a little break in the audio that might make you miss a word. Really nice stuff. Works better than Google's voice typing even on my Pixel 6 accuracy wise. The note app has model selection from tiny.en up to base. I don't know what inference backend is used. You can also run Whisper.cpp in Termux on Android just installing through pip on Python 3. All you need is

pip install whisper-cpp-python

glad seal Jan 22, 2024, 6:09 AM

#

formal bramble Does anyone know if there’s any iOS app, that is basically a keyboard alternativ...

Found you one! Not local unfortunately

Edit -> While this isn't local, it does look to be trustworthy.

https://whispermemos.com/

Whisper Memos

Whisper Memos transcribes your iOS voice memos and sends you an email with the transcription a few minutes later. It is based on OpenAI's new Whisper technology.

formal bramble Jan 22, 2024, 6:19 AM

#

glad seal Found you one! *Not local unfortunately* *Edit* -> *While this isn't local, it ...

This looks cool, I’ll check it out. Thanks! 🙂 That said I was looking specifically for a keyboard like something that I could use as input to any text field in any app without having to switch between apps.

still moon Jan 23, 2024, 8:30 AM

#

lone scaffold it's for my own use actually, I found a really helpful course but it was in Chin...

Not delays. just getting finer timings out of it using the word-based timings in the .json

#

#

that's from a court case.

#

(I'm kidding)

#

notice the word timings

#

I don't exactly know how accurate it is; you might want to evaluate it before you go too far.

#

Here's a script I wrote (sort of.. me and chatgpt since it's still a lot faster even for little things like this, imo) .. that converts the .json to audacity/tenacity labels.txt you can import (File -> Import -> Labels):

#

#!/usr/bin/env python3
import argparse
import json

def format_float(value, sig_figs):
    return f"{value:.{sig_figs}f}"

def process_json(json_data, options):
    labels = []

    for segment in json_data['segments']:
        if not options.no_full and segment['text']:
            label_text = segment['text']
            if options.probs:
                label_text = f"({segment['avg_logprob']}/{segment['no_speech_prob']}) {label_text}"
            if options.token_full:
                label_text = f"[{len(segment['tokens'])}s] {label_text}"
            labels.append(f"{format_float(segment['start'], options.sig_figs)}\t{format_float(segment['end'], options.sig_figs)}\t{label_text}\n")

        if not options.no_words:
            # 'words' is only present when Whisper ran with word timestamps enabled
            for word in segment.get('words', []):
                label_word = word['word']
                if options.probs:
                    label_word = f"({word['probability']}) {label_word}"
                if options.token_words:
                    # word entries may not carry a 'token' field, so fall back to '?'
                    label_word = f"[{word.get('token', '?')}w] {label_word}"
                labels.append(f"{format_float(word['start'], options.sig_figs)}\t{format_float(word['end'], options.sig_figs)}\t{label_word}\n")

    return labels


def main():
    parser = argparse.ArgumentParser(description='Convert Whisper JSON to Audacity Labels')
    parser.add_argument('file', type=str, help='JSON file to process')
    parser.add_argument('-o', '--output', type=str, default=None, help='Output filename')
    parser.add_argument('-nf', '--no-full', action='store_true', help='Disable output of full text string label')
    parser.add_argument('-nw', '--no-words', action='store_true', help='Disable output of individual word labels')
    parser.add_argument('-sf', '--sig-figs', type=int, default=3, help='Max significant float digits')
    parser.add_argument('-p', '--probs', action='store_true', help='Include probabilities in labels')
    parser.add_argument('-tw', '--token-words', action='store_true', help='Include token info in word labels')
    parser.add_argument('-tf', '--token-full', action='store_true', help='Include token info in full text labels')
    args = parser.parse_args()

    with open(args.file, 'r') as file:
        json_data = json.load(file)

    labels = process_json(json_data, args)

    output_file = args.output or args.file.rsplit('.', 1)[0] + '.transx.txt'
    with open(output_file, 'w') as file:
        file.writelines(labels)

    print(f"Labels written to {output_file}")

if __name__ == "__main__":
    main()

#

use with -nf so it'll only output the word tokens (otherwise it outputs the long spans of text AND duplicates it all as individual words too)

#

(whisper-json-to-labels someaudio.json will make someaudio.transx.txt)

weary saddle Jan 24, 2024, 9:45 AM

#

Hi, I was wondering what kinds of audio preprocessing do you guys do before sending the audio file to Whisper API to get the best results?
For example, currently I'm doing the following steps to preprocess the audio file:

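from pydub import AudioSegment, effects

def preprocess_audio(audio_clip: AudioSegment, audio_file_path: str, channels=1, bitrate=16000):
    # trim_silence_pyannote is the poster's own helper (not shown here)
    audio_clip = trim_silence_pyannote(audio_file_path, audio_clip)
    audio_clip = effects.normalize(audio_clip)  # peak-normalize the volume
    audio_clip = audio_clip.set_channels(channels)  # downmix to mono by default
    # note: despite its name, `bitrate` is used as the sample rate in Hz
    audio_clip = audio_clip.set_frame_rate(bitrate)
    return audio_clip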

pseudo cove Jan 24, 2024, 10:06 PM

#

anyone know if anyone has integrated whisper on a telephony platform, like voximplant has with dialogflow? or if someone is known to be working on that?

languid iron Jan 25, 2024, 9:47 AM

#

weary saddle Hi, I was wondering what kinds of audio preprocessing do you guys do before send...

Good question… do you notice any difference in quality between different formats? I do merge the channels because diarization requires it to be one, but I don’t do any other processing yet

weary saddle Jan 25, 2024, 9:57 AM

#

So far I haven't experimented with different audio formats yet because our inputs are pretty much just mp3 files.

languid iron Jan 25, 2024, 1:09 PM

#

I get all sorts of formats, journalists transcribing raw interviews from whatever device they use, sometimes it’s just the iphone‘s default record app. I’ll try to convert everything to a tbd common format

alpine timber Jan 25, 2024, 2:51 PM

#

weary saddle Hi, I was wondering what kinds of audio preprocessing do you guys do before send...

Other than changing bitrate and number of channels like you do, I normally don't do any preprocessing besides trimming silence since Whisper can be prone to hallucinations on silence.

weary saddle Jan 25, 2024, 2:54 PM

#

I've noticed it hallucinates and sometimes repeats words on silences, overlapping speech, noise, anything that technically isn't human speech.

#

Hence I've been wondering if anybody managed to solve this via preprocessing the audio.
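
As a dependency-free illustration of the silence-trimming idea discussed above, here is a simple energy-based trimmer over 16-bit PCM samples. It is only a sketch: the frame size and the RMS threshold are arbitrary assumptions that need tuning per recording, it removes only leading and trailing silence, and it is not a substitute for a real voice activity detector.

```python
import math
from array import array


def frame_rms(frame) -> float:
    """Root-mean-square amplitude of one frame of integer samples."""
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def trim_silence(samples, sample_rate=16000, frame_ms=30, threshold=500.0):
    """Drop leading and trailing frames whose RMS is below the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    loud = [i for i, frame in enumerate(frames) if frame_rms(frame) >= threshold]
    if not loud:
        return samples[:0]  # nothing above the threshold: return an empty clip
    start = loud[0] * frame_len
    end = min(len(samples), (loud[-1] + 1) * frame_len)
    return samples[start:end]


# Example input: mono int16 samples, e.g. array("h", wave_file.readframes(n))
```

Silence in the middle of a recording is left untouched here; handling that would mean splitting on the quiet frames instead of only trimming the ends.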

small juniper Jan 25, 2024, 3:18 PM

#

anyone know if anyone has integrated

stone quiver Jan 25, 2024, 4:02 PM

#

Thought it'd be neat to share that I used Whisper-JAX to transcribe 428 hours of audio in just under 6

gloomy mist Jan 26, 2024, 4:00 PM

#

Hey, I'm currently using faster-whisper for speech-to-text processing. However, my 15-second audio takes almost 2 seconds to process on my MacBook M2 with 16 GB RAM. Unfortunately, this is too long. I would like it to be closer to 1 second. Is there any way I can speed up the process? I would prefer not to use a model worse than 'small'. I will mainly be transcribing German audio.

```py
from faster_whisper import WhisperModel
import datetime

model_size = "small"

model = WhisperModel(model_size, device="cpu", compute_type="int8")

start_time = datetime.datetime.now()
segments, info = model.transcribe("audio.mp3", beam_size=5)

end_time = datetime.datetime.now()

print(
    "Detected language '%s' with probability %f"
    % (info.language, info.language_probability)
)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

print("Transcription took: ", end_time - start_time)```


```Transcription took:  0:00:02.104246```
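
One thing worth checking in a measurement like the one above: in faster-whisper, `transcribe` returns a lazy generator of segments, so most of the decoding happens while the segments are iterated, not when `transcribe` returns. The sketch below times the whole thing; `timed_transcribe` is a hypothetical helper, and the options in the usage comment are commonly suggested speed settings whose effect would need to be measured on the actual audio.

```python
import time


def timed_transcribe(model, audio_path, **options):
    """Run model.transcribe and time it, including the lazy decoding step."""
    start = time.perf_counter()
    segments, info = model.transcribe(audio_path, **options)
    segments = list(segments)  # consuming the generator is where decoding happens
    elapsed = time.perf_counter() - start
    return segments, info, elapsed


# Usage with faster-whisper (requires `pip install faster-whisper`):
#     from faster_whisper import WhisperModel
#     model = WhisperModel("small", device="cpu", compute_type="int8")
#     segments, info, elapsed = timed_transcribe(
#         model, "audio.mp3",
#         beam_size=1,      # greedy decoding, usually faster than beam_size=5
#         language="de",    # skips automatic language detection
#         vad_filter=True,  # skips stretches without speech
#     )
#     print(f"Transcription took {elapsed:.2f}s")
```

Loading the model once and reusing it across requests also keeps the model load time out of the per-file latency.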

blazing cave Jan 27, 2024, 11:59 PM

#

I plan on making my audio compression code available as an api.

wondering if there is an appetite for it (I use in my own app)

it usually returns an mp3 link that is 10% of original size.

UI version here:
shownotes.io/crush

If I get enough interest will make the api version.

autumn bolt Jan 28, 2024, 8:10 AM

#

how do i cap the amount of text/time per line in the transcript? for example I want

```[2.0->4.0]  consectetur adipiscing elit.```
but its giving
```[0.0->4.0] Lorem ipsum dolor sit amet, consectetur adipiscing elit.```

autumn bolt Jan 28, 2024, 10:34 AM

#

replacement question! is there a way to get time per individual word

autumn bolt Jan 28, 2024, 12:58 PM

#

nevermind

languid iron Jan 28, 2024, 6:49 PM

#

autumn bolt replacement question! is there a way to get time per individual word

Yes isn’t there a parameter „timestamps_per_word“ or something like that?
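
For reference, in the open-source `openai-whisper` package the option is `word_timestamps=True` on `transcribe`, which adds a `words` list to each segment. A hedged sketch of how those words could be regrouped into shorter subtitle lines follows; `regroup_words` and its limits are illustrative assumptions, not part of Whisper itself.

```python
def regroup_words(words, max_words=6, max_duration=3.0):
    """Group word dicts (keys 'word', 'start', 'end') into short subtitle lines.

    A new line starts when the current one already holds max_words words or
    when adding the next word would stretch it past max_duration seconds.
    """
    groups, current = [], []
    for word in words:
        too_many = len(current) >= max_words
        too_long = bool(current) and word["end"] - current[0]["start"] > max_duration
        if current and (too_many or too_long):
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return [
        (group[0]["start"], group[-1]["end"],
         " ".join(w["word"].strip() for w in group))
        for group in groups
    ]


# Getting the words with openai-whisper (requires `pip install openai-whisper`):
#     import whisper
#     result = whisper.load_model("small").transcribe("audio.mp3", word_timestamps=True)
#     words = [w for seg in result["segments"] for w in seg["words"]]
#     for start, end, text in regroup_words(words, max_words=5, max_duration=2.0):
#         print(f"[{start:.1f}->{end:.1f}] {text}")
```

The same regrouping gives control over words per line and seconds per line, which the default 30-second-style segments do not offer.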

sturdy jolt Feb 2, 2024, 11:32 AM

#

Hello, we are having issues accessing the GPT-4 API for some reason it's not available on our account even though the account is paid and we have spent money on the GPT-3 API (and also, for some reason, we haven't been charged this month). We have been working on a SAAS project for real estate brokers for a long time, have made a custom plugin for GPT, etc., but unfortunately, we cannot use it. Can you help us with what to do?

modern ember Feb 2, 2024, 4:30 PM

#

import os
import pygame
import speech_recognition as sr



def speak(text):
    voice = "en-US-ChristopherNeural"
    command = f'edge-tts --voice "{voice}" --text "{text}" --write-media "MichealOutput.mp3"'
    os.system(command)
    pygame.init()
    pygame.mixer.init()
    
    try:
        pygame.mixer.music.load("MichealOutput.mp3")
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            pygame.time.Clock().tick(10)

    except Exception as e:
        print(e)

    finally:
        pygame.mixer.music.stop()
        pygame.mixer.quit()

def take_command():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)

    query = None  # stays None if recognition fails, so the return below is always defined
    try:
        print("Recognizing...")
        query = r.recognize_sphinx(audio, language='en-us')
        print("hi")
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

    return query

speak("hello, i am Micheal, your virtual assistant; How can i help you today")    
query = take_command()
print(query)

#

i want to implement whisper to this

#

this is a speech to text code

#

i am using sphinx

#

but sphinx isnt accurate

#

so can anyone help me

spark notch Feb 2, 2024, 11:39 PM

#

modern ember ```py import os import pygame import speech_recognition as sr def speak(text)...

Happy Friday https://chat.openai.com/share/b6606aff-ce32-4117-8d80-7ba421321ad8
```py
import os
import pyaudio
import wave
import openai

# Set your OpenAI API key
openai.api_key = "your_openai_api_key"

def record_audio(filename, duration=5):
    """Records audio from the microphone and saves it as a WAV file."""
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    CHUNK = 1024

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)

    print("Recording...")

    frames = []

    for _ in range(0, int(RATE / CHUNK * duration)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("Finished recording")

    stream.stop_stream()
    stream.close()
    audio.terminate()

    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(audio.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))

def transcribe_audio(filename):
    """Sends the audio file to Whisper for transcription."""
    with open(filename, "rb") as audio_file:
        # Legacy interface from openai<1.0; newer releases of the package
        # use client.audio.transcriptions.create(model="whisper-1", file=audio_file)
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]

def main():
    audio_filename = "recorded_audio.wav"
    record_audio(audio_filename)
    transcription = transcribe_audio(audio_filename)
    print("Transcription:", transcription)

if __name__ == "__main__":
    main()
```

ChatGPT

A conversational AI system that listens, learns, and challenges

hot belfry Feb 3, 2024, 3:40 AM

#

I am so lost. Is there a way to use Whisper without downloading anything? They say is in the api, but I cant find the [whisper] model

woven pier Feb 4, 2024, 4:04 AM

#

weary saddle Hi, I was wondering what kinds of audio preprocessing do you guys do before send...

From microphone to transcription: normalize the data, flatten it, make it 16000 with librosa, output it to a wave file to verify the audio was manipulated properly. If the audio is too slow tell librosa it's 2x faster than it is.

misty ginkgo Feb 7, 2024, 8:28 PM

#

How can I use OpenAI Whisper in JavaScript/TypeScript?

autumn bolt Feb 7, 2024, 10:59 PM

#

I’m a subscriber, where’s my beta version with call option to other gpts and ability to locate and attach third party APIs? Can’t find it. GPT4 knows nothing of it…

tender rivet Feb 8, 2024, 12:01 AM

#

autumn bolt I’m a subscriber, where’s my beta version with call option to other gpts and abi...

click on "Explore GPTs" then on "Create"

autumn bolt Feb 8, 2024, 7:18 PM

#

tender rivet click on "Explore GPTs" then on "Create"

I’ve created many already. I m seeking the new higher functions.

tender rivet Feb 8, 2024, 7:18 PM

#

no idea of what you are talking about

#

the option to make API calls is under the "Actions" section of the custom gpt configuration

autumn bolt Feb 9, 2024, 2:26 AM

#

Yes, but there’s a dedicated Actions GPT that’s meant to help with that.

tender rivet Feb 9, 2024, 12:04 PM

#

there are some forked implementations of whisper that claim to run faster than the original and on CPU

#

I never tested, but the idea sounds pretty cool

pliant mural Feb 13, 2024, 4:34 PM

#

Hello !
So I have a Mac laptop from 2017 (macOS Monterey 12.7.1). I have installed Python 3, PyTorch, ffmpeg, pip, and whisper via pip. Every one of these installations seems to have worked well and completed fully. But when I launch "whisper [path/to/audio/file.mp3]" it says "zsh: permission denied: whisper". I have tried to run my terminal with root permission, but even then it says "-sh: whisper: command not found". Can anyone help please?

full surge Feb 14, 2024, 7:27 PM

#

172.17.0.1 - - [14/Feb/2024 20:22:08] "POST /asr?task=translate&language=da&output=srt&encode=false HTTP/1.1" 404 -

Why ?

inner quarry Feb 15, 2024, 1:37 PM

#

Bruh, is there any documentation on packaging whisper into an executable? Python's great and all, but it isn't easy to distribute to less advanced computer users.

brave kestrel Feb 16, 2024, 8:24 PM

#

What do you mean

brave kestrel Feb 16, 2024, 8:26 PM

#

full surge 172.17.0.1 - - [14/Feb/2024 20:22:08] "POST /asr?task=translate&language=da&outp...

Can you give us more context?

brave kestrel Feb 16, 2024, 8:44 PM

#

inner quarry Bruh, there any paperwork on putting whisper into a executable pythons great and...

You're wanting to run whisper as a commandline tool or gui?

#

I think there's a self hosted version but it has requirements

#

Check out github

inner quarry Feb 16, 2024, 9:13 PM

#

brave kestrel You're wanting to run whisper as a commandline tool or gui?

I found the issue and already fixed it. When using pyinstaller, I had it installed via pip.

You have to install via conda for numpy to work.

brave kestrel Feb 16, 2024, 9:29 PM

#

Oh you literally meant creating an executable with Python

inner quarry Feb 16, 2024, 9:43 PM

#

Yeahhhh

unborn fable Feb 18, 2024, 9:51 PM

#

Anyone got any advice for bulk transcribing audio files? I was thinking renting a GPU server for a couple of hours and bulk processing with a self hosted model would be most cost effective. Also wondering if CPU models are catching up yet as these would be even cheaper.

inner quarry Feb 18, 2024, 10:33 PM

#

unborn fable Anyone got any advice for bulk transcribing audio files? I was thinking renting ...

I use the tiny model to transcribe live audio. You can easily just use whisper locally and let it run on an older laptop

brave kestrel Feb 19, 2024, 1:13 AM

#

Does this work on android? Disclaimer, voice is sent to OpenAI and temp stored until whisper transcribes http://switchmeme.com/lel2

unborn fable Feb 19, 2024, 3:26 PM

#

inner quarry I use the tiny model to transcribe live audio you can easily just use whisper lo...

That's great, how effective / accurate have you found this?

inner quarry Feb 19, 2024, 3:44 PM

#

unborn fable That's great, how effective / accurate have you found this?

So so, I would use a larger model for it

unborn fable Feb 19, 2024, 5:07 PM

#

inner quarry So so, I would use a larger model for it

Yeah, hence my problem I have to use the larger models for accuracy. Did you try tiny.en at all?

inner quarry Feb 19, 2024, 5:09 PM

#

unborn fable Yeah, hence my problem I have to use the larger models for accuracy. Did you try...

i did. i'm just finishing my .exe file to allow for someone to drag and drop audio into a program and getting a text document back.

#

a typical pc can run the small.en model

unborn fable Feb 19, 2024, 5:12 PM

#

PC without latest and greatest GPU?

inner quarry Feb 19, 2024, 5:17 PM

#

So I would think I got a 100 dolla windows notebook here and I could run it. it was short audio

#

but it ran.

#

plus all the whisper running i've done has been cpu

unborn fable Feb 19, 2024, 5:20 PM

#

Ok, thanks. I suppose I need to try it 😀

brave kestrel Feb 19, 2024, 5:41 PM

#

anyone else using the javascript MediaRecorder API to record and send to whisper? on iOS the mediaRecorder.onstop gets triggered too soon

brave kestrel Feb 19, 2024, 6:08 PM

#

lol wow... just increasing the volume sent to whisper worked

#

I used pydub

ripe shuttle Feb 22, 2024, 3:09 AM

#

Is anyone having trouble recording audio from the browser on an iphone and sending the audio file to whisper api? It is only transcribing the first second of the recording and other times it will return random characters

misty ginkgo Feb 22, 2024, 4:18 PM

#

Is the functionality to identify whether a user stops talking already provided in Whisper, like it is already in the ChatGPT app?

still light Feb 24, 2024, 9:51 AM

#

I'm new here👀 what is whisper?

autumn bolt Feb 25, 2024, 2:20 AM

#

Hi, is whisper capable of listening to the audio of a lecture recording and generating a clearer voice/new voice that can be dubbed over the video? My lecturer is very hard to understand due to his accent. It would be great if whisper can help!

surreal moon Feb 25, 2024, 1:19 PM

#

I'm not a developer, but the code used by the program I use for Whisper dictation in Windows looks very straightforward. I imagine you would need to implement something to stop each recording before the 25 MB limit is reached and automatically start a new recording.

#

ChatGPT could almost certainly tell you how to adapt https://github.com/braden-w/whispering/blob/main/apps/web-desktop-app/src/lib/transcribeAudioWithWhisperApi.ts for your needs.

GitHub

whispering/apps/web-desktop-app/src/lib/transcribeAudioWithWhisperA...

Contribute to braden-w/whispering development by creating an account on GitHub.

#

The last time I looked in to natural sounding text to voice, they all had per unit pricing to use their APIs - but I have no idea how much it would cost etc

spice iris Feb 25, 2024, 4:10 PM

#

Hey,
Any idea how to get "whisper" to transcribe exactly what the text says?

For example, when the voice contains even a mild "insult" (You can shut up), whisper generates completely random text.

For a voice assistant, this is quite problematic. Any ideas? 🙂

tawny narwhal Feb 26, 2024, 2:15 PM

#

Hey
Anyone using https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperForConditionalGeneration ?

#

While trying to create the processor, memory goes insanely up, even with a very short audio, like 3 sec

#

It seems to be a glitch, but I can't find it. If anyone has seen this in the past let me know 😅
I'll try to create a small repo reproducing it

tawny narwhal Feb 26, 2024, 11:08 PM

#

Got it! I was using stereo audio. It needs to be mono 🤦

half terrace Feb 27, 2024, 10:28 AM

#

thanks for documenting your findings

muted axleBOT Feb 28, 2024, 7:02 PM

#

<:book_icon:1171408210398289941> `` Rule 1 `` Be respectful.

Treat others the way you would like to be treated, and assume best intentions. Don’t harass or attack others, and don’t engage in hateful or generally malicious behavior (e.g. sexism, racism, homophobia, etc.). Keep the negativity to a minimum.

limber cedar Feb 29, 2024, 7:37 PM

#

Why is whisper inference time not constant if the audio is always padded to 30s? For example, 30 seconds of input audio takes roughly 5x the inference time of 5 seconds of input audio.

surreal moon Mar 1, 2024, 3:19 PM

#

Do you sir have a time machine? Because that would be an awesome way to make money through crypto!

#

I'm absolutely loving using Whisper through the API for dictation in Windows. However, I'm worried that using it for anything longer is going to get expensive pretty quickly. Does anybody know if there are any other options just for Whisper in terms of cost? I was kind of assuming that using Whisper just for my own individual use was never really going to add up. But I think I might be wrong about that, given I used nearly $1 on a particular day, and I still haven't used it for any longer-form writing yet. It might almost be worth it to use something like Tasker or an equivalent in Windows as some sort of hacky workaround to get it under the monthly subscription using ChatGPT?

#

The other possibility is that on that particular day I was getting used to using the Windows program and I might have been leaving a lot of dead spaces in between words and sentences? It's supposed to be about two cents a minute, correct?
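
For a quick sanity check on the numbers: at the $0.006 per minute quoted from the pricing page earlier in this channel, the rate is about 0.6 cents per minute rather than two cents, so $1 corresponds to roughly 167 minutes of audio. A small sketch of the arithmetic, assuming that price still applies (the function names are made up for illustration):

```python
from decimal import Decimal

# USD per minute, as quoted from the pricing page earlier in the channel
PRICE_PER_MINUTE = Decimal("0.006")


def whisper_cost(seconds: int) -> Decimal:
    """Cost in USD for a given number of seconds of audio."""
    return PRICE_PER_MINUTE * Decimal(seconds) / Decimal(60)


def minutes_for_budget(dollars: str) -> Decimal:
    """How many minutes of audio a budget in USD pays for."""
    return Decimal(dollars) / PRICE_PER_MINUTE


print(whisper_cost(3600))             # one hour of audio
print(round(minutes_for_budget("1")))  # minutes of audio for one dollar
```

Since billing is by audio duration, long pauses inside a recording are charged like speech, which fits the "dead spaces" guess above.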

broken belfry Mar 4, 2024, 12:49 AM

#

There are locally run versions of whisper you could look into.
I can not vouch for the quality/speed or anything but have seen them on my "travels"

gilded oasis Mar 4, 2024, 2:39 PM

#

is this the proper place to ask about faster-whisper and implementations of it?

shrewd rose Mar 6, 2024, 6:22 PM

#

gilded oasis is this the proper place to ask about faster-whisper and implementations of it?

Yup 🙂

still light Mar 7, 2024, 12:58 PM

#

How to use whisper?

shrewd rose Mar 8, 2024, 9:27 AM

#

still light How to use whisper?

Heyy, you can either use whisper through the OpenAI API or directly on your machine since the model is open source: https://github.com/openai/whisper . Let me know if you have any other questions.

shrewd rose Mar 9, 2024, 10:21 PM

#

Hello there, I have been having this issue lately, where the model, I am using base, would completely hallucinate stuff. My transcripts are quite long, and I had this issue where at some point, someone said "I'm hungry." and then the model started just repeating "I'm hungry." all over until the end of the transcript. If anyone could help it would be much appreciated.

#

shrewd rose Mar 11, 2024, 9:34 PM

#

Well I guess whisper was very hungry

#

anyways fixed it by setting vad=True
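The message doesn't say which wrapper exposes `vad=True`, so that flag is left as written. As a complementary safety net, here is a rough post-processing sketch that caps runs of identical segments. It assumes segments are dicts with a `"text"` key (the shape the open-source package returns); `collapse_repeats` and `max_repeats` are hypothetical names.

```python
def collapse_repeats(segments, max_repeats=2):
    """Keep at most `max_repeats` consecutive segments with identical text."""
    cleaned = []
    previous_text = None
    run_length = 0
    for segment in segments:
        text = segment["text"].strip().lower()
        if text == previous_text:
            run_length += 1
        else:
            previous_text = text
            run_length = 1
        if run_length <= max_repeats:
            cleaned.append(segment)
    return cleaned


segments = [{"text": " Let's eat."}] + [{"text": " I'm hungry."}] * 5
print(len(collapse_repeats(segments)))
```

This only hides the symptom after the fact; trimming silence before transcription (which is what VAD does) addresses the cause.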

mellow pewter Mar 13, 2024, 9:44 AM

#

is there an issue with the model requests? i keep getting request timeouts from my python code

#

im using my company's openai acc with its own subscription and key and everything and it worked fine in the past.
i just keep on getting error codes such as 503 or "request timeout" randomly

surreal moon Mar 14, 2024, 1:16 AM

#

Has anybody else found that personal use of Whisper for dictation is extremely cheap? I was a bit worried about using it so much in Windows, given how easy it is to just, one hot key to start talking and one hot key to stop. And now that there's an Android keyboard, same issue, but I've been using it a lot. And I'm only at 93 cents for the month so far.

tranquil marten Mar 14, 2024, 9:47 PM

#

anyone unable to get whisper to generate any files after transcribing?

#

no .srt .txt or anything

woeful verge Mar 15, 2024, 12:45 AM

#

Hello guys.
I am currently trying to get whisper up and running on my machine.
Cuda is available and the standard device the model chooses to run on.
I checked with whisper --help
While using the CPU (AMD Ryzen 9 7900X3D) works like a charm, using the GPU (AMD Radeon RX 7900 XTX) doesn't work at all.
The following code:

import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(f' The text in video: \n {result["text"]}')

raises this error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
    While copying the parameter named "encoder.blocks.0.attn.query.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('HIP error: invalid device function\nHIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing HIP_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_HIP_DSA` to enable device-side assertions.\n',).

I dont know how to fix this.

sudden aspen Mar 15, 2024, 1:55 AM

#

Are you on linux ?

#

That's a really specific error i'm sorry

#

1- what is HIP
2- why does it say kernel
3- i dont think its a hardware issue

woeful verge Mar 15, 2024, 2:09 AM

#

sudden aspen Are you on linux ?

Yes I am on Ubuntu 22.04.
HIP is ROCm's C++ dialect. Cool. https://pytorch.org/docs/stable/notes/hip.html
Apart from that I dont have any clue

surreal moon Mar 15, 2024, 3:21 AM

#

If my phone can run the 'base' 74M model locally, sure my trusty GTX 750ti will make light work of the larger models? Right guys?

dim birch Mar 17, 2024, 11:02 AM

#

hey guys, does anyone know how to get the Whisper python API/module to add punctuation consistently to its .words output in verbose_json mode when transcribing? it's a short audio recording & i put in the exact script (including punctuation) for the audio as the prompt. transcription.text has punctuation as expected but transcription.words has none unless it was two words inter-connected by punctuation with no space in the script (ie, it's..very works but it's.. very with the space doesn't)

(in case it's unclear, I'm talking about using the OpenAI python client & API, not running Whisper fully locally)

dim birch Mar 17, 2024, 11:04 AM

#

woeful verge Hello guys. I am currently trying to get whisper up and running on my machine. ...

have you tried passing in the suggested arguments? HIP_LAUNCH_BLOCKING=1 or TORCH_USE_HIP_DSA

woeful verge Mar 17, 2024, 11:35 AM

#

dim birch have you tried passing in the suggested arguments? `HIP_LAUNCH_BLOCKING=1` or `T...

I did, yes.

dim birch Mar 17, 2024, 11:36 AM

#

woeful verge I did, yes.

i haven't seen this before but what was the output when you did that?

woeful verge Mar 17, 2024, 11:38 AM

#

Tbh. I just sold my XTX and bought a 4090. AMD for AI is just stupid and ROCm a piece of garbage

#

I tried SO many things. Updating the kernel to 6.6 on 22.04 Ubuntu, passing every Local variable like in the fixes from other people.

#

I uninstalled and reinstalled a ROCm-compatible PyTorch and reinstalled ROCm drivers for my respective ubuntu and kernel version

#

Also adjusted the HIP debugging level but that didn’t tell me anything.

woeful verge Mar 17, 2024, 11:45 AM

#

dim birch i haven't seen this before but what was the output when you did that?

But thanks for the effort to reach out to me. I hope the new GPU fixes the problem. Just throwing money at it lol

neon cargo Mar 18, 2024, 4:45 PM

#

guys, I really don't know which programming language to learn next, I am abu clueless about which one to learn, at this rate ai will become good at many of the languages, which is going to be very important in the future. And please no gpt type answers

scenic remnant Mar 20, 2024, 12:42 AM

#

neon cargo guys, I really dont know which programming language to learn next, I am abu clue...

Well, learning Python is a good all-around language to start with IMO. JavaScript can also be good, specifically Typescript. My suggestion is to find a specific platform or dataset to work with as a project, then have GPT-4 teach you

#

I've been learning Python using it to create simple scripts to analyze Reddit data using PRAW, a Python Reddit API wrapper that has taught me what the heck an API even is

neon cargo Mar 20, 2024, 7:29 AM

#

scenic remnant I've been learning Python using it to create simple scripts to analyze Reddit da...

But it still feels bland, u feeling me? I want to learn something which is heavily used in the future cause at the rate of ai progress, by the time I master python I will become obsolete

tacit gulch Mar 20, 2024, 10:19 AM

#

Hey everyone! 👋

I'm diving into the integration of the Whisper API for a project that necessitates handling highly sensitive audio files, meant to be installed and operational within a highly secure local network. This setup is critical as it mandates that all data remain strictly offline for security reasons. I have a couple of key inquiries on this front and would greatly value any guidance or shared experiences:

Token Pre-Purchase: Is there an option to pre-purchase tokens or credits for Whisper API usage? We're looking for payment models that provide the flexibility to budget and plan ahead without immediate consumption.

Data Privacy and Usage Logs: Privacy is a top priority for us. In the context of Whisper's operation, can anyone shed light on the specifics captured within the usage logs? Are there any references to the processed data itself, or do the logs strictly record metrics such as transcribed minutes and the models utilized? Any insights into the data management and logging practices would be invaluable.

Navigating the secure and efficient use of Whisper API, particularly for projects with a high degree of sensitivity, is my current challenge. If you've had experience or have knowledge regarding deploying OpenAI technologies in similar secure, offline environments, I'd be incredibly thankful for your insights.

scenic remnant Mar 20, 2024, 12:26 PM

#

neon cargo But it still feels bland, u feeling me? I want to learn something which is heavi...

What exactly is your goal to master Python? Money? Vanity? I think you might be missing the point on what programming is about. The world's best programmers are constantly learning new things every year, which means there is no "best programmer". It's like asking who the best athlete is

#

Maybe it's Michael Phelps, but he probably isn't very good at basketball is he?

#

This isn't Pokemon my friend

neon cargo Mar 21, 2024, 9:44 AM

#

scenic remnant What exactly is your goal to master Python? Money? Vanity? I think you might be ...

My goal is to create a lot of cool and helpful things using programming, I need a language which pretty much has everything to do everything….

scenic remnant Mar 21, 2024, 12:56 PM

#

neon cargo My goal is to create a lot of cool and helpful things using programming, I need ...

Most programs that people actually use have more than one programming language. For example, the code structure is made from Python but the user interface is made from HTML and Javascript

muted axleBOT Mar 21, 2024, 1:07 PM

#

`` Rule 1 `` Be respectful.

Practice kindness and positive regard. Harassment, hate speech (such as sexism, racism, or homophobia), or other malicious conduct will not be tolerated. Maintain a respectful and positive environment.

sudden aspen Mar 21, 2024, 1:09 PM

#

neon cargo My goal is to create a lot of cool and helpful things using programming, I need ...

1- Python tools and 2- html + js tools thumbsup
i have like 20 tools here all made by chatgpt

neon cargo Mar 21, 2024, 8:27 PM

#

scenic remnant Most programs that people actually use have more than one programming language. ...

then I have no clue what to use to build what!

sudden aspen Mar 21, 2024, 11:56 PM

#

neon cargo then I have no clue what to use to build what!

https://chat.openai.com/share/98131e78-5cdb-452b-9d96-7983cd571733

ChatGPT

A conversational AI system that listens, learns, and challenges

#

Then just save it as "clock.html" and run it thumbsup

ripe shuttle Mar 22, 2024, 1:23 AM

#

Does anyone know how to split audio file into chunks to send to whisper api using nodejs running in Vercel Serverless Functions?
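Not Node.js, but the planning step is language-agnostic, so here is a Python sketch of it that should port directly. It assumes the 25 MB per-request upload limit documented for the transcription endpoint and a roughly constant bitrate; `chunk_spans`, the safety margin, and the one-second overlap are illustrative choices, not anything the API requires.

```python
def chunk_spans(total_seconds, bitrate_kbps, max_bytes=25 * 1024 * 1024,
                overlap_seconds=1.0, safety=0.9):
    """Return (start, end) times in seconds so each chunk stays under max_bytes."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    chunk_seconds = max_bytes * safety / bytes_per_second
    if chunk_seconds <= overlap_seconds:
        raise ValueError("bitrate too high for the size limit and overlap")
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        # Step back slightly so a word cut at the boundary appears in both chunks.
        start = end - overlap_seconds
    return spans


for start, end in chunk_spans(total_seconds=3600, bitrate_kbps=128):
    print(f"{start:8.2f} -> {end:8.2f}")
```

Each span can then be cut with whatever audio tool is available in the runtime and uploaded as its own request; the overlapping second needs de-duplicating when the transcripts are stitched back together.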

neon cargo Mar 22, 2024, 5:07 PM

#

I need clues guys, CLUES

plain creek Mar 22, 2024, 5:19 PM

#

neon cargo I need clues guys, CLUES

Focus on building complex systems, and the languages used for those systems. Like C#. The language isn't as important as your ability to solve complex problems.

#

Enterprise

drowsy ruin Mar 22, 2024, 6:25 PM

#

I’m finetuning Whisper with my own English data and need clarity on standardizing transcripts, especially regarding punctuation and capitalization. I reviewed appendix C (pg 21) of the Whisper paper but I couldn't find details on capitalization and punctuation for English transcription. Any idea how text was standardized when training Whisper?

neon cargo Mar 23, 2024, 9:07 AM

#

plain creek Focus on building complex systems, and the languages used for those systems. Lik...

But the thing is, I cant spend too much time learning new categories. Something I can pick up fast and master it

plain creek Mar 23, 2024, 9:34 AM

#

neon cargo But the thing is, I cant spend too much time learning new categories. Something ...

Are you a professional programmer or are you only just getting into programming?

neon cargo Mar 23, 2024, 10:08 AM

#

plain creek Are you a professional programmer or are you only just getting into programming?

I am not so professional programmer, but I have done Java c plus and web dev with smart kit

glad cedar Mar 23, 2024, 3:07 PM

#

I got this and it's ok when I am choosing Chinese or Japanese

#

why

candid aspen Mar 23, 2024, 4:46 PM

#

glad cedar I got this and it's ok when I am choosing Chinese or Japanese

You may have to approach OpenAI directly. Chinese language support may not be offered currently.

neon cargo Mar 24, 2024, 10:04 AM

#

glad cedar I got this and it's ok when I am choosing Chinese or Japanese

What is this?

glad cedar Mar 24, 2024, 10:26 AM

#

neon cargo What is this?

I was trying to recognize a piece of audio and I got errors when I choose Cantonese

candid aspen Mar 24, 2024, 10:53 AM

#

I already said it cannot find the specified language... Cantonese in this case...

#

it's not finding it

glad cedar Mar 24, 2024, 11:32 AM

#

candid aspen I alread said it cannot find the specified language... Cantonese in this case.....

I got it bro, just replying

fathom ridge Mar 25, 2024, 8:46 AM

#

So I got ROCm working with whisper on the integrated gpu on a Ryzen 5700g (lol), but it's basically the same speed as CPU. Is there any way to use this GPU to actually speed it up or was this just pointless?

#

I suspect it was pointless

fathom ridge Mar 25, 2024, 10:10 AM

#

Yea, this sucks. Keeps crashing the igpu requiring a reboot to clear.

upper sandal Mar 25, 2024, 9:58 PM

#

@fathom ridge Try unplugging monitor from integrated port

fathom ridge Mar 25, 2024, 9:59 PM

#

I'm using it. It runs a dashboard for my home assistant.

#

I can try it later but I want that more, heh.

upper sandal Mar 25, 2024, 10:00 PM

#

Yes I'm sure you are using it. But it is hogging resources

fathom ridge Mar 25, 2024, 10:02 PM

#

If it can't spare the resources to run a tab in firefox and do this then that configuration isn't suited to my needs.

upper sandal Mar 25, 2024, 10:04 PM

#

Do you have only one port to plug a monitor into

#

Ryzen 5700g is equivalent to a RX 550 or GTX 560 -- light gaming

fathom ridge Mar 25, 2024, 10:07 PM

#

yea, it's a small mitx pc with no dedicated gpu

#

I have things that run whisper fine, I just wanted to see if I could get this PC to be a bit faster. I wasn't expecting much, honestly I was surprised the integrated gpu on it even supported ROCm

upper sandal Mar 25, 2024, 10:08 PM

#

Some of the prominent capabilities of the processor include SSE 4.2 + AVX + AVX2 + AES + VAES + AMD SVM + FMA + RdRand + FSGSBASE + BMI2

#

AMD is pretty good man

#

I run full amd as well

fathom ridge Mar 25, 2024, 10:09 PM

#

I'm not getting bad performance, I was just trying to get the best possible.

upper sandal Mar 25, 2024, 10:16 PM

#

What would you suggest for a desktop running Ryzen 7800X3D / Radeon 7800XT GPU / 32 GB Ram that isn't being used for an ai project? @fathom ridge

fathom ridge Mar 25, 2024, 10:38 PM

#

idk what the 7800xt is good for in relation to AI, I have nvidia.

#

So I'd suggest finding a good game

#

This 5700g is the only time I've played with ROCm

quartz phoenix Mar 26, 2024, 8:35 AM

#

@fathom ridge yeah you're going to need a beefier GPU, APUs are not exactly a well supported scenario to begin with
And AMD is notorious for not properly documenting which GPU supports what ROCm features.
if you have around 400 $/€ burning a hole in your pocket, you could get an RX 6800.
But even with the new windows support its still a far cry from CUDA on nvidia.

sudden aspen Mar 26, 2024, 1:18 PM

#

upper sandal What would you suggest for a desktop running Ryzen 7800X3D / Radeon 7800XT GPU /...

I want to build a custom cheap but "can run llm pc" any ideas please?
Is there any kind of "next-gen" x99 Motherboard ?

upper sandal Mar 26, 2024, 7:09 PM

#

fathom ridge So I'd suggest finding a good game

hahaha

upper sandal Mar 26, 2024, 7:11 PM

#

sudden aspen I want to build a custom cheap but "can run llm pc" any ideas please? Is there a...

i know a lot about computer hardware but not when it pertains to this. I know you benefit from a threadripper-type processor!

#

i imagine fast hard drives also, like read/write speed. i cant answer for HD size.

#

for a LLM machine only, i think this is the kind of infrastructure you want

sudden aspen Mar 26, 2024, 10:42 PM

#

yeah i wanted something like that, but ddr5. thanks alot for the help

fathom ridge Mar 27, 2024, 2:27 AM

#

quartz phoenix @fathom ridge yeah you're going to need a beefier GPU, APUs are not exac...

If I'm buying a dedicated card for it I'm going nvidia and getting something that'll run mixtral too, heh. Right now I'm just running both ollama and whisper on my gaming PC and leaving it on.

fathom ridge Mar 28, 2024, 9:09 AM

#

Wishlist: VLC plugin that adds subtitles to any video

muted axleBOT Mar 28, 2024, 3:33 PM

#

`` Rule 7 `` No self-promotion, soliciting, or advertising.

Do not post or direct message any members of this server to promote non-OpenAI services, products, or projects.

fallow gorge Mar 28, 2024, 8:46 PM

#

muted axle

What ? But it was related to open ai a_skull

#

Bad GPT

uncut hedge Mar 30, 2024, 7:07 AM

#

fathom ridge Wishlist: VLC plugin that adds subtitles to any video

It's complicated to do that.

#

You would need to either stream the audio to Whisper or upload the whole thing. And such a proposal can cost a lot of tokens

fathom ridge Mar 30, 2024, 7:16 AM

#

Could just run it locally

hot belfry Mar 31, 2024, 5:21 PM

#

is there a tutorial to learn how to use whisper? Is it difficult? I want to record a 20 minutes speech of myself

candid siren Apr 1, 2024, 3:11 AM

#

lapis jacinth 🎙️ **Whisper's Response to Blank Inputs - A Quirk?** Hi all! 👋 I noticed tha...

Hello Ross, we are also encountering this problem. Is there any more suitable solution at present?

lapis jacinth Apr 1, 2024, 6:53 AM

#

candid siren Hello Ross, we are also encountering this problem. Is there any more suitable so...

None that I'm aware of, @candid siren .
The best solution I've found is to filter out those responses with GPT. Even this is difficult, given the variations in output. You might consider fine-tuning a model to recognise these "YouTube-isms".
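Before reaching for GPT, a cheap first-pass filter may catch a good share of these. This is a sketch only: it assumes segments shaped like the open-source package's output, with `"text"` and `"no_speech_prob"` keys, and the phrase list and the 0.6 threshold are illustrative guesses, not tuned values.

```python
# Illustrative examples of stock phrases; extend with whatever shows up in practice.
KNOWN_FILLERS = {
    "thanks for watching!",
    "thank you for watching.",
    "please subscribe to my channel.",
}


def drop_suspect_segments(segments, no_speech_threshold=0.6, fillers=KNOWN_FILLERS):
    """Drop segments that look like silence or a known stock phrase."""
    kept = []
    for segment in segments:
        text = segment["text"].strip().lower()
        if segment.get("no_speech_prob", 0.0) > no_speech_threshold:
            continue  # the model itself thinks this stretch is probably silence
        if text in fillers:
            continue  # exact match against the stock-phrase list
        kept.append(segment)
    return kept
```

Exact matching will miss the variations mentioned above, so this works best as a pre-filter, with the GPT pass handling whatever slips through.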

brave shore Apr 1, 2024, 10:01 PM

#

confusedpenguin

viral prism Apr 5, 2024, 8:19 PM

#

ECONNRESET is killing me

willow grove Apr 7, 2024, 2:25 PM

#

What happens when you transcribe reversed speech, lol

(Made using ElevenLabs, which utilizes Whisper)

polar flume Apr 8, 2024, 4:13 PM

#

Has anyone figured out a way to use whisper for real-time transcription without it transcribing nonsense?

long anvil Apr 10, 2024, 2:06 PM

#

Hello!
I hope my question is relevant here. I am attempting to convert alignment results from WhisperX into a TextGrid file for the purpose of analyzing afterwards on Praat. I initially used Parselmouth to directly convert the WhisperX alignment, and I also tried to write the results to CSV in order to convert them into a TextGrid file, but it doesn't seem to work. Has anyone else done a similar thing before, or does anyone have suggestions on how to do this? Thank you!
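One option is to skip the CSV step and write the TextGrid text directly. The sketch below assumes the aligned words are dicts with `"word"`, `"start"` and `"end"` keys (which is how I understand WhisperX's aligned output; words it could not align may lack timestamps and are skipped here), sorted by start time and ending no later than `total_duration`. It follows Praat's long text format as I understand it, where an interval tier has to cover the whole time range, so gaps are filled with empty intervals. `words_to_textgrid` is a hypothetical helper.

```python
def words_to_textgrid(words, total_duration, tier_name="words"):
    """Build a single-tier TextGrid (long text format) from word timings."""
    intervals = []
    cursor = 0.0
    for word in words:
        if "start" not in word or "end" not in word:
            continue  # unaligned word: no timestamps to place it with
        start = max(float(word["start"]), cursor)
        end = float(word["end"])
        if end <= start:
            continue
        if start > cursor:
            intervals.append((cursor, start, ""))  # silence / gap
        intervals.append((start, end, word["word"].strip()))
        cursor = end
    if cursor < total_duration:
        intervals.append((cursor, total_duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for index, (xmin, xmax, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # double quotes are escaped by doubling
        lines += [
            f"        intervals [{index}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.TextGrid` file with UTF-8 encoding should give something Praat can open alongside the audio.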

tacit gulch Apr 11, 2024, 8:52 AM

#

Hi, I'd like to know if there's a built-in method to extract the confidence score for each word in the output. Additionally, is there a feature or an add-on available that allows these confidence scores to be color-coded directly in the results?
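For the color-coding half, a small terminal sketch. It assumes each word entry carries a `"probability"` value, which is what the open-source package attaches when run with `word_timestamps=True` as far as I know; the hosted API may not expose it. The thresholds and the `colorize` name are arbitrary choices for illustration.

```python
RED, YELLOW, GREEN, RESET = "\033[31m", "\033[33m", "\033[32m", "\033[0m"


def colorize(words, low=0.5, high=0.8):
    """Wrap each word in an ANSI color based on its confidence."""
    parts = []
    for word in words:
        probability = word["probability"]
        if probability >= high:
            color = GREEN
        elif probability >= low:
            color = YELLOW
        else:
            color = RED
        parts.append(f"{color}{word['word'].strip()}{RESET}")
    return " ".join(parts)


sample = [
    {"word": " sure", "probability": 0.95},
    {"word": " maybe", "probability": 0.60},
    {"word": " huh", "probability": 0.20},
]
print(colorize(sample))
```

The same bucketing would work for HTML output by swapping the ANSI codes for `<span>` tags with a style attribute.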

rapid spindle Apr 17, 2024, 12:08 AM

#

hi

#

when will v3 of whisper API be available? why the delay?

rapid spindle Apr 17, 2024, 12:16 AM

#

muted axle

When will you support v3 in your API ? It's been almost one year

whole agate Apr 19, 2024, 12:22 PM

#

Can anyone help me use whisper here?
I want to make some code in python that utilizes whisper to create an SRT or VTT file that has timestamps but only a single word per cue. I don't want sentences or paragraphs like these:

1
00:00:00,000 --> 00:00:05,000
Open AI has recently decided to open source.

2
00:00:05,000 --> 00:00:09,000
Their translation and transcription AI whisper.

3
00:00:09,000 --> 00:00:18,000
So now it is under an MIT license and that includes both the code that's here as well as the model weights that were used to train the AI.

4
00:00:18,000 --> 00:00:26,000
So if you want it to go and try and make your own speech transcription AI with that data, you are free to do so.

I want single words for each... Is there a way I could do this with whisper?
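A minimal sketch of one way to get there: take per-word timings and emit one SRT cue per word. It assumes a list of dicts with `"word"`, `"start"` and `"end"` keys in seconds; `format_timestamp` and `words_to_srt` are hypothetical helpers, not part of the whisper package.

```python
def format_timestamp(seconds):
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    milliseconds = int(round(seconds * 1000))
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def words_to_srt(words):
    """Build SRT text with exactly one word per numbered cue."""
    cues = []
    for index, word in enumerate(words, start=1):
        start = format_timestamp(word["start"])
        end = format_timestamp(word["end"])
        cues.append(f"{index}\n{start} --> {end}\n{word['word'].strip()}\n")
    return "\n".join(cues)


sample = [
    {"word": " Open", "start": 0.0, "end": 0.32},
    {"word": " AI", "start": 0.32, "end": 0.6},
]
print(words_to_srt(sample))
```

With the open-source package, the word list can be collected from `model.transcribe("audio.mp3", word_timestamps=True)` by flattening `segment["words"]` across `result["segments"]`.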

whole agate Apr 19, 2024, 1:28 PM

#

anyone help?

#

please

granite pebble Apr 19, 2024, 7:13 PM

#

Hello, I am using the API for text-to-speech, and I am struggling to get the model to pause long enough between paragraphs. I have tried inserting 3 dots (...) and a hyphen, but nothing works. Any ideas?

unreal kraken Apr 20, 2024, 3:48 PM

#

Hello guys, after not using Whisper for 2 months, I got some issue.
The transcription is running well but at the end I don't have any text generated...

The key is (I guess): FileNotFoundError: [Errno 2] No such file or directory: '.\test.txt'

#

When I google it, people have the problem that after "No such file or directory:" they have ffmpeg, but I think mine works well (I did reinstall), and there is nothing written about ffmpeg, only '.\test.txt'.

restive reef Apr 21, 2024, 9:37 PM

#

unreal kraken Hello guys, after not using Whisper for 2 months, I got some issue. The transcri...

i am using whisper just fine. The problem occurs if your audio file has a too long name with spaces in it.

unreal kraken Apr 22, 2024, 4:14 PM

#

restive reef i am using whisper just fine. The problem occurs if your audio file has a too lo...

Damn... I did reinstall (It had helped before) and I have the same issue...

Do you know any command for whole uninstall: whisper, cuda, pip, ffmpeg etc. (connected with whisper)?

restive reef Apr 22, 2024, 4:17 PM

#

unreal kraken Damn... I did reinstall (It had helped before) and I have the same issue... Do ...