#gpt-realtime


dull sable
#

Thanks for the suggestion! I don't see a direct/legitimate way to programmatically use this tool, so probably not (I guess I never specified that's one of my requirements, but I'm mostly hoping for general advice). I'm also not sure if this can actually remove the offending sections rather than just make them sound like speech, since that seems to be what it is doing.

cerulean flint
small juniper
#

tpacker Hi seem to be quite

normal bloom
#

i made a little tool so that my mom can use whisper at translate.mom (hopefully i'm not breaking any rules!)

tight stirrup
#

Hey, I'm using the local version of Whisper and was wondering: if I compress an audio file from 1,411 kbps to 96 kbps, would there generally be much of a speedup in transcription time, and how much of a decrease in accuracy would I see?

fathom escarp
#

I don't think there would be much speedup
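A rough sketch of why the bitrate is unlikely to change the speed much: the open-source `whisper` package decodes any input to 16 kHz mono and works through it in 30-second windows, so the model sees about the same amount of data whatever the file's bitrate. The function name below is made up for illustration, and the window count is only approximate. Accuracy is a separate question; heavy lossy compression may cost a little.

```python
import math

SAMPLE_RATE = 16_000   # whisper resamples all input to 16 kHz mono
CHUNK_SECONDS = 30     # the model processes audio in 30-second windows

def model_workload(duration_seconds: float) -> dict:
    """Samples and (approximate) 30-second windows for a clip of this length."""
    samples = int(duration_seconds * SAMPLE_RATE)
    windows = math.ceil(duration_seconds / CHUNK_SECONDS)
    return {"samples": samples, "windows": windows}

# A 10-minute clip gives the same numbers at 1,411 kbps and at 96 kbps,
# because the bitrate of the source file never enters the calculation.
print(model_workload(600))
```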

rugged pasture
#

translate.mom then voila what's happening?

sonic mango
#

Hello everyone. I wanted to know whether Whisper handles diarization?

sonic mango
autumn bolt
#

hello question how can you have whisper

eternal osprey
#

I played around with the Lex Fridman interview with Mark Zuckerberg using Whisper and ChatGPT. This could be a really cool use case for processing interviews: https://lastmileai.dev/workbooks/clj9c2dxw01uzr0gvlk8rbx57

In this workbook, we'll do some cool things with Lex Fridman's most recent interview of Mark Zuckerberg about Meta's next AI model release (the next version of LLaMA)! We hope this inspires you to explore workbooks with Whisper, an audio-to-text model.

autumn bolt
#

Hello everyone. I would like to know how or where to contact the marketing team.

simple latch
# autumn bolt hello question how can you have whisper

In this step-by-step tutorial, learn how to transcribe speech into text using OpenAI's Whisper AI. Whisper AI is an AI speech recognition system that can transcribe and translate audio files in approximately 100 different languages.

📚 RESOURCES

▶ Play video
hearty shore
autumn bolt
#

If anyone is into AI development on a beginner scale hmu I got a project I need help on

static thunder
#

hihi

real river
#

Too loud.
Can you whisper?

desert dust
desert dust
#

Why

plain swift
#

Can you pull the captions from video websites for training?

chrome sail
#

Hey, what's the required specification to run the Whisper Model on a VPS?

steel fern
#

Should like to check, is anyone interested in - or has there been any discussion - on audio-to-text transcription that captures details of speaker identities (i.e. "speaker diarization")?

#

I'm having some limited success, but seem to be constrained by sample sizes no more than a couple of minutes long ..

cerulean flint
cerulean flint
steel fern
#

Not sure if this is a fork from Whisper .. if there are other projects I'm interested

#

I managed earlier to break down a 2-minute audio sample by speakers, labelled SPEAKER_00, SPEAKER_01, .. this seemed like a good start, except on inspection it wasn't very accurate

steel fern
#

regarding the 2-minute sample (it was an export from the leading 2 minutes of a longer 30 minute segment) it produced a 15 line transcript which in terms of word accuracy was quite good, I printed the speaker labels next to the text segments like so

SPEAKER_02 And he is now suspended, ready for receiving his travel next week.
SPEAKER_02 And [name] and [name] are available.
SPEAKER_02 So the donor blood group is outposed, the recipient is outposed.
SPEAKER_02 It's a 0-1-1 mismatch.
SPEAKER_02 They had full cross-match on the 18th of the 5th, which was negative.
SPEAKER_05 Do you want any other antibody samples?
SPEAKER_02 No.
...
#

When I ran a 10 minute sample though against the same code, I only managed to get the first 70 lines of transcript, which corresponded to about 2/3 of the sample

steel fern
#

here is the full source code I was using

```python
import whisperx
device = "cpu"
language = "en"
audio_filename = "audio-sample.mp3"
# -- transcription
model = whisperx.load_model("large-v2", device, compute_type="int8", language=language)
audio = whisperx.load_audio(audio_filename)
result = model.transcribe(audio, batch_size=16)
# -- alignment
model_a, metadata = whisperx.load_align_model(language_code=language, device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
# -- diarization
YOUR_HF_TOKEN = "hf_xxx"
diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)
diarize_segments = diarize_model(audio_filename, min_speakers=4, max_speakers=7)
# -- speaker assignment
result = whisperx.assign_word_speakers(diarize_segments, result)
# -- print result
for i in range(len(result["segments"])):
    print("{} {}".format(result["segments"][i]['speaker'], result["segments"][i]['text']))
```
steel fern
#

Update: the problem I had earlier with audio length was because my print loop is broken!

#

sometimes a speaker isn't assigned to a text segment, and if that happens then result["segments"][i]['speaker'] raises a KeyError and terminates the script early

#

I rewrote the printout, i.e. the part following result = whisperx.assign_word_speakers(diarize_segments, result) as

```python
# -- print result
for i in range(len(result["segments"])):
    speaker = result["segments"][i].get('speaker', 'SPEAKER-UNIDENTIFIED')
    text = result["segments"][i]['text']
    print("{} {}".format(speaker, text))
```
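Building on the `.get()` fix, here is a small sketch that also merges consecutive segments from the same speaker into one line. It assumes each segment is a dict with a `text` key and an optional `speaker` key, as in the printout above; `merge_by_speaker` is a hypothetical helper, not part of whisperx.

```python
def merge_by_speaker(segments, unknown="SPEAKER-UNIDENTIFIED"):
    """Join consecutive segments that share a speaker label into one line."""
    merged = []
    for seg in segments:
        speaker = seg.get("speaker", unknown)  # a segment may lack a speaker
        text = seg["text"].strip()
        if merged and merged[-1][0] == speaker:
            merged[-1][1].append(text)
        else:
            merged.append((speaker, [text]))
    return [f"{speaker} {' '.join(parts)}" for speaker, parts in merged]

# Made-up sample in the same shape as result["segments"]
sample = [
    {"speaker": "SPEAKER_02", "text": " It's a 0-1-1 mismatch."},
    {"speaker": "SPEAKER_02", "text": " No."},
    {"text": " Do you want any other antibody samples?"},
]
for line in merge_by_speaker(sample):
    print(line)
```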
scarlet crest
#

Does anyone know why this

```js
const transcription = await openai.createTranscription({
  file: buffer,
  model: 'whisper-1',
  response_format: 'json'
});
```

returns an error? `Error creating transcription: RequiredError: Required parameter model was null or undefined when calling createTranscription.`

This is the openai configuration:

```js
const configuration = new Configuration({
  apiKey: OPENAI_API_KEY,
  organization: OPENAI_ORGANIZATION
});

export const openai = new OpenAIApi(configuration);
```

untold sparrow
#

Why does medium.en require more VRAM but run slower than tiny.en?

cerulean flint
#

that is exactly how it should be?

#

large is bigger and the slowest

untold sparrow
cerulean flint
slate geode
#

How does using the api compare to running whisper.cpp locally? I'm using the api for my app but I'm wondering if it would be worth it to generate the transcripts locally for a performance boost.

slate geode
#

Also, I'm using zh (Chinese) for the language and getting errors for my requests. It works for French and English, though.

grand gull
#

Is this only for the whisper api?

#

I wanted to know other people's solution on how to get "real time" voice transcription with microphone

stiff remnant
#

does the sample rate of an mp3 affect the usage cost?

tacit wave
#

Has anyone used fast-whisper?

woven bluff
small juniper
#

I wanted to know other people s solution

untold sparrow
#

Hi, why am I getting nothing when I enter the command?
whisper --model tiny.en "test.mp3"
like i get

devout shuttle
untold sparrow
#

is it possible to translate the output file ?

#

into another language

cerulean flint
untold sparrow
sly apex
#

Anyone able to get half decent results with small whisper models running locally on OrangePi or RK3588 boards?

valid kernel
#

ллллллллллллллллллллл

#

єєєєєєєєєєєєєєєєєєєєєєєждлорнепаквіфівссапрролджє.дбьотипсіячсмитьбю.

clear gazelle
#

Hello, I have a question about Whisper. I want to try incorporating it into a small Python program for a voice assistant. Can someone please help me?

untold sparrow
#

whisper --model medium.en "audio.mp3" --output_format txt srt

#

I use this command and I only want the **txt** and **srt** formats, but it doesn't work

#

Can anyone help me?

autumn bolt
#

open ai is a company that makes ai models and stuff like that

tardy kettle
#

Apple should use this tech for their dictation function on the keyboard.

#

Prove me wrong

echo ridge
#

When does whisper come to code interpreter ?

#

So I can ask for transcript of my audio files directly from chatgpt

fair parrot
#

Idk if that is even possible though

proud kite
#

When I submit my recording , the output is error, instead of "transafjkl" that I typed.

How to solve this, Thank you.

untold crystal
#

Hey! My teacher told me to use these command-line arguments for whisper, is he correct?

whisper --model medium --language Spanish --output_format {txt,vtt}

small juniper
#

Hey My teacher told me to use this

#

I tried to post a link, but cannot. You can google for this: "Audio Course from HuggingFace". It uses Whisper.

autumn bolt
golden summit
#

how possible is it to have a video who you can talk to? sort of chatgpt in the form of a video?

small juniper
#

how possible is it to have a video who

clever barn
#

is ti possible to touch a stray cat

#

it*

copper ridge
#

What is whisper?

frail cloak
#

Where do I access whisper?

fierce marsh
#

Hello 👋

#

How do I create images using ChatGPT?

frail cloak
fierce marsh
#

What is whisper?

frail cloak
frail cloak
fierce marsh
winged bolt
#

/what was the beginning of the ford motors company

manic coral
manic coral
manic coral
manic coral
frail cloak
fierce marsh
#

Hello hello how are you

autumn bolt
#

hey guys im starting a community based server VORA-AI for AI development/showcasing and i need mods/staff

small juniper
#

Let's reduce the noise in this channel, shall we? Suggestions:

  • Questions that Google and ChatGPT can easily answer, I expect to be ignored here.
  • Nonsensical and off-topic statements like "/what was the beginning of the ford motors company" should not happen.
  • Ask a complete question so the topic is clear. When answering a question, start a new thread on that question.
quartz jungle
#

Hi
So I have a folder with 45 audio files in MP3 format. I want to transcribe them to text using OpenAI Whisper API. I have an API key. After transcribing them, I want to take the entire output, and put it into a text file.

Please tell me the best way to do this.

livid mauve
queen scarab
manic coral
# quartz jungle Hi So I have a folder with 45 audio files in MP3 format. I want to transcribe t...

Here's one of my whisper programs

```python
import os
import openai
from pprint import pprint

openai.api_key = os.getenv("OPENAI_API_KEY")

# Prompt for user input; empty answers fall back to defaults
file_path = input("Enter the audio file path: ")
prompt = input("Enter the prompt text (optional): ")
response_format = input("Enter the response format (optional, defaults to json): ") or "json"
language = input("Enter the language of the input audio (optional): ")

# Set the model ID
model_id = "whisper-1"

# Use a fixed temperature of 0
temperature = 0

# Only send the optional parameters the user actually filled in
options = {"response_format": response_format, "temperature": temperature}
if prompt:
    options["prompt"] = prompt
if language:
    options["language"] = language

# Open the audio file in binary mode and call the API
with open(file_path, "rb") as audio_file:
    transcript = openai.Audio.transcribe(model_id, audio_file, **options)

pprint(transcript)
```
This just prints the transcription to the screen; you can modify it to save it to a file instead.
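For the folder-of-MP3s case, one way the "save it to a file" modification could look is sketched below. It assumes the same legacy `openai.Audio.transcribe` call as the program above and an `OPENAI_API_KEY` environment variable; `transcribe_folder` and the `## filename` separator are invented for this example.

```python
from pathlib import Path

def transcribe_with_openai(path: Path) -> str:
    """Legacy openai<1.0 call, matching the program above."""
    import openai  # picks up OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        return openai.Audio.transcribe("whisper-1", audio_file)["text"]

def transcribe_folder(folder, out_file, transcribe=transcribe_with_openai):
    """Transcribe every .mp3 in `folder` and write one combined text file."""
    paths = sorted(Path(folder).glob("*.mp3"))
    with open(out_file, "w", encoding="utf-8") as out:
        for path in paths:
            # One heading per file so the combined transcript stays navigable
            out.write(f"## {path.name}\n{transcribe(path)}\n\n")
    return len(paths)
```

Calling `transcribe_folder("recordings", "all_transcripts.txt")` would process the files in name order; the `transcribe` argument can be swapped for a local model or a stub.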
quartz jungle
quartz jungle
cerulean flint
manic coral
#

Not to my knowledge but if you want, I can write one this afternoon.

quartz jungle
#

Not to my knowledge but if you want I

plucky palm
#

how do i output as timestamps?

unique flare
#

Is this available for plus subscribers yet,?

manic coral
calm ravine
#

please advice about community role please.

livid mauve
#

what roles are you asking about specifically?

alpine nacelle
#

psst

calm ravine
#

i ask the same part of the question, how do they look like and who are they..? showing exploration of neurological netwrking system.

#

which role Admin want to gave ..because i am Business information system and an engineer alieas plant manager.

rapid glen
#

I hate you

untold crystal
wanton mica
#

Gay

manic coral
#

(There is no whisper exec file)

untold crystal
remote hedge
#

Working on my first python project. I put the code and the output in here: https://justpaste.it/a6j5p
The goal is to run Whisper to transcribe the audio and save it as a .txt with the same name. Currently there is no txt file, even though the output says so. According to the error messages there might be something wrong with ffmpeg, but when I run Whisper in the CLI the transcription works, so I don't think that's the problem

remote hedge
#

could it be a permission issue?

thorny phoenix
#

is whisper better than elevenlabs?

thorny phoenix
clever basin
#

whisper is really good, definitely better than all other speech to text things

alpine tendon
cerulean flint
autumn bolt
#

Hey so, I have never used AI before. However, I just watched the Iron Man movies again and honestly I'm convinced I need to spend time trying to make my own for around my home. However, I genuinely have no idea where to start. I have some coding experience in Java and JavaScript, but if a new language is needed I'm willing to invest. I think what I'd want the AI to be able to do is:

Control lights
Control Music
Regular ChatGPT for questions/talk
Book events
Control TV
Control Camera (not really sure how this would work, but, I could say turn on garage camera, and it'd pull up on the screen)

Those are just the ones I can think of right now, but if AI can learn, could it learn who people are? For example, "Charlie has just walked into the house", or if I asked where the dog is, it could say "The dog is outside". Because of the inspiration, I'd want it to have a Jarvis voice (but who knows, maybe I'll want it different for my own style).

Another question I was curious about is whether, if prompted correctly, AI could just code this for me? I'm not sure how this works or if half of this is even possible. But please lmk!

calm ravine
#

A.I is a helping tools, It's depends on the roots they enquired.

cerulean flint
thorny phoenix
#

also learn about prompt engineering

valid wharf
#

Hey, pssst

#

im whispering to you

#

shhhh

valid wharf
tawdry dagger
#

Heyy, just wanted to ask if there is a way to run faster-whisper without the python? Cause I'm using command prompt and am not quite sure on how to convert...

remote hedge
#

How does whisper handle multiple languages being spoken in a file? I'm trying to transcribe subs for my wedding video but my family and my wife's family speak different languages

remote hedge
chrome bloom
mortal canyon
#

guysssss shhhhh whisper🤫🤫🤫🤫

#

no I think they meant prompt not proompt

manic coral
# remote hedge How does whisper handle multiple languages being spoken in a file? I'm trying to...

I would probably try chirp if you have access


```Chirp is a version of a Universal Speech Model that has over 2B parameters and can transcribe in over 100 languages in a single model. Chirp achieves state-of-the-art Word Error Rate (WER) on a variety of public test sets and languages.```
```Chirp is available through the Cloud Speech-to-Text API. The API lets you do inference for transcription against the Chirp model```
faint dawn
#

guys

#

can someone please help

#

it's kinda urgent

#

I am using the Whisper API and trying to simply transcribe text from a voice message that is in English

#

but for a strange reason the text gets transcribed to Greek

#

even though I have NOWHERE specified Greek

#

can someone help?

tawdry dagger
spice grail
#

I don't understand how we use Whisper

manic coral
#
```python
import os
import openai
from pprint import pprint

openai.api_key = os.getenv("OPENAI_API_KEY")

# Prompt for user input
file_path = input("Enter the audio file path: ")
prompt = input("Enter the prompt text (optional): ")
response_format = input("Enter the response format (optional, defaults to json): ")
language = input("Enter the language of the input audio (optional): ")

# Open the audio file in binary mode
audio_file = open(file_path, "rb")

# Set the model ID
model_id = "whisper-1"

# Set the temperature to 0 if it is not provided
temperature = 0

# Call the API with user-input variables
transcript = openai.Audio.transcribe(model_id, audio_file, prompt=prompt, response_format=response_format, temperature=temperature, language=language)

pprint(transcript)```
thin fox
#

hello all,

are there any AI researchers here? working on custom models in text or image generation?

prime cradle
#
```
  File "docs.py", line 5, in <module>
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
AttributeError: module 'openai' has no attribute 'Audio'
```

I am getting this error when copying and pasting the basic API reference example from the docs.
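One possible cause, offered as a guess since the message doesn't show the installed version: `openai.Audio` belongs to the 0.x line of the `openai` Python package and, as far as I know, was added in 0.27.0, so an older install would raise exactly this error. (A local file named `openai.py` shadowing the package is another common culprit.) The sketch below only checks the installed version; the 0.27.0 cut-off is an assumption worth verifying against the package changelog.

```python
from importlib import metadata

def parse_version(text: str) -> tuple:
    """Turn '0.27.8' into (0, 27, 8); non-numeric parts are ignored."""
    parts = [int(p) for p in text.split(".") if p.isdigit()][:3]
    return tuple(parts + [0] * (3 - len(parts)))

def has_audio_module(version: str) -> bool:
    """Assumption: openai.Audio exists in 0.x releases from 0.27.0 onward."""
    return (0, 27, 0) <= parse_version(version) < (1, 0, 0)

try:
    installed = metadata.version("openai")
    print(installed, "has openai.Audio:", has_audio_module(installed))
except metadata.PackageNotFoundError:
    print("openai is not installed in this environment")
```

If the check prints `False` for an old 0.x version, upgrading the package within the 0.x line would be the first thing to try.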
autumn bolt
#

🙂

frozen spoke
#

That wasn't a different language... Why did it delete my message?

#

Can whisper gain IPA as a language?

mossy yacht
valid kernel
#

я

lapis coyote
lapis coyote
lapis coyote
manic coral
quiet mist
ornate trellis
#

I'm facing an issue, and I'm new to this.

```python
import whisper
from typing import Annotated
from fastapi import FastAPI, File

app = FastAPI()

@app.post("/ta")
async def transcribe_audio(audio_file_upload: Annotated[bytes, File()]):
    model = whisper.load_model("base")
    result = model.transcribe(audio_file_upload, word_timestamps=True, fp16=True)
    return {"res": result}
```

I'm getting this error:

```
 File "N:\audio-to-text\venv\Lib\site-packages\whisper\audio.py", line 131, in log_mel_spectrogram
    audio = torch.from_numpy(audio)
            ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected np.ndarray (got bytes)
```
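The traceback suggests `model.transcribe` was handed the raw upload bytes, while it expects a file path, a NumPy array, or a tensor. One workaround, sketched here under that assumption, is to write the upload to a temporary file and pass the path. The helper names and the `.mp3` default suffix are placeholders, and `model` stands for an already-loaded whisper model (loading it once at startup rather than per request should also help).

```python
import os
import tempfile

def save_upload_to_tempfile(data: bytes, suffix: str = ".mp3") -> str:
    """Write uploaded bytes to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name

def transcribe_bytes(model, data: bytes) -> dict:
    """Transcribe raw upload bytes with an already-loaded whisper model."""
    path = save_upload_to_tempfile(data)
    try:
        # Given a path, transcribe() decodes the audio itself
        return model.transcribe(path, word_timestamps=True)
    finally:
        os.remove(path)  # clean up the temporary file
```

Inside the endpoint this would become `result = transcribe_bytes(model, audio_file_upload)`.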
steep monolith
#

I have been using the default free Google speech-to-text tool for a mobile app. Is there any advantage to using Whisper?

steep monolith
#

I'm only using it for short bits of text, so maybe I would run into issues with more extensive conversions

tacit bolt
#

do open ai playground and azure playground give different responses for the same prompt?

steep monolith
#

I assume OpenAI think Whisper fills a niche that the Google version doesn't fill... Maybe the niche is other-than-mobile settings.

cerulean flint
steep monolith
#

Is it expensive?

fading scroll
#

sup

dim vigil
#

do i need to pay to use whisper?

prisma sinew
stone nebula
#

so i've been using whisper to get transcriptions from video clips i'm processing, and i've noticed an interesting quirk at the end of some of the transcriptions
[TRANSCRIPTION] It's the summer's biggest sensation. And now it all comes down to this. The Dancing with the Stars Grand Finale. Underdog Kelly Monaco The eye of the tiger, baby. Takes on favorite John O'Hurley. Be afraid. Be very afraid. The judges score and your vote decides the champion. Live. Plus, all the stars reunite for one final encore. It's the night America has been waiting for. Dancing with the Stars Grand Finale. Wednesday, 9, 8 Central. Only on ABC. Subs by www.zeoranger.co.uk

fading scroll
#

duh

jaunty canopy
#

Can I get the transcript in 30 sec batches, rather than the entire text, via whisper APIs?

lapis coyote
# stone nebula so i've been using whisper to get transcriptions from video clips i'm processing...

It happens from time to time when I ask it to transcribe short audio clips with fuzzy voices. I think that's a built-in flaw of the machine learning approach Whisper uses. My understanding is that Whisper uses a variation of generative AI similar to ChatGPT, not the usual voice recognition ML models used by other providers. It's pretty unique. But it can lead to the problem that it's actively trying to figure out "what's next", which leads to the "next" being text from somehow-popular websites. At least that's my explanation based on my experience. It happens often enough that I built code logic to handle this situation correctly.
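The poster's own handling logic isn't shown, so the following is only a guess at what such a cleanup might look like: a crude post-processing pass that strips a trailing credit-like phrase. The pattern list is hypothetical and would need tuning, since a legitimate sentence containing one of these phrases would be cut as well.

```python
import re

# Hypothetical patterns; extend with whatever shows up in your own output.
CREDIT_PATTERNS = [
    r"\bsubs(?:titles)? by\b.*$",
    r"\bthank(?:s| you) for watching\b.*$",
]

def strip_trailing_credits(text: str) -> str:
    """Remove a trailing credit-like phrase if one of the patterns matches."""
    for pattern in CREDIT_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE).rstrip()
    return text

print(strip_trailing_credits("Only on ABC. Subs by www.zeoranger.co.uk"))
```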

stone nebula
#

Interestingly enough that doesn't seem to be an active website

untold crystal
#

does whisper need an update?

mighty fossil
#

help meee!!!

#

why is my api key not working

#

today i finally made a project that i copied from youtube but my key is not working

#

it says you need tokens

#

whats that??

#

anyone?

young willow
mighty fossil
#

How can I do that?

#

It asks for company name and stuff like that

young willow
mighty fossil
#

I went there and also went to the pricing part. It asks for my role and the company I work for

young willow
#

You have to set up the information

#

And you have to pay to use the API

shadow vapor
#

Hey guys, can someone tell me if this is possible with whisper?
I want to use Whisper's language ID on code-switching speech, that is, switching between languages mid-sentence or between sentences. Ultimately, I would like to make timestamps of when each language is used. For example, (0-1 seconds) English -> (1-2) Spanish ->.... etc
Can this be done?

loud plinth
#

Might OpenAI remove the Try In Playground link from the product page under Whisper? There is no playground option for Whisper. Why tease people into going to look for something that we know doesn't exist? Thanks.

untold crystal
#

Question; "Will Whisper need an update at some point, or does it remain unupdated?"

#

ok mods can someone explain why the timeout? I was writing in english...

#

Do you know why it gives me this error? whisper: error: argument --output_format/-f: invalid choice: '{txt,vtt}' (choose from 'txt', 'vtt', 'srt', 'tsv', 'json', 'all')

#

This is the code:
whisper Histologia_general_teo_sem_1 --model medium --language Spanish --output_format {txt,vtt}

plush mulch
fierce surge
#

Do i have to upgrade to gptplus in order to use whisper?

#

Ok. I got the answer by reading previous chat. I have to update.

untold crystal
#

too late to watch it

#

i resolved it later

#

but thanks man

#

is like

#

1 month

#

i didn't know how to do it

untold crystal
#

i didnt use --task transcribe and it worked

#

why should i need to use --task transcribe?

#

pd: sorry for the bad english

magic pollen
#

Is it possible for Whisper to distinguish different speakers that appear in the same audio file?

weary skiff
plush mulch
untold crystal
#

do you think it's necessary?

#

would it improve the transcription?

#

or not?

#

btw

#

do you use --patience PATIENCE?

willow geyser
#

has anyone used Whisper to live-transcribe and translate a panel discussion IRL?

storm geyser
#

When whisper can identify different speakers is when I'll pay for it

storm geyser
untold crystal
glass mirage
#

you could technically cut the live recorded mp3 in word pieces and upload it to the server to be transcribed

#

and do these couple of things simultaneously using multithreading
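A minimal sketch of the "upload pieces simultaneously" idea using a thread pool. The `transcribe` argument is a placeholder for whatever function uploads one chunk and returns its text; cutting the recording (ideally at pauses rather than mid-word) is assumed to have happened already.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_chunks(chunks, transcribe, max_workers=4):
    """Transcribe chunks in parallel threads, returning text in chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if calls finish out of order
        return " ".join(pool.map(transcribe, chunks))
```

Threads fit here because each call mostly waits on the network; whether the latency is low enough on mobile is a separate question, as noted in the following messages.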

#

however i don't see it being efficient enough to be a viable option, especially on mobile

#

so no clue what could be used instead

#

for example AssemblyAI is an amazing api but also pricey as hell

#

does anyone know any relatively good apis for this but without some ridiculous prices that no client would stand?

lethal narwhal
#

I found a whisper app on the Google Play Store to dictate anything using a custom keyboard. Is there a similar app on the App Store for Apple phones?

autumn bolt
#

Anyone familiar with CoreML?

true lotus
#

Hey so does anyone here know where the bot channel is?

tender rivet
untold crystal
#

whisper --model medium --language Spanish --output_format txt --output_format vtt

#

i used this and whisper only gave me the vtt format, do you know why?

tender rivet
#

--output_format vtt 🤔

untold crystal
# tender rivet ``--output_format vtt`` 🤔

@tender rivet I think I wasn't clear: I want it to give me both --output_format txt and --output_format vtt, but it only gave me --output_format vtt. Do you know why it didn't give me --output_format txt?

finite frost
#

Hi guys, On iOS, chatgpt transcription feature through whisper translates my English to Russian sometimes automatically, possibly because my accent is Russian. Honestly the fact that it's possible is amazing, but I would prefer to control it. Does anyone know what happens and how to control it? Also, is there a dedicated app using whisper specifically for translation?

bleak marten
#

To regain control explore the settings or preferences within the app that uses Whisper for transcription.

finite frost
#

it's official chatgpt app

muted axleBOT
#

Attention: Off-topic chats have been cleared by a moderator.

plush mulch
# untold crystal @tender rivet i think i didnt emphasise, but i want to give me --output_...

seems like only one output_format is available as output.
you can use this GUI to save the output in 5 different file formats without the need to run it multiple times, also you can directly save the subtitle to the video hardcoded or as .mkv file
https://github.com/meeksqueal/OpenAI-Whisper-GUI

GitHub

A modern GUI application that transcribes and translates audio and video files, offering the option to save the subtitles as separate files, embed the subtitles in a .mkv format, or hardcode them i...
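Two options that avoid a GUI: the CLI error quoted earlier in this channel lists `all` as a valid `--output_format` choice, so a single run with `all` should produce every format. Alternatively, when calling Whisper from Python, the result can be written in as many formats as needed from one transcription. The sketch below assumes `segments` is a list of dicts with `start`, `end`, and `text` keys, like `result["segments"]`; the function names are made up.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm for WebVTT."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_txt(segments) -> str:
    """Plain text: one segment per line."""
    return "".join(seg["text"].strip() + "\n" for seg in segments)

def to_vtt(segments) -> str:
    """WebVTT: header, then one timed cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    return "\n".join(lines)
```

Writing `to_txt(result["segments"])` and `to_vtt(result["segments"])` to two files then gives both outputs without transcribing twice.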

tardy pollen
#

I'm running whisper on a GCP NVIDIA A100 GPU 40GB. This is a huge instance and the cost is immense. Fine, whatever. So I'm handling my first transcription and the transcription rate is as slow or slower than on my Mac. How can I confirm that the GPU is actually being used? The image I installed on top of the GPU is direct from GCP and is a special PyTorch-CUDA image so I'm extremely confused.

cerulean flint
#

try using Colab free tier

tardy pollen
#

I don't care about free tier, this is on the corpo dime.

cerulean flint
#

i use it nearly everyday, works

tardy pollen
#

How long would it take whisper to run through a 4GB file on an Nvidia A100 40GB?

cerulean flint
#

not sure it can handle 4GB all at once

tardy pollen
#

Does the GPU actually after its run time?

#

*affect

cerulean flint
#

i part my large files into smaller ones

tardy pollen
#

So the problem is my file is too big?!

#

that makes no sense to me I guess

cerulean flint
#

i would try 1GB parts

#

ok

tardy pollen
#

why would that matter, I'm just genuinely lost is all.

cerulean flint
#

then i guess you should figure it out

tardy pollen
#

no no I hear you -- smaller files. But why?

cerulean flint
#

i don't know, all i can say is that i've transcribed hundreds of .mp3 files so far and Whisper has had problems with bigger files (hours of interviews). once i split them into smaller segments, it runs smoothly
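For anyone wanting to script the splitting, a minimal sketch of planning the segment boundaries is below. The 10-minute default is arbitrary and `plan_segments` is a made-up helper; each `(start, end)` pair could then be handed to a cutter such as ffmpeg's `-ss` and `-to` options.

```python
def plan_segments(total_seconds: float, segment_seconds: float = 600.0):
    """Return (start, end) pairs in seconds that cover the whole recording."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    total = float(total_seconds)
    bounds, start = [], 0.0
    while start < total:
        end = min(start + segment_seconds, total)
        bounds.append((start, end))
        start = end
    return bounds

print(plan_segments(1500))  # a 25-minute file in 10-minute pieces
```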

tardy pollen
#

alright

#

Have you ever run it on a 40GB VRAM GPU instance though lol?

cerulean flint
#

nope

tardy pollen
#

I mean this is a $10K USD card?! It costs $5/hr to run in the cloud

#

That's why I'm confused. It's like running Whisper on a GPU doesn't matter or something. And the machine image I used was specifically from GCP with PyTorch-CUDA preinstalled for python 3.10.0. I mean pretty straightforward. Unless running it on a GPU doesn't actually do anything for transcription 🤷‍♂️

cerulean flint
#

that works for me

tardy pollen
#

oh. this is from a notebook?

cerulean flint
#

yes

tardy pollen
#

can you export the script and upload it here? I'm a low-tech kind of guy that tends to do things from cli's and what not.

#

please

cerulean flint
#

maybe later, need to finish here & heading to work

tardy pollen
#

allllright

#

that script doesn't partition the file though

#

thanks

tardy pollen
#

I'm running whisper on a GCP NVIDIA A100 GPU 40GB. This is a huge instance and the cost is immense. Fine, whatever. So I'm handling my first transcription and the transcription rate is as slow or slower than on my Mac. How can I confirm that the GPU is actually being used? The image I installed on top of the GPU is direct from GCP and is a special PyTorch-CUDA image so I'm extremely confused.

lapis jacinth
#

🎙️ Whisper's Response to Blank Inputs - A Quirk?

Hi all! 👋

I noticed that when Whisper receives "blank" audio (no spoken words), it transcribes phrases like "Thank you for watching." Even OpenAI's ChatGPT mobile apps do the same.

Has anyone else seen this? Is it a known issue, or is there a workaround? It's intriguing but could be challenging in some scenarios.

Your insights would be super helpful!
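One workaround people sometimes try, sketched under the assumption that segment-level output is available (for example `result["segments"]` from the Python package, where each segment carries a `no_speech_prob` value): drop segments the model itself rates as probably silent. The 0.6 default mirrors the package's `no_speech_threshold` default; the sample data is invented.

```python
def drop_silent_segments(segments, max_no_speech_prob=0.6):
    """Keep only segments the model considers likely to contain speech."""
    return [
        seg for seg in segments
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]

# Invented example values, for illustration only
sample = [
    {"text": " Hello everyone.", "no_speech_prob": 0.02},
    {"text": " Thank you for watching.", "no_speech_prob": 0.93},
]
print([seg["text"] for seg in drop_silent_segments(sample)])
```

This will not catch every case, since hallucinated text can also come with a low `no_speech_prob`.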

unborn gust
#

anyone having issues with the API today? I'm having extra errors and also responses that are not the full file.

untold crystal
untold crystal
untold crystal
lapis jacinth
untold crystal
#

now the mononeural of the bot is going to put me timeout

untold crystal
#

?

#

is the bot is with a mod

#

or is only a bot?

#

i want to report this to ai

#

guys, can whisper also use shared GPU memory, or only dedicated GPU memory?

tardy pollen
#

I'm running whisper on a GCP NVIDIA A100 GPU 40GB. So I'm handling my first transcription and the transcription rate is as slow or slower than on my Mac. How can I confirm that the GPU is actually being used? The image I installed on top of the GPU is direct from GCP and is a special CUDA image meant precisely for my use-case. What am I missing here?

quartz hull
#

Python 3.11.5
:/home/container$ if [[ -d .git ]] && [[ "${AUTO_UPDATE}" == "1" ]]; then git pull; fi; if [[ ! -z "${PY_PACKAGES}" ]]; then pip install -U --prefix .local ${PY_PACKAGES}; fi; if [[ -f /home/container/${REQUIREMENTS_FILE} ]]; then pip install -U --prefix .local -r ${REQUIREMENTS_FILE}; fi; /usr/local/bin/python /home/container/${PY_FILE}
Collecting openai-whisper
Using cached openai-whisper-20230314.tar.gz (792 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting triton==2.0.0 (from openai-whisper)
Downloading triton-2.0.0-1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━ 46.7/63.3 MB 47.8 MB/s eta 0:00:01
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

How much space does whisper need?

#

I have like 2gbs on my vm

#

But still error

verbal estuary
#

[413] Maximum content size limit (26214400) exceeded (26247606 bytes read)

so 25 MB is the max limit? is there a way to upgrade this limit?
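The number in the error, 26,214,400 bytes, is exactly 25 MiB. A common way around it is to re-encode at a lower bitrate or split the file. The sketch below estimates how long a constant-bitrate clip can be before it hits that limit; it ignores container overhead, so treat the result as an upper bound, and note that the function name is made up.

```python
API_LIMIT_BYTES = 26_214_400  # taken from the 413 error above (25 MiB)

def max_seconds_under_limit(bitrate_kbps: float,
                            limit_bytes: int = API_LIMIT_BYTES) -> float:
    """Longest constant-bitrate clip, in seconds, that stays under the limit."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)

for kbps in (64, 96, 128):
    minutes = max_seconds_under_limit(kbps) / 60
    print(f"{kbps} kbps: about {minutes:.1f} minutes per upload")
```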

vast summit
#

Hi! Anyone know how I can use Whisper to dictate sms messages on ios?

near yew
tender rivet
waxen dew
vagrant glade
#

anyone here that has used whisper... is it able to interpret background noise? for example, could it delineate between crowd cheering and crowd booing?

autumn bolt
#

Hey, any good APIs for using whisper in c#?

wide drift
#

Hey guys, I am trying to create a real time voice reader but my code is not working.

cinder saddle
humble hare
#

Hey guys this is a thanks from me and my team to any contributors to whisper. This library basically converted a month's work to a few days.

humble hare
wide drift
#

Can you put the repo

humble hare
#

sorry but the project is for an upcoming hackathon so i cannot share the repo as of right now

wide drift
#

When will it be?

humble hare
#

it will take some time bro I believe at the end of this year

surreal hull
#

@humble hare bro I need your help with real time voice transcription

surreal hull
#

Can you share the idea?

humble hare
tender rivet
#

You are in the right place, check the oai lib for js or python and the code examples on the docs

half anvil
#

Hey guys, has anyone tried hosting whisper large model on firebase cloud functions.

glad nova
#

Hmm..

fair sapphire
#

Hey so I'm trying to use whisper-1 speech to text to generate a transcription from audio in Portuguese using an audio/ogg file as a NodeJS Buffer, but when hitting the endpoint I get an error message that reads as follows: {"message":"","type":"server_error","param":null,"code":null}. This does not really tell me what the issue is, so I came here to ask for help

fair sapphire
#

attempting to use Blob instead of Buffer returns an error saying file parameter does not exist, but it does

#

also fix your automod cuz I got timed out for sending an image of code written in english

fair sapphire
astral shell
#

Is there any way to get timestamp for each word and not only the segment?

thorn echo
#

Is it possible to identify different voices and have each one tagged as a different person?

full sage
thorn echo
humble hare
#

Hey the latest version of whisper requires option parameter with srt_writer. How do i get this parameter?

full sage
bold canopy
#

has anyone used the offline whisper version?
it works but for some reason it refuses to use my gpu and uses my cpu instead

UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")

any idea how to reconfigure it? (edited)

#

because of this it seems to take longer when transcribing

stiff remnant
#

Has anybody seen information as to when we might see improvements to Whisper, particularly for smaller languages?

full sage
#

Hello are you using Python to run the whisper model? Or just the command line? If you're using Python you can set fp16 to False and it would use FP32 instead

bold canopy
#

thanks!, ill give it a try

lapis jacinth
#

Any thoughts as to how OpenAI are getting such fast transcriptions in ChatGPT-V? ⚡
Are they using a new Whisper model not available to the public?

plain creek
#

i have used whisper offline, make sure you have the right CUDA stuff installed so that it runs on your GPU (if you want that)

#

otherwise it won't be able to detect your GPU as a device

toxic iris
lapis jacinth
toxic iris
#

Also Whisper would have to run on their backend. (That's extra cost too.) They could just use the STT model integrated in Android/ IOS.

lapis jacinth
toxic iris
#

Still, a locally running model would be my go-to as a developer.

lapis jacinth
#

For sure, remote Whisper transcriptions would add cost and latency. But the accuracy gains are worth the cost, and they're printing money anyway.
Android/iOS STT doesn't give high enough accuracy for conversational AI, IMO.
And remote transcriptions with Whisper can be close enough to real time to work. I should know, I've done it. 🎤

north sail
full sage
north sail
# full sage What are the errors you're getting?

So in the 17th block of code you'll get an error at the result_aligned = whisperx.align(... part.

After looking through the repo someone suggested you change the beam_size from 1 to 7.

Then after you fix that error you'll get stuck at the 21st block of code
https://github.com/MahmoudAshraf97/whisper-diarization/issues/69

The owner of the repo suggests adding pip uninstall nvidia-cudnn-cu11 but that's already been done at the very beginning of the notebook

#

Forgive me for my lack of knowledge regarding coding but I've been trying to debug it myself for quite some time now...

full sage
autumn bolt
#

I am sending audio recordings to the OpenAI Whisper API and cannot get mobile recordings to accept past a few seconds of data, I have no idea why. Desktop audio recordings function perfectly fine but whenever I try on my phone the transcriptions only get a word or two

curl --request POST \
  --url https://api.openai.com/v1/audio/transcriptions \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@C:\Users\katra\Desktop\71753801708__8C36058A-E077-4000-B93D-901529FBD0AE (1).mp4' \
  --form model=whisper-1
north sail
full sage
# north sail I haven't had the time to actually test out the fix yet but thank you! Btw how ...

Both NeMo and Pyannote diarization are similar because they use neural networks to segment and cluster audio recordings by speaker, which you can see in the whisper-diarization Colab where they convert the data into a NeMo-compatible format.

But of the two, NeMo diarization gets more updates and is backed by a huge corporation (Nvidia) and perhaps a wider community, while Pyannote relies on open source, which means many people can fine-tune the model to be better at specific tasks.

Overall, both are free to use.

north sail
# full sage Both NeMo and Pyannote diarization are similar because they use neural networks ...

That's interesting. Thanks for the explanation!

The best method I've found for transcribing (and translating) Japanese is using silero-VAD. It fixes Whisper's hallucinations over longer recordings and makes the timings far better than without it.

The thing is most repos/open source programs that have Whisper and utilize a VAD all use pyannote for diarization. I wish I knew how to code something with NeMo so I could test and see for myself if it would yield better results

full sage
# north sail That's interesting. Thanks for the explanation! The best method I've found for ...

Good insight, I will be sure to checkout silero-VAD. And I agree, we see many opensource projects using OpenAI Whisper model because it is pretty popular and known by many. Right now I am building a project using GPT4 and Whisper Tiny/Base model to transcribe youtube videos and summarize it right on your cpu.

In my opinion it performed pretty well on English videos that were over 2 hours long in length.

north sail
# full sage Good insight, I will be sure to checkout silero-VAD. And I agree, we see many op...

Not sure if I can send links in this server, but there's already some research talking about how much better diarization is when LLMs are added to the equation. I hope we'll get to see more of this stuff easily accessible to the public soon

I hope your project works out! I've seen a small number of repos on Github already try to implement a GPT model to help with transcriptions, but I haven't tested them out myself yet

lapis jacinth
north sail
autumn bolt
#

ik this is prolly less but. Whisper is awesome ive been using it to get notes of my classes

#

i really like it

kind skiff
weary trail
#

I keep getting stuff like "由 Amara.org 社群提供的字幕" when theres silence and other chinese text talking about amara.org having transcribed this conversation even though i have told the ai to transcribe it as such when theres silence in an audio

#

I would send a picture but this dumb### ai keeps deleting my image and timeouting me

full root
#

has anyone tried transcribing using whisper in a thread? because it just seems to not halt for me which is weird.

#

Im working on live audio that I have a threaded function that reads mp3 frames and then given some volume threshold saves the audio and calls a new thread to transcribe it using whisper

#

but that thread just doesnt do any transcribing until I ctrl+C the program

#

then the rest of that threaded function executes and I get the transcription
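
Not a diagnosis of the hang, but one common way to structure this is a single long-lived worker thread fed by a queue, instead of spawning a new thread per clip. A minimal sketch, assuming the capture loop only needs to hand over file paths; `fake_transcribe` is a hypothetical stand-in for the real Whisper call:

```python
import queue
import threading


def fake_transcribe(path):
    # Hypothetical stand-in for the real Whisper call on a saved clip.
    return f"transcript of {path}"


def transcription_worker(jobs, results, transcribe=fake_transcribe):
    # Pull saved clip paths off the queue until the None sentinel arrives.
    while True:
        path = jobs.get()
        if path is None:
            jobs.task_done()
            break
        results.append(transcribe(path))
        jobs.task_done()


def run_pipeline(paths, transcribe=fake_transcribe):
    jobs = queue.Queue()
    results = []
    worker = threading.Thread(
        target=transcription_worker, args=(jobs, results, transcribe), daemon=True
    )
    worker.start()
    for path in paths:   # the capture loop would do this whenever a clip is saved
        jobs.put(path)
    jobs.put(None)       # tell the worker to stop
    worker.join()
    return results
```

The capture thread never waits on transcription here; it only calls `jobs.put(...)`, so the two sides cannot block each other on anything except the queue.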

autumn bolt
#

How do I use Whisper?

#

How do i even access it?

full root
#

pip install -U openai-whisper

autumn bolt
#

I'm interested in watching a YouTube video that's in another language. While there are English subtitles available through "auto-translate", I'm curious to know if OpenAI offers a tool that can analyze the foreign language audio and convert it into an English voiceover?

#

Can someone do this for me? ^ I have no idea how to use Whisper

full root
#

especially the call to "FeatureExtractor" (I am using faster-whisper)

#

My plan is to use a process instead

rustic atlas
#

Good evening, does anyone know if the Whisper API can be used to generate an .SRT file with the transcripts?

#

I know it can be done by running the model locally with the Whisper package, but I don't have that computational capability and I would like to use the API

brave kestrel
#

I'm sure it's possible

#

maybe use ffmpeg to separate the audio to a usable streamtype then use python or whatever to make the api calls, and generate SRT from the transcript
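
As far as I know the transcription endpoint also accepts `response_format="srt"`, which would return the subtitle file directly. If you would rather build it yourself from timed segments (for example the `segments` list in a `verbose_json` response), a minimal sketch could look like this, assuming each segment is a dict with `start` and `end` in seconds plus `text`:

```python
def srt_timestamp(seconds):
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    # One numbered cue per segment, separated by a blank line.
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Writing the returned string to a `.srt` file is all that is left after that.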

#

Commission somebody if you're not a programmer. Or ask GPT to help your journey

tender rivet
kind skiff
#

<.<

.>

tender rivet
#

oops, sorry for the ping tho =P

prisma sinew
#

Neat. I've got something very similar set up, not a parrot so much as a two-way, but similar

tender rivet
#

I Loved the fact that it is integrated with foundry

kind skiff
#

let me get my macro for ya'll then

prisma sinew
#

I still need to decide the best method of natural capture, like how long of a pause, and background noise ignoring

#

Right now it just waits for a 1 sec pause lol
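
For reference, the "wait for a 1 sec pause" rule can be sketched as a simple energy threshold over short frames. The names, the 20 ms frame size, and the threshold value are all assumptions for illustration; `frame_levels` is taken to be one loudness value (e.g. RMS) per frame:

```python
def find_utterance_end(frame_levels, frame_ms=20, silence_threshold=0.02, pause_ms=1000):
    """Return the index of the frame where the speaker is considered done, or None."""
    needed = pause_ms // frame_ms   # consecutive quiet frames required
    heard_speech = False
    quiet_run = 0
    for i, level in enumerate(frame_levels):
        if level >= silence_threshold:
            heard_speech = True     # any loud frame resets the pause counter
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= needed:
                return i
    return None
```

Leading silence is ignored until the first loud frame, so the recorder does not stop before anyone has spoken. Raising `silence_threshold` is the crude way to ignore background noise; a proper VAD would do better.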

kind skiff
#

chatgpt gave me a great technique

#

Okay:
main macro - https://gist.github.com/thebwt/8142c510c2d2ae31f0c8e6bbfb45016a
python fastapi endpoint to do the whisper stuff (I think I don't need this in the end... but haven't refactored yet) -
https://gist.github.com/thebwt/003f58a4454876c4e706b7d5875b6fbb

i'm lazy, so I just ssh port forwarded the running service to localhost.

source chat: https://chat.openai.com/share/8a8313a1-bc4e-4723-9aa5-702ed15b992a

kind skiff
#
        mediaRecorder.stop();
      }```
prisma sinew
#

kek

autumn bolt
#

Can whisper only be installed on Windows?

#

Has anyone installed whisper to a MacBook that has Parallel Desktop installed (Windows VM)

brave kestrel
#

do you mean a local installation of the model or the api?

tender rivet
#

you can run whisper on anything that can run python (and has good enough hardware)

#

the OS is not a concern

autumn bolt
brave kestrel
#

Depends on what you want to use it for

#

it's likely somebody has built an app for that and whisper may not be ideal. It would probably involve installing programming tools and writing some code

#

You have a macbook so you have python. probably somebody could give you a couple terminal commands and it would start spitting out your transcript

brave kestrel
# autumn bolt Has anyone installed whisper to a MacBook that has Parallel Desktop installed (W...

If you have Chrome and all you wanna do is transcribe, I've got you. Here's a branch off another project I made. Save the index.html and run it in Chrome.
https://github.com/danomation/chrome-gpt-assistant/tree/transcriber

brave kestrel
#

It wont stay on due to privacy so that limits its use

stoic jewel
#

Lol, whisper just came up with an advertisement for an existing webshop by transcribing a silent wav file...

#

Go to Beadaholique.com for all of your beading supplies needs!

#

Super weird

brave kestrel
#

yep, it gets hallucinations all the time

#

I'm a novice when it comes to whisper, but the more I use it the more I notice I need some way of handling dubious inputs
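
One crude way to handle dubious output is to filter on the per-segment scores Whisper reports. A minimal sketch, assuming segments are dicts carrying `no_speech_prob` and `avg_logprob` (as in the local package's result or a `verbose_json` response); the thresholds are assumptions that would need tuning on real audio:

```python
def drop_probable_hallucinations(segments, no_speech_threshold=0.6, logprob_threshold=-1.0):
    kept = []
    for seg in segments:
        # High no_speech_prob: the model itself thinks this stretch is silence.
        likely_silence = seg.get("no_speech_prob", 0.0) > no_speech_threshold
        # Very low average log-probability: the decoder was unsure of the text.
        low_confidence = seg.get("avg_logprob", 0.0) < logprob_threshold
        if likely_silence or low_confidence:
            continue
        kept.append(seg)
    return kept
```

This will not catch everything (a confident hallucination passes straight through), which is why people also run a VAD first and only send the speech parts to Whisper.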

lime bobcat
#

Hi. Anyone knows of a windows dictation software based on whisper?

hexed tide
#

Does anyone know how many minutes ChatGPT can listen and transcribe in the app? Do Plus customers have any advantage here?

pure igloo
#

Hello, I'm facing some issues getting Whisper to transcribe acronyms in audios
E.g when a speaker says 'SDR', it should be transcribed as 'SDR', not 'as the are'.
I would like to include a custom dictionary where the user would pre-record how all these acronyms are pronounced + how it should be spelt. Does anyone know how I can do this?

I just need some quick and short fine-tuning so that the model can transcribe a list of 10-20 acronyms accurately for each audio. This list of acronyms varies for each audio hence fine-tuning the model every time, which would take 5-10h in the post below is not sustainable.
Thanks in advance!

https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2
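
Short of fine-tuning, a per-audio glossary can be passed as a text prompt: the API has a `prompt` parameter and the local package has `initial_prompt`. It only nudges spelling and is not a guarantee. A minimal sketch of building such a prompt from a list of acronyms; the "Glossary:" wording and the character cap are assumptions (the model only looks at a limited window of prompt tokens):

```python
def build_vocabulary_prompt(terms, max_chars=600):
    seen = set()
    kept = []
    length = len("Glossary: .")
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue                      # skip blanks and case-insensitive duplicates
        extra = len(cleaned) + (2 if kept else 0)
        if length + extra > max_chars:
            break                         # stop before the prompt grows too long
        seen.add(key)
        kept.append(cleaned)
        length += extra
    return "Glossary: " + ", ".join(kept) + "."
```

Since the list changes per audio, the prompt can simply be rebuilt for each file before the transcription call.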

potent pier
potent pier
pure igloo
#

thanks for the suggestion!

kind badge
#

Hey, I'm reporting a very annoying bug with the Whisper medium model: it captures text great, but sometimes it loops and repeats one word over and over again. Can you fix it please?

trim citrus
#

has anyone ever run whisper jax in a production environment on a tpu

autumn bolt
#

How can I set up Whisper on my MacBook Pro to utilize its voices when using macOS' "Text to Speech" function for more realistic speech output, similar to the voices in ChatGPT's "Voice chat" feature?

jaunty linden
#

Does whisper support Brazilian Portuguese vs Portugal Portuguese?

brave kestrel
#

Imagine world scale on device STT data collection using phones

#

Major privacy concerns but possible

#

@autumn bolt Learn some basics with python or node.js. There's a lot of ways you can accomplish that goal. Python is my suggestion. You can use python modules to talk to apple's speech synthesis api.

#

What you'd do is import modules for whisper (bundled into openai) and macos speech. start a python file, create an api call to whisper (or the module for the self hosted model) , do whatever you want to with the code, then send it to macos's tts with this
https://pypi.org/project/macos-speech/

#

Probably possible to do this without any api calls using the opensource whisper models instead of burning api time

#

You dont have to be a hardcore programmer. There's libraries for just about everything with python

brave kestrel
#

I asked gpt4 and it thinks you can just use "say" to accomplish the tts. Anyway... here's its code that may or may not work. Good launching point. Get an openai api key to start. Add all this to a speech.py file, edit it, then start it in terminal

import sounddevice as sd
import scipy.io.wavfile
import requests
from subprocess import call

SAMPLE_RATE = 44100
SECONDS = 10

def main():
    # Record 10 seconds of mono 16-bit audio from the default microphone
    recording = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    sd.wait()
    scipy.io.wavfile.write("output.wav", SAMPLE_RATE, recording)

    # Send the file to the transcription endpoint as multipart form data
    with open("output.wav", "rb") as f:
        response = requests.post("https://api.openai.com/v1/audio/transcriptions",
                                 headers={"Authorization": "Bearer <your OpenAI key>"},
                                 files={"file": ("output.wav", f, "audio/wav")},
                                 data={"model": "whisper-1"})
    response.raise_for_status()

    # The default JSON response carries the transcript under "text"
    transcribed_text = response.json()["text"]

    # Speak the transcript with the macOS "say" command
    call(["say", transcribed_text])

if __name__ == "__main__":
    main()

replace "<your OpenAI key>" with your actual OpenAI key

#

Requirements:

pip install sounddevice
pip install scipy
pip install requests
pip install openai

Use self hosted whisper if you want. Might require some more imports and config

woeful plaza
#

Is the API server down?

lapis jacinth
lapis jacinth
#

Interesting that ChatGPT Voice is still operational (at least for me) while the public Whisper API is down. hmmm
I would have expected that ChatGPT was dependent on that service too.

sly nymph
#

I feel so much better now... released a fun dictation thingy-mo-bobber using whisper today to some friends, tested thoroughly of course - and now I'm getting nothing but error reports and can't even get it to work myself:

Whisper Transcription Error: Object reference not set to an instance of an object.
==========================================================================```
#

So... you're all not alone, it's me too - and my friends online. Whisper has gone completely silent. 🤫

sly nymph
#

Seems to be back online and working now:

high mountain
#

Hello, I wonder how is it possible that Whisper in ChatGPT during the voice conversation is so fast, but when using as API is slow as a turtle.

daring island
#

Seems to be back online and working now:

sly nymph
hexed tide
#

Does anyone know if ChatGPT plus subscription allows recording for an hour using speech to text?

hexed tide
small juniper
#

Anyone know if the real-time Whisper API at OpenAI or Azure will allow fine-tuning any time soon?

stone breach
#

Heh, neat. Transcribed a Flemish show with medium, then translated the same with large-v2 whisper, and then.. asked chatgpt-4 to translate sections of the english output to flemish and then the original flemish transcription output to english in the same convo -- nearly flawless english subtitles. Interesting how combining the two transcriptions gives just the right context to get a really clean output.

dreamy plover
#

I'm planning to host the Hugging Face whisper large-v2 model on SageMaker. Which instance should I use? I'm looking for transcription output within 5 minutes for audio that is 30 to 45 minutes long.

dense spruce
#

Hey everyone, I hope it is the correct channel to ask this question. I have a script and use stable-ts.
But it is telling me that the module "stable_whisper" is not found. I already installed stable-ts + whisper

formal bramble
#

This is perhaps a newbie observation, but I am used to voice typing and speaking various punctuation. I love whisper, but at first was kind of annoyed it would type out spoken punctuation. But I just realized when I emphasized the word THE, it typed it in all caps. I did not realize it listened to intonation and so I tested it out and depending on if I sound excited, it will put a period or exclamation point. Somehow, that was just very impressive to me. That is all. 🙂

surreal dragon
light marsh
#

Hello, everyone.
I am developing an STT model now, but I am confused about where to start. Could you give me some advice?

elfin bear
#

Hi, I'm using Whisper to generate subtitles for a music video. I use the 'max_line_width' and 'max_line_count' flags to format the output the way I want. Though, Whisper does not separate lines the way I need: on one line there is the end of a sentence and the beginning of the following sentence. Whisper does seem to detect that it is a different sentence as it generate an upper case letter. Do you have any insight on how to make whisper break lines at the end of sentences?

dreamy plover
elfin bear
#

Sure, here are the first 3 lines. It is in french, but I hope this can still illustrate the point:

1
00:00:12,760 --> 00:00:14,900
Les Cyrânes Près du coufre, loin du

2
00:00:14,900 --> 00:00:17,900
ciel Plus je souffre et moins je

3
00:00:17,900 --> 00:00:21,180
sais Si ce que je crois est vrai

The upper case letters are good. What I would want to get:

1
00:00:12,760 --> 00:00:XX,XXX
Les Cyrânes

2
00:00:XX,XXX --> 00:00:XX,XXX
Près du coufre, loin du ciel

3
00:00:XX,XXX --> 00:00:XX,XXX
Plus je souffre et moins je sais

4
00:00:XX,XXX --> 00:00:21,180
Si ce que je crois est vrai

#

This is with max_line_width=35 and max_line_count=1

#

When asking for the output as text, the line breaks are good:

Les Cyrânes
Près du coufre, loin du ciel
Plus je souffre et moins je sais
Si ce que je crois est vrai
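
Since the plain-text output already breaks lines where wanted, one option is to borrow those line breaks and take the timing from word-level timestamps. A minimal sketch, assuming a list of timed words (dicts with `word`, `start`, `end`, like the local package produces with `word_timestamps=True`) and assuming both outputs split into the same whitespace-separated words in the same order:

```python
def cues_from_lines(lines, words):
    """lines: text lines with the desired breaks; words: timed words in the same order."""
    cues = []
    position = 0
    for line in lines:
        count = len(line.split())
        if count == 0:
            continue                      # skip empty lines
        chunk = words[position:position + count]
        if len(chunk) < count:
            raise ValueError("ran out of timed words before the text ended")
        # The cue spans from the first word's start to the last word's end.
        cues.append({"start": chunk[0]["start"], "end": chunk[-1]["end"], "text": line.strip()})
        position += count
    return cues
```

If the two outputs tokenize differently (for example around apostrophes or hyphens), the counts drift and the timings go wrong, so it is worth checking that the total word counts match first.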

solid pollen
#

Anyone has the perfect prompt/workflow to take Whisper output as text (ideally with timestamps), pass it through GPT-4 and get it formated in paragraphs according to the topic discussed?

still moon
#

When training/fine-tuning whisper locally, using:
https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py

..my load_model() fails (as does torch.load() on anything I try). The training doesn't create a .pt file that matches openai-whisper's .pt files (which are ZIP files). The actual files in an official .pt are like:

$ unzip -v tiny.pt | head -7
Archive:  tiny.pt
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
   19363  Stored    19363   0% 1980-00-00 00:00 49a57bdd  archive/data.pkl
  344064  Stored   344064   0% 1980-00-00 00:00 e2b0aff3  archive/data/0
 1152000  Stored  1152000   0% 1980-00-00 00:00 72f960e2  archive/data/1
     768  Stored      768   0% 1980-00-00 00:00 2a87d98d  archive/data/10

Whereas I get a directory created with these files:

$ ls -1s
total 154444
4 README.md
36 added_tokens.json
4 all_results.json
4 config.json
4 eval_results.json
4 generation_config.json
484 merges.txt
147528 model.safetensors
52 normalizer.json
4 preprocessor_config.json
4 runs
2776 small.pt
4 special_tokens_map.json
2424 tokenizer.json
280 tokenizer_config.json
4 train_results.json
4 trainer_state.json
8 training_args.bin
816 vocab.json

#

anyone know how to get a proper .pt out of this thing? small.pt is not a proper zip:

$ file small.pt
small.pt: Zip archive data, at least v0.0 to extract, compression method=store
$ unzip -v small.pt
...End-of-central-directory signature not found. Either this file is not a zipfile, or...
#

(and obviously that 2mb small.pt file is not the full model anyway)

still moon
boreal island
#

Hey, guys. Do you know if it is possible to set prompt to whisperAi on .Net?

quiet delta
#

hey everyone, a couple of days ago I built a voice-cloned text to speech assistant capable of answering questions in meetings, or in any peer to peer environment. Check it out:

https://github.com/AndreSlavescu/Meeting-Buddy

lethal briar
#

Does anyone have experience resolving issues with ffmpeg on a Mac? I have a Python file that uses whisper to transcribe some audio, but whenever I run it I get:
[Errno 20] Not a directory: 'ffmpeg'

I installed whisper using:
pip3 install -U openai-whisper

tried also using pip3 install ffmpeg-python, successfully installed ffmpeg. but running the file still didn’t work, so to further troubleshoot ran which ffmpeg. this gives ‘ffmpeg not found’

i then tried to add the ffmpeg executable to my .zprofile (as my terminal window is using zsh not bash) but this doesn't seem to resolve anything either, any suggestions?
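
Worth noting: `pip3 install ffmpeg-python` only installs Python bindings, not the ffmpeg program itself, and whisper shells out to an `ffmpeg` executable that has to be on PATH. A small check along these lines can confirm what Python actually sees (the function name is made up for illustration):

```python
import shutil


def ffmpeg_status(executable="ffmpeg"):
    # shutil.which searches PATH the same way the shell would.
    path = shutil.which(executable)
    if path is None:
        return (
            f"{executable} is not on PATH; install the binary "
            "(for example with Homebrew: brew install ffmpeg)"
        )
    return f"{executable} found at {path}"


if __name__ == "__main__":
    print(ffmpeg_status())
```

If this reports a path in the terminal but the script still fails, the script is probably being run from an environment with a different PATH (an IDE, for instance).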

dusky ibex
#

Hi everyone, I'm noticing a small bug with the "client.audio.transcriptions.create" endpoint where I will send a message in french and it will transcribe it in english. Is anyone else experiencing this? I guess maybe the "audio.transcription" API is getting confused with the "audio.translation" API?

muted axleBOT
#
Whisper v3

We are releasing Whisper large-v3, the next version of our open source automatic speech recognition model (ASR) which features improved performance across languages. We also plan to support Whisper v3 in our API in the near future.

cunning umbra
#

does whisper-1 point to large-v3 now or is it still large-v2

woeful kraken
shell thunder
sudden aspen
#

fork your own ffmpeg and put on whisper folder

true saddle
#

TTS question. any voice that sounds decent in spanish?

sudden aspen
#

on normal voice i use juniper idk about tts model sorry

prisma sinew
kind berry
quiet delta
#

You just need a short 20 second audio clip, even 2 second will work but it won’t be as good

#

I recommend 20 seconds since that’s what I found to work great

#

And it does a complete voice clone and will read the output texts to your questions in your brother's voice

#

If you have a powerful GPU the processing will be much faster, if you have cpu, it’ll take some time. But it’s completely on the fly

stiff hill
#

I'm looking to use whisper to transcript a podcast, is it capable of identifying speakers?

For example:

Speaker 1: "hello and welcome!"
Speaker 2: "To today's video!"

elder tapir
prisma sinew
dry bough
#

Is there any free website where we can use whisper to transcript our audio files using our API?

Or any chrome extension that transcripts videos run from any source?

prisma sinew
#

Building your own little tool with help from ChatGPT could be a great learning experience too

dry bough
zenith shore
#

Something went wrong. If this issue persists please contact us through our help center at help.openai.com. How can we fix this problem?

#

My ChatGPT account has been experiencing this problem since this afternoon

obtuse wyvern
#

Folks, is Whisper free to use?

vocal inlet
#

Is this server error?

clear badger
#

I am facing the same problem, I tried to build GPTs and when I tried to upload files all my OpenAI collapsed and now I am unable to chat, create and do anything with my main account. (This is the error I am recieving: Something went wrong. If this issue persists please contact us through our help center at help.openai.com.)

delicate edge
#

I've got the same issue: Something went wrong. If this issue persists please contact us through our help center at help.openai.com.

cursive tangle
#

same issue here

weak bridge
#

Good morning! Are you having trouble accessing it?

mystic swallow
spare wren
analog eagle
prisma sinew
cunning umbra
#

have been working on getting faster-whisper updated

prisma sinew
#

Its not too slow in my experience but if you want instant maybe so. Do smaller batches

weary trail
cunning umbra
#

faster-whisper is a ~4x faster reimplementation

void plume
#

is this integrated with ChatGPT?

#

is there a way to play around with this using ChatGPT?

#

Or this must be using the code?

#

Anyone is here?

prisma sinew
#

You can use whisper and voice with ChatGPT on the app

#

But it’s not really a thing you can save or build without using api

void plume
#

I see. I'm trying to see the process of re-making the Korean song lol

#

I thought it was really damn cool

eternal jacinth
#

Hi! Do you know whether the web app is built on gpt-4 turbo now?

polar pendant
#

it is

gray spoke
#

I tried to create a script that:

  1. Converts MP4 video to MP3 audio using ffmpeg
  2. Transcribe the video using whisper-1
  3. Translates the video trying to maintain the timestamps

It does the 1. and 2. section very well, but when it comes to translate and maintain timestamps it starts bugging a bit.

For example:

1
00:00:00,000 --> 00:00:07,820
se torna difícil.

Well, here we have a samba in 11 by 8...

It has a non-translated phrase before the translated one.
Has anyone successfully created a script that can translate and keep the same timestamps?
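
One approach that tends to avoid mixed-language cues is to never hand the whole SRT to the translator: parse it, translate only the text of each cue, and copy the index and timing lines through untouched. A minimal sketch; `translate` is a hypothetical callable (for example a wrapper around a chat model) that takes a string and returns its translation:

```python
import re

TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def translate_srt(srt_text, translate):
    out = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3 or not TIMING.match(lines[1].strip()):
            out.append(block)          # leave anything unexpected untouched
            continue
        # Join multi-line cue text, translate it, keep index and timing as-is.
        original = " ".join(line.strip() for line in lines[2:])
        out.append("\n".join([lines[0], lines[1], translate(original)]))
    return "\n\n".join(out) + "\n"
```

Translating cue by cue loses some context across cue boundaries, so quality may dip on sentences that span several cues; the timestamps, however, cannot be altered by the translator.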

#

(I hope that's the right channel to ask)

prisma sinew
sudden aspen
#

whispererr on chatgpt when

gray spoke
teal sphinx
#

Anyone have a good js voice detection package/api that can work well with whisper? Not a lot of info online

forest oxide
#

Hello, any updates in the official documentation?

misty spear
#

Hi, can I report an error here? Will the OpenAI team attend to the issues?

misty spear
lofty aurora
misty glacier
#

ive made music on ableton forever, i never looked at an ai music maker thingy. im really stoned(sedated thinking), how good of results does it give? can i get specific instruments to remix into a daw?

chilly frigate
#

Whisper 3.0 amazing

dense bone
#

best mac or web app that uses whisper so I can quickly speech to text?

nocturne thistle
little iron
#

Hows it going yall

undone spindle
normal dirge
#

Can I get text of audio file using whisper although that audio file is 250MB ?

#

Thanks in advance for your kind help.

pure veldt
#

Is it possible to create near-realtime translation (audio->text <> TTS/text->new-lang), or to translate recorded meetups into other languages? Have you seen a similar GitHub repo or example?

hallow totem
#

Hi everybody. This could be a silly question. I've been researching voice to text for an app I want to make. E.g.

User speaks into mic > Text appears

Most Browsers come with this functionality built in these days.

Is this likely how the ChatGPT app works OR is that one of the things that whisper can do for me but better?

limber flower
#

Hi All, based on the docs, it seems Text-to-Speech only supports Python, not Node. Is this right?

vocal oasis
#

Hi everybody. I have a question about speech to text. Is there any way to convert voice into text in real time? For example: I enter a voice file as soon as I click start, where will it translate the voice into text?

neon moth
#

Is Whisper a real-time speech-to-text AI? I am an online student and looking for ways to attend classes with subtitles generated live from the audio.

vocal oasis
wind peak
#

Anyone know what is this "Whisper-1" model? Because i cannot find model in whisper that has this name?

wind peak
simple hornet
#

Any ideas about this? chunks are working mp3 files so they don't seem corrupted

#

transcription works for the first chunk but getting that 400 for the rest

raw vortex
#

@simple hornet how large of a file are you working with? And what size are you chunking it to? The “Invalid File Format” is a weird one

simple hornet
#

i made the chunks 20mb, the first one works, 2nd and 3rd don't (sizes 20mb, 20mb, 12mb)

raw vortex
simple hornet
#

hmm weird. thanks tho

vapid lava
#

hey there, can someone confirm only the large model is new v3 for whisper right. there is no insane new tiny v3 model?

#

cause i need a tiny one to run on android

remote fog
simple hornet
#

looking for info about transcription for multiple speakers... i feel like it must exist, just not sure where to start

still moon
#

Got my own q: I'm using hf's seq2seq to fine-tune whisper models, but I'm preparing audio+transcription data for someone who breathes on a ventilator. I'm wondering if I can put "<|nospeech|>" tokens directly in my training text for the pauses amidst her sentences where the ventilator takes a breath, as well as to mark some areas just of ventilator noise so the model can learn, "this is NOT her speaking."

buoyant sonnet
# remote fog Subtitle: Your open-source, self-hosted subtitle generator for seamless language...

➜ subtitle git:(master) python3 subtitle.py example/story.mp4
/Users/ed/Library/Python/3.9/lib/python/site-packages/urllib3/init.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
Model base exists
ERROR:root:/bin/sh: ./binary/whisper: cannot execute binary file
ERROR:root:Error while transcribing
ERROR:app.core.app:An error occurred: Error while transcribing
ERROR:root:An error occurred: cannot unpack non-iterable NoneType object

On my M2 Macbook - it downloaded the model ok (after I pip3 installed ffmpeg and gdown) but then I ran into this

#

guessing I have to chmod whisper or something?

remote fog
#

I will add support for Mac within the next 24 hours.

buoyant sonnet
remote fog
#

Nhh

#

The models used in this project are a bit different

buoyant badge
#

Hi! I'm new here. I'm devvng a chatbot in python.

buoyant sonnet
buoyant badge
#

hmm. Whisper - just learning about it.. looks interesting

wind veldt
#

whisper API question - any luck getting it to ignore silence in longer transcriptions (>30min)? it might be a bad prompt, but i either get something like Silence. Silence. Silence. ... or Yeah. Like. Yeah. Like. ... in between long pauses.

autumn bolt
#

happy_avocado Whisper 3

normal dirge
autumn bolt
#

Anyone know how to use Whisper with the openai npm package in Node.js? It worked for me before but no longer works after the update to version 4.20.0

unreal thorn
#

SH#

grave eagle
# neon moth Oh, same then

You can take your microphone input, split it into chunks, and feed them to the API. "Real-time" is a flexible term, much like "AI". In the strict sense used in embedded systems: no, it's absolutely not real-time. But if you're attending a class and your laptop has enough compute, following along with a short delay is fine. Whisper doesn't support that out of the box, though; splitting into chunks has to be done by your own software. There is in fact a Python module for that called "whisper_mic" by Blake Mallory, which worked fine for me.
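
The chunk-splitting part can be sketched without any audio library: compute sample ranges with a small overlap so a word cut at one boundary is still whole in the next chunk. The chunk length and overlap below are assumptions; 16 kHz is the sample rate Whisper works with internally:

```python
def chunk_bounds(total_samples, sample_rate=16000, chunk_seconds=5.0, overlap_seconds=0.5):
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))   # half-open range [start, end)
        if end == total_samples:
            break
        start += step
    return bounds
```

Each `(start, end)` pair would be used to slice the recorded buffer before sending it off. Shorter chunks lower the delay but give the model less context, so accuracy usually drops.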

gray spoke
brave turret
#

Anyone know if it is possible to add prompt to huggingface distil-whisper?

normal dirge
brave turret
#

Whisper is a transformer model with language modeling head on top which predicts next token just like GPT with the context of previous tokens. So because of that u can by hand append extra tokens for each chunk of audio while decoding. This can be used to provide or “prompt-engineer” a context for transcription, e.g. custom vocabularies or proper nouns to make it more likely to predict those words correctly, but I don't want to implement it myself was wondering if hf had it out of the box like openai api does:
https://platform.openai.com/docs/guides/speech-to-text/prompting

normal dirge
#

transcribe(filepath, prompt="")
You can actually try in your new whisper model

vapid lava
#

anyone knows if they will also release a tiny v3?

cunning umbra
cunning umbra
#

whisperx integrates everything but hasn't been updated for v3 yet

split salmon
#

it's really weird and i'm not really sure what to do

#

i got stuck for a while on this

unkempt arrow
#

Hey guys, can the API return an object with the start time and end time in ms for each word? Is punctuation separated?

simple hornet
#

@unkempt arrow doesn't seem like the API supports it from what I've seen, but take a look at WhisperX, whisper-timestamped, and faster-whisper on GitHub

unkempt arrow
#

@simple hornet i'll check it, thanks. do you know what the json or verbose_json formats return compared to "text"?

split salmon
simple hornet
normal dirge
#

Is there anyone who has set up local Whisper using a GPU?
I am having a problem, and it would be helpful if someone could advise me

dapper bridge
#

Hey, quick question about Whisper.
I'm working on subtitles for my video. Whisper catches my speech perfectly, but most of the time it puts a ton of sentences into one block in the SRT.
Is there a way to force Whisper to put only one sentence on each line? It would help a ton.

shrewd rose
# dapper bridge Hey, quick question about Whisper. I'm working on a subtitles for my video. Whis...

What I would do is either write a parsing algorithm that splits them evenly, weighing the number of characters against the length of the speech, or use external software to sync the subs with the text (basically you transcribe with Whisper, then sync with something else). Look up "thio joe youtube subtitles" on YouTube; he made a nice video explaining how he syncs Whisper output with the actual video. The video is titled I Created Another App To REVOLUTIONIZE YouTube
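
A minimal sketch of the parsing approach, assuming you already have word-level timestamps as dicts with `word`, `start`, and `end` keys (the shape used by Whisper's word-timestamp JSON). The function names are made up for illustration. It closes a cue at every word ending in `.`, `!`, or `?`, so abbreviations like "Dr." would split too early.

```python
def words_to_sentence_cues(words):
    """Group word dicts (word/start/end) into one cue per sentence."""
    cues, current = [], []
    for w in words:
        current.append(w)
        # A word ending in ., ! or ? closes the current sentence.
        if w["word"].strip().endswith((".", "!", "?")):
            cues.append(_make_cue(current))
            current = []
    if current:  # trailing words without final punctuation
        cues.append(_make_cue(current))
    return cues

def _make_cue(words):
    # Whisper words usually carry their own leading space, so join with "".
    text = "".join(w["word"] for w in words).strip()
    return {"start": words[0]["start"], "end": words[-1]["end"], "text": text}

def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(cues):
    blocks = []
    for i, c in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_time(c['start'])} --> {srt_time(c['end'])}\n{c['text']}\n")
    return "\n".join(blocks)

# Made-up sample data for illustration.
words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " there.", "start": 0.4, "end": 0.9},
    {"word": " How", "start": 1.2, "end": 1.4},
    {"word": " are", "start": 1.4, "end": 1.6},
    {"word": " you?", "start": 1.6, "end": 2.0},
]
print(to_srt(words_to_sentence_cues(words)))
```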

main flame
#

I shared my GPTs publicly and there are 8 comments on them, but I can't see them. How can I view them?

steady needle
#

I want to code when whisper TTS changes voices. Basically I want to record a conversation but then have two ai's do the talking to each other for anonymity. Is it possible?

prisma sinew
#

If you have separate audio feeds for each user, then you can do that. I have a system that takes combined audio and then splits based on user before feeding to whisper. You get individual transcription that way and then feeding it to TTS is simple

magic nova
#

Hello everyone!

I’m currently running Whisper locally in UE using the Runtime Speech Recognizer plugin. It works quite well, but I’m looking for faster and more accurate recognition. The small language models are very fast but unfortunately, they are extremely inaccurate in Hungarian, almost unusable. Only the large quantized language model begins to be usable, but it’s not precise enough and is incredibly slow. Can anyone help me with how to make a smaller language model more accurate using the available settings? I don’t fully understand what these parameters do. Also, is there a way to use Whisper with only Hungarian language libraries, since I only need Hungarian recognition? I’m guessing that could be smaller and maybe faster. Might be a silly question, but I’d appreciate any help. Thanks in advance!

near yew
#

I've tried using whisper large on the OpenAI API and it's pretty fast

sand nacelle
#

has anyone tried the TTS with non-English languages? i want to know if its able to read non-English texts well

near yew
sand nacelle
magic nova
normal dirge
magic nova
# normal dirge result = model.transcribe(filepath, language="pt", fp16=False, verbose=True) Yo...

Prince, I don't quite understand what you wrote. In the settings options, I only have what I showed in the screenshot and of course, I know how to change between different Whisper language models. The large model works well but is very slow, and I need this for a real-time application, hence my question. Based on the parameters visible in the picture, what would you recommend as the most accurate configuration for a smaller language model? And yes, the language is set to Hungarian. 🙂

spark minnow
#

hi

gray cosmos
#

has anyone figured out how to control (reduce) the duration each output segment to somewhere between 0.5 to 1 second ?
My output on average varies between 2 to 5 seconds

    const res = await openai.audio.transcriptions.create({
      file: fs.createReadStream(filePath),
      model: "whisper-1",
      prompt:
        "the duration of each output segment must be between 1 to 2 seconds",
      response_format: "verbose_json",
    });
    return res;
dusty isle
#

Question: Does whisper work reliably with German Audio?

glass barn
#

Hey guys, has anyone deployed Whisper on Android with inference on the GPU?

golden fjord
#

Hello, I am pretty sure I did everything right, but I am getting an error like this

#

Error loading "C:\Users\MSI-NB\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

#

how can I fix this cudnn_cnn_infer64_8.dll thing?

#

I need the AI really fast

wispy thistle
#

i believe the file is missing from the directory, can you check ?

#

also make sure you have the NVIDIA GPU Computing Toolkit installed

golden fjord
#

It is there let me send a screen shot again

wispy thistle
#

ok

wispy thistle
#

WinError 126 means the file is not there, or it cannot be loaded due to a missing dependency!

golden fjord
#

It is here

golden fjord
wispy thistle
#

cudnn_cnn_infer64_8.dll is cuDNN v8, so you need to check your PyTorch installation (check for compatible CUDA and cuDNN versions)

golden fjord
#

How can I check it?

wispy thistle
#

can you tell me how did u install pytorch?

#

exact command you used ?

golden fjord
#

used this code

#

@wispy thistle

wispy thistle
#

run this cmd

#

nvcc --version

golden fjord
#

here it is @wispy thistle

wispy thistle
golden fjord
#

Yeah thank you

#

I also send you a friend request

#

Oh I can't haha

wispy thistle
#

yea, ill send u

woven pier
#

where can I download the large model from manually?

#

this is taking forever

versed juniper
#

Hey guys I'm working on a whisper transcription app using NextJS however I'm getting:

```
{
  error: {
    message: 'Could not parse multipart form',
    type: 'invalid_request_error',
    param: null,
    code: null
  }
}
```

Can someone please check my code to correct me where I'm wrong?

```import axios from "axios";
import { useRef, useEffect, useState } from "react";

const model = "whisper-1";

export default function UploadPage() {
  const inputRef = useRef();
  const [file, setFile] = useState();
  const [response, setResponse] = useState(null);

  const onChangeFile = () => {
    setFile(inputRef.current.files[0]);
  };

  useEffect(() => {
    const fetchAudioFile = async () => {
      if (!file) {
        return;
      }
      const formData = new FormData();
      formData.append("model", model);
      formData.append("file", file);
      console.log(formData);
      fetch("https://api.openai.com/v1/audio/transcriptions", {
        method: "POST",
        body: formData,
        headers: {
          "Content-Type": "multipart/form-data",
          Authorization: `Bearer youropenaikey`,
        },
      })
        .then((res) => res.json())
        .then((data) => {
          console.log(data);
          setResponse(data);
        })
        .catch((err) => {
          console.log(err);
        });
    };
    fetchAudioFile();
  }, [file]);

  return (
    <div
      style={{
        backgroundColor: "#f2f2f2",
        padding: "20px",
        borderRadius: "8px",
      }}
    >
      Transcribe
      <input
        type="file"
        ref={inputRef}
        accept=".mp3"
        onChange={onChangeFile}
        style={{ display: "block", marginTop: "20px" }}
      />
      {response && <div>{JSON.stringify(response, null, 2)}</div>}
    </div>
  );
}
```

Any help greatly appreciated!
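
One possible cause of `Could not parse multipart form`, offered as an assumption rather than a confirmed diagnosis: setting `Content-Type: multipart/form-data` by hand leaves out the `boundary=...` parameter that `fetch` would otherwise generate for a `FormData` body, so the server cannot tell where each part starts. The Python sketch below builds a multipart body by hand just to show where the boundary lives; `build_multipart` is a hypothetical helper, not part of any library.

```python
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Return (content_type, body) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex  # random separator between parts
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    # The header must name the same boundary that the body uses.
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, b"".join(parts)

content_type, body = build_multipart(
    {"model": "whisper-1"}, "file", "audio.mp3", b"\x00\x01"
)
print(content_type)
```

If that is the cause, dropping the `"Content-Type"` entry from `headers` in the `fetch` call and keeping only `Authorization` should let the browser fill in the header with its own boundary.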

queen solar
#

Hello! I've got a problem with the Whisper API not translating from French. Does anyone have an idea why?

mental grail
#

How do I achieve diarization from the output of openAI whisper model?

runic wagon
#

Hello, with the new python update for openai, I am having some trouble running some code from the Spyder IDE for transcription. Here is the code:


```py
from openai import OpenAI

client = OpenAI(api_key=keyishere)
model_id = 'whisper-1'
audio_file_path = "C:\......."
audio_file = open(audio_file_path, 'rb')

response = client.audio.transcriptions.create(
    model=model_id,
    file=audio_file,
)
transcription_text = response.text
print(transcription_text)
```

The code executes, and gives these errors:

 ```File ~\anaconda3\Lib\site-packages\openai\_base_client.py:1096 in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:856 in request
    return self._request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:894 in _request
    return self._retry_request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:966 in _retry_request
    return self._request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:894 in _request
    return self._retry_request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:966 in _retry_request
    return self._request(

  File ~\anaconda3\Lib\site-packages\openai\_base_client.py:908 in _request
    raise self._make_status_error_from_response(err.response) from None

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}```

Looks like it is attempting three times and then cutting it. I definitely have the space in my plan, I think (I have $18 in my free account). I am running this on the free version of ChatGPT. Anyone have an idea what could be wrong? ChatGPT hasn't figured it out yet lol.
kind badge
#

I don't know if this is the right place but I have to complain

#

As much as I am delighted with Whisper and the fact that things that were once incomprehensible to me become understandable thanks to this tool, I am equally frustrated

#

I transcribe musicals in Italian and it is a very difficult and unsatisfying job. We know that Whisper has great potential and I know that one day it will be perfect, but why can't it be perfect now...?

near yew
kind badge
#

various

near yew
kind badge
#

Not being accurate, simulating, generating various nonsense, I would like to point out that I use tears in the subtitleediting program

spring relic
queen solar
spring relic
# queen solar I haven’t but I’m quite interested. Thanks for sharing. That is not an official ...

i figured it out. can use this mechanism to figure out if there's voice in the audio clips or not and where they are located: https://github.com/snakers4/silero-vad

#

so i basically wrote a small function using this:

import torch

SAMPLING_RATE = 16000 #48000
torch.set_num_threads(1)
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', force_reload=True, onnx=False)
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

def has_voice(file_name):
    audio = read_audio(file_name, sampling_rate=SAMPLING_RATE)
    speech_timestamps = get_speech_timestamps(audio, model, sampling_rate=SAMPLING_RATE)
    # True when the VAD finds at least one speech region
    return len(speech_timestamps) > 0

weary skiff
#

How the hell do I fix the issue with the timestamps starting from 0 even though the speech doesn't? This issue also seems related to a further one where the transcription afterwards is not aligned with the speech.

chilly sluice
#

is there any way for me to figure out if a provider is using whisper as their transcriber? any specific quirks with whisper i can test?

weary skiff
#

I really need someone experienced who has already played with it. I finished installing it and everything runs, but I want to make some adjustments so the output looks more like a professional SRT file and not just a mumble-jumble. Sooooo, who is with me?

#

my goal in the end is maybe to use spaCy to merge segments using logical cues and such... I am open to ideas

autumn bolt
#

Is there an argument I can use to make the timestamps slightly longer? It cuts out too early.

spring relic
#

not sure what you are trying to do?

viscid mauve
#

Are we allowed to talk about whisper.cpp in this channel?

spring relic
#

only in whispers..

tender rivet
alpine timber
#

Hey. Anyone ever tried to fine-tune Whisper while "boosting" certain words? Not necessarily while training, but could also be helpful during inference.
For example, target is "rotoescoliosis", but prediction ended up being "rot escolosis". I know that on inference time we have access to initial_prompt, but in my use-case, during inference time it's impossible for me to know what will be inserted in the initial_prompt.
tldr; I have a corpus specific to my context, and I want to "boost" the words in it during training or inference, without initial_prompt.
I'd be very thankful if someone could point me in the right direction!

robust basalt
#

Hey, is there anyone here who has worked with the whisper API in node.js? I'm trying to understand if the OpenAI node library supports it or not and so far, all I'm finding are a bunch of outdated code examples online. The official API documentation only seems to have a python example for sending audio to it... Am I missing something?

near yew
#

But I've done it through the HTTP API, not the library

#

Here's how I've done it:

require('dotenv').config()
const axios = require('axios')

const { Blob } = require('buffer')
const buffer = Buffer.from(YOUR_FILE_DATA_HERE)
let data = new FormData()

data.append('file', new Blob([buffer]), 'audio.mp3')
data.append('model', 'whisper-1')
const req = {
    method: 'POST',
    url: 'https://api.openai.com/v1/audio/transcriptions',
    headers: {
        'Authorization': `Bearer ${process.env.API_KEY}`
    },
    data: data
}
let transcription = await axios.request(req)
transcription = transcription.data.text
console.log(transcription)
thorn raft
# lost thunder

I'm not sure if an iOS screenshot from Safari is a sufficient response (but I don't know, @jdo330, what IDE do you use)?

lost thunder
zenith shore
#

I am unable to log in using Google Mail

#

why?

ocean arch
near yew
summer timber
#

Has anyone here had any experience training a custom whisper model? I'd like to explore some stuff, for example training it on SDH subtitles for [EXPLOSION] type notation, or on music to better pick out lyrics out of songs

#

How would the data preparation work, etc

#

I really hope it's easy like loading audio + transcription in and have it go at it

near yew
alpine timber
alpine timber
summer timber
#

Still doable?

#

oh nvm, it doesnt call for time aligned stuff

#

but it might still be a challenge to split the songs into manageable chunks, or could i train on the full 3-minute ish audios?

alpine timber
summer timber
#

not sure how much data i need for a decent result though

#

considering i'd just be training it more on english, and its already rather good at picking out lyrics from songs

alpine timber
small juniper
#

Has anyone here had any experience

#

I don't think so, it's not like you can

#

not sure how much data i need for a

warm pulsar
#

Is it possible to increase the volume (make the agent speak louder) when using text to speech?

#

or change the affect/feeling of the text?

woven pier
#

I'm running whisper locally and am wondering how I can get it to segment more. I'm trying to write subtitles for videos, but the segments with text are too grouped up

#

like I got two segments of 30 seconds each

warm pulsar
woven pier
#

I've got a 3090 so I can run two instances of large

#

Well, just barely run two

#

I can't do anything else

warm pulsar
warm pulsar
woven pier
#

when you use a model for the first time it will download it. that download will be slooooooow

warm pulsar
woven pier
#

yeah

#

it'll show up like 5 seconds early

warm pulsar
#

I would try this then:

  • Acquire transcription of the video.
  • Segment the transcription of the video based on the sentences which you just transcribed. For example, do string split "." to figure out where a sentence starts and ends.
  • So you will now have segmented sentences. Using these sentences, go back to your audio to find out where in the audio file a sentence could be starting and ending (short pauses, maybe), or somehow calculate the approximate duration of a sentence using TTS (or maybe text-based ChatGPT itself). Using this duration, split your audio file and send the pieces back to your Whisper model.
#

@woven pier

#

I think there would be specific methods to figure out where speech starts and ends.

warm pulsar
woven pier
#

it didn't download faster than like 70 kbps

#

and I have gigabit internet

warm pulsar
# woven pier

Ahh, I miss those 2008 days where I used to have 70kbps internet

woven pier
#

don't you worry, they've got your back

#

👏

wheat kindle
#

hey do you guys have any news on the opening of the GPT marketplace ?

livid mauve
gray spoke
#

Hey guys,
Is there a way to transcribe audio longer than 1 hour with the Whisper API without cutting the audio into parts?

#

I already tried to compress btw, but it is still too big

#

(more than 26MB)

gray spoke
#

I compressed it to 32 kbps, mono, 22 kHz and got it down to 25MB

#

But when I try to use whisper it loads infinitely

#

I stayed with the PC turned on for one day and it did not finish dalle_tired

brisk nova
#

hey, am I allowed to share a link for my GPT here?

livid mauve
brisk nova
#

Ok, thank you

sudden adder
#

Does anyone know how to change the words per line for whisper's speech to text?

brave kestrel
#

Are you concerned about edge cases where it splits during a word?

fathom tangle
#

whisper token usage seems to be fairly expensive compared to generations. I'm struggling to see the use case compared to just native text to speech for the cost. Am I doing something wrong?

brave kestrel
#
```
Model    Usage
Whisper    $0.006 / minute (rounded to the nearest second)
```
#
```
Model    Input    Output
gpt-4    $0.03 / 1K tokens    $0.06 / 1K tokens
```
#

in theory it's far cheaper than gpt-4

#

turbo models are super cheap though

```
Model    Input    Output
gpt-3.5-turbo-1106    $0.0010 / 1K tokens    $0.0020 / 1K tokens
```
#

Also whisper is STT not TTS. Openai does have TTS models as well
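
To make the comparison concrete, here is a small arithmetic sketch using only the prices quoted above. The token counts for the summarization step are made-up assumptions, not measurements.

```python
# Prices as quoted in this channel; check the pricing page for current values.
WHISPER_USD_PER_MINUTE = 0.006
GPT35_INPUT_USD_PER_1K = 0.0010
GPT35_OUTPUT_USD_PER_1K = 0.0020

def whisper_cost(seconds):
    """Transcription cost, billed by audio duration."""
    return seconds / 60 * WHISPER_USD_PER_MINUTE

def chat_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Chat cost, billed per 1K input and output tokens."""
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

hour_of_audio = whisper_cost(60 * 60)
# Assumed: about 9,000 transcript tokens in, 500 summary tokens out.
summary = chat_cost(9000, 500, GPT35_INPUT_USD_PER_1K, GPT35_OUTPUT_USD_PER_1K)

print(f"1 hour of Whisper: ${hour_of_audio:.2f}")
print(f"Summarizing it with gpt-3.5-turbo-1106: ${summary:.3f}")
```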

languid breach
#

Can anybody help me out? I'm using FastAPI and I don't want to write my file upload to disk in order to transcribe it. I've spent hours on it, but I ended up going back to shutil because the in-memory approach kept bugging out on the filetype. Anybody have a snippet I could use?
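
A hedged guess at the filetype problem: a bare `io.BytesIO` has no filename, so the client has nothing to infer the audio format from. A common workaround is to give the buffer a `name` with the right extension. `as_named_buffer` below is a hypothetical helper; whether your client version accepts the result as `file=` is an assumption to verify.

```python
import io

def as_named_buffer(data: bytes, filename: str) -> io.BytesIO:
    """Wrap raw bytes in a file-like object that carries a filename."""
    buffer = io.BytesIO(data)
    # Upload clients commonly read the extension from .name to detect the format.
    buffer.name = filename
    return buffer

# Stand-in bytes; in a FastAPI route these would come from the uploaded file.
buf = as_named_buffer(b"fake audio bytes", "upload.mp3")
print(buf.name, len(buf.getvalue()))
```

In a FastAPI route this would look roughly like `as_named_buffer(await upload.read(), upload.filename)`, with the returned buffer passed where you currently pass the file opened from disk.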

gray spoke
#

I could just try to fix it manually, but I would like to do the process completely automatic

brave kestrel
#

Probably stuck with splitting it up

#

If you split the ffmpeg bits into smaller pieces and split off transcription you'll already be closer
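
A rough sketch of planning the split, assuming the upload limit is 25 MB and that the file is encoded at a roughly constant bitrate. `plan_chunks` is a made-up name and the 10% safety margin is arbitrary. It only computes time ranges; the actual cutting would be done by ffmpeg or similar.

```python
import math

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumed reading of the 25 MB limit

def plan_chunks(duration_s, bitrate_bps, max_bytes=MAX_UPLOAD_BYTES, safety=0.9):
    """Return (start, end) second pairs whose encoded size stays under max_bytes."""
    # Longest chunk that fits: bytes * 8 bits / bits-per-second, minus a margin.
    max_chunk_s = max_bytes * safety * 8 / bitrate_bps
    count = max(1, math.ceil(duration_s / max_chunk_s))
    length = duration_s / count  # equal-length chunks
    return [
        (round(i * length, 3), round(min(duration_s, (i + 1) * length), 3))
        for i in range(count)
    ]

# Example: a 3-hour recording encoded at 64 kbps.
chunks = plan_chunks(duration_s=3 * 3600, bitrate_bps=64_000)
for start, end in chunks:
    print(f"{start:>8.1f}s -> {end:>8.1f}s")
```

Cutting at fixed times can land in the middle of a word, so nudging each boundary to a nearby pause (for example with a VAD) tends to give cleaner transcripts.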

weary island
#

hey folks - i have a very simple Q - forgive the noobness -

assume I am recording a lecture during a classroom, while the professor is speaking - i want to trigger OpenAI to summarize the lecture so far.

so I want to trigger it by saying something like - "hey {assistant} can you summarize the notes so far?"

  1. how can i do so?
  2. how do i remove that prompt, as it's not part of the lecture?
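
One way to sketch both parts, assuming the lecture is transcribed in chunks and each chunk's text is checked for the wake phrase: a regex both detects the trigger and removes the whole request sentence from the transcript. The wake word "assistant" and the function name are placeholders.

```python
import re

# "assistant" stands in for whatever wake word you choose (hypothetical).
# Matches "hey assistant" plus the rest of that sentence.
TRIGGER = re.compile(
    r"\bhey[,\s]+assistant\b[^.?!]*[.?!]?\s*",
    flags=re.IGNORECASE,
)

def split_trigger(transcript):
    """Return (lecture_text_without_trigger, was_triggered)."""
    cleaned, hits = TRIGGER.subn("", transcript)
    return cleaned.strip(), hits > 0

text = (
    "Entropy always increases. "
    "Hey assistant, can you summarize the notes so far? "
    "Next, the second law."
)
lecture, triggered = split_trigger(text)
print(triggered)
print(lecture)
```

When `triggered` is true, the cleaned text accumulated so far is what you would send off for summarization.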

dapper bridge
#

hey, i'm having a bit of a problem with whisper-ctranslate2 (and normal whisper as well)
in the first minutes of a wav file, it spits out really long statements and after a while it starts to split it for some reason
i can't force it to do it one way or the other, any reason why it's happening and how to fix it?

raw oriole
#

Is whisper the same price when converting the audio from an mp4 to text and when converting like an audio recording

lone scaffold
#

Does anyone know of a way to use TTS (text to speech), but with an SRT subtitle file as input instead of plain text? So essentially dubbing?

Basically I have videos with SRT transcriptions, but I want to generate a dub using OpenAI's text to speech. Given that SRT includes timestamps, I don't want the timestamps to be read out loud, but used as guides for timing.

still moon
#

@lone scaffold commercial thing?

#

I've done something similar for existing_text + whisper_transcription alignment in a recent project.

#

it worked, well, perfectly. (at least I encountered zero failures). there are timing issues, by the way, with whisper's timestamps, so use the json output with word timings. lmk if you need help.

formal bramble
#

Does anyone know if there’s any iOS app, that is basically a keyboard alternative, but with whisper functionality built-in?

lone scaffold
# still moon @lone scaffold commercial thing?

it's for my own use actually, I found a really helpful course but it was in Chinese, but it had proper transcription, but I wanted to see if I could turn the SRT transcription with timing into a speech with OpenAI's TTS. How did you make TTS have delays that correspond with the SRT subtitles?
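
A minimal sketch of the timing side, assuming a well-formed SRT file: parse each cue into start, end, and text (so the timestamps never reach the TTS input), then compute how much silence to put before each clip. The function names are made up, and `gaps_before` assumes each synthesized clip ends at its cue's end time, which real TTS output will not do exactly.

```python
import re

TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def to_seconds(stamp):
    """Convert an SRT timestamp like 00:01:02,500 to seconds."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000

def parse_srt(srt_text):
    """Return a list of cues with start/end in seconds and the spoken text only."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (to_seconds(t) for t in lines[1].split("-->"))
        cues.append({"start": start, "end": end, "text": " ".join(lines[2:])})
    return cues

def gaps_before(cues):
    """Silence (seconds) to insert before each cue so speech starts on time."""
    gaps, cursor = [], 0.0
    for cue in cues:
        gaps.append(max(0.0, cue["start"] - cursor))
        cursor = cue["end"]
    return gaps

SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Welcome to the course.

2
00:00:05,000 --> 00:00:07,250
Today we cover the basics.
"""

cues = parse_srt(SAMPLE)
print(cues)
print(gaps_before(cues))
```

Each `text` would go to the TTS model on its own. In practice you would measure the actual clip length and advance the cursor by that instead of by the cue's end time.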

glad seal
# formal bramble Does anyone know if there’s any iOS app, that is basically a keyboard alternativ...

Sadly I don't but there is one on Android. It's pretty basic stuff and I can't find a GitHub repo but looks like "just some guy" free apps. I have used it for a good few months and also double checked security doing packet capture. Nothing was sent remote. The same guy has a great whisper powered notepad app too.
https://play.google.com/store/apps/dev?id=6674916867778158495

#

I'm the only one to have reviewed the note one, which is actually really nice. The keyboard is actually really good too. Just set the timeout to 30 seconds on the recordings because at the very end of each recording before transcription, there will be a little break in the audio that might make you miss a word. Really nice stuff. Works better than Google's voice typing even on my Pixel 6 accuracy wise. The note app has model selection from tiny.en up to base. I don't know what inference backend is used. You can also run Whisper.cpp in Termux on Android just installing through pip on Python 3. All you need is

pip install whisper-cpp-python
glad seal
formal bramble
still moon
#

that's from a court case.

#

(I'm kidding)

#

notice the word timings

#

I don't exactly know how accurate it is; you might want to evaluate it before you go too far.

#

Here's a script I wrote (sort of.. me and chatgpt since it's still a lot faster even for little things like this, imo) .. that converts the .json to audacity/tenacity labels.txt you can import (File -> Import -> Labels):

#
#!/usr/bin/env python3
import argparse
import json

def format_float(value, sig_figs):
    return f"{value:.{sig_figs}f}"

def process_json(json_data, options):
    labels = []

    for segment in json_data['segments']:
        if not options.no_full and segment['text']:
            label_text = segment['text']
            if options.probs:
                label_text = f"({segment['avg_logprob']}/{segment['no_speech_prob']}) {label_text}"
            if options.token_full:
                label_text = f"[{len(segment['tokens'])}s] {label_text}"
            labels.append(f"{format_float(segment['start'], options.sig_figs)}\t{format_float(segment['end'], options.sig_figs)}\t{label_text}\n")

        if not options.no_words:
            for word in segment['words']:
                label_word = word['word']
                if options.probs:
                    label_word = f"({word['probability']}) {label_word}"
                if options.token_words:
                    label_word = f"[{word['token']}w] {label_word}"
                labels.append(f"{format_float(word['start'], options.sig_figs)}\t{format_float(word['end'], options.sig_figs)}\t{label_word}\n")

    return labels
#

def main():
    parser = argparse.ArgumentParser(description='Convert Whisper JSON to Audacity Labels')
    parser.add_argument('file', type=str, help='JSON file to process')
    parser.add_argument('-o', '--output', type=str, default=None, help='Output filename')
    parser.add_argument('-nf', '--no-full', action='store_true', help='Disable output of full text string label')
    parser.add_argument('-nw', '--no-words', action='store_true', help='Disable output of individual word labels')
    parser.add_argument('-sf', '--sig-figs', type=int, default=3, help='Max significant float digits')
    parser.add_argument('-p', '--probs', action='store_true', help='Include probabilities in labels')
    parser.add_argument('-tw', '--token-words', action='store_true', help='Include token info in word labels')
    parser.add_argument('-tf', '--token-full', action='store_true', help='Include token info in full text labels')
    args = parser.parse_args()

    with open(args.file, 'r') as file:
        json_data = json.load(file)

    labels = process_json(json_data, args)

    output_file = args.output or args.file.rsplit('.', 1)[0] + '.transx.txt'
    with open(output_file, 'w') as file:
        file.writelines(labels)

    print(f"Labels written to {output_file}")

if __name__ == "__main__":
    main()
#

use with -nf so it'll only output the word tokens (otherwise it outputs the long spans of text AND duplicates it all as individual words too)

#

(whisper-json-to-labels someaudio.json will make someaudio.transx.txt)

weary saddle
#

Hi, I was wondering what kinds of audio preprocessing do you guys do before sending the audio file to Whisper API to get the best results?
For example, currently I'm doing the following steps to preprocess the audio file:

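from pydub import AudioSegment, effects

def preprocess_audio(audio_clip: AudioSegment, audio_file_path: str, channels=1, sample_rate=16000):
    # trim_silence_pyannote: helper defined elsewhere (not shown)
    audio_clip = trim_silence_pyannote(audio_file_path, audio_clip)
    audio_clip = effects.normalize(audio_clip)           # peak-normalize the volume
    audio_clip = audio_clip.set_channels(channels)       # downmix to mono
    audio_clip = audio_clip.set_frame_rate(sample_rate)  # resample to 16 kHz (a sample rate, not a bitrate)
    return audio_clip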
pseudo cove
#

anyone know if anyone has integrated whisper on a telephony platform, like voximplant has with dialogflow? or if someone is known to be working on that?

languid iron
weary saddle
#

So far I haven't experimented with different audio formats yet because our inputs are pretty much just mp3 files.

languid iron
#

I get all sorts of formats, journalists transcribing raw interviews from whatever device they use, sometimes it’s just the iphone‘s default record app. I’ll try to convert everything to a tbd common format

alpine timber
weary saddle
#

I've noticed it hallucinates and sometimes repeats words on silences, overlapping speech, noise, anything that technically isn't human speech.

#

Hence I've been wondering if anybody managed to solve this via preprocessing the audio.
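
Besides preprocessing, one thing that can be sketched is post-filtering the segments, assuming you have segment dicts with `text`, `avg_logprob`, and `no_speech_prob` fields (as in Whisper's JSON output). The thresholds below are illustrative starting points, not tuned values, and both function names are made up.

```python
def drop_unlikely_speech(segments, max_no_speech=0.6, min_avg_logprob=-1.0):
    """Drop segments the model itself scored as probable non-speech."""
    kept = []
    for seg in segments:
        looks_silent = seg["no_speech_prob"] > max_no_speech
        low_confidence = seg["avg_logprob"] < min_avg_logprob
        if looks_silent and low_confidence:
            continue  # likely hallucinated over silence or noise
        kept.append(seg)
    return kept

def collapse_repeats(segments):
    """Keep only the first of consecutive segments with identical text."""
    kept = []
    for seg in segments:
        if kept and kept[-1]["text"].strip().lower() == seg["text"].strip().lower():
            continue
        kept.append(seg)
    return kept

# Made-up sample data for illustration.
segments = [
    {"text": " Welcome back.", "avg_logprob": -0.2, "no_speech_prob": 0.01},
    {"text": " Yeah.", "avg_logprob": -1.4, "no_speech_prob": 0.92},
    {"text": " Thank you.", "avg_logprob": -0.3, "no_speech_prob": 0.05},
    {"text": " Thank you.", "avg_logprob": -0.3, "no_speech_prob": 0.05},
]
clean = collapse_repeats(drop_unlikely_speech(segments))
print([s["text"].strip() for s in clean])
```

This will not catch confident hallucinations, so cutting silence out with a VAD before transcription is still worth combining with it.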

stone quiver
#

Thought it'd be neat to share that I used Whisper-JAX to transcribe 428 hours of audio in just under 6

gloomy mist
#

Hey, I'm currently using faster-whisper for speech-to-text processing. However, my 15-second audio takes almost 2 seconds to process on my MacBook M2 with 16 GB RAM. Unfortunately, this is too long. I would like it to be closer to 1 second. Is there any way I can speed up the process? I would prefer not to use a model worse than 'small'. I will mainly be transcribing German audio.

```
from faster_whisper import WhisperModel
import datetime

model_size = "small"

model = WhisperModel(model_size, device="cpu", compute_type="int8")

start_time = datetime.datetime.now()
segments, info = model.transcribe("audio.mp3", beam_size=5)

end_time = datetime.datetime.now()

print(
    "Detected language '%s' with probability %f"
    % (info.language, info.language_probability)
)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

print("Transcription took: ", end_time - start_time)```


```Transcription took:  0:00:02.104246```
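
One thing worth checking before optimizing: as far as I know, faster-whisper returns `segments` as a generator, so most of the decoding happens while you loop over it, not inside `transcribe()`. If so, the 2.1 s above is measured before the loop and the true total is higher. The stand-in below uses a fake lazy transcriber (no real model) just to show the effect on timing.

```python
import time

def fake_transcribe(n_segments=3, work_s=0.05):
    """Stand-in for a lazy transcriber: work happens only while iterating."""
    def generate():
        for i in range(n_segments):
            time.sleep(work_s)  # pretend to decode one segment
            yield f"segment {i}"
    return generate()

t0 = time.perf_counter()
segments = fake_transcribe()
setup_s = time.perf_counter() - t0   # almost zero: nothing decoded yet

t1 = time.perf_counter()
texts = list(segments)               # the real work happens here
decode_s = time.perf_counter() - t1

print(f"setup {setup_s:.3f}s, decode {decode_s:.3f}s")
```

To time the real thing, stop the clock after the `for segment in segments` loop. Passing `language="de"` to skip language detection and lowering `beam_size` are the usual first things to try for speed, at some possible cost in accuracy.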
blazing cave
#

I plan on making my audio compression code available as an api.

wondering if there is an appetite for it (I use in my own app)

it usually returns an mp3 link that is 10% of original size.

UI version here:
shownotes.io/crush

If I get enough interest will make the api version.

autumn bolt
#

how do i cap the amount of text/time per line in the transcript? for example I want

```
[0.0->2.0] Lorem ipsum dolor sit amet,
[2.0->4.0] consectetur adipiscing elit.
```
but it's giving
```[0.0->4.0] Lorem ipsum dolor sit amet, consectetur adipiscing elit.```
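
A sketch of one way to cap line length, assuming you can get word-level timestamps (for example `word_timestamps=True` in the open-source package, or word granularity with `verbose_json` on the API; check which your version supports). It greedily packs words into cues until a time or character limit would be exceeded. `cap_cues` and its limits are made up for illustration.

```python
def cap_cues(words, max_seconds=2.0, max_chars=42):
    """Greedily pack words into cues no longer than max_seconds / max_chars."""
    cues, current = [], []
    for word in words:
        candidate = current + [word]
        duration = candidate[-1]["end"] - candidate[0]["start"]
        text = "".join(w["word"] for w in candidate).strip()
        if current and (duration > max_seconds or len(text) > max_chars):
            cues.append(_cue(current))  # close the cue before this word
            current = [word]
        else:
            current = candidate
    if current:
        cues.append(_cue(current))
    return cues

def _cue(words):
    text = "".join(w["word"] for w in words).strip()
    return (words[0]["start"], words[-1]["end"], text)

# Made-up timings for the example sentence.
words = [
    {"word": " Lorem", "start": 0.0, "end": 0.4},
    {"word": " ipsum", "start": 0.4, "end": 0.8},
    {"word": " dolor", "start": 0.8, "end": 1.2},
    {"word": " sit", "start": 1.2, "end": 1.6},
    {"word": " amet,", "start": 1.6, "end": 2.0},
    {"word": " consectetur", "start": 2.0, "end": 2.8},
    {"word": " adipiscing", "start": 2.8, "end": 3.5},
    {"word": " elit.", "start": 3.5, "end": 4.0},
]
for start, end, text in cap_cues(words):
    print(f"[{start}->{end}] {text}")
```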
autumn bolt
#

replacement question! is there a way to get time per individual word

autumn bolt
#

nevermind

languid iron
sturdy jolt
#

Hello, we are having issues accessing the GPT-4 API for some reason it's not available on our account even though the account is paid and we have spent money on the GPT-3 API (and also, for some reason, we haven't been charged this month). We have been working on a SAAS project for real estate brokers for a long time, have made a custom plugin for GPT, etc., but unfortunately, we cannot use it. Can you help us with what to do?

modern ember
#
import os
import pygame
import speech_recognition as sr



def speak(text):
    voice = "en-US-ChristopherNeural"
    command = f'edge-tts --voice "{voice}" --text "{text}" --write-media "MichealOutput.mp3"'
    os.system(command)
    pygame.init()
    pygame.mixer.init()
    
    try:
        pygame.mixer.music.load("MichealOutput.mp3")
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            pygame.time.Clock().tick(10)

    except Exception as e:
        print(e)

    finally:
        pygame.mixer.music.stop()
        pygame.mixer.quit()

def take_command():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)

    query = None  # returned as-is if recognition fails
    try:
        print("Recognizing...")
        query = r.recognize_sphinx(audio, language='en-us')
        print("hi")
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

    return query

speak("hello, i am Micheal, your virtual assistant; How can i help you today")    
query = take_command()
print(query)
#

i want to implement whisper to this

#

this is a speech to text code

#

i am using sphinx

#

but sphinx isnt accurate

#

so can anyone help me

spark notch
# modern ember ```py import os import pygame import speech_recognition as sr def speak(text)...

Happy Friday https://chat.openai.com/share/b6606aff-ce32-4117-8d80-7ba421321ad8

```py
import pyaudio
import wave
from openai import OpenAI

# Set your OpenAI API key
client = OpenAI(api_key="your_openai_api_key")

def record_audio(filename, duration=5):
    """Records audio from the microphone and saves it as a WAV file."""
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    CHUNK = 1024

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS,
                        rate=RATE, input=True,
                        frames_per_buffer=CHUNK)

    print("Recording...")

    frames = []

    for _ in range(0, int(RATE / CHUNK * duration)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("Finished recording")

    stream.stop_stream()
    stream.close()
    audio.terminate()

    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(audio.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))

def transcribe_audio(filename):
    """Sends the audio file to Whisper for transcription."""
    with open(filename, "rb") as audio_file:
        # openai>=1.0 client call; the older openai.Audio.transcribe was removed in 1.0
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return transcript.text

def main():
    audio_filename = "recorded_audio.wav"
    record_audio(audio_filename)
    transcription = transcribe_audio(audio_filename)
    print("Transcription:", transcription)

if __name__ == "__main__":
    main()
```

hot belfry
#

I am so lost. Is there a way to use Whisper without downloading anything? They say it's in the API, but I can't find the [whisper] model

woven pier
misty ginkgo
#

How can I use OpenAI Whisper in JavaScript/TypeScript?

autumn bolt
#

I’m a subscriber, where’s my beta version with call option to other gpts and ability to locate and attach third party APIs? Can’t find it. GPT4 knows nothing of it…

tender rivet
autumn bolt
tender rivet
#

no idea of what you are talking about

#

the option to make API calls is under the "Actions" section of the custom gpt configuration

autumn bolt
#

Yes, but there’s a dedicated Actions GPT that’s meant to help with that.

tender rivet
#

there are some forked implementations of whisper that claim to run faster than the original and on CPU

#

I never tested, but the idea sounds pretty cool

pliant mural
#

Hello !
So I have a Mac laptop from 2017 (macOS Monterey 12.7.1). I have installed Python 3, PyTorch, ffmpeg, pip, and whisper via pip. Every one of these installations seems to have worked and completed fully. But when I launch "whisper [path/to/audio/file.mp3]" it says "zsh: permission denied: whisper". I have tried to run my terminal with root permission, but even then it says "-sh: whisper: command not found". Can anyone help please?

full surge
#

172.17.0.1 - - [14/Feb/2024 20:22:08] "POST /asr?task=translate&language=da&output=srt&encode=false HTTP/1.1" 404 -

Why ?

inner quarry
#

Bruh, is there any documentation on packaging whisper into an executable? Python's great and all, but it isn't easy to distribute to less advanced computer users.

brave kestrel
#

What do you mean

brave kestrel
brave kestrel
#

I think there's a self hosted version but it has requirements

#

Check out github

inner quarry
brave kestrel
#

Oh you literally meant creating an executable with Python

inner quarry
#

Yeahhhh

unborn fable
#

Anyone got any advice for bulk transcribing audio files? I was thinking renting a GPU server for a couple of hours and bulk processing with a self hosted model would be most cost effective. Also wondering if CPU models are catching up yet as these would be even cheaper.
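
For the bulk job, here is a minimal sketch of a batch loop, assuming the open-source `whisper` package is installed on whatever machine gets rented. `find_audio_files`, `transcribe_folder`, and `make_whisper_transcriber` are hypothetical helper names, and the extension list is an assumption. The transcriber is passed in as a plain function so the same loop works with a different backend.

```python
from pathlib import Path

# Assumption: the set of extensions worth picking up; adjust to your data.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(root):
    """Return every audio file under root (recursively), sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_folder(root, transcribe_fn, skip_existing=True):
    """Write a .txt transcript next to each audio file; return the paths written."""
    written = []
    for audio_path in find_audio_files(root):
        out_path = audio_path.with_suffix(".txt")
        if skip_existing and out_path.exists():
            continue  # lets an interrupted run be resumed cheaply
        out_path.write_text(transcribe_fn(str(audio_path)), encoding="utf-8")
        written.append(out_path)
    return written


def make_whisper_transcriber(model_name="small.en"):
    """Load the model once and return a path -> text function."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"]


# Usage (not run here):
# transcribe_folder("recordings", make_whisper_transcriber("small.en"))
```

Loading the model once and skipping files that already have a transcript are the two things that matter most when paying for a server by the hour.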

inner quarry
brave kestrel
#

Does this work on android? Disclaimer, voice is sent to OpenAI and temp stored until whisper transcribes http://switchmeme.com/lel2

unborn fable
inner quarry
unborn fable
inner quarry
#

a typical pc can run the small.en model

unborn fable
#

PC without latest and greatest GPU?

inner quarry
#

So I would think I got a 100 dolla windows notebook here and I could run it. it was short audio

#

but it ran.

#

plus all the whisper running i've done has been cpu

unborn fable
#

Ok, thanks. I suppose I need to try it 😀

brave kestrel
#

anyone else using the javascript MediaRecorder API to record and send to whisper? on iOS the mediaRecorder.onstop gets triggered too soon

brave kestrel
#

lol wow... just increasing the volume sent to whisper worked

#

I used pydub

ripe shuttle
#

Is anyone having trouble recording audio from the browser on an iphone and sending the audio file to whisper api? It is only transcribing the first second of the recording and other times it will return random characters

misty ginkgo
#

Is the functionality to identify whether a user stops talking already provided in Whisper, like it is already in the ChatGPT app?

still light
#

I'm new here👀 what is whisper?

autumn bolt
#

Hi, is whisper capable of listening to the audio of a lecture recording and generating a clearer voice/new voice that can be dubbed over the video? My lecturer is very hard to understand due to his accent. It would be great if whisper can help!

surreal moon
#

I'm not a developer, but the code used by the program I use for Whisper dictation in Windows looks very straightforward. I imagine you would need to implement something to stop each recording before the 25 MB limit is reached and automatically start a new recording.
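
To make the "stop before 25 MB" idea concrete, here is a rough sketch of the arithmetic for uncompressed PCM audio. It assumes the limit is 25 MiB and leaves a 5% margin for the file header; `max_chunk_seconds` and `chunk_boundaries` are made-up helper names, and compressed formats such as MP3 would allow much longer chunks.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB limit read as 25 MiB


def max_chunk_seconds(sample_rate, channels, sample_width_bytes, safety=0.95):
    """Longest whole number of seconds of raw PCM that fits under the limit."""
    bytes_per_second = sample_rate * channels * sample_width_bytes
    return int(MAX_UPLOAD_BYTES * safety / bytes_per_second)


def chunk_boundaries(total_seconds, chunk_seconds):
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    boundaries = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


# 16 kHz, mono, 16-bit audio uses 32,000 bytes per second.
limit = max_chunk_seconds(sample_rate=16000, channels=1, sample_width_bytes=2)
print(limit, "seconds per chunk")
print(chunk_boundaries(total_seconds=25, chunk_seconds=10))
```

Cutting on a fixed boundary can split a word in half, so a real recorder would probably prefer to cut at a pause near the boundary.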

#

The last time I looked in to natural sounding text to voice, they all had per unit pricing to use their APIs - but I have no idea how much it would cost etc

spice iris
#

Hey,
Any idea how to get "whisper" to transcribe exactly what the text says?

For example, when the voice contains even a mild "insult" (You can shut up), whisper generates completely random text.

For a voice assistant, this is quite problematic. Any ideas? 🙂

tawny narwhal
#

While trying to create the processor, memory goes insanely up, even with a very short audio, like 3 sec

#

It seems to be a glitch, but I can't find it. If anyone has seen this in the past let me know 😅
I'll try to create a small repo reproducing it

tawny narwhal
#

Got it! I was using stereo audio. It needs to be mono 🤦
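
A minimal sketch of the downmix, assuming 16-bit little-endian PCM with interleaved left/right samples (the usual WAV layout). `stereo_to_mono_16bit` is a hypothetical helper, not part of Whisper or any library.

```python
import struct


def stereo_to_mono_16bit(frames: bytes) -> bytes:
    """Average interleaved L/R 16-bit samples into a single channel."""
    if len(frames) % 4 != 0:
        raise ValueError("expected whole stereo frames of 4 bytes each")
    count = len(frames) // 2  # total number of 16-bit samples
    samples = struct.unpack("<%dh" % count, frames)
    # Samples alternate left, right, left, right, ...
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count, 2)]
    return struct.pack("<%dh" % len(mono), *mono)


# Two stereo frames: (100, 300) and (-200, -400)
stereo = struct.pack("<4h", 100, 300, -200, -400)
print(struct.unpack("<2h", stereo_to_mono_16bit(stereo)))
```

The result can be written back with the standard `wave` module using `setnchannels(1)`.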

half terrace
#

thanks for documenting your findings

muted axleBOT
#
`` Rule 1 `` Be respectful.

Treat others the way you would like to be treated, and assume best intentions. Don’t harass or attack others, and don’t engage in hateful or generally malicious behavior (e.g. sexism, racism, homophobia, etc.). Keep the negativity to a minimum.

limber cedar
#

Why is whisper inference time not constant if the audio is always padded to 30s? For example, 30 seconds of input audio takes roughly 5x the inference time of 5 seconds of input audio.

surreal moon
#

Do you sir have a time machine? Because that would be an awesome way to make money through crypto!

#

I'm absolutely loving using Whisper through the API for dictation in Windows. However, I'm worried that using it for anything longer is going to get expensive pretty quickly. Does anybody know if there are any cheaper options just for Whisper? I was assuming that using Whisper for my own individual use was never really going to add up. But I think I might be wrong about that, given I used nearly $1 on a particular day and I still haven't used it for any longer-form writing yet. It might almost be worth it to use something like Tasker or an equivalent in Windows as some sort of hacky workaround to get it under the monthly ChatGPT subscription?

#

The other possibility is that on that particular day I was getting used to using the Windows program and I might have been leaving a lot of dead spaces in between words and sentences? It's supposed to be about two cents a minute, correct?
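
For a rough sanity check on the bill, here is the arithmetic as a sketch. It assumes the $0.006 per minute rate that OpenAI has listed for `whisper-1`, which may have changed, so check the pricing page. The function names are made up for illustration.

```python
# Assumption: listed whisper-1 rate in USD per minute of audio.
PRICE_PER_MINUTE_USD = 0.006


def transcription_cost(seconds, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimated cost in USD for a given amount of audio."""
    return seconds / 60.0 * price_per_minute


def minutes_for_budget(budget_usd, price_per_minute=PRICE_PER_MINUTE_USD):
    """How many minutes of audio a budget covers at the assumed rate."""
    return budget_usd / price_per_minute


print(round(transcription_cost(60 * 60), 2), "USD for one hour of audio")
print(round(minutes_for_budget(1.0)), "minutes of audio for 1 USD")
```

At that assumed rate, $1 is a bit under three hours of audio. As far as I know the charge is based on the length of the uploaded audio, so long silences in a recording would count too.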

broken belfry
#

There are locally run versions of whisper you could look into.
I can not vouch for the quality/speed or anything but have seen them on my "travels"

gilded oasis
#

is this the proper place to ask about faster-whisper and implementations of it?

still light
#

How to use whisper?

shrewd rose
shrewd rose
#

Hello there, I have been having an issue lately where the model (I am using base) completely hallucinates stuff. My transcripts are quite long, and at some point someone said "I'm hungry." and then the model started just repeating "I'm hungry." over and over until the end of the transcript. If anyone could help it would be much appreciated.

shrewd rose
#

Well I guess whisper was very hungry

#

anyways fixed it by setting vad=True
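
Separately from the VAD setting, a repetition loop like the one described above can also be caught after the fact. This is only a post-processing sketch, not the fix mentioned here: it assumes the output is a list of segment dicts with a `"text"` key (as the open-source `whisper` package returns), and `collapse_repeats` is a hypothetical helper.

```python
def collapse_repeats(segments, max_repeats=2):
    """Drop segments once the same text has repeated more than max_repeats times in a row."""
    kept = []
    previous = None
    run_length = 0
    for segment in segments:
        text = segment["text"].strip().lower()
        if text == previous:
            run_length += 1
        else:
            previous = text
            run_length = 1
        if run_length <= max_repeats:
            kept.append(segment)
    return kept


segments = [{"text": " I'm hungry."} for _ in range(5)] + [{"text": " Bye."}]
print([s["text"] for s in collapse_repeats(segments)])
```

This hides the symptom but does not recover the speech that was lost while the model was looping, so it works best as a warning signal for re-running that stretch of audio.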

mellow pewter
#

is there an issue with the model requests? i keep getting request timeouts from my python code

#

im using my company's openai acc with its own subscription and key and everything and it worked fine in the past.
i just keep on getting error codes such as 503 or "request timeout" randomly
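
For intermittent 503s and timeouts, a common workaround is to retry with exponential backoff. A minimal, library-agnostic sketch follows; `call_with_retries` is a hypothetical helper, and which exception types are worth retrying depends on the client version you use, so `retry_on` is left for you to narrow down.

```python
import random
import time


def call_with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error wait 1s, 2s, 4s, ... (plus jitter) and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Usage (not run here; send_request is whatever function makes your API call):
# text = call_with_retries(lambda: send_request("audio.mp3"))
```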

surreal moon
#

Has anybody else found that personal use of Whisper for dictation is extremely cheap? I was a bit worried about using it so much in Windows, given how easy it is to just, one hot key to start talking and one hot key to stop. And now that there's an Android keyboard, same issue, but I've been using it a lot. And I'm only at 93 cents for the month so far.

tranquil marten
#

anyone unable to get whisper to generate any files after transcribing?

#

no .srt .txt or anything

woeful verge
#

Hello guys.
I am currently trying to get whisper up and running on my machine.
CUDA is available and is the default device the model chooses to run on.
I checked with whisper --help
While using the CPU (AMD Ryzen 9 7900X3D) works like a charm, using the GPU (AMD Radeon RX 7900 XTX) doesn't work at all.
The following code:

import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(f' The text in video: \n {result["text"]}')

raises this error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
    While copying the parameter named "encoder.blocks.0.attn.query.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('HIP error: invalid device function\nHIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing HIP_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_HIP_DSA` to enable device-side assertions.\n',).

I dont know how to fix this.

sudden aspen
#

Are you on linux ?

#

That's a really specific error i'm sorry

#

1- what is HIP
2- why does it say kernel
3- i dont think its a hardware issue

woeful verge
surreal moon
#

If my phone can run the 'base' 74M model locally, sure my trusty GTX 750ti will make light work of the larger models? Right guys?

dim birch
#

hey guys, does anyone know how to get the Whisper python API/module to add punctuation consistently to its .words output in verbose_json mode when transcribing? it's a short audio recording & i put in the exact script (including punctuation) for the audio as the prompt. transcription.text has punctuation as expected but transcription.words has none unless it was two words inter-connected by punctuation with no space in the script (ie, it's..very works but it's.. very with the space doesn't)

(in case it's unclear, I'm talking about using the OpenAI python client & API, not running Whisper fully locally)

dim birch
dim birch
woeful verge
#

Tbh. I just sold my XTX and bought a 4090. AMD for AI is just stupid and ROCm a piece of garbage

#

I tried SO many things. Updating the kernel to 6.6 on 22.04 Ubuntu, passing every Local variable like in the fixes from other people.

#

I uninstalled and reinstalled a ROCm-compatible PyTorch and reinstalled ROCm drivers for my respective ubuntu and kernel version

#

Also adjusted the HIP debugging level but that didn’t tell me anything.

woeful verge
neon cargo
#

guys, I really don't know which programming language to learn next, I am absolutely clueless about which one to learn. At this rate AI will become good at many of the languages, which is going to be very important in the future. And please no gpt type answers

scenic remnant
#

I've been learning Python using it to create simple scripts to analyze Reddit data using PRAW, a Python Reddit API wrapper that has taught me what the heck an API even is

neon cargo
tacit gulch
#

Hey everyone! 👋

I'm diving into the integration of the Whisper API for a project that necessitates handling highly sensitive audio files, meant to be installed and operational within a highly secure local network. This setup is critical as it mandates that all data remain strictly offline for security reasons. I have a couple of key inquiries on this front and would greatly value any guidance or shared experiences:

Token Pre-Purchase: Is there an option to pre-purchase tokens or credits for Whisper API usage? We're looking for payment models that provide the flexibility to budget and plan ahead without immediate consumption.

Data Privacy and Usage Logs: Privacy is a top priority for us. In the context of Whisper's operation, can anyone shed light on the specifics captured within the usage logs? Are there any references to the processed data itself, or do the logs strictly record metrics such as transcribed minutes and the models utilized? Any insights into the data management and logging practices would be invaluable.

Navigating the secure and efficient use of Whisper API, particularly for projects with a high degree of sensitivity, is my current challenge. If you've had experience or have knowledge regarding deploying OpenAI technologies in similar secure, offline environments, I'd be incredibly thankful for your insights.

scenic remnant
#

Maybe it's Michael Phelps, but he probably isn't very good at basketball is he?

#

This isn't Pokemon my friend

neon cargo
scenic remnant
muted axleBOT
#
`` Rule 1 `` Be respectful.

Practice kindness and positive regard. Harassment, hate speech (such as sexism, racism, or homophobia), or other malicious conduct will not be tolerated. Maintain a respectful and positive environment.

sudden aspen
neon cargo
sudden aspen
#

Then just save it as "clock.html" and run it thumbsup

ripe shuttle
#

Does anyone know how to split audio file into chunks to send to whisper api using nodejs running in Vercel Serverless Functions?

neon cargo
#

I need clues guys, CLUES

plain creek
# neon cargo I need clues guys, CLUES

Focus on building complex systems, and the languages used for those systems. Like C#. The language isn't as important as your ability to solve complex problems.

#

Enterprise

drowsy ruin
#

I’m finetuning Whisper with my own English data and need clarity on standardizing transcripts, especially regarding punctuation and capitalization. I reviewed appendix C (pg 21) of the Whisper paper but I couldn't find details on capitalization and punctuation for English transcription. Any idea how text was standardized when training Whisper?

neon cargo
plain creek
neon cargo
glad cedar
#

I got this and it's ok when I am choosing Chinese or Japanese

#

why

candid aspen
glad cedar
candid aspen
#

I already said it cannot find the specified language... Cantonese in this case...

#

it's not finding it

fathom ridge
#

So I got ROCm working with whisper on the integrated gpu on a Ryzen 5700g (lol), but it's basically the same speed as CPU. Is there any way to use this GPU to actually speed it up or was this just pointless?

#

I suspect it was pointless

fathom ridge
#

Yea, this sucks. Keeps crashing the igpu requiring a reboot to clear.

upper sandal
#

@fathom ridge Try unplugging monitor from integrated port

fathom ridge
#

I'm using it. It runs a dashboard for my home assistant.

#

I can try it later but I want that more, heh.

upper sandal
#

Yes I'm sure you are using it. But it is hogging resources

fathom ridge
#

If it can't spare the resources to run a tab in firefox and do this then that configuration isn't suited to my needs.

upper sandal
#

Do you have only one port to plug a monitor into

#

Ryzen 5700g is equivalent to a RX 550 or GTX 560 -- light gaming

fathom ridge
#

yea, it's a small mitx pc with no dedicated gpu

#

I have things that run whisper fine, I just wanted to see if I could get this PC to be a bit faster. I wasn't expecting much, honestly I was surprised the integrated gpu on it even supported ROCm

upper sandal
#

Some of the prominent capabilities of the processor include SSE 4.2 + AVX + AVX2 + AES + VAES + AMD SVM + FMA + RdRand + FSGSBASE + BMI2

#

AMD is pretty good man

#

I run full amd as well

fathom ridge
#

I'm not getting bad performance, I was just trying to get the best possible.

upper sandal
#

What would you suggest for a desktop running Ryzen 7800X3D / Radeon 7800XT GPU / 32 GB Ram that isn't being used for an ai project? @fathom ridge

fathom ridge
#

idk what the 7800xt is good for in relation to AI, I have nvidia.

#

So I'd suggest finding a good game

#

This 5700g is the only time I've played with ROCm

quartz phoenix
#

@fathom ridge yeah you're going to need a beefier GPU, APUs are not exactly a well supported scenario to begin with
And AMD is notorious for not properly documenting which GPU supports what ROCm features.
if you have around 400 $/€ burning a hole in your pocket, you could get an RX 6800.
But even with the new windows support its still a far cry from CUDA on nvidia.

sudden aspen
upper sandal
upper sandal
#

i imagine fast hard drives also, like read/write speed. i cant answer for HD size.

#

for a LLM machine only, i think this is the kind of infrastructure you want

sudden aspen
#

yeah i wanted something like that, but ddr5. thanks alot for the help

fathom ridge
fathom ridge
#

Wishlist: VLC plugin that adds subtitles to any video

muted axleBOT
#
`` Rule 7 `` No self-promotion, soliciting, or advertising.

Do not post or direct message any members of this server to promote non-OpenAI services, products, or projects.

fallow gorge
#

Bad GPT

uncut hedge
#

You would need to either stream the audio to Whisper or upload the whole thing. And such proposal can cost a lot of tokens

fathom ridge
#

Could just run it locally

hot belfry
#

is there a tutorial to learn how to use whisper? Is it difficult? I want to record a 20 minutes speech of myself

candid siren
lapis jacinth
brave shore
viral prism
#

ECONNRESET is killing me

willow grove
polar flume
#

Has anyone figured out a way to use whisper for real-time transcription without it transcribing nonsense?

long anvil
#

Hello!
I hope my question is relevant here. I am attempting to convert alignment results from WhisperX into a TextGrid file for the purpose of analyzing afterwards on Praat. I initially used Parselmouth to directly convert the WhisperX alignment, and I also tried to write the results to CSV in order to convert them into a TextGrid file, but it doesn't seem to work. Has anyone else done a similar thing before, or does anyone have suggestions on how to do this? Thank you!

tacit gulch
#

Hi, I'd like to know if there's a built-in method to extract the confidence score for each word in the output. Additionally, is there a feature or an add-on available that allows these confidence scores to be color-coded directly in the results?
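
A small sketch of the color-coding part, assuming the open-source `whisper` package run locally with `word_timestamps=True`, where each word in a segment carries a `"probability"` value. The thresholds and the helper names `color_for` and `colorize_words` are arbitrary choices for illustration, and the output uses ANSI terminal colors.

```python
GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"


def color_for(probability, high=0.8, low=0.5):
    """Pick a terminal color for a word probability (thresholds are assumptions)."""
    if probability >= high:
        return GREEN
    if probability >= low:
        return YELLOW
    return RED


def colorize_words(result):
    """Join all words in a transcription result, each wrapped in its color."""
    parts = []
    for segment in result["segments"]:
        for word in segment.get("words", []):
            color = color_for(word["probability"])
            # Word strings usually start with a space, so join without a separator.
            parts.append(f"{color}{word['word']}{RESET}")
    return "".join(parts)


# Usage (not run here):
# import whisper
# result = whisper.load_model("base").transcribe("audio.mp3", word_timestamps=True)
# print(colorize_words(result))
```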

rapid spindle
#

hi

#

when will v3 of whisper API be available? why the delay?

rapid spindle
# muted axle

When will you support v3 in your API ? It's been almost one year

whole agate
#

Can anyone help me use whisper here?
I want to write some Python code that uses whisper to create an SRT or VTT file with timestamps, but with only a single word per cue. I don't want sentences or paragraphs like these:

```
1
00:00:00,000 --> 00:00:05,000
Open AI has recently decided to open source.

2
00:00:05,000 --> 00:00:09,000
Their translation and transcription AI whisper.

3
00:00:09,000 --> 00:00:18,000
So now it is under an MIT license and that includes both the code that's here as well as the model weights that were used to train the AI.

4
00:00:18,000 --> 00:00:26,000
So if you want it to go and try and make your own speech transcription AI with that data, you are free to do so.
```

I want single words for each... Is there a way I could do this with whisper?
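
A minimal sketch of one way to get one word per cue, assuming the open-source `whisper` package run locally with `word_timestamps=True`, which attaches a `"words"` list (with `"word"`, `"start"`, `"end"`) to each segment. `srt_timestamp` and `words_to_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(result):
    """Build SRT text with one cue per word from a word-timestamped result."""
    cues = []
    index = 1
    for segment in result["segments"]:
        for word in segment.get("words", []):
            text = word["word"].strip()
            if not text:
                continue
            start = srt_timestamp(word["start"])
            end = srt_timestamp(word["end"])
            cues.append(f"{index}\n{start} --> {end}\n{text}\n")
            index += 1
    return "\n".join(cues)


# Usage (not run here):
# import whisper
# result = whisper.load_model("base").transcribe("audio.mp3", word_timestamps=True)
# open("words.srt", "w", encoding="utf-8").write(words_to_srt(result))
```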
whole agate
#

anyone help?

#

please

granite pebble
#

Hello, I am using the API for text to voice, and I am struggling to get the model to pause long enough between paragraphs. I have tried inserting three dots (...) and a hyphen, but nothing works. Any ideas?

unreal kraken
#

Hello guys, after not using Whisper for 2 months, I got some issue.
The transcription is running well but at the end I don't have any text generated...

The key is (I guess): FileNotFoundError: [Errno 2] No such file or directory: '.\test.txt'

#

When I google it, people have the problem that ffmpeg shows up after "No such file or directory:", but I think mine works well (I did reinstall it), and nothing is written about ffmpeg, only '.\test.txt'.

restive reef
unreal kraken
restive reef
unreal kraken