#can anyone help me with python machine learning?
I have to make a program that takes a recording of a person talking and uses background vs. foreground classification to find the seconds when the person talks,
and it wants me to create the ML model from the beginning, no ready dataframes.
What is "it"?
i really need help.
I mean I have to do this job:
recognise the seconds when a person talks
just use whisperx lol
https://github.com/m-bain/whisperX throw that on a semi-decent GPU and be happy
gives you the recognized texts with the timestamps for each sentence
I can't use a ready-made one, I need to make my own and train it.
So presumably this is some sort of college assignment
and they presumably also gave you a dataset to work with?
i need to find a dataset
So you don't need to make your own model?
and use it to train the
I need to make my model
Open Speech and Language Resources.
What did you do in the earlier parts of the semester? 😂
Sounds absolutely insane to have students do an ML model from scratch in some kind of audio class but whatever
i know
You just need to identify the timestamps?
What sounds more likely is that they want you to use some sort of concept you learned in the class (e.g. audio processing), seems highly unlikely that they expect you to just know ML 🤨
Yeah, like the person says the word "table" and I need to say he is saying it from 2.5 to 3 seconds into the recording.
You need to say "the person is speaking from 2.5 to 3 seconds" or do you also need the text content?
both
I already recommended whisperx, which does exactly what you need
You are tasked with implementing a system that segments a sentence into words, mandatorily using a background vs foreground classifier of your choice. Given a recording of a speaker, the system should return the time boundaries of the spoken words (in seconds). Additionally, you must provide an accompanying program that plays back the detected words. The number of words in the sentence is not known in advance, but you can assume that there is a small gap of silence between the words.
This is a completely different problem than the one you outlined lol
Attention!!!: You cannot use convolutional neural networks. The use of ready-made web services or APIs for speech recognition is not allowed. Transfer learning from pre-trained networks is also not allowed. Solutions that violate these rules will receive zero points.
can we vc so you can help me a bit?
and explain to me?
No
See you don't actually need to do transcription
they tell you there are no background noises and that you have a pause between words
So you just yoink something like http://dx.doi.org/10.1145/2814895.2814926 and are done
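Since the recording is clean and the words are separated by pauses, the boundary-finding part could look roughly like the sketch below. This is not the method from the linked paper, just a minimal illustration: the fixed energy threshold only stands in for whatever background vs. foreground classifier you end up training, and `frame_energy`, `labels_to_intervals`, the 20 ms frame size and the synthetic two-"word" signal are all made up for the example.

```python
import numpy as np

def frame_energy(signal, frame_len, hop):
    """Short-time energy (mean squared amplitude) of each frame."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.mean(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def labels_to_intervals(is_speech, hop, sr, frame_len):
    """Merge runs of consecutive speech frames into (start_s, end_s) pairs."""
    intervals = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i                      # a word begins at this frame
        elif not flag and start is not None:
            end_sample = (i - 1) * hop + frame_len
            intervals.append((start * hop / sr, end_sample / sr))
            start = None
    if start is not None:                  # recording ends while still speaking
        end_sample = (len(is_speech) - 1) * hop + frame_len
        intervals.append((start * hop / sr, end_sample / sr))
    return intervals

# Synthetic stand-in for a recording: two tone bursts separated by silence.
sr = 8000
t = np.arange(sr) / sr                     # one second of audio
signal = np.zeros(sr)
signal[1600:3200] = np.sin(2 * np.pi * 200 * t[1600:3200])  # "word" 1
signal[4800:6400] = np.sin(2 * np.pi * 200 * t[4800:6400])  # "word" 2

frame_len = hop = 160                      # assumed 20 ms non-overlapping frames
energy = frame_energy(signal, frame_len, hop)

# Placeholder decision rule; a trained classifier would produce is_speech instead.
is_speech = energy > 0.1 * energy.max()

words = labels_to_intervals(is_speech, hop, sr, frame_len)
print(words)
```

The only part the assignment cares about is how `is_speech` gets produced. Turning frame labels into time boundaries stays the same whichever classifier you plug in.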
is this an official Python library?
What
wait a bit
A) You are tasked with implementing a system that segments a sentence into words, mandatorily using a background vs foreground classifier of your choice. Given a recording of a speaker, the system should return the time boundaries of the spoken words (in seconds). Additionally, you must provide an accompanying program that plays back the detected words. The number of words in the sentence is not known in advance, but you may assume that there is a small silence interval between words.
You must implement and compare the performance of the following classifiers: Least Squares, SVM, RNN, and a 3-layer MLP (specify the number of neurons per layer). The comparison should be conducted as is typical for binary classification systems.
B) From the detected words, calculate the speaker's average fundamental frequency.
You must explain which data were used during the testing and training of the system. If they are your own, explain how you created them; if they are open source, explain how they were utilized.
Try to ensure that the system is speaker-independent and as robust as possible to variations in speaker characteristics.
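For part B, one common way to get a fundamental frequency is the autocorrelation peak of each voiced frame, averaged over the detected words. The sketch below assumes that approach (the assignment text does not prescribe one), and the names `estimate_f0` and `average_f0`, the 60 to 400 Hz search range, the 40 ms frame length and the synthetic 200 Hz test signal are all hypothetical choices for illustration.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                       # shortest period considered
    lag_max = min(int(sr / fmin), len(ac) - 1)     # longest period considered
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

def average_f0(signal, sr, intervals, frame_s=0.04):
    """Average the per-frame F0 estimates over all detected word intervals."""
    n = int(frame_s * sr)
    estimates = []
    for start_s, end_s in intervals:
        seg = signal[int(round(start_s * sr)):int(round(end_s * sr))]
        for i in range(0, len(seg) - n + 1, n):
            estimates.append(estimate_f0(seg[i:i + n], sr))
    return float(np.mean(estimates)) if estimates else float("nan")

# Synthetic check: two 200 Hz bursts standing in for detected words.
sr = 8000
t = np.arange(sr) / sr
signal = np.zeros(sr)
signal[1600:3200] = np.sin(2 * np.pi * 200 * t[1600:3200])
signal[4800:6400] = np.sin(2 * np.pi * 200 * t[4800:6400])

intervals = [(0.2, 0.4), (0.6, 0.8)]   # would come from the word detector
mean_f0 = average_f0(signal, sr, intervals)
print(mean_f0)
```

Real speech also has unvoiced frames (like "s" or "t") with no meaningful pitch, so a real version would probably need to skip frames whose autocorrelation peak is weak.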
See there you have it
they want you to implement a very basic thing and they tell you what to implement
can you give me an explanation of what I need to do?
https://www.youtube.com/watch?v=WAxfTAy6RS8 there you go here's a nice professor from some Indian university I think?
least square error, Optimization via normal equation and gradient descent, inference
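To make those three topics concrete, here is a rough sketch of a least squares classifier for background vs. foreground frames, fitted once with the normal equation and once with gradient descent. Everything about the data is an assumption for illustration: the single "log energy" feature, the Gaussian toy data, the -1/+1 targets, the learning rate and the iteration count are invented, and nothing here comes from the video or the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D feature per frame (think: log energy). Values are made up.
bg = rng.normal(-6.0, 1.0, size=200)      # background frames
fg = rng.normal(-1.0, 1.0, size=200)      # foreground (speech) frames
x = np.concatenate([bg, fg])
y = np.concatenate([-np.ones(200), np.ones(200)])   # targets: -1 / +1

X = np.column_stack([np.ones_like(x), x])  # prepend a bias column

# Normal equation: solve (X^T X) w = X^T y.
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean squared error.
w_gd = np.zeros(2)
lr = 0.01                                  # assumed step size
for _ in range(5000):
    grad = (2.0 / len(y)) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad

def predict(w, X):
    """Inference: the sign of the linear output picks the class."""
    return np.where(X @ w >= 0.0, 1, -1)

acc = np.mean(predict(w_ne, X) == y)
print(w_ne, w_gd, acc)
```

Both fits should land on essentially the same weights. Accuracy is measured on the training data here only to keep the sketch short; for the comparison the assignment asks for you would evaluate on held-out recordings.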
I really don't understand what I need to do
can you give me a general explanation
What did you do the entire semester 💀
yea you gotta write code
the lesson is about sound
You probably ignored some of the requirements then? 💀
also never heard of an audio engineering class in a computer science degree
I really can't understand what I need to do.
Have you tried asking your professor for some help?
Cuz either you're lying or they're insanely incompetent lol
now I understand what I need to do, thank you