#birdclef-2025
Hello
Hello
Ola
Dear competition creators, why are you counting failed submissions toward the number of submissions? This doesn't make any sense.
I have made 4 failed submissions, all because I couldn't get any clarity on how to submit, and now I am left with only 1 submission.
Now the 5th and last submission is running. If this one also fails, I am done. It is really demotivating.
This is usually done to deter probing. In competitions where probing is not an issue (like simulation competitions), failed subs are often not counted.
I am an amateur. Coming back into Kaggle after 3+ years. Looking for teammate for this competition.
Can you dm me?
Aah, got it. I guess once you understand how to submit, it won't be an issue.
Hi, I'm looking for a team to add me. I am a machine/deep learning practitioner. I have not worked with audio data in the past, but I can do any task with text and tabular data. If you have a vacant spot in your team, count me in.
Hey guys, does horizontally flipping the spectrograms make sense? I know vertical flipping doesn't make any sense because our frequency bands would be altered, but is flipping the time axis (horizontal flipping) a good augmentation?
has anyone tried training models in TPU. I am using tensorflow, but when I am setting up the strategy.
This will give you an undesirable distortion of the spectrogram. You're better off looking into pitch shifting and time stretching for augmentation.
Hi guys, I am running an inference notebook. When I ran it on GPU, it was submitted successfully, but when I run it on CPU, my submissions fail.
What can be the reason?
I don't know. After 11 mins of running it just got a submission scoring error. And as I said, the same notebook that works on GPU fails when I run it on CPU only.
To give some context, I am using joblib to process the audio data into mel specs for all audio segments, then I run a 3-model ensemble. The notebook was submitted successfully when running with GPU. But just as an experiment I changed my accelerator to None (i.e. CPU only), and it went to a submission scoring error.
😠WHY
I think you need to use openvino or something like that
Thanks man🎩 . Will explore
I found my issue. I was using joblib to process all the audio files up front.
After that was done, I was passing them to dataloaders for inference.
This 2-step approach was jamming my CPU and RAM. Now I am doing my processing in batches, and it works perfectly on CPU.
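A minimal sketch of the batch-wise approach described above. The `load_segments` and `predict` callables are hypothetical stand-ins for the real mel-spectrogram extraction and model inference (neither comes from the notebook in question). The only point is that each batch of files is processed and scored before the next one is loaded, so memory use stays bounded.

```python
from typing import Any, Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_inference(
    files: Sequence[str],
    load_segments: Callable[[str], Any],       # hypothetical: audio path -> features
    predict: Callable[[List[Any]], List[Any]],  # hypothetical: features -> predictions
    batch_size: int = 8,
) -> List[Any]:
    """Process and score one batch of files at a time instead of all files up front."""
    predictions: List[Any] = []
    for batch in batched(files, batch_size):
        # Only the features of the current batch are held in memory.
        features = [load_segments(path) for path in batch]
        predictions.extend(predict(features))
    return predictions
```

The batch size is a tuning knob: smaller batches use less RAM, larger ones amortize per-call overhead.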
hi Guys, can someone tell me what is the use of the train_soundscapes? They have no labels right, so how can I use it?
I believe they provide those as a means to give additional train data. You train your model without it and then utilize the model on it, identify labels, and then you can use the now labeled data for additional training. I believe previous year winners utilized it in this way for past BirdCLEF competitions where it was provided in this manner as well. The downside being if your model isn't good it may just confuse the model or otherwise reinforce inaccuracies with bad labeling.
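A rough sketch of the pseudo-labeling step described above, assuming the first-stage model's outputs are already available as a plain dictionary of per-segment class probabilities. The function name and the 0.9 threshold are illustrative assumptions, not values taken from any winning solution. A high threshold is one way to limit the risk of reinforcing bad labels.

```python
from typing import Dict, List, Tuple


def select_pseudo_labels(
    predictions: Dict[str, Dict[str, float]],
    threshold: float = 0.9,  # assumed cut-off; tune on validation data
) -> List[Tuple[str, str, float]]:
    """Keep (segment_id, label, probability) for confident top-1 predictions only."""
    selected = []
    for segment_id, probs in predictions.items():
        if not probs:
            continue
        # Most likely class for this unlabeled segment.
        label, prob = max(probs.items(), key=lambda item: item[1])
        if prob >= threshold:
            selected.append((segment_id, label, prob))
    return selected
```

The selected segments can then be added to the labeled training set for a second training round.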
Hi everyone, I am new to this competition, I have just created a baseline but failed to submit notebooks.
This is my baseline notebook, I am new to Kaggle so I wanted to team up with someone.
Couple of questions:
- Is it a GPU submission or a CPU submission?
- What error did you get on submission: a submission scoring error or a timeout error?
There is a limit of 90 mins for submission
Also, internet is off, so if you are downloading some model configs then that will be an issue
Check your submission file, I have attached my sample submission image, make sure the row_ids are properly generated
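For reference, a small sketch of generating row_ids. It assumes each row_id has the form `<soundscape file stem>_<segment end time in seconds>`, with 5-second segments over 1-minute soundscapes. Both the format and the durations are assumptions here, so compare the result against sample_submission.csv before relying on it.

```python
from pathlib import Path
from typing import List


def make_row_ids(filename: str, duration_s: int = 60, segment_s: int = 5) -> List[str]:
    """Build one row_id per segment, e.g. 'soundscape_123_5' (assumed format)."""
    stem = Path(filename).stem  # file name without the .ogg extension
    # Segment end times: 5, 10, ..., duration_s
    return [f"{stem}_{end}" for end in range(segment_s, duration_s + 1, segment_s)]


row_ids = make_row_ids("soundscape_123.ogg")
print(len(row_ids), row_ids[0], row_ids[-1])
```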
Hello guys, I am seeing some audio files that contain human voices. Will the actual test data have them?
Not in this comp but since most don't allow internet access you can just upload the wheel files of any libs you need as a Kaggle dataset and use those to install any dependencies you need
no no, I was replying to @novel ember's question
Is anyone interested in teaming up for this competition? I have experience with audio classification
I am. Anyone want to collaborate?
I want to collaborate, can I DM you?
Hello there, I am looking to participate in this competition, and I have doubts about the dataset. I see some human and bird voices also. The names of the species are not present. Would like to get some insights from the community
sure
I entered knowing im gonna do it just for the practice but 0.88 with 2 months to go is craaazy
I created a notebook to make a dataset. In the notebook it shows there are 58k images, but when I create a dataset from the notebook's output, it shows only 500 files. What is happening? Am I doing anything wrong?
Hi team! I'm a Kaggle newbie and had some questions on the competition setup:
- Does training AND inference have to run in under 90 minutes? Or just inference? If it's the former, why do some example notebooks like this just load checkpoints?
- What motivates folks to share their submissions and approaches on the open forum?
- What's the policy on external training datasets? I would think they're not allowed, but posts like this have me confused
This is mentioned in the competition overview on Kaggle. So external datasets and pretrained models are fine as long as they're freely and publicly available.
On 2, a lot of people have an innate desire to share their knowledge and understanding.
This satisfies a primitive urge and may in some cases improve their social status.
Can you use external datasets and pretrained models for official submissions? If so, doesn't that contradict "internet access disabled"?
oh I guess running CPU inference on 700 samples in 90 minutes is itself quite limiting
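A quick back-of-the-envelope calculation for the figures mentioned above (700 samples, 90 minutes). The optional `overhead_min` parameter is an illustrative assumption for fixed costs such as model loading.

```python
def seconds_per_file(time_limit_min: float, n_files: int, overhead_min: float = 0.0) -> float:
    """Average time budget per file, after subtracting a fixed overhead."""
    if n_files < 1:
        raise ValueError("n_files must be >= 1")
    usable_seconds = (time_limit_min - overhead_min) * 60.0
    return usable_seconds / n_files


budget = seconds_per_file(90, 700)
print(f"{budget:.2f} s per soundscape")  # 5400 / 700 -> prints "7.71 s per soundscape"
```

So loading, feature extraction, and every model in an ensemble together get under 8 seconds of CPU time per file on average.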
I'm having a problem: when I submit my code, I'm not given any data in test_soundscapes.
Does anyone know why this would be happening?
Have you tried making an official solution like so?
yes I did
At first, I thought that I had some blunder in my code. But after more adjustments I found that even when I copied a few lines of code from BirdCLEF+ 2025: Simple Submission that were supposed to read all files in test_soundscapes/ and print the length of that list, it printed out "0 files".
oh the UI is potentially confusing. As far as I know, if you're able to print any output, you're likely running in a Kaggle notebook, in which case they intentionally leave the test set unpopulated.
just to make sure, if you're on this page, can you try hitting this button?
yes I tried this
It starts running and then fails, and in the log I can see that when my code tried to read data from test_soundscapes it couldn't, because there weren't any files.
I'm looking for motivated teammates.
I'm particularly interested in collaborating with people who are passionate, diligent, and eager to learn together. I'm from South Korea, so teammates comfortable with international collaboration and open communication would be ideal.
I have been wanting to join a team and would be open to international collaboration. I
why do most of these birdclef solutions have CNN models that are trained on 5 folds and then they ensemble all 5 folds
why not just train on the entire dataset in 1 fold and use 1 model trained on everything?
Five folds split the training data into 5 buckets; each model trains on 4 buckets and validates on the remaining one, and the 5 resulting models are ensembled for better generalization. The difficulty with 5 folds in this particular competition is that there isn't enough data for many of the labels to split 5 ways without leaving some folds completely void of some labels. Some of the rarer classes only have about 2 oggs in total. One option would be to create multiple melspecs from each ogg so that you have enough for multiple folds, but you risk data leakage there, likely causing overfitting.
Perhaps the best option would be to round robin assign the rare classes and then only keep folds for testing that have one of the rare classes in. So 5 fold but utilizing only 2 of the folds given the rarity of the classes (where each of those 2 folds are the validation set).
For the full dataset you could do a train/validation split as long as you round robin the rare classes.
Perhaps the 5 folds give the best general answer given how rare the rare classes are, so folks relying on that end up doing well, and the benefit of properly splitting the rare ones is minor given there isn't much there to learn from.
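A small sketch of the round-robin idea above, under a few assumptions: each sample is one recording (not a chunk of one), a class counts as "rare" when it has fewer samples than there are folds, and both function names are made up for illustration. Every class is dealt out starting at fold 0, so a class with only 2 recordings lands in folds 0 and 1, which are then the only folds worth validating rare classes on.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def assign_folds(samples: Sequence[Tuple[str, str]], n_folds: int = 5) -> Dict[str, int]:
    """Map sample_id -> fold, dealing each class's samples out round-robin from fold 0."""
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append(sample_id)

    folds: Dict[str, int] = {}
    for label in sorted(by_label):
        # Sorting makes the assignment deterministic across runs.
        for i, sample_id in enumerate(sorted(by_label[label])):
            folds[sample_id] = i % n_folds
    return folds


def rare_validation_folds(
    samples: Sequence[Tuple[str, str]], folds: Dict[str, int], n_folds: int = 5
) -> List[int]:
    """Folds that contain at least one sample of a rare class (fewer than n_folds samples)."""
    counts: Dict[str, int] = defaultdict(int)
    for _, label in samples:
        counts[label] += 1
    rare = {label for label, count in counts.items() if count < n_folds}
    return sorted({folds[sample_id] for sample_id, label in samples if label in rare})
```

If chunks from the same recording are used as separate samples, they would additionally need to be kept in the same fold to avoid the leakage mentioned above.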
I see thank you for such a detailed explanation!
Hi, I have one question. A lot of audio files are longer than the typical chunk duration of 5 or 10 seconds. Take, for example, the first two training examples that have the primary_label 1139490. The corresponding audio files are CSA36385.ogg and CSA36389.ogg. They are 1:39 and 1:37 minutes long, respectively. Do you just truncate the audio files and only pick the first 5 or 10 seconds? Or would it be a better idea to create more training samples with this primary_label? With a chunk duration of 5 seconds, it is possible to have 19 + 19 = 38 training samples (99 s and 97 s, each split into full 5-second chunks) instead of 2. Or is this not advisable?
One difficulty with using multiple from the same source is the frequency of animal calls.
I noticed:
- Usually the subject makes a noise within the first 5/10s as the person uploading to the service cropped it so they are immediately being heard (since these come from those naturalist sites)
- Often times after the first it can vary with some animals making noises consistently while others have a break between their vocalizations
- Some recordings have humans either annotating after the sound or human voices intermixed with the animal sounds (the raccoons for example had at least one where there were people commenting on hearing raccoons and some author's samples have like 5s of the animal and then a minute of annotation consistently)
- There's a chance that by including multiple from the same source file you'll overfit especially if the multiple end up in various folds / aren't grouped during splitting
I think the first 5/10s are probably the safest, but you could include the others with caution to avoid overfitting. Some concern about uneven vocalizations could be offset by using a model like Perch to detect whether the bird is present in a given segment: https://www.kaggle.com/models/google/bird-vocalization-classifier It doesn't contain all of the species in the competition, so it only covers some instances, but it may help to identify segments of value for the given set.
There has been some discussion about identifying the human voices via silero-vad on the forum: https://www.kaggle.com/competitions/birdclef-2025/discussion/568886 I did notice some bird vocalizations are misclassified as humans with this approach though so some caution is needed.
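To make the chunking question concrete, here is a minimal sketch that splits a recording into full, non-overlapping chunks and tags every chunk with its source file. The `group` field is an illustrative assumption: it is there so a grouped split can keep all chunks of one recording in the same fold, which addresses the overfitting concern above.

```python
from typing import Dict, List, Tuple, Union


def chunk_bounds(duration_s: float, chunk_s: float = 5.0) -> List[Tuple[float, float]]:
    """(start, end) times of the full chunks that fit into the recording."""
    n_chunks = int(duration_s // chunk_s)  # a trailing partial chunk is dropped
    return [(i * chunk_s, (i + 1) * chunk_s) for i in range(n_chunks)]


def chunk_records(
    filename: str, duration_s: float, chunk_s: float = 5.0
) -> List[Dict[str, Union[str, float]]]:
    """One record per chunk; `group` identifies the source file for grouped splitting."""
    return [
        {"group": filename, "start": start, "end": end}
        for start, end in chunk_bounds(duration_s, chunk_s)
    ]


# The two recordings from the question: 1:39 (99 s) and 1:37 (97 s).
records = chunk_records("CSA36385.ogg", 99) + chunk_records("CSA36389.ogg", 97)
print(len(records))  # 19 + 19 full 5-second chunks
```

Chunks without a vocalization would still need filtering, for example with one of the detection approaches mentioned above.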
Hey, a dumb question, but should my model consider audio segments with no bird sounds? I mean is it possible that the test soundscapes contain 5 second segments with no sounds at all? Do you guys do something about it during training?
Hello Everyone,
As you all will have noticed, the audio files in the training dataset contain audio of the species along with human annotations. I thought of cleaning this up by using VAD models (usually used to detect human speech segments for speech diarization).
I developed a Python script to get the timestamps of non-speech segments in the form of start and end timestamp lists for each audio sample.
My concern is that we don't have any ground truth. Is there any way to evaluate the results? TIA
Here's the link to the notebook : https://www.kaggle.com/code/divyaprakashr/birdclef-2025-non-speech-activity-detection/edit
Hey there, I just came across this competition, checked the past years' records, and noticed the competitive score range has dramatically shifted upward. Does this suggest this competition became "easier" this year?
I can't seem to figure out the reason, since a wide variety of species has been added this time.
I'd be happy to hear from anyone, thanks!
It does not show the actual notebook sadly. But if you are using webrtcvad (which I was using), it will filter out many animal sounds as well, yielding a lot of audio without animals
One thing that helps is that people can borrow successful techniques from last year, and perhaps detecting one animal from, say, a spectrogram isn't so different from detecting another animal from a spectrogram. I see a lot of people are building on top of existing models, which have likely also gotten better over the past year. I am no expert though.
Wonderful! And may I ask what model(s) people/you are making use of this year?
I've trained this one on basic versions of EfficientNet and regnet, but also used a more specific one like tf_efficientnetv2_s.in21k_ft_in1k. I believe the latter had been used in combination with focal loss and got number 8 last year
I'm playing around with early stopping as well but for some reason it makes performance worse more often than not and I'm not sure why 😅
Thanks, how is your score looking with these?
Highest we've gotten is 0.807, but I believe it should be able to reach 0.829 with an untouched dataset with default stopping? Right now I am running my dataset cleanser to properly remove voices and silences this time, so hopefully that will go up
Great, I'll look into it too!
Looking for 1 serious teammate for BirdCLEF 2025. Deadline is June 5. I haven’t started yet — just wrapping up Drawing with LLMs and Image Matching first (both end May 27–31).
My goal is Top 5 minimum. I’ll go all in on BirdCLEF starting June 1, but I want someone who can start groundwork now — loading the data, testing a few baseline models, figuring out label issues, and setting up basic training.
I don’t have compute. You’ll need to train on your end or use Colab. I can handle pipeline logic, ensembling, eval logic, and wrap-up once I’m free.
You should know audio modeling — spectrograms, CNNs, maybe wav2vec2 — and be down to win, not just submit something.
DM me with past comp or audio experience. No tourists.
We are trying to speed up our training, but nothing works. We already split the notebook into separate ones (preprocessing, training, and testing & submitting). We now had the idea to precompute the mel spectrograms. But nothing works. Does anyone have any ideas that would help our training run faster? One epoch with two folds already takes a few hours. I will provide our code in the attachment.
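One way precomputation is often set up is an on-disk cache, so each spectrogram is computed once and later epochs only read it back. The sketch below is generic and not based on the attached code: `compute` is a hypothetical placeholder for the mel-spectrogram function, and pickle is used only to keep the example dependency-free.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def cached_features(audio_path: str, cache_dir: str, compute: Callable[[str], Any]) -> Any:
    """Return features for `audio_path`, computing them only on the first request."""
    # Assumes file stems are unique; include the parent folder in the key otherwise.
    cache_file = Path(cache_dir) / (Path(audio_path).stem + ".pkl")
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)

    features = compute(audio_path)  # the expensive step, run once per file
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(features, fh)
    return features
```

If training is still slow with cached features, the bottleneck is more likely in data loading or the model itself than in preprocessing.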
Hi everybody. This is my first competition using this type of submission, and I am having a very hard time getting it to run. This is the code I am using in the notebook:
import glob
import os

test_soundscape_path = "/kaggle/input/birdclef-2025/test_soundscapes"
# "**" with recursive=True also matches .ogg files in subfolders
test_files = sorted(glob.glob(os.path.join(test_soundscape_path, "**", "*.ogg"),
                              recursive=True))
print(f"Found {len(test_files)} test audio files")
I have also used other similar ideas to iterate over files in that folder, but the folder only contains the readme.txt file. I read similar posts, I asked ChatGPT and Qwen, and I cannot understand what I am doing wrong. I am submitting this code to the competition, not running it.
Try checking whether test_soundscapes contains any .ogg files; if not, use train_soundscapes. Your check will succeed when the submission is computed on Kaggle's side, because test_soundscapes will then be "silently" filled with .ogg files.
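The check described above could look roughly like this. The function name is made up, and the train_soundscapes location in the usage note is assumed to sit next to the test_soundscapes path quoted earlier.

```python
from pathlib import Path
from typing import List


def find_soundscapes(test_dir: str, fallback_dir: str) -> List[Path]:
    """Use the test soundscapes if present, otherwise fall back to another folder."""
    files = sorted(Path(test_dir).glob("*.ogg"))
    if not files:
        # Interactive runs: the test folder holds no audio, so debug on the fallback.
        files = sorted(Path(fallback_dir).glob("*.ogg"))
    return files
```

For example, `find_soundscapes("/kaggle/input/birdclef-2025/test_soundscapes", "/kaggle/input/birdclef-2025/train_soundscapes")` would return training soundscapes while debugging and the hidden test files during scoring.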
A very very lame question:
Is the 90-minute CPU-only limit for ALL the processing? For example, can I leave only the fine-tuning part of the model code and load a pre-trained model? If I shrink the dataset gathered from the input data files, I get no satisfactory results, but if I train in CPU-only mode it takes too long to fit in the 90 minutes.
I mean uploading a model pre-trained by me on my PC, using the code from the Kaggle notebook I upload for submission. So it was not published anywhere before, but I am OK with others using it once the competition finishes. It is about the definition of terms in the competition description, which I am not really familiar with.