#birdclef-2025

chilly bramble
#

It is back!

robust kettle
#

Hello

crimson trail
#

Hello

wide monolith
#

Ola

#

Dear competition creators, why are failed submissions counted toward the number of submissions? This doesn't make any sense.

#

I have made 4 failed submissions, all because there was no clarity on how to submit, and now I am left with only 1 submission.

#

Now the 5th and last submission is running. If this also fails, I am done. It is really demotivating.

pale tundra
robust kettle
#

I am an amateur, coming back to Kaggle after 3+ years. Looking for a teammate for this competition.

wide monolith
junior remnant
#

Hi, I'm looking for a team to join. I am a machine/deep learning practitioner. I have not worked with audio data in the past, but I can do any task with text and tabular data. If you have a vacant spot in your team, count me in.

wide monolith
#

hey guys, does horizontally flipping the spectrograms make sense? I know vertical flipping doesn't make any sense because our frequency bands would be altered, but is flipping the time axis (horizontal flipping) a good augmentation?

wide monolith
#

has anyone tried training models in TPU. I am using tensorflow, but when I am setting up the strategy.

high flicker
wide monolith
#

hi guys, I am running an inference notebook. When I ran it on GPU, it was submitted successfully, but when I run it on CPU, my submissions fail

#

what can be the reason?

chilly bramble
wide monolith
#

I don't know. After 11 mins of running, it just got a submission scoring error. And as I said, the same GPU notebook failed when I ran it with CPU only.

To give some context, I am using joblib to process the audio data into mel specs for all audio segments, and then running a 3-model ensemble. When run with GPU, the notebook was submitted successfully. But just as an experiment I changed my accelerator to None (i.e. CPU only), and it went to a submission scoring error.

#

😭 WHY

chilly bramble
#

I think you need to use openvino or something like that

wide monolith
#

Thanks man🎩 . Will explore

wide monolith
#

I found my issue. I was using joblib to process all the audio files.

After that was done, I was giving them to dataloaders for inference.

This 2-step approach was jamming my CPU and RAM. Now I am doing my processing in batches, and it is working perfectly with CPU.
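
A minimal sketch of the batched approach described above, not the poster's actual notebook. `featurize` and `predict` are hypothetical callables standing in for the mel-spectrogram step and the model ensemble, and the batch size is an arbitrary assumption. The point is that only one batch of features is held in memory at a time.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def run_inference(paths, featurize, predict, batch_size=8):
    """Featurize and predict one batch at a time.

    `featurize` (path -> features) and `predict` (list of features ->
    list of outputs) are hypothetical placeholders for the real steps.
    """
    results = []
    for batch in batched(paths, batch_size):
        features = [featurize(p) for p in batch]  # only this batch is in memory
        results.extend(predict(features))
    return results
```

With this shape, peak memory depends on `batch_size` rather than on the total number of audio files.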

wide monolith
#

hi Guys, can someone tell me what is the use of the train_soundscapes? They have no labels right, so how can I use it?

arctic steppe
# wide monolith hi Guys, can someone tell me what is the use of the train_soundscapes? They have...

I believe they provide those as a means to give additional train data. You train your model without it and then utilize the model on it, identify labels, and then you can use the now labeled data for additional training. I believe previous year winners utilized it in this way for past BirdCLEF competitions where it was provided in this manner as well. The downside being if your model isn't good it may just confuse the model or otherwise reinforce inaccuracies with bad labeling.
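
A rough sketch of the pseudo-labeling step described above, under stated assumptions: the prediction format (a dict of per-label probabilities per segment) and the 0.9 confidence threshold are illustrative choices, not something taken from past winning solutions.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only segments whose top predicted probability reaches `threshold`.

    `predictions` maps a segment id to a {label: probability} dict
    (an assumed format). Returns {segment_id: label} for the confident
    segments, which could then be added to the training set.
    """
    selected = {}
    for segment_id, probs in predictions.items():
        label, score = max(probs.items(), key=lambda kv: kv[1])
        if score >= threshold:
            selected[segment_id] = label
    return selected
```

Raising the threshold keeps fewer segments but lowers the risk, mentioned above, of reinforcing bad labels.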

wide monolith
#

yeah little risky 😬

#

But Thanks for answering🎩

novel ember
#

Hi everyone, I am new to this competition, I have just created a baseline but failed to submit notebooks.

#

This is my baseline notebook, I am new to Kaggle so I wanted to team up with someone.

wide monolith
# novel ember Hi everyone, I am new to this competition, I have just created a baseline but fa...

Couple of questions:

  • is it GPU submission or CPU submission?
  • What is the error you got on submission: is it Submission scoring error or Timeout error

There is a limit of 90 mins for submission
Also, internet is off, so if you are downloading some model configs then that will be an issue
Check your submission file, I have attached my sample submission image, make sure the row_ids are properly generated
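
A small sketch for checking row_id generation. It assumes, and this should be verified against sample_submission.csv, that each row_id is the soundscape file name without extension followed by the end second of a 5-second segment, and that each soundscape is 60 seconds long.

```python
from pathlib import Path


def make_row_ids(soundscape_path, duration_s=60, segment_s=5):
    """Build one row_id per segment as '<file stem>_<end second>'.

    The id format and the 60 s duration are assumptions to confirm
    against the competition's sample submission file.
    """
    stem = Path(soundscape_path).stem
    return [f"{stem}_{end}" for end in range(segment_s, duration_s + 1, segment_s)]
```

Comparing the generated ids with the first rows of the sample submission is a quick way to catch a format mismatch before spending a submission.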

wide monolith
#

Hello guys, I am seeing some audio files that contain human voices. Will the actual test data have them?

dawn abyss
wide monolith
#

no no, I was replying to @novel ember question

brazen mica
#

Is anyone interested in teaming up for this competition? I have experience with audio classification

lime meadow
novel ember
strange phoenix
#

Hello there, I am looking to participate in this competition, and I have doubts about the dataset. I see some human and bird voices also. The names of the species are not present. Would like to get some insights from the community

lime meadow
full isle
#

I entered knowing im gonna do it just for the practice but 0.88 with 2 months to go is craaazy

wide monolith
#

I created a notebook to make a dataset. The notebook shows there are 58k images, but when I create a dataset from the notebook's output, it shows only 500 files. What is happening? Am I doing anything wrong?

flat night
#

Hi team! I'm a Kaggle newbie and had some questions on the competition setup:

  1. Does training AND inference have to run in under 90 minutes? Or just inference? If it's the former, why do some example notebooks like this just load checkpoints?
  2. What motivates folks to share their submissions and approaches on the open forum?
  3. What's the policy on external training datasets? I would think they're not allowed, but posts like this have me confused

uneven sundial
#

On 2, a lot of people have an innate desire to share their knowledge and understanding.

#

This satisfies a primitive urge and may in some cases improve their social status.

flat night
#

Can you use external datasets and pretrained models for official submissions? If so, doesn't that contradict "internet access disabled"?

#

oh I guess running CPU inference on 700 samples in 90 minutes is itself quite limiting
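
As a back-of-the-envelope check of that budget, using only the two figures quoted in this chat (90 minutes, 700 samples) and ignoring fixed costs such as model loading:

```python
def seconds_per_sample(limit_minutes=90, n_samples=700):
    """Average wall-clock seconds available per sample within the time limit."""
    return limit_minutes * 60 / n_samples


budget = seconds_per_sample()
print(f"{budget:.1f} s per sample")  # 5400 s / 700 samples
```

That is under 8 seconds per sample for loading, feature extraction, and every model in the ensemble.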

vale brook
#

I'm having a problem: when I submit my code, I'm not given any data in test_soundscapes

#

Does anyone know why this would be happening?

flat night
#

Have you tried making an official solution like so?

vale brook
vale brook
#

At first, I thought that I have some blunder in my code. But with more adjustments I found out that even when I copied a few lines of code from BirdCLEF+ 2025: Simple Submission that were supposed to read all files in test_soundscapes/ and print length of that list, it printed out "0 files"

flat night
#

oh the UI is potentially confusing. As far as I know, if you're able to print any output, you're likely running in a Kaggle notebook, in which case they intentionally leave the test set unpopulated.

just to make sure, if you're on this page, can you try hitting this button?
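
Since the test folder is left unpopulated in interactive sessions, one common workaround is to fall back to a few training soundscapes for debugging. The sketch below is an assumption about how that could look; the function name, the fallback directory, and `debug_limit` are hypothetical and not part of any official notebook.

```python
import glob
import os


def list_soundscapes(test_dir, fallback_dir, debug_limit=3):
    """Return test .ogg files, or a few fallback files when the test dir is empty.

    During scoring, `test_dir` is expected to be populated, so the
    fallback branch should only run in interactive sessions.
    """
    test_files = sorted(glob.glob(os.path.join(test_dir, "*.ogg")))
    if test_files:
        return test_files
    fallback = sorted(glob.glob(os.path.join(fallback_dir, "*.ogg")))
    return fallback[:debug_limit]
```

This keeps the rest of the notebook runnable in both situations, so a crash on an empty file list does not hide other bugs.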

vale brook
#

yes I tried this

#

it starts running and then fails, and in the log I can see that when my code was trying to read data from test_soundscapes it couldn't because there weren't any

peak apex
#

whats up with #1 having .902 😭

#

its gotta be overfitting public LB right?

chilly bramble
chilly bramble
#

I'm looking for motivated teammates.
I'm particularly interested in collaborating with people who are passionate, diligent, and eager to learn together. I'm from South Korea, so teammates comfortable with international collaboration and open communication would be ideal.

sonic tundra
peak apex
#

why do most of these birdclef solutions have CNN models that are trained on 5 folds and then they ensemble all 5 folds

#

why not just train on the entire dataset in 1 fold and use 1 model trained on everything?

arctic steppe
# peak apex why not just train on the entire dataset in 1 fold and use 1 model trained on ev...

5-fold training splits the data into 5 buckets; each model trains on 4 buckets and uses the remaining one as its validation set, and the 5 resulting models are then ensembled for better generalization. The difficulty with 5 folds in this particular competition is that many labels don't have enough data, so some folds end up completely void of some labels. Some of the rarer classes only have about 2 oggs in total. One option would be to create multiple melspecs from each ogg so that you have enough for multiple folds, but you risk data leakage there, likely causing overfitting.

Perhaps the best option would be to round robin assign the rare classes and then only keep folds for testing that have one of the rare classes in. So 5 fold but utilizing only 2 of the folds given the rarity of the classes (where each of those 2 folds are the validation set).

For the full dataset you could do a train/validation split as long as you round robin the rare classes.
Perhaps 5 folds still gives the best general answer given how rare the rare classes are, so folks relying on it end up doing well, and the benefit of properly splitting the rare ones is minor given there isn't much there to learn from.
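
The round-robin idea above can be sketched as follows. This is an illustrative assumption about the assignment rule (cycle through fold indices within each class, in input order), not a description of any particular solution; a real split would normally shuffle first.

```python
from collections import defaultdict


def assign_folds(labels, n_folds=5):
    """Return a fold index per sample, cycling through folds within each class."""
    next_fold = defaultdict(int)
    folds = []
    for label in labels:
        folds.append(next_fold[label] % n_folds)
        next_fold[label] += 1
    return folds


def folds_covering(labels, folds, label):
    """List the folds that contain at least one sample of `label`."""
    return sorted({f for l, f in zip(labels, folds) if l == label})
```

With this rule, a class that has only 2 recordings appears in exactly 2 folds, which are the only folds where it can be validated.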

peak apex
quaint wharf
#

Hi, I have one question. A lot of audio files are longer than the typical chunk duration of 5 or 10 seconds. Let's for example take the first two training examples that have the primary_label 1139490. The corresponding audio files are CSA36385.ogg and CSA36389.ogg. They are respectively 1:39 and 1:37 minutes long. Do you just truncate the audio files and only pick the first 5 or 10 seconds? Or is it a better idea to create more training samples with this primary_label? If we take a chunk duration of 5 seconds, then non-overlapping chunks give 99 // 5 + 97 // 5 = 19 + 19 = 38 training samples instead of 2. Or is this not advisable?
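
For reference, the number of non-overlapping chunks in a recording is the floor division of its duration by the chunk length. A minimal sketch, with durations in seconds and any trailing partial chunk dropped (padding it instead is another option):

```python
def chunk_bounds(duration_s, chunk_s=5):
    """Return (start, end) second offsets of the full non-overlapping chunks."""
    n_chunks = int(duration_s // chunk_s)
    return [(i * chunk_s, (i + 1) * chunk_s) for i in range(n_chunks)]


# The two recordings mentioned above: 1:39 = 99 s and 1:37 = 97 s.
total = len(chunk_bounds(99)) + len(chunk_bounds(97))
print(total)  # 19 + 19 = 38
```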

arctic steppe
# quaint wharf Hi, I have one question. A lot of audio files are longer than the typical chunk ...

One difficulty with using multiple from the same source is the frequency of animal calls.

I noticed:

  • Usually the subject makes a noise within the first 5/10s as the person uploading to the service cropped it so they are immediately being heard (since these come from those naturalist sites)
  • After the first vocalization it can vary, with some animals making noises consistently while others have a break between their vocalizations
  • Some recordings have humans either annotating after the sound or human voices intermixed with the animal sounds (the raccoons for example had at least one where there were people commenting on hearing raccoons and some author's samples have like 5s of the animal and then a minute of annotation consistently)
  • There's a chance that by including multiple from the same source file you'll overfit especially if the multiple end up in various folds / aren't grouped during splitting

I think the first 5/10s are probably the safest, but you could include the others with caution to avoid overfitting. Some concern about uneven vocalizations could be offset by using a model like Perch to detect whether the bird was present in that segment: https://www.kaggle.com/models/google/bird-vocalization-classifier It doesn't contain all of the species for the competition, so it won't cover some instances, but it may help to identify segments of value for the given set.

There has been some discussion about identifying the human voices via silero-vad on the forum: https://www.kaggle.com/competitions/birdclef-2025/discussion/568886 I did notice some bird vocalizations are misclassified as humans with this approach though so some caution is needed.
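
On the grouping concern in the list above: one way to keep segments from the same source file out of different folds is to assign folds per file rather than per segment. A minimal sketch, where the rule of assigning files to folds in order of first appearance is a simplifying assumption:

```python
def group_folds(source_files, n_folds=5):
    """Give every segment the fold of its source file, so no file spans folds.

    `source_files` holds one source file name per segment.
    """
    fold_of_file = {}
    for name in source_files:
        if name not in fold_of_file:
            fold_of_file[name] = len(fold_of_file) % n_folds
    return [fold_of_file[name] for name in source_files]
```

scikit-learn's `GroupKFold` provides the same guarantee and is likely the more convenient choice when scikit-learn is available.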

round lintel
#

Hey, a dumb question, but should my model consider audio segments with no bird sounds? I mean is it possible that the test soundscapes contain 5 second segments with no sounds at all? Do you guys do something about it during training?

heady cairn
#

Hello Everyone,

As you all will have noticed, the audio files in the training dataset contain audio of the species along with human annotations. I thought of cleaning this by using VAD models (usually used to detect human speech segments for speech diarization).

I developed a python script to get the time stamps of non speech segments in the form of start and end timestamp lists for each audio sample.
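
As an illustration of the kind of output described here (not the linked notebook's code), non-speech timestamps can be derived by inverting a list of detected speech intervals. The `(start, end)` tuple format in seconds is an assumption about what the VAD model returns.

```python
def non_speech_segments(speech, total_s):
    """Return the gaps between (start, end) speech intervals within [0, total_s].

    Overlapping or unsorted speech intervals are handled by sorting
    and tracking the furthest end seen so far.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(speech):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_s:
        gaps.append((cursor, total_s))
    return gaps
```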

My concern is that we don't have any ground truth. Is there any possible way to evaluate the results? TIA

Here's the link to the notebook : https://www.kaggle.com/code/divyaprakashr/birdclef-2025-non-speech-activity-detection/edit

lapis night
#

Hey there, I just came across this competition, checked the past years' records, and noticed the competitive score range has dramatically shifted upward. Does this suggest this competition became "easier" this year?
I can't seem to figure out the reason, since a wide variety of species has been added this time.
I'd be happy to hear from anyone, thanks!

real hull
real hull
lapis night
real hull
#

I'm playing around with early stopping as well but for some reason it makes performance worse more often than not and I'm not sure why 😅

lapis night
real hull
# lapis night Thanks, how is your score looking with these?

Highest we've gotten is 0.807, but I believe it should be able to reach 0.829 with an untouched dataset with default stopping? Right now I am running my dataset cleanser to properly remove voices and silences this time, so hopefully that will go up

lapis night
simple quest
#

Looking for 1 serious teammate for BirdCLEF 2025. Deadline is June 5. I haven’t started yet — just wrapping up Drawing with LLMs and Image Matching first (both end May 27–31).

My goal is Top 5 minimum. I’ll go all in on BirdCLEF starting June 1, but I want someone who can start groundwork now — loading the data, testing a few baseline models, figuring out label issues, and setting up basic training.

I don’t have compute. You’ll need to train on your end or use Colab. I can handle pipeline logic, ensembling, eval logic, and wrap-up once I’m free.

You should know audio modeling — spectrograms, CNNs, maybe wav2vec2 — and be down to win, not just submit something.

DM me with past comp or audio experience. No tourists.

quaint wharf
#

We are trying to speed up our training, but nothing works. We already split the notebook into separate ones (preprocessing + training + testing & submitting). We then had the idea to precompute the mel spectrograms, but that did not help either. Does anyone have any ideas that would help our training run faster? One epoch with two folds already takes a few hours. I will provide our code in the attachment.
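
For readers following along, precomputing usually means caching each file's features on disk so that later epochs only read them back. The sketch below is a generic assumption, not the attached code: `compute` is a hypothetical callable standing in for the mel-spectrogram step, and pickle is used only to keep the example dependency-free.

```python
import hashlib
import os
import pickle


def cached_features(audio_path, compute, cache_dir):
    """Return features for `audio_path`, computing them only on the first call.

    `compute` (path -> features) is a hypothetical placeholder for the
    real feature extraction.
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(audio_path.encode("utf-8")).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)
    features = compute(audio_path)
    with open(cache_file, "wb") as fh:
        pickle.dump(features, fh)
    return features
```

If training is still slow with cached features, the bottleneck is probably elsewhere, for example in data loading settings or the model itself.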

nova nebula
#

Hi everybody. This is my first competition using this type of submission, and I am having a very hard time getting it to run. This is the code I am using in the notebook:

import glob
import os

test_soundscape_path = "/kaggle/input/birdclef-2025/test_soundscapes"
# "**" with recursive=True also matches .ogg files in subfolders
test_files = sorted(glob.glob(os.path.join(test_soundscape_path, "**/*.ogg"),
                              recursive=True))
print(f"Found {len(test_files)} test audio files")

I have also tried other similar ways to iterate over the files in that folder, but the folder only contains the readme.txt file. I read similar posts, I asked ChatGPT and Qwen, and I cannot understand what I am doing wrong. I am submitting this code to the competition, not just running it.

wind cedar
hallow lance
#

A very very lame question:
Is the 90-minute CPU-only limit for ALL the processing? Like, can I leave only the fine-tuning part of the code and load a pre-trained model? If I shrink the dataset gathered from the input data files, I get no satisfactory results. But if I train in CPU-only mode, it takes too long to fit in the 90 minutes.

#

I mean uploading a model pre-trained by me on my PC, using the code from the Kaggle notebook I upload for submission. So it was not published anywhere before, but I am OK with someone using it once the competition finishes. It is about the definitions of terms in the competition description, which I am not really familiar with.