#birdclef-2024
How do we submit? I'm not sure whether we should take all files in the test directory.
I would be interested in forming a team
I am interested in forming a team
Hello, nice to meet you. im also interested in a team. 🙂
Hello guys, I am venkatkumar! If someone wants to team up and work together on BirdCLEF 2024, I am keen to participate in this competition.
Kaggle link: https://www.kaggle.com/venkatkumar001
hey, did you ever find out how to do this?
right now, i have code that reads all files in test_soundscapes, but os.listdir is only returning readme.txt
Hey. You should filter on .ogg files only. If the test folder has no .ogg files, you can take files from unlabeled_soundscapes instead. The test folder is only populated when you submit.
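A minimal sketch of the filter-and-fallback idea above. The function name `list_soundscapes` is made up for illustration, and the directory paths in the usage comment are an assumption about the Kaggle input layout, not something confirmed in this channel.

```python
from pathlib import Path


def list_soundscapes(test_dir, fallback_dir):
    """Return .ogg files from test_dir, or from fallback_dir if test_dir has none."""
    test_files = sorted(Path(test_dir).glob("*.ogg"))
    if test_files:
        return test_files
    # No .ogg files found (e.g. only readme.txt is present), so use the
    # fallback folder to keep the notebook runnable during the commit run.
    return sorted(Path(fallback_dir).glob("*.ogg"))


# Hypothetical usage; the /kaggle/input prefix is an assumption:
# files = list_soundscapes(
#     "/kaggle/input/birdclef-2024/test_soundscapes",
#     "/kaggle/input/birdclef-2024/unlabeled_soundscapes",
# )
```

Filtering by extension also means readme.txt is never passed to the audio loader.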
ah i see, but i mean it still happens when i submit. however, i think ive figured it out: is the notebook run twice? once to commit and submit, and another hidden run with the test folder populated?
Yes, it runs twice usually. You first need to commit then you can submit.
thanks for the answer. its my first time participating in a notebook-only competition
hello birdclef-2024
working on submissions with untrained off-the-shelf base models to find out whether the 120-minute run-time limit can be met
unlabeled_soundscapes ogg files seem 4-min long, one could use 1100 of them to estimate the submission time
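A rough sketch of that estimate: time the pipeline on a sample of files, then scale up. The function names are hypothetical, and the defaults (1100 files, 120 minutes) are simply the numbers mentioned above, used here as assumptions about the hidden test run.

```python
import time


def measure_seconds_per_file(process_fn, files):
    """Average wall-clock seconds that process_fn takes per file."""
    start = time.perf_counter()
    for f in files:
        process_fn(f)
    return (time.perf_counter() - start) / max(len(files), 1)


def estimate_runtime_minutes(seconds_per_file, n_files=1100):
    """Extrapolate the measured per-file time to the assumed number of files."""
    return seconds_per_file * n_files / 60.0


def fits_time_limit(seconds_per_file, n_files=1100, limit_minutes=120.0):
    """True if the extrapolated run-time stays within the assumed limit."""
    return estimate_runtime_minutes(seconds_per_file, n_files) <= limit_minutes
```

Under those assumptions the budget works out to 120 * 60 / 1100, roughly 6.5 seconds per file, including decoding and spectrogram preparation.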
The time limits are quite annoying in these competitions
We want to deploy the winning solutions in the field where compute resources are limited, hence the computational constraints of the competition
I guess that by this restriction people will tend to use already trained models
host confirmed train labels done by community and test labels done by highly skilled bird experts - no wonder why label smoothing is used
some of the unlabeled_soundscapes ogg files are not 4 min long, such as birdclef-2024/unlabeled_soundscapes/1005741050.ogg
estimated run-time of EfficientNetV2B0 using unlabeled_soundscapes is about 30min
Submissions scored 0.49 and took same time as estimated run-times
now is the time to train the model and hopefully score better than 0.49, will see what happens
basic training in progress - no augmentation, just random 10-sec specs. gpu is barely utilized; the bottleneck is decoding ogg and preparing the spec, so the model has to wait for the image. estimated 1 hr per epoch, total 8 epochs, 8 hours, 🍝 and ☕
8 epochs were not enough to make any difference on the leaderboard score. more epochs scored 0.50
there was a mistake in the submission code, when fixed, scored 0.61
I always try to put the whole dataset into GPU memory to maximize utilization during experimentation (although the tradeoff is that you might have to reduce the resolution and/or precision of the data to get it small enough to fit)
pytorch and other libraries offer audio->spectrogram transforms that work on the GPU. If it were possible to do ogg->waveform decoding on the GPU too then that would be really cool.
For anyone who needs spectrogram data to train an image classification model, I made a dataset which contains most of the audio data separated into 5 second spectrograms. You can find the dataset here:
https://www.kaggle.com/datasets/nathaniellybrand/birdclef-2024-mel-spectrograms
Wouldn’t it be better to train on random crops instead of separating them ahead of time?
I am not sure what you mean by that. If you wanted you could shuffle the data so that they do not appear in order, I made the dataset so that people wouldn't have to convert the audio to spectrograms at runtime, since it takes longer to train.
I mean that random cropping gives training more diverse data. By precropping the model only sees 0-5, 5-10, 10-15, etc. Perhaps it learns to overfit on those particular crops.
But when random cropping, the model is given more arbitrary parts of a recording, like 2.34-7.34, 8-13 etc.
You can still compute the spectrograms ahead of time, just without splitting them up
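A small sketch of the random-crop idea from this thread, assuming each recording is stored as one full-length spectrogram array of shape (n_mels, n_frames). The name `random_crop` and the zero-padding of short recordings are illustrative choices, not anything taken from the dataset linked above.

```python
import numpy as np


def random_crop(spec, crop_frames, rng):
    """Return a random window of crop_frames columns from spec (n_mels, n_frames)."""
    n_frames = spec.shape[1]
    if n_frames <= crop_frames:
        # Recording shorter than the window: pad with zeros on the right.
        pad = crop_frames - n_frames
        return np.pad(spec, ((0, 0), (0, pad)))
    # The upper bound of integers() is exclusive, so the last valid start is included.
    start = rng.integers(0, n_frames - crop_frames + 1)
    return spec[:, start:start + crop_frames]


# Hypothetical usage: draw a fresh crop each time a sample is requested.
# rng = np.random.default_rng()
# crop = random_crop(full_spec, crop_frames=313, rng=rng)  # 313 is a made-up width
```

Because the start frame is drawn anew on every call, the model sees different windows of the same recording across epochs while the spectrogram itself is still computed only once.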
You could certainly do that, but since the model is evaluated in 5 second chunks, it probably doesn't matter much.
During training you want the model to see as much variety of data as possible, but precropping would limit that
You can feel free to generate your own spectrograms, there is a plethora of notebooks teaching you how to do so.
Yes I have. Just curious as to why you’re taking this approach
Is it possible to upload trained in Colab model when submitting? Or I should always use Kaggle to train my model?
model can be trained anywhere. kaggle is needed to submit.
🤔 no joy yet - random leaderboard scores of 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61 by different model sizes and input sizes using random 10-sec specs without any augmentations - there seems to be no noticeable correlation between local validation and leaderboard performance - i wonder if i can improve the score by utilizing various augmentations and techniques...
Hi, just a beginner-level question: since the dataset is nearly 24 GB, can everything (mel spectrograms, model training, etc.) be done in individual Kaggle Notebooks, or do we need an external training environment and hardware resources?
in my opinion, all can be done happily using kaggle resource, however, any additional external resources are always helpful.
Agree with @coarse drift's comment. All should be good on kaggle itself. Only limitation is the 30 hrs of GPU, 8 hrs of notebook runtime, and storage space for saving a bunch of preprocessed data (usually not an issue but if you're saving like 100+ GBs of data, I think that's an issue).
If you haven't already discovered it, using del and gc.collect() will be your friend for not exploding your RAM if you're doing data preprocessing on a bunch of files.
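A minimal sketch of that pattern for a preprocessing loop. The `load_fn`, `transform_fn` and `save_fn` callables and the `collect_every` parameter are placeholders for whatever the pipeline actually uses, so treat this as an illustration only.

```python
import gc


def preprocess_files(paths, load_fn, transform_fn, save_fn, collect_every=50):
    """Process files one at a time, dropping large intermediates as we go."""
    processed = 0
    for path in paths:
        raw = load_fn(path)            # e.g. decoded waveform
        features = transform_fn(raw)   # e.g. mel spectrogram
        save_fn(path, features)        # persist the result instead of keeping it in a list
        # Drop our references so the arrays can be freed before the next file.
        del raw, features
        processed += 1
        if processed % collect_every == 0:
            # Force a collection pass to clean up any lingering reference cycles.
            gc.collect()
    gc.collect()
    return processed
```

The main saving comes from not accumulating every intermediate array in memory; `gc.collect()` is a safety net and is called only every few files here because it has a cost of its own.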
Also, downloading the data to a cloud like AWS, Azure, or what I have been hot on recently, RunPod, is pretty straightforward and only takes like 2+ hours using a smallish VM. Then you can go ham on GPU, granted you have the money for it. Happy kaggling on birds.
Does anyone happen to know why this error occurs every time I save a version of my notebook: nbclient.exceptions.DeadKernelError: Kernel died? This is the third time it has happened! The notebook is used for training my model (efficientvit_b1.r288_in1k), it doesn't have any errors and doesn't exceed the 9-hour limit (it barely exceeds 1.5 hours!). Any help would be much appreciated
Here is my notebook - https://www.kaggle.com/demko1/train-birdclef-2024-pytorch-melspectrogram
As it turned out, the error arose because the notebook was running out of RAM or GPU memory. Already solved it!
Hi all,
I have trained an EfficientNetV2L (also tried B0) convolutional neural network with an extra 128-unit Dense layer tacked on the end. Every submission I have attempted has timed out on CPU time, and I'm very confused about how I could fix this.
I load the audio with librosa, convert to dB, convert to a grayscale image with tensorflow and use that to predict my outputs.
Based on example solutions people have posted, I'm scratching my head over why my submission is timing out. I've run my predictions on the unlabeled soundscapes and it has a pretty decent prediction rate of around one 5s clip per 2-3s
I haven't been able to get submissions to not timeout unless they were around 8 seconds per file or under
I'm getting about 380-400ms per 5s segment
Sanity checking: I submit by splitting each and every audio file into each and every 5s segment, which I then predict individually.
Edit: I'm getting 90ms per 5s segment and it's still timing out.
predicting individually is likely to incur unnecessary overhead. split each audio file into 48 chunks of 5s, then predict them as one batch of 48.
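A sketch of the batching suggestion above, assuming the audio is already loaded as a 1-D NumPy array. `predict_batch_fn` stands in for the real model call, and the 32000 Hz sample rate in the usage comment is an assumption, not something stated in this channel.

```python
import numpy as np


def split_into_segments(audio, sample_rate, segment_seconds=5):
    """Reshape a 1-D waveform into (n_segments, samples_per_segment)."""
    seg_len = sample_rate * segment_seconds
    n_segments = len(audio) // seg_len
    # Drop any trailing samples that do not fill a whole segment.
    return audio[: n_segments * seg_len].reshape(n_segments, seg_len)


def predict_file(audio, sample_rate, predict_batch_fn):
    """Run one model call per file instead of one call per 5s segment."""
    segments = split_into_segments(audio, sample_rate)
    return predict_batch_fn(segments)


# Hypothetical usage with a Keras-style model:
# probs = predict_file(audio, 32000, lambda x: model.predict(to_spec(x), verbose=0))
# where to_spec is your own waveform-to-spectrogram step.
```

A 4-minute file gives 240 / 5 = 48 segments, so each file becomes a single batch of 48 and the per-call overhead is paid once per file rather than 48 times.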
are you getting a batch size of 48 on cpu?
Yes. training with gpu batch size of 64 and inferencing with cpu batch size of 48. what's your batch size?
Can confirm the batch size 48 solved my timing out issue
it's 128 in training, but I was only doing 1 in inference. I might be able to submit a bigger model by doing 48
Hi, again a beginner-level question! I am still confused about random cropping during training. For example, I currently crop 15 seconds of audio into 5-second chunks from the beginning and end of each recording. Isn't there a chance that a 5-second mel spectrogram will contain no bird call at all, only noise? I am not even able to get past 50% accuracy with EfficientNet models (I know scoring is macro-averaged ROC). THANK YOU SO MUCH FOR HELPING OUT
Yes, that is correct. There is a chance you won't get the bird noise. The keras starter notebook takes the approach of just using 10s windows for training and 5s for inference. Supposedly it transfers alright
(I am also beginner level)