#hms-harmful-brain-activity-classification
There are a few weird plots like this; need to check the data
I see similar cases in this data
Newbie question: the Kaggle discussion forum is very active for this challenge but there is not much here. Why would you use Discord and not the discussion board? Am I missing something? Thanks!
Hi, I am Kavya and I am looking to join a team for this competition. If anyone is available to team up, we can connect and start working on this ASAP!
Hmm. I took a quick look at some of the data, and I noticed that the voltages change rather rapidly from positive to negative. If I interpret this correctly, it would mean that the period of the EEG waves is 1-2 indices (which I take to be the time unit used). Wouldn't that mean that there is a good chance that the frequency is underrepresented? I'm kind of a beginner though, so I might have got something wrong. Here's a sample graph:
That graph is from the training data file 1000913311.parquet, indices 1-20, column Fp1.
Hmm. I am most likely doing something wrong in how I am interpreting the EEG.
@grave egret you can see in the problem definition that the sampling frequency of the EEG is 200 Hz.
I am aware that the rate at which the EEG is measured is 200 Hz. What I am concerned about is the frequency of the waveforms themselves. Does this graph mean that the waveforms are also, like, 100-200 Hz? Or am I doing something wrong in calculating the waveform frequency?
@grave egret I think that frequency only affects the resolution of the graph.
On the other hand, I think you are getting a strange plot because you are plotting the Fp1 column directly. I am not 100% sure, because I'm not a doctor, but I think the right way to draw that graph is to take the difference between two signals. You can see that in the PDF examples, and I have seen some notebooks with the same approach. I think it makes sense because you are measuring brain activity between two parts of the brain.
If someone else could confirm that, I think it would help
You are right, I have to subtract one electrode from another. I am still trying to figure out what exactly the waveform graph is, though. I have a graph of 10000 indices of the Fp1 - F3 electrodes:
For now what I want to figure out is how to convert the data of the parquet files into a waveform, like the ones we see in a typical eeg (i.e. a line, and not whatever this is on that graph). Some input from an expert would be really appreciated :>
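A minimal sketch of one way to turn two electrode columns into a plottable trace. It assumes the parquet file has been loaded into a pandas DataFrame with one column per electrode, sampled at 200 Hz as the problem definition states. The helper name `bipolar_trace` and the synthetic data are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

FS = 200  # sampling rate in Hz, per the competition description


def bipolar_trace(df, first, second, fs=FS):
    """Return (time_in_seconds, voltage_difference) for two electrode columns."""
    signal = (df[first] - df[second]).to_numpy(dtype="float32")
    t = np.arange(len(signal)) / fs  # row index -> seconds
    return t, signal


# Synthetic stand-in for pd.read_parquet("<eeg_id>.parquet"):
# a 10 Hz wave on Fp1 and a flat F3, 2 seconds long.
n = 2 * FS
demo = pd.DataFrame({
    "Fp1": np.sin(2 * np.pi * 10 * np.arange(n) / FS),
    "F3": np.zeros(n),
})
t, trace = bipolar_trace(demo, "Fp1", "F3")
# plt.plot(t, trace) with matplotlib would draw the trace as a line over time.
```

With a 200 Hz sampling rate, the highest frequency the samples can represent is 100 Hz (the Nyquist limit), and one index corresponds to 5 ms.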
where can the files in <tf-efficientnet-imagenet-weights> and <tf-efficientnet-whl-files> be found?
Hi - I'm trying to reconstruct Chris Deotte's (https://www.kaggle.com/cdeotte) awesome EfficientNet Starter notebook. I added the 2 datasets required for the spectrogram processing, <brain_spectrograms> and <brain_eeg_spectrograms>. Now I'm trying to find the EfficientNet pre-loaded weights seen in Chris's Kaggle/input <tf-efficientnet-imagenet-weights> directory, as well as the "whl" files in <tf-efficientnet-whl-files>, which appear to be particular versions of various Python packages. It'd be greatly appreciated if you could point me in the right direction. Thanks, Matthew
Are you using a kaggle notebook? if so, file > add data > search for dataset
if not, should be able to download with kaggle command line
you can find all the datasets for any given code on its "inputs" page
I've been looking for this image in code/discussion because I remember having seen it there, but I couldn't find where.
There is a discussion about a webinar and I found it there. Maybe it is useful, so here you have it. And the webinar: https://www.youtube.com/watch?v=4Ey8viSAj_o&ab_channel=SciBerloga
🚀 https://t.me/sberlogabig/355 webinar on data science:
👨🔬 Dmitrii Rudenko "Introduction to the Kaggle competition 'HMS - Harmful Brain Activity Classification (https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification)'"
⌚️Friday, 2 February
Announcement on Kaggle (https://www.kaggle.com/competitions/hms-harmful-brain-ac...
seems that RR Right Temporal is RL column on spectrograms
Hey there guys, I am new to Kaggle competitions and I am not sure how to load the data properly. Are there any guides or videos that I can look into?
the only submission I made was done by copying the data loader and changing the model
I'm not sure what you are asking. Data in notebooks is laid out as it would be on your local machine; just click on the right-side options panel for the paths
the specific preprocessing will depend on the competition and won't be unique; many approaches can be applied
if you are asking about datasets and dataloaders, each popular library has its own docs; for pytorch, for example, you can check https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
Yeah that's what I am asking for ..I usually use tensorflow and pytorch feels kinda hard...
thanks
Ya make sure to load in batches it’s a huge dataset
You’re gonna run out of memory fast
Yeah sure, noted 👍 it happened in the last challenge but I found a good starter notebook so it will be fine I guess
all EEGs fit in Kaggle memory; it is 6 min of loading in exchange for faster training, which is better for GPU quota
Hello! Looking for a team for HMS Harmful Brain Activity. I'm aware it's almost over but would like to use the data while it's available. Please dm if interested.
Classify seizures and other patterns of harmful brain activity in critically ill patients
Hi all---this is my first code competition on Kaggle. I'm seeing multiple "training notebooks" (internet enabled, pretrained=True when calling timm.create_model) in which they train + export models publicly to Kaggle, then import into an "inference notebook" (internet disabled) used to score their submission. Just wanted to make sure this is an approach I can use that follows the rules? Or are these just examples people are showing?
Oh I just found a discussion post that confirms that this approach is fine---dropping it here in case anyone needs it: https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification/discussion/475467
Hi guys, does anyone know which eeg id maps to which pdf example?
I think there is no such info. But in the link I gave you can see an example where the author zooms in on the Kaggle spectrograms and compares them with the ones he generates from the EEG.
hi all, is there a specified loss function for the challenge?
recommended KLDiv
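A rough sketch of what a KL divergence loss computes here, assuming the targets are the six vote columns normalized so each row sums to 1 and the predictions are probabilities. The function name `kl_divergence` and the `eps` clipping value are illustrative choices, not taken from the competition code.

```python
import numpy as np


def kl_divergence(y_true, y_pred, eps=1e-15):
    """Mean over rows of sum_k p_k * log(p_k / q_k)."""
    p = np.clip(np.asarray(y_true, dtype=np.float64), eps, 1.0)
    q = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))


# One sample where all votes went to the first class,
# scored against a uniform prediction over the 6 classes.
target = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
uniform = np.full((1, 6), 1.0 / 6.0)
loss = kl_divergence(target, uniform)  # close to log(6)
```

In PyTorch, `torch.nn.KLDivLoss(reduction="batchmean")` plays the same role; note that it expects log-probabilities as its input, so models usually end in `log_softmax` when training with it.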
this competition is frying my brain i wanna cry
Hello, my competition submission gave me an error saying: Submission Scoring Error
These are my final lines of code for predicting and submitting. I checked the sample file and it's literally the same format. I want to know what I did wrong.
table = pq.read_table(test_eeg_files).to_pandas().fillna(0).to_numpy()
normalized_table = torch.tensor(preprocessing.normalize(table))
x1, x2, x3 = sliding_window(normalized_table.unsqueeze(0), sw_size)
x1 = x1.to(device)
x2 = x2.to(device)
x3 = x3.to(device)
with torch.no_grad():
    test_preds = model(x1, x2, x3).softmax(dim=1)
numpy_test_preds = test_preds.detach().cpu().numpy()
df = pd.DataFrame(numpy_test_preds)
df = df.rename(columns={0: 'seizure_vote',
                        1: 'lpd_vote',
                        2: 'gpd_vote',
                        3: 'lrda_vote',
                        4: 'grda_vote',
                        5: 'other_vote'})
df.insert(0, "eeg_id", int(Path(test_eeg_files[0]).stem))
df.to_csv("submission.csv", index=False)
Even a cryptic failure says something about the cause. What did it say to you?
incorrect format?
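One thing that may be worth checking (an assumption; the thread does not confirm the cause): if `sliding_window` yields several windows, `test_preds` has several rows and the CSV would repeat the same `eeg_id`, and only the first test file's id is written. A hedged sketch of collapsing window predictions into one row per recording follows. `build_submission` and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd

TARGETS = ["seizure_vote", "lpd_vote", "gpd_vote",
           "lrda_vote", "grda_vote", "other_vote"]


def build_submission(preds_by_eeg):
    """preds_by_eeg: {eeg_id: array of shape (n_windows, 6)} -> one row per id."""
    rows = []
    for eeg_id, window_preds in preds_by_eeg.items():
        mean_pred = np.asarray(window_preds, dtype=np.float64).mean(axis=0)
        mean_pred = mean_pred / mean_pred.sum()  # keep each row summing to 1
        rows.append([eeg_id, *mean_pred])
    return pd.DataFrame(rows, columns=["eeg_id", *TARGETS])


# Hypothetical predictions: two windows for one test recording.
demo = {123: np.array([[0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
                       [0.3, 0.3, 0.1, 0.1, 0.1, 0.1]])}
submission = build_submission(demo)
# submission.to_csv("submission.csv", index=False)
```

The hidden test set contains more recordings than the visible sample, so looping over every test file and emitting exactly one row per `eeg_id` is the safer shape for the final CSV.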
Hey,
Looking for a team mate for the competition : hms-harmful-brain-activity-clas…
I'm familiar with eda...and sklearn machine learning models...
If interested, then plz respond...
i know the competition is about to end...but 10-15 days are sufficient to make a significant contribution
Is there a way to submit a notebook without it rerunning?
I already ran the notebook to train, and it took 5 hours
submitting takes another 5 hours....
save the trained model, upload it as dataset, and run an inference only notebook
alright, thank you so much
or don't train beforehand: train and run inference directly at submission, and you will save GPU quota
hello, I want to ask about the train_spectrogram data.. I see the spectrogram visualization and I noticed that there are missing data there.. How much will it affect the model? And how to fill those missing data? Sorry, I am a beginner in this field so I don't know much about this..
you can:
- Impute (fill values with a regression or other predictive model)
- Drop columns with many missing values
- Fill with zeroes
- Fill with mean, max, median, etc.
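A small sketch of the simpler options from the list above, assuming the spectrogram has been converted to a 2-D NumPy array with rows as time steps and columns as frequency features. The function names and the 0.5 threshold are illustrative. Model-based imputation is not shown.

```python
import numpy as np


def fill_missing(x, strategy="zero"):
    """Fill NaNs in a 2-D array (rows = time steps, columns = features)."""
    x = np.array(x, dtype=np.float64)  # copy so the input is untouched
    if strategy == "zero":
        return np.nan_to_num(x, nan=0.0)
    if strategy in ("mean", "median"):
        stat = np.nanmean if strategy == "mean" else np.nanmedian
        col_values = stat(x, axis=0)  # one statistic per column, NaNs ignored
        rows, cols = np.where(np.isnan(x))
        x[rows, cols] = col_values[cols]
        return x
    raise ValueError(f"unknown strategy: {strategy}")


def drop_sparse_columns(x, max_missing=0.5):
    """Drop columns whose NaN fraction exceeds max_missing."""
    x = np.asarray(x, dtype=np.float64)
    keep = np.isnan(x).mean(axis=0) <= max_missing
    return x[:, keep]


demo = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, 8.0]])
zero_filled = fill_missing(demo, "zero")
mean_filled = fill_missing(demo, "mean")
```

Which option hurts the model least depends on how much is missing and where, so it is worth comparing them on a validation split rather than assuming one is best.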
My suggestion is to go through the Kaggle Learn housing competition
it's just a day of work at most, and you'll have your answers and more 😇
ohh okayy thank you for your answer! Also I'm curious, can I use train_eegs data to fill the missing data? I kind of have an idea to convert the raw eeg data to spectrogram in order to fill the missing data, but I still don't really know how to do so..
yeah you can
i assume you're doing the #hms-harmful-brain-activity-classification
yes, currently I'm trying that competition.. quite challenging since I don't know anything about EEG at first..
lol
i assumed you would
well it's okay, it's best to read other solutions and learn from them
is it your first competition?
i can try and help, dm me
I saw an explanation posted by Chris Deotte about creating a spectrogram from EEG data but I still don't understand the formula..
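For intuition, here is a generic short-time Fourier transform sketch. It is not the formula from that post, which this thread does not reproduce. It assumes a single 1-D signal sampled at 200 Hz; the function name `stft_power`, the window length and the hop size are arbitrary illustrative choices.

```python
import numpy as np


def stft_power(signal, fs=200, win_len=200, hop=100):
    """Log-power spectrogram: returns (freqs_hz, spec[n_freqs, n_frames]).

    Requires len(signal) >= win_len.
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(win_len)  # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, np.log1p(power).T


# Demo: 10 seconds of a pure 10 Hz sine.
fs = 200
t = np.arange(10 * fs) / fs
demo_signal = np.sin(2 * np.pi * 10 * t)
freqs, spec = stft_power(demo_signal, fs=fs)
peak_hz = freqs[spec.mean(axis=1).argmax()]  # strongest frequency on average
```

Each column of `spec` is the frequency content of one one-second slice of the signal, so stacking the columns left to right gives the time-versus-frequency image.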
yes, this is my first competition 😅
no probs
Just published my first notebook on Kaggle and it's about this competition! Let me know what you think 🙂
https://www.kaggle.com/code/eliaschiavon/data-exploration-from-zero-to-spectrograms
how do you guys handle eeg signals? i tried padding them so it can load as batches, but gpus run out of memory when loading them... is there a way to make them smaller with minimum loss of information?
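One possible way to shrink the arrays, sketched under the assumption that each EEG is an array of shape (n_samples, n_channels): store it as float32 and average blocks of consecutive samples. Block averaging is only a crude low-pass filter, and whether the lost high-frequency content matters for this task is untested here. `downsample_mean` is a hypothetical helper.

```python
import numpy as np


def downsample_mean(x, factor=4):
    """Average each block of `factor` consecutive samples along the time axis.

    x has shape (n_samples, n_channels); trailing samples that do not fill a
    whole block are dropped.
    """
    x = np.asarray(x, dtype=np.float32)
    n_blocks = x.shape[0] // factor
    trimmed = x[:n_blocks * factor]
    return trimmed.reshape(n_blocks, factor, x.shape[1]).mean(axis=1)


demo = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 samples, 2 channels
small = downsample_mean(demo, factor=4)  # 2 blocks of 4 samples remain
```

With `factor=4`, a 200 Hz recording becomes 50 Hz, so only content below 25 Hz can still be represented. `scipy.signal.decimate` is an alternative that applies a proper anti-aliasing filter before subsampling.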
lightgbm is good for this competition
hi has anybody found the correct method for montaging eeg signals, eg: which sensors to subtract from their neighbors for better features
probably there is a better way
but for EEG the one that worked best for me was... give me a minute...
not the net, I've failed with any sequence related model, but electrode montage gave me nice results with CNN
def __data_generation(self, index):
    row = self.df.iloc[index]
    # X holds 10,000 time steps of 8 bipolar channels
    X = np.zeros((10_000, 8), dtype='float32')
    # Default target, so test mode (no labels) still has something to return
    y_prob = np.zeros(6, dtype='float32')
    data = self.eegs[row.eeg_id]
    # === Feature engineering ===
    X[:, 0] = data[:, feature_to_index['Fp1']] - data[:, feature_to_index['T3']]
    X[:, 1] = data[:, feature_to_index['T3']] - data[:, feature_to_index['O1']]
    X[:, 2] = data[:, feature_to_index['Fp1']] - data[:, feature_to_index['C3']]
    X[:, 3] = data[:, feature_to_index['C3']] - data[:, feature_to_index['O1']]
    X[:, 4] = data[:, feature_to_index['Fp2']] - data[:, feature_to_index['C4']]
    X[:, 5] = data[:, feature_to_index['C4']] - data[:, feature_to_index['O2']]
    X[:, 6] = data[:, feature_to_index['Fp2']] - data[:, feature_to_index['T4']]
    X[:, 7] = data[:, feature_to_index['T4']] - data[:, feature_to_index['O2']]
    # === Standardize ===
    X = np.clip(X, -1024, 1024)
    X = np.nan_to_num(X, nan=0) / 32.0
    # === Butter Low-pass Filter ===
    X = butter_lowpass_filter(X)
    if self.mode != 'test':
        y_prob = row[self.config.target_cols].values.astype(np.float32)
    return X, y_prob
It is a simplification of the usual contiguous-electrode subtraction that divides the electrodes into banana regions
The original subtracts all contiguous electrode pairs in each banana region, leading to 4x4 signals
this one does the same but skips one electrode each time, leading to 2x4 signals with more concentrated information
One day left
The double banana regions
LL:
Fp1 - T3
T3 - O1
LP:
Fp1 - C3
C3 - O1
RP:
Fp2 - C4
C4 - O2
RR:
Fp2 - T4
T4 - O2
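The pairs listed above can be generated from the electrode chains instead of being typed by hand. This is a sketch under the assumption that each region is the standard front-to-back chain of five electrodes named as in the EEG parquet columns; `CHAINS`, `chain_pairs` and `montage` are hypothetical names. A step of 1 gives the contiguous 4x4 version and a step of 2 gives the 2x4 version shown in this thread.

```python
import numpy as np

# Assumed electrode order along each chain, front to back.
CHAINS = {
    "LL": ["Fp1", "F7", "T3", "T5", "O1"],
    "LP": ["Fp1", "F3", "C3", "P3", "O1"],
    "RP": ["Fp2", "F4", "C4", "P4", "O2"],
    "RR": ["Fp2", "F8", "T4", "T6", "O2"],
}


def chain_pairs(step=1):
    """List (a, b) electrode pairs taken every `step` positions along each chain."""
    pairs = []
    for electrodes in CHAINS.values():
        picked = electrodes[::step]
        pairs.extend(zip(picked[:-1], picked[1:]))
    return pairs


def montage(data, feature_to_index, step=1):
    """Build an (n_samples, n_pairs) array of differences a - b."""
    cols = [data[:, feature_to_index[a]] - data[:, feature_to_index[b]]
            for a, b in chain_pairs(step)]
    return np.stack(cols, axis=1)


skip_pairs = chain_pairs(step=2)   # 8 pairs, starting with ("Fp1", "T3")
full_pairs = chain_pairs(step=1)   # 16 pairs, starting with ("Fp1", "F7")
```

Calling `montage(data, feature_to_index, step=2)` on an array with one column per electrode yields the same 8 channels as the hand-written feature engineering, in the same order.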