#llm-detect-ai-generated-text | Kaggle | Page 1

wise pecan Nov 1, 2023, 9:22 AM

#

hello

light smelt Nov 2, 2023, 8:55 AM

#

Hi

unborn dawn Nov 2, 2023, 3:52 PM

#

hehe

bright oriole Nov 2, 2023, 4:44 PM

#

The competition rules state:
“To the extent your Submission makes use of generally commercially available software not owned by you that you used to generate your submission, but that can be procured by the Competition Sponsor without undue expense, you do not grant the license in the preceding sentence to that software.”

Does this prohibit the (extensive) use of a (fine-tuned) closed-source foundation model, such as GPT-4, which are pay-per-use? Could I use such models for generating training data or for making predictions?

ember nymph Nov 2, 2023, 6:58 PM

#

hello

#

Are we allowed to use pre-trained models like BERT, CalBERT, etc.?

sturdy kraken Nov 3, 2023, 7:08 AM

#

hey guys beginner question here...can someone explain how submissions work in this competition?

wind quiver Nov 3, 2023, 7:12 PM

#

ember nymph Are we allowed to use pre-trained models like BERT, CalBERT, etc.?

Yes.

ember nymph Nov 3, 2023, 7:47 PM

#

thanks

bleak shuttle Nov 7, 2023, 8:21 PM

#

so, to be able to submit, we need to make 2 notebooks, where the second one loads the trained model and tokenizer (if applicable) from the kaggle working space (or from hard drive?), and then infer from those on the test data?

#

if someone would have any pointers, much appreciated

#

or can I first download the model and tokenizer and then use those? (in my case distilbert-base-uncased-finetuned-sst-2-english from hf)

sturdy kraken Nov 8, 2023, 8:21 AM

#

You gotta download them with internet turned on first (then download to local and upload as a dataset), then load them from the custom dataset directory

#

Don't forget to turn internet off when submitting the notebook

bleak shuttle Nov 8, 2023, 8:22 AM

#

ok thanks, will try that

turbid wraith Nov 8, 2023, 11:47 AM

#

I need a team for this project can anyone invite me

steep roost Nov 9, 2023, 2:54 PM

#

Hi, I have some experience in ML/DL and I'm interested in LLMs, but I don't have much experience with Kaggle competitions (I've attended a few). Does anyone want to team up with me for this project?

ember nymph Nov 11, 2023, 11:37 PM

#

steep roost Hi, I'm have some experiences in ML/DL and interested in LLMs, but I don't have ...

Hello! If you still haven't found a team, I'd love to join up!

ember nymph Nov 12, 2023, 8:55 AM

#

I'm looking for a team as well, I'd love to join up. I have some interesting text features to share, not based on word counts, or tf idf.

honest kettle Nov 13, 2023, 3:43 AM

#

ember nymph I'm looking for a team as well, I'd love to join up. I have some interesting tex...

Wanna work together? I have a decent background in ML. I know a variety of ML methods like neural networks, logistic regression, etc.

ember nymph Nov 13, 2023, 8:48 AM

#

Can someone help me out understanding the difference between the training data, and the data used during the submission scoring?

#

I'm using daigt_external_dataset.csv and train_essays_RDizzl3_seven_v1.csv as training data.

#

I've done a write up to better explain my problem / observation: https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/455056

LLM - Detect AI Generated Text

Identify which essay was written by a large language model

ember nymph Nov 13, 2023, 8:11 PM

#

Why did the competition organizers not supply a representative test set?

#

Is this because they don't have enough human written essays, and generated ChatGPT essays?

#

I find it really silly they can't supply 50 each as test.csv and supply a much bigger test.csv during the submission scoring.

#

I find it equally silly that the process by which this "test.csv used during scoring" is created is not public knowledge.

sturdy kraken Nov 15, 2023, 6:02 AM

#

Hey guys, I'm thinking of using langchain/hf to make a dataset of LLM essays (using the prompts from train_dataset). Does anyone want to team up?

onyx pulsar Nov 17, 2023, 1:45 PM

#

Anyone willing to share some ideas on how to improve?

#

My best code has 0.91 auc on the leaderboard, but it's really hard to go up from here.

#

Anyone willing to do a code review for feedback? 🙂

#

On a related note, can I see what code others used? Or is this hidden?

lean sluice Nov 19, 2023, 10:42 PM

#

hi

normal topaz Nov 21, 2023, 3:26 AM

#

Hi, I'm looking for a teammate. I have decent experience in NLP and training machine learning/neural network based models.
Pls DM 🙂

woeful wasp Nov 21, 2023, 11:26 PM

#

hey

#

I am new to kaggle competitions

#

I trained my model and saved checkpoint to /kaggle/working directory, but I cannot load it during test time

#

isn't /kaggle/working folder preserved during testing?

mint phoenix Nov 22, 2023, 2:15 PM

#

Should be fine to load it

#

I am also saving it and then loading it again

#

Works fine for me

mint phoenix Nov 22, 2023, 2:53 PM

#

I have made a guide on how to train a tokenizer from scratch

#

https://www.kaggle.com/code/tonyyunyang99/how-to-use-a-tokenizer

How to use a Tokenizer?

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

woeful wasp Nov 23, 2023, 1:54 AM

#

mint phoenix I am also saving it and then loaded it again

interesting, I tried a lot but couldn't load from the /working folder. Now I am training in one notebook, saving the weights, and then uploading them to the submission notebook as a dataset..

mint phoenix Nov 23, 2023, 8:50 AM

#

woeful wasp interesting, I tried a lot but couldn't load from /working folder. Now I am trai...

omg...

lofty mirage Nov 25, 2023, 9:59 PM

#

I'm currently at place 6 but imo haven't done anything special

#

Has anyone used something other than tf-idf

#

With a decent score?

ember nymph Nov 26, 2023, 1:41 AM

#

Has anyone been using ChatGPT APIs?

acoustic flame Nov 26, 2023, 10:20 PM

#

lofty mirage Anyone who has used something else as tf-idf

I'm trying something with transformers, but I don't know if it's really useful with the 512 token limit?

sturdy kraken Nov 27, 2023, 6:09 AM

#

Hey guys check out my notebook:

https://www.kaggle.com/code/pranshubahadur/detect-llm-generated-essays-using-retention/notebook

Would appreciate any insights!

Detect LLM generated essays using Retention

sturdy kraken Nov 27, 2023, 10:17 AM

#

acoustic flame I'm trying something with transformers, but I don't know if it's really useful w...

I'm not sure about this due to nature of tfidf and positional embedding... Be interesting to see though!

lofty mirage Nov 27, 2023, 3:29 PM

#

Interesting notebook. Of all the data sources you use, are there any whose texts are not related to the suggested 7 prompts?

lofty mirage Nov 27, 2023, 4:18 PM

#

Tf-idf and the war of available memory.

sturdy kraken Nov 28, 2023, 6:16 AM

#

Yeah try the one generated using PaLM

#

You can also try to generate your own using llama cpp

lofty mirage Nov 28, 2023, 3:17 PM

#

Owh nice, yes I'm just scared to lose a lot of positions by focusing on those 7 prompts.

lofty mirage Nov 28, 2023, 8:15 PM

#

sturdy kraken Hey guys check out my notebook: https://www.kaggle.com/code/pranshubahadur/dete...

Have you tried this only using the dataset that is popular among the Best public score notebooks?

#

I used all of these datasets, made tfidf features but the score was much worse than what you did.

sturdy kraken Nov 29, 2023, 5:34 AM

#

Using Retention?

sturdy kraken Nov 29, 2023, 5:35 AM

#

lofty mirage I used all of these datasets, made tfidf features but the score was much worse t...

So i don't think tf_idf works with retention / transformers since these architectures are dependent on word positions

sturdy kraken Nov 29, 2023, 5:38 AM

#

lofty mirage I used all of these datasets, made tfidf features but the score was much worse t...

could you link this?

lofty mirage Nov 29, 2023, 7:56 AM

#

sturdy kraken So i don't think tf_idf works with retention / transformers since these architec...

No no I did not feed tfidf features to the retention algo. I used the same data as you, concatenated all the different sources. But finally had a score way worse than you have in that notebook.

#

It's impossible to create all the tfidf features using the same logic as found in the top Public notebooks. So I shrank the data using truncated SVD, which enabled me to use all available data. But the score was somewhere around 0.715
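
For readers following along, here is a minimal sketch of the truncated-SVD step described above, assuming scikit-learn. The toy corpus, `ngram_range` and `n_components` are illustrative values only, not the settings used in the message.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the concatenated essay datasets (hypothetical).
texts = [
    "phones and driving do not mix well on busy roads",
    "the electoral college should be kept as it is",
    "driverless cars are coming sooner than many expect",
    "students benefit from community service in many ways",
]

# Word n-gram TF-IDF produces a wide sparse matrix.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_sparse = vectorizer.fit_transform(texts)

# TruncatedSVD accepts sparse input directly and projects it down
# to a small dense matrix with n_components columns.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

print(X_sparse.shape, "->", X_reduced.shape)
```

The reduced matrix is dense, so memory use is driven by `n_components` rather than by vocabulary size. The trade-off is that rare n-grams are blended together, which may explain part of the score drop reported here.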

sturdy kraken Nov 29, 2023, 8:51 AM

#

Yeah

#

Maybe try to fix spelling errors before using tfidf?

lofty mirage Nov 29, 2023, 1:49 PM

#

Would that make such a big difference? I could have a look into it, though.

#

Since I doubt that would increase my score from 0.7 ish to 0.9ish I didn't go further with that

heady terrace Nov 29, 2023, 11:32 PM

#

Hi everyone, since I've seen many people saying that the leaderboard is highly overfitted, does anyone have a good idea of how to design a good validation set for this competition?

ember nymph Dec 4, 2023, 3:01 AM

#

hey stupid question everyone but how do you access the hidden test file in the submission (python) notebook?

#

I tried
test_samples = pd.read_csv("../input/test-essays/test_essays.csv",sep=',')
but it failed

woeful wasp Dec 4, 2023, 11:26 AM

#

ember nymph I tried test_samples = pd.read_csv("../input/test-essays/test_essays.csv",sep=',...

test_set = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/test_essays.csv')

ember nymph Dec 4, 2023, 12:51 PM

#

woeful wasp ```python test_set = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/te...

thanks!

ember nymph Dec 4, 2023, 1:07 PM

#

and for people who are using ngrams in the (3,5) range with the big datasets (> 40k rows) how are you dealing with the 30GB RAM limit?

lofty mirage Dec 4, 2023, 2:32 PM

#

I suggest having a good look at the scikit-learn API, there are some interesting parameters to tune.
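
The message does not say which parameters are meant. As one possible reading, the sketch below shows a few real `TfidfVectorizer` arguments that shrink the feature matrix. The toy texts and the specific values are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy texts (hypothetical); three of them share a long phrase so that
# some 3- to 5-grams occur in more than one document.
texts = [
    "people should not use phones while driving on busy roads",
    "people should not use phones while driving near a school",
    "many people should not use phones while driving at night",
    "drivers who text are a danger to everyone around them",
]

# Baseline: every word 3-gram to 5-gram becomes a feature.
full = TfidfVectorizer(ngram_range=(3, 5))
X_full = full.fit_transform(texts)

# Memory-conscious variant.
capped = TfidfVectorizer(
    ngram_range=(3, 5),
    min_df=2,           # drop n-grams that appear in fewer than 2 documents
    max_features=50,    # keep at most the 50 most frequent n-grams
    dtype=np.float32,   # request 32-bit values instead of the 64-bit default
    sublinear_tf=True,  # use 1 + log(tf) instead of raw term frequency
)
X_capped = capped.fit_transform(texts)

print(X_full.shape[1], "features ->", X_capped.shape[1], "features")
```

On a real corpus, `min_df` tends to remove the most features, since most long n-grams occur only once.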

fleet widget Dec 6, 2023, 4:34 PM

#

hey beginner question here...can someone explain how submissions work in this competition?

I am trying to submit, but it always fails 😕

#

I'm loading the model and tokenizer from the Kaggle input directory. During submission my notebook runs successfully but scoring fails... an exception error comes up.

Could someone please suggest a solution?

lofty mirage Dec 6, 2023, 4:42 PM

#

Why is it failing, what is the error message?

fleet widget Dec 6, 2023, 4:43 PM

#

rn_image_picker_lib_temp_7aed7978-66ed-4b25-8ed9-6cf26401d7ef.jpg

#

rn_image_picker_lib_temp_58f80929-4f9e-44cc-8382-79053be8ab7c.jpg

#

rn_image_picker_lib_temp_3cfd3a94-b137-47e4-b748-5a5800f2cf91.jpg

fleet widget Dec 6, 2023, 4:44 PM

#

lofty mirage Why is it failing, what is the error message?

This is the error 😕 any suggestions so that it can be resolved and I can submit

lofty mirage Dec 6, 2023, 4:48 PM

#

Click on it and scroll down until you find the error.

fleet widget Dec 6, 2023, 4:49 PM

#

lofty mirage Click on it and scroll down until you find the error.

There is no error in notebook

#

I checked log too

rn_image_picker_lib_temp_ca225604-66f8-475f-b79c-ddc6ea908bc4.jpg

#

😕 How did you submit?? Any suggestions?

lofty mirage Dec 6, 2023, 4:54 PM

#

One of the public notebooks had an if statement that looks at the length of the test file and therefore executes different code in the submission.
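
A minimal sketch of the kind of length check described above, assuming pandas. The function name, the `dummy_rows` threshold, the constant 0.5 and the `id` column name are hypothetical. The `text` and `generated` names follow what is mentioned elsewhere in this channel.

```python
import pandas as pd


def make_submission(test, predict_fn, dummy_rows=5):
    """Build an id/generated frame, skipping the model on the tiny visible test file."""
    if len(test) <= dummy_rows:
        # Visible placeholder file: return a cheap constant prediction.
        preds = [0.5] * len(test)
    else:
        # Hidden test set at scoring time: run the real model.
        preds = list(predict_fn(test["text"]))
    return pd.DataFrame({"id": test["id"], "generated": preds})


# Stand-in for the 3-row visible file and for a larger hidden file.
visible = pd.DataFrame({"id": ["a", "b", "c"], "text": ["x", "y", "z"]})
hidden = pd.DataFrame({"id": [str(i) for i in range(10)], "text": ["essay"] * 10})


def always_one(texts):
    # Toy model that flags everything as generated.
    return [1.0] * len(texts)


sub_visible = make_submission(visible, always_one)
sub_hidden = make_submission(hidden, always_one)
```

With a guard like this, the second branch never runs during a normal notebook commit, so a bug in it only shows up at scoring time. That is one way a notebook can "run successfully" and still fail on submission.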

#

Did you copy a notebook?

fleet widget Dec 6, 2023, 4:58 PM

#

lofty mirage Did you copy a notebook?

No

fleet widget Dec 6, 2023, 4:58 PM

#

lofty mirage One of the public notebooks had an if statement that looks at the length of the ...

So how to solve it

lofty mirage Dec 6, 2023, 5:01 PM

#

I do not know since I do not have your code. If you publish it on Kaggle I could have a look since private sharing of code is not allowed.

fleet widget Dec 6, 2023, 5:02 PM

#

lofty mirage I do not know since I do not have your code. If you publish it on Kaggle I could...

Okay will try to do it.

How did you submit??? During submission, is there anything to see inside the notebook!?

ember nymph Dec 7, 2023, 4:28 AM

#

by the way, you're using windows, you do realise you can send images of your screen by pressing the print screen button rather than sending a photo of your computer right

#

#

like that

fleet widget Dec 7, 2023, 5:20 AM

#

ember nymph by the way, you're using windows, you do realise you can send images of your scr...

yes 🙂

fleet widget Dec 7, 2023, 5:56 AM

#

For submission, do I need to make another notebook that contains no training code? Is that how it works?

fleet widget Dec 7, 2023, 6:35 AM

#

🙄😕???

fleet widget Dec 7, 2023, 8:19 AM

#

https://www.kaggle.com/code/nidhipriya123/chexk

chexk

fleet widget Dec 7, 2023, 8:19 AM

#

fleet widget https://www.kaggle.com/code/nidhipriya123/chexk

can anyone solve my problem of submission not happening

fleet widget Dec 7, 2023, 8:21 AM

#

lofty mirage I do not know since I do not have your code. If you publish it on Kaggle I could...

https://www.kaggle.com/code/nidhipriya123/chexk

here is my code please check and give a solution if u can really help me in submission. i will be very thankful

ember nymph Dec 7, 2023, 9:54 AM

#

it says it ran successfully, but there's no score. What happened when you submitted?

fleet widget Dec 7, 2023, 10:36 AM

#

exception error

fleet widget Dec 7, 2023, 10:36 AM

#

ember nymph it says it ran successfully, but theres no score. what happened when you submitt...

notebook ran but score failed

ember nymph Dec 7, 2023, 2:58 PM

#

what happens when you run it on your own test data

fleet widget Dec 7, 2023, 5:21 PM

#

It gives me the output as wanted

fleet widget Dec 9, 2023, 4:12 AM

#

It's not submitting

#

What to do?

fleet widget Dec 9, 2023, 6:02 AM

#

opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py:2930: FutureWarning: --Exporter.preprocessors=["remove_papermill_header.RemovePapermillHeader"] for containers is deprecated in traitlets 5.0. You can pass --Exporter.preprocessors item ... multiple times to add items to a list.
29.9s 9 warn(
29.9s 10 [NbConvertApp] WARNING | Config option kernel_spec_manager_class not recognized by NbConvertApp.
29.9s 11 [NbConvertApp] Converting notebook notebook.ipynb to notebook
30.3s 12 [NbConvertApp] Writing 10481 bytes to notebook.ipynb
31.9s 13 /opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py:2930: FutureWarning: --Exporter.preprocessors=["nbconvert.preprocessors.ExtractOutputPreprocessor"] for containers is deprecated in traitlets 5.0. You can pass --Exporter.preprocessors item ... multiple times to add items to a list.
31.9s 14 warn(
31.9s 15 [NbConvertApp] WARNING | Config option kernel_spec_manager_class not recognized by NbConvertApp.
32.0s 16 [NbConvertApp] Converting notebook notebook.ipynb to html
32.9s 17 [NbConvertApp] Writing 298426 bytes to results.html

in log this is coming

my notebook ran successfully but after that the submission failed!!!!

any solution from experts??? or all are just noob???

#

notebook ran successfully after that error in submission.

sturdy kraken Dec 9, 2023, 1:34 PM

#

What's the error message? (click the ⚠️)

sturdy kraken Dec 9, 2023, 5:17 PM

#

hey guys started generating a dataset using mistral:

https://www.kaggle.com/code/pranshubahadur/llm-generate-essay-mistral-hf-langchain

LLM: Generate Essay -> Mistral (HF, Langchain)

fleet widget Dec 9, 2023, 7:39 PM

#

sturdy kraken What's the error message? (click the ⚠️)

like this error coming help me

#

the notebook runs fine; the error happens after that, during submission

sturdy kraken Dec 10, 2023, 6:38 AM

#

It's either an error in the notebook (check the notebook, not the logs) or it's an error with the submission part of your code (try index=False when you do to_csv)
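
To illustrate the `index=False` suggestion, here is a small sketch assuming pandas. The ids and scores are made up, and the `id` and `generated` column names are assumptions about the expected submission format.

```python
import pandas as pd

# Made-up predictions for two essays.
sub = pd.DataFrame({"id": ["0000aaaa", "1111bbbb"], "generated": [0.1, 0.9]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = sub.to_csv()                 # default index=True adds an unnamed first column
without_index = sub.to_csv(index=False)   # only the columns of the frame are written

print(with_index.splitlines()[0])     # ,id,generated
print(without_index.splitlines()[0])  # id,generated
```

In the notebook itself the call would look like `sub.to_csv("submission.csv", index=False)`. The extra unnamed column from the default setting is a plausible cause of a scoring error even when the notebook itself finishes cleanly.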

gaunt fractal Dec 14, 2023, 2:05 AM

#

Is there an explanation why the test_essays.csv file has only 3 examples, and why those examples are basically gibberish? I couldn't find an explanation on the competition page.

#

indigo cedar Dec 14, 2023, 9:41 AM

#

Hello, I hope someone can help me. I cannot submit to the competition: my notebook works fine and my model and tokenizer load correctly, but I constantly get an error.

#

this is my error : OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like distilbertuncases_model is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

#

Please, help me ^^ what have i done wrong ?

indigo cedar Dec 14, 2023, 11:31 AM

#

Anyone ? *

ruby chasm Dec 15, 2023, 4:58 PM

#

Is this channel also meant for finding teams? Just started in the challenge, bringing some transformer and machine learning experience, as well as a 4090 to the table. Got 3 more weeks of vacation and looking for motivated and/or experienced team partners to exchange ideas 🙂 Feel free to reach out

lofty mirage Dec 15, 2023, 11:39 PM

#

4090 🫨

lofty mirage Dec 15, 2023, 11:41 PM

#

indigo cedar Anyone ? *

The competition doesn't let you use the internet.

ruby chasm Dec 16, 2023, 5:48 AM

#

lofty mirage 4090 🫨

Everything I trained so far for the competition didn't go above 14GB RAM, so it hardly matters so far. But maybe some teammate has some ideas on how to put it to use 😉

ruby chasm Dec 16, 2023, 5:54 AM

#

indigo cedar Anyone ? *

Make sure the path is correct (you can check, for example, with os.listdir). For example, for me the path was mymodel/mymodel
Make sure you have the model offline (usable with Internet = Off). It's probably best to train it outside your submission notebook (because you need Internet to first use it), save it locally on your device and upload it as a dataset on Kaggle. That's how I am doing it and I never had problems
Make sure the model folder includes both the tokenizer and the model.
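
A small sketch of the path check suggested above, using only the standard library. The helper name `find_model_dir` is hypothetical. It looks for the `config.json` file that the error message earlier in this thread complains about.

```python
import os
import tempfile


def find_model_dir(root):
    """Return the first directory under root that contains config.json, else None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "config.json" in filenames:
            return dirpath
    return None


# Demo on a throwaway directory that mimics the nested layout mymodel/mymodel.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "mymodel", "mymodel")
    os.makedirs(nested)
    open(os.path.join(nested, "config.json"), "w").close()
    found = find_model_dir(root)
    relative = os.path.relpath(found, root)

print(relative)
```

On Kaggle the search root would typically be `/kaggle/input`. The returned directory can then be passed to `AutoTokenizer.from_pretrained(...)` and `AutoModelForSequenceClassification.from_pretrained(...)` from the transformers library, assuming the tokenizer files were saved to the same folder.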

indigo cedar Dec 16, 2023, 6:32 AM

#

Thank you for your advice, I will do it locally first. Thank you

ember nymph Dec 17, 2023, 12:36 PM

#

gaunt fractal Is there an explanation why the test_essays.csv file has only 3 examples, and wh...

the actual test set is secret. That's just a dummy file to use to run your code

ocean flame Dec 18, 2023, 5:57 PM

#

why is nobody else using deberta? I got to top 25 using the existing deberta model + like 10 more lines of code

#

I'm fairly confident I can go about 0.05 further by just retraining it as well

ember nymph Dec 19, 2023, 2:08 PM

#

i think people are fine tuning the huggingface pretrained models for classification

lofty mirage Dec 19, 2023, 10:16 PM

#

ocean flame why is nobody else using deberta? I got to top 25 using the existing deberta mod...

Wait what?

#

What is your username?

ocean flame Dec 19, 2023, 10:18 PM

#

^

lofty mirage Dec 19, 2023, 10:20 PM

#

I see disqualified at school

#

0.963

#

Tomorrow I'll have a look into DeBERTa. I abandoned the idea because I did not see any good notebooks using it.

#

So instead I optimized my own pipeline.

#

I was sick the last couple of days but nobody caught me 🙂

ruby chasm Dec 20, 2023, 6:06 AM

#

ocean flame why is nobody else using deberta? I got to top 25 using the existing deberta mod...

i used deberta, fine tuned it (without much effort etc), got a .78 😄

ruby chasm Dec 20, 2023, 6:09 AM

#

lofty mirage 0.963

again, what a weird competition where you are top 1k at .95 and top 35 at .963 😄

lofty mirage Dec 20, 2023, 8:38 AM

#

I'm first in the competition

ruby chasm Dec 20, 2023, 9:55 AM

#

I thought you meant the deberta score is .963

#

Yea they are .964 - my point was just that .964 is top 21 and .960 is top 777

velvet vortex Dec 21, 2023, 10:22 PM

#

ocean flame Im fairly confident I can go about 0.05 further by just retraining it as well

Waiting for your 0.968 score tomorrow then.

ocean flame Dec 21, 2023, 10:33 PM

#

oh i completely forgot i was training that thanks for sending me the notification

#

... and my ssh key seems to have been deleted ...

fierce notch Dec 22, 2023, 6:24 AM

#

ocean flame Im fairly confident I can go about 0.05 further by just retraining it as well

cool

fierce notch Dec 22, 2023, 2:21 PM

#

ocean flame ... and my ssh key seems to have been deleted ...

can you tell us your team name? Really looking forward to seeing the sharing of your team's deberta solution after the competition ends👍

ocean flame Dec 22, 2023, 2:29 PM

#

Linguistic Ninjas

fierce notch Dec 22, 2023, 2:57 PM

#

👍

fierce notch Dec 24, 2023, 1:35 AM

#

Has anyone tried running CatBoost on GPU? I get kernel crashes when trying to run a CatBoost model on GPU: https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/463218

digital dew Dec 24, 2023, 4:15 AM

#

fierce notch anyone try run catboost on gpu ? I meet Kernel crashes when trying to run Ca...

I also tried but got the kernel crash

late field Dec 24, 2023, 6:53 AM

#

I've tried catboost on the normal CPU cores. Performance was terrible. An earlier poster got PyCaret to work, but I haven't been able to install the library in the notebook

distant salmon Dec 26, 2023, 10:17 PM

#

Hey guys, I'm new, so let's see if I got this right: the simplest way of explaining this task is to say that we will build a model that takes a text (text) as input and outputs a label (generated). Right?

safe patio Dec 27, 2023, 2:46 AM

#

distant salmon Hey guys I'm new and lets see if I got this correct so the simplest way of expla...

Yeah, you're right

elfin pulsar Dec 27, 2023, 7:33 AM

#

Duhhh I'm literally irritated with this OOM error while running catboost on GPU

#

😑😑😑😑

distant salmon Dec 27, 2023, 1:29 PM

#

safe patio Yeah, you're right

thanks

velvet vortex Dec 28, 2023, 9:34 PM

#

elfin pulsar Duhhh I m literally irritated with this OOM error while running catboost on GPU

Use less data. Use 2 T4 instead of a P100.

wraith cypress Dec 29, 2023, 6:53 AM

#

I keep getting an error that my submission is throwing an exception. I feel like my solution is the simplest there is, just hardcoded values for each line of test_essays.csv. I'd appreciate any help. Would I be OK posting 7 lines of code?

#

http://nopaste.net/w8eyFlUh0i

velvet vortex Dec 30, 2023, 8:14 AM

#

wraith cypress I keep getting an error that my submission is throwing an exception, I feel like...

When you submit, a new test dataset is provided. You have to compute predictions for that new test data. You cannot precompute predictions, given that you can't see this data before hitting the submit button.

ruby chasm Dec 30, 2023, 4:59 PM

#

wraith cypress http://nopaste.net/w8eyFlUh0i

Also, "for x in test_essays" is not iterating over rows (which is done by: "for idx, row in test_essays.iterrows()"), but over the columns. So you will always create 3 values, since the test essay file has 3 columns. Coincidentally, this works for the fake test data which we can access, because it also has 3 rows.
TLDR; your code has a bug - on submission you try to fill a column with ~4.5k rows with a list of 3 values
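
The difference described above can be reproduced on a stand-in frame, assuming pandas. The column names and contents are assumptions, and the original pasted code is not shown here.

```python
import pandas as pd

# Stand-in for the visible test file: 3 rows and 3 columns.
test_essays = pd.DataFrame({
    "id": ["a", "b", "c"],
    "prompt_id": [0, 1, 2],
    "text": ["first essay", "second essay", "third essay"],
})

# Iterating a DataFrame directly yields its column labels, not its rows.
from_columns = [x for x in test_essays]

# iterrows() yields (index, row) pairs, one per row.
from_rows = [row["id"] for _, row in test_essays.iterrows()]

# With 5 rows, the column loop still produces only 3 items.
bigger = pd.concat([test_essays, test_essays.head(2)], ignore_index=True)
items_from_bigger = len([x for x in bigger])

print(from_columns, from_rows, len(bigger), items_from_bigger)
```

For real predictions a vectorized call on the whole `text` column is usually preferable to `iterrows()`, which is slow on thousands of rows.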

lyric kayak Dec 30, 2023, 6:07 PM

#

Hi, just entered this competition.

Seems like the best public results (disregarding efficiency) are achieved by training a few classical ML models on top of TF-IDF encodings of tokens. Not BERT, or any other deep learning model. Is that right? It doesn't seem right.

I would expect a pre-trained and fine-tuned BERT, or a decoder model (llama, mistral, etc.), to perform much better than training an ensemble of classical models on top of TF-IDF.

ruby chasm Dec 30, 2023, 6:25 PM

#

lyric kayak Hi, just entered this competition. Seems like the best public results (disregar...

this has been part of many discussion so far (even a couple messages earlier in here 🙂 )

1. decoder models are not fit for classification tasks, at least not to the degree of encoder-only transformers or classic ML classifiers
2. the competition used A LOT of LLMs - but only for generating data. we are at a point where we went from a few human-generated training rows to over 60k AI rows and over 25k human rows (congrats to the community for that)
3. the main question: why does classic ML outperform transformers? no real clue. on paper (at least in my opinion), the transformer embeddings should outperform tf-idf. However, it is worth noting that the tf-idf embeddings used are based on n-grams (3-grams to 5-grams), which means we are closer to positional meaning than pure word-based tf-idf. I strongly believe the reason to be that the n-grams tell more about the GenAI behavior (as in which texts are generated) than the pre-trained transformer embeddings + attention etc.
3b) ensembles increase performance in most situations - if you are using transformers, maybe use a transformer ensemble too (e.g., finetune BERT + finetune DeBERTa + finetune RoBERTa + finetune Electra + maybe finetune an existing finetuned model for essays / scientific articles etc.)

You are obviously more than welcome to prove all current top leaderboard submissions wrong with the use of transformers - but for me even a 2 layer NN has been outperforming a fine-tuned deberta3 large by almost .12
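
As a rough illustration of the ensemble idea in 3b, the sketch below averages per-essay probabilities from several models with equal weights, using only the standard library. The model names and scores are made up, and equal-weight averaging is just one simple way to combine models.

```python
from statistics import mean


def average_ensemble(*model_preds):
    """Average per-essay probabilities across models with equal weights."""
    lengths = {len(p) for p in model_preds}
    if len(lengths) != 1:
        raise ValueError("all models must score the same essays")
    return [mean(scores) for scores in zip(*model_preds)]


# Hypothetical probabilities of "generated" for three essays.
bert = [0.0, 1.0, 0.5]
deberta = [0.5, 1.0, 0.0]
roberta = [1.0, 1.0, 0.25]

blended = average_ensemble(bert, deberta, roberta)
print(blended)
```

Because the leaderboard metric is AUC, only the ordering of the blended scores matters. Some people average ranks instead of raw probabilities for that reason.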

lyric kayak Dec 30, 2023, 6:26 PM

#

@ruby chasm Thanks for all the info! Super helpful!

lyric kayak Dec 30, 2023, 6:36 PM

#

ocean flame why is nobody else using deberta? I got to top 25 using the existing deberta mod...

What do you mean by "existing deberta model"? Are you referring to a public kernel or just the base pre-trained deberta from huggingface?

ruby chasm Dec 30, 2023, 6:43 PM

#

would love to hear more about Dom's approach but I don't think we will 😉

#

https://tenor.com/view/lord-of-the-rings-frodo-alright-then-keep-your-secrets-smiling-gif-26462584

ocean flame Dec 30, 2023, 7:42 PM

#

lyric kayak What do you mean "existing deberta model"? Are you referring to a public kernel ...

The one on kaggle

#

That is 0.833

#

I have since changed approach quite heavily

lyric kayak Dec 30, 2023, 7:48 PM

#

Thanks for the info @ocean flame

lyric kayak Dec 31, 2023, 10:08 AM

#

Can someone link to the dataset used in this kernel: https://www.kaggle.com/code/carrot1500/distilbertclassifier-from-scratch-with

/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv

Also are there better datasets/versions people are using?

Edit: found it https://www.kaggle.com/datasets/thedrcat/daigt-v2-train-dataset

lyric kayak Dec 31, 2023, 4:09 PM

#

What is the meaning of the bool column "RDizzl3_seven" in the daigt-v2 train dataset?

wraith cypress Dec 31, 2023, 9:06 PM

#

ruby chasm Also, "for x in test_essays" is not iterating over rows (which is done by: "for ...

Thank you! That solved my problem!

ruby chasm Jan 2, 2024, 10:49 AM

#

lyric kayak What is the meaning of the bool column "RDizzl3_seven" in the daigt-v2 train dat...

hey shatz - sorry for replying somewhat late, we had some holidays 😉 as i see nobody else replied: this will give you some insights:
https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/453410

lyric kayak Jan 3, 2024, 6:02 PM

#

@ruby chasm Thanks! Still am a bit confused. So if a row in the daigt-v2 train dataset has RDizzl3_seven=True it means that it came from this "7 prompts" dataset? Meaning the 7 prompts dataset is a subset of the daigt-v2 train dataset

#

correct?

ruby chasm Jan 3, 2024, 6:37 PM

#

lyric kayak @ruby chasm Thanks! Still am a bit confused. So if a row in the `daigt...

daigt v2 is comprised of multiple datasets; the rdizzl3 set is a separate set that is included in daigt v2, iirc

lyric kayak Jan 3, 2024, 6:56 PM

#

gotcha! Thanks!!

lyric kayak Jan 4, 2024, 7:19 PM

#

Has anyone experimented with max_features for TfidfVectorizer? Does limiting the number of features impact the score significantly?

digital dew Jan 5, 2024, 6:00 AM

#

lyric kayak Has anyone experimented in max_features for TfidfVectorizer? Does limiting the n...

I think I experimented with that once, but not much changed

ruby chasm Jan 5, 2024, 7:42 AM

#

lyric kayak Has anyone experimented in max_features for TfidfVectorizer? Does limiting the n...

experimenting with it right now. it definitely impacts the runtime of the notebook a lot

valid marten Jan 5, 2024, 9:16 AM

#

Wow .982 in the leaderboard 😯 @noble tendon found some secret sauce indeed!!

lofty mirage Jan 5, 2024, 9:39 AM

#

Damn

lyric kayak Jan 5, 2024, 10:57 AM

#

ruby chasm experimenting with it right now. it definitely impacts the runtime of the noteboo...

yup, faster runtime and also lower score. Got .875 when going to a mere 1000 features.

ruby chasm Jan 5, 2024, 12:52 PM

#

lyric kayak yup, faster runtime and also lower score. Got .875 when going to a mere 1000 fea...

with the .961 baseline?

ruby chasm Jan 5, 2024, 3:30 PM

#

How are you guys dealing with the OOM error? It's my first time turning towards the .961 baseline, and whenever I add more data than the daigt v2 dataset I'm going OOM

velvet vortex Jan 5, 2024, 3:41 PM

#

lofty mirage Damn

I had the same reaction when you passed me the first time.

lofty mirage Jan 5, 2024, 4:41 PM

#

velvet vortex I had the same reaction when you passed me the first time.

I went from a one stage pipeline to a 2 stage pipeline. I'm trying to combine multiple methods now but they're hardly increasing my score.

#

With multiple I mean more than 2 btw.

lyric kayak Jan 5, 2024, 5:43 PM

#

ruby chasm with the .961 baseline?

yup

ember nymph Jan 5, 2024, 5:46 PM

#

I am having problems submitting my notebooks. The submission stays at "scoring" for several hours and then it does not submit. Does anyone have the same problem?

late portal Jan 5, 2024, 6:49 PM

#

just want to share a funny AI-generated essay on "phones and driving"

#

There are many people out in this world that get along just fine and there are some that are totally unhappy and unlucky in love, well that is not the case in this story because I am not unhappy at all in love. I have been married to my high school sweetheart for ten years. My love and I have four children that are all above the age of six. I am at home with the kids while my wife is working. However, not all is peaceful at the Green household. Someone always has an attitude and is mad at another person and that's not me but the kids. There are so many things that they get on each other's nerves and love to pick on each other. When they get on each other's nerves, they turn on each other and just can't seem to get over the things that they say.

One day in the afternoon, I had to take my oldest son on his annual Physical for his football team. The physical was at four o'clock in the afternoon. The quarterback is my son and he has to pass this physical or he will not be able to play football this season. When I dropped my oldest son off to be checked out by the doctor, I went to pick up my wife from work and then we were going to go on to the doctor’s office together. My daughter and daughter-in-law were working for the day so it was going to be a long drive. The doctor's office was about ten minutes away from where we worked. When I got home, I told my wife and oldest son about the doctor’s appointment and told them to get ready and we were heading out.

#

I drove for about fifteen minutes trying to get us to the doctor's office. On the way there, my cell phone started to ring. When I looked to see who it was, it was my wife who I just said to meet me at the doctor’s office. I stopped in the middle of the road because I was in my mind and not looking at the road. When I turned around, I notice that I was in the middle of the road and there was a SUV right behind me. If I would have looked up a few more seconds, I would have been rammed and possibly killed. I would have killed someone with my own stupidity. To this day, I look back to that day and wonder if my life could have changed by using my cell phone less.

As you can see, there are many reasons why one should not drive while talking on the phone. You can run into things and get distracted and crash into something or somebody. Also, it is against the law in Iowa to talk on the phone and text while driving. For these reasons, people should not be allowed to talk on their phone while driving.

indigo cedar Jan 5, 2024, 7:07 PM

#

^^ it's educational

lofty mirage Jan 6, 2024, 11:00 AM

#

ruby chasm How are you guys balancing the OOM error? Its my first time turning towards the ...

#

When trying to incorporate public notebooks into the pipeline

#

I tried most things

#

Omitting n_jobs

#

Deleting all variables

#

gc.collect()

ruby chasm Jan 6, 2024, 11:39 AM

#

brutal

#

have you experimented with max_features in the vectorizer?

lyric kayak Jan 6, 2024, 6:00 PM

#

Can someone enlighten me about some psychology behind publishing solutions?
I understand why competitors in the top 20 or so do not publish their solutions. They are clearly there to win.

But, considering the competitors with a score of 0.963. Would they not benefit more from publishing their solution and getting a ton of upvotes than trying to actually win the competition?

#

^ hope I dont sound demeaning or something btw. Totally cool to not publish solutions haha. Just curious behind the thought process of those with 0.963.

astral drum Jan 6, 2024, 6:46 PM

#

How about you achieve that score and see what you would do then?

lofty mirage Jan 6, 2024, 6:55 PM

#

Well, if you have a unique idea, are scoring 0.963, and believe you have room for improvement, I wouldn't share the code.

#

Besides that, it is unknown whether these people have used public notebooks in their pipeline as well

#

So there is still a possibility they're in for the gold.

ruby chasm Jan 6, 2024, 7:05 PM

#

@lofty mirage i just made a post on OOM which might help with your problems

lyric kayak Jan 6, 2024, 7:09 PM

#

@lofty mirage makes sense thanks! I guess it depends on the situation. Haven't been in it myself yet haha

lofty mirage Jan 6, 2024, 7:20 PM

#

I see thank you for the tips.

lyric kayak Jan 6, 2024, 7:49 PM

#

Does anyone have insight as to how the VotingClassifier's estimators were figured out in the .961 and .962 kernels?

Is it just trial and error or does the author have a sophisticated hyperparameter optimization pipeline?

astral drum Jan 6, 2024, 7:51 PM

#

probably both

velvet vortex Jan 6, 2024, 10:47 PM

#

lyric kayak Can someone enlighten me about some psychology behind publishing solutions? I un...

The one with 0.963 with deberta clearly thinks he can get a better score. And if you look at LB, you don't need to improve a lot to be in gold zone. Ofc this assumes little to no shakeup.

unkempt haven Jan 7, 2024, 1:29 AM

#

lyric kayak Can someone enlighten me about some psychology behind publishing solutions? I un...

Sharing the exact code has one problem. It will lead to lots of forkmittings, where many people just directly copy and submit without making any modifications. Sharing the approach only gives ideas but forces people to actually put in the work to implement

#

and there’s only 3 weeks left in the comp

ruby chasm Jan 7, 2024, 7:39 AM

#

velvet vortex The one with 0.963 with deberta clearly thinks he can get a better score. And i...

Also they had quite the operation going. There is no way we are able to replicate this within the last 17 days. They did not share code but just how transformers may work - which I think is really good and what kaggle is for

lyric kayak Jan 7, 2024, 8:06 AM

#

ruby chasm Also they had quite the operation going. There is no way we are able to replicat...

I guess youre referring to the deberta with 0.963. Can you link the reference? I didnt see this

ruby chasm Jan 7, 2024, 8:21 AM

#

https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/465882

LLM - Detect AI Generated Text

Identify which essay was written by a large language model

ruby chasm Jan 7, 2024, 8:29 AM

#

lyric kayak yup, faster runtime and also lower score. Got .875 when going to a mere 1000 fea...

btw, have you tried more of this? im currently struggling to get the catboost going. for some reason its even going oom when copying it from notebooks that already scored on lb..

lyric kayak Jan 7, 2024, 9:36 AM

#

ruby chasm btw, have you tried more of this? im currently struggling to get the catboost go...

Have not tried anything in between 1000 features and whatever the default is (I guess unlimited?). If I do I'll update.

ruby chasm Jan 7, 2024, 9:36 AM

#

Yep, unlimited atm

lyric kayak Jan 7, 2024, 9:36 AM

#

Strange to get OOM when running stuff that already scored

#

Maybe unlimited is just barely on the edge of max memory allotment

ruby chasm Jan 7, 2024, 9:37 AM

#

its very frustrating. since it also takes forever..

#

thanks for the quick reply tho 🙂

lyric kayak Jan 7, 2024, 9:38 AM

#

Sure, thanks to u too haha

digital dew Jan 7, 2024, 9:39 AM

#

ruby chasm its very frustrating. since it also takes forever..

So I am not the only one getting 9 hour runtimes on the submission stage lol

ruby chasm Jan 7, 2024, 9:43 AM

#

which is wild, because if I remove boost from my planned pipeline, it takes less than 30minutes

digital dew Jan 7, 2024, 9:46 AM

#

catboost takes too long to train but I can't even use gpu without getting an oom :(
has anyone successfully used catboost with gpu?

ruby chasm Jan 7, 2024, 9:50 AM

#

trying it rn 🙂

ruby chasm Jan 7, 2024, 10:42 AM

#

I just found this: https://github.com/catboost/catboost/issues/2192

GitHub

CatBoost GPU out of memory · Issue #2192 · catboost/catboost

Problem:When i use catboost to train my dataset on GPU.It reports the error:"out of memory. requested 1795MB; free 595 MB".So i want to use two GPUs to solve this error by set parameters ...

#

according to that calculation, the ~9200 samples have ~8M 3-5 n-grams (features) and we have a training set of ~45k rows. That means 432_000_000_000 bytes. That's approx. 402 GiB of RAM

ruby chasm Jan 7, 2024, 11:34 AM

#

digital dew So I am not the only one getting 9 hour runtimes on the submission stage lol

just ran an xgb model taking 3hrs

lyric kayak Jan 7, 2024, 1:58 PM

#

@ruby chasm 10k feats results in 0.901

ruby chasm Jan 7, 2024, 1:59 PM

#

on the whole ensemble or just boost?

lyric kayak Jan 7, 2024, 1:59 PM

#

oh wait nvm, I also removed MultinomialNB

ruby chasm Jan 7, 2024, 1:59 PM

#

i did a 42k multinomial that scored .909

lyric kayak Jan 7, 2024, 1:59 PM

#

just multinomial?

#

mine was whole ensemble except for multinomial

ruby chasm Jan 7, 2024, 2:10 PM

#

yea, just multinomial

lyric kayak Jan 7, 2024, 7:04 PM

#

any tips for local validation? I gotta stop using kaggle submissions for validation

#

was thinking just to hold out the official training data prompts for validation set:

val_df = train[train['prompt_name'].isin(["Car-free cities", "Does the electoral college work?"])]
train_df = train[~train['prompt_name'].isin(["Car-free cities", "Does the electoral college work?"])]
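A variant of the same idea, as a rough sketch: instead of one fixed holdout, group the folds by prompt so that no prompt ends up in both train and validation. The lists below are toy stand-ins for the `text`, label, and `prompt_name` columns (the third prompt name and all the data are made up), and this only assumes scikit-learn's `GroupKFold`.

```python
from sklearn.model_selection import GroupKFold

# Toy stand-ins for the real columns (hypothetical data)
texts = ["essay a", "essay b", "essay c", "essay d", "essay e", "essay f"]
labels = [0, 1, 0, 1, 0, 1]
prompts = [
    "Car-free cities", "Car-free cities",
    "Does the electoral college work?", "Does the electoral college work?",
    "Phones and driving", "Phones and driving",
]

folds = []
splitter = GroupKFold(n_splits=3)
for train_idx, val_idx in splitter.split(texts, labels, groups=prompts):
    # Collect which prompts landed on each side of the split
    train_prompts = {prompts[i] for i in train_idx}
    val_prompts = {prompts[i] for i in val_idx}
    folds.append((train_prompts, val_prompts))
    print("validate on:", sorted(val_prompts))
```

Each fold then scores the model on prompts it never saw during training, which is probably closer to the hidden test set than a random split, though it cannot fix a distribution gap between the public data and the real test essays.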

ruby chasm Jan 8, 2024, 6:20 AM

#

we are fairly certain those prompts are not in the test set. so i cant tell you how good that idea works

#

i have been just doing normal CV (which is not .999 for me atm) - selecting folds in a way to get close to 9k test samples... this probably aint a good / the best way, but i dont have a good validation set

lyric kayak Jan 8, 2024, 11:27 AM

#

My intuition says that at least those prompts should be within the same "distribution" as the test set (assuming they are a subset of the original test dataset). That being said, they are just too easy haha. My local validation and LB are way off

ruby chasm Jan 9, 2024, 4:58 AM

#

How is anybody able to submit on catboost? Im even going OOM on it with max_features=1M

velvet vortex Jan 9, 2024, 10:03 AM

#

If you have N samples in your training data and M features, then catboost will use N x M x 4 bytes (I assume it uses FP32, i.e. 4 bytes per value). Then compare this with your memory to get the number of features you can afford.
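To put numbers on that estimate, here is a minimal back-of-the-envelope sketch in plain Python. It assumes a dense float32 matrix (4 bytes per value), which may not match how catboost actually stores sparse tf-idf input. The 40,000 x 7.3M shape and the 30 GB limit are figures mentioned in this thread, and the helper names are made up for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: dense FP32 storage
GIB = 1024 ** 3

def dense_bytes(n_rows: int, n_features: int) -> int:
    """Bytes needed for a dense n_rows x n_features float32 matrix."""
    return n_rows * n_features * BYTES_PER_FLOAT32

def max_affordable_features(n_rows: int, ram_bytes: int) -> int:
    """Largest feature count whose dense float32 matrix fits in ram_bytes."""
    return ram_bytes // (n_rows * BYTES_PER_FLOAT32)

# Dense size of a 40k x 7.3M training matrix
full_matrix = dense_bytes(40_000, 7_300_000)
print(f"{full_matrix / GIB:.0f} GiB")

# How many features would fit in 30 GiB with 40k rows
affordable = max_affordable_features(40_000, 30 * GIB)
print(affordable)
```

Under these assumptions the full matrix is far beyond 30 GB, and that is before counting the model's own working memory, so the affordable feature count is an upper bound at best.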

ruby chasm Jan 9, 2024, 12:56 PM

#

@velvet vortex am i correct to assume i have the same Hardware limitations during submission? I even copied the model from a notebook that was already scored on LB, but when I try it locally it keeps on failing. this is logically beyond my understanding. I have tried it with less features AND less data than the Notebook I copied it from. only reasoning for me would be having more than 30GB RAM during submission

velvet vortex Jan 9, 2024, 2:53 PM

#

Are you executing it in the same environment?

ruby chasm Jan 9, 2024, 4:44 PM

#

velvet vortex Are you executing it in the same environment?

I am copying the notebook 1:1 but wanted to test it inside editor first, where it crashes. I never submitted it since I dont want to waste submissions.
What I did: fake a test set, reduce train size (smaller than in the notebook i copied from), set max_features. I cant see how this would go oom

lyric kayak Jan 9, 2024, 5:11 PM

#

Are you guys referring to max num features for tf-idf? Or is there a parameter for catboost that limits the number of features it makes?

ruby chasm Jan 9, 2024, 5:22 PM

#

nope max feature of tfidf

lyric kayak Jan 9, 2024, 5:29 PM

#

Im surprised it makes 1M+ features

#

did you check that?

#

we should run it on the train set and see that it even makes that many. Maybe your max is greater than what tfidf needs
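A minimal sketch of that check, assuming scikit-learn's `TfidfVectorizer` with word-level 3-5 n-grams. The three sentences are a toy stand-in for the training essays, so the counts mean nothing on their own. The point is just to compare the uncapped vocabulary size with whatever `max_features` is set to.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the training essays (hypothetical data)
train_texts = [
    "cars are useful but phones distract drivers on the road",
    "the electoral college should stay because it protects small states",
    "students should not use phones while driving on the road",
]

# Uncapped: count how many word 3-5 grams tf-idf actually produces
vec = TfidfVectorizer(ngram_range=(3, 5), dtype=np.float32)
X = vec.fit_transform(train_texts)
n_features = X.shape[1]
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
print(n_features, "features,", sparse_bytes, "bytes as sparse CSR")

# Capped: max_features keeps only the most frequent n-grams
capped = TfidfVectorizer(ngram_range=(3, 5), max_features=5, dtype=np.float32)
X_capped = capped.fit_transform(train_texts)
print(X_capped.shape)
```

If `n_features` on the real train set comes out below the cap, the cap is doing nothing and the memory problem is somewhere else.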

ruby chasm Jan 9, 2024, 5:33 PM

#

when I create a fake test set with ~9k subsamples from the train set, it created ~7M tfidf features (i.e., my training matrix is 40000x7300000). I am not sure how many features cat creates, can never check, it's always oom

#

on train set i think it was like 60-80M features from tfidf

lyric kayak Jan 9, 2024, 5:33 PM

#

oh damn i stand corrected haha

#

catboost has this param: used_ram_limit=None

#

interesting 🧐

#

also: per_float_feature_quantization=None

ruby chasm Jan 9, 2024, 5:47 PM

#

used ram limit didnt help

ruby chasm Jan 9, 2024, 5:47 PM

#

lyric kayak also: per_float_feature_quantization=None

this sounds interesting, didnt see it on performance thingies

lyric kayak Jan 9, 2024, 5:56 PM

#

Another interesting direction: KBinsDiscretizer
https://scikit-learn.org/stable/auto_examples/cluster/plot_face_compress.html

The example they show doesnt actually use less memory, but at the end they do say that it would use less memory if it started out as float64 (their starting point is int8)

scikit-learn

Vector Quantization Example

This example shows how one can use KBinsDiscretizer to perform vector quantization on a set of toy image, the raccoon face. Original image: We start by loading the raccoon face image from SciPy. We...

#

not sure if sklearn pipeline would take advantage of it anyway. They could cast to float64 immediately upon training

ruby chasm Jan 9, 2024, 5:57 PM

#

interesting, thanks for sharing

lyric kayak Jan 9, 2024, 5:58 PM

#

also I looked at the docs for catboost per_float_feature_quantization. have absolutely no idea what the hell it does

#

seems like you need to understand catboost

ruby chasm Jan 9, 2024, 5:58 PM

#

https://tenor.com/view/understand-nothing-michael-scott-the-office-steve-carrell-gif-7915947

lyric kayak Jan 9, 2024, 5:59 PM

#

Oh wait I understand something
border_count: The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusively. Default: 254 on cpu, 128 on gpu

#

per_float_feature_quantization changes the amount of "borders" which is splits for a feature

#

Basically we would need to set border_count via per_float_feature_quantization to less than 254 for every feature

#

and we might save some rams

ruby chasm Jan 9, 2024, 6:01 PM

#

i despise catboost in this competition

lyric kayak Jan 9, 2024, 6:02 PM

#

something like: per_float_feature_quantization=[f'{i}:border_count=50' for i in range(num_feats)]
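A slightly fuller sketch of that idea, in plain Python so it runs without catboost installed. `num_feats` and the value 50 are placeholders. The string format `'<feature_index>:border_count=<n>'` follows the catboost docs for per_float_feature_quantization as far as I can tell, and whether this actually lowers peak RAM is untested here.

```python
num_feats = 5      # placeholder; in practice the number of tf-idf columns
border_count = 50  # placeholder; anything below the CPU default of 254

# One entry per float feature, e.g. "0:border_count=50"
per_feature_quantization = [
    f"{i}:border_count={border_count}" for i in range(num_feats)
]
print(per_feature_quantization[:2])

# Hypothetical usage (not executed here, needs catboost installed):
# from catboost import CatBoostClassifier
# model = CatBoostClassifier(
#     per_float_feature_quantization=per_feature_quantization
# )
```

Since every feature gets the same value here, passing the global `border_count=50` parameter should have the same effect without building a list with millions of strings.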

#

https://tenor.com/mXd5NplsuvC.gif

magic peak Jan 10, 2024, 12:07 AM

#

I can see texts talking about scores of more than 0.95. I was just wondering if these guys have generated artificial data via LLMs, as I can see only 3 data points for this class in the original data

opal sequoia Jan 10, 2024, 1:15 AM

#

magic peak I can see texts talking about scores of more than 0.95, I was just wondering if ...

I think so yes https://www.kaggle.com/datasets/thedrcat/daigt-v2-train-dataset, or discussed in https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/465882 this author even created a dataset of 500k data points

ruby chasm Jan 10, 2024, 5:52 AM

#

magic peak I can see texts talking about scores of more than 0.95, I was just wondering if ...

Check out public notebooks shared in the code part of the competition. Most notebooks use this amazing dataset, which seems to work great out of the box https://www.kaggle.com/datasets/thedrcat/daigt-v2-train-dataset

DAIGT V2 Train Dataset

A dataset you can actually train on for the LLM Detect AI Generated Text comp.

elfin pulsar Jan 10, 2024, 9:27 AM

#

Woaaha, now I'm getting 0.960 without catboost also

#

And it just takes 2 hours to run

fierce notch Jan 10, 2024, 9:27 AM

#

cool

elfin pulsar Jan 10, 2024, 10:27 AM

#

Okay so I have one question

#

So I collected my own data as well

#

Will it be good to fit my vectorizer on my own data

#

Or just fit it on the publicly available data of drcat

lyric kayak Jan 10, 2024, 11:43 AM

#

elfin pulsar Woaaha, now I'm getting 0.960 without catboost also

would be nice if you share approach 😇 I cant beat .95 without catboost

elfin pulsar Jan 10, 2024, 11:45 AM

#

After the competition

#

😂😂😂

lyric kayak Jan 10, 2024, 11:45 AM

#

fair enough haha

#

https://tenor.com/bT0p6.gif

ruby chasm Jan 10, 2024, 1:11 PM

#

lyric kayak would be nice if you share approach 😇 I cant beat .95 without catboost

the .962 Baseline that was shared without CatBoost is .955 in about 2 hrs

opal sequoia Jan 10, 2024, 9:57 PM

#

ruby chasm this has been part of many discussion so far (even a couple messages earlier in ...

Hi, I'm a newbie and I just wanted to give some of my thoughts about why classic ML performs so well, or even better than LLM. If there's any point that is wrong, please correct me 🙂

About point 3 from Valentin's discussion, "on paper (at least in my opinion), the transformer embeddings should outperform tf-idf", I think we should clarify on which task transformer embeddings outperform tf-idf. If the task is language modeling, then yes, transformer embeddings should be better because they are specifically trained to learn human knowledge, while tf-idf features are just probabilities. But I don't think this holds for the task of LLM-generated text detection. In my opinion, classic ML performs better than LLM models because tf-idf features capture WRITING STYLE, which I believe is what sets LLM-generated text apart from student essays. ChatGPT will likely give an essay that sounds similar to a news article, which can sound more formal and use a richer vocabulary than student essays. This can be captured efficiently by tf-idf features, while LLM embeddings are affected by the model's human knowledge, which is not the focus of this task and makes the LLM perform worse (unless there is much more data, as in the BERT solution with 500k augmented data points from https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/465882).

Another point I want to mention is that using LLMs to detect LLM-generated text might not be a sustainable solution in the long run, as LLMs are trained to generate human-like text. It's just a matter of time until LLMs can generate totally human-like text, and eventually LLM-based detection will stop working

elfin pulsar Jan 11, 2024, 12:25 PM

#

Guys

#

Is there a way to load large pickle files

#

I dumped it with joblib

#

Its size is 5.5 GB

#

But when I load it on kaggle my notebook crashes

ocean flame Jan 12, 2024, 12:46 AM

#

opal sequoia Hi, I'm a newbie and I just wanted to give some of my thoughts about why classic...

I think tf idf is just highly over fitted

ember nymph Jan 12, 2024, 1:08 AM

#

ocean flame Im fairly confident I can go about 0.05 further by just retraining it as well

heya how's it going with deberta? I have been using distilbert and deberta-v3-small (I don't have the resources for deberta-v3-base or deberta-v3-large) as well as my own hand rolled 2 or 3 layer LSTM/FF NN with attention and I'm getting around 0.8 for distilbert and my hand rolled NN and less than 0.8 for deberta, I trained deberta and distilbert for 1 or 2 epochs with lr in the 1e-6 to 1e-5 range

ember nymph Jan 12, 2024, 1:10 AM

#

ocean flame I think tf idf is just highly over fitted

from what i've been doing the LLM/NN approach are outperforming tfidf on my own train/test split from the train_v3_drcat_02.csv that's publicly available on kaggle, but tfidf outperforms LLM/NN on the competition test data

lyric kayak Jan 12, 2024, 8:01 AM

#

arbitrary text statistics (potentially as a result of unfortunate data collection) > text semantics & intricacies

ruby chasm Jan 12, 2024, 11:54 AM

#

i still dont think tfidf is overfitting in the context of the Kaggle use case, If you define "new data" as the private LB.

But I also believe that the .963 Transformer solution group will climb some Ranks on private LB. Probably top 5-10

ember nymph Jan 12, 2024, 2:22 PM

#

my best performing submission so far is tfidf + multinomial naive bayes (0.918), using distilbert I get like 0.8 something

#

but i get 0.98 something ROC AUC with distilbert on my own 80/20 train test split (use 3 of the prompts for validation and 12 for training)

#

if i can't get the NN approach to work in the next few days I'll just focus on getting higher with tfidf using ensemble methods

ruby chasm Jan 12, 2024, 5:43 PM

#

ember nymph but i get 0.98 something ROC AUC with distilbert on my own 80/20 train test spli...

have you looked into the confidence of predictions? I have only trained 1 model so far for the competition, but it was extremely overconfident. All predictions were either 0 or 1. If you find a way to reduce this, your model is probably going to perform better (my deberta was .86-ish)

#

I will also give it another try soon 🙂

azure pagoda Jan 12, 2024, 6:45 PM

#

damn..whats with the dislikes?..

elfin pulsar Jan 12, 2024, 8:53 PM

#

I want to ask one thing

#

How much time does multinomial nb take to submit?

ember nymph Jan 13, 2024, 1:42 AM

#

ruby chasm have you looked into the confidence of predictions? I have only trained 1 model ...

your score is .86-ish when you submit to the kaggle comp? did you evaluate it on your own test data?

#

i don't know what the prediction probabilities are for the competition test data but based on the ROC AUC it should be close to 1 or 0 for most of the data and closer to .5 for the misclassified

elfin pulsar Jan 13, 2024, 4:22 AM

#

Dude why multinomial nb gives 0.94

#

Just in 20 minutes run

#

😂😂😂😂

ruby chasm Jan 13, 2024, 5:38 AM

#

elfin pulsar Just in 20 minutes run

exactly. Same with sgd. its Like 25-30 min iirc and a Bit better

ruby chasm Jan 13, 2024, 5:40 AM

#

ember nymph your score is .86-ish when you submit to the kaggle comp? did you evaluate it on...

yes on own data it was like .99 or such - by far the most overconfident model with the highest CV LB gap

ruby chasm Jan 13, 2024, 5:40 AM

#

ember nymph i don't know what the prediction probabilities are for the competition test data...

for me there was no preds between 0.05 and 0.95 I think. the model was very overconfident

digital dew Jan 13, 2024, 12:02 PM

#

I just found something weird
Just split my train set into two (test size is 9000)
To replicate the submission environment. (The code works properly without OOM when submitted)
But for some reason I get TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. (basically OOM) when running my code (not submitting)

#

I don't really know where... but I remember that someone said that when we submit, the model gets run on both the public and private test sets?

digital dew Jan 13, 2024, 12:39 PM

#

it does perfectly work when i set the test size to 4500

Guess the competition does only run the subset for public scoring?

azure pagoda Jan 13, 2024, 12:42 PM

#

digital dew it does perfectly work when i set the test size to 4500 Guess the competition ...

it runs on all test set as always

digital dew Jan 13, 2024, 12:43 PM

#

hmm
then I guess my code was wrong

ember nymph Jan 13, 2024, 3:40 PM

#

ruby chasm for me there was no preds between 0.05 and 0.95 I think. the model was very over...

What was your highest public score for the deberta? Mine was less than .8 and for distilbert it was .83 run with less than 3 epochs. It seems like the LLMs are strongly fitting to my train data, generalising well to my validation data (but much worse than the training) and generalising less well to the competition data

#

i've also been getting a weird problem where I edit the notebook and submit several times but the score is the same for all the submissions after editing

#

like when i look at the notebooks for the submissions they are clearly different versions but they all have the same score

lyric kayak Jan 14, 2024, 6:32 PM

#

I tried concatting spaCy embeddings with the TFIDF features, feel free to check it out here: https://www.kaggle.com/code/danshatzz/spacy-embeddings-tfidf-ensemble

spaCy Embeddings + TFIDF + Ensemble

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

#

^ normalizing the embeddings similar to tfidf features improved the score a bit. But I had to remove Catboost due to out of time error, so it doesnt beat baseline

#

I also tried the same with BERT, got worse score

ember nymph Jan 14, 2024, 11:21 PM

#

yeah with catboost you can't do (3,5) ngrams if you train the vectorizer on the full training data. I saw some notebooks where they train the vectorizer on the hidden test data instead
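Roughly what that trick looks like, as a sketch assuming scikit-learn's `TfidfVectorizer`. The texts are toy placeholders and this is not copied from any of those notebooks. The vocabulary is built from the (unlabeled) test essays only, so the train matrix only keeps n-grams that also occur in test.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy placeholders for the real train and hidden test essays
train_texts = [
    "phones should not be used while driving on the road",
    "car free cities reduce pollution and improve public health",
]
test_texts = [
    "drivers should not be used to texting while driving",
    "limiting car usage can improve public health in cities",
]

vec = TfidfVectorizer(ngram_range=(3, 5))
X_test = vec.fit_transform(test_texts)  # vocabulary comes from test only
X_train = vec.transform(train_texts)    # train n-grams absent from test are dropped

print(X_train.shape, X_test.shape)
```

Because the feature count is bounded by what appears in the test essays, the matrix is usually much smaller than one fitted on the full training data, which is presumably why it fits in memory.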

uneven vine Jan 16, 2024, 2:44 PM

#

Someone please clarify, do we have to predict labels like(0 or 1) or the probabilities like (0.8, 0.4, etc)?

iron marlin Jan 16, 2024, 5:05 PM

#

uneven vine Someone please clarify, do we have to predict labels like(0 or 1) or the probabi...

Probabilities

ruby chasm Jan 17, 2024, 4:27 PM

#

you can predict either, but auc is very kind in terms of probabilities.
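A tiny illustration of why, in plain Python with made-up numbers. AUC only looks at how the scores rank positives against negatives, so thresholding to 0/1 throws away ranking information, while any order-preserving rescaling of the probabilities leaves the score unchanged. `roc_auc` here is a hand-rolled pairwise version for the example, not the competition's scorer.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.9]                  # made-up probabilities
hard = [1 if p >= 0.5 else 0 for p in probs]  # thresholded at 0.5
squashed = [p ** 3 for p in probs]            # monotone rescaling, same ranking

auc_probs = roc_auc(labels, probs)
auc_hard = roc_auc(labels, hard)
auc_squashed = roc_auc(labels, squashed)
print(auc_probs, auc_hard, auc_squashed)  # prints 1.0 0.75 1.0
```

So submitting probabilities can only help compared to hard labels, and calibration by itself does not move AUC unless it changes the ordering or creates ties.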

nocturne tusk Jan 18, 2024, 3:35 AM

#

lyric kayak I also tried the same with BERT, got worse score

How much score are you getting? BERT gives me 0.825

zinc nebula Jan 22, 2024, 2:56 PM

#

Hey, Does anyone know how to solve
CatBoostError: /src/catboost/catboost/libs/data/quantization.cpp:2416: All features are either constant or ignored.

valid marten Jan 22, 2024, 6:13 PM

#

Will the results be disclosed at midnight UTC sharp? Or will there be a period of gestation?

velvet vortex Jan 22, 2024, 6:48 PM

#

valid marten Will the results be disclosed at midnight UTC sharp? Or will there be a period o...

it is immediate at 23:59 UTC

#

but these will be provisional results. Kaggle may (and probably will) remove some teams for cheating.

#

Removal happens in the next few days, it is not immediate.

digital dew Jan 23, 2024, 12:27 AM

#

The shakeups are insane

unkempt haven Jan 23, 2024, 5:15 AM

#

clueless why CatBoost overfits the least

#

anyone can explain why

ruby chasm Jan 23, 2024, 5:16 AM

#

digital dew

yessir! at least top3 is somewhat expected..
I am top 350 instead of top 100 because of bad selection.. i guess its a learning experience

unkempt haven Jan 23, 2024, 5:16 AM

#

like, if you keep everything constant, and just replace your CAT model with an LGB model, your score goes from 0.91 to 0.86

#

I haven't seen that much diff between CAT and LGB before

ruby chasm Jan 23, 2024, 5:17 AM

#

unkempt haven like, if you keep everything constant, and just replace your CAT model with an L...

Not sure why cat would overfit more than other boosters tho

unkempt haven Jan 23, 2024, 5:17 AM

#

We had one sub with only CAT and one sub with only LGB (with MNB and SGD) and the difference was massive

#

Even though the public score was almost the same...

lyric kayak Jan 23, 2024, 11:16 AM

#

gg everyone! see yall in the next LLM-ish competition!

lyric kayak Jan 23, 2024, 3:17 PM

#

btw https://github.com/ahans30/Binoculars

GitHub

GitHub - ahans30/Binoculars: Binoculars: Zero-Shot Detection of LLM...

Binoculars: Zero-Shot Detection of LLM-Generated Text - GitHub - ahans30/Binoculars: Binoculars: Zero-Shot Detection of LLM-Generated Text

mental finch Feb 20, 2024, 6:59 PM

#

Has anyone here done transfer learning with Google Gemini as the base model?
what can you tell me about the work?
would it be possible to share a link to the notebook??