#llm-detect-ai-generated-text


wise pecan
#

hello

light smelt
#

Hi

unborn dawn
#

hehe

bright oriole
#

The competition rules state:
“To the extent your Submission makes use of generally commercially available software not owned by you that you used to generate your submission, but that can be procured by the Competition Sponsor without undue expense, you do not grant the license in the preceding sentence to that software.”

Does this prohibit the (extensive) use of a (fine-tuned) closed-source foundation model such as GPT-4, which is pay-per-use? Could I use such models for generating training data or for making predictions?

ember nymph
#

hello

#

Are we allowed to use pre-trained models like BERT, CalBERT, etc.?

sturdy kraken
#

hey guys beginner question here...can someone explain how submissions work in this competition?

ember nymph
#

thanks

bleak shuttle
#

so, to be able to submit, we need to make 2 notebooks, where the second one loads the trained model and tokenizer (if applicable) from the kaggle working space (or from hard drive?), and then infer from those on the test data?

#

if someone would have any pointers, much appreciated

#

or can I first download the model and tokenizer and then use those? (in my case distilbert-base-uncased-finetuned-sst-2-english from hf)

sturdy kraken
#

You gotta download them with internet on first (then save them locally and upload them as a dataset), then load them from the custom dataset directory

#

Don't forget to turn internet off when submitting the notebook

bleak shuttle
#

ok thanks, will try that

turbid wraith
#

I need a team for this project, can anyone invite me?

steep roost
#

Hi, I have some experience in ML/DL and I'm interested in LLMs, but I don't have much experience in Kaggle competitions (attended a few). Does anyone want to team up with me for this project?

ember nymph
#

I'm looking for a team as well, I'd love to join up. I have some interesting text features to share, not based on word counts, or tf idf.

honest kettle
ember nymph
#

Can someone help me out understanding the difference between the training data, and the data used during the submission scoring?

#

I'm using daigt_external_dataset.csv and train_essays_RDizzl3_seven_v1.csv as training data.

ember nymph
#

Why did the competition organizers not supply a representative test set?

#

Is this because they don't have enough human-written essays and ChatGPT-generated essays?

#

I find it really silly they can't supply 50 each as test.csv and supply a much bigger test.csv during the submission scoring.

#

I find it equally silly that the process by which this "test.csv used during scoring" is created is not public knowledge.

sturdy kraken
#

Hey guys, I'm thinking of using langchain/hf to make a dataset of LLM essays (using the prompts from train_dataset). Does anyone want to team up?

onyx pulsar
#

Anyone willing to share some ideas on how to improve?

#

My best code has 0.91 auc on the leaderboard, but it's really hard to go up from here.

#

Anyone willing to do a code review for feedback? 🙂

#

On a related note, can I see what code others used? Or is this hidden?

lean sluice
#

hi

normal topaz
#

Hi, I'm looking for a teammate. I have decent experience in NLP and training machine learning/neural network based models.
Pls DM 🙂

woeful wasp
#

hey

#

I am new to kaggle competitions

#

I trained my model and saved checkpoint to /kaggle/working directory, but I cannot load it during test time

#

isn't /kaggle/working folder preserved during testing?

mint phoenix
#

Should be fine to load it

#

I am also saving it and then loading it again

#

Works fine for me

mint phoenix
#

I have made a guide on how to train a tokenizer from scratch

woeful wasp
lofty mirage
#

I'm currently at place 6 but imo haven't done anything special

#

Has anyone used something other than tf-idf

#

With a decent score?

ember nymph
#

Has anyone been using ChatGPT APIs?

acoustic flame
sturdy kraken
lofty mirage
#

Interesting notebook, given all of the data sources you use. Are there any that contain texts not related to the suggested 7 prompts?

lofty mirage
#

Tf-idf and the war of available memory.

sturdy kraken
#

Yeah try the one generated using PaLM

#

You can also try to generate your own using llama cpp

lofty mirage
#

Oh nice, yes I'm just scared of losing a lot of positions by focusing on those 7 prompts.

lofty mirage
#

I used all of these datasets, made tfidf features but the score was much worse than what you did.

sturdy kraken
#

Using Retention?

sturdy kraken
lofty mirage
#

It's impossible to create all the tfidf features using the same logic as found in the top Public notebooks. So I shrunk the data using truncated SVD, this enabled me to use all available data. But the score was something in the range of 0.715

sturdy kraken
#

Yeah

#

Maybe try to fix spelling errors before using tfidf?

lofty mirage
#

Would that make such a big difference? I could have a look into it, though.

#

Since I doubt that would increase my score from 0.7 ish to 0.9ish I didn't go further with that

heady terrace
#

Hi everyone, since many people are saying that the leaderboard is highly overfitted, does anyone have a good idea of how to design a good validation set for this competition?

ember nymph
#

hey stupid question everyone but how do you access the hidden test file in the submission (python) notebook?

#

I tried
test_samples = pd.read_csv("../input/test-essays/test_essays.csv",sep=',')
but it failed
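
For context: on Kaggle, competition files are normally mounted under /kaggle/input/<competition-slug>/, so assuming the slug matches this channel's name, the file would be /kaggle/input/llm-detect-ai-generated-text/test_essays.csv rather than ../input/test-essays/. A minimal sketch of a lookup helper (find_test_csv is a hypothetical name) that tries that path first and otherwise searches everything attached under /kaggle/input:

```python
import glob
import os


def find_test_csv(input_root="/kaggle/input", filename="test_essays.csv",
                  competition="llm-detect-ai-generated-text"):
    """Return the path of `filename` under `input_root`, or None if not found."""
    # Assumed standard location: /kaggle/input/<competition-slug>/<filename>.
    expected = os.path.join(input_root, competition, filename)
    if os.path.isfile(expected):
        return expected
    # Fallback: search every attached input (competition data and datasets).
    pattern = os.path.join(input_root, "**", filename)
    matches = sorted(glob.glob(pattern, recursive=True))
    return matches[0] if matches else None
```

Then something like test_samples = pd.read_csv(find_test_csv()) should work; if the helper returns None, the competition data is probably not attached to the notebook.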

woeful wasp
ember nymph
#

and for people who are using ngrams in the (3,5) range with the big datasets (> 40k rows) how are you dealing with the 30GB RAM limit?

lofty mirage
#

I suggest having a good look at the scikit-learn API, there are some interesting parameters to tune.

fleet widget
#

hey beginner question here...can someone explain how submissions work in this competition?

I am trying to submit but it always fails 😕

#

I'm loading the model and tokenizer from the Kaggle input directory. During submission my notebook runs successfully but scoring fails.... I get an exception error

Can someone please suggest a solution?

lofty mirage
#

Why is it failing, what is the error message?

fleet widget
lofty mirage
#

Click on it and scroll down until you find the error.

fleet widget
#

I checked the log too

#

😕 how did you submit?? Give suggestions

lofty mirage
#

One of the public notebooks had an if statement that looks at the length of the test file and therefore executes different code in the submission.
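
Roughly, such a check could look like the sketch below. It is not the public notebook's code; count_rows and is_dummy_test are hypothetical helpers, and the 3-row threshold assumes the placeholder test_essays.csv visible while editing has 3 rows while the hidden one used at scoring time is much larger. Rows are counted with the csv module rather than by counting lines, since essay texts can contain newlines.

```python
import csv


def count_rows(csv_path):
    """Number of data rows in a CSV file, excluding the header.

    csv.reader is used instead of counting lines because essay texts can
    contain newlines inside quoted fields.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def is_dummy_test(csv_path, dummy_rows=3):
    """True when the test file is only the small public placeholder."""
    return count_rows(csv_path) <= dummy_rows


# Hypothetical usage in a submission notebook:
# if is_dummy_test(test_path):
#     ...write a quick placeholder submission...
# else:
#     ...run the full (slow) training + inference pipeline...
```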

#

Did you copy a notebook?

fleet widget
lofty mirage
#

I do not know since I do not have your code. If you publish it on Kaggle I could have a look since private sharing of code is not allowed.

fleet widget
ember nymph
#

by the way, you're using Windows; you do realise you can take a screenshot with the Print Screen key rather than sending a photo of your computer, right?

#

like that

fleet widget
#

For submission, do I need to make another notebook, with no training code in it????? Is that how it works????

fleet widget
#

🙄😕???

fleet widget
ember nymph
#

it says it ran successfully, but there's no score. What happened when you submitted?

fleet widget
#

exception error

fleet widget
ember nymph
#

what happens when you run it on your own test data

fleet widget
#

It gives me the output as wanted

fleet widget
#

It's not submitting

#

What to do?

fleet widget
#

opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py:2930: FutureWarning: --Exporter.preprocessors=["remove_papermill_header.RemovePapermillHeader"] for containers is deprecated in traitlets 5.0. You can pass --Exporter.preprocessors item ... multiple times to add items to a list.
29.9s 9 warn(
29.9s 10 [NbConvertApp] WARNING | Config option kernel_spec_manager_class not recognized by NbConvertApp.
29.9s 11 [NbConvertApp] Converting notebook notebook.ipynb to notebook
30.3s 12 [NbConvertApp] Writing 10481 bytes to notebook.ipynb
31.9s 13 /opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py:2930: FutureWarning: --Exporter.preprocessors=["nbconvert.preprocessors.ExtractOutputPreprocessor"] for containers is deprecated in traitlets 5.0. You can pass --Exporter.preprocessors item ... multiple times to add items to a list.
31.9s 14 warn(
31.9s 15 [NbConvertApp] WARNING | Config option kernel_spec_manager_class not recognized by NbConvertApp.
32.0s 16 [NbConvertApp] Converting notebook notebook.ipynb to html
32.9s 17 [NbConvertApp] Writing 298426 bytes to results.html

this is what shows up in the log

my notebook ran successfully but after that the submission failed!!!!

any solutions from experts??? or is everyone just a noob???

#

The notebook ran successfully, and after that there was an error in submission.

sturdy kraken
#

What's the error message? (click the ⚠️)

sturdy kraken
fleet widget
#

The notebook runs fine; it happens after that, during submission

sturdy kraken
#

It's either an error in the notebook (check the notebook, not the logs) or an error in the submission part of your code (try index=False when you call to_csv)
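
For reference, a sketch of writing the submission file with exactly two columns. It assumes the expected format is id,generated (as in the competition's sample_submission.csv) and uses the standard csv module; with pandas, df.to_csv("submission.csv", index=False) does the same job, while the default index=True adds an extra unnamed index column. write_submission is a hypothetical helper; its length check also catches predictions that do not line up with the test rows.

```python
import csv


def write_submission(ids, probs, path="submission.csv"):
    """Write a two-column submission: id, generated (probability of AI text)."""
    if len(ids) != len(probs):
        raise ValueError(f"got {len(ids)} ids but {len(probs)} predictions")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "generated"])  # header only, no index column
        for essay_id, p in zip(ids, probs):
            writer.writerow([essay_id, float(p)])
```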

gaunt fractal
#

Is there an explanation why the test_essays.csv file has only 3 examples, and why those examples are basically gibberish? I couldn't find an explanation on the competition page.

indigo cedar
#

Hello, I hope someone can help me. I cannot submit to the competition: my notebook works fine and my model and tokenizer load correctly, but I constantly get an error.

#

Please help me ^^ what have I done wrong?

indigo cedar
#

Anyone ? *

ruby chasm
#

Is this channel also meant for finding teams? Just started in the challenge, bringing some transformer and machine learning experience, as well as a 4090 to the table. Got 3 more weeks of vacation and looking for motivated and/or experienced team partners to exchange ideas 🙂 Feel free to reach out

lofty mirage
#

4090 🫨

lofty mirage
ruby chasm
# lofty mirage 4090 🫨

Everything I've trained so far for the competition didn't go above 14GB RAM, so it hardly matters so far. But maybe a teammate has some ideas on how to put it to use 😉

ruby chasm
# indigo cedar Anyone ? *
  1. make sure the path is correct (you can check for example with os.listdir). For example, for me the path was mymodel/mymodel
  2. Make sure you have the model offline (usable with Internet = Off). It's probably best to train it outside your submission notebook (because you need Internet to first use it), save it locally on your device and upload it as a dataset on Kaggle. That's how I am doing it and I never had problems
  3. Make sure the model folder includes both the tokenizer and the model.
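
Points 1 and 3 can be checked programmatically. A minimal sketch, assuming the model and tokenizer were saved with Hugging Face's save_pretrained(); the filename sets below are assumptions (sharded checkpoints, for example, use different weight filenames), and find_model_dirs is a hypothetical helper:

```python
import os

# Assumed filenames of a Hugging Face checkpoint saved with save_pretrained();
# exact names vary by model type and library version.
WEIGHT_FILES = {"pytorch_model.bin", "model.safetensors", "tf_model.h5"}
TOKENIZER_FILES = {"tokenizer.json", "tokenizer_config.json", "vocab.txt", "spiece.model"}


def find_model_dirs(root):
    """Return (sorted) directories under root that hold config, weights and a tokenizer."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        has_config = "config.json" in names
        has_weights = bool(names & WEIGHT_FILES)
        has_tokenizer = bool(names & TOKENIZER_FILES)
        if has_config and has_weights and has_tokenizer:
            found.append(dirpath)
    return sorted(found)
```

For example, path = find_model_dirs("/kaggle/input")[0], then AutoTokenizer.from_pretrained(path, local_files_only=True) and AutoModelForSequenceClassification.from_pretrained(path, local_files_only=True) from transformers should load it with Internet off. A nested path like mymodel/mymodel is found as well.
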
indigo cedar
#

Thank you for your advice, I will do it locally first. Thank you

ember nymph
ocean flame
#

why is nobody else using deberta? I got to top 25 using the existing deberta model + like 10 more lines of code

#

I'm fairly confident I can go about 0.05 further just by retraining it as well

ember nymph
#

i think people are fine tuning the huggingface pretrained models for classification

lofty mirage
#

What is your username?

ocean flame
#

^

lofty mirage
#

I see disqualified at school

#

0.963

#

Tomorrow I'll have a look into DeBERTa. I abandoned the idea because I did not see any good notebooks using it.

#

So instead I optimized my own pipeline.

#

I was sick the last couple of days but nobody caught me 🙂

ruby chasm
# lofty mirage 0.963

again, what a weird competition where you are top 1k at .95 and top 35 at .963 πŸ˜„

lofty mirage
#

I'm first in the competition

ruby chasm
#

I thought you mean the deberta score is .963

#

Yea they are .964 - my point was just that .964 is top 21 and .960 is top 777

velvet vortex
ocean flame
#

oh i completely forgot i was training that thanks for sending me the notification

#

... and my ssh key seems to have been deleted ...

fierce notch
ocean flame
#

Linguistic Ninjas

fierce notch
#

👍

fierce notch
digital dew
late field
#

I've tried catboost on the normal CPU cores. Performance was terrible. An earlier poster got PyCaret to work, but I haven't been able to install the library in the notebook

distant salmon
#

Hey guys, I'm new, so let's see if I got this right: the simplest way of explaining this task is that we build a model that takes a text (text) as input and outputs a label (generated), right?

elfin pulsar
#

Duhhh, I'm literally irritated with this OOM error while running catboost on GPU

#

😑😑😑😑

distant salmon
velvet vortex
wraith cypress
#

I keep getting an error that my submission is throwing an exception. I feel like my solution is the simplest there is, just hardcoded values for each line of test_essays.csv; I'd appreciate any help. Would it be OK to post 7 lines of code?

velvet vortex
ruby chasm
# wraith cypress http://nopaste.net/w8eyFlUh0i

Also, "for x in test_essays" is not iterating over rows (which is done by: "for idx, row in test_essays.iterrows()"), but over the columns. So you will always create 3 values, since the test essay file has 3 columns. Coincidentally, this works for the fake test data which we can access, because it also has 3 rows.
TLDR; your code has a bug - on submission you try to fill a column with ~4.5k rows with a list of 3 values
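
A small pandas reproduction of the bug described above, using a made-up 5-row stand-in for the hidden test file (the column names id, prompt_id and text are assumed to match test_essays.csv; the constant 0.5 stands in for real predictions):

```python
import pandas as pd

# Hypothetical stand-in for the hidden test_essays.csv: same 3 columns, 5 rows.
test_essays = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4", "e5"],
    "prompt_id": [0, 1, 0, 1, 0],
    "text": ["essay one", "essay two", "essay three", "essay four", "essay five"],
})

# Buggy: iterating over a DataFrame yields its column labels, not its rows.
buggy = [0.5 for _ in test_essays]  # 3 values, one per column

# Fixed: one prediction per row; iterrows() yields (index, row) pairs.
fixed = [0.5 for _idx, _row in test_essays.iterrows()]

sub = test_essays[["id"]].copy()
buggy_error = None
try:
    sub["generated"] = buggy  # 3 values for 5 rows
except ValueError as err:
    buggy_error = err  # pandas rejects a list whose length differs from the index

submission = pd.DataFrame({"id": test_essays["id"], "generated": fixed})
```

With the 3-row public file, len(buggy) happens to equal the number of rows, which is why the notebook runs fine while editing and only fails on submission.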

lyric kayak
#

Hi, just entered this competition.

Seems like the best public results (disregarding efficiency) are achieved by training a few classical ML models on top of TF-IDF encodings of tokens. Not BERT, or any other deep learning model. Is that right? It doesn't seem right.

I would expect a pre-trained and fine-tuned BERT, or a decoder model (llama, mistral, etc.), to perform much better than an ensemble of classical models trained on top of TF-IDF.

ruby chasm
# lyric kayak Hi, just entered this competition. Seems like the best public results (disregar...

this has been part of many discussions so far (even a couple of messages earlier in here 🙂 )

  1. decoder models are not fit for classification tasks, at least not to the degrees of encoder-only transformers or classic ML classifiers
  2. the competition used A LOT of LLMs - but only for generating data. we are at a point where we went from a few human-generated training rows to over 60k AI rows and over 25k human rows (congrats to the community for that)
  3. the main question: why does classic ML outperform transformers? no real clue. on paper (at least in my opinion), the transformer embeddings should outperform tf-idf. However, it is worth noting that the used tf-idf embeddings are based on n-grams (3-grams to 5-grams), which means we are closer to positional meaning than pure word-based tf-idf. I strongly believe the reason to be that the n-grams tell more about the GenAI behavior (as in which texts are generated) than the pre-trained transformer embeddings + attention etc.
    3b) ensembles are in most situations increasing performance - if you are using transformers, maybe use a transformer ensemble too (e.g., finetune BERT + finetune DeBERTa + finetune RoBERTa + finetune Electra + maybe finetune an existing finetuned model for essays / scientific articles etc. )

You are obviously more than welcome to prove all current top leaderboard submissions wrong with the use of transformers - but for me even a 2 layer NN has been outperforming a fine-tuned deberta3 large by almost .12
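
To make the n-gram point concrete, a pure-Python sketch of word-level 3- to 5-gram TF-IDF with a max_features cut-off. It mirrors the defaults of scikit-learn's TfidfVectorizer (smoothed idf, L2 normalisation, max_features keeping the most frequent terms across the corpus) but is not its implementation; in practice people call TfidfVectorizer(ngram_range=(3, 5), max_features=...) directly. Lowercasing and whitespace tokenisation are simplifying assumptions.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=3, n_max=5):
    """All word n-grams with n_min <= n <= n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fit_idf(docs, n_min=3, n_max=5, max_features=None):
    """Map each kept n-gram to its idf.

    If max_features is set, only the max_features n-grams with the highest
    total count across the corpus are kept.
    """
    doc_freq = Counter()
    total = Counter()
    for doc in docs:
        grams = word_ngrams(doc, n_min, n_max)
        total.update(grams)
        doc_freq.update(set(grams))
    kept = [gram for gram, _ in total.most_common(max_features)]
    n_docs = len(docs)
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, scikit-learn's default formula.
    return {gram: math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0 for gram in kept}


def transform(doc, idf, n_min=3, n_max=5):
    """Sparse tf-idf vector of one document as {ngram: weight}, L2-normalised."""
    counts = Counter(g for g in word_ngrams(doc, n_min, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else vec
```

Because every 3- to 5-word sequence is its own feature, the vocabulary grows very quickly with corpus size, which is also why memory becomes the bottleneck later in this channel.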

lyric kayak
#

@ruby chasm Thanks for all the info! Super helpful!

lyric kayak
ruby chasm
#

would love to hear more about Dom's approach but I don't think we will 😉

ocean flame
#

That is 0.833

#

I have since changed approach quite heavily

lyric kayak
#

Thanks for the info @ocean flame

lyric kayak
#

What is the meaning of the bool column "RDizzl3_seven" in the daigt-v2 train dataset?

wraith cypress
ruby chasm
lyric kayak
#

@ruby chasm Thanks! Still am a bit confused. So if a row in the daigt-v2 train dataset has RDizzl3_seven=True it means that it came from this "7 prompts" dataset? Meaning the 7 prompts dataset is a subset of the daigt-v2 train dataset

#

correct?

ruby chasm
lyric kayak
#

gotcha! Thanks!!

lyric kayak
#

Has anyone experimented in max_features for TfidfVectorizer? Does limiting the number of features impact score significantly?

digital dew
ruby chasm
valid marten
#

Wow .982 in the leaderboard 😯 @noble tendon found some secret sauce indeed!!

lofty mirage
#

Damn

lyric kayak
ruby chasm
#

How are you guys dealing with the OOM error? It's my first time building on the .961 baseline, and whenever I add more data than the daigt v2 dataset I go OOM

velvet vortex
lofty mirage
#

With multiple I mean more than 2 btw.

lyric kayak
ember nymph
#

I am having problems submitting my notebooks. The submission stays at "scoring" for several hours and then doesn't go through. Does anyone have the same problem?

late portal
#

just want to share a funny AI-generated essay on "phones and driving"

#

There are many people out in this world that get along just fine and there are some that are totally unhappy and unlucky in love, well that is not the case in this story because I am not unhappy at all in love. I have been married to my high school sweetheart for ten years. My love and I have four children that are all above the age of six. I am at home with the kids while my wife is working. However, not all is peaceful at the Green household. Someone always has an attitude and is mad at another person and that's not me but the kids. There are so many things that they get on each other's nerves and love to pick on each other. When they get on each other's nerves, they turn on each other and just can't seem to get over the things that they say.

One day in the afternoon, I had to take my oldest son on his annual Physical for his football team. The physical was at four o'clock in the afternoon. The quarterback is my son and he has to pass this physical or he will not be able to play football this season. When I dropped my oldest son off to be checked out by the doctor, I went to pick up my wife from work and then we were going to go on to the doctor’s office together. My daughter and daughter-in-law were working for the day so it was going to be a long drive. The doctor's office was about ten minutes away from where we worked. When I got home, I told my wife and oldest son about the doctor’s appointment and told them to get ready and we were heading out.

#

I drove for about fifteen minutes trying to get us to the doctor's office. On the way there, my cell phone started to ring. When I looked to see who it was, it was my wife who I just said to meet me at the doctor’s office. I stopped in the middle of the road because I was in my mind and not looking at the road. When I turned around, I notice that I was in the middle of the road and there was a SUV right behind me. If I would have looked up a few more seconds, I would have been rammed and possibly killed. I would have killed someone with my own stupidity. To this day, I look back to that day and wonder if my life could have changed by using my cell phone less.

As you can see, there are many reasons why one should not drive while talking on the phone. You can run into things and get distracted and crash into something or somebody. Also, it is against the law in Iowa to talk on the phone and text while driving. For these reasons, people should not be allowed to talk on their phone while driving.

indigo cedar
#

^^ it's educational

lofty mirage
#

When trying to incorporate public notebooks into the pipeline

#

I tried most things

#

Omitting n_jobs

#

Deleting all variables

#

gc.collect()

ruby chasm
#

brutal

#

have you experimented with max_features in the vectorizer?

lyric kayak
#

Can someone enlighten me about some psychology behind publishing solutions?
I understand why competitors in the top 20 or so do not publish their solutions. They are clearly there to win.

But, considering the competitors with a score of 0.963. Would they not benefit more from publishing their solution and getting a ton of upvotes than trying to actually win the competition?

#

^ hope I don't sound demeaning or something btw. Totally cool to not publish solutions haha. Just curious about the thought process of those with 0.963.

astral drum
#

How about you achieve that score and see what you would do then?

lofty mirage
#

Well, if you have a unique idea, are scoring 0.963 and believe you have room for improvement, I wouldn't share the code.

#

Besides that, it is unknown whether these people have used public notebooks in their pipeline as well

#

So there is still a possibility they're in for the gold.

ruby chasm
#

@lofty mirage I just made a post on OOM which might help with your problems

lyric kayak
#

@lofty mirage makes sense, thanks! I guess it depends on the situation. Haven't been in it myself yet haha

lofty mirage
#

I see thank you for the tips.

lyric kayak
#

Does anyone have insight as to how the VotingClassifier's estimators were figured out in the .961 and .962 kernels?

Is it just trial and error or does the author have a sophisticated hyperparameter optimization pipeline?

astral drum
#

probably both

velvet vortex
unkempt haven
#

and there’s only 3 weeks left in the comp

ruby chasm
lyric kayak
ruby chasm
lyric kayak
ruby chasm
#

Yep, unlimited atm

lyric kayak
#

Strange to get OOM when running stuff that already scored

#

Maybe unlimited is just barely on the edge of max memory allotment

ruby chasm
#

it's very frustrating, since it also takes forever..

#

thanks for the quick reply tho 🙂

lyric kayak
#

Sure, thanks to u too haha

digital dew
ruby chasm
#

which is wild, because if I remove boost from my planned pipeline, it takes less than 30 minutes

digital dew
#

catboost takes too long to train but I can't even use gpu without getting an oom :(
has anyone successfully used catboost with gpu?

ruby chasm
#

trying it rn πŸ™‚

ruby chasm
#

according to that calculation, the ~9200 samples have ~8M 3-5-grams (features) and we have a training set of ~45k rows. That means 432_000_000_000 bytes. That's approx. 402 GB RAM

ruby chasm
lyric kayak
#

@ruby chasm 10k feats results in 0.901

ruby chasm
#

on the whole ensemble or just boost?

lyric kayak
#

oh wait nvm, I also removed MultinomialNB

ruby chasm
#

i did a 42k multinomial that scored .909

lyric kayak
#

just multinomial?

#

mine was whole ensemble except for multinomial

ruby chasm
#

yea, just multinomial

lyric kayak
#

any tips for local validation? I gotta stop using kaggle submissions for validation

#

was thinking just to hold out the official training data prompts for validation set:

val_df = train[train['prompt_name'].isin(["Car-free cities", "Does the electoral college work?"])]
train_df = train[~train['prompt_name'].isin(["Car-free cities", "Does the electoral college work?"])]
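
Generalising that idea to every prompt gives leave-one-prompt-out validation. A minimal stdlib sketch (leave_prompt_out_folds is a hypothetical helper; scikit-learn's LeaveOneGroupOut with groups=train["prompt_name"] does the same job). If, as the reply says, these prompts are not in the test set, this estimates generalisation to unseen prompts rather than the LB score.

```python
from collections import defaultdict


def leave_prompt_out_folds(prompt_names):
    """Yield (held_out_prompt, train_idx, val_idx), one fold per distinct prompt."""
    rows_by_prompt = defaultdict(list)
    for i, name in enumerate(prompt_names):
        rows_by_prompt[name].append(i)
    for held_out in sorted(rows_by_prompt):
        val_idx = rows_by_prompt[held_out]
        val_set = set(val_idx)
        train_idx = [i for i in range(len(prompt_names)) if i not in val_set]
        yield held_out, train_idx, val_idx
```

With the train DataFrame above, for prompt, tr, va in leave_prompt_out_folds(train["prompt_name"].tolist()) gives one fold per prompt: fit on train.iloc[tr], score on train.iloc[va].
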
ruby chasm
#

we are fairly certain those prompts are not in the test set, so I can't tell you how well that idea works

#

i have been just doing normal CV (which is not .999 for me atm) - selecting folds in a way to get close to 9k test samples... this probably ain't a good / the best way, but i don't have a good validation set

lyric kayak
#

My intuition says that at least those prompts should be within the same "distribution" as the test set (assuming they are a subset of the original test dataset). That being said, they are just too easy haha. My local validation and LB are way off

ruby chasm
#

How is anybody able to submit with catboost? I'm even going OOM on it with max_features=1M

velvet vortex
#

If you have N samples in your training data and M features, then catboost will use N×M×4 bytes (I assume it uses FP32). Then compare this with your memory to get the number of features you can afford.
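
A worked version of that estimate, assuming a dense FP32 matrix (4 bytes per value) as in the message above. Real libraries may store data sparsely or quantize it, so treat this as a rough estimate; dense_bytes and max_features_for_budget are hypothetical helpers.

```python
GIB = 2 ** 30  # bytes in one GiB


def dense_bytes(n_rows, n_features, bytes_per_value=4):
    """Memory of a dense n_rows x n_features matrix (4 bytes/value = float32)."""
    return n_rows * n_features * bytes_per_value


def max_features_for_budget(n_rows, ram_bytes, bytes_per_value=4):
    """Largest feature count whose dense matrix still fits into ram_bytes."""
    return ram_bytes // (n_rows * bytes_per_value)
```

For example, with ~45k training rows and a 30 GiB budget, max_features_for_budget(45_000, 30 * GIB) gives 178,956 features, far below the millions of n-grams mentioned above; and 432_000_000_000 bytes / 2**30 is about 402, matching the ~402 GB figure quoted earlier.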

ruby chasm
#

@velvet vortex am I correct to assume I have the same hardware limitations during submission? I even copied the model from a notebook that was already scored on the LB, but when I try it locally it keeps failing. This is beyond my understanding. I have tried it with fewer features AND less data than the notebook I copied it from. The only explanation I can think of would be having more than 30GB RAM during submission

velvet vortex
#

Are you executing it in the same environment?

ruby chasm
# velvet vortex Are you executing it in the same environment?

I am copying the notebook 1:1 but wanted to test it inside the editor first, where it crashes. I never submitted it since I don't want to waste submissions.
What I did: fake a test set, reduce train size (smaller than in the notebook I copied from), set max_features. I can't see how this would go OOM

lyric kayak
#

Are you guys referring to max num features for tf-idf? Or is there a parameter for catboost that limits the number of features it makes?

ruby chasm
#

nope max feature of tfidf

lyric kayak
#

I'm surprised it makes 1M+ features

#

did you check that?

#

we should run it on the train set and see that it even makes that many. Maybe your max is greater than what tfidf needs

ruby chasm
#

when I create a fake test set with ~9k subsamples from the train set, it created ~7M tfidf features (i.e., my training matrix is 40000x7300000). I am not sure how many features CatBoost creates; I can never check, it's always OOM

#

on train set i think it was like 60-80M features from tfidf

lyric kayak
#

oh damn i stand corrected haha

#

catboost has this param: used_ram_limit=None

#

interesting 🧐

#

also: per_float_feature_quantization=None

ruby chasm
#

used ram limit didnt help

ruby chasm
lyric kayak
#

Another interesting direction: KBinsDiscretizer
https://scikit-learn.org/stable/auto_examples/cluster/plot_face_compress.html

The example they show doesn't actually use less memory, but at the end they do say that it would use less memory if it started out as float64 (their starting point is int8)

#

not sure if sklearn pipeline would take advantage of it anyway. They could cast to float64 immediately upon training

ruby chasm
#

interesting, thanks for sharing

lyric kayak
#

also I looked at the docs for catboost per_float_feature_quantization. have absolutely no idea what the hell it does

#

seems like you need to understand catboost

lyric kayak
#

Oh wait I understand something
border_count: The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusively. Default: 254 on cpu, 128 on gpu

#

per_float_feature_quantization changes the amount of "borders" which is splits for a feature

#

Basically we would need to make the per_float_feature_quantization less than 254 for every feature

#

and we might save some RAM

ruby chasm
#

i despise catboost in this competition

lyric kayak
#

something like: per_float_feature_quantization=[f'{i}:border_count=50' for i in range(num_feats)]
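
The list from that message, wrapped in a small helper. This only builds the parameters (it does not import CatBoost); whether lower border counts actually cut memory enough here is untested. The global border_count option below is based on CatBoost's documented parameters (border_count applies to all numerical features), not on runs in this competition.

```python
def per_feature_border_counts(num_feats, border_count=50):
    """One 'index:border_count=N' entry per float feature, the string format
    CatBoost uses for per_float_feature_quantization."""
    return [f"{i}:border_count={border_count}" for i in range(num_feats)]


# Hypothetical parameter sets, e.g. for catboost.CatBoostClassifier(**params).
# Option 1: per-feature override, as suggested above.
per_feature_params = {"per_float_feature_quantization": per_feature_border_counts(3)}
# Option 2: the same 50 borders for every feature via the global setting,
# which avoids building a list with millions of entries.
global_params = {"border_count": 50}
```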

magic peak
#

I can see texts talking about scores of more than 0.95. I was just wondering if these people have generated artificial data via LLMs, as I can see only 3 data points for this class in the original data

opal sequoia
ruby chasm
elfin pulsar
#

Woaaha, now I'm getting 0.960 without catboost too

#

And it just takes 2 hours to run

fierce notch
#

cool

elfin pulsar
#

Okay so I have one question

#

So I collected my own data as well

#

Would it be good to fit my vectorizer on my own data

#

Or just fit it on the publicly available data from drcat

lyric kayak
elfin pulsar
#

After the competition

#

😂😂😂

lyric kayak
#

fair enough haha

elfin pulsar
ruby chasm
opal sequoia
# ruby chasm this has been part of many discussion so far (even a couple messages earlier in ...

Hi, I'm a newbie and I just wanted to share some thoughts on why classic ML performs so well here, even better than LLMs. If any point is wrong, please correct me 🙂

About point 3 from Valentin's discussion, "on paper (at least in my opinion), the transformer embeddings should outperform tf-idf": I think we should clarify for which task transformer embeddings outperform tf-idf. If the task is language modeling, then yes, transformer embeddings should be better because they are specifically trained to capture human knowledge, while tf-idf features are just frequency statistics. But I don't think this holds for the task of LLM-generated text detection. In my opinion, classic ML performs better than LLM-based models here because the tf-idf features capture WRITING STYLE, which I believe is what sets LLM-generated text apart from student essays. ChatGPT will likely produce an essay that sounds like a news article: more formal, with a richer vocabulary than student essays. This can be captured efficiently by tf-idf features, while LLM embeddings are shaped by the model's general knowledge, which is not the focus of this task, and this makes LLMs perform worse (unless there is much more data, as in the BERT solution with 500k augmented samples from https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/465882).

Another point I want to mention is that using LLMs to detect LLM-generated text might not be a sustainable solution in the long run, as LLMs are trained to generate human-like text. It's just a matter of time before LLMs can generate completely human-like text, and eventually LLM-based detectors will no longer be able to tell the difference.

elfin pulsar
#

Guys

#

Is there a way to load large pickle files

#

I dumped it with joblib

#

Its size is 5.5 GB

#

But when I load it on kaggle my notebook crashes

ocean flame
ember nymph
# ocean flame Im fairly confident I can go about 0.05 further by just retraining it as well

heya how's it going with deberta? I have been using distilbert and deberta-v3-small (I don't have the resources for deberta-v3-base or deberta-v3-large) as well as my own hand rolled 2 or 3 layer LSTM/FF NN with attention and I'm getting around 0.8 for distilbert and my hand rolled NN and less than 0.8 for deberta, I trained deberta and distilbert for 1 or 2 epochs with lr in the 1e-6 to 1e-5 range

ember nymph
# ocean flame I think tf idf is just highly over fitted

from what i've been doing the LLM/NN approach are outperforming tfidf on my own train/test split from the train_v3_drcat_02.csv that's publicly available on kaggle, but tfidf outperforms LLM/NN on the competition test data

lyric kayak
#

arbitrary text statistics (potentially as a result of unfortunate data collection) > text semantics & intricacies

ruby chasm
#

i still don't think tfidf is overfitting in the context of the Kaggle use case, if you define "new data" as the private LB.

But I also believe that the .963 Transformer solution group will climb some Ranks on private LB. Probably top 5-10

ember nymph
#

my best performing submission so far is tfidf + multinomial naive bayes (0.918), using distilbert I get like 0.8 something

#

but i get 0.98 something ROC AUC with distilbert on my own 80/20 train test split (use 3 of the prompts for validation and 12 for training)

#

if i can't get the NN approach to work in the next few days I'll just focus on getting higher with tfidf using ensemble methods

ruby chasm
#

I will also give it another try soon 🙂

azure pagoda
#

damn..whats with the dislikes?..

elfin pulsar
#

I want to ask one thing

#

How much time does multinomial NB take to submit

ember nymph
#

i don't know what the prediction probabilities are for the competition test data but based on the ROC AUC it should be close to 1 or 0 for most of the data and closer to .5 for the misclassified

elfin pulsar
#

Dude, why does multinomial NB give 0.94

#

Just in 20 minutes run

#

😂😂😂😂

ruby chasm
digital dew
#

I just found something weird
Just split my train set into two (test size is 9000)
To replicate the submission environment. (The code works properly without oom when submit)
But for some reason I get TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. (basically OOM) when running my code (not submitting)

#

I don't really know where... but I remember someone saying that when we submit, the model gets run on both the public and private test sets?

digital dew
#

it does perfectly work when i set the test size to 4500

Guess the competition does only run the subset for public scoring?

azure pagoda
digital dew
#

hmm
then I guess my code was wrong

ember nymph
#

i've also been getting a weird problem where I edit the notebook and submit several times but the score is the same for all the submissions after editing

#

like when i look at the notebooks for the submissions they are clearly different versions but they all have the same score

lyric kayak
#

^ normalizing the embeddings similar to tfidf features improved the score a bit. But I had to remove CatBoost due to an out-of-time error, so it doesn't beat the baseline

#

I also tried the same with BERT, got worse score

ember nymph
#

yeah with catboost you can't do (3,5) ngrams if you train the vectorizer on the full training data. I saw some notebooks where they train the vectorizer on the hidden test data instead
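
A minimal stdlib sketch of that trick, assuming "train the vectorizer on the test data" means building the n-gram vocabulary from the test texts only and then vectorising the training texts with that fixed vocabulary. This caps the feature count at what occurs in the ~9k test essays instead of the much larger training set. build_vocab, vectorize and the min_df=2 cut-off are illustrative assumptions.

```python
from collections import Counter


def word_ngrams(text, n_min=3, n_max=5):
    """All word n-grams with n_min <= n <= n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def build_vocab(texts, n_min=3, n_max=5, min_df=2):
    """Index the n-grams that occur in at least min_df of the given texts."""
    doc_freq = Counter()
    for t in texts:
        doc_freq.update(set(word_ngrams(t, n_min, n_max)))
    kept = sorted(g for g, df in doc_freq.items() if df >= min_df)
    return {g: i for i, g in enumerate(kept)}


def vectorize(texts, vocab, n_min=3, n_max=5):
    """Sparse count vectors ({column index: count}) restricted to vocab."""
    rows = []
    for t in texts:
        counts = Counter(vocab[g] for g in word_ngrams(t, n_min, n_max) if g in vocab)
        rows.append(dict(counts))
    return rows
```

Usage: vocab = build_vocab(test_texts), then X_train = vectorize(train_texts, vocab) and X_test = vectorize(test_texts, vocab). In scikit-learn terms this corresponds roughly to fitting a vectorizer on the test texts and passing its vocabulary_ as vocabulary= to a second vectorizer that is then fit on the training texts.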

uneven vine
#

Someone please clarify: do we have to predict labels like (0 or 1) or probabilities like (0.8, 0.4, etc.)?

ruby chasm
#

you can predict either, but AUC only cares about how your predictions rank the essays, so probabilities score much better than hard 0/1 labels.
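
To see why probabilities help: ROC AUC only depends on how the scores rank positives against negatives, and hard 0/1 labels turn many of those comparisons into ties. A small pure-Python sketch (roc_auc is written out by hand for illustration; sklearn.metrics.roc_auc_score computes the same value efficiently), with toy numbers chosen for the example:

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            correct += 1.0 if p > n else 0.5 if p == n else 0.0
    return correct / (len(pos) * len(neg))


y = [0, 0, 1, 1]
probs = [0.1, 0.6, 0.4, 0.9]
labels = [1 if p >= 0.5 else 0 for p in probs]  # [0, 1, 0, 1]
```

Here roc_auc(y, probs) is 0.75, while thresholding the same predictions at 0.5 gives roc_auc(y, labels) of 0.5.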

nocturne tusk
zinc nebula
#

Hey, Does anyone know how to solve
CatBoostError: /src/catboost/catboost/libs/data/quantization.cpp:2416: All features are either constant or ignored.

valid marten
#

Will the results be disclosed at midnight UTC sharp? Or will there be a period of gestation?

velvet vortex
#

but these will be provisional results. Kaggle may (and probably will) remove some teams for cheating.

#

Removal happens in the next few days, it is not immediate.

digital dew
#

The shakeups are insane

unkempt haven
#

clueless why CatBoost overfits the least

#

can anyone explain why?

ruby chasm
# digital dew

yessir! at least top3 is somewhat expected..
I am top 350 instead of top 100 because of bad selection.. I guess it's a learning experience

unkempt haven
#

like, if you keep everything constant, and just replace your CAT model with an LGB model, your score goes from 0.91 to 0.86

#

I haven't seen that much diff between CAT and LGB before

ruby chasm
unkempt haven
#

We had one sub with only CAT and one sub with only LGB (with MNB and SGD) and the difference was massive

#

Even though the public score was almost the same...

lyric kayak
#

gg everyone! see yall in the next LLM-ish competition!

lyric kayak
mental finch
#

Has anyone here done transfer learning with Google Gemini as the base model?
what can you tell me about the work?
would it be possible to share a link to the notebook??