#kaggle-llm-science-exam | Kaggle | Page 1

limpid sparrow Aug 7, 2023, 11:17 PM

#

Pretty narrow spread between the top 2 teams! We're approaching 1000 teams 😮 How's everyone team stacking up so far?

stoic zodiac Aug 11, 2023, 2:12 AM

#

https://twitter.com/full_stack_dl/status/1689724255775379456?t=I_AENQDMTj81QZo9nplAUw&s=19

stoic zodiac Aug 13, 2023, 3:24 AM

#

Oops when I posted ^^ I didn't realize the link didn't unfurl. Anyway, the event happened and the recording is now available here! https://www.youtube.com/watch?v=GaV_M2l4-N0

YouTube

The Full Stack

Livecoding: Getting Started with LLMs, by Jeremy Howard

A livecoding session from Charles working through a starter notebook for the "LLM Science Exam" Kaggle competition, written by Jeremy Howard.

Focus is on exploratory data analysis for and with LLMs, plus some synthetic data generation.

Original notebook: https://www.kaggle.com/code/jhoward/getting-started-with-llms

Edited notebook: https://ww...

▶ Play video

final barn Aug 19, 2023, 11:16 AM

#

stoic zodiac Oops when I posted ^^ I didn't realize the link didn't unfurl. Anyway, the event...

This is really helpful! thank_you

chilly radish Aug 19, 2023, 9:05 PM

#

Hey guys I'm gonna start working on this competition soon would love to Collab!

About me:
https://www.linkedin.com/in/pranshu-bahadur

timid holly Aug 20, 2023, 4:16 PM

#

Hi guys

#

Any one could use one sentence to summary what this competition do? Is it more focus on model and GPU laryer?

soft galleon Aug 20, 2023, 5:17 PM

#

I think it is about how to use LLM to produce a large amount of high-quality data😆

#

and fine-tuning by using large models

strange nimbus Aug 26, 2023, 2:53 PM

#

i think its about who can afford the most GPUs

dusk pendant Aug 27, 2023, 4:29 AM

#

Is there any limit to how powerful our local machines can be for training ?

strange nimbus Aug 27, 2023, 8:40 AM

#

i dont think so

slim shore Aug 28, 2023, 8:38 AM

#

Hey, Masters students in CS here, looking for a team, hit me up if you're up for it

full jackal Aug 29, 2023, 5:52 PM

#

How does one get a good understanding of why this is written this way.
I could figure out what each of these lines were doing, but why they are doing what they are is not clear yet.

class Perplexity(nn.Module):

    def __init__(self, reduce: bool = True):
        super().__init__()
        self.loss_fn = nn.CrossEntropyLoss()
        self.reduce = reduce

    def forward(self, logits, labels):
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()

        perplexity = []
        for i in range(labels.shape[0]):
            perplexity.append(self.loss_fn(shift_logits[i], shift_labels[i]))
        perplexity = torch.stack(perplexity, dim=0)
        
        if self.reduce:
            perplexity = torch.mean(perplexity)
        return perplexity

Tried reading https://thegradient.pub/understanding-evaluation-metrics-for-language-models/ but felt like I need something milder. 😅

The Gradient

Evaluation Metrics for Language Modeling

On different metrics for evaluating language models, the relationships among them, mathematical and empirical bounds for those metrics, and suggested best practices with regards to how to report them.

hollow canopy Aug 31, 2023, 7:46 PM

#

full jackal How does one get a good understanding of why this is written this way. I could f...

Essentially, it's a "Perplexity" class which measures how well a probability model predicts a sample. It's usually used for evalutating the performance of language models. The lower the perplexity means the model can predicts a sample with a higher accuracy.

hollow canopy Aug 31, 2023, 7:49 PM

#

hollow canopy Essentially, it's a "Perplexity" class which measures how well a probability mod...

After doing some very quick "research" I found a pretty easy to understand explanation / example of Perplexity. Here's the link: https://jamesmccaffrey.wordpress.com/2016/08/16/what-is-machine-learning-perplexity/#:~:text=In machine learning%2C the term,a measure of prediction error.

James D. McCaffrey

jamesdmccaffrey

What is Machine Learning Perplexity?

In machine learning, the term perplexity has three closely related meanings. Perplexity is a measure of how easy a probability distribution is to predict. Perplexity is a measure of how variable a …

upper aspen Sep 3, 2023, 4:41 AM

#

There is a video which explains perplexity here at timestamp 15:00
https://www.youtube.com/watch?v=ddCYORu41Xs&list=PL23FjyM69j92o_j5JFH9sNlbhCx4n0ZYh&index=1

strange nimbus Sep 3, 2023, 8:27 PM

#

there is a massive problem with perplexity in my opinion

#

namely when the answer is 'none of the above' or 'all of the above'

wraith marsh Sep 4, 2023, 7:11 PM

#

strange nimbus namely when the answer is 'none of the above' or 'all of the above'

do a simple ground truth value

strange nimbus Sep 4, 2023, 7:12 PM

#

how would that be done?

wraith marsh Sep 4, 2023, 7:12 PM

#

dot product of the semantic embeddings with a target value

#

hihger dot product = higher similarity to the target

#

like attention

strange nimbus Sep 4, 2023, 7:13 PM

#

have you implemented this?

wraith marsh Sep 4, 2023, 7:13 PM

#

also btw

#

no i but i think its used quite commonly

#

its called ROUGE score

#

i think

wraith marsh Sep 4, 2023, 7:57 PM

#

wraith marsh no i but i think its used quite commonly

its used for evaluation though and not as loss function

blazing hill Sep 7, 2023, 9:10 AM

#

Hello everyone.
I don't play around with kaggle for a few years now and so, in practice, I'm a newbie on this. I'm interested in this competition but I'm a bit lost on few small details. Hope someone can help clearing some dumb questions.

Since this is a offline competition we need to pip install directly the wheel. I get that. But I see some notebooks that installed the wheels saved in the folder "kaggle/inputs". How did they get there? Isn't the inputs folder read only?
How much disk space can we use? We are allow only to use the 20GB of the working directory?What aboout the input folder?
About the GPUs. Just to confirm. We can use up to 32GB of GPU (T4x2), right?

reef harness Sep 7, 2023, 3:17 PM

#

You can get the data/notebook output to the input folder. Find more info in this commen: https://www.kaggle.com/code/cdeotte/how-to-train-open-book-model-part-2/comments#2423056.
Yes, Input folder is read only but you can add any of the datasets from "Add Data".
I dont think there is any limit to adding data to input folder, only when you load to the notebook, it matters IMO.

How To Train Open Book Model - Part 2

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

mighty lion Sep 7, 2023, 4:44 PM

#

strange nimbus namely when the answer is 'none of the above' or 'all of the above'

How prevalent does this seem to be in the data ?
Idts I've come across any instances yet

wooden elm Sep 8, 2023, 4:41 AM

#

upper aspen There is a video which explains perplexity here at timestamp 15:00 https://www.y...

this is a really helpful set of starter videos. Thank you for sharing, Chris!

lean wing Sep 12, 2023, 12:37 PM

#

Some of us may get confused by the variety of the Bert models in Huggingface. I am definitely one of them.
Therefore I did some googling and researching and summarised my findings in this post.

Please feel free to give it a look. and if you find it helpful, please give it an upvote.🙂
https://www.kaggle.com/datasets/xhlulu/huggingface-bert/discussion/438733

Huggingface BERT

BERT models directly retrieved and updated from: https://huggingface.co/

strange nimbus Sep 12, 2023, 1:34 PM

#

does anyone know where / what the loss function used in transformers.DebertaV2ForMultipleChoice is?

mighty lion Sep 15, 2023, 7:19 PM

#

https://arxiv.org/abs/2309.07900

arXiv.org

Ambiguity-Aware In-Context Learning with Large Language Models

In-context learning (ICL) i.e. showing LLMs only a few task-specific demonstrations has led to downstream gains with no task-specific fine-tuning required. However, LLMs are sensitive to the choice of prompts, and therefore a crucial research question is how to select good demonstrations for ICL. One effective strategy is leveraging semantic sim...

#

Could this be applied to llm science exam

hasty tangle Sep 16, 2023, 2:39 PM

#

so i have a very noob question so forgive me... turning this task into a classification task is pretty simple, the categorical cross entropy loss is the go to choice and you get the answer directly and its pretty simple to change model heads (just add a linear layer). i've also seen folks fine tune models on a seq-to-seq task (like how a usual llm would be trained, (you give the question+next_token(answer|output_logits) +backward(),step and then question+answer_token_1 + next_token(answer|output_logits)...) +backward(),step(), how'd you actually get the answer here, i can imagine the next token being the answer token(A|B|C|D|E or yes|no) but lets say i have fine tuned my model for generating the answer correctly <as a list of newly generated tokens>, i still don't know how to automatically pick out the correct option. e.g question: ..., options:a..,b...,c...,d...e.., answer: says nothing about the option b but generates the option b exactly. so i guess to summarize, how does one use a seq-2-seq fine tuned model for actual inference

upper aspen Sep 16, 2023, 11:33 PM

#

strange nimbus does anyone know where / what the loss function used in transformers.DebertaV2Fo...

According to GitHub here https://github.com/huggingface/transformers/blob/1fa2d89a9bb98a15e9720190e07d272a42f03d28/src/transformers/models/bert/modeling_bert.py#L1620
Each of the five pairs of question plus choice are forward pass. And PyTorch saves the five logits. Then PyTorch reshapes the five logits and applies cross entropy which in PyTorch includes softmax loss

    loss = None
    if labels is not None:
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(reshaped_logits, labels)

upper aspen Sep 16, 2023, 11:36 PM

#

hasty tangle so i have a very noob question so forgive me... turning this task into a classif...

The trick to making GPT style (seq2seq) LLM to work is to finetune and/or prompt engineer the LLM so that it only outputs 1 token. We can either show LLM the question plus 5 answer choices and the model will output A,B,C,D,or E and we extract the probabilities from those tokens. Or we show the LLM pairs of question and 1 answer and the model will output True, False and we extract the probabilities from those tokens.

GPT style LLM take time to infer each token. Therefore if we try to output a 20 token sentence, that will take 20x longer than outputting one token. More info is here: https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/440908

hasty tangle Sep 17, 2023, 7:12 AM

#

upper aspen The trick to making GPT style (seq2seq) LLM to work is to finetune and/or prompt...

yeah, that makes sense.

upper aspen Sep 19, 2023, 2:26 AM

#

According to the website, this competition began on June 19th and ended 12 weeks later on Sep 8th. Is this competition still going on?

limpid sparrow Sep 19, 2023, 6:52 PM

#

Deleting this as we ask that promotion of events, blog posts, etc is posted in our resources channel instead: https://discord.com/channels/1101210829807956100/1130802867285008414

dusk pendant Sep 20, 2023, 8:52 AM

#

Can Platypus2-70B be used for this competition ?

hasty tangle Sep 20, 2023, 9:40 AM

#

Actually yes... Ivent explored it msyef but I read a discussion post where the OP used this or some other 70B model by loading it layer by layer (u can't load it all in the kaggle GPUs at the same time) ... Took 8hrs for submission but there were no OOM errors.

strange nimbus Sep 20, 2023, 12:54 PM

#

dusk pendant Can Platypus2-70B be used for this competition ?

I wouldnt ... But yes you can

dusk pendant Sep 21, 2023, 12:04 AM

#

@upper aspen let’s say I train a model V1, loaded from deberta-v3-large pretrained model, for 2 epochs, with DATASET 1

Then, I train a model V2, loaded from model V1 best checkpoint for another 2 epochs (like basically cloning the model) WITH THE SAME DATASET 1

Is that effectively training for 4 epochs or this is overfitting ?

#

I have seen a public notebook do that and the val loss and validation MAP@3 improve

dusk pendant Sep 21, 2023, 8:56 AM

#

strange nimbus I wouldnt ... But yes you can

There is a discussion post saying it doesn’t have commercial license. But from what I searched it has the same as LLAMA2 license. There’s no answer from the competition hosts for 5 days since the discussion was posted

hasty tangle Sep 21, 2023, 10:41 AM

#

My gpu quota on kaggle just ran out which is a bummer becuz I just fixed my memory and speed issues and was about to make a submission....

#

Has anyone tried kaggle TPUs? Does the transformers library work well with it... I guess I'll have to figure that one on my own XD

bronze burrow Sep 21, 2023, 10:49 AM

#

Topic: Bitsandbytes error

Hello everyone,

I'm trying to fine-tune Falcon 7B, but I'm stuck at the configuration part of QLORA bits and bytes.
Here is my code.

model_id = "tiiuae/falcon-7b"
import torch

bnb_config = BitsAndBytesConfig( # Used for initiating QLora which reduces memory of training size
load_in_8bit=True,
load_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)

model =AutoModelForCausalLM.from_pretrained( # Initiating the model
model_id,
device_map="auto",
offload_folder = "/content/drive/MyDrive/Fall 2023/Model building",
trust_remote_code=True,
quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
Here is the error. I've both the mentioned libraries installed, but it's still not working as the issues is still open on the bits and bytes GitHub page. Is there any solution or any other way to implement using QLora? This is my first time, so please be as detailed as possible.

https://preview.redd.it/363xaoicdjpb1.png?width=1342&format=png&auto=webp&s=18be466074e06a1156e358690bb037d62f4a194d

Thanks

upper aspen Sep 21, 2023, 10:04 PM

#

dusk pendant <@933380182159544361> let’s say I train a model V1, loaded from deberta-v3-large...

Yes, we can train for 2 epochs, save weights. Then load weights and train another 2 epochs. If we use cosine learning schedule each time, this is called cosine with restarts. (So this is the same as training 4 epochs cosine restarts with restart every 2 epochs). If we use constant learning rate each time. Then the result is the same as training with 4 epochs constant learning rate.

Furthermore, sometimes people train with more diversity of data and augmentations the first time. And then train with less diverse data (i.e. more like test data) the second time with less augmentations.

dusk pendant Sep 22, 2023, 12:02 AM

#

upper aspen Yes, we can train for 2 epochs, save weights. Then load weights and train anothe...

Thank you for the detailed reply!

#

When we specify learning_rate=2e-5 in the Training parameters with a cosine rate schedule what does this mean ?

#

Is 2e-5 the initial learning rate

dusk pendant Sep 22, 2023, 1:58 AM

#

In my best model my train loss is lower than the valid loss is that a bad thing ?

hasty tangle Sep 22, 2023, 12:57 PM

#

Has anyone used Rlhf or reinforcement learning based methods for this competition... I need to talk with someone who has used it in this competition...

upper aspen Sep 29, 2023, 4:37 PM

#

hasty tangle Has anyone used Rlhf or reinforcement learning based methods for this competitio...

In HuggingFace, this is called RewardTrainer. There is a Kaggle notebook here https://www.kaggle.com/code/datafan07/single-model-rewardtrainer-lora-llm

hasty tangle Sep 29, 2023, 4:45 PM

#

upper aspen In HuggingFace, this is called RewardTrainer. There is a Kaggle notebook here ht...

Alright. I'll check it out soon.. Thanks

strange nimbus Sep 29, 2023, 6:06 PM

#

how many models is everyone using?

#

is just over 0.9 good for a single deberta?

bronze flume Sep 30, 2023, 5:03 AM

#

It sounds good;

but I suspect it would depend on what you are using as validation set. And whether your training set contains questions from your validation set

wraith marsh Sep 30, 2023, 8:55 AM

#

strange nimbus is just over 0.9 good for a single deberta?

are you using it with an index or something (rag)

strange nimbus Sep 30, 2023, 8:58 AM

#

bronze flume It sounds good; but I suspect it would depend on what you are using as validat...

In lb

wheat bloom Oct 3, 2023, 7:30 AM

#

Hi, we are 23rd https://www.kaggle.com/xiezejian currently(having try too much ensemble so I believe there is still have room for us) and looking for a single person team to merge. PM me if you are interested or would like to discuss further.

Xie Zejian | Expert

Fintech

strange nimbus Oct 10, 2023, 5:59 PM

#

well done everyone! this was a fun competition

dusk pendant Oct 11, 2023, 2:12 AM

#

Congrats to all participants who got medals!

strange nimbus Oct 11, 2023, 7:09 AM

#

Thanks

fallen zephyr Oct 12, 2023, 7:50 AM

#

Hi, i have 50 clusters of numbers each cluster corresponds to a plume shape, i know the location of origin point, i want to translate each plume or cluster at same position line they superposed each other cause i want to eavlaute the average values of them. anyone knows how to implement this in python
i am trying this from yesterday, not able to implement it

lilac charm Oct 12, 2023, 1:20 PM

#

i would like to learn NLP and LLM from scratch, can anyone suggest me some resources?

ancient warren Oct 13, 2023, 10:06 PM

#

lilac charm i would like to learn NLP and LLM from scratch, can anyone suggest me some reso...

I think this notebook is a good start https://www.kaggle.com/code/crxxom/comprehensive-overview-on-nlp-for-beginners This goes more in depth with the basis of language models https://www.kaggle.com/code/tanulsingh077/deep-learning-for-nlp-zero-to-transformers-bert and maybe this video is good motivation and theory https://youtu.be/XfpMkf4rD6E?si=xwY1JsS0qG91Nf9s and this video after the others https://www.youtube.com/watch?v=kCc8FmEb1nY&t=484s goes over building something like a GPT

✅ Comprehensive Overview on NLP for Beginners 🥳

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

Deep Learning For NLP: Zero To Transformers & BERT

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

YouTube

Stanford Online

CS25 I Stanford Seminar - Transformers United 2023: Introduction to...

January 10, 2023
Introduction to Transformers
Andrej Karpathy: https://karpathy.ai/

Since their introduction in 2017, transformers have revolutionized Natural Language Processing (NLP). Now, transformers are finding applications all over Deep Learning, be it computer vision (CV), reinforcement learning (RL), Generative Adversarial Networks (GAN...

▶ Play video

YouTube

Andrej Karpathy

Let's build GPT: from scratch, in code, spelled out.

We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!) . I recommend people watch the earlier makemore videos to get comfortable...

▶ Play video

tacit coyote Oct 14, 2023, 12:17 AM

#

Any follow ups on this question? Any benchmarking against GPT4 or Claude V2 would be interesting #1130785937681547284 https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/446790

Kaggle - LLM Science Exam

Use LLMs to answer difficult science questions

lilac charm Oct 15, 2023, 6:05 AM

#

ancient warren I think this notebook is a good start https://www.kaggle.com/code/crxxom/compreh...

thanks for sharing, i really appreciate your help, have a great day

dusk pendant Oct 28, 2023, 4:29 AM

#

Lmao I still have no idea why my fallback on 0.825 XWinLM works (Public 0.909 Private 0.900) while the fallback on 0.870 XWinLM doesn't LOL. I keep getting submission CSV not found

#

It's literally falling back on the same model and the same template as simjeg (I referenced from there), I only changed the function

#

Maybe it was already touching the brink of the memory holyfuck

#

That probably should make a silver medal submission 😑

velvet granite Nov 2, 2023, 7:41 PM

#

lilac charm i would like to learn NLP and LLM from scratch, can anyone suggest me some reso...

coursera deeplearning.ai

teal geyser Nov 5, 2023, 2:37 AM

#

full jackal How does one get a good understanding of why this is written this way. I could f...

Your question is not clear to me, but I hope you've figured it out by now?

#

Essentially, I think this project wants us to build something like GPTZero - https://gptzero.me/. You can get loads of insights for this project if you read how their detection technology works - https://gptzero.me/technology

GPTZero

GPTZero | The Trusted AI Detector for ChatGPT, GPT-4, & More

Covered by >100 media outlets, GPTZero is the most advanced AI detector for ChatGPT, GPT-4, Bard. Check up to 50000 characters for AI plagiarism in seconds.

GPTZero

GPTZero | Technology

The World's #1 AI Content Detector with over 1 Million Users