#arc-prize-2024


verbal mulch
#

This is such a cool problem. It's even pretty fun to do by hand!

bleak basin
#

Yeeees!! Really cool problem. Let me know if anyone wants to work together on the competition!

clear yarrow
brittle oasis
#

Hey everyone! Making AGI on a single GPU for $1 million. 😂 Love it! Let's go... 😎 ❤️ 🚀

brittle oasis
#

...very smart test. Big tech LLMs will eventually bust ARC-AGI once they have enough scale (pretrained on enough of their data). But then you could rebase the benchmark on a totally different paradigm (but similar core knowledge) and they would get stuck again. Enforcing FOSS and low specs for ARC-AGI means a solution has to be real intelligence (unless it's some BS domain-specific hack solution... hope the incentive structures stipulate the generalizable solution, as that's key to the benchmark integrity IMO). Good luck everyone! 🙏 🙌

lean muralBOT
#
gregkamradt has been warned

Reason: Posted an invite

trail garnet
#

Hey team! Greg from the ARC Prize team here

We are excited to have you interested in this competition.

If you haven’t already, I highly suggest checking out our launch blog post [1] and competition trailer [2]

We have a bunch of resources for you to get started

Our #1 goal is to make sure you have the resources and tools you need to get started. Please let us know if you have any questions.

Feel free to reach out to me directly or team@arcprize.org

See you on the leaderboard!

[1] https://arcprize.org/blog/launch
[2] https://www.youtube.com/watch?v=2avWAHXUXXs

spare frost
#

how big is the training set in this competition?

verbal mulch
#

Depends on what you mean. Each task usually has between 2 and 5 examples.

#

You have to build something that can learn from that few

verbal mulch
#

But I believe there are 400 challenges in the training set.
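
To make the two meanings of "training set" concrete, here is a minimal sketch. It assumes the public JSON layout (a task id mapping to "train" and "test" lists of grids); the tiny `challenges` dict is made up for illustration and only stands in for the real file.

```python
# Toy stand-in for the training challenges file (made-up task ids and grids).
challenges = {
    "task_a": {
        "train": [
            {"input": [[0, 1]], "output": [[1, 0]]},
            {"input": [[2, 0]], "output": [[0, 2]]},
        ],
        "test": [{"input": [[0, 3]]}],
    },
    "task_b": {
        "train": [
            {"input": [[1]], "output": [[1, 1]]},
            {"input": [[2]], "output": [[2, 2]]},
            {"input": [[3]], "output": [[3, 3]]},
        ],
        "test": [{"input": [[4]]}],
    },
}


def summarize(challenges):
    """Return (number of tasks, list of demonstration-pair counts per task)."""
    pair_counts = [len(task["train"]) for task in challenges.values()]
    return len(challenges), pair_counts


num_tasks, pair_counts = summarize(challenges)
print(f"{num_tasks} tasks, {min(pair_counts)}-{max(pair_counts)} examples per task")
```

So the "400" counts tasks, while the "2-5" counts the demonstration pairs inside each task.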

cyan cipher
#

I had a dream about this competition last night even though I'm not even taking part in it 🙈

spice drum
#

Hi I am new here and not sure if this is the right place to ask questions about ARC Prize.

#

I am wondering if I could just use regular C to enter this competition?

#

I remember back when Sudoku was new and a computer magazine held a competition where participants submitted their algorithm solution in C to solve any Sudoku puzzles. I had hoped that the ARC Prize challenge could be the same way.

lean muralBOT
#
shure9200 has been warned

Reason: Posted an invite

lone axle
#

Has anyone managed to run a 70b model with a reasonable runtime?

#

TPU or GPU

verbal mulch
#

@spice drum Yes, it's technically possible, but the final solution is a Python notebook. So you'd have to call your C functionality from Python in a wrapper.
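
A minimal sketch of what such a wrapper could look like with `ctypes` from the standard library. Since no compiled solver exists here, the C standard library's `abs` stands in for your own function; a real setup would load your own shared library instead (the name `libsolver.so` in the comment is hypothetical). This assumes a POSIX system such as the Linux machines notebooks run on.

```python
import ctypes
import ctypes.util

# Load the C standard library as a stand-in for your own compiled solver
# (for example a hypothetical libsolver.so built from your C sources).
# CDLL(None) is a POSIX fallback exposing symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: int abs(int);
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def c_abs(value: int) -> int:
    """Thin Python wrapper around the C function."""
    return libc.abs(value)


print(c_abs(-7))
```

Grids would need a flat `ctypes` array plus width and height arguments, which is where most of the wrapper work goes.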

#

I'm struggling with the same thing. I'd prefer using Julia for this, but I don't feel like the hassle of mixing languages.

willow arch
#

Can you please help me. What am I doing wrong?

verbal mulch
#

Yeah you have to read in their file. You'll never see it.

#
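    import json

    # Read the challenges file provided by the competition; on submission it is
    # swapped for the hidden test set, which you never see.
    with open('/kaggle/input/arc-prize-2024/arc-agi_test_challenges.json') as file:
        test_challenges = json.load(file)

    # make_predictions_advanced and model are placeholders for your own approach.
    test_predictions = make_predictions_advanced(model, test_challenges)
    submission = {}
    for task_id in test_predictions.keys():
        submission[task_id] = test_predictions[task_id]

    # Save the submission
    output_json_path = '/kaggle/working/submission.json'
    with open(output_json_path, 'w') as file:
        json.dump(submission, file)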
#

You'll have to fill in the blanks for how you attack the problem, but this is what a submission will basically look like

#

@willow arch

willow arch
#

I did simple working code based on @verbal mulch proposal:

import json

def make_predictions_advanced(model, data):
    output = dict()
    for task_id, task_data in data.items():
        final_answer = []
        for j, question in enumerate(task_data['test']):
            answer = [[0, 0], [0, 0]]
            answer_double = {
                "attempt_1": answer,
                "attempt_2": answer,
            }
            final_answer.append(answer_double)
        output[str(task_id)] = final_answer.copy()
    return output
    

model = None  # placeholder: this baseline does not use a model
with open('/kaggle/input/arc-prize-2024/arc-agi_test_challenges.json') as file:
    test_challenges = json.load(file)
test_predictions = make_predictions_advanced(model, test_challenges)
submission = {}
for task_id in test_predictions.keys():
    submission[task_id] = test_predictions[task_id]

# Save the submission
output_json_path = '/kaggle/working/submission.json'
with open(output_json_path, 'w') as file:
    json.dump(submission, file)
#

I still have the same problem...

#

I also tried forking working code from the LB (the 18-point solution), and I can't submit even that working fork.

craggy orbit
#

Submission only works when you click it from inside the notebook; the notebook will need to run.

willow arch
#

That was the old style... Now it must be submitted from the edit page. I found it, thank you.

grand herald
#

Hey, does anyone have some starter code that I could modify (in Python)? It doesn't need to produce any good results; just an output that can be submitted to the leaderboard. Maybe even visualise the data too? Sorry, just new to all this and want to give it a go 🙂

steady forge
grand herald
fading acorn
#

Quick question: Are we allowed to use the validation dataset as part of the training data? Because 400 samples is ridiculously small and having another 400 would be very useful. After all, the only metric that really matters is performance on the hidden set.

verbal mulch
#

Unofficially, yes.

#

They can't stop you. But they do say not to.

#

You're supposed to learn based on the examples, not memorize everything. The best solution would do no training on any of it. It would see the example, learn from it, and do it.

narrow zenith
verbal mulch
narrow zenith
proven oak
#

@trail garnet
I found this part of the rules a bit confusing;
are we actually allowed to use publicly available models like llama or gemma, or not?
is finetuning of such models eligible for the prize?

trail garnet
mild raft
steady forge
#

how to submit? lol

#

are there any code examples where they show correct format of submitting?

steady forge
#

why does submission take so long?

#

yay! I got 0.0

grand herald
#

Good job @steady forge, good to hear you got it working 🙂

mild raft
steady forge
#

or you create new file and submit?

#

I suggest you take the sample_submission, and replace the attempts there with yours
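
A rough sketch of that idea, assuming the submission layout used elsewhere in this channel (one `attempt_1`/`attempt_2` dict per test input of each task). The template and predictions below are made up and stand in for the real `sample_submission` file and your own outputs.

```python
import json

# Made-up stand-in for the sample submission: one entry per test input of each task.
sample_submission = {
    "task_a": [{"attempt_1": [[0, 0], [0, 0]], "attempt_2": [[0, 0], [0, 0]]}],
    "task_b": [
        {"attempt_1": [[0, 0], [0, 0]], "attempt_2": [[0, 0], [0, 0]]},
        {"attempt_1": [[0, 0], [0, 0]], "attempt_2": [[0, 0], [0, 0]]},
    ],
}

# Hypothetical predictions: task id -> list of (first guess, second guess) per test input.
my_predictions = {
    "task_a": [([[1, 2]], [[2, 1]])],
}


def fill_submission(template, predictions):
    """Copy the template and overwrite attempts wherever a prediction exists."""
    submission = json.loads(json.dumps(template))  # cheap deep copy
    for task_id, guesses in predictions.items():
        for index, (first, second) in enumerate(guesses):
            submission[task_id][index] = {"attempt_1": first, "attempt_2": second}
    return submission


submission = fill_submission(sample_submission, my_predictions)
print(json.dumps(submission["task_a"]))
```

Tasks without a prediction keep the placeholder attempts, so every task id stays present in the file.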

steady forge
mild raft
steady forge
#

ok wait I'm cooking something

mild raft
#

can I DM?

steady forge
mild raft
#

sent fr, thanks

steady forge
#

I created a notebook showing a submission example

grand herald
#

Thank you for sharing @steady forge 🙂

steady forge
bleak mango
#

Hi everyone. Difficult challenge. I spent yesterday trying some MAML approach on it without any success....

bleak mango
#

Model Agnostic Meta Learning

steady forge
#

interesting, will look into it

bleak mango
#

I think I will post a notebook on it. Even with an exceptionally poor score, just to share this knowledge

steady forge
#

Where can I find the custom metric for this competition? I swear I've seen it before I can't find it now...

bleak mango
#

I don't know if there is a custom metric, since it is all or nothing. Just as simple as that.
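
For local checking, "all or nothing" can be sketched like this. It reflects my understanding of the scoring (a test output counts only if one of the two attempts matches the solution exactly) and is not the official scoring code; the function name and the toy data are made up.

```python
def score_submission(submission, solutions):
    """Fraction of test outputs where either attempt equals the solution exactly."""
    correct = 0
    total = 0
    for task_id, expected_outputs in solutions.items():
        attempts = submission.get(task_id, [])
        for index, expected in enumerate(expected_outputs):
            total += 1
            guess = attempts[index] if index < len(attempts) else {}
            if expected in (guess.get("attempt_1"), guess.get("attempt_2")):
                correct += 1
    return correct / total if total else 0.0


solutions = {"task_a": [[[1, 0]]], "task_b": [[[2, 2]], [[3]]]}
submission = {
    "task_a": [{"attempt_1": [[0, 0]], "attempt_2": [[1, 0]]}],  # second attempt is right
    "task_b": [
        {"attempt_1": [[2, 2]], "attempt_2": [[2, 2]]},  # right
        {"attempt_1": [[3, 3]], "attempt_2": [[0]]},  # wrong shape counts as wrong
    ],
}
score = score_submission(submission, solutions)
print(round(score, 3))
```

There is no partial credit for a grid that is one cell off.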

#

Maybe others can try something that I failed to see

steady forge
bleak mango
#

ok, I will do it later

bleak mango
#

After reading a lot of papers on this challenge I decided to take a random task and see what is the best I can do on the "abstract reasoning" part. For that I selected task "ff805c23" and coded an algorithm not just to solve it, but to randomly generate new examples for a given configuration.

#

Basically, one way to solve it is to get the coordinates from the blue box, flip the grid depending on the blue box location and extract the answer.
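
A toy sketch of that recipe, not the actual solver used here. It assumes the grid is mirror-symmetric (left-right or top-bottom), that the box colour is index 1, and that the box does not cross the axis of symmetry; the function names and the small grid are made up.

```python
BLUE = 1  # assumption: colour index of the masking box


def bounding_box(grid, color):
    """Return (top, bottom, left, right) of all cells with the given colour."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == color]
    if not cells:
        raise ValueError("colour not found in grid")
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), max(rows), min(cols), max(cols)


def recover_hidden_patch(grid, color=BLUE):
    """Flip the grid and read the box region from the mirrored copy."""
    top, bottom, left, right = bounding_box(grid, color)
    flipped_lr = [row[::-1] for row in grid]
    flipped_ud = grid[::-1]
    for candidate in (flipped_lr, flipped_ud):
        patch = [row[left:right + 1] for row in candidate[top:bottom + 1]]
        # Accept the first flip whose mirrored region is not itself masked.
        if all(v != color for row in patch for v in row):
            return patch
    raise ValueError("no flip reveals the hidden region")


masked = [
    [1, 3, 3, 2],
    [1, 5, 5, 4],
    [6, 7, 7, 6],
    [8, 9, 9, 8],
]
patch = recover_hidden_patch(masked)
print(patch)
```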

#

After that I decided to train a CNN (Auto Encoder / U-Net) for an infinite number of epochs on batches of randomly generated grids, in an attempt to see if the model learns how to use the "mirror" part of the grid ("abstract reasoning") to get a result instead of memorizing patterns
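
A minimal sketch of what such a generator might look like; it is not the code used for the experiment. It assumes left-right symmetric grids, a rectangular box of colour 1 placed in the left half, and made-up parameter names.

```python
import random

BLUE = 1  # assumption: colour index used for the masking box


def generate_example(size=6, box_h=2, box_w=2, rng=random):
    """Return (masked_input, hidden_patch) for a left-right symmetric grid."""
    half = size // 2
    # Colours 2-9 so that the mask colour never appears in the pattern itself.
    left = [[rng.randint(2, 9) for _ in range(half)] for _ in range(size)]
    full = [row + row[::-1] for row in left]
    # Keep the box inside the left half so its mirror image stays visible.
    top = rng.randint(0, size - box_h)
    col = rng.randint(0, half - box_w)
    patch = [row[col:col + box_w] for row in full[top:top + box_h]]
    masked = [row[:] for row in full]
    for r in range(top, top + box_h):
        for c in range(col, col + box_w):
            masked[r][c] = BLUE
    return masked, patch


grid, answer = generate_example(rng=random.Random(0))
for row in grid:
    print(row)
```

Calling it in a loop gives an endless stream of (input, target) pairs for training batches.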

#

and now I don't know if the model is learning to reason about something or just predicting the closest surrounding label...

#

now if I just change the color of the "blue box" to another label, it is clear that the model didn't "abstractly reason" that the answer should at least involve the "box" part.

#

but if I take the model and train it for a few steps on the new "block color", it learns the new pattern fast

#

but still the "abstract reasoning" is failing, since, to me, it is not clear that the model is learning to think about how to use the mirrored part of the grid to search for the answer.

#

Either way this is an insanely interesting challenge to learn new tricks

bleak mango
#

OK... after letting the model train for about a day:

nimble dirge
#

what hardware did you train it on @bleak mango ?

bleak mango
#

On my local laptop (RTX 3070 GPU - 8GB). The important next step is to analyze which features of the input grid contribute the most to the output, in order to see if it has something to do with the opposite side or just the local blue square.

hardy anvil
hardy anvil
# bleak mango

Or was this just for a random curiosity with no relation to the larger ARC task?

bleak mango
#

Brute force (DSL, GA, CA...) and pattern memorization to generalization (NN) do not seem to be the final way to go on this challenge. The first needs an astronomical number of combinations to discover the solution, while the second needs an astronomical number of samples to learn ways of generalizing, without producing novel ways of thinking in the face of new problems. A combination of both is suggested to be the path.
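
To illustrate the brute-force side, here is a toy sketch of exhaustive search over a made-up three-primitive DSL. With `k` primitives and programs of length `d` there are `k**d` candidates per depth, which is where the astronomical numbers come from once the DSL is realistic. All names are illustrative.

```python
from itertools import product

# A tiny made-up DSL of grid -> grid primitives.
PRIMITIVES = {
    "flip_lr": lambda g: [row[::-1] for row in g],
    "flip_ud": lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(program, grid):
    """Apply the named primitives in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def brute_force(train_pairs, max_depth=3):
    """Return (program, candidates_tried) for the first program fitting all pairs."""
    tried = 0
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            tried += 1
            if all(run(program, x) == y for x, y in train_pairs):
                return program, tried
    return None, tried


# Target transformation: rotate the grid 90 degrees clockwise.
train_pairs = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]
program, tried = brute_force(train_pairs)
print(program, tried)
```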

supple axle
verbal mulch
#

Yeah a combination of both or something totally unique. Which is basically the point of the challenge. The creators definitely don't think the current state-of-the-art is sufficient to solve it. They are looking for someone to come out of nowhere and try something totally crazy to solve it.

keen marsh
# supple axle where did you find the papers on this challenge?
supple axle
#

Thank you

bleak mango
#

I spotted the use of a genetic algorithm combined with a DSL to solve the problems (https://www.kaggle.com/code/zenol42/dsl-and-genetic-algorithm-applied-to-arc/notebook) and I was thinking about how to adapt it so you can use not only transformation functions that take just one argument, but a more flexible set of functions with multiple arguments, like add_row(grid, values).
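
One possible way to do this, as a sketch only: bind the extra arguments up front with `functools.partial`, so every primitive the search sees is still a one-argument grid function. This is not how the linked notebook works; `add_row` follows the signature mentioned above, and everything else is made up.

```python
from functools import partial


def flip_lr(grid):
    """One-argument primitive: mirror the grid left-right."""
    return [row[::-1] for row in grid]


def add_row(grid, values):
    """Two-argument primitive: append one row to the bottom of the grid."""
    return grid + [list(values)]


def bind_arguments(function, candidate_args):
    """Turn a multi-argument primitive into several one-argument ones."""
    return [partial(function, values=args) for args in candidate_args]


# Hypothetical argument candidates; a GA could mutate these as well.
unary_primitives = [flip_lr] + bind_arguments(add_row, [[0, 0], [1, 1]])


def run(program, grid):
    for step in program:
        grid = step(grid)
    return grid


result = run([unary_primitives[0], unary_primitives[2]], [[1, 2]])
print(result)
```

The cost is that each candidate argument value becomes its own primitive, so the search space grows with the number of bound values.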

smoky pendant
#

How are there 100+ people with the same score of 26.00 lol

hardy anvil
smoky pendant
#

I'm getting ~21% on the validation set but 0% upon submission... is there something wrong with my submission file?

prisma swallow
trail garnet
prisma swallow
#

I started with a naive test where I hard-trained llama3.1 to follow the pattern in the train dataset, but it doesn't seem to work; I get 3/419 on the eval test (

verbal mulch
#

Yeah LLMs are not the solution. You should be trying to come up with something new.

scenic anvil
#

where can i find the public notebook with a score of 26

hardy anvil
#

gemma 2 9b performs much better in my tests

prisma swallow
solemn zenith
#

Hey all, I am looking for a teammate. Is anyone looking for one?

hardy anvil
#

In many cases that will work, but definitely not in all cases.

#

There are cases where there are different output shapes for the training input.

hardy anvil
#

It's not a typo

#

there are different output shapes for different training inputs for a given task.

#

so you can't make the assumption you can just copy the training output shapes.
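
A quick way to check this per task, sketched with made-up helper names and toy pairs. It only classifies the demonstration pairs; it does not predict the test output shape.

```python
def shape(grid):
    """Return (height, width) of a grid stored as a list of rows."""
    return len(grid), len(grid[0])


def output_shape_rule(train_pairs):
    """Classify how output shapes behave across the demonstration pairs."""
    out_shapes = {shape(pair["output"]) for pair in train_pairs}
    if len(out_shapes) == 1:
        return "constant"
    if all(shape(p["input"]) == shape(p["output"]) for p in train_pairs):
        return "same_as_input"
    return "varies"


constant_task = [
    {"input": [[1, 2]], "output": [[0]]},
    {"input": [[1]], "output": [[5]]},
]
same_task = [
    {"input": [[1, 2]], "output": [[2, 1]]},
    {"input": [[1], [2]], "output": [[2], [1]]},
]
varying_task = [
    {"input": [[1, 2]], "output": [[1]]},
    {"input": [[1, 2, 3]], "output": [[1, 2]]},
]
print(output_shape_rule(constant_task), output_shape_rule(same_task), output_shape_rule(varying_task))
```

Only in the "constant" case is copying a training output shape safe.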

#

yeah

#

if you want to get a good score, probably yeah

#

yeah

marble anvil
#

hello

#

can someone explain this task? I dont get what is going on

#

how do the numbers know which grid position to move to?

verbal mulch
marble anvil
#

oh woah

#

that was tricky.

#

thanks!

marble anvil
#

just to clarify, in the test set there is a new task not seen in the train set that we have to solve? Because if you show me this grid picture, it isn't very likely I as a human could give you this output, especially not given an example output

#

or is it the case the skill has been seen in the train before

verbal mulch
#

@marble anvil The test set is just like the train set. There are example input/output pairs, and then the input with no output that you must create. And yes, a lot of them are brand new in the test set never seen in the training set.

marble anvil
#

gotcha just like the eval

stone ridge
jovial musk
#

hey does anyone know how you can use your own finetuned llama model

#

from what i understand it's within the rules, so how do i like upload the model weights to kaggle

#

or is that even possible

prisma swallow
#

the submit page has an entry for uploading your model.

jovial musk
#

thanks

static dagger
#

https://arcprize.org/ states that the current leaderboard high score is 46%, but that the average (smart) human score is only around 22%?! Is this correct?

solemn solar
#

Estimates of human performance are 80-100%. You can try yourself and see how many you can get correct.

summer wasp
#

Hi, everybody.
I'm looking for a teammate, either to lead me or to learn from each other.
Actually, I'm new to Kaggle competitions, so I want to collaborate and assist in this competition.
Even though I'm new to Kaggle, I think I know LLMs well.
This is my profile.
https://www.kaggle.com/jasperjack

river fossil
#

hi guys; so i read that there will be re-training happening after we submit the notebook

how many GPU-hours are permitted for this re-training? I saw the number 12 hours. Is that combined for training + doing the test, or do we get 12 hours for each?

also: 12 hours GPU + 12 hours CPU -> does that mean, counting idle time, it is possible to get ~24 hours of total compute time?

hardy anvil
#

are you submitting someone else's code? Then that may indeed be the case. 12h is for total time

river fossil
#

lol why would i submit someone else code

river fossil
river fossil
# river fossil read the doc on data

alright; I mistook it to mean that there is a hidden set of training data on which they will re-train the model, because it said "train" input/output pairs

hardy anvil
river fossil
#

seriously? then whats the fun

hardy anvil
hardy anvil
river fossil
#

this is like the most interesting thing i came across in many years

#

and quite sure that is not allowed

#

i think working on it will cure my depression hahah. but it’s sad to know ppl are not after the journey

worn rivet
worn rivet
hardy anvil
worn rivet
hardy anvil
#

Just the daily puzzle as shown in the link

#

I assume its the same

#

Drawing it is kinda annoying on phone tho

#

Just quadruple the input and put lightblue on the diagonally adjacent squares to each green one
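
Read literally, that rule could be sketched as below. The details are guesses: it assumes "quadruple" means tiling the input 2x2, the standard ARC colour indices (3 for green, 8 for light blue, 0 for background), no wrap-around at the edges, and that only background cells get recoloured. The daily puzzle itself may differ.

```python
GREEN, LIGHT_BLUE, BACKGROUND = 3, 8, 0  # assumption: standard ARC colour indices


def tile_2x2(grid):
    """Repeat the grid twice horizontally and twice vertically."""
    doubled_rows = [row + row for row in grid]
    return [row[:] for row in doubled_rows] + [row[:] for row in doubled_rows]


def mark_diagonals(grid):
    """Colour background cells that touch a green cell diagonally."""
    height, width = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    for r in range(height):
        for c in range(width):
            if grid[r][c] != GREEN:
                continue
            for dr, dc in ((-1, -1), (-1, 1), (1, -1), (1, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and result[nr][nc] == BACKGROUND:
                    result[nr][nc] = LIGHT_BLUE
    return result


def solve(grid):
    return mark_diagonals(tile_2x2(grid))


output = solve([[0, 0], [0, 3]])
for row in output:
    print(row)
```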

worn rivet
#

The puzzle ID is on the upper left. Just want to make sure, since I'm not sure of the time zone arcprize uses. I have no idea what the solution should look like, since the test inputs are green and it's not just putting light blue on diagonals, since the pink inputs aren't just diagonals.

#

Hmm, oh wait, the pinks are just diagonals.

hardy anvil
worn rivet
#

How to resize? I clicked Resize button and there isn't an input field nor am I able to adjust the output grid.

#

Got it, the field is left of the button

hardy anvil
#

nice

solemn solar
#

Interesting paper and environment: https://www.arxiv.org/abs/2407.20806 ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning
Hosung Lee, Sejin Kim, Seungpil Lee, Sanha Hwang, Jihwan Lee, Byung-Jun Lee, Sundong Kim
This paper introduces ARCLE, an environment designed to facilitate reinforcement learning research on the Abstraction and Reasoning Corpus (ARC). Addressing this inductive reasoning benchmark with reinforcement learning presents these challenges: a vast action space, a hard-to-reach goal, and a variety of tasks. We demonstrate that an agent with proximal policy optimization can learn individual tasks through ARCLE. The adoption of non-factorial policies and auxiliary losses led to performance enhancements, effectively mitigating issues associated with action spaces and goal attainment. Based on these insights, we propose several research directions and motivations for using ARCLE, including MAML, GFlowNets, and World Models.

hardy crag
#

Hi, guys. After I click "submit prediction" and select my notebook (which has already run correctly and produced submission.json), the submission has been stuck for a long time.
Is this expected? Thanks for any reply!

static dagger
hardy anvil
#

I do agree the wording might be a tad confusing here.

distant geode
#

Still looking for someone to explore a combination of DSL and LLM. Ideally someone with a strong LLM background (I know how to finetune models, but am not that strong on the theoretical part) to exchange ideas and try out these experiments. If you don't know much then it's still fine, we shoot in the dark and learn together 😊

hardy crag
#

Wondering how the new ChatGPT o1 performs on ARC tasks. Does anyone have any benchmark data? 😃

lilac ivy
#

Hey everyone! Just did my first submission. How long does it usually take to do the Notebook Running step after submitting? 🙂

lilac ivy
#

Took roughly 10 minutes.

hardy anvil
#

Depends on what you’re submitting

worn rivet
#

Is there no way to import from private github repo? If not, how do I import the files after uploading them in DATASETS with name "pyfiles"? And how do I make existing imports within my src files uploaded work w/o any modifications?
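
One common approach for the dataset route is to put the dataset directory on `sys.path`, so plain `import module_name` statements between your files keep working unchanged. A sketch, assuming the dataset would be mounted at `/kaggle/input/pyfiles`; a temporary directory with one hypothetical module stands in for it here so the example runs anywhere.

```python
import importlib
import os
import sys
import tempfile


def add_source_dir(path):
    """Put a directory of .py files on the import path (once)."""
    if path not in sys.path:
        sys.path.insert(0, path)


# On Kaggle the directory would presumably be "/kaggle/input/pyfiles";
# here a temporary directory with one hypothetical module stands in for it.
source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, "my_solver_utils.py"), "w") as handle:
    handle.write("def identity(grid):\n    return grid\n")

add_source_dir(source_dir)
importlib.invalidate_caches()
my_solver_utils = importlib.import_module("my_solver_utils")
print(my_solver_utils.identity([[1]]))
```

Imports written relative to a package root (for example `from src.utils import x`) only keep working if the uploaded dataset preserves that folder layout.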

worn rivet
#

K, got files to be flattened as 1 file.

worn rivet
fringe veldt
worn rivet
worn rivet
hardy anvil
#

For example, the blue object corresponds to a horizontal line of length 3 that extends to the left

#

The pink one is a vertical line of length 2 that extends downwards

#

They always start from 1 square under the place the last line left off.
Then ofc crop the right side and u got your solution

worn rivet
hardy anvil
#

I think if people are familiarized with a couple examples, most will be able to get 85%

#

I’ve been able to solve all examples I’ve seen in less than a minute (to get to the solution idea, drawing takes longer ofc), but that’s also likely due to prior experience with puzzles. For people with no related experience whatsoever, it may be much harder.

reef orbit
#

@distant geode check dm

lavish palm
#

can anyone help me? whenever I go to submit, it shows an inference error and gets rejected

dense sinew
#

CatFacepalm hard competition.

stark gate
#

Hello, we are looking for a researcher to join our University team for the ARC competition. If interested, please let me know!

fringe veldt
worn rivet
#

@fringe veldt I tried yours, but couldn't figure it out too. These 2 seem a bit similar. Figuring out one may help the other.

hardy anvil
# worn rivet I don't get this one: https://neoneye.github.io/arc/edit.html?dataset=ARC&task=3...

Certainly! I'd be happy to provide an analysis of the puzzle you've presented. Let's examine the pattern in these training examples:
The training examples demonstrate a consistent pattern:

  1. Left example: There is a single yellow square present. The rule applied is to surround this lone yellow square with a red border.
  2. Middle example: A solitary pink square is shown. Following the same principle, this individual pink square is encircled by a red border.
  3. Right example: The image contains just one green square. Adhering to the established pattern, this isolated green square is framed with a red border.

The key insight from these examples is:
When there is only one square of any color present in the image, that square is highlighted by surrounding it with a red border.
This rule appears to apply regardless of the specific color of the lone square, emphasizing the importance of uniqueness rather than the particular hue.

Is there anything else you'd like me to clarify or expand upon regarding this pattern?

hardy anvil
#

I guess his website is buggy for this puzzle

#

@latent timber I wonder, is it maybe because it has 2 answer slots that the reveal button doesn't work?

worn rivet
fringe veldt
hardy anvil
# worn rivet ? Can you further explain?

Every training grid has 1 color showing up more than the others. This color maps to a specific shape in the output grid. The blue always maps to the same shape, the red to the same and the green to the same.

worn rivet
#

@hardy anvil Thank you. For reference, an easier way to understand this may be this: The output shapes are constants. There are only as many possible outputs as number of colors of input. Each color maps or is represented by a shape. The color/shape chosen is by the count of the color appearances.
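
As a sketch of that reading: learn a lookup from the most frequent input colour to the constant output grid, then apply it to the test input. It assumes colour 0 is background and is ignored when counting; the helper names and toy grids are made up and are not the actual task.

```python
from collections import Counter

BACKGROUND = 0  # assumption: colour 0 is background and is ignored when counting


def dominant_color(grid):
    """Most frequent non-background colour in the grid."""
    counts = Counter(v for row in grid for v in row if v != BACKGROUND)
    return counts.most_common(1)[0][0]


def learn_color_to_shape(train_pairs):
    """Map each dominant input colour to the (constant) output grid seen with it."""
    return {dominant_color(p["input"]): p["output"] for p in train_pairs}


def predict(train_pairs, test_input):
    """Look up the output for the test input's dominant colour (None if unseen)."""
    mapping = learn_color_to_shape(train_pairs)
    return mapping.get(dominant_color(test_input))


train_pairs = [
    {"input": [[1, 1, 2], [0, 1, 3]], "output": [[5, 5]]},  # colour 1 dominates
    {"input": [[2, 2, 1], [2, 0, 3]], "output": [[5], [5]]},  # colour 2 dominates
]
prediction = predict(train_pairs, [[3, 2, 2], [1, 2, 0]])
print(prediction)
```

A colour that never dominated a training input has no entry in the lookup, so `predict` returns `None` for it.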

odd compass
#

I solved it but have to wait 7 more hours…….

knotty crystal
#

Can anyone tell me what this project-based learning is, what the requirements are to start building anything, and how people follow this path?

velvet urchin
normal marsh
#

@whole olive nice to see that, I find the complete lack of interesting discussion on the kaggle forums very disappointing compared to other contests

vapid compass
#

Hey, our latest submission was started before the submission deadline, and finished in the time limit. However, we cannot select it for scoring, is it chosen automatically?

velvet urchin
vapid compass
#

Ah, no what I meant was that we submitted a code solution 2 hours before the deadline, which finished 10 hours after deadline and would have beaten place 1
But sadly it doesn't count, as the deadline is for the generation of the submission.json, and not for the code submission
(Although I think MindsAI had the same issue and might have beaten us either way, haha)

normal marsh
#

I believe not, my understanding was the submission has to finish before the deadline

#

it is a gotcha that isn't obvious

#

so, wow, you missed out on 1st place just because you decided to wait until the last moment before submitting to make minor improvements? Or did you only finish it with a couple hours to spare?

#

I also was working up to the deadline to get my submission ready but didn't meet it. It would have been extremely unfinished though, barely improving an ensemble (hopefully)

#

I think the distribution of prizes sucks

#

previous ARC contests have suffered from sometimes only 1-2 good entrants, having low 3rd-5th prizes doesn't help, and the initial paper prizes were even worse

keen marsh
#

the 600K grand prize is still up for grabs though. ARC challenge 2025, prepare yourselves...

vapid compass