#arc-prize-2024 | Kaggle | Page 1

verbal mulch Jun 12, 2024, 9:14 PM

#

This is such a cool problem. It's even pretty fun to do by hand!

bleak basin Jun 12, 2024, 10:08 PM

#

Yeeees!! Really cool problem. If anyone wants to work together on the competition!

clear yarrow Jun 13, 2024, 2:21 AM

#

bleak basin Yeeees!! Really cool problem. If anyone wants to work together on the competitio...

Agree, this problem seems to be really interesting! I'm a beginner, but I'm really eager to learn and contribute. I'd love to work together on the competition if you're willing to team up with a newbie!

brittle oasis Jun 13, 2024, 12:07 PM

#

Hey everyone! making AGI on a single GPU for $1 million. 😂 love it! Lets go... 😎 ❤️ 🚀

brittle oasis Jun 13, 2024, 2:10 PM

#

...very smart test. Bigtech LLMs will eventually bust ARC-AGI once they have enough scale (pretrained on enough of their data). But then you could rebase the benchmark on a totally different paradigm (but similar core knowledge) and they would get stuck again. Enforcing FOSS and low specs for ARC-AGI means a solution has to be real intelligence (unless it's some BS domain-specific hack solution.. hope the incentive structures stipulate the generalizable solution, as that's key to the benchmark integrity IMO). good luck everyone! 🙏 🙌

lean muralBOT Jun 13, 2024, 7:02 PM

#

gregkamradt has been warned

Reason: Posted an invite

trail garnet Jun 13, 2024, 7:09 PM

#

Hey team! Greg from the ARC Prize team here

We are excited to have you interested in this competition.

If you haven’t already, I highly suggest checking out our launch blog post [1] and competition trailer [2]

We have a bunch of resources for you to get started

Official website: https://arcprize.org
Getting started solving ARC-AGI: https://arcprize.org/guide
Community: https://discord.gg/9b77dPAmcA
YouTube: https://www.youtube.com/channel/UC_rdrp-QkrZn-ce9uCE-0EA
Twitter: https://twitter.com/arcprize
Newsletter: https://arcprize.ck.page/bc80575d89

Our #1 goal is to make sure you have the resources and tools you need to get started. Please let us know if you have any questions.

Feel free to reach out to me directly or team@arcprize.org

See you on the leaderboard!

[1] https://arcprize.org/blog/launch
[2] https://www.youtube.com/watch?v=2avWAHXUXXs

pure jetty Jun 13, 2024, 7:10 PM

#

trail garnet Hey team! Greg from the ARC Prize team here We are excited to have you interest...

spare frost Jun 15, 2024, 12:56 AM

#

how big is the training set in this competition?

verbal mulch Jun 15, 2024, 2:39 AM

#

Depends on what you mean. Usually between 2-5 examples.

#

You have to build something that can learn from that few

verbal mulch Jun 15, 2024, 3:25 AM

#

But I believe it is 400 challenges in the training set.
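To make the two numbers above concrete (a few hundred tasks, each with only a handful of demonstration pairs), here is a minimal sketch that counts both. It assumes the competition JSON layout in which each task ID maps to a dict with `train` and `test` lists; the `summarize` helper and the toy data are made up for illustration.

```python
from collections import Counter

def summarize(challenges):
    """Return (number of tasks, Counter of demonstration pairs per task)."""
    pairs_per_task = Counter(len(task["train"]) for task in challenges.values())
    return len(challenges), pairs_per_task

# Toy stand-in for the real training file (assumption: same layout).
# On Kaggle you would json.load() the training challenges file instead.
toy = {
    "task_a": {"train": [{"input": [[0]], "output": [[1]]}] * 3,
               "test": [{"input": [[0]]}]},
    "task_b": {"train": [{"input": [[2]], "output": [[2]]}] * 2,
               "test": [{"input": [[2]]}]},
}

n_tasks, histogram = summarize(toy)
print(n_tasks, dict(histogram))
```

Run on the real training file, the histogram shows how many tasks come with 2, 3, 4 or 5 demonstration pairs.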

cyan cipher Jun 16, 2024, 9:49 AM

#

I had a dream about this competition last night even though I'm not even taking part in it 🙈

delicate flint Jun 17, 2024, 12:14 AM

#

cyan cipher I had a dream about this competition last night even though I'm not even taking ...

what was the dream

spice drum Jun 17, 2024, 1:24 AM

#

Hi I am new here and not sure if this is the right place to ask questions about ARC Prize.

#

I am wondering if I could just use regular C to enter this competition?

#

I remember back when Sudoku was new and a computer magazine held a competition where participants submitted their algorithm solution in C to solve any Sudoku puzzles. I had hoped that the ARC Prize challenge could be the same way.

lean muralBOT Jun 17, 2024, 5:28 AM

#

shure9200 has been warned

Reason: Posted an invite

lone axle Jun 17, 2024, 9:22 AM

#

Has anyone managed to run a 70b model with a reasonable runtime?

#

TPU or GPU

verbal mulch Jun 17, 2024, 1:59 PM

#

@spice drum Yes, it's technically possible, but the final solution is a Python notebook. So you'd have to call your C functionality from Python in a wrapper.

#

I'm struggling with the same thing. I'd prefer using Julia for this, but I don't feel like the hassle of mixing languages.
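The wrapper idea mentioned above for C could look roughly like this with Python's standard `ctypes` module. This sketch loads the C standard library as a stand-in for your own compiled solver and assumes a Linux environment; with real code you would build a shared library and pass its path to `ctypes.CDLL` instead.

```python
import ctypes
import ctypes.util

# Stand-in for your own compiled solver: the C standard library.
# Assumption: Linux. For your own code, pass the path to your .so file.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: int abs(int)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

def c_abs(x):
    """Thin Python wrapper around the C function."""
    return libc.abs(x)

print(c_abs(-7))
```

Grids would need to be flattened into C arrays before crossing the boundary, which is where most of the wrapper work tends to go.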

willow arch Jun 18, 2024, 9:13 PM

#

Can you please help me. What am I doing wrong?

verbal mulch Jun 19, 2024, 12:28 AM

#

Yeah you have to read in their file. You'll never see it.

#

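    import json

    # Helper the original snippet assumed; reads a JSON file into a dict.
    def load_json(path):
        with open(path) as file:
            return json.load(file)

    # `model` and `make_predictions_advanced` are placeholders for your own approach.
    test_challenges = load_json('/kaggle/input/arc-prize-2024/arc-agi_test_challenges.json')
    test_predictions = make_predictions_advanced(model, test_challenges)
    submission = {}
    for task_id in test_predictions.keys():
        submission[task_id] = test_predictions[task_id]

    # Save the submission
    output_json_path = '/kaggle/working/submission.json'
    with open(output_json_path, 'w') as file:
        json.dump(submission, file)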

#

You'll have to fill in the blanks for how you attack the problem, but this is what a submission will basically look like

#

@willow arch

willow arch Jun 19, 2024, 6:07 AM

#

I wrote simple working code based on @verbal mulch's proposal:

    import json

    def make_predictions_advanced(model, data):
        # Dummy baseline: answer every test input with a 2x2 grid of zeros.
        output = dict()
        for task_id, task_data in data.items():
            final_answer = []
            for _ in task_data['test']:
                answer = [[0, 0], [0, 0]]
                answer_double = {
                    "attempt_1": answer,
                    "attempt_2": answer,
                }
                final_answer.append(answer_double)
            output[str(task_id)] = final_answer
        return output


    model = None  # no model yet; the baseline ignores it
    with open('/kaggle/input/arc-prize-2024/arc-agi_test_challenges.json') as file:
        test_challenges = json.load(file)
    test_predictions = make_predictions_advanced(model, test_challenges)
    submission = {}
    for task_id in test_predictions.keys():
        submission[task_id] = test_predictions[task_id]

    # Save the submission
    output_json_path = '/kaggle/working/submission.json'
    with open(output_json_path, 'w') as file:
        json.dump(submission, file)

#

I still have the same problem...

#

I also tried forking working code from the LB (the 18-point solution), and I can't submit even the working fork.

craggy orbit Jun 20, 2024, 1:50 PM

#

Submission only works when you click it from inside the notebook; the notebook will need to run.

willow arch Jun 20, 2024, 8:50 PM

#

That was the old style... Now it must be submitted from the edit page. I found it, thank you.

grand herald Jun 30, 2024, 8:10 AM

#

Hey, does anyone have some starter code that I could modify (in Python)? It doesn't need to produce any good results; just an output that can be submitted to the leaderboard. Maybe even visualise the data too? Sorry, just new to all this and want to give it a go 🙂

steady forge Jun 30, 2024, 11:21 AM

#

grand herald Hey, does anyone have some starter code that I could modify (in Python)? It does...

check the "code" section of the competition

grand herald Jun 30, 2024, 11:44 AM

#

steady forge check the "code" section of the competition

Thank you, I will do 🙂

fading acorn Jul 1, 2024, 11:39 AM

#

Quick question: Are we allowed to use the validation dataset as part of the training data? Because 400 samples is ridiculously small and having another 400 would be very useful. After all, the only metric that really matters is performance on the hidden set.

verbal mulch Jul 1, 2024, 4:39 PM

#

Unofficially, yes.

#

They can't stop you. But they do say not to.

#

You're supposed to learn based on the examples, not memorize everything. The best solution would do no training on any of it. It would see the example, learn from it, and do it.

narrow zenith Jul 1, 2024, 6:14 PM

#

verbal mulch You're supposed to learn based on the examples, not memorize everything. The bes...

There's no harm in using that, it's just that it is highly likely that the solution may not work on private lb

verbal mulch Jul 1, 2024, 6:41 PM

#

narrow zenith There's no harm in using that, it's just that it is highly likely that the solut...

I would argue you are more likely to succeed on the private ones since you aren't basing your answer on previously seen stuff and only on learning how to solve the problem in front of you. Which is really the point of the challenge.

narrow zenith Jul 1, 2024, 9:07 PM

#

verbal mulch I would argue you are more likely to succeed on the private ones since you aren'...

What I mean is that using validation/test data for training purposes will harm the score on the private LB, because you have not tested whether it generalizes well to all the distributions

proven oak Jul 2, 2024, 10:24 AM

#

@trail garnet
I found this part of rules a bit confusing;
are we actually allowed to use publicly available models like llama or gemma, or not?
is finetuning of such models eligible for the prize?

trail garnet Jul 2, 2024, 1:09 PM

#

proven oak @trail garnet I found this part of rules a bit confusing; are we actual...

Yes! You can use publicly available models that do not require internet access.

In fact we have a few templates using llama to get you started

mild raft Jul 5, 2024, 8:47 AM

#

Hello everyone! Question regarding the format of the submission file - I have a similar problem to the one discussed here: https://www.kaggle.com/competitions/arc-prize-2024/discussion/515327

Does anyone have a suggestion for the fix?

ARC Prize 2024

Create an AI capable of solving reasoning tasks it has never seen before

steady forge Jul 6, 2024, 11:22 PM

#

how to submit? lol

#

are there any code examples where they show correct format of submitting?

steady forge Jul 7, 2024, 12:35 AM

#

why submission takes so long?

#

yay! I got 0.0

grand herald Jul 7, 2024, 3:14 AM

#

Good job @steady forge, good to hear you got it working 🙂

steady forge Jul 7, 2024, 8:34 AM

#

grand herald Good job @steady forge, good to hear you got it working 🙂

Thanks

mild raft Jul 7, 2024, 2:48 PM

#

mild raft Hello everyone! Question regarding the format of submission file - I have a smil...

Still stuck on the same problem 😦

steady forge Jul 7, 2024, 2:53 PM

#

mild raft Still stuck on the same problem 😦

do you use the sample_submission file to submit?

#

or you create new file and submit?

#

I suggest you take the sample_submission, and replace the attempts there with yours

steady forge Jul 7, 2024, 3:11 PM

#

what

mild raft Jul 7, 2024, 8:55 PM

#

steady forge or you create new file and submit?

new file named submission.json

steady forge Jul 7, 2024, 8:55 PM

#

mild raft new file named submission.json

eeeeeeeeh

#

ok wait I'm cooking something

mild raft Jul 7, 2024, 8:56 PM

#

can I DM?

steady forge Jul 7, 2024, 8:56 PM

#

mild raft can I DM?

yah

mild raft Jul 7, 2024, 8:56 PM

#

sent fr, thanks

steady forge Jul 8, 2024, 3:39 PM

#

https://www.kaggle.com/code/anrenk/submission-example/notebook

Submission Example

Explore and run machine learning code with Kaggle Notebooks | Using data from ARC Prize 2024

#

i created notebook showing submission example

grand herald Jul 9, 2024, 12:57 AM

#

Thank you for sharing @steady forge 🙂

steady forge Jul 9, 2024, 5:45 AM

#

grand herald Thank you for sharing @steady forge 🙂

welcome bro

bleak mango Jul 9, 2024, 4:42 PM

#

Hi everyone. Difficult challenge. I spent yesterday trying some MAML approach on it without any success....

steady forge Jul 9, 2024, 5:15 PM

#

bleak mango Hi everyone. Difficult challenge. I spent yesterday trying some MAML approach on...

what is MAML

bleak mango Jul 9, 2024, 5:32 PM

#

Model Agnostic Meta Learning

steady forge Jul 9, 2024, 6:26 PM

#

bleak mango Model Agnostic Meta Learning

first time hearing such a thing

#

interesting, will look into it

bleak mango Jul 9, 2024, 9:38 PM

#

I think I will post a notebook on it, even with an exceptionally poor score, just to share this knowledge

steady forge Jul 10, 2024, 6:34 AM

#

bleak mango I think I will post a notebook on it. Even with exception poor score, just to sh...

that'd be wonderful!

steady forge Jul 10, 2024, 8:04 AM

#

Where can I find the custom metric for this competition? I swear I've seen it before I can't find it now...

bleak mango Jul 11, 2024, 2:53 PM

#

I don't know if there is a custom metric, since it is all or nothing. Just as simple as that.
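The "all or nothing" scoring described above can be sketched as follows. This is not the official metric code: it assumes a test output counts as solved when either `attempt_1` or `attempt_2` matches the expected grid exactly, and that the score is the fraction of solved test outputs. The `solutions` layout (task ID mapped to a list of expected grids) is also an assumption.

```python
def score_submission(submission, solutions):
    """Fraction of test outputs matched exactly by attempt_1 or attempt_2."""
    correct = 0
    total = 0
    for task_id, expected_outputs in solutions.items():
        attempts_per_output = submission.get(task_id, [])
        for i, expected in enumerate(expected_outputs):
            total += 1
            if i >= len(attempts_per_output):
                continue  # nothing submitted for this test output
            attempts = attempts_per_output[i]
            # Exact match only: no partial credit for nearly-right grids.
            if expected in (attempts.get("attempt_1"), attempts.get("attempt_2")):
                correct += 1
    return correct / total if total else 0.0
```

A grid that is wrong in a single cell scores the same as an empty answer, which is why local validation numbers can look so harsh.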

#

I will share the code for MAML here for those interested in it

📎 ARC_2024_-_NeuroKraft_-_MAML.py

#

Maybe others can try something that I failed to see

steady forge Jul 11, 2024, 3:26 PM

#

bleak mango I will share the code for MAML here for those interested in it

can you share as a kaggle notebook?

bleak mango Jul 11, 2024, 3:33 PM

#

ok, I will do it later

bleak mango Jul 14, 2024, 4:02 PM

#

After reading a lot of papers on this challenge I decided to take a random task and see what is the best I can do on the "abstract reasoning" part. For that I selected task "ff805c23" and coded an algorithm not just to solve it, but to randomly generate new examples for a given configuration.

#

Basically, one way to solve it is to get the coordinates from the blue box, flip the grid depending on the blue box location and extract the answer.
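The recipe above (locate the box, flip the grid, read the answer out of the flipped copy) can be sketched like this. It is an illustration, not the poster's code: it assumes the grid is mirror-symmetric, that the box uses color code 1, and that the box does not cover its own mirror image. All function names are made up.

```python
def find_box(grid, box_color):
    """Return (top, left, bottom, right) bounds of all cells equal to box_color."""
    cells = [(r, c) for r, row in enumerate(grid)
             for c, v in enumerate(row) if v == box_color]
    if not cells:
        raise ValueError("no cell with the box color")
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)

def flip_lr(grid):
    return [row[::-1] for row in grid]

def flip_ud(grid):
    return grid[::-1]

def crop(grid, top, left, bottom, right):
    return [row[left:right + 1] for row in grid[top:bottom + 1]]

def recover_hidden(grid, box_color=1):
    """Read the cells hidden under the box from a mirrored copy of the grid."""
    top, left, bottom, right = find_box(grid, box_color)
    for flip in (flip_lr, flip_ud):
        patch = crop(flip(grid), top, left, bottom, right)
        # Accept the first flip whose patch is fully visible (no box cells).
        if all(v != box_color for row in patch for v in row):
            return patch
    raise ValueError("box overlaps its mirror image under both flips")

masked = [[1, 3, 3, 2],
          [1, 5, 5, 4],
          [4, 5, 5, 4],
          [2, 3, 3, 2]]
print(recover_hidden(masked))
```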

#

After that I decided to train a CNN (Auto Encoder / U-Net) for an unbounded number of epochs on batches of randomly generated grids, in an attempt to see whether the model learns to use the "mirror" part of the grid ("abstract reasoning") to get a result instead of memorizing patterns
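The random example generation described above could look roughly like the following. This is a hypothetical generator, not the one the poster used: it builds a left-right symmetric grid, hides a box in the left half, and returns the hidden cells as the target. The grid size, the box size and the color code 1 for the box are all assumptions.

```python
import random

def make_example(size=8, box_h=2, box_w=2, box_color=1, seed=None):
    """Build one (input, output) pair: a symmetric grid with a masked box."""
    if size % 2 or box_w > size // 2 or box_h > size:
        raise ValueError("size must be even and the box must fit in the left half")
    rng = random.Random(seed)
    colors = [c for c in range(1, 10) if c != box_color]  # no background, no box color
    half = size // 2
    grid = []
    for _ in range(size):
        left = [rng.choice(colors) for _ in range(half)]
        grid.append(left + left[::-1])  # mirror the left half
    # Keep the box inside the left half so its mirror image stays visible.
    top = rng.randrange(0, size - box_h + 1)
    left_col = rng.randrange(0, half - box_w + 1)
    answer = [row[left_col:left_col + box_w] for row in grid[top:top + box_h]]
    for r in range(top, top + box_h):
        for c in range(left_col, left_col + box_w):
            grid[r][c] = box_color
    return grid, answer

grid, answer = make_example(seed=0)
```

Calling it in a loop with fresh seeds yields an endless stream of training pairs for a single task configuration.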

#

and now I don't know if the model is learning to reason about something or just predicting the closest surrounding label.

#

now if I just change the color of the "blue box" to another label, it is clear that the model didn't "abstractly reason" that the answer should at least involve the "box" part.

#

but if I take the model and train it for a few steps on the new "block color", it learns the new pattern fast

#

but the "abstract reasoning" is still failing, since to me it is not clear that the model is learning how to use the mirrored part of the grid to search for the answer.

#

Either way this is an insanely interesting challenge to learn new tricks

bleak mango Jul 15, 2024, 10:29 AM

#

OK... after letting the model train for about a day:

#

nimble dirge Jul 15, 2024, 12:44 PM

#

what hardware did you train it on @bleak mango ?

bleak mango Jul 15, 2024, 12:47 PM

#

On my local laptop (RTX 3070 GPU - 8GB). The important next step is to analyze which features of the input grid contribute the most to the output, in order to see if it has something to do with the opposite side or just the local blue square.

hardy anvil Jul 17, 2024, 3:31 PM

#

bleak mango

What is the purpose of this going to be tho?

hardy anvil Jul 18, 2024, 5:55 AM

#

bleak mango

Or was this just for a random curiosity with no relation to the larger ARC task?

bleak mango Jul 18, 2024, 4:22 PM

#

hardy anvil Or was this just for a random curiosity with no relation to the larger ARC task?

the latter one...

#

Brute force (DSL, GA, CA...) and pattern memorization for generalization (NN) do not seem to be the final way to go on this challenge. The first needs an astronomical number of combinations to discover the solution, while the second needs an astronomical number of samples to learn how to generalize, without producing novel ways of thinking in the face of new problems. A combination of both is suggested to be the path.

supple axle Jul 18, 2024, 7:07 PM

#

bleak mango After reading a lot of papers on this challenge I decided to take a random task ...

where did you find the papers on this challenge?

verbal mulch Jul 18, 2024, 8:26 PM

#

Yeah a combination of both or something totally unique. Which is basically the point of the challenge. The creators definitely don't think the current state-of-the-art is sufficient to solve it. They are looking for someone to come out of nowhere and try something totally crazy to solve it.

keen marsh Jul 18, 2024, 9:44 PM

#

supple axle where did you find the papers on this challenge?

Link to it @ https://arxiv.org/abs/1911.01547

arXiv.org

On the Measure of Intelligence

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abund...

supple axle Jul 18, 2024, 11:20 PM

#

Thank you

bleak mango Jul 23, 2024, 12:48 AM

#

I spotted the use of a genetic algorithm combined with a DSL to solve the problems (https://www.kaggle.com/code/zenol42/dsl-and-genetic-algorithm-applied-to-arc/notebook) and I was thinking about how to adapt it so you can use not only transformation functions that take a single argument, but a more flexible set of functions with multiple arguments, like add_row(grid, values).

DSL and Genetic Algorithm applied to ARC

Explore and run machine learning code with Kaggle Notebooks | Using data from Abstraction and Reasoning Challenge
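One possible way to get multi-argument functions such as `add_row(grid, values)` into a search that expects single-argument transformations is to pre-bind the extra arguments with `functools.partial`. The sketch below is not taken from the linked notebook: the primitives, the `build_primitives` helper and the naming scheme are all made up for illustration.

```python
from functools import partial
from itertools import product

def flip_lr(grid):
    return [row[::-1] for row in grid]

def add_row(grid, values):
    """Append one row; `values` must match the grid width."""
    return grid + [list(values)]

def recolor(grid, old, new):
    return [[new if v == old else v for v in row] for row in grid]

def build_primitives(width, colors):
    """Expand multi-argument functions into a flat list of unary ones."""
    prims = [("flip_lr", flip_lr)]
    for c in colors:
        prims.append((f"add_row({c})", partial(add_row, values=[c] * width)))
    for old, new in product(colors, repeat=2):
        if old != new:
            prims.append((f"recolor({old}->{new})",
                          partial(recolor, old=old, new=new)))
    return prims

def run_program(program, grid):
    """Apply a sequence of (name, unary function) steps in order."""
    for _, fn in program:
        grid = fn(grid)
    return grid

prims = dict(build_primitives(width=2, colors=[0, 1]))
program = [("recolor(0->1)", prims["recolor(0->1)"]),
           ("add_row(0)", prims["add_row(0)"])]
print(run_program(program, [[0, 1]]))
```

The trade-off is search-space size: every bound argument combination becomes its own primitive, so the argument ranges have to stay small or be sampled by the genetic algorithm itself.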

smoky pendant Aug 15, 2024, 2:03 AM

#

How are there 100+ people with the same score of 26.00 lol

hardy anvil Aug 15, 2024, 7:37 AM

#

smoky pendant How are there 100+ people with the same score of 26.00 lol

People just submit the highest scoring public notebook, which is a 26.

smoky pendant Aug 18, 2024, 5:18 PM

#

I'm getting ~21% on the validation set but 0% upon submission... is there something wrong with my submission file?

📎 submission_1.json

prisma swallow Aug 19, 2024, 7:36 AM

#

smoky pendant I'm getting ~21% on the validation set but 0% upon submission... is there someth...

maybe you should make the keys in your dicts match "sample_submission.json" exactly: remove the input field from each dict

trail garnet Aug 19, 2024, 7:27 PM

#

smoky pendant I'm getting ~21% on the validation set but 0% upon submission... is there someth...

Submission files don't have "input" on them, they need to have attempt_1 and attempt_2

You can check out the format here
https://www.kaggle.com/competitions/arc-prize-2024/overview

and there is also a "sample_submission.json" in the dataset
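The fix described above (drop the "input" field, keep only `attempt_1` and `attempt_2`) can be sketched as a small clean-up step run before writing `submission.json`. The `clean_submission` helper is made up for illustration and assumes each task ID maps to a list with one dict per test output.

```python
def clean_submission(raw):
    """Keep only attempt_1/attempt_2 for every test output of every task."""
    cleaned = {}
    for task_id, outputs in raw.items():
        cleaned[task_id] = []
        for entry in outputs:
            missing = {"attempt_1", "attempt_2"} - set(entry)
            if missing:
                raise ValueError(f"{task_id}: missing {sorted(missing)}")
            # Extra keys such as "input" are dropped here.
            cleaned[task_id].append(
                {"attempt_1": entry["attempt_1"], "attempt_2": entry["attempt_2"]}
            )
    return cleaned

raw = {"abc": [{"input": [[1]], "attempt_1": [[0]], "attempt_2": [[1]]}]}
print(clean_submission(raw))
```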

prisma swallow Aug 20, 2024, 10:52 AM

#

I started with a naive test where I hard-trained llama3.1 to follow the patterns in the train dataset, but it doesn't seem to work; I get 3/419 on the eval set (

verbal mulch Aug 20, 2024, 1:57 PM

#

Yeah LLMs are not the solution. You should be trying to come up with something new.

scenic anvil Aug 20, 2024, 2:16 PM

#

where can i find the public notebook with a score of 26

hardy anvil Aug 20, 2024, 3:43 PM

#

prisma swallow I start from a naive test that I hard train llama3.1 to follow the pattern in tr...

llama 3.1 is fairly bad at this task

#

gemma 2 9b performs much better in my tests

prisma swallow Aug 21, 2024, 4:49 AM

#

hardy anvil gemma 2 9b performs much better in my tests

Thanks, I will give it a try : )

solemn zenith Aug 21, 2024, 2:21 PM

#

Hey all, i am looking for a teammate any one looking for one?

hardy anvil Aug 21, 2024, 6:21 PM

#

In many cases that will work, but definitely not in all cases.

#

There are cases where there are different output shapes for the training input.

hardy anvil Aug 21, 2024, 7:45 PM

#

It's not a typo

#

there are different output shapes for different training inputs for a given task.

#

so you can't make the assumption you can just copy the training output shapes.
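A quick way to see which tasks break the "copy the training output shape" assumption is to collect the distinct output shapes per task. The helpers below are a hypothetical sketch that assumes the usual task layout with a `train` list of `input`/`output` grids.

```python
def shape(grid):
    """(rows, cols) of a grid given as a list of lists."""
    return (len(grid), len(grid[0]) if grid else 0)

def output_shapes(task):
    """Set of distinct shapes among a task's training outputs."""
    return {shape(pair["output"]) for pair in task["train"]}

def has_fixed_output_shape(task):
    """True only if every training output has the same shape."""
    return len(output_shapes(task)) == 1

toy_task = {"train": [
    {"input": [[1, 1], [1, 1]], "output": [[1]]},
    {"input": [[2, 2, 2]], "output": [[2, 2]]},
]}
print(output_shapes(toy_task), has_fixed_output_shape(toy_task))
```

For tasks where the check fails, the output size has to be inferred from the test input itself.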

#

yeah

#

if you want to get a good score, probably yeah

#

https://neoneye.github.io/arc/edit.html?dataset=ARC&task=136b0064

#

yeah

marble anvil Aug 23, 2024, 3:07 AM

#

hello

#

can someone explain this task? I dont get what is going on

Screenshot_2024-08-22_at_10.08.56_PM.png

#

how do the numbers know which grid position to move to?

verbal mulch Aug 23, 2024, 3:18 AM

#

marble anvil how do the numbers know which grid position to move to?

Yeah this one you take the square on the input that doesn't have blue (8) and then "blow it up" to the output. So on the top the red on the top left becomes the red on the top left. The green on the middle right is the green on the middle right.

marble anvil Aug 23, 2024, 3:30 AM

#

oh woah

#

that was tricky.

#

thanks!

marble anvil Aug 23, 2024, 5:06 AM

#

just to clarify, in the test set there is a new task not seen in the train set that we have to solve? Because if you show me this grid picture, it isn't very likely I as a human could give you this output, especially not given an example output

#

or is it the case the skill has been seen in the train before

verbal mulch Aug 23, 2024, 1:50 PM

#

@marble anvil The test set is just like the train set. There are example input/output pairs, and then the input with no output that you must create. And yes, a lot of them are brand new in the test set never seen in the training set.

marble anvil Aug 23, 2024, 2:28 PM

#

gotcha just like the eval

stone ridge Aug 25, 2024, 4:50 PM

#

This looks like a recording from the (online) conference: https://virtual.2020.emnlp.org/tutorial_T4.html

jovial musk Aug 30, 2024, 12:52 AM

#

hey does anyone know how you can use your own finetuned llama model

#

from what i understand its within the rules so how do i like upload the model weights to kaggle

#

or is that even possible

prisma swallow Aug 30, 2024, 6:03 AM

#

the submit page has an entry for uploading your model.

jovial musk Aug 30, 2024, 5:20 PM

#

thanks

static dagger Aug 30, 2024, 8:28 PM

#

From the https://arcprize.org/ it states that the current leaderboard high score is 46%, but that the average (smart) human score is only around 22% ?! Is this correct?

ARC Prize

ARC Prize is a $1,000,000+ nonprofit, public competition to beat and open source a solution to the ARC-AGI benchmark.

solemn solar Sep 1, 2024, 12:09 AM

#

Estimates of human performance are 80-100%. You can try yourself and see how many you can get correct.

summer wasp Sep 2, 2024, 3:09 PM

#

Hi, everybody.
I'm looking for a teammate, either to lead me or to learn from each other.
Actually, I'm new to Kaggle competitions, so I want to collaborate and assist in this one.
Even though I'm new to Kaggle, I think I know LLMs well.
This is my profile.
https://www.kaggle.com/jasperjack

Jason Dan | Novice

Kaggle profile for Jason Dan

river fossil Sep 2, 2024, 6:31 PM

#

hi guys; so i read that there will be re-training happening after we submit the notebook

how many GPU-hours are permitted for this re-training? I saw the number 12hour - is it a combined of training + doing the test or do we get 12hour for each?

also: 12hour GPU + 12hour CPU -> does that mean, if counting idle time, it is possible to get ~24 hours total compute time?

hardy anvil Sep 2, 2024, 6:35 PM

#

river fossil hi guys; so i read that there will be re-training happening after we submit the ...

wdym retraining after submitting the notebook?

#

are you submitting someone else's code? Then that may indeed be the case. 12h is for total time

river fossil Sep 2, 2024, 6:36 PM

#

lol why would i submit someone else code

river fossil Sep 2, 2024, 6:37 PM

#

hardy anvil wdym retraining after submitting the notebook?

read the doc on data

river fossil Sep 2, 2024, 6:54 PM

#

river fossil read the doc on data

alright; i mistook it to mean that there is a hidden set of training data on which they will re-train the model, because it said "train" input/output pairs

hardy anvil Sep 2, 2024, 7:03 PM

#

river fossil lol why would i submit someone else code

Most of the near top entries do exactly that xd

river fossil Sep 2, 2024, 7:03 PM

#

seriously? then whats the fun

hardy anvil Sep 2, 2024, 7:03 PM

#

river fossil alright; i mistook it that there is a hidden set of training data where they wil...

Yeah the naming can be a lil confusing

hardy anvil Sep 2, 2024, 7:03 PM

#

river fossil seriously? then whats the fun

Easy medals ig

river fossil Sep 2, 2024, 7:04 PM

#

this is like the most interesting thing i came across in many years

#

and quite sure that is not allowed

#

i think working on it will cure my depression hahah. but it’s sad to know ppl are not after the journey

worn rivet Sep 3, 2024, 1:06 AM

#

Anyone solved today's puzzle at https://arcprize.org/play ?

ARC Prize

ARC Prize - Play the Game

Easy for humans, hard for AI. Try ARC-AGI.

sinful osprey Sep 3, 2024, 2:49 AM

#

worn rivet Anyone solved today's puzzle at https://arcprize.org/play ?

Yup

worn rivet Sep 3, 2024, 4:47 AM

#

sinful osprey Yup

how?

hardy anvil Sep 3, 2024, 5:03 AM

#

worn rivet how?

Just put colors on the diagonally adjacent tiles

worn rivet Sep 3, 2024, 5:05 AM

#

hardy anvil Just put colors on the diagonally adjacent tiles

Talking about 10fcaaa3, right? Screen shot of solution please?

hardy anvil Sep 3, 2024, 5:13 AM

#

Just the daily puzzle as shown in the link

#

I assume its the same

#

Drawing it is kinda annoying on phone tho

#

Just quadruple the input and put lightblue on the diagonally adjacent squares to each green one
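The rule described above (tile the input 2x2, then paint light blue on the diagonal neighbours of each colored cell) can be sketched as follows. It assumes the standard ARC color codes (0 for background, 8 for light blue), treats every non-background cell as a source, and does not wrap around the border; the real task may differ on those details.

```python
def tile_2x2(grid):
    """Repeat the grid twice horizontally and twice vertically."""
    wide = [row + row for row in grid]
    return [list(row) for row in wide] + [list(row) for row in wide]

def mark_diagonals(grid, marker=8, background=0):
    """Paint `marker` on empty cells diagonally adjacent to a colored cell."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] in (background, marker):
                continue
            for dr in (-1, 1):
                for dc in (-1, 1):
                    rr, cc = r + dr, c + dc
                    # Only overwrite background; colored cells stay as they are.
                    if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc] == background:
                        out[rr][cc] = marker
    return out

def solve(grid):
    return mark_diagonals(tile_2x2(grid))

for row in solve([[0, 0], [0, 3]]):
    print(row)
```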

worn rivet Sep 3, 2024, 5:17 AM

#

The puzzle ID is on the upper left. Just want to make sure since not sure of the time zone arcprize uses. I have no idea what the solution should look like since the test are green inputs and it's not just put light blue on diagonals since the pink inputs aren't just diagonals.

#

Hmm, oh wait, the pinks are just diagonals.

hardy anvil Sep 3, 2024, 5:20 AM

#

worn rivet Sep 3, 2024, 5:24 AM

#

How to resize? I clicked Resize button and there isn't an input field nor am I able to adjust the output grid.

#

Got it, the field is left of the button

hardy anvil Sep 3, 2024, 8:03 AM

#

nice

solemn solar Sep 4, 2024, 12:12 AM

#

Interesting paper and environment: https://www.arxiv.org/abs/2407.20806 ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning
Hosung Lee, Sejin Kim, Seungpil Lee, Sanha Hwang, Jihwan Lee, Byung-Jun Lee, Sundong Kim
This paper introduces ARCLE, an environment designed to facilitate reinforcement learning research on the Abstraction and Reasoning Corpus (ARC). Addressing this inductive reasoning benchmark with reinforcement learning presents these challenges: a vast action space, a hard-to-reach goal, and a variety of tasks. We demonstrate that an agent with proximal policy optimization can learn individual tasks through ARCLE. The adoption of non-factorial policies and auxiliary losses led to performance enhancements, effectively mitigating issues associated with action spaces and goal attainment. Based on these insights, we propose several research directions and motivations for using ARCLE, including MAML, GFlowNets, and World Models.

summer wasp Sep 4, 2024, 4:29 PM

#

Hi, everybody.
I'm looking for a teammate, either to lead me or to learn from each other.
Actually, I'm new to Kaggle competitions, so I want to collaborate and assist in this one.
Even though I'm new to Kaggle, I think I know LLMs well.
This is my profile.
https://www.kaggle.com/jasperjack

Jason Dan | Novice

Kaggle profile for Jason Dan

hardy crag Sep 5, 2024, 7:48 AM

#

Hi, guys. After I click "submit prediction" and select my notebook (which has already run correctly and produced submission.json), the submission has been stuck for a long time.
Is this expected? Thanks for any reply!

static dagger Sep 6, 2024, 6:09 PM

#

solemn solar Estimates of human performance are 80-100%. You can try yourself and see how man...

really? I'm confused, right on the ARC prize website, the only data it lists on human performance on the 'private set' the set being tested against that's already at 46%, by two ostensibly smart humans already familiar with this type of problem, scored 22% and 24% respectively? Am I reading something wrong here?

hardy anvil Sep 6, 2024, 9:06 PM

#

static dagger really? I'm confused, right on the ARC prize website, the only data it lists on ...

This talks not about their own ability to solve ARC puzzles, but about the results of their code submissions.

#

I do agree the wording might be a tad confusing here.

distant geode Sep 8, 2024, 10:28 AM

#

Still looking for someone to explore a combination of DSL and LLM. Ideally someone with a strong LLM background (I know how to finetune models, but I'm not that strong on the theoretical part) to exchange ideas and try out these experiments. If you don't know much, that's still fine; we shoot in the dark and learn together 😊

hardy crag Sep 13, 2024, 7:01 AM

#

Wondering how the new ChatGPT o1 performs on ARC tasks. Does anyone have any benchmark data? 😃

lilac ivy Sep 13, 2024, 12:56 PM

#

Hey everyone! Just did my first submission. How long does the Notebook Running step usually take after submitting? 🙂

lilac ivy Sep 13, 2024, 1:17 PM

#

Took roughly 10 minutes.

hardy anvil Sep 13, 2024, 4:00 PM

#

Depends on what you’re submitting

worn rivet Sep 13, 2024, 10:13 PM

#

Is there no way to import from private github repo? If not, how do I import the files after uploading them in DATASETS with name "pyfiles"? And how do I make existing imports within my src files uploaded work w/o any modifications?
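One common way to import uploaded source files without modifying them is to put the dataset directory on `sys.path`. The sketch below uses a temporary directory as a stand-in for the dataset folder; on Kaggle the path would presumably be something like `/kaggle/input/pyfiles`, which is an assumption based on the dataset name mentioned above. The module names are made up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def add_source_dir(path):
    """Put a directory of .py files at the front of the import path."""
    path = str(path)
    if path not in sys.path:
        sys.path.insert(0, path)

# Stand-in for the dataset folder (assumption: /kaggle/input/pyfiles on Kaggle).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "helpers_demo.py").write_text("def double(x):\n    return 2 * x\n")
(demo_dir / "solver_demo.py").write_text(
    "from helpers_demo import double\n\ndef solve(x):\n    return double(x) + 1\n"
)

add_source_dir(demo_dir)
importlib.invalidate_caches()
solver_demo = importlib.import_module("solver_demo")
print(solver_demo.solve(3))
```

Plain top-level imports between the uploaded files keep working this way; package-relative imports would additionally need the package folder structure to be preserved in the dataset.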

worn rivet Sep 13, 2024, 11:31 PM

#

K, got files to be flattened as 1 file.

worn rivet Sep 14, 2024, 6:27 AM

#

Anyone knows how to solve https://neoneye.github.io/arc/edit.html?dataset=ARC&task=136b0064

fringe veldt Sep 14, 2024, 11:19 AM

#

worn rivet Anyone knows how to solve https://neoneye.github.io/arc/edit.html?dataset=ARC&ta...

Yes. Do you mean in words or in code?

worn rivet Sep 14, 2024, 3:22 PM

#

fringe veldt Yes. Do you mean in words or in code?

In words or image

worn rivet Sep 14, 2024, 3:26 PM

#

fringe veldt Yes. Do you mean in words or in code?

Actually in words, since the 3 images are of no help to me

hardy anvil Sep 14, 2024, 4:24 PM

#

worn rivet Actually in words, since the 3 images is of no help to me

Each of the objects on the left corresponds to a colorline of a chain that spawns on the right side, starting from the gray

#

For example, the blue object corresponds to a horizontal line of length 3 that extends to the left

#

The pink one is a vertical line of length 2 that extends downwards

#

They always start from 1 square under the place the last line left off.
Then ofc crop the right side and u got your solution

worn rivet Sep 14, 2024, 4:46 PM

#

hardy anvil Each of the objects on the left corresponds to a colorline of a chain that spawn...

Thank you! I'm not sure if the average person scores 85% on these...

hardy anvil Sep 14, 2024, 6:11 PM

#

worn rivet Thank you! I'm not sure if the average person scores 85% on these...

There's probably a large variance in what people will score

#

I think if people are familiarized with a couple examples, most will be able to get 85%

#

I’ve been able to solve all examples I’ve seen in less than a minute (to get to the solution idea, drawing takes longer ofc), but that’s also likely due to prior experience with puzzles. For people with no related experience whatsoever, it may be much harder.

reef orbit Sep 15, 2024, 3:17 PM

#

@distant geode check dm

lavish palm Sep 20, 2024, 5:08 AM

#

can anyone help me? whenever I try to submit, it shows an inference error and gets rejected

dense sinew Sep 23, 2024, 12:42 PM

#

CatFacepalm hard competition.

stark gate Sep 24, 2024, 9:59 AM

#

Hello, we are looking for a researcher to join our University team for the ARC competition. If interested, please let me know!

fringe veldt Sep 29, 2024, 5:48 AM

#

Does this have no solution? https://neoneye.github.io/arc/edit.html?dataset=ARC&task=9110e3c5&filter=expert

worn rivet Sep 30, 2024, 12:38 AM

#

I don't get this one: https://neoneye.github.io/arc/edit.html?dataset=ARC&task=31aa019c . Does anyone know?

#

@fringe veldt I tried yours, but couldn't figure it out either. These 2 seem a bit similar. Figuring out one may help with the other.

hardy anvil Sep 30, 2024, 1:15 AM

#

worn rivet I don't get this one: https://neoneye.github.io/arc/edit.html?dataset=ARC&task=3...

Certainly! I'd be happy to provide an analysis of the puzzle you've presented. Let's examine the pattern in these training examples:
The training examples demonstrate a consistent pattern:

Left example: There is a single yellow square present. The rule applied is to surround this lone yellow square with a red border.
Middle example: A solitary pink square is shown. Following the same principle, this individual pink square is encircled by a red border.
Right example: The image contains just one green square. Adhering to the established pattern, this isolated green square is framed with a red border.

The key insight from these examples is:
When there is only one square of any color present in the image, that square is highlighted by surrounding it with a red border.
This rule appears to apply regardless of the specific color of the lone square, emphasizing the importance of uniqueness rather than the particular hue.

Is there anything else you'd like me to clarify or expand upon regarding this pattern?
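
The rule described above (find the one color that appears exactly once and frame that cell in red) can be sketched as follows. This is a minimal sketch, not a verified solver for the task: the function name `highlight_unique_cell` is hypothetical, red is assumed to be color index 2 and the background 0 as in the usual ARC palette, and clearing every other cell in the output is an assumption that the description does not spell out.

```python
from collections import Counter

RED = 2  # assumed ARC color index for red


def highlight_unique_cell(grid, background=0, border=RED):
    """Keep the one cell whose color occurs exactly once and frame it.

    Assumption: all other cells are reset to the background color.
    """
    counts = Counter(v for row in grid for v in row if v != background)
    unique = [color for color, n in counts.items() if n == 1]
    if len(unique) != 1:
        raise ValueError("expected exactly one color that appears once")
    target = unique[0]

    height, width = len(grid), len(grid[0])
    r0, c0 = next(
        (r, c)
        for r in range(height)
        for c in range(width)
        if grid[r][c] == target
    )

    out = [[background] * width for _ in range(height)]
    # Paint the 3x3 neighbourhood red, clipped at the grid edges.
    for r in range(max(0, r0 - 1), min(height, r0 + 2)):
        for c in range(max(0, c0 - 1), min(width, c0 + 2)):
            out[r][c] = border
    out[r0][c0] = target  # restore the unique cell in the middle
    return out


example = [
    [1, 1, 0, 0],
    [0, 4, 0, 3],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
]
for line in highlight_unique_cell(example):
    print(line)
```

In the example grid, blue (1) and green (3) each appear twice, so yellow (4) is the lone color that gets the red frame.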

hardy anvil Sep 30, 2024, 1:23 AM

#

fringe veldt Does this have no solution? https://neoneye.github.io/arc/edit.html?dataset=ARC&...

these should be the solution

#

I guess his website is buggy for this puzzle

#

@latent timber I wonder if the reveal button doesn't work because this puzzle has 2 answer slots?

worn rivet Sep 30, 2024, 4:53 AM

#

hardy anvil these should be the solution

Thank you for the reasoning. What's the logic behind the solution to @fringe veldt's puzzle?

fringe veldt Sep 30, 2024, 6:56 AM

#

worn rivet Thank you for the reasoning. What's the logic beh...

I get it now. The output shapes are directly correlated to the color that shows up the most in each grid.

worn rivet Sep 30, 2024, 7:23 AM

#

fringe veldt I get it now. The output shapes are directly correlated to the color that shows...

? Can you further explain?

hardy anvil Sep 30, 2024, 11:14 AM

#

worn rivet ? Can you further explain?

Every training grid has 1 color that shows up more than the others. This color maps to a specific shape in the output grid. Blue always maps to the same shape, and likewise for red and green.

worn rivet Sep 30, 2024, 5:01 PM

#

@hardy anvil Thank you. For reference, an easier way to understand it may be this: the output shapes are constants. There are only as many possible outputs as there are input colors. Each color is represented by one shape, and the shape is chosen by which color appears most often in the input.
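
A minimal sketch of that lookup idea, assuming 0 is the background and using the usual ARC indices for blue (1), red (2) and green (3). The shapes in `SHAPE_FOR_COLOR` are made-up placeholders, not the real outputs of the task, and the names `dominant_color` and `solve` are hypothetical. Ties between colors are not handled.

```python
from collections import Counter

# Placeholder 3x3 shapes, one per color. These are NOT the task's real
# outputs; they only show that each color maps to one constant grid.
SHAPE_FOR_COLOR = {
    1: [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # blue  -> placeholder shape
    2: [[0, 2, 0], [2, 2, 2], [0, 2, 0]],  # red   -> placeholder shape
    3: [[3, 3, 3], [0, 0, 0], [3, 3, 3]],  # green -> placeholder shape
}


def dominant_color(grid, background=0):
    """Return the non-background color that appears most often."""
    counts = Counter(v for row in grid for v in row if v != background)
    return counts.most_common(1)[0][0]


def solve(grid, shapes=SHAPE_FOR_COLOR):
    """Look up the constant output for the input's dominant color."""
    return [row[:] for row in shapes[dominant_color(grid)]]


puzzle = [
    [1, 2, 1],
    [0, 1, 3],
    [2, 1, 0],
]
print(dominant_color(puzzle))
print(solve(puzzle))
```

Here blue appears 4 times, red twice and green once, so the blue placeholder shape is returned.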

odd compass Oct 13, 2024, 5:15 PM

#

I solved it but have to wait 7 more hours…….

knotty crystal Oct 15, 2024, 11:16 AM

#

Can anyone tell me what this project-based learning is, what the requirements are to start building anything, and how people follow this path?

velvet urchin Oct 16, 2024, 6:55 PM

#

fringe veldt Does this have no solution? https://neoneye.github.io/arc/edit.html?dataset=ARC&...

the output depends on the dominant color of input.

normal marsh Oct 29, 2024, 1:39 AM

#

@whole olive nice to see that, I find the complete lack of interesting discussion on the kaggle forums very disappointing compared to other contests

vapid compass Nov 11, 2024, 11:03 AM

#

Hey, our latest submission was started before the submission deadline, and finished in the time limit. However, we cannot select it for scoring, is it chosen automatically?

velvet urchin Nov 12, 2024, 12:56 PM

#

vapid compass Hey, our latest submission was started *before* the submission deadline, and fin...

The answer is yes. If you don't select submissions yourself, the ones with the best public score are selected. This applies to the ARC competition as well. And given that the private leaderboard is the same as the public leaderboard, selecting the best public score works just fine.

vapid compass Nov 12, 2024, 1:49 PM

#

Ah, no what I meant was that we submitted a code solution 2 hours before the deadline, which finished 10 hours after deadline and would have beaten place 1
But sadly it doesn't count, as the deadline is for the generation of the submission.json, and not for the code submission
(Although I think MindsAI had the same issue and might have beaten us either way, haha)

normal marsh Nov 12, 2024, 11:12 PM

#

I believe not, my understanding was the submission has to finish before the deadline

#

it is a gotcha that isn't obvious

#

so, wow, you missed out on 1st place just because you decided to wait until the last moment before submitting to make minor improvements? Or did you only finish it with a couple hours to spare?

#

I also was working up to the deadline to get my submission ready but didn't meet it. It would have been extremely unfinished though, barely improving an ensemble (hopefully)

#

I think the distribution of prizes sucks

#

previous ARC contests have suffered from sometimes only 1-2 good entrants, having low 3rd-5th prizes doesn't help, and the initial paper prizes were even worse

keen marsh Nov 14, 2024, 3:47 PM

#

the 600K grand prize is still out for grabs though. ARC challenge 2025, prepare yourselves...

vapid compass Nov 14, 2024, 7:05 PM

#

normal marsh so, wow, you missed out on 1st place just because you decided to wait until the ...

Well, first place had exactly the same issue, haha. They would have increased to 58 and we got 56.5.
But no, we could not have submitted earlier. There were several last-minute changes and we submitted the code at 11 PM, but we are happy with second place (this year, lol)

mystic phoenix Dec 29, 2024, 4:00 AM

#

keen marsh the 600K grand prize is still out for grabs though. ARC challenge 2025, prepare ...

🙌