frigid thistle Mar 26, 2025, 7:32 PM

#

Anyone want to collaborate?

fringe skiff Mar 26, 2025, 9:18 PM

#

frigid thistle Anyone want to collaborate?

I'm down

frigid thistle Mar 26, 2025, 9:20 PM

#

fringe skiff I'm down

I'll dm

harsh dagger Mar 27, 2025, 1:25 AM

#

frigid thistle I'll dm

I forget to ask u lol 💀

#

Thought u will find it complicated rather than interesting

frigid thistle Mar 27, 2025, 1:29 AM

#

harsh dagger Thought u will find it complicated rather than interesting

oh

harsh dagger Mar 27, 2025, 2:26 AM

#

I forgot my planned algo it'd been so much time 🤧

short sable Mar 27, 2025, 3:01 AM

#

This is the Powerball of all competitions @harsh dagger 😂 🤔

#

Everything else is like playing Pick 3, or, Pick 4

covert sun Mar 27, 2025, 3:13 PM

#

Can someone tell me what this actually is and how the model selection process may work for this competition?

short sable Mar 27, 2025, 6:36 PM

#

covert sun Can someone tell me what this actually is and how the model selection process ma...

It's about abstraction. You are presented with input girds and must predict their corresponding output grids. In each example set, multiple input-output pairs are presented, in which a consistent rule or set of rules is applied to transform input grids into output grids. So your challenge is, given that (heuristic) example set, to predict the output that pairs with a novel input. You are given 3-5 examples to abstract a rule for a particular (heuristic) set and apply it to an input whose output you don't know. Hopefully that helps.

#

Try going to https://arcprize.org/play and solve the puzzles. Basically, you are building an AI/ML system that can do this
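
To make the description above concrete, here is a minimal sketch of one task and one candidate rule. The grids are made up for illustration, and `mirror_left_right` and `fits_all_examples` are hypothetical helper names, not part of any official starter code. The only thing taken from the real data is the layout: a task has a "train" list of input/output pairs and a "test" list of inputs.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

# Toy task in the ARC JSON layout. The grids themselves are invented.
task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [{"input": [[0, 5], [6, 0]]}],
}


def mirror_left_right(grid: Grid) -> Grid:
    """Candidate rule: reverse every row."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule: Callable[[Grid], Grid], task: Dict[str, list]) -> bool:
    """A rule is only worth applying if it reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Apply the rule to the test inputs only when it explains all the examples.
predictions = (
    [mirror_left_right(item["input"]) for item in task["test"]]
    if fits_all_examples(mirror_left_right, task)
    else []
)
print(predictions)
```

A real system has to search for the rule instead of being handed one, which is the hard part.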

harsh dagger Mar 28, 2025, 12:58 AM

#

short sable It's about Abstracting. You are presented with input girds and must predict thei...

input grids ~~input gir||d||s~~ btw

harsh dagger Mar 28, 2025, 12:59 AM

#

short sable Try going to https://arcprize.org/play and solve the puzzles. Basically, you are...

But the rule for score seems a bit strict: it's not really 2 attempts for a task, since ur score for the task will be the average of the two attempts

#

It should've been max of 2 attempts per task

lunar sky Mar 28, 2025, 3:24 AM

#

harsh dagger But the rule for score seems a bit strict it's not 2 attempts for a task actuall...

it is not average of 2 attempts

#

it is best of 2

#

if either of the answers is correct you get the score for it
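
A small sketch of the difference being discussed, for a single predicted output. The function names are hypothetical and this is not the official scoring code; it only illustrates "best of 2" versus the misread "average of 2".

```python
from typing import List

Grid = List[List[int]]


def best_of_two(truth: Grid, attempt_1: Grid, attempt_2: Grid) -> int:
    """Full credit if either attempt matches the expected grid exactly."""
    return int(attempt_1 == truth or attempt_2 == truth)


def average_of_two(truth: Grid, attempt_1: Grid, attempt_2: Grid) -> float:
    """The stricter reading: each attempt is worth half."""
    return ((attempt_1 == truth) + (attempt_2 == truth)) / 2


truth = [[1, 2], [3, 4]]
wrong = [[1, 2], [3, 0]]  # a single wrong cell makes the whole grid wrong

print(best_of_two(truth, wrong, truth))     # one correct attempt is enough
print(average_of_two(truth, wrong, truth))  # would only give half credit
```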

harsh dagger Mar 28, 2025, 3:24 AM

#

lunar sky it is best of 2

How ??

lunar sky Mar 28, 2025, 3:24 AM

#

harsh dagger Mar 28, 2025, 3:26 AM

#

lunar sky if any one of the answers are correct you get the score for it

Oh right i overlooked it 😅

#

Thnx to clarify

onyx prairie Mar 31, 2025, 10:39 AM

#

frigid thistle Anyone want to collaborate?

Sure!

frigid thistle Mar 31, 2025, 1:35 PM

#

onyx prairie Sure!

ok

tame bane Apr 1, 2025, 9:42 AM

#

Hey pals, I'm Anser. About to graduate from NUST in Pakistan. I've been working for the past few days on this challenge. I am really feeling the lack of some team. Yk more minds, more possibilities. If you are good at communicating, hit the DM if you wanna work with me. Let's try to beat it and publish a paper.

brave estuary Apr 1, 2025, 6:35 PM

#

I'm new in kaggle. Does this competition have any restrictions on model size or gpu memory usage?

jovial lily Apr 3, 2025, 2:13 AM

#

The problem I see with this challenge is that for some harder puzzles there is too little training data to pin down a single possible solution (even for humans). E.g. for puzzle 1ae2feb7 (first one in public evaluation set v2), judging by the training data, another solution would be that the second color on the left (counting from the wall) starts at the far right of the grid and goes to the left

#

So more training data or simple unambiguous puzzles are needed.

dusty imp Apr 3, 2025, 2:35 AM

#

jovial lily The problem I see with this challenge, is that for some harder Puzzles the train...

I agree that some puzzles are ambiguous, but it's also about simplicity of the rules you find

jovial lily Apr 3, 2025, 2:40 AM

#

Ok reasonable

onyx prairie Apr 4, 2025, 2:45 PM

#

I see that you have to mix a lot of strategies and come up with some novel solution to this. If you check the interview with the creators, it is clearly not a challenge where you just take some gigantic deep learning model, throw data in and pray it solves it. There is symbolic reasoning, heuristics and a lot more involved here.

steep rain Apr 4, 2025, 3:03 PM

#

jovial lily The problem I see with this challenge, is that for some harder Puzzles the train...

That is why you are allowed to submit multiple solutions also

dusty imp Apr 4, 2025, 7:39 PM

#

onyx prairie I see that you have to mix a lot of strategies and come with some novel solution...

Competitors don't always use the techniques the creators initially thought about

#

Last year the winning solution was just an LLM finetuned on ARC grids with test time finetuning, augmentation & funny inference

pale crest Apr 6, 2025, 2:53 AM

#

so i was wondering about the evaluation metrics
is the score on the leaderboard the actual score, or would they apply a metric so the number gets transformed in some manner?
like for ex 7.92 becomes like 60 % or what ?

dusty imp Apr 6, 2025, 3:54 AM

#

pale crest so i was wondering about the evaluation metrics the score on leaderboard is the...

the score on the leaderboard should be the percentage, I believe

rain kite Apr 9, 2025, 9:53 PM

#

Do we have architecture of last year solution? I already have access to code which is there on Kaggle.

dusty imp Apr 9, 2025, 10:23 PM

#

rain kite Do we have architecture of last year solution? I already have access to code whi...

What do you mean architecture?

#

you mean a higher level description of how their code works?

rain kite Apr 9, 2025, 11:29 PM

#

dusty imp you mean a higher level description of how their code works?

Yes That's correct.

grim parcel Apr 13, 2025, 12:12 PM

#

Last year, there was a stated limit to the size of the grid. I haven't found one this year. Have I just missed it, or it's really not there?

short sable Apr 13, 2025, 3:27 PM

#

grim parcel Last year, there was a stated limit to the size of the grid. I haven't found one...

I think it is the same this year. Grids are at most 30x30. Of course, some are much smaller, say 3x4.

grim parcel Apr 13, 2025, 8:12 PM

#

Is that specified anywhere?

grim parcel Apr 13, 2025, 10:49 PM

#

grim parcel Is that specified anywhere?

yes, here: https://www.kaggle.com/competitions/arc-prize-2025/data 30x30 is the maximum size for a grid.

crisp dew Apr 16, 2025, 5:42 PM

#

I have an approach: a new "think" technique for a neural network to defeat ARC-AGI-2.
The AI has 3 stages. A regular neural network has a module where it thinks according to its own scheme. Here, stage 1 splits the question or task into chunks, which decides how many "think" steps there will be, and thinks about each chunk separately. Stage 2 combines some chunks and thinks again, using what it has already thought. Stage 3 combines everything it thought about all the chunks.
Take the question "come up with creativity for a neural network". Stage 1 breaks the words into meaningful chunks: for "come up with", it thinks about what it means to come up with something; for "creativity", what creativity means; for "for a neural network", what that means for a neural network. If you can do something, then you need to apply your knowledge in this area. Stage 2 merges the chunks it thinks fit together best in meaning, to advance toward an answer using knowledge from the earlier reflections. Stage 3 uses everything it thought before and tries to find the answer.
So that stage 1 doesn't think for too long, it could interpret words into the meanings it understands best, or stage 1 could be a separate, fast neural network, say with 1 billion parameters, trained to help the main neural network give the correct answer to the question. What do you guys think?

harsh dagger Apr 17, 2025, 2:13 AM

#

-# sorry she's silly

barren horizon Apr 18, 2025, 11:10 AM

#

crisp dew I have an approach - I have come up with a new "think" technique for a neural ne...

is this very different from how multi headed attention works?

#

seems similar conceptually

harsh dagger Apr 18, 2025, 11:41 AM

#

Try ViT encoder with usual text based transformer decoder

unborn moat Apr 27, 2025, 6:31 AM

#

guys is there any way to download the training and evaluation data sets?

#

like arc org has 1520 problems

#

so i was wondering if there is a way to download all of them at once
or does the data in the data section of the competition have all those

short sable Apr 27, 2025, 2:17 PM

#

unborn moat guys is there anyway to download the training and evaluation data sets?

Umm. First you join the competition. Then you can download all the datasets with a simple click on "Data" page. Look, just this time, I'll save you the trouble; here is the dataset (a set of JSON files) for ARC 2024 challenge. Make sure to join though before the deadline for joining, if you want to be entered into this competition. Getting datasets for competitions, in general: this applies for most (all?) competitions, as a rule.

📎 arc-prize-2025.zip

unborn moat Apr 27, 2025, 3:06 PM

#

short sable Umm. First you join the competition. Then you can download all the datasets with...

no like i know that

#

but is that all the data from the website asw

#

or is there an api i have to call or write a script to scrape it

short sable Apr 27, 2025, 3:11 PM

#

unborn moat no like i know that

It is a multitude of JSON files that contain the data you have to work with in developing your system. Except for the last, which gives you a sample submission file (which also needs to be in JSON format). From the site:

Screenshot_2025-04-27_at_11.09.21_AM.png

#

i don't know what you mean about scraping. You have to read and use these files in your code. This is the data you have to work with. Nothing needs to be scraped off web from any website

unborn moat Apr 27, 2025, 3:14 PM

#

short sable i don't know what you mean about scraping. You have to read and use these files ...

kk thanks

crisp dew Apr 29, 2025, 2:02 PM

#

unborn moat so i was wondering if there is a way to download all of them at once or does the...

GitHub has some data
https://github.com/arcprize/ARC-AGI-2

shy orbit May 16, 2025, 10:56 PM

#

Idk if this is the right place to post, but I’m looking for teammates to work on this challenge.

marsh night May 17, 2025, 10:31 AM

#

shy orbit Idk if this is the right place to post, but I’m looking for teammates to work on...

I'm interested

wicked summit May 18, 2025, 3:03 PM

#

shy orbit Idk if this is the right place to post, but I’m looking for teammates to work on...

I would like to join your team if you are still looking

#

Hi everyone,
I'm currently looking to join a team for the ARC Prize competition. I bring strong skills in Python, deep learning, and reasoning-based AI challenges, and I’m eager to actively contribute to both the modeling and experimentation aspects of the project.

I’ve been following ARC-style problems for a while and am particularly excited about the unique reasoning challenges this competition presents. I'm also comfortable with collaboration tools (Git, Notebooks, etc.) and can commit time regularly throughout the competition.

If you're looking for a motivated teammate who’ll pull their weight, I’d love to connect.
You can check out my Kaggle profile here: https://www.kaggle.com/sathishkumartheta

Feel free to DM or reply here!

pseudo hawk May 21, 2025, 5:05 PM

#

Hi, I would like to team up with anyone to work in the arc prize 2025.

I have worked in Finetuning LLM, building agents, Rag systems, Core ML, and RL. Would like to team up with anyone to work in collaboration

pseudo hawk May 21, 2025, 5:06 PM

#

wicked summit Hi everyone, I'm currently looking to join a team for the ARC Prize competition....

Would love to team up if you are still up for it

fallen marsh May 27, 2025, 8:58 PM

#

😉

sly meteor May 31, 2025, 12:29 PM

#

Hi I'm trying to use Qwen3 with vLLM, can anyone share a starter notebook for this?

dusty imp May 31, 2025, 4:02 PM

#

sly meteor Hi I'm trying to use Qwen3 with vLLM, can anyone share a starter notebook for th...

I recommend just taking a starter notebook that uses vLLM and swapping in Qwen3

cinder kite Jun 7, 2025, 7:35 PM

#

Hi all, has anyone tried out the LPN-based approach (search latent program spaces : https://arxiv.org/pdf/2411.08706) on ARC AGI 2 yet? I am looking to make some improvements to this.

short sable Jun 13, 2025, 11:06 AM

#

Warning: Task 'bbb1b8b6' has 7 clues. Expected 2 to 5.

short sable Jun 15, 2025, 2:05 AM

#

What's the deal with the possibility of there being more than 1 solution for some challenges? Does anyone understand this? Do we have to present two different solutions? Btw, I found that the training data sometimes has multiple solutions, too.

digital forge Jun 15, 2025, 9:47 AM

#

short sable What's the deal with the possibility o there being more than 1 solution for some...

I think you're misunderstanding the task data format.

#

Each task has a demonstration set (train set) and an inference set (test set). Each set contains N pairs, where each pair contains both an input and an output

#

the goal is to predict ALL the Inference outputs given the Demonstration Inputs, Demonstration Outputs, and Inference Inputs

#

not even one pixel can be incorrect and the shape cannot also be wrong

#

you also get two attempts per output predicted

#

so if one of the attempts is correct, it's fully correct

short sable Jun 18, 2025, 8:18 AM

#

JUST A FRIENDLY HEADS UP as related to the Challenge: Some of the Tasks have up to 10 clue/heuristic pairs (Training input and output pairs). Also, some Tasks have multiple Test inputs and Solutions. In these cases, there will always be as many Solutions to figure out, as there will be Test inputs to be processed, using the same heuristic pair sets. And when submitting, for those Tasks, they will need to be submitted, in a list, in order (test[0] -> solution[0]), for that specific Task. That said, GOOD LUCK!
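
A rough sketch of the ordering rule described above: one submission entry per test input, in the same order as the test inputs. `build_submission` and `copy_input_solver` are hypothetical names, and the solver is a placeholder that just echoes the input. The entry layout with `attempt_1` and `attempt_2` follows the sample submission format quoted elsewhere in this channel.

```python
import json
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Solver = Callable[[list, Grid], Tuple[Grid, Grid]]


def build_submission(challenges: Dict[str, dict], solve: Solver) -> Dict[str, list]:
    """Create one {"attempt_1", "attempt_2"} entry per test input, in order."""
    submission: Dict[str, list] = {}
    for task_id, task in challenges.items():
        entries = []
        for test_item in task["test"]:  # entry i answers test input i
            first, second = solve(task["train"], test_item["input"])
            entries.append({"attempt_1": first, "attempt_2": second})
        submission[task_id] = entries
    return submission


def copy_input_solver(train_pairs: list, test_input: Grid) -> Tuple[Grid, Grid]:
    """Placeholder solver (an assumption for the demo): both attempts echo the input."""
    return test_input, test_input


# Made-up task id and grids, with two test inputs sharing the same examples.
challenges = {
    "aaaa0000": {
        "train": [{"input": [[1]], "output": [[1]]}],
        "test": [{"input": [[2, 2]]}, {"input": [[3], [3]]}],
    }
}

submission = build_submission(challenges, copy_input_solver)
print(json.dumps(submission))
```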

copper pagoda Jun 27, 2025, 4:24 PM

#

Hey everyone. I understand the data format for this challenge, but I have some questions :

If you are training your own model, could you share how you are dealing with padding & grid_size / class imbalance / loss ?

(I'm not necessarily asking about your best approach of course, even just ideas from your baseline would be ok.)

digital forge Jun 29, 2025, 2:59 AM

#

copper pagoda Hey everyone. I understand the data format for this challenge, but I have some q...

There are mainly 3 ways of predicting the shape:

1. Upsampling/Interpolation
2. Dedicated prediction heads (usually CrossEntropy over the 900 possible shapes)
3. Boundary detection using Softmax on logits, keeping positions whose probability exceeds a certain threshold

For padding there are 2 obvious methods:

1. Shift all colors by 1 and use 0 as the padding value, 0-padding the tensors to collate them
2. Same as above, but use negative numbers for the padding, for safety

Class imbalance can be addressed with Focal Loss
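
For the focal loss suggestion, here is a minimal per-cell sketch in plain Python, assuming the usual definition FL = -(1 - p_t)^gamma * log(p_t), where p_t is the probability the model gives to the true color. The probabilities and `gamma=2.0` are illustrative assumptions; a real training loop would use a tensor library instead.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Plain cross-entropy for one cell: -log of the true class probability."""
    return -math.log(probs[target])


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Cross-entropy scaled by (1 - p_t)^gamma, so easy cells contribute less."""
    p_t = probs[target]
    return ((1.0 - p_t) ** gamma) * cross_entropy(probs, target)


# An "easy" background cell the model already gets right...
easy = [0.9, 0.05, 0.05]
# ...and a "hard" cell of a rare color the model is unsure about.
hard = [0.6, 0.3, 0.1]

print(cross_entropy(easy, 0), focal_loss(easy, 0))
print(cross_entropy(hard, 1), focal_loss(hard, 1))
```

The easy cell's loss is scaled down much more than the hard cell's, which is why this helps when most cells are background.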

short sable Jul 1, 2025, 7:14 PM

#

copper pagoda Hey everyone. I understand the data format for this challenge, but I have some q...

If you want to pad, or more precisely, STANDARDIZE all grids: for every grid, create a 30x30 grid filled with a padding value (10, for instance). Then overwrite that grid's squares with your data grid's values, row by row and column by column, starting from the top left corner. That way all your grids are uniform. Relatively simple in code. Hope that makes sense. Proceed from there.
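
The standardization described above could look roughly like this. `pad_grid` and `unpad_grid` are hypothetical names; the size 30 and padding value 10 come from the message, and 10 works as a marker only because real cells are limited to 0-9.

```python
from typing import List

Grid = List[List[int]]


def pad_grid(grid: Grid, size: int = 30, pad_value: int = 10) -> Grid:
    """Copy `grid` into the top-left corner of a size x size grid of pad_value."""
    padded = [[pad_value] * size for _ in range(size)]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            padded[r][c] = value
    return padded


def unpad_grid(padded: Grid, pad_value: int = 10) -> Grid:
    """Undo pad_grid by dropping padding cells, then rows that became empty."""
    rows = [[v for v in row if v != pad_value] for row in padded]
    return [row for row in rows if row]


grid = [[1, 2, 3], [4, 5, 6]]
padded = pad_grid(grid)
print(len(padded), len(padded[0]))  # every grid now has the same shape
print(unpad_grid(padded) == grid)   # the original grid is recoverable
```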

sinful egret Jul 16, 2025, 7:55 PM

#

Is this year's challenge harder than the 2024 one?

harsh dagger Jul 17, 2025, 2:56 AM

#

sinful egret Is this year's challenge harder than the 2024 one?

probably

brittle shadow Jul 17, 2025, 4:45 AM

#

Is it at all possible to work on this while not being very proficient in python? What’s the best tool to translate English to python? Or is AI not there yet..

harsh dagger Jul 17, 2025, 6:30 AM

#

brittle shadow Is it at all possible to work on this while not being very proficient in python?...

python is the bare minimum not english

fresh zealot Jul 18, 2025, 8:29 PM

#

Im having issues with fragmented stop word tokenization in transformers library. Using a qwen2.5 model to write code and i want to stop on "```\n" after a python block, so i can inject the code result before continuing generation.

but i cant get it to work. anyone faced the same issue, and resolved it? Thanks.
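
One common way around fragmented stop tokens is to check the decoded text instead of token ids, since the stop string may be split across tokens or merged into a longer one. The sketch below is library-free: `decode` stands in for whatever your tokenizer's decode function is, and the toy vocabulary is invented to show a stop string split across two tokens. It is an assumption about the cause, not a confirmed diagnosis of the issue above.

```python
from typing import Callable, Iterable, List

STOP = "`" * 3 + "\n"  # three backticks followed by a newline


def generate_until(
    token_ids: Iterable[int],
    decode: Callable[[List[int]], str],
    stop: str,
) -> str:
    """Consume tokens until the decoded text contains `stop`, then cut right after it."""
    seen: List[int] = []
    for token_id in token_ids:
        seen.append(token_id)
        text = decode(seen)  # re-decoding every step is simple but quadratic
        position = text.find(stop)
        if position != -1:
            return text[: position + len(stop)]
    return decode(seen)


# Toy vocabulary: the stop string straddles tokens 1 and 2.
TOY_VOCAB = {0: "print(1)", 1: "\n``", 2: "`\nres", 3: "ult"}


def toy_decode(ids: List[int]) -> str:
    return "".join(TOY_VOCAB[i] for i in ids)


code_block = generate_until([0, 1, 2, 3], toy_decode, STOP)
print(repr(code_block))
```

After the cut you can run the code, append its result to the prompt, and resume generation from there.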

elder haven Jul 26, 2025, 2:47 PM

#

harsh dagger python is the bare minimum not english

I build just by collaborating with an LLM: I let the LLM produce the code and I just kinda coordinate that. You need to understand ML fundamentals, but you could essentially let AI teach you about AI haha. Edit: not a replacement for coding yourself, but if the learning curve is too steep, just have fun! Whatever works! you_got_it_dude

burnt token Jul 27, 2025, 7:01 PM

#

brittle shadow Is it at all possible to work on this while not being very proficient in python?...

Yes

brittle shadow Jul 27, 2025, 7:01 PM

#

burnt token Yes

How?

burnt token Jul 27, 2025, 7:01 PM

#

brittle shadow How?

Google it

brittle shadow Jul 27, 2025, 7:02 PM

#

burnt token Google it

I did already. Is there a specific recommendation

burnt token Jul 27, 2025, 7:03 PM

#

brittle shadow I did already. Is there a specific recommendation

Get a really smart partner

brittle shadow Jul 27, 2025, 7:05 PM

#

burnt token Get a really smart partner

You mean get me a partner that knows Python. Yes, I considered that when I asked. Since I’ve learned absolutely no new information from this back and forth I’ll explore other avenues.

burnt token Jul 27, 2025, 7:05 PM

#

brittle shadow You mean get me a partner that knows Python. Yes, I considered that when I asked...

That was partially a joke

#

Idk

astral vessel Jul 30, 2025, 5:55 PM

#

Can anyone tell what's arc prize ? And what's the problem

celest sparrow Jul 30, 2025, 11:59 PM

#

astral vessel Can anyone tell what's arc prize ? And what's the problem

The ARC Prize is a non-profit dedicated to accelerating the development of Artificial General Intelligence (AGI). We believe that true AGI requires more than just scaling up existing AI models.
please see more detail below.
https://arcprize.org/

astral vessel Aug 1, 2025, 1:22 PM

#

celest sparrow The ARC Prize is a non-profit dedicated to accelerating the development of Artif...

Brother, tell me more about which type of problem they ask to be solved?

sacred vapor Aug 1, 2025, 2:12 PM

#

Basically pattern recognition but they give too little input data so current transformer models can't solve it

#

U should give the input data a peek u will get a better idea

subtle cloud Aug 5, 2025, 7:33 PM

#

Hey everyone!

Long before the launch of Kaggle’s Game Arena, I developed a real-time AI chess battle platform where you can watch different AI models play against each other.

🔗 Try it here: https://adaptive-ai-chess-game.streamlit.app/

🎮 Modes available:

Human vs Adaptive AI

Adaptive AI vs Hyperbolic AI

It’s a fun and insightful way to observe how different AI strategies behave on the board — ideal for anyone into AI, reinforcement learning, or game theory.

📩 If you're interested in collaborating, expanding, or integrating it into more formal benchmarks, feel free to connect:

👤 Karthikeya Guduru
📧 karthikeyaguduru19@gmail.com
🔗 https://www.linkedin.com/in/karthikeya-guduru-70227b262/

tall bramble Aug 9, 2025, 1:52 PM

#

celest sparrow The ARC Prize is a non-profit dedicated to accelerating the development of Artif...

deadline for ARCPRIZE is different here than the one on kaggle?
does anyone know why?

celest sparrow Aug 9, 2025, 2:03 PM

#

tall bramble deadline for ARCPRIZE is diffrent here than the one on kaggle? does anyone know ...

I’ve noticed the deadline for ARCPRIZE here doesn’t match the one on Kaggle either.
It might be due to differences in time zones, updates on one platform that haven't been reflected on the other yet, or simply an error in listing.

tall bramble Aug 10, 2025, 6:27 PM

#

celest sparrow I’ve noticed the deadline for ARCPRIZE here doesn’t match the one on Kaggle eith...

so which one should we go with?

celest sparrow Aug 10, 2025, 9:00 PM

#

tall bramble so which one should we go with?

I’d recommend going with the deadline listed on Kaggle, since that’s the official competition host and the one submissions will be judged by. The ARCPRIZE site might just be showing it in a different time zone or hasn’t updated yet. You can confirm by checking the “Timeline” section on the Kaggle competition page or asking the organizers directly in the Kaggle Q&A forum.

tall bramble Aug 10, 2025, 9:03 PM

#

celest sparrow I’d recommend going with the deadline listed on Kaggle, since that’s the officia...

Thanks, are you also participating?

celest sparrow Aug 10, 2025, 9:07 PM

#

tall bramble Thanks, are you also participating?

not now .

median maple Aug 12, 2025, 1:28 PM

#

Does anybody know the transformation (logic) for task 09629e4f ? - Cannot solve it for the life of me...

ARC Prize

ARC Prize - Play the Game

Easy for humans, hard for AI. Try ARC-AGI.

narrow crow Aug 13, 2025, 11:50 AM

#

pick the cell with 4 colored squares, then enlarge it?

#

that seems to be it, I was able to solve it

harsh dagger Aug 13, 2025, 4:26 PM

#

narrow crow that seems to be it, I was able to solve it

for arc agi 3 rules seems to be changed very much

narrow crow Aug 14, 2025, 8:24 AM

#

be careful this is a scam message

uneven sapphire Aug 16, 2025, 7:07 AM

#

can we write function p as lambda function, p=lambda x:

#

or should it be def p(x)

#

submitting lambda worked in the kaggle, just wanted to clarify here am i not missing some rules

median maple Aug 26, 2025, 2:48 PM

#

⭐ For those who want to solve this challenge with cellular automata

I've made a puzzle game: https://caped.ferenczi.eu/

Enjoy!

livid furnace Sep 28, 2025, 3:53 AM

#

Hi, is there any chance that you can provide an additional Python environment with a higher Python version >= 3.12? The current version is 3.11, which is not compatible with some new models in the HF Transformers libs.

median maple Oct 4, 2025, 9:23 PM

#

That was one of my pain points as well. I just reproduced the same 3.11 environment and used that, but it is far from convenient.

fresh zealot Oct 13, 2025, 5:46 PM

#

that would break a lot of old code, but i agree its time to upgrade to 3.12 or later. The python kernels should be possible to set like in any other jupyter notebook

#

but its like kaggle is no longer under development, so many old bugs from 10 years ago still looming around

harsh dagger Oct 17, 2025, 3:58 PM

#

Anyone trying TRM?

south plover Oct 19, 2025, 5:53 AM

#

Is this true ??

fringe beacon Oct 20, 2025, 1:33 PM

#

harsh dagger Anyone trying TRM?

I have successfully replicated trm and am now at war with the kaggle ui to submit

harsh dagger Oct 21, 2025, 11:13 AM

#

fringe beacon I have successfully replicated trm and am now at war with the kaggle ui to submi...

were u able to solve first few arc agi puzzles

fringe beacon Oct 21, 2025, 11:18 AM

#

Like, me personally? or the model?

dusty imp Oct 22, 2025, 5:32 AM

#

fringe beacon I have successfully replicated trm and am now at war with the kaggle ui to submi...

did you do the whole $500 training run or a smaller version?

harsh dagger Oct 22, 2025, 12:00 PM

#

fringe beacon Like, me personally? or the model?

i meant the preview puzzles

harsh dagger Oct 23, 2025, 6:59 AM

#

this comp we cant use arc agi agents there?

digital forge Oct 23, 2025, 7:18 AM

#

harsh dagger this comp we cant use arc agi agents there?

no, it's ARC-AGI 2 competition

harsh dagger Oct 23, 2025, 7:21 AM

#

digital forge no, it's ARC-AGI 2 competition

the kaggle one ?

digital forge Oct 23, 2025, 7:21 AM

#

yes

harsh dagger Oct 23, 2025, 7:21 AM

#

oh right thnx n its still going

digital forge Oct 23, 2025, 7:21 AM

#

13 or so days left

agile geyser Oct 24, 2025, 4:13 PM

#

Hello - I have a question regarding the Kaggle challenge. Our team has developed a system and evaluated it using the arc-agi_evaluation-challenges.json and evaluated that we conform to the submission format by validating it against the arc-agi_evaluation-solutions.json.
More specifically we test this using the pass@2 metric:

# Calculating the pass@2 metric.
# `solutions` and `submissions` are the dicts loaded from the solutions
# file and from our submission.json, keyed by task id.
import numpy as np

total = len(submissions)
correct = 0

for challenge_name in solutions:
    test_solutions = [
        np.array(test_solution)
        for test_solution in solutions[challenge_name]
    ]

    submission_solutions = [{
        "attempt_1": np.array(test_submission["attempt_1"]),
        "attempt_2": np.array(test_submission["attempt_2"]),
    } for test_submission in submissions[challenge_name]]

    # A task with the wrong number of outputs can never count as solved.
    if len(test_solutions) != len(submission_solutions):
        continue

    # An output is correct if either of its two attempts matches exactly.
    correct_tests = sum(
        np.array_equal(test_solution, submission_solution["attempt_1"])
        or np.array_equal(test_solution, submission_solution["attempt_2"])
        for test_solution, submission_solution in zip(test_solutions, submission_solutions)
    )

    # The task counts only if every one of its test outputs is correct.
    if correct_tests == len(test_solutions):
        correct += 1

print(f"pass@2: {correct}/{total}")

We know when our model is "correct" (we can verify that it creates correct solutions) and when we submitted to Kaggle we indeed solved some of the puzzles. We investigated the solutions and are confident they are correct. However, the Kaggle leaderboard has rated us at 0.0.
From this we conclude that something must be wrong in our submission format.
Is there any more information on how the submission.json is used?

#

Specifically, for instance, what is the order of numpy lists (row from 0 to n). Can we assume numpy.tolist() produces a correct output?

agile geyser Oct 24, 2025, 4:37 PM

#

Maybe for starters and clarification, is the challenge actually read from:
/kaggle/input/arc-prize-2025/arc-agi_test_challenges.json
Sorry if this question is a bit trivial 😄 but I'm really confused

#

Uff - is it possible that if you only solve tasks in the 50% that Kaggle doesn't choose for the leaderboard, you could get 0%, even though you actually solve tasks?

golden vortex Oct 24, 2025, 5:42 PM

#

/claim 316

digital forge Oct 24, 2025, 6:13 PM

#

agile geyser Specifically, for instance, what is the order of numpy lists (row from 0 to n). ...

no it has to be a grid, not a flat list of ints

agile geyser Oct 24, 2025, 6:14 PM

#

numpy.tolist() produces a nested list of lists and we also load it via np.array(input_mat), e.g:

[[2, 0, 2], [0, 0, 0], [0, 0, 0]]

#

This is a 3x3 matrix from row 0 (TOP) to bottom, correct?

digital forge Oct 24, 2025, 6:14 PM

#

agile geyser Maybe for starters and clarification, is the challenge actually read from: `/kag...

yes that file is swapped during private evaluation

digital forge Oct 24, 2025, 6:15 PM

#

agile geyser `numpy.tolist()` produces a nested list of lists and we also load it via `np.arr...

ah in that case yes that's fine

#

so what exactly is the issue? most likely the solution is just wrong; even 1 wrong pixel makes it wrong

#

what is your evaluation perfect task accuracy?

agile geyser Oct 24, 2025, 6:17 PM

#

16/120 ~ 13%

#

The test challenge evaluates 240 tasks - if I understand correctly 120 of those are selected for the public leaderboard. Is it that we could be extremely unlucky and only solve "private" puzzles?

#

Are the puzzles used for the public leaderboard randomised? If so, I guess submitting again tomorrow should show if that is the case

digital forge Oct 24, 2025, 6:19 PM

#

so 16/120 means it's not a very good chance of score unfortunately..but it's pretty good ngl on eval if you're getting 16/120

agile geyser Oct 24, 2025, 6:20 PM

#

well, good or not, we want to figure out why the leaderboard shows 0 😄

digital forge Oct 24, 2025, 6:22 PM

#

i suggest you keep improving to 30 or above, and if it still shows no score then it's an issue

#

16 can be explained by variance

agile geyser Oct 24, 2025, 6:22 PM

#

variance in which way?

digital forge Oct 24, 2025, 6:23 PM

#

the variance of the distribution of the eval sets, according to the hosts both sets have the same distribution of difficulty

#

"roughly"

#

so if you get good score on eval set, you "should" get similar score on leaderboard

agile geyser Oct 24, 2025, 6:25 PM

#

I'm a bit confused. Are the outputs generated from notebooks submitted to the challenge not the ones with the actual test input? Because we know when a solution is correct

#

That's the whole point of the model we're writing, explainable AI 😄

#

The score might be low but we're fairly certain that the tasks that are logged as "solved" in submitted notebooks are in fact solved

digital forge Oct 24, 2025, 6:27 PM

#

yeah they all have test input

#

The score might be low but we're fairly certain that the tasks that are logged as "solved" in submitted notebooks are in fact solved
This is where i'm confused, what do you mean solved? where do you see this on kaggle?

#

the train and eval sets have "solutions"

#

but are you talking about your specific submission run? or just the kaggle arc data itself?

agile geyser Oct 24, 2025, 6:29 PM

#

I would guess my specific run

#

In the output section of your notebook, you receive a log. We have a per-task log. Our model informs us in this log if a task can be solved, as in, if a consistent algorithm can be found that explains everything (all test examples and train examples)

#

We solve a certain amount of tasks in the submission - and receive a 0 on the leaderboard

#

So we're a bit puzzled as to what exactly the issue is

#

It could of course be that this log is not correct, that it was generated on a public dataset instead of the private one, for instance

digital forge Oct 24, 2025, 6:31 PM

#

ahh that's misleading because the notebook is run twice: once on dummy data, which is just the training data copied into the test data,

and then a second time for the private run, where the test data is swapped with the private hidden test data, which is not the copied training data but actual evaluation-like data

#

you can see the logs for the first run, but not the second

agile geyser Oct 24, 2025, 6:32 PM

#

AHA

digital forge Oct 24, 2025, 6:32 PM

#

the second run is also not accessible via internet to prevent leaks via probing 🙂

agile geyser Oct 24, 2025, 6:33 PM

#

That makes a lot more sense - so indeed the logs we were reading are not the actual submission logs, and debugging them in any way doesn't make sense?

#

In this case I would assume our algorithms haven't found solutions - good to know as well

digital forge Oct 24, 2025, 6:34 PM

#

yups

agile geyser Oct 24, 2025, 6:37 PM

#

Thanks for your clarifications @digital forge

digital forge Oct 24, 2025, 6:37 PM

#

np 🙂

harsh dagger Oct 25, 2025, 12:22 PM

#

anyone tried using TRM?

#

https://www.kaggle.com/code/seconds0/trm-arc-agi-2-inference-py311-offline/notebook why this one got 0 score

fringe beacon Oct 28, 2025, 4:46 PM

#

Because i am fucking up the eval pipeline somehow

#

and trying to diagnose

somber elm Oct 31, 2025, 3:08 AM

#

ARC Prize 2025: Submission Scoring Error Help Needed

Problem: Getting "Submission Scoring Error" despite passing all local validation checks.

Verified Requirements (from Kaggle discussion):
✅ Each test input → test output: 240 tasks, 259 entries (223×1 + 15×2 + 2×3)
✅ Two attempts per output: All entries have attempt_1 and attempt_2
✅ Output length = Input length: Matches exactly
✅ Grid format: Rectangular, integers 0-9, matching shapes

My Format:

{"00000000": [{"attempt_1": [[0,0,0],[0,0,0],[0,0,0]], "attempt_2": [[0,0,0],[0,0,0],[0,0,0]]}]}

Matches sample_submission.json structure exactly (except values).

What I've Tried:

Validated against sample format ✅
Verified 240 tasks, 259 entries match test inputs ✅
Compact JSON: separators=(',', ':') ✅
Saved to /kaggle/working/submission.json ✅
All grids: rectangular, 0-9 integers, matching shapes ✅

Notebook Setup:
Script loads from /kaggle/input/arc-submissions-2025/submission.json, validates, formats (compact JSON, sorted keys), saves to /kaggle/working/submission.json. Includes auto-fixing for shape mismatches, invalid values, size limits.

Questions:

Subtle format differences causing rejection?
File saving issues (encoding/line endings)?
Wrong path/filename?
Hidden characters/whitespace problems?
JSON format requirements (compact vs pretty)?
Multiple test inputs handling issue?

Context: ARC Prize 2025, Notebook v7, local validation passes. Model achieved 33.98% on EVALUATION set (validation, 120 tasks). Submission uses TEST set (240 tasks) - actual test accuracy unknown until Kaggle scores it.

Any guidance is appreciated, as I have been trying for almost a week to get this submitted.
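
The checks listed in the post above could be sketched roughly as follows. This is an illustration, not the poster's script: the function names are hypothetical, the test challenges are assumed to map each task ID to an object with a `"test"` list, and the 30-cell side limit is the usual ARC grid bound. Passing these checks locally only shows the file matches the visible test file; it says nothing about the hidden rerun described earlier in the thread.

```python
def is_valid_grid(grid, max_side=30):
    """Check for a non-empty rectangular list of lists of integers 0-9."""
    if not isinstance(grid, list) or not grid or len(grid) > max_side:
        return False
    if not all(isinstance(row, list) for row in grid):
        return False
    width = len(grid[0])
    if width == 0 or width > max_side:
        return False
    for row in grid:
        if len(row) != width:
            return False
        for cell in row:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(cell, bool) or not isinstance(cell, int) or not 0 <= cell <= 9:
                return False
    return True


def validate_submission(submission, test_challenges):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for task_id in sorted(set(test_challenges) - set(submission)):
        problems.append(f"{task_id}: missing from submission")
    for task_id in sorted(set(submission) - set(test_challenges)):
        problems.append(f"{task_id}: not in test challenges")
    for task_id in sorted(set(submission) & set(test_challenges)):
        entries = submission[task_id]
        expected = len(test_challenges[task_id]["test"])
        # One entry per test input, in the same order.
        if not isinstance(entries, list) or len(entries) != expected:
            problems.append(f"{task_id}: expected {expected} entries")
            continue
        for i, entry in enumerate(entries):
            for key in ("attempt_1", "attempt_2"):
                if not isinstance(entry, dict) or not is_valid_grid(entry.get(key)):
                    problems.append(f"{task_id}[{i}].{key}: missing or invalid grid")
    return problems
```

Usage would be along the lines of `validate_submission(json.load(open("submission.json")), test_challenges)`, with the challenges loaded from whatever test file the notebook sees at run time.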

lavish river Oct 31, 2025, 8:24 PM

#

"Script loads from /kaggle/input/arc-submissions-2025/submission.json, validates, formats (compact JSON, sorted keys), saves to /kaggle/working/submission.json. "

Don't do that, just rewrite attempt_1 and attempt_2 with [[0, 0], [0, 0]] for each task/test (or scoring will fail)
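
One possible reading of this advice, combined with the two-run explanation earlier in the thread: a submission file prepared in advance is keyed to the visible tasks, so it may not line up with the tasks present in the hidden rerun. A sketch of building the file at run time from whatever test challenges the notebook actually sees, falling back to the `[[0, 0], [0, 0]]` placeholder, might look like this. `build_submission`, `write_submission`, and the `predict` callback are hypothetical names; `predict` is assumed to return a list of up to two grids.

```python
import json


def placeholder():
    """Fresh copy of the fallback grid, so entries never share one list."""
    return [[0, 0], [0, 0]]


def build_submission(test_challenges, predict=None):
    """Build one entry per test input for every task found at run time."""
    submission = {}
    for task_id, task in test_challenges.items():
        entries = []
        for test_case in task["test"]:
            attempts = None
            if predict is not None:
                try:
                    attempts = predict(task["train"], test_case["input"])
                except Exception:
                    # A solver crash should not leave a hole in the file.
                    attempts = None
            if not attempts:
                attempts = [placeholder(), placeholder()]
            first = attempts[0]
            # Reuse the first attempt when the solver returned only one grid.
            second = attempts[1] if len(attempts) > 1 else attempts[0]
            entries.append({"attempt_1": first, "attempt_2": second})
        submission[task_id] = entries
    return submission


def write_submission(submission, path="/kaggle/working/submission.json"):
    """Write the submission as JSON to the given path."""
    with open(path, "w") as f:
        json.dump(submission, f)
```

Because the task IDs and the number of test inputs are read from the challenges file on every run, the output has the right keys and entry counts in both the visible and the hidden run, whatever the solver manages to predict.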

somber elm Nov 3, 2025, 3:41 AM

#

Hey, would we be able to do one last submission tomorrow? When making the submission, I was able to get the scoring to work, but it was replacing my predictions: they were being reformatted into dimensions similar to the inputs. I think I have a working submission file now, but I have to wait until tomorrow to resubmit. I was able to RL-train my model and reached 35% accuracy, and when reloading it for evaluation I got around 30%-32%. I was hoping to be able to submit one last one. If not, oh well.

agile geyser Nov 3, 2025, 8:33 PM

#

Is using [[0, 0], [0, 0]] really a requirement? ... uff

glass pier Nov 7, 2025, 8:45 AM

#

when can i see the result at the private LB?

glossy pond Nov 7, 2025, 4:38 PM

#

glass pier when can i see the result at the private LB?

Dec 5.
https://www.kaggle.com/competitions/arc-prize-2025/discussion/612886

minor oriole Nov 8, 2025, 1:34 PM

#

Hi

obtuse pumice Nov 9, 2025, 12:42 PM

#

https://media.discordapp.net/attachments/1436719817624256534/1436719913518633010/1.JPG?ex=6910a130&is=690f4fb0&hm=6a48397700e40b701b7defba0bc73ccc590e83e58af09eb7035cae318e9fb319&=&format=webp&width=515&height=687
https://media.discordapp.net/attachments/1436719817624256534/1436719914034659408/2.jpg?ex=6910a130&is=690f4fb0&hm=5d3c01e3db0b2fe7135969c69c22cbf49db07bae5ed8cb9a98ac3e18d3c73ce5&=&format=webp&width=515&height=687
https://media.discordapp.net/attachments/1436719817624256534/1436719914512547951/3.jpg?ex=6910a130&is=690f4fb0&hm=59a326eaa4d74733a406431b5c2eb8ee07f6b78d95094102deb1153d2e261407&=&format=webp&width=515&height=687

raven mural Nov 10, 2025, 7:48 PM

#

Hi Kaggle team,

I wanted to report that I was banned from the ARC Prize Discord after sharing part of my research idea.
I was accused of using AI-generated text, which isn’t true. I write in Portuguese and translate my thoughts into English myself.
I believe this may have been a misunderstanding, but I’d like to clarify that my contributions were original and based on my own research. (Books from 2011, published in 2019)

I’m continuing to develop my work independently, and I hope Kaggle remains a space that values openness to new approaches and fair discussion.

#

And I'm not frustrated. I'm simply questioning something deeper:
how can a group aiming to build general AI be so resistant to the very type of reasoning that could lead to it?

It seems paradoxical that a community dedicated to the advancement of intelligence rejects approaches that challenge its own premises.
Openness to epistemic diversity should be part of the path towards true general artificial intelligence.

molten sequoia Nov 12, 2025, 1:15 PM

#

raven mural Hi Kaggle team, I wanted to report that I was banned from the ARC Prize Discord...

Which wallet do you have

amber heron Nov 15, 2025, 11:03 AM

#

https://www.linkedin.com/posts/abhishek-kumar-3729582b2_ai-aiagents-googlecloud-ugcPost-7395371139272724480-Bjjf?utm_source=share&utm_medium=member_desktop&rcm=ACoAAEs_WQYB5okznOS3-3-yFoKYQqJcB0HYSU4

dusty imp Nov 15, 2025, 11:22 AM

#

raven mural And I'm not frustrated. I'm simply questioning something deeper: how can a group...

If you make a proper submission on kaggle and get a score, that will always be respected

#arc-prize-2025
