#arc-prize-2026-arc-agi-3
I want to team up for this competition
how come arc-agi-2 doesn't get a channel or whatever it's called?
I haven't seen a massive number of messages in any of these channels about the competitions, so you can write your messages here. It probably doesn't matter if it's ARC 2 or ARC 3.
Ok. Strange, I thought this ARC stuff was supposed to be a big deal. I guess people are trying to figure out the rules? I tried the interactive app and I couldn’t get past level 2.
anyone want to team up? I understand how most of the public games work, but need help getting the notebook to work
This is an intense challenge. Sensing and exploration seem to be the two high-level modules to develop first in order to begin tackling the problem.
The main insight I've had so far is that GPT-5.4 is actually quite dumb. The games really reveal the idiot in it. It fumbles so hard on the most obvious games.
Looking for teammates for any or all tracks. The final decision will be made after discussion. Please contact me only if you're interested.
Anyone noticing that the current Kaggle frontrunners are copies of chronos (is that allowed?), or are just using BFS, MCTS, or A* search?
sure it is allowed
Copying the top public notebook is standard practice
and making slight adaptations to overfit the leaderboard
@gentle fern interesting, but if that version wins, and there is a monetary prize, the original creator of the solution gets nothing?
It will not win, but yes, if you published something then it's fair game to submit it and you have no claim to it anymore. That's why people generally don't put anything competitive out
Yes.
A while back someone placed in top 5 for a competition with an unmodified version of a public notebook (though they themselves declined prize money iirc)
Wow, that kind of disincentivizes people from participating then. Maybe if they updated their terms to cover such cases, people would be more open to open-sourcing their solutions.
Has anyone tried vibe coding a solution? If so, what were your thoughts on how it scored?
@earnest ingot yes, I will be mostly pseudo vibe coding, will let you know in a few weeks. First iteration got 0% (but it was just qwen out of the box).
Looking for teammates for ARC Prize, any or all tracks. The final decision will be made after discussion. Please contact me only if you're interested.
Currently 8th on the leaderboard with a score of 0.42, without any pretrained weights. Anyone have any tips for pretraining a CNN?
I just enrolled in this competition. What do I have to do?
Make a model that can solve the games.
ok
Has anyone found any game mechanics that are partially observable or stochastic? From what I've seen, the public games seem to be fully observable and deterministic.
Ah, g50t is partially observable
If you find a stochastic one, please post
Yes. Heat fields df01 is partially observable in the sense that to win, your agent must arrive at the goal within the correct temperature range, where temperature is a hidden field.
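One way to test the "deterministic" half of that claim is to replay the same action sequence several times and compare the observations. The following is a minimal sketch, assuming a hypothetical environment object with `reset()` and `step(action)` methods that each return an observation. `ToyEnv`, `rollout`, and `looks_deterministic` are illustrative names, not part of the ARC-AGI-3 API. Note that this can only flag stochastic behaviour; it says nothing about hidden state such as the temperature field mentioned above.

```python
import random
from typing import Callable, List, Optional, Sequence


class ToyEnv:
    """Hypothetical stand-in environment (NOT the ARC-AGI-3 API)."""

    def __init__(self, noise: float = 0.0, seed: Optional[int] = None) -> None:
        self.noise = noise
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> int:
        # With probability `noise` the action is ignored (a stochastic mechanic).
        if self.rng.random() >= self.noise:
            self.pos += action
        return self.pos


def rollout(env: ToyEnv, actions: Sequence[int]) -> List[int]:
    """Reset the env, play the actions, and record every observation."""
    observations = [env.reset()]
    for action in actions:
        observations.append(env.step(action))
    return observations


def looks_deterministic(
    make_env: Callable[[], ToyEnv], actions: Sequence[int], trials: int = 5
) -> bool:
    """Return True if `trials` fresh rollouts all produce identical observations."""
    reference = rollout(make_env(), actions)
    return all(
        rollout(make_env(), actions) == reference for _ in range(trials - 1)
    )
```

A `True` result is only evidence, not proof: a rare random event can easily be missed with a handful of trials, so longer action sequences and more trials make the check more convincing.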
Also, I've not seen an answer to this anywhere: is the public score out of 100 or out of 1? My code scores around 40% on the public games (averaged over 50 games) but scores around 0.4 or worse on the private submission.
I can't find a game called df01?
I'd assume it's out of 100. The ARC-AGI-2 leaderboard has lots of scores above 30.
If your submission is like many of the others, and attempts to instantiate the game class directly, I suspect that 'hack' doesn't actually work at all in the submission and it's just using the fallback.
It's part of the arc agi public dataset of over 250 games. It's shared on Kaggle.
Yes I realised that a core part of my previous approach used getattr of information from the game itself. Reverted to my initial approach and managed to improve it again so will see how it scores in about 8 hours
Link? All I see is environment_files in the Data tab, the same 25 games on the website.
I also have never seen these 250 games
Maybe he's talking about ARC-AGI-2?
It's called arc agi 3 interactive testbed 200+ games by theredbluepill
Ah I see, not an official game then. Thanks, that actually looks pretty useful.
The official games seem much more difficult, just checked my agent against them.
Yes, a local LLM is a pre-trained model. There's no rule against language models that I'm aware of.
As long as it is small enough for the GPU I guess
When will H100 accelerators be added to the competition?
shared my notebook publicly, score of 0.42 - 2nd highest score notebook that is publicly available, feel free! https://www.kaggle.com/code/ashvinsingh/ash-s-arc-agi-3-agent
it is a modified version of CHRONOS's FORGE BFS and CNN agent
also here is a notebook I made that tests your agent against the 25 official ARC AGI 3 games, as well as against over 250 community-made games. It has boolean toggles (True/False) that you can swap to choose between swarm mode, which tests your agent against many games at once, and sequential mode, which tests your agent against one game at a time. Change the N_GAMES value to set the number of games your agent is tested against. The official 25 games cell also scores your agent using the arc agi scorecards. https://www.kaggle.com/code/ashvinsingh/arc-agi-3-interactive-testbed-200-games
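For anyone unfamiliar with the BFS approach that keeps coming up in this channel, here is a minimal sketch of breadth-first search over game states. It is not taken from the CHRONOS or FORGE notebooks. It assumes states are hashable and that you have a deterministic transition function you can call freely; `bfs_plan`, `grid_step`, and the 3x3 grid are hypothetical and only for illustration.

```python
from collections import deque
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def bfs_plan(
    start: Hashable,
    actions: Sequence[str],
    step: Callable[[Hashable, str], Hashable],
    is_goal: Callable[[Hashable], bool],
    max_states: int = 10_000,
) -> Optional[List[str]]:
    """Return a shortest action sequence from `start` to a goal state, or None."""
    if is_goal(start):
        return []
    frontier = deque([(start, [])])
    seen = {start}
    while frontier and len(seen) <= max_states:
        state, path = frontier.popleft()
        for action in actions:
            nxt = step(state, action)
            if nxt in seen:
                continue  # already reached by a path that is no longer
            if is_goal(nxt):
                return path + [action]
            seen.add(nxt)
            frontier.append((nxt, path + [action]))
    return None  # search budget exhausted or goal unreachable


# Hypothetical toy game: move a token on a 3x3 grid, walls clamp movement.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def grid_step(state: Tuple[int, int], action: str) -> Tuple[int, int]:
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), 2)
    y = min(max(state[1] + dy, 0), 2)
    return (x, y)


plan = bfs_plan((0, 0), list(MOVES), grid_step, lambda s: s == (2, 1))
```

The `max_states` cap matters in practice, because the number of distinct states can grow very quickly once a game has more than a few moving parts.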
Guys... if you're doing RL and your agent's performance increases at first but then goes to 0, it might be because you're not resetting the envs properly between rollouts. The reset() method only resets the current level, unless you call it twice. 🤦♂️
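A minimal sketch of the pitfall described above, using a hypothetical toy environment that mimics the reported behaviour (one `reset()` restarts the current level, a second consecutive `reset()` restarts the whole game). `ToyLevelEnv` and `full_reset` are illustrative names and this is not the real ARC-AGI-3 API, so check the actual reset semantics in your own setup before relying on it.

```python
class ToyLevelEnv:
    """Hypothetical env mimicking the behaviour reported in the chat.

    The first reset() restarts only the current level. A second reset()
    with no step in between restarts the game from level 0.
    """

    def __init__(self) -> None:
        self.level = 0
        self.steps_in_level = 0
        self._just_reset = False

    def step(self, advance_level: bool = False) -> None:
        self._just_reset = False
        self.steps_in_level += 1
        if advance_level:
            self.level += 1
            self.steps_in_level = 0

    def reset(self) -> None:
        if self._just_reset:
            self.level = 0  # second consecutive reset: back to the first level
        self.steps_in_level = 0
        self._just_reset = True


def full_reset(env: ToyLevelEnv) -> None:
    """Reset to the very start of the game before a new rollout."""
    env.reset()
    env.reset()
```

Calling a helper like `full_reset` at the start of every rollout keeps episodes comparable; otherwise later rollouts may silently begin on a harder level than earlier ones.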
My agent solving ka59: https://arcprize.org/replay/e4e6694c-7b6f-4dd9-9208-cde9f3c0c90b
Impressive. Was it pretrained on the game before this run?
r11l: https://arcprize.org/replay/da3ed5c1-f02d-4eb3-a1f0-3fa2e61b8f0a
This was hard
I try to make it general, but it's inevitably overfit to some extent. I didn't show it how to play the game, though. ka59 and r11l are different agents, too. This task is very difficult
Can you test it against SK48? That's what I'm working on at the moment.
If you're training on the games, you're not following the rules correctly as it does not generalize.
Are you guys hard capping your actions per level to protect efficiency score or are you letting your agent run till it hits a logical dead end? Curious to see if anyone is capping that
Due to the way the score works, there is no benefit to capping your actions per level.
This is due to the fact that the score is calculated per level and the action-count only affects the score for that level.
If your agent doesn't complete a level, the action-count for that level doesn't matter at all for its overall score.
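To make that argument concrete, here is a toy sketch of per-level scoring. The efficiency formula (`baseline_actions / actions`, capped at 1) and the averaging over levels are assumptions made for illustration, not the official ARC-AGI-3 scoring rule. The only property taken from the messages above is that an uncompleted level contributes nothing, however many actions were spent on it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LevelResult:
    completed: bool
    actions: int
    baseline_actions: int  # hypothetical reference action count for the level


def level_score(result: LevelResult) -> float:
    """Score one level: 0 if not completed, else an assumed efficiency ratio."""
    if not result.completed:
        return 0.0  # action count is irrelevant for an unfinished level
    return min(1.0, result.baseline_actions / max(result.actions, 1))


def game_score(results: List[LevelResult]) -> float:
    """Average the per-level scores (assumed aggregation)."""
    return sum(level_score(r) for r in results) / len(results)


# Same run, but the agent gives up on level 2 after 50 vs. 5000 actions.
capped = [LevelResult(True, 20, 10), LevelResult(False, 50, 10)]
uncapped = [LevelResult(True, 20, 10), LevelResult(False, 5000, 10)]
```

Under these assumptions `game_score(capped)` and `game_score(uncapped)` are identical, which is the point being made: a cap only helps if it saves wall-clock time, not score.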
How do I see the input in the notebooks, e.g. for the forge notebooks? I see it references some pretrained_weights.pt there for the CNN fallback, but I can't find it anywhere in the Input. Do I have to check a different notebook?
Have you tried iterations in the chronos version?? Like integrating multimodal capabilities using Agentic swarm?? Would love to know your thoughts 💭
is there any way to set an agent to run locally on your machine?
i feel like there’s a distinct lack of documentation on the arcagi3 api
the documentation suggests using an environment when trying to run locally but calls it an agent... very confusing
Anyone struggling with Consistency and want to learn together?
DM me.
Hi, I keep getting "Kaggle Error" after submitting a prediction. Has anybody encountered this problem today?
Do we only get to submit one output per day?
I'm new, trying to figure out the details to submit a prediction. It looks like we can submit up to 2 per day. But I have not yet succeeded in submitting any. I'm working through the requirement to submit a notebook with internet access off. Pip imports and many other things don't work when the notebook is set to offline mode.
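A common workaround for the offline restriction is to download the package files in a session that has internet access, attach them to the notebook as a dataset, and then install from that local folder. The sketch below only builds the two pip command lines. `pip download --dest` and `pip install --no-index --find-links` are standard pip options, while the helper names, the package name, and the `/kaggle/input/my-wheels` path are hypothetical placeholders.

```python
import sys
from typing import List, Sequence


def pip_download_cmd(packages: Sequence[str], dest: str) -> List[str]:
    """Command to run WITH internet: fetch packages and dependencies into `dest`."""
    return [sys.executable, "-m", "pip", "download", "--dest", dest, *packages]


def pip_offline_install_cmd(packages: Sequence[str], wheel_dir: str) -> List[str]:
    """Command to run WITHOUT internet: install only from files in `wheel_dir`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact the package index
        "--find-links", wheel_dir,  # look for distributions in this folder
        *packages,
    ]


# Hypothetical package name and dataset path, for illustration only.
download_cmd = pip_download_cmd(["some-package"], "./wheels")
install_cmd = pip_offline_install_cmd(["some-package"], "/kaggle/input/my-wheels")
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The downloaded files need to match the Python version and platform of the submission environment, so it is safest to run the download step in a notebook with the same settings.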
Does anyone know how to deal with getting "Kaggle Error" on all my submission attempts? I can't get access to any meaningful logs to know what's causing the issue.
I too get only "Kaggle Error" on 5 submissions. My only 5.
🚀 Hey @everyone!
We’re building PromptGram and aiming to reach 150 GitHub stars ⭐
If you like AI, FastAPI, microservices, or developer tools, please check it out and support the project by starring the repo 🙌
GitHub Repo: https://github.com/dewangsahuji/promptgram
Every star really helps and motivates us to improve the project further 💙
@proud moon I like your project, but why do you post this in the arc-prize chat
he's a spammer
if people like a project, they'll star it, you don't have to spam it everywhere for some validation ;-;
Anyone who knows, please tell me too! My submissions return "Kaggle Error" and the versioned notebook shows no errors. How do I find the source of the Kaggle Error?