#orbit-wars
Hi! Can we plz have the code of the game?
It would be really useful for optimizing agents
https://github.com/Kaggle/kaggle-environments/tree/master/kaggle_environments/envs/orbit_wars
Happy to say, I have an answer for my own question
Hello,
I'm trying to use Java (like in the Lux competitions), but my submissions fail.
I've created a discussion https://www.kaggle.com/competitions/orbit-wars/discussion/692898
If someone has used Java or another language (like in Lux) successfully, it would help a lot.
Thanks.
Hi all
Has anybody submitted any agents yet?
Yes
What I'm quite worried about is that my agents are not performing up to standard
Like, the same agent I submitted now got 200 points less than when I submitted it previously
I got my first agent submitted 🙂 I'm assuming they make it play itself as a validation?
Yes
Doesn't the score depend on the field strength? So it would be expected for the same agent to show lower score as others improve
That's true but I'm really running out of ideas on how to improve my score 😞😞
Hello, I'm seeking teammates, DM me if you're interested
Hello all!
I'm wondering if someone can answer a question for me. How might you submit a bot that uses external libraries, e.g. numpy for this competition?
I would just test it, I'm sure numpy would be available in the environment
Makes sense! I just realized the kaggle_environments install includes a ton of stuff, numpy included.
hello, I'm happy to be part of this competition. I have a question: what version of Orbit Wars is used in the orbit_wars competition right now?
Sorry what do u mean?
The rankings will keep on changing based on the agents launched by the participants
sorry, I mean the Orbit Wars version, because right now I'm using Orbit Wars 1.09
I don't think there are versions??
Yes, the leaderboard is running version 1.0.9. There should be an update with fixes for 4p incoming on Monday
Has there been more 4P games recently?
oh, I'm lucky. Here I use Pydroid to run the simulation
I've tried A2C before, didn't fully know what I was doing though
Now I'm experimenting with evolutionary algorithms because they seem more intuitive to implement (at least for me). It's really slow, but I got a score of 700~800 with 500 generations and just self play
is the agent still a policy network, or is it a hardcoded python agent?
I'm trying to train an agent with PPO, but it's really slow to converge
It's a two-layer network implemented with numpy (GPU acceleration won't help, because as far as I know the main bottleneck is the environment)
What helped me was better selecting the features for the model (fewer and informative features make the model output faster, avoiding timeouts and speeding the training)
So my code looks like this:
- Make a list of N discrete possible fractions of ships to send
- Make K tuples of possible combinations of (source, destination, number of ships) as possible actions
- Calculate features based on each action and the current state
- NN should return probabilities of all actions given the current state
- Take the most probable actions
- For each action, calculate the angle between source and destination numerically (I don't need the network to learn this parameter, as it would overcomplicate the approach)
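A minimal numpy sketch of that pipeline, for illustration only. It assumes planets are plain `(id, x, y, ships)` tuples, uses a made-up four-number feature vector, and returns moves in a hypothetical `(source_id, angle, ships)` format; none of these names come from the poster's code or the real observation layout.

```python
import itertools
import math

import numpy as np

FRACTIONS = (0.25, 0.5, 1.0)  # hypothetical discrete ship fractions (N = 3)


def candidate_actions(my_planets, targets):
    """All (source, destination, ships) tuples; a planet is a hypothetical (id, x, y, ships)."""
    actions = []
    for src, dst in itertools.product(my_planets, targets):
        if src[0] == dst[0]:
            continue  # never send to yourself
        for frac in FRACTIONS:
            ships = int(src[3] * frac)
            if ships > 0:
                actions.append((src, dst, ships))
    return actions


def features(action):
    """Illustrative per-action features: distance, fleet size, target garrison, bias."""
    src, dst, ships = action
    dist = math.hypot(dst[1] - src[1], dst[2] - src[2])
    return np.array([dist / 100.0, ships / 100.0, dst[3] / 100.0, 1.0])


def action_probs(actions, w1, w2):
    """Two-layer numpy network: features -> tanh hidden layer -> softmax over the K actions."""
    x = np.stack([features(a) for a in actions])  # (K, 4)
    hidden = np.tanh(x @ w1)                       # (K, H)
    logits = hidden @ w2                           # (K,)
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return exp / exp.sum()


def to_move(action):
    """Angle computed numerically, not learned (straight line to a static target)."""
    src, dst, ships = action
    angle = math.atan2(dst[2] - src[2], dst[1] - src[1])
    return (src[0], angle, ships)  # hypothetical (source_id, angle, ships) move format


rng = np.random.default_rng(0)
mine = [(0, 20.0, 20.0, 40)]
targets = [(1, 30.0, 20.0, 5), (2, 80.0, 80.0, 50)]
acts = candidate_actions(mine, targets)
probs = action_probs(acts, rng.normal(size=(4, 8)), rng.normal(size=8))
best_move = to_move(acts[int(np.argmax(probs))])
```

To take several of the most probable actions instead of one, `np.argsort(probs)[::-1][:k]` gives the top-k indices.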
I can understand using evolutionary algos for architecture discovery, but you still have to train the net, right? Like, are you using the algo to optimize the net?
There is no optimizer; it's gradient-free
Play agent vs agent → calculate the fitness of each agent → cross over best agents (combine vectors) → mutate (add noise to the weight vectors) → repeat
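The loop described above can be sketched in a few lines of numpy. This is a generic elitist genetic algorithm under stated assumptions, not the poster's code: uniform crossover, Gaussian mutation, and the population size, elite count and `sigma` are all illustrative choices. In the real setup the fitness would come from agent-vs-agent games; here a toy function stands in so the sketch runs on its own.

```python
import numpy as np


def evolve(fitness_fn, dim, pop_size=16, generations=200, elite=4, sigma=0.1, seed=0):
    """Gradient-free loop: evaluate -> keep the best -> crossover -> mutate -> repeat."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness_fn(ind) for ind in population])
        parents = population[np.argsort(scores)[::-1][:elite]]  # elites survive unchanged
        children = []
        for _ in range(pop_size - elite):
            a, b = parents[rng.choice(elite, size=2, replace=False)]
            child = np.where(rng.random(dim) < 0.5, a, b)  # uniform crossover of two weight vectors
            child = child + sigma * rng.normal(size=dim)   # mutation: add Gaussian noise
            children.append(child)
        population = np.vstack([parents, np.array(children)])
    scores = np.array([fitness_fn(ind) for ind in population])
    return population[int(np.argmax(scores))]


# Toy stand-in for "win rate in agent-vs-agent games": closeness to a hidden weight vector
target = np.array([1.0, -2.0, 0.5])
best = evolve(lambda w: -np.sum((w - target) ** 2), dim=3)
```

With self-play the fitness is relative to the current population, so it is noisier than this toy objective; averaging over several games per individual helps.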
sounds strange to me, but if it works time will show
Yeah, I don't think it will win, but it is fun
And it's generating good results so far
cool cool
i think RL is the way to go though. just gotta tune the hyperparams, make the env easier to learn, etc., then just let PPO converge to the optimal policy
Yes, probably. Good luck!!
Good luck to you too!
looks like lots of folks are rewriting the env in jax or cuda (could help the env not be the bottleneck)
what kind of reward shaping are people using here?
Main reward is the win/lose one, and I added some aux rewards like ship difference and a fleet-hit reward (but I scaled them to keep everything zero-sum). As far as I can guess, aux rewards won't matter once the model becomes strong enough. Until then they help it improve faster, but after that they could prevent it from learning a strong strategy
I will certainly try, thanks for the suggestion
I figure reward shaping is essential for early learning, it shouldn't have to prevent learning a strong policy, just put more emphasis on terminal rewards
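One way to get both ideas (zero-sum shaping, plus more emphasis on the terminal reward over time) is sketched below. It is a hedged illustration, not either poster's exact reward: the auxiliary term is the per-turn change in ship count centred across players, and `anneal_steps` and `aux_scale` are hypothetical knobs.

```python
import numpy as np


def shaped_rewards(prev_ships, cur_ships, outcome, step, anneal_steps=1_000_000, aux_scale=0.01):
    """Terminal win/loss plus a zero-sum auxiliary term that fades out over training.

    prev_ships / cur_ships: total ships per player before and after this turn.
    outcome: +1 / -1 per player on the final turn, all zeros before that.
    """
    delta = np.asarray(cur_ships, dtype=float) - np.asarray(prev_ships, dtype=float)
    aux = delta - delta.mean()                    # centring makes the aux rewards sum to 0
    weight = max(0.0, 1.0 - step / anneal_steps)  # linear fade: terminal reward dominates later
    return np.asarray(outcome, dtype=float) + weight * aux_scale * aux
```

Centring works for 2 and 4 players alike; in a 2-player game it is just half the ship-count difference.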
how do I start implementing RL? I'm stuck hard at ~1050 right now using rule-based
Thanks for the tip, im getting 6-8x speedup with a rust implementation of the env. This will really make RL feasible.
Hello guys
Hello,
Have you seen any resources for building RL?
use a framework like stable-baselines3 or similar, lots of resources in the docs also
the YouTube series Foundations of Deep RL by Pieter Abbeel is a good place to learn the basics
hot take: I think it's imprudent to attempt ML/RL without having a heuristics bot in the top 100
I think it's imprudent to even make a heuristic bot for a kaggle challenge.
I haven't 😭 I've paused for now since finals are coming up, but I will resume once I'm done
i kinda agree, since having a heuristics bot in the top 100 gives you a pretty good understanding of how the game works and how to win it
and that could translate over to ML/RL: since you already know what to do, training it will be a little easier (though I can't say much about it since I haven't tried it yet)
Okay
There really isn't much to understand about how the game works. I'm here to learn ML, AI and data science, not Python programming. I guess everyone has different priorities...
I think a prerequisite to building strong ML models is developing familiarity with the domain. I think writing heuristic bots at the beginning is the fastest way to attain that. I'll probs explore RL a bit later. Best of luck!
I get what you are saying, but having a bot in the top 100, heuristic or not, is more than just exploratory. You could be sinking too much time into it that could otherwise go into training an RL bot. Best of luck to you as well!
After a week of tuning and testing I finally have an RL bot that is actually learning. Still will take days to train a competent bot with my setup
Learning better abstractions / transformations of the game that let you build better feature + action spaces seem critical. Can make the learning task half as hard if you have a better feel for the game
how do you handle the action space?
multi-discrete masked action space
so each edge, 3 one-hots for 0/needed+1/all? and do you combine that with any search or just single inference -> play move
launch, edge, ship fraction bin
i search for the best angle, too hard to learn the angle itself
my feature extractor builds candidate edges that the policy chooses from, together with a launch flag and a ship fraction from a bin. This way I can use MaskablePPO from sb3-contrib. Then I search the angle and convert to the native action
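A sketch of the mask and decode step for a `MultiDiscrete([2, N_EDGES, N_BINS])` action (launch flag, edge, ship-fraction bin). As far as I know, sb3-contrib expects the mask for a MultiDiscrete space as the per-dimension masks concatenated into one flat boolean array, which is what `action_mask` builds; the edge cap, the fraction bins and the `decode` helper are hypothetical.

```python
import numpy as np

N_EDGES = 8                          # hypothetical cap on candidate edges per turn
FRACTIONS = (0.25, 0.5, 0.75, 1.0)   # hypothetical ship-fraction bins
NVEC = (2, N_EDGES, len(FRACTIONS))  # MultiDiscrete dims: launch flag, edge index, fraction bin


def action_mask(n_valid_edges):
    """Flat boolean mask: the per-dimension masks concatenated (length sum(NVEC))."""
    launch = np.array([True, n_valid_edges > 0])  # "don't launch" is always legal
    edges = np.arange(N_EDGES) < n_valid_edges    # padded edge slots are masked out
    bins = np.ones(len(FRACTIONS), dtype=bool)
    return np.concatenate([launch, edges, bins])


def decode(action, edges, ships_on):
    """Turn (launch, edge_idx, bin_idx) into (source, target, ships); angle search happens later."""
    launch, edge_idx, bin_idx = (int(v) for v in action)
    if launch == 0:
        return None
    src, dst = edges[edge_idx]
    return src, dst, int(ships_on[src] * FRACTIONS[bin_idx])
```

Because each dimension is masked independently, the mask can't express "this bin is only invalid for that edge"; such combinations have to be handled in `decode` (for example by clamping to at least one ship).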
what is the strongest open source heuristic bot right now? want to do some benchmarking
probably Orbit Wars or something similar. Are you not doing some form of self-play and heuristics in combination? I keep hitting a plateau and don't get anywhere beyond 10% vs my heuristic
Are you guys going for a 100% win rate against random bots before submitting for live gameplay, or just winging it?
You get 5 submissions per day, so it depends on how much you are planning to work on it on a specific day. The leaderboard is a very good feedback mechanism; watching your games there can often show you what to work on next.
Good point!
I'm doing a curriculum. Just going hardest from the beginning is not optimal for speed. Self-play kicks in at later stages
@pulsar barn how are you building your curriculum? I have been trying different ways. I wanted to try steps 0-50, 0-100, 0-200, 0-500 because it's so sparse, and I was hoping to get to like 70% vs a good heuristic in each step before progressing, but no chance 🙁
so I used PPO initially, then added AWR, then a curriculum, but I'm just not getting above 65% or so in self-play (against the baseline and a lagged player he should beat), and not more than, say, 15% against the heuristic for the first set of steps
I'm using a reward of (my fleet + prod * potential prod) minus the opponent's: really not getting far
I'm more than open to ideas, as I'm a little stuck on how I can improve. There isn't really enough data, I think, hence my idea of doing self-play as much as possible
First I benchmarked all the heuristic agents I have gathered, then I introduce them by hardness slowly throughout the learning run. I also increase the four-player probability over time. I don't go to new stages before I reach a certain win rate
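A small sketch of a win-rate-gated opponent curriculum like the one described. The tier names are borrowed from the eval log further down purely as labels; the threshold, the window size, and keeping already-unlocked tiers in the pool are assumptions of this sketch rather than details from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Curriculum:
    """Opponent tiers sorted easy -> hard; advance once the recent win rate clears a bar."""
    tiers: list
    threshold: float = 0.7
    window: int = 100
    stage: int = 0
    results: list = field(default_factory=list)

    def opponents(self):
        # Unlocked tiers stay in the pool so the agent keeps beating older opponents
        return [opp for tier in self.tiers[: self.stage + 1] for opp in tier]

    def record(self, won):
        self.results.append(bool(won))
        recent = self.results[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.stage < len(self.tiers) - 1):
            self.stage += 1
            self.results.clear()  # the new stage needs a fresh full window of games
```

Each rollout would sample an opponent from `opponents()` and call `record()` with the game result.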
i also do pretraining with imitation learning to bootstrap early learning, saves me many steps
hope by next week i have a competent agent. self play is introduced much later because it kills my throughput significantly
PPO is hard to get working because it only behaves within a sane hyperparameter range. I therefore made a PID controller that adjusts the learning rate to keep the approximate KL divergence within a range
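A possible shape for such a controller, as a sketch only: it runs a PID loop on the log-ratio between the target KL and the measured approximate KL and outputs log(learning rate). The gains, target and bounds are hypothetical starting points, not the poster's values.

```python
import math


class KLPIDController:
    """PID loop that nudges PPO's learning rate so approx KL stays near a target.

    Works on log(lr) and log(KL) so it reacts to ratios rather than raw scales.
    """

    def __init__(self, lr, target_kl=0.01, kp=0.5, ki=0.05, kd=0.1,
                 lr_min=1e-6, lr_max=1e-3, integral_limit=10.0):
        self.log_lr0 = math.log(lr)
        self.target_kl = target_kl
        self.kp, self.ki, self.kd = kp, ki, kd
        self.lr_min, self.lr_max = lr_min, lr_max
        self.integral_limit = integral_limit
        self.integral = 0.0
        self.prev_error = None
        self.lr = lr

    def update(self, approx_kl):
        """Call once per PPO update with the measured approx KL; returns the new lr."""
        # error > 0: KL below target, updates are too timid, so lr may grow
        error = math.log(self.target_kl) - math.log(max(approx_kl, 1e-12))
        self.integral = max(-self.integral_limit,
                            min(self.integral_limit, self.integral + error))  # anti-windup clamp
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        log_lr = self.log_lr0 + self.kp * error + self.ki * self.integral + self.kd * derivative
        self.lr = min(self.lr_max, max(self.lr_min, math.exp(log_lr)))
        return self.lr
```

With a PyTorch optimizer, the returned value can be applied with `for group in optimizer.param_groups: group["lr"] = lr`.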
yeah, I'm watching the entropy and KL. Any luck so far? I'm assuming the reward function is going to be the killer; I assume you're using some sort of dense function to begin with. I tried vs random and a few others and it kinda learns to optimise fleet and planets, but the heuristics are just too good. You're probably right that I'll need to introduce a load of not-so-good agents for it to learn to beat them progressively
I use a dense reward, yes. Score is a dense reward signal, so you don't have to rely only on win/lose. But I cut back the shaping throughout the curriculum
ok, sounds interesting 🙂 I may give your PID controller and the weaker-to-stronger opponents a go from my baseline. Do you have any other tips? 🙂
i think the main thing is the curriculum. because of the sample inefficiency of rl, trying to help the model as much as possible to have a smooth learning is crucial, unless you have unlimited compute.
also, rewrite the env in a high performance language like rust. it will really speed up throughput
it has dramatically improved! I have passed through plenty of gates already, now onto 200 steps, where I need to be beating my better heuristics! haven't tried it on 4-player yet tho... but I have 2 very weak and 2 weak opponents and they are being destroyed!
nice man!
thanks for the input!
--- Eval checkpoint (ep 500, horizon=200) ---
[tier 1] vs bully: 4/4
[tier 1] vs prospector: 1/4
[tier 2] vs rage: 4/4
[tier 2] vs dual: 3/4
[tier 3] vs baseline: 0/4
[tier 4] vs shunlite: 0/4
[tier 4] vs v131_2p: 0/4
[tier 5] vs v131_denial: 0/4
[tier 5] vs v131_wave: 0/4
Eval total: 12/36 (33.3%) improving 🙂
may i ask what framework for rl you are using? or is it a custom implementation?
custom
@thorny mason you can find some more opponents here: https://github.com/automatylicza/orbit-wars-lab/tree/main/agents/external
not my GitHub, but you can find it on the Kaggle discussion board
cool
this is where i got some from the old planet wars winner https://github.com/melisgl/planet-wars/tree/master/example_bots
they are in Java and not exactly Orbit Wars, but an LLM of choice can port them 🙂
he also has a great write up of his tactics!
i am getting about a game per second so currently 200 steps
and i am running a 2p and 4p training harness at the same time
4p sucks tho looks like it is learning nothing
i sample 2p or 4p games at an increasing rate through the curriculum, so during a rollout i will get some 2p and some 4p based on a percentage
ah with just one game so 1 model that will play both? from my heuristic i have also split the logic because i find the tactics are so different - i figured also for the rl
yes, my model plays both types of games. Starting really low with only 5% 4p games, up to 50%, which I believe is the true objective with regards to 2p vs 4p
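A sketch of that 2p/4p mix. The thread only gives the endpoints (5% rising to 50%); the linear ramp over training progress is an assumption.

```python
import random


def four_player_prob(progress, start=0.05, end=0.5):
    """Share of 4-player games; progress runs from 0 (start of training) to 1 (end)."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def sample_num_players(progress, rng=random):
    """Pick 2 or 4 players for the next rollout game."""
    return 4 if rng.random() < four_player_prob(progress) else 2
```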
interesting...
it is in Lisp, not Java.
the agents are in Java, I believe
there is some Lisp, but same same for an LLM 🙂
his write up is gold tho
@everyone so how do you train models to play this game
Like, my basic strategy is a function that calculates the distance from my planet to a target and, based on the number of turns, targets either orbiting or static high-production planets, only settling for low production if all of those have been taken
And if I send ships, I send no more than 20 percent of the planets total quota
does production change over time, for planets?
i train it with ppo implementation from stable-baselines3
i think it is constant
Yes constant
But where'd you get the data though
Like first you train the thing and then what? Like I coded a function the environment runs but i dont know what next
PPO is reinforcement learning
you get the data from training...
every rollout generates data
the bot plays against itself
guys, are there any really good bots I can download to use as opponents for tests?
enders fleet is pretty good, also marco dg v3.3
those are probably the best bots in the forums right now
Hi,
I wanted to ask something.
Each fleet is represented as [id, owner, x, y, angle, from_planet_id, ships].
If the owner here is enemy, is the angle leaked, or is it a placeholder only?
It's not leaked; it's deliberately present to help us calculate the position of the ships and where they're headed
Further, it can help you find the nearby enemy planets if you check the coordinates and trajectory (ships mostly go directly towards an enemy)
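A sketch of using the fleet row `[id, owner, x, y, angle, from_planet_id, ships]` to guess where an enemy fleet is heading. It assumes straight-line motion at a constant `speed` (passed in, since the config lists `shipSpeed` as a maximum), an angle in radians in the usual `atan2` convention, static planets given as hypothetical `(id, x, y, radius)` tuples, and a per-turn check that could skip over very small targets.

```python
import math


def fleet_position(fleet, turns, speed):
    """Straight-line extrapolation of a fleet row [id, owner, x, y, angle, from_planet_id, ships]."""
    _, _, x, y, angle, _, _ = fleet
    return x + speed * turns * math.cos(angle), y + speed * turns * math.sin(angle)


def likely_target(fleet, planets, speed, horizon=50):
    """First planet (id, x, y, radius) the fleet's path enters, as (planet_id, turns), or None."""
    for t in range(1, horizon + 1):
        fx, fy = fleet_position(fleet, t, speed)
        for pid, px, py, r in planets:
            if math.hypot(fx - px, fy - py) <= r:
                return pid, t
    return None
```

For orbiting targets, replace the static `(px, py)` with the planet's predicted position at turn `t`.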
But if that's provided for enemy fleets, shouldn't it be called leaked? Because I don't feel there's a game anymore. Everything is visible.
Not leaked, think of it as modern-day warfare. Countries A and B both have high-tech machinery, and as soon as Country B sends a missile towards A, A's defence system detects the angle, coordinates, point of origin and whatnot to narrow down the threat
It's not leaked, but calculated by the defence (radar detector)
Your focus should be on narrowing down the threat, the real game tbh. Threat detection is work for radars now
Thanks for that POV.
Yup 🐣, welcome :)
Would you call Chess a game? It's perfect information too...
So when we figure out where to send ships, don't we multiply the angular velocity of the target planet by the speed of my ships in that pair or something?? Then we use the radius and distance from the sun, do some polar-type crap to find the new Cartesian coordinates, and then compute the distance??
Umm, I don't quite understand what you're trying to convey or the formula you're describing
From my knowledge of kinematics and circular motion, I don't remember any formula where we multiply "angular velocity of thing with speed of any thing" to get a proper pair
I don't know about this formula, but upon doing dimensional analysis
Angular velocity (rad/s) × speed (m/s) gives m/s², which is an acceleration (I guess the centripetal one), and integrating acceleration twice with respect to time gets us to position. But I'm still very unsure about your formula; can you please elaborate, or was it just an assumption?
And about finding new Cartesian co-ord, are you referring to (x = Rcos(theta' + omega.t), y = Rsin(theta' + omega.t))?
But anyways, here's my mathematical intuition:
First we determine the start (q) and destination (d) of the ships.
Assume the ship starts at (x', y') and the target is at (x'', y'').
For linear motion we could write
x = x(original) + v*t
or, in terms of circular motion (here, a planet):
theta(final) = theta(original) + omega*t
Here, theta = angular position and omega = angular velocity.
Then the target's Cartesian coordinates after t turns (measured from the orbit center) are:
```
x'' = R*cos(theta(original) + omega*t)   [horizontal component]
y'' = R*sin(theta(original) + omega*t)   [vertical component]
```
And we know that `distance = speed * time`, so with ship speed s the fleet can reach the target by turn t if
```
sqrt((x'' - x')² + (y'' - y')²) ≤ s*t
```
The smallest t that satisfies this gives the interception time, I guess.
**Note: I still haven't properly tested this on large-scale data, but it did work for a few tests. I might not be 100% correct.**
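The "smallest t" search can be done by brute force over whole turns, as in the sketch below. It works in coordinates relative to the orbit center, ignores the target's radius (the fleet only needs to reach its edge) and does not check whether the straight path crosses the sun; the function name and parameters are illustrative.

```python
import math


def earliest_intercept(origin, R, theta0, omega, speed, max_turns=200):
    """Smallest whole turn t with |target(t) - origin| <= speed * t, plus the launch angle.

    Target orbits the center: (R cos(theta0 + omega t), R sin(theta0 + omega t)).
    """
    x1, y1 = origin
    for t in range(1, max_turns + 1):
        x2 = R * math.cos(theta0 + omega * t)
        y2 = R * math.sin(theta0 + omega * t)
        if math.hypot(x2 - x1, y2 - y1) <= speed * t:
            return t, math.atan2(y2 - y1, x2 - x1)
    return None
```

With a static target 40 units away and speed 6, the first turn that satisfies the inequality is t = 7 (42 ≥ 40, while 36 < 40).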
New angle = initial angle + angular velocity × ship speed
Then x = r cos new angle
Oh that's the same formula i mentioned, just in terms of theta(angle) and omega(veloctiy)
Y = r sin angle or something
I get it, we have the same explanation
But I don't know... there was this demo with a generic function returning moves that you just pass into the environment
What is with these people training models and how do I implement it? 🤔
This one (orbit wars) is an ongoing competition, all models available for training are demos and mostly we're expected to do something better to get better results and win
I don't think we can use someone else's model for this 🤔
So you make ur own model with PyTorch or something?
It's like, they're giving us a bunch of data (a dataset, basically) and values, as mentioned in Komil's message in #orbit-wars
And then we're expected to do the math and build code to reach the highest score
Every win increases the score, and every defeat decreases it...
And yes we're training a model but not with PyTorch, we're making it accurate with maths
Check more here: https://www.kaggle.com/competitions/orbit-wars/overview
Ugh, I was finally able to attain a good score.
But how do I coordinate simultaneous attacks, anyone got any suggestions ;-;?
Could I take a look at ur notebook?
Group instances of a compute-moves function or something and have them return moves
Like with parallel processing or async?
I'll share it soon, lemme just edit some more stuff
I suspect parallel processing might slow it down. I found something else: I'll have multiple planets attack the closest target together, then decide whether they should keep attacking the same target; upcoming threats can then be handled easily and individually
Basically pseudo group work
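One way to coordinate a simultaneous attack is to stagger launches so every fleet lands on the same turn. The sketch assumes a static target, straight-line travel, whole-turn travel times, and a garrison that grows by `production` per turn until impact; the tuple layouts and the function name are hypothetical.

```python
import math


def coordinated_strike(sources, target, speed, production):
    """Plan launches from several planets so every fleet lands on the same turn.

    sources: (planet_id, x, y, spare_ships); target: (x, y, garrison).
    Returns (plan, arrival_turn) with plan entries (planet_id, wait_turns, ships), or None.
    """
    tx, ty, garrison = target

    def eta(src):
        return math.ceil(math.hypot(tx - src[1], ty - src[2]) / speed)

    chosen, total = [], 0
    for src in sorted(sources, key=eta):  # closest planets first
        chosen.append(src)
        total += src[3]
        arrival = eta(src)                # the farthest chosen planet sets the arrival turn
        needed = garrison + production * arrival + 1
        if total >= needed:
            break
    else:
        return None                       # not enough spare ships even with every source

    plan, remaining = [], needed
    for src in chosen:
        send = min(src[3], remaining)
        plan.append((src[0], arrival - eta(src), send))  # nearer planets wait longer to launch
        remaining -= send
    return plan, arrival
```

Here planet 1 (5 turns away) waits 3 turns so its 30 ships arrive together with planet 2's 19 ships on turn 8, which beats a garrison of 40 that has grown to 48.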
If you were to build a reinforcement learning model for this competition how would you do it pls
Cause it seems like some people are training RL models on this task
I'd start by writing a Gym wrapper for the Kaggle env, then pick an RL method from a framework and run it
Ok will definitely try this
👍
Thanks man
Is anyone struggling with consistency and wants to learn together?
DM me.
nah there's like a month left
also, I remember seeing that someone made a byte-accurate faster simulator for the kaggle environment
I wonder if anyone has the code for it
It shouldn't be too hard to recreate the simulation in c/cpp then bind it to python
I got someone's feedback in this same channel: they rewrote it in Rust and it was 6x faster. I did the same and it was only 4-5x faster for me, but it's worth a try
I see
thanks y'all
I might just write it in JAX, vectorizing what I can, and call it a day lol
hopefully that works
probably won't be exact but whatever
jax cool lol
yeah
I'd be able to choose my accelerator without bothering with reoptimization
or torch.compile
i understand that
it'll be helpful
but im trying to find more ways to improve the performance
collision checking on GPU might be significantly faster than on CPU
especially with RL environments when you're running a bunch of sims at once
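The core of a vectorized collision check is broadcasting every fleet against every planet instead of a double loop. This numpy sketch only tests whether each fleet's current point lies inside a planet circle; the real sim may also test the segment swept between turns. The same expressions work with `jax.numpy` in place of `numpy`, and extra leading batch dimensions (many sims at once) broadcast the same way.

```python
import numpy as np


def fleet_planet_hits(fleet_xy, planet_xy, planet_r):
    """Boolean (F, P) matrix: True where fleet f lies inside planet p."""
    diff = fleet_xy[:, None, :] - planet_xy[None, :, :]  # (F, P, 2) pairwise offsets
    dist2 = np.sum(diff * diff, axis=-1)                  # (F, P) squared distances
    return dist2 <= planet_r[None, :] ** 2                # compare against squared radii
```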
I'm currently just building a JAX wrapper for the provided env and will probably work on a JAX sim reimplementation
but GPU has usage limits : (
I vibe coded something a while ago and it had a 10^3 throughput speedup
but it made some approximations that I didn't like
I'm gonna do it myself later
I'm kinda just using colab and my local GPU for now
10^3 damnn
GPT 5.3 Codex via Zed
but I think it didn't model comets properly
among other things
so a realistic speedup from a GPU implementation of the sim might be less good
but even a 100x speedup is insane
it also wrote a multiprocessing + numba implementation that achieved a 10x speedup over the base env
it wrote a byte parity test and according to that the numba implementation was byte accurate, but idk how much I trust it
the numbers seem too good to be true
but I still believe that there is something of substance that can be achieved via GPU acceleration
even if numerical precision may be off
oh i see
try prompting harder and feed it some examples about the comets yourself
yea that was so cool
I don't like vibe coding, but tbh it's cool at times if used properly
It wrote comet code for literally everything but the JAX implementation
It kept claiming that comets weren’t easily modeled on GPUs
I found that LLMs struggle with JAX a lot
most of the time I end up just doing it myself lol
yeah, me neither, was just feeling lazy
I’m not satisfied with the end result
I cannot get coding agents to think in vectors and math for logic no matter what i try
try the Rust one next time, and if it persists in its claim, then it's probably hitting its limits
wdym
like refactor the env in rust
oh I see
yupp
I mean, I know its claim is BS because I know exactly how to model comets along with planets
that happens to the best of us ;-; we're brave
it’s just that I was feeling lazy
it claimed that a bunch of things weren’t vectorizable until I told it how
they have limits, might have to feed it the winner's solution to get it to work lol
lmao fair enough
might try it
but honestly I’d rather have code that I can understand atp
so I can write better tests and such
tru that
i understand you
well, good luck with your tests, I gtg
alr seeya
thanks
YOU VIBE CODED THE WHOLE THING?
yeah, but as I said before, I don’t really trust it
writing my own implementation as we speak
I've just finished hand writing the simulation into cpp + with rendering for debug
Like any limitations here
In my code
I'm thinking of making planets robust, only expending lots of ships on high-production orbiting targets
Barbell strategy
Is this idea feasible or not
What's cpp+ and what model are you using
cpp + rendering*
I don't have a model yet, just rule based stuff
I wanted to rewrite the environment before starting models
I can’t speak for strategy but performance wise i’m personally trying to avoid loops when i can
the same approach might be able to speed things up for you
tho idk how much that would help as from what I understand you’re not exactly using an rl strategy
i’m currently at the environment setup phase moreso than the agent creation one
I plan on using RL with JAX so I want everything set up nicely before I start
But i dont know how to use barbell strategy and rl
Like i have robust planets
But my attacks are always so terrible
Thats why I lose most times
And I thought rl allows for more dynamic and unconventional attacks
But im not sure
I mean
potentially
the hard part is getting things set up right
also why would you want to combine the two
genuinely curious
I'm currently looking at your notebook, and in my very unprofessional and unqualified opinion, it seems that you're a bit too conservative
code is very short as well imo
yeah maybe more attacks would be a good idea
either I'm not running things right, or, when put up against the baseline nearest planet sniper, it just gets overwhelmed bc the sniper just outproduces
also your agent seems to really hate being put on the edge of the map
I'm thinking about using RL but I'm confused about which algorithm to use
The idea is to be hyper-conservative with most of your assets, and to spend only the ships you've set aside on lucrative, high-risk, high-reward targets
So how can I attack more while meeting this conservative stance on ship accumulation
Dynamic computation of moves
New strategy
i think the conservative stance on ship accumulation is the real issue
because the reward comes from the number of planets you have rather than having the best planets imo
its better to have a bunch of low production planets than a few high production ones
because it seems that your total ship product/step is pretty low
and gets lower relative to the opponent as the game goes on
so whats wrong with leaving strategy completely up to the rl agent instead?
So ur saying that i produce less ships than the enemy and attacking broadly = more ships
yeah, because more planets means more total production
I was thinking of it
But i noticed my code is really good at defense
But I'll figure out a way to accommodate
So improvements: frequency of planets
Better orbital computations
like the counter to stockpiling ships on one planet via that planet's production is just take many planets and just outproduce the defending planet
but low production, low defense planets are worthwhile taking
But I can't understand why my planets have like 700 ships in each, but the opposing player sometimes has more ships
maybe not taking low production high defense
How so
Like i thought I had all the high production planets in the world
I'll revise the algorithm
More attacks, high production
But what about defense
Its not like you can take every planet
And how to replace the for loops
You win by taking the most planets????
I thought that meant having the most ships
isn't the win condition whoever has the most planets after 500 steps
Then it changes everything
I low-key thought it was ships
Let's check the official rules
yeah im checking as well
Scoring and Termination
The game ends when:
- Step limit reached: 500 turns.
- Elimination: Only one player (or zero) remains with any planets or fleets.
Final score = total ships on owned planets + total ships in owned fleets. Highest score wins.
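The scoring rule as a few lines of Python, to make the "ships, not planets" point concrete. The fleet rows follow the layout quoted earlier in the channel (`[id, owner, x, y, angle, from_planet_id, ships]`); the planet input is a simplified, hypothetical `(owner, ships)` pair, with owner `-1` standing in for neutral.

```python
from collections import defaultdict


def final_scores(planets, fleets):
    """Score per player = ships on owned planets + ships in owned fleets.

    planets: (owner, ships) pairs, a hypothetical simplified layout; owner -1 = neutral.
    fleets: rows [id, owner, x, y, angle, from_planet_id, ships].
    """
    scores = defaultdict(float)
    for owner, ships in planets:
        if owner >= 0:  # neutral planets count for nobody
            scores[owner] += ships
    for _, owner, *_rest, ships in fleets:
        scores[owner] += ships
    return dict(scores)
```

So a player holding fewer planets can still win if more ships sit on their planets and in flight.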
i think you're right
it is indeed ships
Hence my current strategy of taking high production planets
But my defense is good
But offense is terrible
Ships fly into space and into the sun as if snipers drank too much vodka
Thats the problem
here I think I might just have a replication issue
My computation functions didn't account for the home planet being an orbiting one, or some weird planet kept shooting ships into space
??
oh wait
I can't paste images
uh
here try opening that?
so I'm putting yours against the nearest planet sniper bot
so from what I can see, your agent's planets are the most well defended by far
but I'm not sure you're winning
unless I'm reading the output wrong
and then sometimes the nearest planet sniper just takes your planets early game
can you run yours against the sniper and see if you get the same thing
bc that's what I have on my end
hopefully this image link works
you don't need to iterate through items in a loop if one step in your loop doesn't depend on the previous ones
you can process everything in parallel via python multiprocessing
or just turn everything into vectors and do vector math via numpy or jax
kinda like what I'm doing with the comet code rn
original loop implementation:
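```python
import math

# c_val, a, b, phi (comet ellipse parameters) and CENTER (board centre)
# come from the surrounding sim code and are not shown here.

# Discretize the continuous ellipse into a dense array of points
dense = []
num = 5000
for i in range(num):
    # parameter sweep over 0.3*pi .. 1.7*pi of the ellipse
    t = 0.3 * math.pi + 1.4 * math.pi * i / (num - 1)
    ex = c_val + a * math.cos(t)  # point in the ellipse's own frame
    ey = b * math.sin(t)
    # rotate by phi and translate to the board centre
    x = CENTER + ex * math.cos(phi) - ey * math.sin(phi)
    y = CENTER + ex * math.sin(phi) + ey * math.cos(phi)
    dense.append((x, y))
```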
my vectorized jax implementation:
```python
import jax.numpy as jnp

# Discretize the continuous ellipse into a dense array of points
num = 5000
t = 0.3 * jnp.pi + 1.4 * jnp.pi * jnp.arange(num) / (num - 1)  # every parameter value at once
ex = c_val + a * jnp.cos(t)
ey = b * jnp.sin(t)
x = CENTER + ex * jnp.cos(phi) - ey * jnp.sin(phi)
y = CENTER + ex * jnp.sin(phi) + ey * jnp.cos(phi)
dense = jnp.stack((x, y), axis=1)  # shape (num, 2)
```
EDIT: Pasted the wrong snippet from the original code
Matchup: neutralCostAgent vs productionWeighted:20
Matches: 1000
Start seed: 0
neutralCostAgent wins: 597
productionWeighted:20 wins: 401
Draws: 2
neutralCostAgent win rate: 59.7%
productionWeighted:20 win rate: 40.1%
Average steps: 277.351
Wall time (s): 43.7999
Average final ships: neutralCostAgent=1627.64, productionWeighted:20=2043.3
Average final planets: neutralCostAgent=15.73, productionWeighted:20=10.618
Finally getting to mess around with stuff
nice!
o7
But I tried `%%writefile submission.py` and I couldn't get a spot on the leaderboard 😭
Why are we using rust here
it makes the processing faster
here
i saw this suggestion and it worked
itz oki, keep trying ;-l
hi! is your agent rule-based, or are NN and RL involved?
Rules based
yeah, it seems that rule-based models have an advantage over NN and RL. My original plan was to switch to NN after a few days, but now I'm considering not switching
Rules based ain't even ai
Or neural net
Why does rules based rule
Cause it has a prepared good algorithm while models take forever or something
My algorithm has robust planets
But it is so bad at attacking
Like im cringing😭 😭 😭 😭
Does your agent have some kind of weight for attack & defense? I think adding a weight to make the agent more "aggressive" can help.
Like i have a quota of ships and i prioritize high production planets
yeah, i think that could help. my approach is still rule-based, but I’m trying to make the attack/defense weight dynamic instead of fixed.
for example, if a planet is safe, it can become more aggressive and send ships to high-production targets. If it is under pressure, it should preserve more ships or support nearby planets.
that's the idea 😄 I'm trying to avoid using one fixed aggression value. Safe backend planets can act more like supply planets, while frontier planets should be more conservative unless they have enough advantage
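A sketch of a per-planet aggression weight along those lines. The formula (threat share times a distance-based exposure term) and `safe_dist` are made up for illustration; the point is only that the weight moves with each planet's situation instead of being one fixed value.

```python
def aggression(garrison, incoming_enemy, nearest_enemy_dist, safe_dist=30.0):
    """Hypothetical 0..1 aggression score for one planet.

    Backline planets (no incoming threat, enemies far away) approach 1 and act as supply
    planets; threatened or frontier planets drop towards 0 and hold their ships.
    """
    if garrison <= 0:
        return 0.0
    threat = min(1.0, incoming_enemy / garrison)               # share of garrison already threatened
    exposure = max(0.0, 1.0 - nearest_enemy_dist / safe_dist)  # 1 when an enemy is adjacent
    return max(0.0, 1.0 - threat) * (1.0 - 0.5 * exposure)


def sendable_ships(garrison, incoming_enemy, nearest_enemy_dist):
    """Ships the planet may launch after covering the threat it already sees."""
    spare = max(0, garrison - incoming_enemy - 1)
    return int(spare * aggression(garrison, incoming_enemy, nearest_enemy_dist))
```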
congratulation!!
Looking at the games, it seems my agent wins exclusively when the other agent shoots ships into the void, and not otherwise. That's around 650 Elo, so uh, has anyone else noticed particularities at their Elo?
just submitted my RL agent with 2x as much training so I'm hoping it's gotten any better 🥲
lesssgoooo
nice!
also do you guys have any theories as to why RL agents aren't performing as well as rule based agents
I think this is because each match is too short, with too little data. RL is better for long-term, complex problems, but for the short term, rule-based is better
And you have to manually train the bastard with a strategy you can program with static python
But i swear the attacks are the worst part
Defense is so easy
But how to bypass the for loops though
What's your strat for attacking?
honestly if you’re not doing rl, it’s probably not worth it
High production planets
Compute distance rotational to compute distance and angle of attack
Ships needed exceeds safety floor and planet ships
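That sizing rule can be written as one small function. It is a sketch under assumptions: whole-turn travel time from straight-line distance, a target that grows by `target_production` per turn until the fleet lands (pass 0 if it doesn't grow), and a source that must keep at least `safety_floor` ships at home.

```python
import math


def attack_size(source_ships, safety_floor, target_ships, target_production, distance, speed):
    """Ships to send so the fleet lands with more ships than the grown garrison.

    Returns 0 when sending that many would drop the source below its safety floor.
    """
    eta = math.ceil(distance / speed)
    needed = target_ships + target_production * eta + 1
    if source_ships - needed < safety_floor:
        return 0
    return needed
```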
How long do the bots get per step to run?
has someone made a public Rust version of the environment?
Does Kaggle even allow Rust code?
Yeeeee, it does
I’m curious about this too
I don’t think I’ve seen anything in the sim’s source code that has a timer
@brazen swallow from what I understand this is the config for the game
{'episodeSteps': 500, 'actTimeout': 1, 'runTimeout': 1200, 'agentTimeout': 2, 'shipSpeed': 6.0, 'cometSpeed': 4.0, 'seed': None}
so 1 sec / agent for the actions and 2 seconds for init?
Do all planets have the same angular velocity
yeah i believe so
1 sec / agent with a 60 second bank (agentTimeout should be 60)
wait what does the bank do?
Maybe when your agent times out, it will spend the time in the bank
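If the bank works the way guessed above, an agent can track it itself and size its search accordingly. The semantics here are an assumption, not confirmed in this thread: time beyond `act_timeout` on a turn is deducted from the bank, and the agent times out once the bank is exhausted. The 60-second default follows the message above; turn time can be measured with `time.perf_counter()`.

```python
class TimeBank:
    """Hypothetical model of a per-turn limit plus an overage bank."""

    def __init__(self, act_timeout=1.0, bank=60.0):
        self.act_timeout = act_timeout
        self.bank = bank

    def charge(self, elapsed):
        """Record one turn's wall time; return False if this turn would have timed out."""
        overage = max(0.0, elapsed - self.act_timeout)
        self.bank -= overage
        return self.bank >= 0.0

    def budget(self, spend_fraction=0.02, reserve=0.1):
        """Time to allow the next turn's search: the per-turn limit minus a safety
        reserve, plus a small slice of whatever bank is left."""
        return max(0.0, self.act_timeout - reserve + spend_fraction * max(self.bank, 0.0))
```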
My agent is crushing 1v1s but almost never winning the 4p
sameeee
are you using RL with self play?
What's yr algorithm
like he'd tell you 😆
He gets it
How do people compute the angle from a moving planet to a static one, static to moving, and even moving to moving?
Precisely
I've explored a bunch of approaches. In order of complexity:
- Just simulate hundreds of launches in different angles and see what they hit
- Draw straight lines from future positions of the target to current position of the origin - if they have the right length for the fleet speed, calculate angle from origin
- Model the fleet as an expanding circle from the origin, and solve for tangency of expanding circle to target circle
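The first approach (simulate many launch angles and see what they hit) is the easiest to get right. A sketch, assuming straight-line fleets at constant `speed`, checks only at whole turns, and no sun collision; `target_at(t)` is a hypothetical callback returning the target centre `t` turns from now, so the same code covers static and orbiting targets.

```python
import math


def sweep_launch_angles(origin, target_at, target_radius, speed, n_angles=360, max_turns=60):
    """Simulate n_angles straight launches; return (angle, turns) of the earliest hit, or None."""
    ox, oy = origin
    best = None
    for k in range(n_angles):
        angle = 2 * math.pi * k / n_angles
        dx, dy = math.cos(angle), math.sin(angle)
        for t in range(1, max_turns + 1):
            if best is not None and t >= best[1]:
                break  # can't beat the best hit found so far
            tx, ty = target_at(t)
            fx, fy = ox + speed * t * dx, oy + speed * t * dy
            if math.hypot(fx - tx, fy - ty) <= target_radius:
                best = (angle, t)
                break
    return best
```

The cost is `n_angles * max_turns` position checks per candidate pair, which is why the closed-form approaches become attractive once many pairs are evaluated each turn.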
Well what about early game logic
Like im supposed to ideally find nearest high producing planets but i feel like im not expansive enough
I use RL
Hey, I wanted to ask: do bigger bot systems cause more heuristic issues? Is a smaller bot better for Orbit Wars?
I thought of this
So basically I take the parametric equations of the planet's orbit
Then I find the new angle after a guess of 10 turns
Then I get the distance
Divide by speed to get the actual time, aka number of turns
Check if the difference between my guess and the actual is close to 0
If so, I can conflate the two
And grab the x and y I had computed for my guess
It's like epsilon-delta proofs
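The guess-and-correct loop above is a fixed-point iteration on the travel time, t = dist(target at t) / speed. A minimal sketch, assuming the same hypothetical circular-orbit model (centre (50, 50), constant `omega` in radians per turn) and a straight-line fleet at `shipSpeed`:

```python
import math

CENTER = (50.0, 50.0)  # assumption: planets orbit the centre of the 100x100 board
SHIP_SPEED = 6.0       # shipSpeed from the config


def orbit_pos(radius, theta0, omega, t):
    """Position of an orbiting planet t turns from now (circular orbit assumed)."""
    ang = theta0 + omega * t
    return CENTER[0] + radius * math.cos(ang), CENTER[1] + radius * math.sin(ang)


def intercept_fixed_point(origin, radius, theta0, omega, guess=10.0, eps=1e-6, max_iter=100):
    """Iterate t <- dist(origin, target(t)) / speed until the guess stops changing.
    Returns (angle, t) or None if it fails to converge."""
    ox, oy = origin
    t = guess
    for _ in range(max_iter):
        tx, ty = orbit_pos(radius, theta0, omega, t)           # target at the guessed time
        t_new = math.hypot(tx - ox, ty - oy) / SHIP_SPEED       # time the fleet actually needs
        if abs(t_new - t) < eps:                                # guess and actual agree
            return math.atan2(ty - oy, tx - ox), t_new          # reuse the x, y from the guess
        t = t_new
    return None
```

The iteration is guaranteed to converge when the planet's orbital speed `radius * omega` is below the fleet speed, since the update then shrinks the error each step. Times come out fractional here, so an agent would still need to round to whole turns.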
hey, do you know if we can run our models on a GPU?
or maybe even a TPU
or do they have to be CPU-only?
cpu, less than 2 (slow) cores allocated and 1 second per turn time limit 😢
unless your model is weirdly big, you should still be able to do at least 1 forward pass per turn, but search is difficult
yikes…
no kernel launch overhead I guess…
why doesn't obs expose num_agents to agents?
hi, I'm new to agent competitions on Kaggle
are the environment configurations fixed as default?
i mean this
| Parameter | Default | Description |
| --- | --- | --- |
| episodeSteps | 500 | Maximum number of turns |
| actTimeout | 1 | Seconds per turn |
| shipSpeed | 6.0 | Maximum fleet speed |
| sunRadius | 10.0 | Radius of the sun |
| boardSize | 100.0 | Board dimensions |
| cometSpeed | 4.0 | Comet speed (units/turn) |
Yes, the constants are fixed for the competition
ah thanks, I thought since they were introduced as "environment variables" they might be subject to change per episode
@misty oyster are you intending to make the fix suggested in this topic? https://www.kaggle.com/competitions/orbit-wars/discussion/702758 so that numba can be used?
hi, in the combat section it says
When one or more fleets collide with a planet (either by flying into it or being swept by a moving planet), combat is resolved:
All arriving fleets are grouped by owner. Ships from the same owner are summed.
The largest attacking force fights the second largest. The difference in ships survives.
If there is a surviving attacker:
If the attacker is the same owner as the planet, the surviving ships are added to the garrison.
If the attacker is a different owner, the surviving ships fight the garrison. If the attackers exceed the garrison, the planet changes ownership and the garrison becomes the surplus.
If two attackers tie, all attacking ships are destroyed (no survivors).
so if there are 3 arriving fleets on the same turn (possible with 4 players), what happens to the fleet with the fewest ships? Is it instantly destroyed and never considered?
Hello,
Could you please advise how learned (updated) weights are expected to be stored between episodes? Are we supposed to train the model locally first, and then, once it is uploaded, it no longer continues learning?
yes
Pretty much
So the process is:
Combine arriving fleets by owner.
Sort owners by total arriving ships.
Compare only the largest and second-largest totals.
Ignore all lower-ranked owners.
If one owner survives, that survivor then reinforces or attacks the planet.
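The process above can be written as a small resolution function. This is a sketch of the rules as quoted in the thread, not the environment's code; the function name, the `(owner, ships)` tuple format, and what happens when survivors exactly equal the garrison (planet kept with 0 ships) are assumptions.

```python
from collections import defaultdict


def resolve_combat(planet_owner, garrison, arrivals):
    """arrivals: list of (owner, ships) landing this turn.
    Returns (new_owner, new_garrison) following the rules quoted above."""
    totals = defaultdict(int)
    for owner, ships in arrivals:  # combine arriving fleets by owner
        totals[owner] += ships
    if not totals:
        return planet_owner, garrison
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        survivor, surviving = ranked[0]
    else:
        (top_owner, top), (_, second) = ranked[0], ranked[1]
        if top == second:  # tie: all attacking ships are destroyed
            return planet_owner, garrison
        # Only the top two fight; lower-ranked owners are simply ignored.
        survivor, surviving = top_owner, top - second
    if survivor == planet_owner:  # reinforcement
        return planet_owner, garrison + surviving
    if surviving > garrison:  # capture: the surplus becomes the new garrison
        return survivor, surviving - garrison
    return planet_owner, garrison - surviving  # assumption: an exact tie leaves 0 ships
```

So with fleets of 30, 12 and 5 arriving at a 10-ship planet, the 5-ship fleet is ignored, the 30 beats the 12 leaving 18, and the planet flips with a garrison of 8.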
Anyone else having trouble getting scores above 900? I was able to get into the 800s with a somewhat simple agent, but none of the changes I've made since have gotten me much above 900.
Are you doing heuristics or rl?
I was unable to push higher than 900 with rule based stuff
I haven't tried RL yet; I'm trying to see how well I can do with rule-based. I thought I saw some people saying they were getting up to 1200 with rule-based, but that was earlier in the competition. I think scores have inflated a bit since then.
The leaderboard does move up, but don't be discouraged: your rule-based agent will be invaluable when you move on to self-play PPO or something else, as a good opponent to collect samples from
hey, how good are you guys' RL agents at aiming if you're giving them free rein over choosing angles?
you should give them free rein over choosing planets, there is no point in choosing angles
oh yeah ik
i just gave free rein over angles to see what would happen
it’s somehow not entirely terrible
If you have high throughput it'll learn how to aim over time, but you should be thinking in terms of action space
Planet * angle * ships is incredibly large, especially because angle is a float here
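Thinking in terms of action space usually means replacing the float angle with a choice of target planet and bucketing the ship count, then letting an aiming routine compute the actual angle. A hypothetical discretization (the fraction buckets and the no-op at index 0 are illustrative choices, not from the thread):

```python
from itertools import product

SHIP_FRACTIONS = (0.25, 0.5, 1.0)  # hypothetical buckets of the source's ships to send


def build_action_table(num_planets, fractions=SHIP_FRACTIONS):
    """Enumerate discrete (source, target, fraction) actions with target != source,
    plus a no-op at index 0, so a policy picks an index instead of a float angle."""
    actions = [None]  # index 0 = do nothing
    for src, dst, frac in product(range(num_planets), range(num_planets), fractions):
        if src != dst:
            actions.append((src, dst, frac))
    return actions
```

The table has 1 + n(n - 1)k entries, e.g. 1141 for 20 planets and 3 buckets: large, but finite, unlike a continuous angle.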
yeah it's basically infinite...
was just kinda messing around to see if it will learn to aim
as it turns out it does
kinda