Eskiv is a 2016 game: you move a square, dodge balls, grab a new square each tick — every pickup spawns another ball, so the field fills up until the agent can’t find a safe path home. Ten thousand games, one brute-force agent with lookahead=100, three plates of how it behaves.
Grey square is the player. Blue circles are enemies — they bounce, accelerate, and there are more of them every second. Light grey squares are pickups worth one point each. The agent searches over reachable pickup targets and picks the one whose safe-neighbourhood survives the next hundred ticks of simulation. When no such target exists, the game ends.
The engine was rewritten headless — vectorised enemies, no pygame in the hot loop — so ten thousand 5-minute games finish in an afternoon. Every frame of every game is saved. Three questions, chosen because I was curious and the answers weren’t obvious from watching one game:
Straight-line distance divided by actual path walked, per pickup. 1.0 means the agent walked a straight line. Bucketed by how many balls were on the field at the time.
With an empty field the agent walks at ~76% of the straight line — not 100% because it’s already dodging a few enemies. By the time there are 60+ balls it’s down to 40%, and at 100+ it’s taking routes more than four times longer than the crow flies. The fall-off is smooth and roughly linear from ~20 balls onward. Not surprising, but this is the shape of the concession.
Player-position density by score bucket. Log-smoothed, 16.4 million samples. Auto-scrubs through the game’s life.
Up to about 200 points the agent hugs the centre — maximum optionality, minimum wall risk. Somewhere between 300 and 400 it flips: the centre becomes so ball-dense that the only safe pockets are against the edges. By the time scores hit 500 it’s spending most of its steps pinned to a single wall. It’s not a strategy the agent was told about — it’s where an agent that only minimises immediate risk ends up.
Final score per game. Brute agent, lookahead=100, small field.
The distribution is remarkably tight — stddev 58, p10–p90 range of ~310–445. More striking: not a single game ended by getting hit. Every death was a trap: the search raised DeathError because no target had a safe enough 100-step future. The agent dodges perfectly right up until it corners itself — which is a very specific failure mode of conservative search.