about

draft

I'm a developer by trade and I've spent the last few years working with AI, understanding its capabilities, integrating it into my workflows, building products on top of LLMs and learning on the job. Then I read The Infinity Machine and wanted to go deeper.

Inspired by the story of AlphaGo, and armed with a Mac Mini and a little bit of free time (I have no free time, but I am sleeping even less nowadays), I set off to build my own (much smaller) version of it. I will not conquer Go, but Connect 4 seemed well within reach.

01

hardware

I don't have a crazy set-up. Turns out you don't need one. My day-to-day development happens on an M1 MacBook Pro. The longer self-play (we'll get to this) and training runs go on a Mac Mini M4 Pro with 64 GB of unified memory, sat in the corner of my home office. Both use PyTorch's MPS backend. Sadly, no NVIDIA cards.

The website is pretty basic, mostly vibe coded (can you tell?), and hosted on Netlify. The API runs on Fly.io (2 CPUs that wake on demand, which is why your first move sometimes takes a while). This is truly a hobby-hardware project. I might run some training in the cloud for later games, but we'll definitely aim to reach the limits of the current set-up before that.

02

architecture

Policy/value residual network in the AlphaZero style, just much smaller. A 6×7 board goes in; a stack of residual blocks feeds two heads. The policy head predicts a probability distribution over the 7 columns; the value head predicts who's winning from the current position. The exact size has changed over the project; see the later training section for the most recent shape.
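For illustration, here is one way a 6×7 position could be turned into network input. Later on I describe the state as two 6×7 binary planes; this sketch assumes those planes are drawn from the point of view of the player to move (their stones first, then the opponent's), which is a common AlphaZero-style choice but an assumption here. The `encode` helper and the 0/1/2 cell codes are hypothetical names for this sketch.

```python
from typing import List

ROWS, COLS = 6, 7
EMPTY, P1, P2 = 0, 1, 2  # hypothetical cell codes for this sketch

Board = List[List[int]]          # board[row][col], row 0 is the top row
Planes = List[List[List[int]]]   # [plane][row][col]


def encode(board: Board, to_move: int) -> Planes:
    """Return two 6x7 binary planes: [side-to-move stones, opponent stones]."""
    opponent = P2 if to_move == P1 else P1
    mine = [[int(board[r][c] == to_move) for c in range(COLS)] for r in range(ROWS)]
    theirs = [[int(board[r][c] == opponent) for c in range(COLS)] for r in range(ROWS)]
    return [mine, theirs]


# Usage: P1 has opened in the centre column and it is now P2's turn.
board = [[EMPTY] * COLS for _ in range(ROWS)]
board[ROWS - 1][3] = P1
planes = encode(board, to_move=P2)
```

Encoding from the mover's perspective means the value head always answers "is the side to move winning?", so a single network can play either colour.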

At play time the model doesn't move straight from the policy output. Each turn it runs MCTS (a tree search guided by the network's predictions), and the number of MCTS simulations matters as much as which checkpoint is loaded. Hard and expert in the table below are literally the same network; the difference is how long it thinks.
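Here's a minimal sketch of the selection step, assuming the PUCT rule from the AlphaZero papers: each column's score is its mean value Q plus an exploration bonus U that is proportional to the policy prior and shrinks as that column gets visited. The `Node` class, `c_puct=1.5` and the helper names are hypothetical; expansion and backup are left out, and the real search may use different constants.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    prior: float                # P(s, a): policy-head probability for this move
    visits: int = 0             # N(s, a)
    value_sum: float = 0.0      # W(s, a), from the perspective of the player choosing this move
    children: Dict[int, "Node"] = field(default_factory=dict)  # column -> child node

    def q(self) -> float:
        # Mean value Q(s, a); unvisited moves start at 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.5) -> int:
    """Return the column maximising Q + U (the PUCT rule)."""
    total = sum(child.visits for child in node.children.values())
    sqrt_total = math.sqrt(max(1, total))  # max(1, ...) lets the priors decide the very first visit

    def puct(col: int) -> float:
        child = node.children[col]
        u = c_puct * child.prior * sqrt_total / (1 + child.visits)
        return child.q() + u

    return max(node.children, key=puct)


def best_move(root: Node) -> int:
    """Once the simulation budget is spent, play the most-visited column."""
    return max(root.children, key=lambda col: root.children[col].visits)
```

The simulation count is just how many times the select/expand/evaluate/backup cycle runs before `best_move` is called, which is how one checkpoint can back two different tiers.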

03

early training

The first hundred-something iterations were really about getting the training loop and measurement discipline into shape. The early runs were deliberately small: fewer self-play games, shallower searches, and simple ways of deciding whether a checkpoint looked better. That gets you a network that plays Connect 4. It does not get you a network you can confidently say is better than yesterday's network.

The first more serious selection rule was still simple: score candidates against depth-2 minimax over a fixed opening suite, using (wins + 0.5 × draws) / games. By checkpoint 108 that score was 0.83. Looked great on the dashboard. But it was still a comfort metric. Beating a shallow search is not the same as beating your own previous best neural net.
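Here's a minimal sketch of that early yardstick: a fixed-depth minimax opponent plus the (wins + 0.5 × draws) / games score. The board layout, the helper names, and the win/loss-only evaluation with a centre tie-break are my assumptions for illustration, not necessarily how the real baseline evaluates positions.

```python
from typing import List

ROWS, COLS = 6, 7
Board = List[List[int]]  # board[row][col]; 0 empty, 1 or 2 a player's disc; row 0 is the top


def legal_moves(board: Board) -> List[int]:
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board: Board, col: int, player: int) -> Board:
    """Return a copy of board with player's disc dropped into col."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError(f"column {col} is full")


def has_four(board: Board, player: int) -> bool:
    """True if player has four in a row horizontally, vertically or diagonally."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == player
                       for rr, cc in cells):
                    return True
    return False


def negamax(board: Board, player: int, depth: int) -> float:
    """Value for `player` (to move): +1 win, -1 loss within the horizon, else 0."""
    if has_four(board, 3 - player):  # the move that produced this board already won
        return -1.0
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return 0.0
    return max(-negamax(drop(board, c, player), 3 - player, depth - 1) for c in moves)


def minimax_move(board: Board, player: int, depth: int = 2) -> int:
    """Best column at a fixed depth; ties go to the column nearest the centre."""
    moves = sorted(legal_moves(board), key=lambda c: abs(c - COLS // 2))
    return max(moves, key=lambda c: -negamax(drop(board, c, player), 3 - player, depth - 1))


def score(wins: int, draws: int, losses: int) -> float:
    """(wins + 0.5 * draws) / games, so a draw is worth half a win."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games
```

In this sketch depth 2 means one move for each side: the baseline takes an immediate win and blocks an immediate loss, and sees nothing deeper. That's exactly why a high score against it says little about beating a strong network.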

The lesson wasn't that the code was broken. It was that a training run can look like progress while the thing measuring progress is too blunt to tell. Early training was valid, just under-instrumented. The question had to shift from "does it beat a baseline?" to "does it actually replace the current champion?"

04

later training

The fix was gated composite promotion. A candidate now has two bars to clear:

  • Win in head-to-head play against the current champion.
  • Avoid regressing against the minimax baseline.

The first test is the key: does this model beat the current champion? The second is a guardrail. A head-to-head test can promote a model that wins a narrow matchup while getting worse in broader positions. The minimax check catches some of those regressions and prevents them becoming a problem later on.
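Here's a minimal sketch of the two-bar promotion check. The head-to-head threshold (0.55) and the regression tolerance (0.02) are placeholder values I've picked for illustration, since the real ones aren't stated here, and the `GateResult`/`should_promote` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    h2h_score: float         # candidate's (wins + 0.5 * draws) / games against the champion
    minimax_score: float     # candidate's score against the minimax baseline
    champion_minimax: float  # current champion's score against the same baseline


def should_promote(r: GateResult,
                   h2h_threshold: float = 0.55,        # placeholder: must clearly beat the champion
                   regression_tolerance: float = 0.02  # placeholder: allowed slack vs the baseline
                   ) -> bool:
    # Bar 1: win the head-to-head match against the current champion.
    beats_champion = r.h2h_score > h2h_threshold
    # Bar 2 (guardrail): don't score noticeably worse than the champion against minimax.
    no_regression = r.minimax_score >= r.champion_minimax - regression_tolerance
    return beats_champion and no_regression
```

The head-to-head bar sits above 0.5 so that noise in a finite match is less likely to promote a candidate that is merely as good as the champion.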

Once selection got serious, the next constraint was the network itself. The first strong model was small: 4 residual blocks, 64 channels, about 301k parameters. Plenty for the basics. But it made silly moves. Don't get me wrong, it was strong. I never beat it, but it was far from elite.

So we widened it. Same shape (still 4 residual blocks), but 96 channels instead of 64, giving about 673k parameters (a little over 2× larger). The algorithm did not change. MCTS still does the search; the network still returns a policy and a value. There is just more room inside the network to encode threats, traps, and positional patterns. The training loop stayed the same. The model being trained got more expressive. The result was pretty good. Before we even started gated promotion, it was comfortably beating the smaller "expert level" model. This is the one live on the site today.
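As a rough sanity check on those sizes, here's a parameter count for one plausible AlphaZero-style layout. Everything beyond "4 residual blocks" and the channel counts is an assumption: a 3×3 stem from two input planes, two bias-free 3×3 convolutions plus BatchNorm per block, a 2-filter policy head, and a 1-filter value head with a hidden layer as wide as the tower.

```python
from typing import Optional


def conv_bn(in_ch: int, out_ch: int, k: int) -> int:
    """Bias-free k x k convolution followed by BatchNorm (a scale and a shift per channel)."""
    return k * k * in_ch * out_ch + 2 * out_ch


def linear(n_in: int, n_out: int) -> int:
    """Fully connected layer with bias."""
    return n_in * n_out + n_out


def count_params(channels: int, blocks: int = 4, in_planes: int = 2,
                 rows: int = 6, cols: int = 7, value_hidden: Optional[int] = None) -> int:
    cells = rows * cols
    hidden = channels if value_hidden is None else value_hidden
    stem = conv_bn(in_planes, channels, 3)
    tower = blocks * 2 * conv_bn(channels, channels, 3)         # two convs per residual block
    policy = conv_bn(channels, 2, 1) + linear(2 * cells, cols)  # logits over the 7 columns
    value = conv_bn(channels, 1, 1) + linear(cells, hidden) + linear(hidden, 1)  # one scalar out
    return stem + tower + policy + value


print(count_params(64))  # 300826
print(count_params(96))  # 672122
```

Under these assumptions the totals land at 300,826 and 672,122, in the same ballpark as the quoted ~301k and ~673k, though the real layout surely differs in details. The useful takeaway is the scaling: the tower's 3×3 convolutions dominate and grow with the square of the channel count, so going from 64 to 96 channels multiplies them by (96/64)² = 2.25. That's why a network 1.5× wider ends up a little over 2× larger.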

05

model selection

Currently we have four difficulty tiers. I am still playing with this, but for now, each one is a specific (checkpoint, MCTS simulations) pair:

tier      checkpoint   MCTS sims
easy      40           50
medium    68           50
hard      108          50
expert    108          200
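The table maps naturally onto a small config. This is a hypothetical sketch (the `Tier` type and `tier_for` helper are my names, not the site's), but it makes the point that the tiers are just data: hard and expert share a checkpoint and differ only in `mcts_sims`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    checkpoint: int  # which saved network to load
    mcts_sims: int   # search budget per move


TIERS = {
    "easy": Tier(checkpoint=40, mcts_sims=50),
    "medium": Tier(checkpoint=68, mcts_sims=50),
    "hard": Tier(checkpoint=108, mcts_sims=50),
    "expert": Tier(checkpoint=108, mcts_sims=200),  # same network as hard, 4x the search
}


def tier_for(name: str) -> Tier:
    try:
        return TIERS[name]
    except KeyError:
        raise ValueError(f"unknown tier {name!r}; expected one of {sorted(TIERS)}") from None
```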

The current settings come from 48-opening head-to-head matches between candidates. Each adjacent tier (medium over easy, hard over medium) wins about 75 to 80% of those games, which is the separation I was aiming for. Expert is the same network as hard, just with 4× the search budget (200 simulations instead of 50). That one is supposed to be unbeatable (by humans at least). None of us (me, my kids or my wife) has managed it yet. I am actually pretty bad at Connect 4, so I don't see myself being the one to take the throne.
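Here's a minimal sketch of a head-to-head match over a fixed opening suite. Playing each opening twice with colours swapped is my assumption (it evens out Connect 4's first-move advantage), and `play_game` is a hypothetical callback standing in for two MCTS players; all that's stated above is that the matches use 48 openings.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: play_game(first, second, opening) returns
# 1 if `first` wins, -1 if `second` wins, 0 for a draw.
PlayGame = Callable[[str, str, Sequence[int]], int]


def head_to_head(a: str, b: str, openings: List[Sequence[int]], play_game: PlayGame) -> float:
    """Score for `a` as (wins + 0.5 * draws) / games, each opening played with both colours."""
    wins = draws = games = 0
    for opening in openings:
        for first, second in ((a, b), (b, a)):
            result = play_game(first, second, opening)
            games += 1
            if result == 0:
                draws += 1
            elif (result == 1) == (first == a):  # `a` won, whichever colour it had
                wins += 1
    return (wins + 0.5 * draws) / games
```

With colours swapped, a player that only wins when it moves first scores exactly 0.5, so anything clearly above that reflects strength rather than move order.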

I don't think I have the balance right just yet; I will continue to tweak. But the goal is not really to build a usable arcade; there are clearly much better options out there. My kids prefer the Nintendo Switch. I would like to optimise the levels further, but the current focus is on how far we can push expert mode (and more games, see below...).

06

next: draughts

I want to try some imperfect-information games eventually; the training is totally different and will likely surface some real-world value. But first, I plan to step up the complexity with draughts.

Connect 4 is, game-theory-wise, gentle. Branching factor of 7. Shallow tree. State fits in two 6×7 binary planes. Draughts is rougher in every direction: bigger board, captures that chain across the board, longer games, four piece types to encode.
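To make "four piece types" concrete, here's a sketch extending the Connect 4 planes idea to draughts: one binary plane per piece type, from the point of view of the side to move. The 8×8 board (English draughts) and the dictionary-of-pieces input are assumptions for illustration; international draughts would use a 10×10 board.

```python
from typing import Dict, List, Tuple

SIZE = 8  # assumption: English draughts; international draughts is 10x10

Square = Tuple[int, int]  # (row, col)
Piece = Tuple[str, str]   # (colour, kind), where kind is "man" or "king"


def encode_draughts(pieces: Dict[Square, Piece], to_move: str) -> List[List[List[int]]]:
    """Four binary planes: own men, own kings, opponent men, opponent kings."""
    planes = [[[0] * SIZE for _ in range(SIZE)] for _ in range(4)]
    for (row, col), (colour, kind) in pieces.items():
        index = (0 if colour == to_move else 2) + (1 if kind == "king" else 0)
        planes[index][row][col] = 1
    return planes


position = {(5, 0): ("black", "man"), (0, 1): ("white", "king")}
planes = encode_draughts(position, to_move="black")
```

That's 4 × 64 = 256 input values against Connect 4's 2 × 42 = 84, and it covers only the position; encoding moves, especially captures that chain, is a separate and harder problem.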

The training loop should transfer (the network doesn't care what the game is, as long as you can encode the rules sufficiently). My Mac Mini might not like it, though. A self-play iteration on draughts is probably 10× more expensive (a figure plucked from thin air) than the Connect 4 equivalent on the same hardware.