The plateau

3 min read
Winding road through the Chilean Andes

Day 60 / 60

60 days ago I set out to have a study done by Don Nelson submitted to the CEN so we could run real tests. I didn't make it ☹️.

I've been thinking a lot about why. I think I hit the plateau (though I'm already coming out of it).

The plateau is that state of a project where, after a lot of progress, you simply stop moving forward. Some say it's necessary. I think it's just part of the process. Even though I was working every day, I couldn't feel that I was making progress.

I started to doubt things I thought were settled: the business model, the current Don Nelson harness, whether to keep going with Spark and keep everything in a single agent, whether skills are the way, whether MCPs are worth it. The crazy part is that one of the few things I never doubted was the model: I still trust Google's.

And even though we're in the peak age of agents, I spend entire days researching and reading papers on how to build an agent that solves what I'm trying to solve. There's nothing written. It's not just state of the art: it's a genuinely hard problem.

The weeks I didn't publish on the blog weren't because I wasn't working — they were because I wasn't making progress. Or at least that's how it felt. But looking back, there were improvements:

  • Important improvements in Spark and Don Nelson.
  • National and international interest in Don Nelson.
  • First and second public study by Don Nelson.
  • A paper in progress.

What I understood

These days I realized something: I'm teaching the machine to do studies. And that shouldn't be my job. My job should be to teach the machine to learn on its own how to do studies.

The difference isn't subtle. If tomorrow I switch grids — say I want to run a study on the Belgian grid — does anything Don Nelson learned about the Chilean grid help? Or do I have to put him to simulate from scratch and learn this new grid from zero?

A human would be able to adapt. Obviously they'd have to spend a couple of hours understanding the new grid and adjusting to the other country's standards, but their judgment as a "Studies Engineer" would carry over.

That's what had me stuck these weeks: that doubt.

It's not that I'm technically stuck. It's that I hit the limit of what I can build with the current approach.

For weeks I've been reading and experimenting with reinforcement learning and self-play. The intuition is that if Don Nelson is going to generalize from the Chilean grid to any other grid, it can't be because I wrote the rules — he has to have discovered them himself, simulating against himself. I'm still halfway through understanding whether this is viable or whether I'm falling in love with a pretty idea.

Concretely: I'm testing whether Don Nelson can understand a grid by running hundreds of thousands of simulations, without me writing the rules, and then carry that knowledge to any other grid — Chilean, Belgian, Peruvian — without starting from scratch.

Thanks to my friend Frau, who helped me understand the plateau and, somehow, get out of it.


60 / 100

I'm extending the challenge to 100 days. I expect that in the next 40 days I'll have a study submitted to the CEN. This time, for real.

I'll keep documenting the next 40 days here. If you've worked with reinforcement learning applied to physical systems — or have an intuition for why this shouldn't work — reach out.

Subscribe to the blog