Operations Procedures

Day 70 / 100

Today I learned what an Operations Procedure (GM — Guía de Maniobra) is.

As someone who's only been deep in this space for 5 months, there are things that are obvious to everyone else but pure revelations to me.

What an operations procedure is

A technical document that describes, step by step, the sequence of operations to execute on the equipment of an electrical installation (substation, line, plant) to move it from one operating state to another safely and in order.

Open a breaker before a disconnector. Verify the absence of voltage. Measure. In order, no ambiguity.

It's the recipe that keeps someone from dying or half the country from losing power.

GM and PGP

Many GM live inside PGP.

Just like EFP, ECAP, ECC and every study I analyzed last week. Same flow. Same iterative process between the owner and the CEN.

Same public data available to anyone who wants to look.

GM universe snapshot

I pulled the full PGP and filtered by requirement = GM. Here's what's there today:

806 projects have submitted or need to submit a GM. 32% of PGP.
585 approved GM. 12 in progress. 209 not started.
376 unique owners have submitted at least one GM.
Approval rate: 73%.
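
A minimal sketch of that filter-and-count step, for anyone who wants to reproduce it. The field names (`requirement`, `status`) and the toy rows are my assumptions, not the real PGP export format; the last two lines just sanity-check the counts quoted above.

```python
from collections import Counter

def summarize(rows):
    """Count GM rows by status and compute the approval rate."""
    gm = [r for r in rows if r["requirement"] == "GM"]
    counts = Counter(r["status"] for r in gm)
    rate = counts["approved"] / len(gm) if gm else 0.0
    return counts, rate

# Toy rows standing in for a PGP export (field names and values are assumptions).
rows = [
    {"project": 101, "requirement": "GM", "status": "approved"},
    {"project": 101, "requirement": "ECAP", "status": "approved"},
    {"project": 102, "requirement": "GM", "status": "not_started"},
    {"project": 103, "requirement": "GM", "status": "approved"},
]
counts, rate = summarize(rows)
print(counts["approved"], f"{rate:.0%}")  # 2 67%

# Sanity check on the counts quoted above: approved + in progress + not started.
total = 585 + 12 + 209
print(total, f"{585 / total:.0%}")  # 806 73%
```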

And the timing:

Median 217 days between first draft and approval.
Average 257 days. The slow tail drags the average up again.
Only 43% approved on the first try.
1.86 average iterations with the CEN.

That last one matters: just like with studies, GM have iterations.

Who submits GM

376 unique owners. But the distribution is very unequal: the top 15 hold most of the volume. Transelec alone has 369 GM (266 approved). CGE Transmisión next, with 199.

The interesting bit isn't volume — it's quality: how few rounds each company needs to get approved.

What stood out

A GM behaves more like an ECAP than an ECC.

It's interpretive. There's no single "correct" sequence. CEN judgment kicks in. That's why only 43% pass on the first try.

It looks like a calculation, but it isn't.

If you skim a GM, you'd think it's a mechanical checklist. Open, disconnect, ground. Clear rules. Low rejection rate.

But 57% bounce back. And that changes everything. It's not a problem of following an order. It's a problem of deciding which order given a system with specific conditions: neighboring equipment, outages, operational margins, SCADA comms.

That makes it a judgment problem, not a procedural one.

The pipeline is accelerating.

2020: 36. 2021: 90. 2024: 107. 2025: 116. 2026: 54 in 5 months → projection ~130. More projects, more GM, more CEN reviewers doing the same iterative work.
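
The ~130 projection is a plain linear extrapolation. A quick sketch, assuming the monthly rate seen so far in 2026 holds for the rest of the year (which it may not):

```python
# GM counted so far in 2026 and months elapsed, as quoted above.
gm_so_far, months_elapsed = 54, 5

# Naive linear extrapolation: assumes a constant monthly rate.
projection = gm_so_far / months_elapsed * 12
print(round(projection))  # 130
```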

The bottleneck shows up in the same place again.

Just like with ECAP, the median (217 days) and average (257 days) tell different stories. Half clear in about 7 months. The slow tail pushes the average to roughly 8.5 months.

The ones that go well, go fast. The ones that get stuck, get stuck hard.
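
To make the median-versus-average gap concrete, here is a small sketch. The first part converts the two quoted figures to months. The second uses synthetic durations (invented for illustration, not PGP data) to show how a couple of stuck cases pull the mean away from the median.

```python
from statistics import mean, median

# Figures quoted in the post, converted to months (average month length).
DAYS_PER_MONTH = 365.25 / 12
print(round(217 / DAYS_PER_MONTH, 1))  # 7.1
print(round(257 / DAYS_PER_MONTH, 1))  # 8.4

# Synthetic durations in days (NOT real data): most cases cluster,
# two stuck ones form a long right tail.
durations = [150, 180, 200, 210, 220, 230, 250, 600, 900]
print(median(durations))       # 220
print(round(mean(durations)))  # 327
```

The median barely notices the two stuck cases; the mean moves by more than 100 days.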

Can Don Nelson write a GM?

When I started writing this post, I thought "probably".

After writing the post, my answer is yes.

What changed was looking at the observations.

What I thought before looking at the observations

I'll be honest: I assumed the problem was pure engineering judgment.

My reasoning went like this. A GM is structured text. Numbered steps, a small set of verbs (open, close, verify, ground, switch), a small set of subjects (disconnectors, breakers, transformers, bays). A modern LLM writes that in its sleep.

But 57% bounce back. If it were just writing, the first-try approval rate would be 80-90%. It's 43%.

So it had to be something else. Decision under context: neighboring equipment, outages, operational margins, SCADA comms. Senior engineer judgment about the whole system.

Exactly the hard-to-automate part.

But I got a surprise when I read the observations carefully.

What I found in the observations

I pulled 16 DUOs (Unique Observation Documents) from 7 iterated GM. 8 with real observations — the other 8 are the final-approval DUOs.

I labeled every observation. Grouped them. A pattern showed up that I wasn't expecting.

CEN observations cluster into 8 categories. The first — the most frequent — is the opposite of what I thought.

1. Omitted safety verifications — 4 of 8 cases

The owner forgets to include the verification of some piece of equipment before operating:

"Add: S/E Nueva Pozo Almonte verify 89J1-3 line disconnector open." (#2316)

"S/E Graneros open or verify 52CS1 open." (#1117)

"Grounding disconnectors of new positions 89ES1-1T through 89E25-1T must be verified open." (#5785)

"State that the installations are grounding-free and ready to be energized." (#2493)

2. Sequence errors

"Operation #17 duplicates operation #7." (#1117)

"Operation #23 must come earlier, propose adding it as operation #9." (#1117)

3. Wrong description of electrical state

"Operation #56 closing is not energized — installations are fully de-energized. Fix." (#1117)

4. Missing or illegible one-line diagram

"Include a simplified one-line diagram... legible and properly sized." (#2515 iter 1)

"Include a simplified one-line diagram that is legible." (#2515 iter 2 — same problem, not fixed WTF)

5. Inconsistency between one-line and procedure

"In the one-line the grounding disconnectors are 89H01-1T through 89H15-1T but the procedure calls them 89H01-T, 89H14-1T and 89H15-1T." (#5785)

"Item 9 says 52H01 through 52H17, should say 52H01 through 52H15." (#5785)

6. Insufficient equipment identification

"Whenever an operation is verified or confirmed, the name of the verified equipment must be stated." (#2493)

7. Failure to address prior observations

#2316 iter 2 re-raises the same observation as iter 1: verify 89J1-3 open. The owner responded but didn't fix it.

That explains why the average iteration count climbs so much for some owners.
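
Re-raised observations can be spotted mechanically by comparing the text of consecutive iterations. A rough sketch using word overlap (Jaccard similarity) on the two #2515 observations quoted in category 4; the tokenizer and the 0.5 threshold are arbitrary choices of mine, not anything the CEN uses.

```python
import re

def tokens(text):
    """Lowercase word set; keeps hyphenated terms like 'one-line' together."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def jaccard(a, b):
    """Shared words divided by total distinct words (0.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

iter_1 = "Include a simplified one-line diagram... legible and properly sized."
iter_2 = "Include a simplified one-line diagram that is legible."

similarity = jaccard(iter_1, iter_2)
print(round(similarity, 2))  # 0.55
print(similarity >= 0.5)     # True: flag as a likely repeat
```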

8. Administrative iteration

"Requesting party needs an iteration to upload an updated document." (#3561)

The iteration doesn't come from a technical observation, it comes from the owner needing to re-upload.

What this changes

70% of technical observations are detail errors, not conceptual errors.

Forgetting a disconnector. Wrong code (52H17 vs 52H15). Not naming verified equipment. One-line inconsistent with the operations list. Not fixing what you were already told.

The underlying sequence is usually fine.
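
The sequence problems that do show up, like the duplicated operation flagged in #1117, are also the mechanical kind. A sketch of a duplicate check, assuming each operation can be reduced to an (action, device) pair; that representation and the sample operations are hypothetical.

```python
def find_duplicates(operations):
    """Map each repeated (action, device) pair to the operation numbers where it appears."""
    seen = {}
    for number, op in enumerate(operations, start=1):
        seen.setdefault(op, []).append(number)
    return {op: nums for op, nums in seen.items() if len(nums) > 1}

# Hypothetical operations list; numbering starts at 1 as in a GM.
ops = [("open", "52H01"), ("verify_open", "89H01-1T"), ("open", "52H01")]
print(find_duplicates(ops))  # {('open', '52H01'): [1, 3]}
```

A repeat is not always an error (a breaker can legitimately be opened twice in a long procedure), so this only flags candidates for a human to look at.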

This blows up my original hypothesis. I thought the bottleneck was engineering judgment about the system. Turns out in most cases the bottleneck is attention to detail: cross-checks between documents, completeness of verifications, consistency of naming.

Exactly what an agent does well.

A human gets bored cross-checking 60 steps to verify every named device also appears in the one-line. Don Nelson doesn't get bored.
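
That cross-check is a few lines of set arithmetic. A sketch using the #5785 example from category 5. The code pattern (digits, letters, running number, optional suffix) is my guess from the codes quoted in this post, and the `procedure` set is my literal reading of the CEN observation.

```python
import re

# Assumed code shape: digits + letters + running number + optional suffix.
CODE = re.compile(r"^(\d+[A-Z]+)(\d+)(.*)$")

def expand_range(first, last):
    """Expand '89H01-1T' .. '89H15-1T' into every code in between (inclusive)."""
    a, b = CODE.match(first), CODE.match(last)
    if not a or not b or (a.group(1), a.group(3)) != (b.group(1), b.group(3)):
        raise ValueError(f"not a simple range: {first} .. {last}")
    width = len(a.group(2))
    return [f"{a.group(1)}{i:0{width}d}{a.group(3)}"
            for i in range(int(a.group(2)), int(b.group(2)) + 1)]

one_line = set(expand_range("89H01-1T", "89H15-1T"))
procedure = {"89H01-T", "89H14-1T", "89H15-1T"}

print(sorted(procedure - one_line))  # ['89H01-T']  named in the GM, absent from the one-line
print(len(one_line - procedure))     # 13  in the one-line, never named in the GM
```

The same helper catches the breaker range from the second observation: expanding 52H01 through 52H17 and subtracting 52H01 through 52H15 leaves 52H16 and 52H17.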

A human forgets to verify disconnector 89J1-3 because they've been writing for 4 hours. Don Nelson doesn't get tired.
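
A sketch of what that check could look like. The rule is an assumption of mine for illustration: every device on a must-verify list needs a "verify open" step before the first energization. 52CS1 and 89J1-3 come from the observations above; 89J1-1 and "line J1" are invented.

```python
def missing_verifications(steps, must_verify_open):
    """Return devices lacking a 'verify_open' step before the first 'energize' step."""
    verified = set()
    for action, device in steps:
        if action == "energize":
            break  # anything verified after this point comes too late
        if action == "verify_open":
            verified.add(device)
    return sorted(set(must_verify_open) - verified)

# Hypothetical procedure as (action, device) pairs.
steps = [
    ("verify_open", "52CS1"),
    ("open", "89J1-1"),
    ("energize", "line J1"),
    ("verify_open", "89J1-3"),  # present, but after energization
]
print(missing_verifications(steps, ["52CS1", "89J1-3"]))  # ['89J1-3']
```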

A human typos: 52H17 instead of 52H15. A raw LLM could hallucinate codes too; Don Nelson, as a well-trained agent, wouldn't.

What I'm going to do

First step isn't writing GM. It's reviewing them.

I'm going to put Don Nelson to review GM before they get sent to the CEN. Predict the observations that would come. If it gets the 7 already labeled right, I have an eval set. If it passes the eval, I have a product.

It's the cheapest, fastest, most useful version of what I originally planned.

And it opens up something more interesting: if Don Nelson can predict the CEN's observations, it knows what a correct GM has to satisfy. And if it knows that, it knows how to write one.

The reviewer is the path to the writer.

And there's one more thing, and I think it's the most important: the same algorithm Don Nelson uses to do studies also works for GM.

Same loop. Pull the public data, label CEN observations, predict, compare against the real observation, iterate. The domain changes, the method doesn't.
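
The "compare" step of that loop can be as simple as precision and recall over observation categories. A minimal sketch: the category names are my shorthand for the list in this post, the `actual` set mirrors the categories #1117 appears under, and the `predicted` set is made up.

```python
def score_prediction(predicted, actual):
    """Precision and recall of predicted observation categories for one GM."""
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall

# Categories observed by the CEN vs. categories the agent predicted (invented).
actual = {"omitted_verification", "sequence_error", "wrong_state"}
predicted = {"omitted_verification", "sequence_error", "naming_inconsistency"}

precision, recall = score_prediction(predicted, actual)
print(round(precision, 2), round(recall, 2))  # 0.67 0.67
```

Recall is the number that matters most for a pre-submission reviewer: a missed observation costs another round with the CEN, while a false alarm only costs a second look.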

If it works for EFP and ECAP, it works for GM. And if it works for GM, it probably works for everything the CEN reviews iteratively.

What's next

Things I want to think about in my spare time:

What fields the agent needs to write a GM that I haven't considered yet.
How much GM content transfers between projects. If much of it does, the agent can learn it.
Whether the 8 observation categories hold up when I download the 200+ remaining DUOs, or whether new patterns emerge.

If you write GM or review them at the CEN and this experiment interests you, get in touch. I want to see how much of this holds up with more data.