LLM on Bradley Fidler

Wireword: Agent Control Words Should Be Hard to Misread

Tue, 21 Apr 2026 00:00:00 +0000

This is a research note for Wireword, a small tool I am building to lint LLM agent control words.

By control words, I mean short labels that can change what an agent does:

route names
tool names
prompt macro names
environment targets
approval targets
exact enum values the model must emit

The goal is narrow: make labels that control agent behavior harder to misread, miscopy, or misroute.

Of words and tokens being expensive

This started with caveman-style LLM output. The useful comparison is not really cavemen. It is telegraphese: compressed language for an expensive channel.

Western Union did not bill like an LLM API, but the pressure was similar. Ordinary domestic telegrams were billed by chargeable body word, usually with a ten-word minimum; address, signature, and date were free, while extra body words cost more.¹ A ten-word sentence from New York to Boston could cost 30 cents.²

That maps to LLM work in two basic ways:

Token cost: shorter turns are cheaper.
Context quality: shorter turns leave less low-information text in the conversation history.

The second point is not just aesthetic. Long histories are not used perfectly. Irrelevant text can distract the model or bury the useful constraint.

But compression has a failure mode. If compressed labels become too similar, the model has less redundancy to recover the intended control word.

Learning from telegraphy

I looked at other telegraph practices to see what might apply to LLM agents. Could Victorian engineers provide fresh insights for our changing world? No, except for one thing, sort of.

Most parallels are useful but general:

Telegraph practice	General pattern	LLM-agent version
`STOP` and spelled punctuation	delimiters	source/task boundaries
repeat-back	confirmation	human approval gates
service classes	priority and cost tiers	model routing / effort levels
codebooks	macros	prompt libraries
word-count checks	validation	output checks
operators	review and observability	linters / traces
private codes	substitution	PII masking

These are durable information-management practices. They are worth remembering, but they do not justify a new tool by themselves.

The more specific lead was codeword design.

Compression with redundancy

Commercial telegraph codebooks had to balance compression and recoverability. A codeword had to be short enough to save money, but distinct enough that a damaged word did not silently become another valid word.

E. L. Bentley described the rule directly: good codewords should differ by at least two letters. Then a one-letter mutilation produces an invalid codeword, not the wrong valid codeword.³

The ABC Code used the same principle. John McVey’s index quotes the 1920 sixth edition saying its five-letter codewords were built with at least a two-letter difference. The same note says the compilers considered Morse similarities and removed risky words.⁴

Useful rule:

Good compression leaves enough redundancy to detect mistakes.
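The two-letter-difference rule can be checked mechanically. The sketch below is a minimal illustration, assuming equal-length codewords over a plain A-Z alphabet; the codewords are invented for the example and do not come from Bentley or the ABC Code.

```python
from itertools import combinations

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_distance(codebook: list[str]) -> int:
    """Smallest pairwise letter difference in the codebook."""
    return min(hamming(a, b) for a, b in combinations(codebook, 2))

def silent_errors(codebook: list[str]) -> list[tuple[str, str]]:
    """Pairs (sent, received) where a one-letter mutilation is still valid."""
    valid = set(codebook)
    found = []
    for word in codebook:
        for i, original in enumerate(word):
            for letter in ALPHABET:
                if letter == original:
                    continue
                damaged = word[:i] + letter + word[i + 1:]
                if damaged in valid:
                    found.append((word, damaged))
    return found

# Hypothetical codewords, invented for illustration.
SAFE = ["BAKOL", "BEKUL", "DIMAT", "FOREN"]   # every pair differs by >= 2 letters
RISKY = ["BAKOL", "BAKOT", "DIMAT"]           # BAKOL/BAKOT differ by one letter

print(min_distance(SAFE), silent_errors(SAFE))
print(min_distance(RISKY), silent_errors(RISKY))
```

With a minimum difference of two, every one-letter mutilation lands outside the codebook and can be rejected. With a minimum difference of one, some mutilations arrive as a different valid word.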

The LLM agent version

This problem is not unique to LLMs. Similar issues appear in APIs, command-line flags, protocol enums, medication names, service names, and airport codes.

LLM agents make the problem newly common because they combine:

probabilistic language generation
exact symbolic control
natural-language prompts around short labels
tool calls and routes with real side effects

Example labels:

A1
AI
Al
prod
production
live
docs.api
doc.api
FACTCHECK_API
FACT_CHECK_API

These are not just strings. In an agent system, they may route work, call tools, select environments, expand macros, approve targets, or satisfy exact enum values.

The risk boundary is narrow. Similar labels matter when three conditions hold:

the label is visible to the model or copied through natural language
the model or a human can choose or emit the label
downstream code treats the label as an exact control input

A wrong valid label is worse than an invalid label. Invalid labels can fail validation. Wrong valid labels can pass validation and trigger the wrong action.
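A small sketch makes the distinction concrete. The route table and effects below are hypothetical, and the `outcome` helper is only an illustration of exact-match dispatch, not Wireword code.

```python
# Hypothetical route table: names and effects are invented for illustration.
ROUTES = {
    "docs.api": "read-only documentation lookup",
    "doc.api": "publish to external documentation site",
}

def outcome(intended: str, emitted: str) -> str:
    """Classify what exact-match dispatch does with an emitted label."""
    if emitted not in ROUTES:
        return "rejected"      # invalid label: validation catches it
    if emitted == intended:
        return "ok"
    return "misrouted"         # wrong valid label: validation passes

print(outcome("docs.api", "docs.api"))   # intended label
print(outcome("docs.api", "docs_api"))   # invalid label
print(outcome("docs.api", "doc.api"))    # wrong valid label
```

The misrouted case is the one validation cannot see.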

This matters less when routing is deterministic, internal IDs are hidden from the model, schemas constrain the choice, or a UI forces selection from canonical options.

So Wireword should not only ask whether two strings are similar. It should ask:

What kind of label is this?
Can the model emit it?
Does a parser require an exact match?
What happens if the wrong label is chosen?
Does it target production or another external system?

Generic check vs agent-aware check

Generic similarity check:

docs.api / doc.api
Reason: edit distance 1.

Agent-aware check:

CRITICAL docs.api / doc.api
Reason: route-name collision across different effects.
Risk: read-only route is one edit away from external-write route.
Fix: rename to ROUTE_DOCS_REVIEW and ROUTE_DOCS_PUBLISH.

Generic similarity check:

prod / production / live
Reason: related strings.

Agent-aware check:

CRITICAL prod / production / live
Reason: multiple production-like environment labels.
Risk: agent may choose an inconsistent deployment target.
Fix: use ENV_PRODUCTION as the only valid production label.

That is the product line: do not only lint strings. Lint control words by the action they can trigger.
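One way to picture an agent-aware check is a similarity test that also looks at label kind and effect. The sketch below is an assumption about shape only: the `Label` fields, the severity names, and the distance threshold are hypothetical, not Wireword's actual rules.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Label:
    name: str
    kind: str     # hypothetical categories, e.g. "route", "tool", "env"
    effect: str   # hypothetical effects, e.g. "read-only", "external-write"

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]

def lint(labels: list[Label]) -> list[tuple[str, str, str]]:
    """Flag same-kind labels one edit apart; escalate when effects differ."""
    findings = []
    for x, y in combinations(labels, 2):
        if x.kind != y.kind or edit_distance(x.name, y.name) > 1:
            continue
        severity = "CRITICAL" if x.effect != y.effect else "WARNING"
        findings.append((severity, x.name, y.name))
    return findings

labels = [
    Label("docs.api", "route", "read-only"),
    Label("doc.api", "route", "external-write"),
    Label("get_user", "tool", "read-only"),
    Label("get_users", "tool", "read-only"),
]
for finding in lint(labels):
    print(finding)
```

The string distance is the same for both pairs. Only the effect metadata separates a naming nuisance from a route that can write externally.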

Current prototype and V1 plan

The tool is Wireword. V1 should stay small.

The current prototype now checks both layers:

raw labels: visual confusables, edit-distance-one pairs, case-only differences, punctuation-only differences, plural/stem collisions, and production-like aliases
agent-aware labels: routes, tools, named agent handoffs, approval targets, macros, profiles, production-like environments, and exact enum values the model must emit

That is enough to test the shape of the idea. The repo now has a small validation corpus with safe, dangerous, and malformed configs, plus a narrow FastMCP source extractor for tool names. It is still not a full agent security scanner.
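Several of the raw-label checks reduce to comparing labels after normalization. The sketch below is one possible approach, not the prototype's implementation; the confusable map is a tiny hand-picked assumption, far smaller than a real confusables table.

```python
from collections import defaultdict

# Hand-picked confusables for illustration only; a real table is much larger.
CONFUSABLES = str.maketrans({"1": "l", "I": "l", "0": "O"})

def skeleton(label: str) -> str:
    """Collapse visual confusables, punctuation, and case into one key."""
    folded = label.translate(CONFUSABLES)  # fold before lowercasing so "I" is caught
    return "".join(ch for ch in folded if ch.isalnum()).lower()

def collisions(labels: list[str]) -> list[list[str]]:
    """Groups of distinct labels that share a skeleton."""
    groups = defaultdict(list)
    for label in labels:
        groups[skeleton(label)].append(label)
    return [group for group in groups.values() if len(group) > 1]

labels = ["A1", "AI", "Al", "FACTCHECK_API", "FACT_CHECK_API", "docs.api"]
print(collisions(labels))
```

Labels that differ by an inserted or deleted character, such as docs.api and doc.api, keep different skeletons, so an edit-distance pass is still needed alongside this one.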

The useful output is not just “these strings are similar.” It is “these strings are similar, the model can see or emit them, and confusing them could call the wrong tool, route work to the wrong place, or target the wrong environment.”

Representative targets:

MCP servers with model-visible tools
router or handoff agents
graph-based agent workflows
skill/plugin systems with named routes
exact enum outputs consumed by parsers

The repo should carry the detailed CLI examples, fixtures, and tests. This note only needs the argument.

What Wireword is not

Wireword is not:

an agent framework
a prompt framework
a general security scanner
a replacement for schemas or constrained decoding
a proof that LLMs confuse every similar label
necessary when labels are hidden behind deterministic routing, internal IDs, or strict UI selection

It is a narrow lint pass for labels that become model-visible or human-visible control inputs.

Conclusion

Telegraph codebooks might inspire useful linting for LLM agent control identifiers.

Sources

1. Nelson E. Ross, How to Write Telegrams Properly (1928), “How Tolls Are Computed” and “Punctuation Marks.” Ross explains domestic body-word billing, cable/radiogram address billing, and the rule that requested punctuation marks were counted and charged as words.
2. Western Union Telegraph Company, The Proposed Union of the Telegraph and Postal Systems (1869). Western Union gives the 1866 New York-to-Boston tariff as 30 cents for ten words, exclusive of address and signature.
3. E. L. Bentley, “Codes: Their Nature and Manipulation”, transcribed by John McVey. Bentley describes the two-letter-difference rule and explains that it prevents a one-letter mutilation from silently becoming another valid codeword.
4. John McVey, “A.B.C. Telegraphic Codes, seven editions 1873-1936”. The page quotes the 1920 sixth edition on five-letter codewords built with at least a two-letter difference and notes the code’s attention to Morse similarities.

Strachey's Principle: The Discipline That Makes Abstraction Work

Mon, 23 Feb 2026 00:00:00 +0000

In 1965, Christopher Strachey changed his mind about machine code.

He was building the General Purpose Macrogenerator — the GPM — at Cambridge to help write a compiler for the Combined Programming Language (a C precursor). The original plan was simple: mix machine code with macro calls where convenient. The GPM was designed to make this possible.

He ended up abandoning the mix entirely. For the CPL compiler, all machine code would be incorporated as macro calls — even sections called only once, where defining a macro added no economy at all. It had started as a way to save effort. It became a principle.

Sixty years later, programmers and writers are asking the same questions about LLMs that Strachey’s contemporaries were asking about macro systems. Whether the tools are legitimate. Whether work done through them counts.

Wheeler’s subroutine

The groundwork comes thirteen years earlier. In 1952, David Wheeler presented a short paper at the ACM national meeting: “The Use of Sub-Routines in Programmes.”¹ Wheeler had been part of the EDSAC team at Cambridge — the group that built the first stored-program computer to run a practical program.

His paper introduces the library subroutine as a unit of abstraction. A subroutine is self-contained, reusable, testable in isolation. You call it; control returns to the point after the call. From the outside it behaves like a single instruction, even though it may be dozens. Wheeler’s summary: “All complexities should — if possible — be buried out of sight.”

The asymmetry problem

By the early 1960s, macro-assemblers were standard. You define a macro; the assembler expands it into an instruction sequence before assembly. It extends the instruction set without modifying the hardware.

Strachey identified a structural problem with all of them. A conventional macro takes text as parameters (register names, addresses, literal values, arbitrary strings) and produces complete instructions as output. What goes in is a different kind of thing from what comes out; in Strachey’s terms, “the domain and range of these macro-functions do not overlap.”² This limits composition. You cannot build complex behavior by nesting simpler behaviors, because the output of one macro is not a valid input to another.

GPM eliminates this by making everything a character stream. Input is a stream of characters. Output is a stream of characters. A macro call is a pattern in the stream; its expansion is substituted back into the stream. That re-entry is the key step: if the expansion contains another macro call, that call gets expanded too. The system processes its own results. Because domain and range are now the same type, a macro call can appear anywhere a string can appear — as a parameter to another macro call, in the name position of a macro call, inside a macro’s own definition. Recursion, conditionals, and higher-order behavior follow directly from the unification. Strachey notes they “appeared in the wash.”
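The re-entry step is easier to see in a toy. The sketch below is not GPM: the bracket syntax, the `~n` parameter markers as used here, and the innermost-first evaluation order are simplifications chosen for illustration, and the macro names are hypothetical. It only shows that when expansions are substituted back into the same character stream and rescanned, nesting and calls in the name position need no special support.

```python
import re

# Innermost call: a bracket pair enclosing no other brackets.
CALL = re.compile(r"\[([^\[\]]*)\]")
PARAM = re.compile(r"~([1-9])")

def expand(stream: str, macros: dict[str, str], limit: int = 1000) -> str:
    """Expand calls in the stream, rescanning every expansion."""
    for _ in range(limit):
        match = CALL.search(stream)
        if match is None:
            return stream                      # no calls left
        name, *args = match.group(1).split(",")
        def fill(m: re.Match) -> str:
            i = int(m.group(1)) - 1
            return args[i] if i < len(args) else ""
        body = PARAM.sub(fill, macros[name])   # substitute parameters
        # Splice the expansion back into the stream; the next pass rescans it.
        stream = stream[:match.start()] + body + stream[match.end():]
    raise RecursionError("expansion did not terminate")

MACROS = {
    "TWICE": "~1~1",
    "GREET": "Hello, [TWICE,~1]!",   # definition contains another call
    "PICK": "TWICE",                 # expands to a macro name
}

print(expand("[GREET,ab]", MACROS))
print(expand("[TWICE,[TWICE,a]]", MACROS))
print(expand("[[PICK],xy]", MACROS))
```

The third call works only because input and output are the same type: the result of `[PICK]` is text, and text is what the name position accepts.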

This is also why the all-macros discipline was a structural requirement, not a preference. Allow raw machine code anywhere in the stream alongside macro calls and you have introduced a second type: patterns to be expanded, and instructions to be left alone. GPM would need rules to tell them apart. The uniformity breaks down, and with it the recursive model. The decision Strachey arrived at — no machine code in the source, only macro calls, even where defining a macro added no economy — was not aesthetic. It was the condition the system required.

The CPL compiler was written entirely in macro calls; the programmer never touched opcodes. Strachey noted the consequence: “even very experienced programmers indeed tend to spend hours simulating its action when one of their macro definitions goes wrong.” Power and opacity arrived together.

The pattern

Every major abstraction layer in computing has produced the same debate. Is this real programming? Do you understand what the machine is actually doing?

Subroutines raised it. Macro systems raised it. FORTRAN raised it — John Backus’s team spent years arguing that compiled code could match hand-written machine code and therefore be trusted. High-level languages raised it. Each time, the layer was legitimized. Each time, the locus of required skill shifted upward.

The machine kept receding. The question kept returning.

What is different about LLMs

The question is back. Whether LLM-assisted code is real programming. Whether text written with LLM help is real writing.

The legitimacy debate will resolve itself — not because history mandates it, but because using the best available tool and understanding its properties is good engineering. The interesting question is not whether the layer counts. It is what the layer actually is.

Every previous abstraction layer was deterministic and traceable. A macro expansion has a defined structure — given a name and the current state, you can compute the result by hand. A compiler’s transformation of source to object code is, in principle, auditable. The failure modes are structural — unmatched brackets, undefined names — and they are reported precisely.

LLMs are not like this. The binding between a prompt and its output is not a defined rule you can look up. It is distributed across the model’s weights: a very large set of numerical values produced by training on text you did not write and cannot inspect. Under ordinary sampling, the output is not deterministic. The failure modes are not structural; they are probabilistic. The model can fail silently, plausibly, confidently. There is no Find routine you can call to check the current state.

This is not an argument against the layer. It is a description of the layer’s properties. Strachey never stated it as a principle — but one can be distilled from what he built and decided: the power of an abstraction layer comes from committing to it entirely. Partial adoption breaks the model.

Strachey was not alone in finding it. Dijkstra hit the same constraint with structured programming: full commitment to provable control structures was the condition for formal reasoning, and IBM’s reduction of it to “abolish goto” lost that property entirely. Kay hit it with OOP — the real innovation was message-passing, not objects, and most languages adopted the lesser idea. Each discovered it independently.

I work this way too. The failure modes are real. The discipline is learning what they are.

The longer question is what that discipline evolves into. Strachey’s commitment became infrastructure we now take for granted. The LLM equivalent is being designed now: verification practices, workflow conventions, the points where human judgment belongs.

That remains the right question.

1. Wheeler, D.J. (1952). “The Use of Sub-Routines in Programmes.” Proceedings of the ACM national meeting, Pittsburgh, pp. 235–236.
2. Strachey, C. (1965). “A General Purpose Macrogenerator.” The Computer Journal, 8(3), pp. 225–241.

Using CHANGELOG.md as LLM session memory

Sat, 21 Feb 2026 00:00:00 +0000

Most LLM assistants don’t maintain memory between sessions. The standard workaround — a large CLAUDE.md or AGENTS.md with everything in it — breaks down quickly. What’s more, it duplicates other content in your repo, growing the documentation maintenance surface without adding value.

Lately I avoid this problem by treating CHANGELOG.md as my LLM’s memory — specifically the [Unreleased] section from the format standardized by Keep a Changelog, which becomes the primary mutable state document.

Why it works

Keep a Changelog defines a format most LLMs recognize on sight: an [Unreleased] section at the top, dated releases below. The convention needs no explanation: [Unreleased] is active work, dated entries are history.

That maps directly onto what you need for session continuity:

[Unreleased] — mutable, updated every session. Current state, active priorities, blockers, decisions pending. The model reads this first.
Dated entries — append-only history. Evidence that decisions happened and why. The model reads these to reconstruct context if it needs depth.

The AGENTS.md (or CLAUDE.md) file becomes stable configuration: conventions, file paths, source-of-truth map. It changes rarely. The CHANGELOG takes on everything that does change.

The session start instruction

One line at the top of AGENTS.md is enough:

Read CHANGELOG.md [Unreleased] at session start.

From there the model knows where it is, what’s in flight, and what to do next — without re-explanation.

What goes in [Unreleased]

I use explicit subsections:

## [Unreleased]

### Current State
One-paragraph snapshot. Where things stand right now.

### Active Priorities
Ordered list of what needs to happen next.

### In Progress
What the model started in the current session.

### Blocked
Anything waiting on external action.

### Decisions Needed
Open questions the model should surface, not resolve unilaterally.

### Recently Completed
What just shipped. Moves to a dated entry on the next commit.

The model updates [Unreleased] at the end of each session. The next session reads it cold and picks up cleanly.
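If you want to hand the model only the mutable state rather than the whole file, the section is easy to slice out. The helper below is a minimal sketch, assuming the section starts at a `## [Unreleased]` heading and ends at the next second-level heading; the function name and sample text are hypothetical.

```python
import re

def unreleased_section(changelog: str) -> str:
    """Return the body of the [Unreleased] section, or '' if absent."""
    lines = changelog.splitlines()
    start = None
    for i, line in enumerate(lines):
        if re.match(r"^##\s+\[Unreleased\]", line, flags=re.IGNORECASE):
            start = i + 1
            break
    if start is None:
        return ""
    end = len(lines)
    for j in range(start, len(lines)):
        # The next second-level heading (a dated release) ends the section.
        # Third-level headings such as "### Blocked" do not match.
        if re.match(r"^##\s", lines[j]):
            end = j
            break
    return "\n".join(lines[start:end]).strip()

SAMPLE = """# Changelog

## [Unreleased]

### Current State
Parser rewrite is half done.

### Blocked
Waiting on API key.

## [1.2.0] - 2026-02-01
### Added
- Initial parser.
"""

print(unreleased_section(SAMPLE))
```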

What this is not

This is not a replacement for good project documentation. Architectural decisions, integration details, and source-of-truth maps still belong in stable docs. The changelog is the session state layer, not the full context layer.

It also does not solve the problem of context window limits on large projects. It reduces the cost of context: the model loads a small, structured, current-state document instead of scanning a stale megafile.

Result

Sessions are shorter to start, more reliable to hand off, and easier to audit. The changelog does the work it was always supposed to do — track what changed and when — and the LLM does less redundant orientation work each time.

The format is well-understood, self-describing, and version-controlled. If you’re already using Keep a Changelog, the only addition is a discipline: update [Unreleased] at the end of each session.