<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM on Bradley Fidler</title>
    <link>https://brfid.github.io/tags/llm/</link>
    <description>Recent content in LLM on Bradley Fidler</description>
    <generator>Hugo -- 0.156.0</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 21 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://brfid.github.io/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Wireword: Agent Control Words Should Be Hard to Misread</title>
      <link>https://brfid.github.io/posts/wireword-control-words/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://brfid.github.io/posts/wireword-control-words/</guid>
      <description>&lt;p&gt;This is a research note for &lt;a href=&#34;https://github.com/brfid/wireword&#34;&gt;Wireword&lt;/a&gt;, a small tool I am building to lint LLM agent control words.&lt;/p&gt;
&lt;p&gt;By control words, I mean short labels that can change what an agent does:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;route names&lt;/li&gt;
&lt;li&gt;tool names&lt;/li&gt;
&lt;li&gt;prompt macro names&lt;/li&gt;
&lt;li&gt;environment targets&lt;/li&gt;
&lt;li&gt;approval targets&lt;/li&gt;
&lt;li&gt;exact enum values the model must emit&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal is narrow: make labels that control agent behavior harder to misread, miscopy, or misroute.&lt;/p&gt;
&lt;h2 id=&#34;of-words-and-tokens-being-expensive&#34;&gt;Of words and tokens being expensive&lt;/h2&gt;
&lt;p&gt;This started with &lt;a href=&#34;https://github.com/juliusbrussee/caveman&#34;&gt;caveman-style LLM output&lt;/a&gt;. The useful comparison is not really cavemen. It is telegraphese: compressed language for an expensive channel.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>This is a research note for <a href="https://github.com/brfid/wireword">Wireword</a>, a small tool I am building to lint LLM agent control words.</p>
<p>By control words, I mean short labels that can change what an agent does:</p>
<ul>
<li>route names</li>
<li>tool names</li>
<li>prompt macro names</li>
<li>environment targets</li>
<li>approval targets</li>
<li>exact enum values the model must emit</li>
</ul>
<p>The goal is narrow: make labels that control agent behavior harder to misread, miscopy, or misroute.</p>
<h2 id="of-words-and-tokens-being-expensive">Of words and tokens being expensive</h2>
<p>This started with <a href="https://github.com/juliusbrussee/caveman">caveman-style LLM output</a>. The useful comparison is not really cavemen. It is telegraphese: compressed language for an expensive channel.</p>
<p>Western Union did not bill like an LLM API, but the pressure was similar. Ordinary domestic telegrams were billed by chargeable body word, usually with a ten-word minimum; address, signature, and date were free, while extra body words cost more.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> In 1866, a ten-word message from New York to Boston cost 30 cents.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></p>
<p>That maps to LLM work in two basic ways:</p>
<ul>
<li><strong>Token cost:</strong> shorter turns are cheaper.</li>
<li><strong>Context quality:</strong> shorter turns leave less low-information text in the conversation history.</li>
</ul>
<p>The second point is not just aesthetic. Models do not use long histories perfectly: irrelevant text can distract them or bury the useful constraint.</p>
<p>But compression has a failure mode. If compressed labels become too similar, the model has less redundancy to recover the intended control word.</p>
<h2 id="learning-from-telegraphy">Learning from telegraphy</h2>
<p>I looked at other telegraph practices to see what might apply to LLM agents. Could Victorian engineers provide fresh insights for our changing world? No, except for one thing, sort of.</p>
<p>Most parallels are useful but general:</p>
<table>
  <thead>
      <tr>
          <th>Telegraph practice</th>
          <th>General pattern</th>
          <th>LLM-agent version</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>STOP</code> and spelled punctuation</td>
          <td>delimiters</td>
          <td>source/task boundaries</td>
      </tr>
      <tr>
          <td>repeat-back</td>
          <td>confirmation</td>
          <td>human approval gates</td>
      </tr>
      <tr>
          <td>service classes</td>
          <td>priority and cost tiers</td>
          <td>model routing / effort levels</td>
      </tr>
      <tr>
          <td>codebooks</td>
          <td>macros</td>
          <td>prompt libraries</td>
      </tr>
      <tr>
          <td>word-count checks</td>
          <td>validation</td>
          <td>output checks</td>
      </tr>
      <tr>
          <td>operators</td>
          <td>review and observability</td>
          <td>linters / traces</td>
      </tr>
      <tr>
          <td>private codes</td>
          <td>substitution</td>
          <td>PII masking</td>
      </tr>
  </tbody>
</table>
<p>These are durable information-management practices. They are worth remembering, but they do not justify a new tool by themselves.</p>
<p>The more specific lead was codeword design.</p>
<h2 id="compression-with-redundancy">Compression with redundancy</h2>
<p>Commercial telegraph codebooks had to balance compression and recoverability. A codeword had to be short enough to save money, but distinct enough that a damaged word did not silently become another valid word.</p>
<p>E. L. Bentley described the rule directly: good codewords should differ by at least two letters. Then a one-letter mutilation produces an invalid codeword, not the wrong valid codeword.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p>
<p>The ABC Code used the same principle. John McVey&rsquo;s index quotes the 1920 sixth edition saying its five-letter codewords were built with at least a two-letter difference. The same note says the compilers considered Morse similarities and removed risky words.<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup></p>
<p>Useful rule:</p>
<blockquote>
<p>Good compression leaves enough redundancy to detect mistakes.</p>
</blockquote>
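<p>The two-letter rule is a minimum-distance check, and it is small enough to sketch. The Python below is illustrative only: the function names and the four sample codewords are hypothetical, not taken from Bentley or the ABC Code, and it assumes every codeword has the same length.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">from itertools import combinations


def letter_difference(a, b):
    """Count the positions where two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def too_close(codewords, minimum=2):
    """Return pairs that differ in fewer than `minimum` letters."""
    return [
        (a, b)
        for a, b in combinations(codewords, 2)
        # range(minimum) holds every difference count below the minimum.
        if letter_difference(a, b) in range(minimum)
    ]


# Hypothetical five-letter codewords, not from any real codebook.
book = ["ABACO", "ABACU", "EBDIF", "OGZEM"]
print(too_close(book))
</code></pre><p>With these samples, only <code>ABACO</code> and <code>ABACU</code> are flagged: one damaged letter could turn either into the other.</p>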
<h2 id="the-llm-agent-version">The LLM agent version</h2>
<p>This problem is not unique to LLMs. Similar issues appear in APIs, command-line flags, protocol enums, medication names, service names, and airport codes.</p>
<p>LLM agents make the problem newly common because they combine:</p>
<ul>
<li>probabilistic language generation</li>
<li>exact symbolic control</li>
<li>natural-language prompts around short labels</li>
<li>tool calls and routes with real side effects</li>
</ul>
<p>Example labels:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">A1
</span></span><span class="line"><span class="cl">AI
</span></span><span class="line"><span class="cl">Al
</span></span><span class="line"><span class="cl">prod
</span></span><span class="line"><span class="cl">production
</span></span><span class="line"><span class="cl">live
</span></span><span class="line"><span class="cl">docs.api
</span></span><span class="line"><span class="cl">doc.api
</span></span><span class="line"><span class="cl">FACTCHECK_API
</span></span><span class="line"><span class="cl">FACT_CHECK_API
</span></span></code></pre></div><p>These are not just strings. In an agent system, they may route work, call tools, select environments, expand macros, approve targets, or satisfy exact enum values.</p>
<p>The risk boundary is narrow. Similar labels matter when three conditions hold:</p>
<ul>
<li>the label is visible to the model or copied through natural language</li>
<li>the model or a human can choose or emit the label</li>
<li>downstream code treats the label as an exact control input</li>
</ul>
<p>A wrong valid label is worse than an invalid label. Invalid labels can fail validation. Wrong valid labels can pass validation and trigger the wrong action.</p>
<p>This matters less when routing is deterministic, internal IDs are hidden from the model, schemas constrain the choice, or a UI forces selection from canonical options.</p>
<p>So Wireword should not only ask whether two strings are similar. It should ask:</p>
<ul>
<li>What kind of label is this?</li>
<li>Can the model emit it?</li>
<li>Does a parser require an exact match?</li>
<li>What happens if the wrong label is chosen?</li>
<li>Does it target production or another external system?</li>
</ul>
<h3 id="generic-check-vs-agent-aware-check">Generic check vs agent-aware check</h3>
<p>Generic similarity check:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">docs.api / doc.api
</span></span><span class="line"><span class="cl">Reason: edit distance 1.
</span></span></code></pre></div><p>Agent-aware check:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">CRITICAL docs.api / doc.api
</span></span><span class="line"><span class="cl">Reason: route-name collision across different effects.
</span></span><span class="line"><span class="cl">Risk: read-only route is one edit away from external-write route.
</span></span><span class="line"><span class="cl">Fix: rename to ROUTE_DOCS_REVIEW and ROUTE_DOCS_PUBLISH.
</span></span></code></pre></div><p>Generic similarity check:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">prod / production / live
</span></span><span class="line"><span class="cl">Reason: related strings.
</span></span></code></pre></div><p>Agent-aware check:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">CRITICAL prod / production / live
</span></span><span class="line"><span class="cl">Reason: multiple production-like environment labels.
</span></span><span class="line"><span class="cl">Risk: agent may choose an inconsistent deployment target.
</span></span><span class="line"><span class="cl">Fix: use ENV_PRODUCTION as the only valid production label.
</span></span></code></pre></div><p>That is the product line: do not only lint strings. Lint control words by the action they can trigger.</p>
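<p>A rough sketch of that difference in Python, under stated assumptions: the <code>Label</code> fields, the <code>severity</code> function, and the <code>INFO</code> and <code>WARNING</code> levels are hypothetical and are not Wireword&rsquo;s actual data model or output. Only the edit-distance-one case is covered.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Label:
    name: str
    kind: str            # e.g. "route", "tool", "environment"
    effect: str          # e.g. "read-only", "external-write"
    model_visible: bool  # can the model see or emit this label?


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def severity(x, y):
    """Rate a pair of labels; None means the pair is not flagged."""
    if edit_distance(x.name, y.name) != 1:
        return None
    if not (x.model_visible and y.model_visible):
        return "INFO"
    if x.kind == y.kind and x.effect != y.effect:
        return "CRITICAL"
    return "WARNING"


# Hypothetical config: two near-identical routes with different effects.
labels = [
    Label("docs.api", "route", "read-only", True),
    Label("doc.api", "route", "external-write", True),
    Label("search", "tool", "read-only", True),
]
for x, y in combinations(labels, 2):
    level = severity(x, y)
    if level:
        print(level, x.name, "/", y.name)
</code></pre><p>The string comparison is the same in both checks. What changes the rating is the metadata: same kind, different effect, both visible to the model.</p>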
<h3 id="current-prototype-and-v1-plan">Current prototype and V1 plan</h3>
<p>The tool is <a href="https://github.com/brfid/wireword">Wireword</a>. V1 should stay small.</p>
<p>The current prototype now checks both layers:</p>
<ul>
<li><strong>raw labels:</strong> visual confusables, edit-distance-one pairs, case-only differences, punctuation-only differences, plural/stem collisions, and production-like aliases</li>
<li><strong>agent-aware labels:</strong> routes, tools, named agent handoffs, approval targets, macros, profiles, production-like environments, and exact enum values the model must emit</li>
</ul>
<p>That is enough to test the shape of the idea. The repo now has a small validation corpus with safe, dangerous, and malformed configs, plus a narrow FastMCP source extractor for tool names. It is still not a full agent security scanner.</p>
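<p>Two of the raw-label checks, case-only and punctuation-only differences, can be approximated by folding each label to a key and grouping on it. This is a sketch of the idea, not Wireword&rsquo;s implementation; <code>fold</code> and <code>fold_collisions</code> are hypothetical names, and the folding is ASCII-only.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import re
from collections import defaultdict


def fold(label):
    """Lowercase the label and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def fold_collisions(labels):
    """Group distinct labels that fold to the same key."""
    groups = defaultdict(list)
    # dict.fromkeys removes exact duplicates while keeping input order.
    for label in dict.fromkeys(labels):
        groups[fold(label)].append(label)
    return [group for group in groups.values() if len(group) != 1]


names = ["FACTCHECK_API", "FACT_CHECK_API", "docs.api", "Docs.API", "prod"]
for group in fold_collisions(names):
    print(" / ".join(group))
</code></pre><p>This flags <code>FACTCHECK_API / FACT_CHECK_API</code> and <code>docs.api / Docs.API</code>, and leaves <code>prod</code> alone.</p>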
<p>The useful output is not just <code>these strings are similar</code>. It is <code>these strings are similar, the model can see or emit them, and confusing them could call the wrong tool, route work to the wrong place, or target the wrong environment</code>.</p>
<p>Representative targets:</p>
<ul>
<li>MCP servers with model-visible tools</li>
<li>router or handoff agents</li>
<li>graph-based agent workflows</li>
<li>skill/plugin systems with named routes</li>
<li>exact enum outputs consumed by parsers</li>
</ul>
<p>The repo should carry the detailed CLI examples, fixtures, and tests. This note only needs the argument.</p>
<h3 id="what-wireword-is-not">What Wireword is not</h3>
<p>Wireword is not:</p>
<ul>
<li>an agent framework</li>
<li>a prompt framework</li>
<li>a general security scanner</li>
<li>a replacement for schemas or constrained decoding</li>
<li>a proof that LLMs confuse every similar label</li>
<li>necessary when labels are hidden behind deterministic routing, internal IDs, or strict UI selection</li>
</ul>
<p>It is a narrow lint pass for labels that become model-visible or human-visible control inputs.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Telegraph codebooks might inspire useful linting for LLM agent control identifiers.</p>
<h2 id="sources">Sources</h2>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Nelson E. Ross, <a href="https://en.wikisource.org/wiki/How_to_Write_Telegrams_Properly"><em>How to Write Telegrams Properly</em></a> (1928), &ldquo;How Tolls Are Computed&rdquo; and &ldquo;Punctuation Marks.&rdquo; Ross explains domestic body-word billing, cable/radiogram address billing, and the rule that requested punctuation marks were counted and charged as words.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>Western Union Telegraph Company, <a href="https://www.gutenberg.org/ebooks/62214.html.images"><em>The Proposed Union of the Telegraph and Postal Systems</em></a> (1869). Western Union gives the 1866 New York-to-Boston tariff as 30 cents for ten words, exclusive of address and signature.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>E. L. Bentley, <a href="https://www.jmcvey.net/cable/harmsworth_2.htm">&ldquo;Codes: Their Nature and Manipulation&rdquo;</a>, transcribed by John McVey. Bentley describes the two-letter-difference rule and explains that it prevents a one-letter mutilation from silently becoming another valid codeword.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>John McVey, <a href="https://jmcvey.net/cable/scans/ABC.htm">&ldquo;A.B.C. Telegraphic Codes, seven editions 1873-1936&rdquo;</a>. The page quotes the 1920 sixth edition on five-letter codewords built with at least a two-letter difference and notes the code&rsquo;s attention to Morse similarities.&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Strachey&#39;s Principle: The Discipline That Makes Abstraction Work</title>
      <link>https://brfid.github.io/posts/stracheys-principle/</link>
      <pubDate>Mon, 23 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://brfid.github.io/posts/stracheys-principle/</guid>
      <description>&lt;p&gt;In 1965, Christopher Strachey changed his mind about machine code.&lt;/p&gt;
&lt;p&gt;He was building the General Purpose Macrogenerator — the GPM — at Cambridge to help write a compiler for the Combined Programming Language (a C precursor). The original plan was simple: mix machine code with macro calls where convenient. The GPM was designed to make this possible.&lt;/p&gt;
&lt;p&gt;He ended up abandoning the mix entirely. For the CPL compiler, all machine code would be incorporated as macro calls — even sections called only once, where defining a macro added no economy at all. It had started as a way to save effort. It became a principle.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>In 1965, Christopher Strachey changed his mind about machine code.</p>
<p>He was building the General Purpose Macrogenerator — the GPM — at Cambridge to help write a compiler for the Combined Programming Language (a C precursor). The original plan was simple: mix machine code with macro calls where convenient. The GPM was designed to make this possible.</p>
<p>He ended up abandoning the mix entirely. For the CPL compiler, all machine code would be incorporated as macro calls — even sections called only once, where defining a macro added no economy at all. It had started as a way to save effort. It became a principle.</p>
<p>Sixty years later, programmers and writers are asking the same questions about LLMs that Strachey&rsquo;s contemporaries were asking about macro systems. Whether the tools are legitimate. Whether work done through them counts.</p>
<h2 id="wheelers-subroutine">Wheeler&rsquo;s subroutine</h2>
<p>The groundwork comes thirteen years earlier. In 1952, David Wheeler presented a short paper at the ACM national meeting: &ldquo;The Use of Sub-Routines in Programmes.&rdquo;<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> Wheeler had been part of the EDSAC team at Cambridge — the group that built the first stored-program computer to run a practical program.</p>
<p>His paper introduces the library subroutine as a unit of abstraction. A subroutine is self-contained, reusable, testable in isolation. You call it; control returns to the point after the call. From the outside it behaves like a single instruction, even though it may be dozens. Wheeler&rsquo;s summary: &ldquo;All complexities should — if possible — be buried out of sight.&rdquo;</p>
<h2 id="the-asymmetry-problem">The asymmetry problem</h2>
<p>By the early 1960s, macro-assemblers were standard. You define a macro; the assembler expands it into an instruction sequence before assembly. It extends the instruction set without modifying the hardware.</p>
<p>Strachey identified a structural problem with all of them. A conventional macro takes text as parameters (register names, addresses, literal values, arbitrary strings) and produces <em>complete instructions</em> as output. Its <em>domain</em> (what a function accepts as input) and <em>range</em> (what it produces as output) are different kinds of thing. In Strachey&rsquo;s terms: &ldquo;the domain and range of these macro-functions do not overlap.&rdquo;<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> This limits composition. You cannot build complex behavior by nesting simpler behaviors, because the output of one macro is not valid input to another.</p>
<p>GPM eliminates this by making everything a character stream. Input is a stream of characters. Output is a stream of characters. A macro call is a pattern in the stream; its expansion is substituted back into the stream. That re-entry is the key step: if the expansion contains another macro call, that call gets expanded too. The system processes its own results. Because domain and range are now the same type, a macro call can appear anywhere a string can appear — as a parameter to another macro call, in the name position of a macro call, inside a macro&rsquo;s own definition. Recursion, conditionals, and higher-order behavior follow directly from the unification. Strachey notes they &ldquo;appeared in the wash.&rdquo;</p>
<p>This is also why the all-macros discipline was a structural requirement, not a preference. Allow raw machine code anywhere in the stream alongside macro calls and you have introduced a second type: patterns to be expanded, and instructions to be left alone. GPM would need rules to tell them apart. The uniformity breaks down, and with it the recursive model. The decision Strachey arrived at — no machine code in the source, only macro calls, even where defining a macro added no economy — was not aesthetic. It was the condition the system required.</p>
<p>The CPL compiler was written entirely in macro calls; the programmer never touched opcodes. Strachey noted the consequence: &ldquo;even very experienced programmers indeed tend to spend hours simulating its action when one of their macro definitions goes wrong.&rdquo; Power and opacity arrived together.</p>
<h2 id="the-pattern">The pattern</h2>
<p>Every major abstraction layer in computing has produced the same debate. Is this real programming? Do you understand what the machine is actually doing?</p>
<p>Subroutines raised it. Macro systems raised it. FORTRAN raised it — John Backus&rsquo;s team spent years arguing that compiled code could match hand-written machine code and therefore be trusted. High-level languages raised it. Each time, the layer was legitimized. Each time, the locus of required skill shifted upward.</p>
<p>The machine kept receding. The question kept returning.</p>
<h2 id="what-is-different-about-llms">What is different about LLMs</h2>
<p>The question is back. Whether LLM-assisted code is real programming. Whether text written with LLM help is real writing.</p>
<p>The legitimacy debate will resolve itself — not because history mandates it, but because using the best available tool and understanding its properties is good engineering. The interesting question is not whether the layer counts. It is what the layer actually is.</p>
<p>Every previous abstraction layer was deterministic and traceable. A macro expansion has a defined structure — given a name and the current state, you can compute the result by hand. A compiler&rsquo;s transformation of source to object code is, in principle, auditable. The failure modes are structural — unmatched brackets, undefined names — and they are reported precisely.</p>
<p>LLMs are not like this. The binding between a prompt and its output is not a defined rule you can look up. It is distributed across the model&rsquo;s weights: a very large set of numerical values produced by training on text you did not write and cannot inspect. The output is not deterministic. The failure modes are not structural; they are probabilistic. The model can fail silently, plausibly, confidently. There is no <code>Find</code> routine you can call to check the current state.</p>
<p>This is not an argument against the layer. It is a description of the layer&rsquo;s properties. Strachey never stated it as a principle — but one can be distilled from what he built and decided: the power of an abstraction layer comes from committing to it entirely. Partial adoption breaks the model.</p>
<p>Strachey was not alone in finding it. Dijkstra hit the same constraint with structured programming: full commitment to provable control structures was the condition for formal reasoning, and IBM&rsquo;s reduction of it to &ldquo;abolish goto&rdquo; lost that property entirely. Kay hit it with OOP — the real innovation was message-passing, not objects, and most languages adopted the lesser idea. Each discovered it independently.</p>
<p>I work this way too. The failure modes are real. The discipline is learning what they are.</p>
<p>The longer question is what that discipline evolves into. Strachey&rsquo;s commitment became infrastructure we now take for granted. The LLM equivalent is being designed now: verification practices, workflow conventions, the points where human judgment belongs.</p>
<p>That remains the right question.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Wheeler, D.J. (1952). &ldquo;The Use of Sub-Routines in Programmes.&rdquo; <em>Proceedings of the ACM national meeting</em>, Pittsburgh, pp. 235–236.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>Strachey, C. (1965). &ldquo;A General Purpose Macrogenerator.&rdquo; <em>The Computer Journal</em>, 8(3), pp. 225–241.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
    <item>
      <title>Using CHANGELOG.md as LLM session memory</title>
      <link>https://brfid.github.io/posts/changelog-as-llm-memory/</link>
      <pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://brfid.github.io/posts/changelog-as-llm-memory/</guid>
      <description>&lt;p&gt;Most LLM assistants don&amp;rsquo;t maintain memory between sessions. The standard workaround — a large &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; with everything in it — breaks down quickly. What&amp;rsquo;s more, it duplicates other content in your repo, growing the documentation maintenance surface without adding value.&lt;/p&gt;
&lt;p&gt;Lately I avoid this problem by treating &lt;code&gt;CHANGELOG.md&lt;/code&gt; as my LLM&amp;rsquo;s memory — specifically the &lt;code&gt;[Unreleased]&lt;/code&gt; section from the format standardized by &lt;a href=&#34;https://keepachangelog.com/&#34;&gt;Keep a Changelog&lt;/a&gt;, which becomes the primary mutable state document.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>Most LLM assistants don&rsquo;t maintain memory between sessions. The standard workaround — a large <code>CLAUDE.md</code> or <code>AGENTS.md</code> with everything in it — breaks down quickly. What&rsquo;s more, it duplicates other content in your repo, growing the documentation maintenance surface without adding value.</p>
<p>Lately I avoid this problem by treating <code>CHANGELOG.md</code> as my LLM&rsquo;s memory — specifically the <code>[Unreleased]</code> section from the format standardized by <a href="https://keepachangelog.com/">Keep a Changelog</a>, which becomes the primary mutable state document.</p>
<h2 id="why-it-works">Why it works</h2>
<p><a href="https://keepachangelog.com/">Keep a Changelog</a> defines a format most LLMs recognize on sight: an <code>[Unreleased]</code> section at the top, dated releases below. The convention is simple: <code>[Unreleased]</code> is active work, dated entries are history.</p>
<p>That maps directly onto what you need for session continuity:</p>
<ul>
<li><strong><code>[Unreleased]</code></strong> — mutable, updated every session. Current state, active priorities, blockers, decisions pending. The model reads this first.</li>
<li><strong>Dated entries</strong> — append-only history. Evidence that decisions happened and why. The model reads these to reconstruct context if it needs depth.</li>
</ul>
<p>The <code>AGENTS.md</code> (or <code>CLAUDE.md</code>) file becomes stable configuration: conventions, file paths, source-of-truth map. It changes rarely. The changelog takes on everything that does change.</p>
<h2 id="the-session-start-instruction">The session start instruction</h2>
<p>One line at the top of <code>AGENTS.md</code> is enough:</p>
<pre tabindex="0"><code>Read CHANGELOG.md [Unreleased] at session start.
</code></pre><p>From there the model knows where it is, what&rsquo;s in flight, and what to do next — without re-explanation.</p>
<h2 id="what-goes-in-unreleased">What goes in [Unreleased]</h2>
<p>I use explicit subsections:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl"><span class="gu">## [Unreleased]
</span></span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gu">### Current State
</span></span></span><span class="line"><span class="cl">One-paragraph snapshot. Where things stand right now.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gu">### Active Priorities
</span></span></span><span class="line"><span class="cl">Ordered list of what needs to happen next.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gu">### In Progress
</span></span></span><span class="line"><span class="cl">What the model started in the current session.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gu">### Blocked
</span></span></span><span class="line"><span class="cl">Anything waiting on external action.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gu">### Decisions Needed
</span></span></span><span class="line"><span class="cl">Open questions the model should surface, not resolve unilaterally.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="gu">### Recently Completed
</span></span></span><span class="line"><span class="cl">What just shipped. Moves to a dated entry on the next commit.
</span></span></code></pre></div><p>The model updates <code>[Unreleased]</code> at the end of each session. The next session reads it cold and picks up cleanly.</p>
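<p>If a script or hook assembles the session context, the same section can be pulled out mechanically. A minimal sketch, assuming the <code>## [Unreleased]</code> heading style shown above; the function name and the sample changelog are hypothetical.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">def unreleased_section(changelog_text):
    """Return the [Unreleased] section, heading included, or an empty string."""
    kept = []
    inside = False
    for line in changelog_text.splitlines():
        if line.startswith("## "):
            # Every second-level heading either opens or closes the section.
            inside = line.strip().lower().startswith("## [unreleased]")
        if inside:
            kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical changelog used only to exercise the function.
sample = """# Changelog

## [Unreleased]

### Current State
Parser rewrite is half done.

## [1.2.0] - 2026-02-01

### Added
- First release notes.
"""
print(unreleased_section(sample))
</code></pre><p>Third-level headings such as <code>### Current State</code> stay inside the section because only a line starting with <code>## </code> ends it.</p>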
<h2 id="what-this-is-not">What this is not</h2>
<p>This is not a replacement for good project documentation. Architectural decisions, integration details, and source-of-truth maps still belong in stable docs. The changelog is the <em>session state layer</em>, not the full context layer.</p>
<p>It also does not solve the problem of context window limits on large projects. It reduces the cost of context: the model loads a small, structured, current-state document instead of scanning a stale megafile.</p>
<h2 id="result">Result</h2>
<p>Sessions are quicker to start, more reliable to hand off, and easier to audit. The changelog does the work it was always supposed to do, tracking what changed and when, and the LLM does less redundant orientation work each time.</p>
<p>The format is well-understood, self-describing, and version-controlled. If you&rsquo;re already using Keep a Changelog, the only addition is a discipline: update <code>[Unreleased]</code> at the end of each session.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
