How this experiment evolved

A record of the corrections, naming changes, and methodological discoveries that shaped the experiment as it ran. The substance is on the main pages; this is the procedural history behind them.

Naming conventions — let, aim

Early arms in the experiment were named alphabetically (A, B, C, then H for the human). When new arms were added that varied a single dimension — the style brief — the natural temptation was version-bumping: C2, D2. That naming confused as much as it clarified: a "C2" sounds like a sequel to C, but the actual relationship is sibling-on-a-different-axis. The convention settled on let (permission, "you may translate freely") and aim (instruction, "the school is …"). C2 became C-let; the broader-corpus arm became D-let; its school-instructed sibling, D-aim. The suffix names the kind of nudge each arm carries, not a version.

The text-conversion pipeline

When Agent D's Step-1 reading began, the Claude Code session hit a 32 MB request-size limit. The cause: source materials were PDFs and EPUBs; each Read tool call serialized the full binary into the request; cumulative reads pushed past the limit. The fix wasn't a workaround — it was structural. All source materials (Tergit's primary corpus, scholarly works, the anglophone reference writers added for D, the German Effingers) were pre-converted once, centrally, into a single text archive: 22 .txt files, about 2.2 million words. Every subsequent arm draws from that archive. No instance ever extracts binary text; no request gets near the size limit. Mitford's EPUB had a malformed manifest pandoc refused, so a Python fallback (zip + HTML-strip) handled that one. The clean-text archive sits beside the raw source archive for provenance.

The kickoffs-as-record discipline

Each AI arm's setup is in two parts: a system prompt (the agent's identity and standing constraints) and a sequence of per-step kickoffs (the user-messages that trigger each phase: "Begin step 1", "Begin step 2", and so on). System prompts were always saved; per-step kickoffs were not — they lived only in the operator's chat history and would have been lost. Mid-experiment, the discipline was established: every kickoff is recorded verbatim in agent_X/kickoffs.md, dated, alongside the system prompt. The original arms' kickoffs were backfilled from chat-history records. The convention: kickoffs are protocol (reused across arms where the shape is shared); system prompts are the experimental variable.

The Step-2 kickoff ambiguity

Step 1's kickoff said: "Log your reading notes to notes/step1.md as you go." Step 3's kickoff said: "Log notes to notes/step3.md." Step 2's kickoff said only: "leaning on your notes as memory" — no file name, no explicit instruction to write notes. B's instance inferred the convention from Steps 1 and 3 and produced notes/step2.md. D's instance read the silence literally and went straight to persona.md, consolidating step-2 reading into the persona document. Both readings were defensible against the brief. The fix: rewrite the Step-2 kickoff with an explicit notes/step2.md instruction and a clear distinction between "analytical record" (notes) and "integrated voice" (persona). D's Step 2 was rewound and re-sent with the corrected kickoff; the new persona, this time alongside step-2 notes, came out cleaner-formed. The corrected version is the canonical Step-2 brief going forward.

The persona-snapshot discipline

Arms with personas (A, B, D) may revise their persona after reading the novel — the Step-3 kickoff explicitly permits it. Without intervention, the post-Step-2 state is overwritten by the revision. The coordinator clones persona.mdpersona_before_novel.md at the close of Step 2; the post-novel revision then becomes the new persona.md. The diff between the two is recoverable. (A did not revise. B revised by a single inserted paragraph. D revised substantially.)

D-aim's architecture, in three iterations

The architecture for D-aim — the sibling arm that translates the same persona under explicit school instruction rather than permission — went through three corrections during setup.

First attempt: D-aim as a fully separate agent with its own inputs and its own system prompt. Wrong — that would have built a different persona, defeating the controlled comparison.

Second attempt: D-aim and D-let as parallel forks in the same workspace, writing to different filenames (translation.md vs translation_d2.md). Wrong — both translations in the workspace at once risks confusing the instance about which file is "its" output.

Third attempt (correct): D-aim is the same instance as D-let, conversation-rewound to before Step 4, with a different Step-4 kickoff sent. D-let's outputs are physically archived out of the workspace before D-aim's run; the workspace holds one translation at a time. Identical persona, identical reading, identical context — only the Step-4 kickoff differs.

The principle-only brief revision

D-aim's first-draft Step-4 kickoff enumerated five operations: anglicize foreign vocabulary, render dialect plainly, fuse paragraphs, find rhyme for embedded lyric, domesticate culturally specific references. Each operation is downstream of the principle "render naturally for the target reader" — but read as a checklist, the brief invited box-ticking rather than principled reasoning. The rhyme bullet specifically read as an instruction to perform the most extreme of the human translator's distinctive moves. The revision: principle-only. "Domesticating translation, in Schleiermacher's sense — bring the author to the reader, not the reader to the author. Every choice — scene by scene, word by word — follows from this principle." The instance derives the operations from the principle. C-let's permission brief still has the enumerated operations; if the experiment is ever rerun cleanly, the principle-only version is the cleaner brief for both sides.

A-craft considered, A-N designed, both deferred

An "A-craft" arm was discussed as an additional probe of the persona-source axis — a persona distilled from the human translator's actual translation work (with Chapter 25 held out) rather than from her writing-about. The clinical-summary approach. Considered, but the more informative experimental design is A-N: a dose-response gradient where each arm forks from A's pre-translation state with a different amount of the human translator's parallel-text examples. A-0 receives only distilled conventions; A-1, A-3, A-5, A-10 receive that plus 1, 3, 5, 10 complete parallel chapters. The held-out chapter set must be nested (A-1 ⊂ A-3 ⊂ A-5 ⊂ A-10) so the dose isn't confounded by chapter choice. The design tests whether mimicry converges with sample size — and if so, with what shape of curve. Significant cost (5 arms × up to 5 passes each); phased rollout (A-0 and A-10 first as anchors) is the right shape if pursued. Deferred. A-craft was dropped.


The substance — the experiment's design, the arms, the personas, the translations, the analyses — is on the main pages: Method, The Personas, Observations, Findings, Translations, Appendix.