Field Notes

Field Notes are fast, from-the-trenches observations. They are time-bound and may age poorly. Summarized from my real notes by . Optimized for utility. Not investment or legal advice.

░░░░░░░▄█▄▄▄█▄
▄▀░░░░▄▌─▄─▄─▐▄░░░░▀▄
█▄▄█░░▀▌─▀─▀─▐▀░░█▄▄█
░▐▌░░░░▀▀███▀▀░░░░▐▌
████░▄█████████▄░████
=======================
Field Note Clanker
=======================
⏺ Agent start
│
├── 1 data source
└── Total 15.4k words
⏺ Spawning Sub-Agents
│
├── GPT-5: Summarize → Web Search Hydrate
├── GPT-5-mini: Score (Originality, Relevance)
└── Return Good Notes
⏺ Field Note Agent
│
├── Sorted to 2 of 7 sections
├── Extracting 5 key signals
└── Posting Approval
⏺ Publishing
┌────────────────────────────────────────┐
│ Warning: Field notes are recursively │
│ summarized by agents. These likely age │
│ poorly. Exercise caution when reading. │
└────────────────────────────────────────┘

Field Notes - Nov 29, '25

Executive Signals

  • Agents are the new linters: pre-screen code, humans judge intent and risk
  • Compute over curation: generate variants, spend judgment choosing the winner
  • Work sized by minutes: fit tasks to today’s agent capability envelope
  • Previews before people: ephemeral tests catch breakage earlier than reviews
  • Governance beats automation: human accountability persists even as models outperform humans

CEO

Keep the Human Accountability Layer

Even if agents surpass humans on code quality, accountability doesn’t move. Keep a human gate between stages and scope agent permissions by blast radius. Log decisions and artifacts so the audit trail stands up in front of a board or customer.

  • Require human sign-off for production merges
  • Log agent prompts, plans, and diffs in the PR
  • Restrict agent scopes and tokens by blast radius
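The gate above can be sketched in code. This is a minimal, hypothetical model (the `AgentRun` and `PullRequest` shapes and the `may_merge` check are illustrative names, not any real API): a merge is refused unless a human signed off and every agent run left a complete audit trail.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRun:
    # The artifacts the note says to log in the PR.
    prompts: list
    plan: str
    diff: str

@dataclass
class PullRequest:
    agent_runs: list = field(default_factory=list)
    human_approved: bool = False

def may_merge(pr: PullRequest) -> tuple[bool, str]:
    """Return (allowed, reason): human sign-off plus a full audit trail are both required."""
    if not pr.human_approved:
        return False, "missing human sign-off"
    for run in pr.agent_runs:
        if not (run.prompts and run.plan and run.diff):
            return False, "incomplete agent audit trail"
    return True, "ok"
```

In practice this would run as a required status check, with token scopes enforced separately by the platform.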

Engineering

AI Review Before Human Review

Run each PR through 2–3 dissimilar code-review agents before any human review. Different models catch each other’s blind spots, cutting trivial comments. Humans then focus on intent, system effects, and customer requirements.

  • Add AI review checks to PRs; gate on passing lint, types, and tests
  • Re-run agents after fixes until feedback is nitpicks only
  • Reserve human review for intent and cross-system impacts
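The re-run loop above reduces to a simple predicate. A sketch, assuming a hypothetical reviewer interface (each reviewer maps a diff to a list of findings with a severity label): the PR escalates to a human only once no reviewer reports anything above nitpick level.

```python
from typing import Callable

# Hypothetical interface: a reviewer maps a diff to findings,
# each finding a dict like {"severity": "major" | "nit", "msg": ...}.
Reviewer = Callable[[str], list[dict]]

def ready_for_human(diff: str, reviewers: list[Reviewer]) -> bool:
    """True once all remaining feedback across all reviewers is nitpick-only."""
    findings = [f for reviewer in reviewers for f in reviewer(diff)]
    return all(f["severity"] == "nit" for f in findings)
```

Agents fix, the predicate re-runs, and the human sees the diff only when it returns true.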

Best-of-N Makes Compute Disposable

For non-critical changes, ask agents to produce multiple plans or patches in parallel and select the best. Treat compute as cheap, judgment as scarce; most drafts get thrown away, by design.

  • Generate 3–5 variants for a task; select, don’t shepherd
  • Cap per-run spend; archive losing variants for recurring patterns
  • Use on refactors and UI glue, not data migrations
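The fan-out itself is a few lines. A sketch with stand-in callables (`generate` and `score` here represent agent calls and your selection judgment, not any real SDK): produce N drafts in parallel, keep the top-scoring one, and retain the losers for pattern-mining rather than merging them.

```python
import concurrent.futures

def best_of_n(generate, score, n=3):
    """Generate n drafts in parallel; return (winner, losing_drafts)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        drafts = list(pool.map(lambda _: generate(), range(n)))
    ranked = sorted(drafts, key=score, reverse=True)
    # Losers are archived for recurring patterns, never shepherded to done.
    return ranked[0], ranked[1:]
```

A per-run spend cap would wrap `generate`; the point is that most of this compute is disposable by design.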

Size Work by Human Minutes, Not Tokens

Frontier models reliably complete work that takes a strong developer 30–60 minutes end-to-end, and the ceiling keeps rising. Slice tickets to match that success band; re-slice anything that stalls.

  • Write tickets a strong dev could finish in about one hour
  • If an agent loops twice or exceeds a timebox, split the task
  • Revisit chunk size quarterly as models improve
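The "loops twice or exceeds a timebox" rule can be made mechanical. A sketch, assuming a hypothetical `run_agent` callable that returns True on success: the runner gives the task at most two attempts inside a wall-clock timebox, then hands it back to be re-sliced.

```python
import time

def run_with_timebox(run_agent, timebox_s: float, max_attempts: int = 2) -> str:
    """Return 'done' on success, 'split-task' if the agent loops or runs over."""
    start = time.monotonic()
    for _attempt in range(max_attempts):
        if run_agent():
            return "done"
        if time.monotonic() - start > timebox_s:
            break  # over the timebox: stop looping
    return "split-task"  # re-slice into smaller tickets
```

The timebox and attempt cap are the quarterly knobs to revisit as model capability rises.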

Preview Environments + Browser Automation as the PR Gate

Spin up an ephemeral environment per PR with a forked database. Run browser-driven smoke and critical-journey tests against that URL to fail fast before humans review broken flows.

  • Auto-deploy unique preview URLs from CI
  • Run scripted browser checks; attach logs and video on failure
  • Block merges on preview test failures
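The merge decision at the end of that pipeline is simple to state precisely. A sketch with illustrative result shapes (in practice the inputs would come from CI and the browser-automation run, not dicts built by hand):

```python
def preview_gate(preview_deployed: bool, browser_checks: list[dict]) -> dict:
    """Block merge unless the preview deployed and every scripted check passed."""
    if not preview_deployed:
        return {"merge": False, "reason": "preview failed to deploy"}
    failures = [c["name"] for c in browser_checks if not c["passed"]]
    if failures:
        return {"merge": False, "reason": f"browser checks failed: {failures}"}
    return {"merge": True, "reason": "all preview checks green"}
```

Logs and video for the failing checks would be attached to the PR alongside this verdict.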

Agents.md + Custom Linters Prevent Style Drift

Codify house rules in an Agents.md that agents read first: folder conventions, “done” criteria, and non-negotiables. Backstop with custom lint rules for known foot-guns; agents will learn to satisfy them.

  • Put repo norms and prohibited patterns in Agents.md
  • Add bespoke lint rules for crash-prone edge cases
  • Run lints during agent execution and in CI
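A bespoke rule of this kind is often just a few lines. A sketch for one common foot-gun, a bare `except:` that masks crashes (the rule and its name are illustrative; real rules would target your repo's own known failure patterns):

```python
import re

# Matches a bare `except:` clause with nothing but whitespace around it.
BARE_EXCEPT = re.compile(r"^\s*except\s*:\s*$")

def lint_bare_except(source: str) -> list[int]:
    """Return 1-based line numbers containing a bare except clause."""
    return [i for i, line in enumerate(source.splitlines(), 1)
            if BARE_EXCEPT.match(line)]
```

Run the same rule during agent execution and again in CI; agents quickly learn to write code that passes it.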

Daily Kaizen via Background Agent PRs

Schedule background agents to open small, safe PRs that raise hygiene: missing tests, tighter types, doc fixes across repos. Humans review and merge without stealing focus from feature work.

  • Ship 1–2 automated PRs per repo per day
  • Assign owners to triage before standup
  • Track acceptance rate and defect deltas
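Tracking acceptance rate and defect deltas needs only a log of PR outcomes. A sketch with an assumed record shape (`repo`, `merged`, `defects_after` are illustrative field names):

```python
from collections import defaultdict

def pr_metrics(prs: list[dict]) -> dict:
    """Per-repo acceptance rate and defect totals from background-agent PR outcomes."""
    by_repo = defaultdict(lambda: {"opened": 0, "merged": 0, "defects": 0})
    for pr in prs:
        m = by_repo[pr["repo"]]
        m["opened"] += 1
        m["merged"] += pr["merged"]          # bool counts as 0/1
        m["defects"] += pr["defects_after"]
    return {repo: {**m, "acceptance": m["merged"] / m["opened"]}
            for repo, m in by_repo.items()}
```

A falling acceptance rate is the signal to retune what the background agents are allowed to attempt.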

Let Agents Map the Codebase for Onboarding

Have a terminal agent traverse the repo and narrate data flows and component interactions, then convert that into system diagrams. Faster onboarding improves prompts and downstream results.

  • Prompt for end-to-end “user request to response” narratives
  • Render diagrams from the narrative and store them in-repo
  • Refresh the maps after major refactors
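The first step of that mapping is a plain repo walk. A sketch of the inventory an agent would narrate from (the function name and extension filter are illustrative; the data-flow narration and diagram rendering sit on top of this):

```python
import os

def module_inventory(root: str, exts=(".py",)) -> dict[str, list[str]]:
    """Map each directory (relative to root) to its source files."""
    inventory: dict[str, list[str]] = {}
    for dirpath, _dirs, files in os.walk(root):
        hits = sorted(f for f in files if f.endswith(exts))
        if hits:
            inventory[os.path.relpath(dirpath, root)] = hits
    return inventory
```

Feed the inventory to the agent, ask for "user request to response" narratives over it, and regenerate after major refactors.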