
Field Notes - Nov 29, '25
Executive Signals
- Agents are the new linters: pre-screen code, humans judge intent and risk
- Compute over curation: generate variants, spend judgment choosing the winner
- Work sized by minutes: fit tasks to today’s agent capability envelope
- Previews before people: ephemeral tests catch breakage earlier than reviews
- Governance beats automation: the human accountability layer persists even as models outperform humans
CEO
Keep the Human Accountability Layer
Even if agents surpass humans on code quality, accountability doesn’t move. Keep a human gate between stages and scope agent permissions by blast radius. Log decisions and artifacts so the audit trail stands up in front of a board or customer.
- Require human sign-off for production merges
- Log agent prompts, plans, and diffs in the PR
- Restrict agent scopes and tokens by blast radius
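The scoping bullet can be sketched as a permission table keyed by blast radius. The area names, scope strings, and `grant_scopes` helper below are illustrative assumptions, not a real platform API:

```python
# Illustrative sketch: grant the narrowest token scopes for the area an
# agent will touch. Area names and scope strings are assumptions.
BLAST_RADIUS = {
    "docs": {"contents:read", "contents:write"},               # low risk
    "service-code": {"contents:read", "pull_requests:write"},  # PRs only, no direct push
    "prod-config": {"contents:read"},                          # read-only; humans apply changes
}

def grant_scopes(target_area: str) -> set[str]:
    """Return the narrowest token scopes for the area an agent will touch."""
    return BLAST_RADIUS.get(target_area, {"contents:read"})  # unknown area: default read-only
```

Defaulting unknown areas to read-only keeps the failure mode safe: an unmapped area means a human must widen the table deliberately.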
Engineering
AI Review Before Human Review
Run each PR through 2–3 dissimilar code-review agents before any human looks at it. Different models catch each other’s blind spots, cutting down on trivial comments. Humans then focus on intent, system effects, and customer requirements.
- Add AI review checks to PRs; gate on passing lint, types, and tests
- Re-run agents after fixes until feedback is nitpicks only
- Reserve human review for intent and cross-system impacts
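The gate loop above can be sketched as a fan-out over review agents that only hands off to humans once no blockers remain. The agent callables and the two-level severity labels are assumptions for illustration:

```python
# Sketch of the AI-before-human gate: fan the diff out to dissimilar
# review agents, merge findings, and signal human-readiness only when
# all that remains is nitpicks. Severity labels are an assumption.
from typing import Callable

Finding = tuple[str, str]  # (severity, message); severities: "blocker" | "nitpick"

def ai_review_gate(diff: str,
                   agents: list[Callable[[str], list[Finding]]]) -> tuple[bool, list[Finding]]:
    """Return (ready_for_human, merged findings) for one review pass."""
    findings = [f for agent in agents for f in agent(diff)]
    blockers = [f for f in findings if f[0] == "blocker"]
    return (len(blockers) == 0, findings)  # humans see it only once blockers are fixed
```

In practice the author re-runs this pass after each round of fixes; the loop terminates when the only remaining findings are nitpicks.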
Best-of-N Makes Compute Disposable
For non-critical changes, ask agents to produce multiple plans or patches in parallel and select the best. Treat compute as cheap, judgment as scarce; most drafts get thrown away, by design.
- Generate 3–5 variants for a task; select, don’t shepherd
- Cap per-run spend; archive losing variants for recurring patterns
- Use on refactors and UI glue, not data migrations
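The best-of-N pattern is a small harness: generate variants in parallel, score, keep the winner, archive the rest. `generate_variant` and `score` below are placeholders for your agent call and judgment proxy, not a real API:

```python
# Sketch of best-of-N: parallel variant generation, then selection.
# generate_variant and score stand in for agent calls and a judgment
# proxy (tests passed, diff size, reviewer rubric).
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def best_of_n(task: str,
              generate_variant: Callable[[str, int], str],
              score: Callable[[str], float],
              n: int = 4) -> tuple[str, list[str]]:
    """Return (winning patch, archived losers); compute is disposable by design."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        variants = list(pool.map(lambda i: generate_variant(task, i), range(n)))
    ranked = sorted(variants, key=score, reverse=True)
    return ranked[0], ranked[1:]  # select, don't shepherd
```

A per-run spend cap maps naturally onto `n` and a timeout on the pool; losing variants go to an archive so recurring patterns can be mined later.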
Size Work by Human Minutes, Not Tokens
Frontier models reliably complete work that takes a strong developer 30–60 minutes end-to-end, and the ceiling keeps rising. Slice tickets to match that success band; re-slice anything that stalls.
- Write tickets a strong dev could finish in about one hour
- If an agent loops twice or exceeds a timebox, split the task
- Revisit chunk size quarterly as models improve
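The re-slice rule in the bullets reduces to a small predicate. The `Attempt` record and thresholds are assumptions for illustration; the 60-minute timebox mirrors the success band above:

```python
# Sketch of the re-slice heuristic: split a task once an agent loops
# twice or exceeds its timebox. Record shape and thresholds are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Attempt:
    loops: int            # times the agent restarted its own plan
    minutes_spent: float  # wall-clock time so far

def should_split(attempt: Attempt, timebox_minutes: float = 60.0) -> bool:
    """True when the task has outgrown the current agent capability envelope."""
    return attempt.loops >= 2 or attempt.minutes_spent > timebox_minutes
```

Raising `timebox_minutes` quarterly is the code-level expression of revisiting chunk size as models improve.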
Preview Environments + Browser Automation as the PR Gate
Spin up an ephemeral environment per PR with a forked database. Run browser-driven smoke and critical-journey tests against that URL to fail fast before humans review broken flows.
- Auto-deploy unique preview URLs from CI
- Run scripted browser checks; attach logs and video on failure
- Block merges on preview test failures
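The merge-blocking logic can be sketched independent of the browser tooling: run each critical-journey check against the per-PR URL and fail the gate on any red. The check callables below stand in for real browser automation (e.g. a Playwright script); the URL is hypothetical:

```python
# Sketch of the preview gate: scripted checks against the ephemeral
# URL; one red check blocks the merge. Checks stand in for real
# browser automation scripts.
from typing import Callable

def preview_gate(preview_url: str,
                 checks: dict[str, Callable[[str], bool]]) -> tuple[bool, dict[str, bool]]:
    """Run each critical-journey check; return (merge_ok, per-check results)."""
    results = {name: check(preview_url) for name, check in checks.items()}
    return all(results.values()), results
```

The per-check results dict is what you attach to the PR alongside logs and video on failure.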
Agents.md + Custom Linters Prevent Style Drift
Codify house rules in an Agents.md that agents read first: folder conventions, “done” criteria, and non-negotiables. Backstop with custom lint rules for known foot-guns; agents will learn to satisfy them.
- Put repo norms and prohibited patterns in Agents.md
- Add bespoke lint rules for crash-prone edge cases
- Run lints during agent execution and in CI
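A bespoke lint rule for a crash-prone edge case can be small. A minimal sketch, assuming the foot-gun you care about is a bare `except:` that swallows crashes; a real setup would register this as a flake8/ruff plugin rather than a standalone function:

```python
# Sketch of a bespoke lint rule backstopping Agents.md: flag bare
# `except:` handlers, a classic crash-swallowing foot-gun.
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` handlers in a Python file."""
    tree = ast.parse(source)
    return [node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]
```

Because the rule runs both during agent execution and in CI, agents quickly learn to emit typed exception handlers instead.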
Daily Kaizen via Background Agent PRs
Schedule background agents to open small, safe PRs that raise hygiene: missing tests, tighter types, doc fixes across repos. Humans review and merge without stealing focus from feature work.
- Ship 1–2 automated PRs per repo per day
- Assign owners to triage before standup
- Track acceptance rate and defect deltas
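The daily cap and acceptance tracking fit in one small ledger. The class name and in-memory store below are illustrative; a real setup would persist this per repo:

```python
# Sketch of the kaizen budget: cap automated PRs per repo per day and
# track acceptance rate so the program can prove its keep. Names and
# the in-memory store are illustrative assumptions.
from collections import defaultdict

class KaizenLedger:
    def __init__(self, daily_cap: int = 2):
        self.daily_cap = daily_cap
        self.opened = defaultdict(int)  # (repo, date) -> PRs opened that day
        self.merged = 0
        self.total = 0

    def may_open(self, repo: str, date: str) -> bool:
        return self.opened[(repo, date)] < self.daily_cap

    def record_open(self, repo: str, date: str) -> None:
        self.opened[(repo, date)] += 1
        self.total += 1

    def record_outcome(self, accepted: bool) -> None:
        self.merged += int(accepted)

    def acceptance_rate(self) -> float:
        return self.merged / self.total if self.total else 0.0
```

A falling acceptance rate is the early signal that the background agents' prompts or targets need retuning.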
Let Agents Map the Codebase for Onboarding
Have a terminal agent traverse the repo and narrate data flows and component interactions, then convert that narrative into system diagrams. Engineers who understand the system sooner write better prompts and get better downstream results.
- Prompt for end-to-end “user request to response” narratives
- Render diagrams from the narrative and store them in-repo
- Refresh the maps after major refactors
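The diagram half of this can be bootstrapped without an agent at all: walk the Python files, extract their imports, and emit a Mermaid-style edge list the agent's narrative can be layered onto. Purely illustrative tooling, assuming a Python codebase:

```python
# Sketch: extract a module import graph and render it as Mermaid text
# to store in-repo alongside the agent's narrative.
import ast
from pathlib import Path

def module_edges(root: str) -> list[tuple[str, str]]:
    """Return (module, imported_name) pairs for every .py file under root."""
    edges = []
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges.extend((path.stem, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.append((path.stem, node.module))
    return sorted(set(edges))

def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render the edge list as a Mermaid flowchart definition."""
    return "\n".join(["graph TD"] + [f"    {a} --> {b}" for a, b in edges])
```

Re-running this after major refactors keeps the stored maps honest between agent-narrated refreshes.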