
Field Notes - Nov 29, '25
Executive Signals
- Agents are the new linters: pre-screen code, humans judge intent and risk
- Compute over curation: generate variants, spend judgment choosing the winner
- Work sized by minutes: fit tasks to today’s agent capability envelope
- Previews before people: ephemeral tests catch breakage earlier than reviews
- Governance beats automation: the human accountability layer persists even as models outperform humans
CEO
Keep the Human Accountability Layer
Even if agents surpass humans on code quality, accountability doesn’t move. Keep a human gate between stages and scope agent permissions by blast radius. Log decisions and artifacts so the audit trail stands up in front of a board or customer.
- Require human sign-off for production merges
- Log agent prompts, plans, and diffs in the PR
- Restrict agent scopes and tokens by blast radius
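The scoping bullet can be sketched as a permission table keyed by blast radius. The area names, scope strings, and `grant_scopes` helper below are illustrative assumptions, not a real platform API:

```python
# Illustrative sketch: grant the narrowest token scopes for the area an
# agent will touch. Area names and scope strings are assumptions.
BLAST_RADIUS = {
    "docs": {"contents:read", "contents:write"},               # low risk
    "service-code": {"contents:read", "pull_requests:write"},  # PRs only, no direct push
    "prod-config": {"contents:read"},                          # read-only; humans apply changes
}

def grant_scopes(target_area: str) -> set[str]:
    """Return the narrowest token scopes for the area an agent will touch."""
    return BLAST_RADIUS.get(target_area, {"contents:read"})  # unknown area: default read-only
```

Defaulting unknown areas to read-only keeps the failure mode safe: an unmapped area means a human must widen the table deliberately.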
Engineering
AI Review Before Human Review
Run each PR through 2–3 dissimilar code-review agents before any human looks at it. Different models catch each other’s blind spots, cutting down on trivial comments. Humans then focus on intent, system effects, and customer requirements.
- Add AI review checks to PRs; gate on passing lint, types, and tests
- Re-run agents after fixes until feedback is nitpicks only
- Reserve human review for intent and cross-system impacts
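The gate loop above can be sketched as a fan-out over review agents that only hands off to humans once no blockers remain. The agent callables and the two-level severity labels are assumptions for illustration:

```python
# Sketch of the AI-before-human gate: fan the diff out to dissimilar
# review agents, merge findings, and signal human-readiness only when
# all that remains is nitpicks. Severity labels are an assumption.
from typing import Callable

Finding = tuple[str, str]  # (severity, message); severities: "blocker" | "nitpick"

def ai_review_gate(diff: str,
                   agents: list[Callable[[str], list[Finding]]]) -> tuple[bool, list[Finding]]:
    """Return (ready_for_human, merged findings) for one review pass."""
    findings = [f for agent in agents for f in agent(diff)]
    blockers = [f for f in findings if f[0] == "blocker"]
    return (len(blockers) == 0, findings)  # humans see it only once blockers are fixed
```

In practice the author re-runs this pass after each round of fixes; the loop terminates when the only remaining findings are nitpicks.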
Best-of-N Makes Compute Disposable
For non-critical changes, ask agents to produce multiple plans or patches in parallel and select the best. Treat compute as cheap, judgment as scarce; most drafts get thrown away, by design.
- Generate 3–5 variants for a task; select, don’t shepherd
- Cap per-run spend; archive losing variants for recurring patterns
- Use on refactors and UI glue, not data migrations
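The best-of-N pattern is a small harness: generate variants in parallel, score, keep the winner, archive the rest. `generate_variant` and `score` below are placeholders for your agent call and judgment proxy, not a real API:

```python
# Sketch of best-of-N: parallel variant generation, then selection.
# generate_variant and score stand in for agent calls and a judgment
# proxy (tests passed, diff size, reviewer rubric).
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def best_of_n(task: str,
              generate_variant: Callable[[str, int], str],
              score: Callable[[str], float],
              n: int = 4) -> tuple[str, list[str]]:
    """Return (winning patch, archived losers); compute is disposable by design."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        variants = list(pool.map(lambda i: generate_variant(task, i), range(n)))
    ranked = sorted(variants, key=score, reverse=True)
    return ranked[0], ranked[1:]  # select, don't shepherd
```

A per-run spend cap maps naturally onto `n` and a timeout on the pool; losing variants go to an archive so recurring patterns can be mined later.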
Size Work by Human Minutes, Not Tokens
Frontier models reliably complete work that takes a strong developer 30–60 minutes end-to-end, and the ceiling keeps rising. Slice tickets to match that success band; re-slice anything that stalls.
- Write tickets a strong dev could finish in about one hour
- If an agent loops twice or exceeds a timebox, split the task
- Revisit chunk size quarterly as models improve
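The re-slice rule in the bullets reduces to a small predicate. The `Attempt` record and thresholds are assumptions for illustration; the 60-minute timebox mirrors the success band above:

```python
# Sketch of the re-slice heuristic: split a task once an agent loops
# twice or exceeds its timebox. Record shape and thresholds are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Attempt:
    loops: int            # times the agent restarted its own plan
    minutes_spent: float  # wall-clock time so far

def should_split(attempt: Attempt, timebox_minutes: float = 60.0) -> bool:
    """True when the task has outgrown the current agent capability envelope."""
    return attempt.loops >= 2 or attempt.minutes_spent > timebox_minutes
```

Raising `timebox_minutes` quarterly is the code-level expression of revisiting chunk size as models improve.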
Preview Environments + Browser Automation as the PR Gate
Spin up an ephemeral environment per PR with a forked database. Run browser-driven smoke and critical-journey tests against that URL to fail fast before humans review broken flows.
- Auto-deploy unique preview URLs from CI
- Run scripted browser checks; attach logs and video on failure
- Block merges on preview test failures
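The merge-blocking logic can be sketched independent of the browser tooling: run each critical-journey check against the per-PR URL and fail the gate on any red. The check callables below stand in for real browser automation (e.g. a Playwright script); the URL is hypothetical:

```python
# Sketch of the preview gate: scripted checks against the ephemeral
# URL; one red check blocks the merge. Checks stand in for real
# browser automation scripts.
from typing import Callable

def preview_gate(preview_url: str,
                 checks: dict[str, Callable[[str], bool]]) -> tuple[bool, dict[str, bool]]:
    """Run each critical-journey check; return (merge_ok, per-check results)."""
    results = {name: check(preview_url) for name, check in checks.items()}
    return all(results.values()), results
```

The per-check results dict is what you attach to the PR alongside logs and video on failure.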
Agents.md + Custom Linters Prevent Style Drift
Codify house rules in an Agents.md that agents read first: folder conventions, “done” criteria, and non-negotiables. Backstop with custom lint rules for known foot-guns; agents will learn to satisfy them.
- Put repo norms and prohibited patterns in Agents.md
- Add bespoke lint rules for crash-prone edge cases
- Run lints during agent execution and in CI
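A bespoke lint rule for a crash-prone edge case can be small. A minimal sketch, assuming the foot-gun you care about is a bare `except:` that swallows crashes; a real setup would register this as a flake8/ruff plugin rather than a standalone function:

```python
# Sketch of a bespoke lint rule backstopping Agents.md: flag bare
# `except:` handlers, a classic crash-swallowing foot-gun.
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` handlers in a Python file."""
    tree = ast.parse(source)
    return [node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]
```

Because the rule runs both during agent execution and in CI, agents quickly learn to emit typed exception handlers instead.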
Daily Kaizen via Background Agent PRs
Schedule background agents to open small, safe PRs that raise hygiene: missing tests, tighter types, doc fixes across repos. Humans review and merge without stealing focus from feature work.
- Ship 1–2 automated PRs per repo per day
- Assign owners to triage before standup
- Track acceptance rate and defect deltas
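The daily cap and acceptance tracking fit in one small ledger. The class name and in-memory store below are illustrative; a real setup would persist this per repo:

```python
# Sketch of the kaizen budget: cap automated PRs per repo per day and
# track acceptance rate so the program can prove its keep. Names and
# the in-memory store are illustrative assumptions.
from collections import defaultdict

class KaizenLedger:
    def __init__(self, daily_cap: int = 2):
        self.daily_cap = daily_cap
        self.opened = defaultdict(int)  # (repo, date) -> PRs opened that day
        self.merged = 0
        self.total = 0

    def may_open(self, repo: str, date: str) -> bool:
        return self.opened[(repo, date)] < self.daily_cap

    def record_open(self, repo: str, date: str) -> None:
        self.opened[(repo, date)] += 1
        self.total += 1

    def record_outcome(self, accepted: bool) -> None:
        self.merged += int(accepted)

    def acceptance_rate(self) -> float:
        return self.merged / self.total if self.total else 0.0
```

A falling acceptance rate is the early signal that the background agents' prompts or targets need retuning.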
Let Agents Map the Codebase for Onboarding
Have a terminal agent traverse the repo and narrate data flows and component interactions, then convert that narrative into system diagrams. Engineers who understand the system sooner write better prompts and get better downstream results.
- Prompt for end-to-end “user request to response” narratives
- Render diagrams from the narrative and store them in-repo
- Refresh the maps after major refactors
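The diagram half of this can be bootstrapped without an agent at all: walk the Python files, extract their imports, and emit a Mermaid-style edge list the agent's narrative can be layered onto. Purely illustrative tooling, assuming a Python codebase:

```python
# Sketch: extract a module import graph and render it as Mermaid text
# to store in-repo alongside the agent's narrative.
import ast
from pathlib import Path

def module_edges(root: str) -> list[tuple[str, str]]:
    """Return (module, imported_name) pairs for every .py file under root."""
    edges = []
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges.extend((path.stem, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.append((path.stem, node.module))
    return sorted(set(edges))

def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render the edge list as a Mermaid flowchart definition."""
    return "\n".join(["graph TD"] + [f"    {a} --> {b}" for a, b in edges])
```

Re-running this after major refactors keeps the stored maps honest between agent-narrated refreshes.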