Field Notes - Dec 05, '25

Executive Signals

Agents are the new interns: schedule tiny merges that steadily raise baseline quality
Ensembles beat experts: diverse reviewers catch distinct defects before merge
Split planes, ship faster: local speed, cloud breadth prevent tool thrash
Noisy alerts : alert-to-PR pipelines convert detection into concrete fixes
Guardrails enable velocity: fenced automations move fast without compliance surprises
Time budgets over taste: workflows handle sleeps, retries, and durability

If you’re adding memory, guardrails, tools, and an operator studio, you’re rebuilding a framework. Pick one intentionally before bespoke adapters and scripts calcify. Favor a thin vendor adapter so protocol churn doesn’t cascade across your stack. Pilot in a sandbox with PMs and one backend service; expand only when review/merge times fall meaningfully.

Select frameworks with memory, guardrails, tool calling, simple RAG, workflows, HITL, and an operator studio
Keep MCP and similar protocols behind a thin adapter to absorb churn
Gate rollout on measurable cuts to review/merge time

When to Use Specialized AI Code Review Platforms

General models with redundancy carry you far. Bring in a specialized, org‑aware reviewer when throughput or variance overwhelms human cycles: think >30 PRs/week, >36 hours median time‑to‑merge, or reviewers as the bottleneck. Encode standards once; let the platform enforce, learn exceptions, and thread fixes. Keep human ownership on security and schema changes.

Add org‑aware reviewers when scale or variance degrades merge velocity
Encode standards centrally; use feedback to teach exceptions
Require human owners for security and schema diffs

Engineering

Automate One‑a‑Day PRs for Compound Gains

Use agentic “find‑and‑fix” loops to ship tiny, daily improvements: doc truthiness, missing tests, and slow queries. Maintain a repo map so agents chase cross‑service causes, not just noisy surfaces. Add a “do nothing if clean” rule so humans review and merge in minutes.

Schedule daily fixes per repo for docs, tests, and slowest queries
Maintain a cross‑service repo map to trace upstream causes
Enforce no‑op if clean; keep review effort measured in minutes

Run a Multi‑Model Code Review Gauntlet

Different reviewers catch different error classes. Put 2–3 distinct code reviewers (models/tools) on every PR, auto‑apply fixes, and re‑review until the signal drops. Gate on “no unresolved P1s,” CI green, and typecheck clean. Cap auto cycles at two, then a human ships or kills with a “ready to merge” pass.

Use architecture, program analysis/runtime, and UI/Docs reviewers in parallel
Gate merges on P1=0, CI green, and typecheck clean
Cap at two auto cycles; require a final “ready to merge” pass

Split Local IDE from Cloud Agents

Keep local dev fast and isolated; keep cloud automations cross‑repo and scheduled. Treat them as different planes until the market converges. Local gets agentic IDEs with per‑workspace worktrees so agents don’t collide. Cloud runners trigger from schedules and alerts and open PRs across repos. Hide model vendors behind a thin adapter you can swap in 4–6 months.

Isolate agent work in per‑workspace worktrees locally
Run cross‑repo automations on cloud runners via schedules/alerts
Abstract vendors behind a thin adapter for swap‑ability

Wire Observability Directly to PRs

Turn alerts into action. Pipe error rates and latency SLOs into an agent that files tickets, proposes diffs, and opens PRs. Review becomes the constraint, not detection. Include a nightly “optimize slowest query” job that preserves output shape while improving runtime. Rate‑limit fixes and require owner review on hot paths.

Build alert → ticket → PR as one automated path
Schedule nightly “optimize slowest query” with output‑shape guarantees
Rate‑limit agent fixes; require owner review on hot paths

Guardrails for Agentic Dev in Compliance Environments

Let agents move fast, but fence blast radius. Enforce tiered scanning: branches get lint/typecheck/quick scans; main and pre‑prod get full SAST/DAST/secret scans; prod deploys run policy checks with human sign‑off for sensitive paths. Protect config directories with mandatory human owners. Block agent edits to guardrail files, allow adjacent fixes.

Enforce tiered scans by environment and path sensitivity
Protect configs (linters, auth, encryption, migrations) with human ownership
Block agent edits to guardrail files; allow adjacent changes only

Pick Workflow Runners by Time Budget, Not Taste

Use workflow runners when tasks exceed an interactive session or need timers/checkpoints. Optimize for mental overhead per unit of throughput. If a job runs >15 minutes or relies on sleeps/retries, move it to workflows. If >60 minutes, chunk and persist state. Keep interactive tweaks local; reserve workflows for durable automations.

Move jobs >15 minutes or with timers into workflows; chunk >60 minutes
Prefer hosted runners with sleeps, retries, and HITL primitives
Keep interactive tweaks local; automate durable jobs in workflows