
Field Notes - Oct 23, '25
Executive Signals
- Determinism is the new agent: script the happy path, reserve LLMs for variance
- Frontier proves, specialist ships: distill to cut p99 latency and per-call cost
- Chat is a dead end: prompt-controlled flows deliver outcomes and QA at scale
- Time is the scarcest budget: single-threaded owners beat context-switch decay on cross-functional AI projects
- ROI over rhetoric: AI framing opens doors, outcomes keep the lights on
CEO
AI As Air Cover, ROI As Contract
Packaging automation as “AI” can unlock resourcing, but the architecture must reflect where an LLM adds value versus simple logic. Fund against measurable outcomes like hours saved, error rate, and SLA lift. Cap spend and time so experiments don’t sprawl.
- Decide AI vs code at each step; document why
- Cap per-run LLM spend and runtime; alert on overages
- Ship a 1–2 sprint proof with before/after ops metrics
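The spend and runtime caps above can be sketched as a per-run budget guard; the class name, limits, and alert format here are illustrative assumptions, not a prescribed implementation.

```python
import time

class RunBudget:
    """Guardrail capping per-run LLM spend and wall-clock time.
    Limits and names are illustrative, not from the notes."""

    def __init__(self, max_usd: float, max_seconds: float):
        self.max_usd = max_usd
        self.max_seconds = max_seconds
        self.spent_usd = 0.0
        self.started = time.monotonic()
        self.alerts: list[str] = []

    def charge(self, usd: float) -> bool:
        """Record a model call's cost; return False (and alert) on overage."""
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            self.alerts.append(f"spend overage: ${self.spent_usd:.2f} > ${self.max_usd:.2f}")
            return False
        if time.monotonic() - self.started > self.max_seconds:
            self.alerts.append("runtime overage")
            return False
        return True

budget = RunBudget(max_usd=0.50, max_seconds=120)
ok = budget.charge(0.30)   # within cap
ok2 = budget.charge(0.30)  # pushes spend to $0.60, over the $0.50 cap
```

Wiring the alert list into an existing pager or dashboard is left to the team; the point is that every run carries its own hard ceiling.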
Capacity Guardrails For Cross-Functional Bets
Without explicitly protected time, context switching kills AI projects. Assign single-threaded owners, publish allocations, and retire legacy work to protect focus. Demo weekly and delete work when capacity slips.
- Protect 30–50% allocation for six weeks per workstream
- Weekly demos with scope cuts, not deferrals
- Track load on key ICs and backfill before new scope
Hire T-Shapes, Not Ten-Year Unicorns
No one has a decade of LLM experience. Optimize for T-shaped builders with one deep spike and broad product/infra instincts. Bias to first-principles proposals and weekly force-ranking to keep velocity.
- Write roles for T-shapes with single-metric ownership
- Reward first-principles designs over legacy patterns
- Force-rank weekly; delete below-the-line work
Marketing
Turn The Homepage Into A Live Agent
Static sites waste intent. Use an agent plus retrieval to adapt copy, examples, and CTAs to visitor context with visible citations. Expose lightweight tool calls and reasoning to build trust, and degrade gracefully when retrieval is empty or models stall.
- Store minimal session context and purge on end
- Log prompts, tools, and sources for QA; surface citations inline
- Fallback cleanly: static hero, cached answers, clear errors
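The fallback chain above can be sketched as one function; `retrieve` and `generate` are hypothetical callables standing in for whatever retrieval and model stack is used.

```python
def render_homepage(context, retrieve, generate):
    """Degrade gracefully: agentic copy -> static hero.
    `retrieve` and `generate` are placeholder callables, not a real API."""
    STATIC_HERO = {"copy": "static hero", "citations": []}
    try:
        docs = retrieve(context)
        if not docs:                      # empty retrieval: fall back, don't hallucinate
            return STATIC_HERO
        answer = generate(context, docs)  # may raise on timeout or model stall
        return {"copy": answer, "citations": [d["source"] for d in docs]}
    except Exception:
        return STATIC_HERO                # model stalled: clean static fallback
```

Returning citations alongside the copy keeps the "visible sources" trust signal in every non-fallback response.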
Customer Success
Agent-Built QBRs, Human-Delivered Decisions
Unify customer, product, and marketing data, then schedule agents to assemble QBRs by segment. Humans present; agents compile. You trade gathering time for decision time and spotlight anomalies first.
- Centralize account/product/owner data; define a QBR schema once
- Run agents quarterly per segment; auto-distribute to owners
- Lead with anomalies and KPI deltas; narrative second
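One way to encode "anomalies and KPI deltas first, narrative second" is a small schema plus an ordering rule; the field names here are assumptions, not a mandated format.

```python
from dataclasses import dataclass

@dataclass
class QBRSection:
    """One QBR section; fields are illustrative, not a fixed schema."""
    title: str
    kpi_delta_pct: float      # quarter-over-quarter change
    is_anomaly: bool = False
    narrative: str = ""

def assemble_qbr(sections):
    """Order anomalies first, then by largest KPI delta; narrative trails."""
    return sorted(sections, key=lambda s: (not s.is_anomaly, -abs(s.kpi_delta_pct)))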
Product
Vertical Agents, Prompt-Controlled Workflows
End users want outcomes, not prompting. Replace open-ended chat with structured intents, buttons, and forms. Prewrite prompts, pin context, and return typed outputs the business can QA. For narrow domains, you can ship a vertical agent fast by focusing on system prompts, tool design, and retrieval.
- Replace free chat with intent collectors tied to your data model
- Version and A/B prompts like code; measure answer accuracy
- Fail closed to a human when confidence or coverage is low
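The intent-plus-typed-output pattern above might look like the sketch below; the prompt names, confidence floor, and `run_prompt` callable are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    """Typed output the business can QA."""
    text: str
    confidence: float
    sources: list

CONFIDENCE_FLOOR = 0.75  # illustrative threshold, tuned per domain

def handle_intent(intent: str, params: dict, run_prompt) -> dict:
    """Route a fixed intent through a prewritten, versioned prompt.
    `run_prompt` is a hypothetical callable returning an Answer."""
    PROMPTS = {"pricing_question": "pricing_v3"}  # versioned like code
    if intent not in PROMPTS:
        return {"status": "human", "reason": "unknown intent"}          # fail closed
    ans = run_prompt(PROMPTS[intent], params)
    if ans.confidence < CONFIDENCE_FLOOR or not ans.sources:
        return {"status": "human", "reason": "low confidence or coverage"}
    return {"status": "answered", "text": ans.text, "sources": ans.sources}
```

Because every path returns a typed, inspectable result, answer accuracy can be measured and A/B-tested per prompt version.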
Cheap Fraud Friction For Promotions
Rewards and rebates attract junk-email abuse. Combine deterministic rules with lightweight ML/LLM scoring at the edge. Pay high-confidence claims instantly, route uncertain ones to triage, and use labeled outcomes to tighten thresholds.
- Block disposable domains, rate-limit by device/IP, dedupe by fingerprint
- Risk score above threshold → manual review; below → instant fulfillment
- Store labeled outcomes to retrain and refine thresholds
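The rules-then-score routing above can be sketched as follows; the disposable-domain list and the review band are illustrative and would be tightened from labeled outcomes.

```python
DISPOSABLE = {"mailinator.com", "tempmail.io"}  # illustrative blocklist

def triage_claim(email: str, risk_score: float, review_band=(0.3, 0.7)) -> str:
    """Deterministic rules first, then score-based routing.
    Band edges are assumptions to be tuned from labeled outcomes."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE:                 # cheap rule: no model call needed
        return "block"
    lo, hi = review_band
    if risk_score >= hi:
        return "block"
    if risk_score >= lo:
        return "manual_review"               # uncertain: human triage
    return "instant_fulfillment"             # high confidence: pay immediately
```

Rate limiting and device fingerprinting would sit in front of this function; they are omitted here to keep the routing logic visible.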
Engineering
Deterministic First, Agents For Uncertainty
Fully agentic browsers are flaky, slow, and costly for repeatable tasks. Script the ≥80% stable path with Playwright or Selenium, then invoke an LLM only when DOMs drift or labels vary. Running inside a user’s real browser can preserve sessions. Instrument every step and persist artifacts for audit and retries. Design idempotency so you can resume from step N.
- Gate LLM use behind explicit triggers on “uncertain” nodes
- Capture screens, DOM states, and HAR; attach replay links to tickets
- Recover from step N without replaying 1..N−1
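The checkpoint-and-gate pattern above can be sketched as a step runner; the checkpoint format, the `LookupError` trigger, and the step names are illustrative, not a real framework API, and the scripted steps stand in for Playwright or Selenium calls.

```python
import json
import pathlib
import tempfile

def run_flow(steps, checkpoint_path, llm_fallback):
    """Deterministic-first runner: scripted steps, a checkpoint so a retry
    resumes from step N without replaying 1..N-1, and an LLM invoked only
    when a step signals uncertainty (here, LookupError for a drifted DOM)."""
    ckpt = pathlib.Path(checkpoint_path)
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {"next": 0}
    for i in range(state["next"], len(steps)):
        try:
            steps[i]()                      # scripted happy path (e.g. a click)
        except LookupError:
            llm_fallback(i)                 # "uncertain" node: LLM resolves it
        state["next"] = i + 1
        ckpt.write_text(json.dumps(state))  # persist progress for audit + resume

# Illustrative flow: step 1 simulates a drifted selector.
trace = []
def open_page(): trace.append("open_page")
def drifted(): raise LookupError("selector drifted")
def submit(): trace.append("submit")

ckpt_path = pathlib.Path(tempfile.mkdtemp()) / "flow.json"
run_flow([open_page, drifted, submit], ckpt_path, lambda i: trace.append(f"llm@{i}"))
```

Rerunning the same flow against the same checkpoint replays nothing, which is the idempotency property the bullets call for.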
Two-Stage Rollout For AI Coding
Adopt AI coding as a capability. Stage 1: autocomplete to raise acceptance and speed. Stage 2: agent mode for multi-file changes behind PR review. Aim for fewer context switches, not auto-merges.
- Pilot with a small cohort; target >50% suggestion acceptance in 2 weeks
- Agents open PRs with rationale and tests; no direct pushes
- Track revert rate and cycle time deltas per repo
Prove With Frontier, Ship A Distilled Specialist
Validate capability and ROI with a frontier model, then migrate to a smaller specialist via teacher–student distillation and quantization. Balance accuracy, p99 latency, and unit cost, and run the student on your own infra to stabilize performance.
- Trigger migration at >100k daily calls, or when the p99 target is <300 ms
- Generate task-specific synthetic data; train the student to within ~1–2 pp of teacher evals
- Operate the student on your own infra to remove the vendor's per-call margin
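The two decision points above, when to migrate and when to accept the student, can be sketched as simple predicates; the 100k-call and 300 ms triggers come from the notes, while the 2 pp accuracy tolerance and cost comparison are illustrative defaults.

```python
def should_migrate(daily_calls: int, p99_target_ms: float) -> bool:
    """Migration trigger from the notes: >100k daily calls, or a p99
    target under 300 ms that a frontier API is unlikely to hit."""
    return daily_calls > 100_000 or p99_target_ms < 300

def accept_student(teacher_acc: float, student_acc: float,
                   teacher_cost: float, student_cost: float,
                   max_drop_pp: float = 2.0) -> bool:
    """Accept the distilled student if eval accuracy stays within ~1-2 pp
    of the teacher and per-call cost falls; the 2 pp default is illustrative."""
    drop_pp = (teacher_acc - student_acc) * 100
    return drop_pp <= max_drop_pp and student_cost < teacher_cost
```

Gating the cutover on both accuracy and unit cost keeps the balance the section describes: a cheaper model that misses evals, or an accurate one that costs more per call, stays on the bench.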