
Field Notes - Oct 23, '25
Executive Signals
- Determinism is the new agent: script the happy path, reserve LLMs for variance
- Frontier proves, specialist ships: distill to cut p99 latency and per-call cost
- Chat is a dead end: prompt-controlled flows deliver outcomes and QA at scale
- Time is the scarcest budget: single-threaded owners beat context-switch decay on cross-functional AI projects
- ROI over rhetoric: AI framing opens doors, outcomes keep the lights on
CEO
AI As Air Cover, ROI As Contract
Packaging automation as “AI” can unlock resourcing, but the architecture must reflect where an LLM adds value versus simple logic. Fund against measurable outcomes like hours saved, error rate, and SLA lift. Cap spend and time so experiments don’t sprawl.
- Decide AI vs code at each step; document why
- Cap per-run LLM spend and runtime; alert on overages
- Ship a 1–2 sprint proof with before/after ops metrics
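The spend and runtime caps above can be sketched as a per-run budget guard; the class name, limits, and alert format here are illustrative assumptions, not a prescribed implementation.

```python
import time

class RunBudget:
    """Guardrail capping per-run LLM spend and wall-clock time.
    Limits and names are illustrative, not from the notes."""

    def __init__(self, max_usd: float, max_seconds: float):
        self.max_usd = max_usd
        self.max_seconds = max_seconds
        self.spent_usd = 0.0
        self.started = time.monotonic()
        self.alerts: list[str] = []

    def charge(self, usd: float) -> bool:
        """Record a model call's cost; return False (and alert) on overage."""
        self.spent_usd += usd
        if self.spent_usd > self.max_usd:
            self.alerts.append(f"spend overage: ${self.spent_usd:.2f} > ${self.max_usd:.2f}")
            return False
        if time.monotonic() - self.started > self.max_seconds:
            self.alerts.append("runtime overage")
            return False
        return True

budget = RunBudget(max_usd=0.50, max_seconds=120)
ok = budget.charge(0.30)   # within cap
ok2 = budget.charge(0.30)  # pushes spend to $0.60, over the $0.50 cap
```

Wiring the alert list into an existing pager or dashboard is left to the team; the point is that every run carries its own hard ceiling.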
Capacity Guardrails For Cross-Functional Bets
Without explicitly protected time, context switching kills AI projects. Assign single-threaded owners, publish allocations, and retire legacy work to protect focus. Demo weekly and delete work when capacity slips.
- Protect 30–50% allocation for six weeks per workstream
- Weekly demos with scope cuts, not deferrals
- Track load on key ICs and backfill before new scope
Hire T-Shapes, Not Ten-Year Unicorns
No one has a decade of LLM experience. Optimize for T-shaped builders with one deep spike and broad product/infra instincts. Bias to first-principles proposals and weekly force-ranking to keep velocity.
- Write roles for T-shapes with single-metric ownership
- Reward first-principles designs over legacy patterns
- Force-rank weekly; delete below-the-line work
Marketing
Turn The Homepage Into A Live Agent
Static sites waste intent. Use an agent plus retrieval to adapt copy, examples, and CTAs to visitor context with visible citations. Expose lightweight tool calls and reasoning to build trust, and degrade gracefully when retrieval is empty or models stall.
- Store minimal session context and purge on end
- Log prompts, tools, and sources for QA; surface citations inline
- Fallback cleanly: static hero, cached answers, clear errors
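The fallback chain above can be sketched as one function; `retrieve` and `generate` are hypothetical callables standing in for whatever retrieval and model stack is used.

```python
def render_homepage(context, retrieve, generate):
    """Degrade gracefully: agentic copy -> static hero.
    `retrieve` and `generate` are placeholder callables, not a real API."""
    STATIC_HERO = {"copy": "static hero", "citations": []}
    try:
        docs = retrieve(context)
        if not docs:                      # empty retrieval: fall back, don't hallucinate
            return STATIC_HERO
        answer = generate(context, docs)  # may raise on timeout or model stall
        return {"copy": answer, "citations": [d["source"] for d in docs]}
    except Exception:
        return STATIC_HERO                # model stalled: clean static fallback
```

Returning citations alongside the copy keeps the "visible sources" trust signal in every non-fallback response.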
Customer Success
Agent-Built QBRs, Human-Delivered Decisions
Unify customer, product, and marketing data, then schedule agents to assemble QBRs by segment. Humans present; agents compile. You trade gathering time for decision time and spotlight anomalies first.
- Centralize account/product/owner data; define a QBR schema once
- Run agents quarterly per segment; auto-distribute to owners
- Lead with anomalies and KPI deltas; narrative second
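One way to encode "anomalies and KPI deltas first, narrative second" is a small schema plus an ordering rule; the field names here are assumptions, not a mandated format.

```python
from dataclasses import dataclass

@dataclass
class QBRSection:
    """One QBR section; fields are illustrative, not a fixed schema."""
    title: str
    kpi_delta_pct: float      # quarter-over-quarter change
    is_anomaly: bool = False
    narrative: str = ""

def assemble_qbr(sections):
    """Order anomalies first, then by largest KPI delta; narrative trails."""
    return sorted(sections, key=lambda s: (not s.is_anomaly, -abs(s.kpi_delta_pct)))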
Product
Vertical Agents, Prompt-Controlled Workflows
End users want outcomes, not prompting. Replace open-ended chat with structured intents, buttons, and forms. Prewrite prompts, pin context, and return typed outputs the business can QA. For narrow domains, you can ship a vertical agent fast by focusing on system prompts, tool design, and retrieval.
- Replace free chat with intent collectors tied to your data model
- Version and A/B prompts like code; measure answer accuracy
- Fail closed to a human when confidence or coverage is low
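The intent-plus-typed-output pattern above might look like the sketch below; the prompt names, confidence floor, and `run_prompt` callable are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    """Typed output the business can QA."""
    text: str
    confidence: float
    sources: list

CONFIDENCE_FLOOR = 0.75  # illustrative threshold, tuned per domain

def handle_intent(intent: str, params: dict, run_prompt) -> dict:
    """Route a fixed intent through a prewritten, versioned prompt.
    `run_prompt` is a hypothetical callable returning an Answer."""
    PROMPTS = {"pricing_question": "pricing_v3"}  # versioned like code
    if intent not in PROMPTS:
        return {"status": "human", "reason": "unknown intent"}          # fail closed
    ans = run_prompt(PROMPTS[intent], params)
    if ans.confidence < CONFIDENCE_FLOOR or not ans.sources:
        return {"status": "human", "reason": "low confidence or coverage"}
    return {"status": "answered", "text": ans.text, "sources": ans.sources}
```

Because every path returns a typed, inspectable result, answer accuracy can be measured and A/B-tested per prompt version.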
Cheap Fraud Friction For Promotions
Rewards and rebates attract junk-email abuse. Combine deterministic rules with lightweight ML/LLM scoring at the edge. Pay high-confidence claims instantly, route uncertain ones to triage, and use labeled outcomes to tighten thresholds.
- Block disposable domains, rate-limit by device/IP, dedupe by fingerprint
- Risk score above threshold → manual review; below → instant fulfillment
- Store labeled outcomes to retrain and refine thresholds
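The rules-then-score routing above can be sketched as follows; the disposable-domain list and the review band are illustrative and would be tightened from labeled outcomes.

```python
DISPOSABLE = {"mailinator.com", "tempmail.io"}  # illustrative blocklist

def triage_claim(email: str, risk_score: float, review_band=(0.3, 0.7)) -> str:
    """Deterministic rules first, then score-based routing.
    Band edges are assumptions to be tuned from labeled outcomes."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE:                 # cheap rule: no model call needed
        return "block"
    lo, hi = review_band
    if risk_score >= hi:
        return "block"
    if risk_score >= lo:
        return "manual_review"               # uncertain: human triage
    return "instant_fulfillment"             # high confidence: pay immediately
```

Rate limiting and device fingerprinting would sit in front of this function; they are omitted here to keep the routing logic visible.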
Engineering
Deterministic First, Agents For Uncertainty
Fully agentic browsers are flaky, slow, and costly for repeatable tasks. Script the ≥80% stable path with Playwright or Selenium, then invoke an LLM only when DOMs drift or labels vary. Running inside a user’s real browser can preserve sessions. Instrument every step and persist artifacts for audit and retries. Design idempotency so you can resume from step N.
- Gate LLM use behind explicit triggers on “uncertain” nodes
- Capture screens, DOM states, and HAR; attach replay links to tickets
- Recover from step N without replaying 1..N−1
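The checkpoint-and-gate pattern above can be sketched as a step runner; the checkpoint format, the `LookupError` trigger, and the step names are illustrative, not a real framework API, and the scripted steps stand in for Playwright or Selenium calls.

```python
import json
import pathlib
import tempfile

def run_flow(steps, checkpoint_path, llm_fallback):
    """Deterministic-first runner: scripted steps, a checkpoint so a retry
    resumes from step N without replaying 1..N-1, and an LLM invoked only
    when a step signals uncertainty (here, LookupError for a drifted DOM)."""
    ckpt = pathlib.Path(checkpoint_path)
    state = json.loads(ckpt.read_text()) if ckpt.exists() else {"next": 0}
    for i in range(state["next"], len(steps)):
        try:
            steps[i]()                      # scripted happy path (e.g. a click)
        except LookupError:
            llm_fallback(i)                 # "uncertain" node: LLM resolves it
        state["next"] = i + 1
        ckpt.write_text(json.dumps(state))  # persist progress for audit + resume

# Illustrative flow: step 1 simulates a drifted selector.
trace = []
def open_page(): trace.append("open_page")
def drifted(): raise LookupError("selector drifted")
def submit(): trace.append("submit")

ckpt_path = pathlib.Path(tempfile.mkdtemp()) / "flow.json"
run_flow([open_page, drifted, submit], ckpt_path, lambda i: trace.append(f"llm@{i}"))
```

Rerunning the same flow against the same checkpoint replays nothing, which is the idempotency property the bullets call for.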
Two-Stage Rollout For AI Coding
Adopt AI coding as a capability. Stage 1: autocomplete to raise acceptance and speed. Stage 2: agent mode for multi-file changes behind PR review. Aim for fewer context switches, not auto-merges.
- Pilot with a small cohort; target >50% suggestion acceptance in 2 weeks
- Agents open PRs with rationale and tests; no direct pushes
- Track revert rate and cycle time deltas per repo
Prove With Frontier, Ship A Distilled Specialist
Validate capability and ROI with a frontier model, then migrate to a smaller specialist via teacher–student distillation and quantization. Balance accuracy, p99 latency, and unit cost, and run the student on your own infra to stabilize performance.
- Trigger migration at >100k daily calls, or when the p99 target is <300 ms
- Generate task-specific synthetic data; train the student to within ~1–2 pp of teacher evals
- Operate the student on your own infra to remove the vendor's per-call margin
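The two decision points above, when to migrate and when to accept the student, can be sketched as simple predicates; the 100k-call and 300 ms triggers come from the notes, while the 2 pp accuracy tolerance and cost comparison are illustrative defaults.

```python
def should_migrate(daily_calls: int, p99_target_ms: float) -> bool:
    """Migration trigger from the notes: >100k daily calls, or a p99
    target under 300 ms that a frontier API is unlikely to hit."""
    return daily_calls > 100_000 or p99_target_ms < 300

def accept_student(teacher_acc: float, student_acc: float,
                   teacher_cost: float, student_cost: float,
                   max_drop_pp: float = 2.0) -> bool:
    """Accept the distilled student if eval accuracy stays within ~1-2 pp
    of the teacher and per-call cost falls; the 2 pp default is illustrative."""
    drop_pp = (teacher_acc - student_acc) * 100
    return drop_pp <= max_drop_pp and student_cost < teacher_cost
```

Gating the cutover on both accuracy and unit cost keeps the balance the section describes: a cheaper model that misses evals, or an accurate one that costs more per call, stays on the bench.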