Field Notes

Field Notes are fast, from-the-trenches observations. Time-bound and may age poorly. Summarized from my real notes by . Optimized for utility. Not investment or legal advice.

░░░░░░░▄█▄▄▄█▄
▄▀░░░░▄▌─▄─▄─▐▄░░░░▀▄
█▄▄█░░▀▌─▀─▀─▐▀░░█▄▄█
░▐▌░░░░▀▀███▀▀░░░░▐▌
████░▄█████████▄░████
=======================
Field Note Clanker
=======================
⏺ Agent start
│
├── 1 data sources
└── Total 24k words
⏺ Spawning 1 Sub-Agents
│
├── GPT-5: Summarize → Web Search Hydrate
├── GPT-5-mini: Score (Originality, Relevance)
└── Return Good Notes
⏺ Field Note Agent
│
├── Sorted to 4 of 7 sections
├── Extracting 5 key signals
└── Posting Approval
⏺ Publishing
┌────────────────────────────────────────┐
│ Warning: Field notes are recursively   │
│ summarized by agents. These likely age │
│ poorly. Exercise caution when reading. │
└────────────────────────────────────────┘

Field Notes - Oct 17, '25

Executive Signals

  • Agents are the new apps: ship into chat, not standalone surfaces
  • Tall then small: validate with big models, then distill to cheap
  • Durability over orchestration: queues, retries, checkpoints beat premature platforms
  • Vibes before evals: founder runs matter until guardrails are needed
  • Alarms at night: agents triage first, humans only with blast radius

CEO

Pick Hated, Structured Work First

The fastest adoption comes from automating the repetitive, clearly bounded tasks everyone dreads. Start with queues that have crisp inputs and objective “done” states, keep a human as the final decision-maker, and pull the kill switch if quality hasn’t cleared a 90% pass rate on golden-path tasks within two sprints.

  • Ask each function lead for their most‑hated weekly task; rank by volume × clarity
  • Constrain V1 to one source, one action, one decision gate; human approves
  • If golden paths miss twice, pivot to a clearer use case
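
The ranking step can be sketched in a few lines. The `Task` fields, the 0 to 1 clarity scale, and the sample tasks are assumptions for illustration, not something from the notes.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    weekly_volume: int  # times per week the task is performed (assumed unit)
    clarity: float      # 0.0-1.0: how crisp the inputs and "done" state are (assumed scale)

def rank_tasks(tasks: list[Task]) -> list[Task]:
    """Sort candidate tasks by volume x clarity, highest score first."""
    return sorted(tasks, key=lambda t: t.weekly_volume * t.clarity, reverse=True)

# Hypothetical answers from function leads.
candidates = [
    Task("invoice reconciliation", weekly_volume=200, clarity=0.9),
    Task("quarterly strategy memo", weekly_volume=1, clarity=0.2),
    Task("support ticket tagging", weekly_volume=500, clarity=0.7),
]
ranked = rank_tasks(candidates)
```

High volume with fuzzy "done" states still scores low, which is the point: clarity works as a multiplier, not a tiebreaker.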

Product

Ship Agents Where People Already Work

Agents stick when they live inside chat, email, or ticketing—not a new app. Design for strict platform deadlines and ephemeral conversations: acknowledge immediately, continue asynchronously, stream partials for risky steps, and log every tool call so sessions can resume after crashes.

  • Always reply before platform timeouts; queue follow‑ups in background
  • Stream intermediate results and request confirmation on destructive actions
  • Persist tool calls and state to enable resumable workflows
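
A minimal sketch of acknowledge-then-continue with a persisted tool log, using `asyncio`. `ToolLog`, `handle_message`, and the step tuple shape are hypothetical names. The log is in memory here, where a real one would sit in a durable store. Streaming and confirmation prompts are left out.

```python
import asyncio
import json

class ToolLog:
    """Append-only record of tool calls. In-memory stand-in for a durable store."""
    def __init__(self):
        self.entries = []  # one JSON string per completed tool call

    def record(self, session_id, tool, args, result):
        self.entries.append(json.dumps(
            {"session": session_id, "tool": tool, "args": args, "result": result}))

    def completed(self, session_id):
        rows = [json.loads(e) for e in self.entries]
        return {r["tool"] for r in rows if r["session"] == session_id}

async def run_steps(session_id, steps, log):
    """Run (name, async_fn, kwargs) steps, skipping any already in the log."""
    done = log.completed(session_id)
    for name, fn, args in steps:
        if name in done:
            continue  # finished before a crash or restart; do not repeat
        result = await fn(**args)
        log.record(session_id, name, args, result)

async def handle_message(session_id, steps, log, background):
    """Schedule the work, then return the acknowledgement right away."""
    task = asyncio.create_task(run_steps(session_id, steps, log))
    background.add(task)  # hold a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return "On it. I'll post results here as they land."
```

Resuming after a crash is then just calling `run_steps` again with the same session id: logged steps are skipped and the rest run.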

Vibes First, Evals Second

Early signal is qualitative: sit with the product and run real tasks until the experience feels right. When usage and contributors grow, move to CI‑backed evals that guard against regressions. Features that can’t pass golden‑path checks within two sprints should be cut or rethought.

  • Maintain 10–20 golden‑path tasks; demo them daily
  • Add CI evals once prompts/tools stabilize and there’s >1 builder
  • Remove or redesign features that fail two consecutive golden‑path runs
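
One way to make the "two consecutive failures" rule mechanical, as a sketch. The function names and the shape of the run history (one dict of task name to pass/fail per run) are assumptions.

```python
from typing import Callable

def run_golden_paths(tasks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run each golden-path check; an exception counts as a failure."""
    results = {}
    for name, check in tasks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results

def features_to_rethink(history: list[dict[str, bool]]) -> list[str]:
    """Return tasks that failed in both of the two most recent runs."""
    if len(history) < 2:
        return []
    prev, last = history[-2], history[-1]
    # A task missing from the previous run is treated as not yet failing twice.
    return sorted(n for n, ok in last.items() if not ok and not prev.get(n, True))
```

The same `run_golden_paths` call can back the daily demo at first and a CI job later, once prompts and tools settle.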

Engineering

Keep Orchestration Boring and Durable

Most failures come from platformizing too early. Favor a single service, single queue, and single datastore with durability patterns over multi‑agent schedulers. Implement sagas, idempotent tool endpoints, and at‑least‑once retries so work survives crashes and duplicates without surprises.

  • Enforce a 30‑day “no platform” rule; one service/queue/datastore
  • Checkpoint after every external call; recover with saga steps
  • Make retries idempotent from day one
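
A small sketch of checkpointing after an external call so at-least-once retries do not repeat finished work, using the standard-library `sqlite3` module as the single datastore. `open_store`, `run_once`, and the key format are hypothetical. This only dedupes on the caller's side. The tool endpoint itself would still need to honor the same key to be truly idempotent.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the checkpoint store (assumed schema: one row per completed step)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints (key TEXT PRIMARY KEY, result TEXT)")
    return db

def run_once(db, key, call, max_attempts=3):
    """Run call() under an idempotency key; reuse the stored result if present."""
    row = db.execute(
        "SELECT result FROM checkpoints WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]  # already done: a retry or duplicate delivery is a no-op
    last_error = None
    for _ in range(max_attempts):
        try:
            result = call()  # the external call; expected to return a string here
        except Exception as exc:
            last_error = exc
            continue
        # Checkpoint immediately after the external call succeeds.
        db.execute(
            "INSERT OR IGNORE INTO checkpoints (key, result) VALUES (?, ?)",
            (key, result))
        db.commit()
        return result
    raise RuntimeError(f"step {key!r} failed after {max_attempts} attempts") from last_error
```

A saga step would wrap each external call this way, with keys such as `"order-42:charge"` that stay stable across retries.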

Tool Protocols vs. Native Tools

Use a standard tool protocol only when third parties will build tools or you’re extending another agent. If you control both sides, native integrations win for latency, security, and simplicity. Prototype with a protocol if it speeds learning, but leave a shim for migration.

  • Choose native when security is strict, latency tight, schemas predictable
  • Keep a swappable shim to move between protocol and native
  • Audit tool permissions quarterly; least privilege by default
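
The shim can be as thin as one interface with two implementations. This sketch uses `typing.Protocol`. `ToolBackend`, `NativeBackend`, `ProtocolBackend`, and the message shape passed to `send` are all illustrative and do not follow any particular tool protocol's wire format.

```python
from typing import Any, Callable, Protocol

class ToolBackend(Protocol):
    def call(self, name: str, args: dict[str, Any]) -> Any: ...

class NativeBackend:
    """Calls in-process functions directly."""
    def __init__(self, functions: dict[str, Callable[..., Any]]):
        self._functions = functions

    def call(self, name, args):
        return self._functions[name](**args)

class ProtocolBackend:
    """Forwards calls through an injected transport (stand-in for a protocol client)."""
    def __init__(self, send: Callable[[dict], Any]):
        self._send = send

    def call(self, name, args):
        return self._send({"tool": name, "arguments": args})

class Agent:
    def __init__(self, tools: ToolBackend, allowed: set[str]):
        self._tools = tools
        self._allowed = allowed  # least privilege: explicit allow-list per agent

    def use(self, name: str, **args):
        if name not in self._allowed:
            raise PermissionError(f"tool {name!r} is not permitted for this agent")
        return self._tools.call(name, args)
```

Because the agent only sees `ToolBackend`, moving from protocol to native (or back) is a constructor change, and the allow-list gives the quarterly audit one place to look.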

Model Strategy: Prove With Tall, Run With Small

When an agent underperforms, first validate the ceiling with a slower, higher‑reasoning model. If it works, distill to a smaller, cheaper model while preserving behavior. Keep prompts and tools model‑agnostic, and set SLOs for cost, latency, and success rate, measured before and after each swap.

  • Maintain “thinky” (quality) and “fast” (default) model paths
  • Version prompts with model‑specific tests in a registry
  • Track cost/task, p50/p95 latency, and success rate across swaps
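
A sketch of the per-model scorecard, assuming each task run is logged as a `Run` record. The field names, SLO keys, and the nearest-rank percentile choice are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Run:
    model: str        # e.g. "thinky" or "fast"
    cost_usd: float
    latency_s: float
    success: bool

def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(runs, model):
    """Cost per task, p50/p95 latency, and success rate for one model path."""
    rows = [r for r in runs if r.model == model]
    latencies = [r.latency_s for r in rows]
    return {
        "cost_per_task": sum(r.cost_usd for r in rows) / len(rows),
        "p50_latency_s": percentile(latencies, 50),
        "p95_latency_s": percentile(latencies, 95),
        "success_rate": sum(r.success for r in rows) / len(rows),
    }

def meets_slo(summary, slo):
    """Compare a summary against assumed SLO keys."""
    return (summary["cost_per_task"] <= slo["max_cost_per_task"]
            and summary["p95_latency_s"] <= slo["max_p95_latency_s"]
            and summary["success_rate"] >= slo["min_success_rate"])
```

Run `summarize` on the same task set before and after a swap. A distilled model that is cheaper but misses `min_success_rate` has not preserved behavior.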

Alerting Without Pager PTSD

Route anomalies to an agent first to gather context, correlate signals, and propose hypotheses with links and repro steps. Only escalate to humans when impact is clear. Tune thresholds for the agent’s time, not a person’s sleep, and continuously measure suppressed false positives.

  • All alerts → agent triage → human only with repro and blast radius
  • Require “can wait” vs. “wake now” labels with justification
  • Review suppressed false positives monthly and tighten rules
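
The escalation gate can be sketched as a small routing function. The `Triage` fields, label strings, and destination names are hypothetical. The rule itself (page a human only with a repro and a known blast radius) is the one stated above.

```python
from dataclasses import dataclass, field

@dataclass
class Triage:
    alert_id: str
    label: str                 # "can_wait" or "wake_now"
    justification: str         # required for either label
    repro_steps: list[str] = field(default_factory=list)
    blast_radius: str = ""     # empty means the agent has not established impact

def route(triage: Triage) -> str:
    """Decide where a triaged alert goes next."""
    if triage.label not in ("can_wait", "wake_now"):
        raise ValueError(f"unknown label: {triage.label!r}")
    if not triage.justification.strip():
        raise ValueError("a label needs a justification")
    if triage.label == "can_wait":
        return "ticket_queue"
    if triage.repro_steps and triage.blast_radius:
        return "page_human"
    # Urgent-looking but unproven: the agent keeps gathering context.
    return "agent_continue"
```

Counting how often `agent_continue` later resolves to `ticket_queue` gives a rough measure of suppressed false positives for the monthly review.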

Making a Broken Fixer Agent Useful

Before giving up on a code/error‑fixing agent, raise its thinking ceiling and feed it better examples. Combine higher‑reasoning runs with retrieval of similar diffs, exemplar prompts, and plan→act→verify scaffolding that executes unit tests. If the tall model plus exemplars still fails after 24–48 hours, change the problem.

  • Build a retrieval corpus of past fixes (diff, error, tests) as a tool
  • Add structured plan→act→verify with automated test execution
  • Time‑box: if no gains in 1–2 days, pick a different target
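
A sketch of the plan→act→verify loop with a time budget. `retrieve`, `propose`, and `apply_patch` are injected callables standing in for the retrieval tool, the model call, and the patch step; their signatures are assumptions. `run_unit_tests` shows one possible verifier built on the standard `unittest` CLI.

```python
import subprocess
import sys
import time

def run_unit_tests(workdir):
    """Verify step: run unittest discovery in workdir; return (passed, output)."""
    proc = subprocess.run(
        [sys.executable, "-m", "unittest", "discover", "-s", workdir],
        capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def fix_with_budget(error, retrieve, propose, apply_patch, verify,
                    budget_s, clock=time.monotonic):
    """Loop plan -> act -> verify until tests pass or the time box runs out."""
    deadline = clock() + budget_s
    feedback = error
    attempts = 0
    while clock() < deadline:
        attempts += 1
        examples = retrieve(feedback)        # similar past fixes (diff, error, tests)
        patch = propose(feedback, examples)  # plan: model drafts a patch
        apply_patch(patch)                   # act
        passed, output = verify()            # verify: e.g. run_unit_tests on the repo
        if passed:
            return patch, attempts
        feedback = output                    # feed test failures into the next plan
    return None, attempts                    # budget spent: pick a different target
```

Passing `verify=lambda: run_unit_tests("path/to/repo")` wires in real test execution, and a `None` result is the signal to change the problem rather than keep tuning.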

Customer Success

Enterprise Agent Adoption Ladder

A durable rollout pattern: vendor builds the first agent with the customer, the second is customer‑built with vendor shadowing, the third is customer‑owned end‑to‑end. Optimize for capability transfer, not dependency, with explicit graduation criteria and shared internal practices.

  • Define graduation: owner, runbooks, on‑call, and KPI deltas
  • Stand up an internal agent guild and shared tool catalog
  • Measure weekly active users and % of workflow handled by the agent