Field Notes - Sept 27, '25

Executive Signals

Seconds beat scope: first signal in seconds earns trust; deep checks can trail.
Outcomes over tokens: price by resolution; protect margins against model and browser costs.
Start big, then small: evals enable step-down routing without noticeable quality loss.
Safety is containment: injection is unsolved; enforce egress and hardened renderers.
Streams beat gates: show value live; capture emails on completion, not upfront.
IO is the bottleneck: choose non-billing wait compute and workflow engines for long tasks.

Willingness to pay is rising, but inference, browsers, and orchestration inflate COGS. Charge per outcome, not per token or minute. Design for healthy gross margins from day one and require fast payback on internal agents.

Tie tiers to measurable outputs; audit unit economics monthly
Target healthy gross margins after model, browser, and workflow costs
For internal agents, require sub-6-month payback on build and run

Marketing

Stream Value, Capture Email on Completion

Security buyers bounce on honeypot-style gates. Stream core results live and offer a practical “email me when complete” for multi-minute portions. Keep a sticky “Schedule a walkthrough” CTA. Blur only product-only sections, not the core value.

Remove upfront walls; convert on utility, not friction
Add “notify on completion” to capture work emails without stalling
Use blur-gates surgically for premium insights, not basics

Pre-cached Examples Kill Waiting

Zero initial wait with pre-rendered examples across common verticals. Show clean and messy cases, and explain scoring so prospects learn the rubric before scanning their own site.

Provide 6–10 examples across news, retail, healthcare, and finance
Label good and bad elements; mirror the live report UI
Rotate examples quarterly to reflect real-world patterns

A crisp variant outperforms raw cookie dumps: scan one page in parallel under Accept versus Reject from a few regions, then diff vendors that still load. Summarize offenders and severity, with a remediation path into the full product.

Parallelize regions and consent choices without deep clicking
Output vendors-by-choice-by-region with severity labels
Make it shareable with an executive summary

Product

Privacy Lite in 10 Seconds

Front-door value should be a privacy summary that returns first findings in seconds, then streams the rest. Deep PCI or HIPAA checks often require payment or login flows, so run them in the background or reserve for demos.

Target time-to-first-signal under 10 seconds
Complete light crawls in under 60 seconds
Reveal PCI or HIPAA checks only when the agent finishes

Stream Findings by Buyer Priority

Sequence outputs for decision value: scripts and vendors first, then response headers, data transfer map, cookies, and cookie categorization. Use WebSockets to stream and render the map early with local geo or IP lookups. Push LLM-heavy cookie classification to the background.

Ensure first payload in 5–10 seconds with visible progress
Emit raw cookie lists first; add categories later
Keep latency-sensitive enrichments local or cached

Payment Page Risk Scoring

On payment pages, classify third parties as authorized processors or risk. Unknown equals risk by default. Present a simple grade tied to unauthorized vendors, missing headers, and data egress destinations, with an exportable audit summary.

Maintain an editable allowlist; flag everything else
Show counts and top offenders with linked rationale
Offer an exportable summary for audits

Design the Unhappy Path First

Make the one-shot path excellent and use agentic loops only to auto-repair failures. Cap retries and wall-clock time, and surface partial results with a repairing state so users are not stuck in silence.

Limit to 2–3 repair attempts or 60–90 seconds wall-clock
Detect repeated error states and escalate or fall back
Emit structured error traces for post-mortems

Evals Plus Vibe Checks

Evals add to, not replace, fast human spot-checks. Build offline evals from real usage, retain traces and artifacts, and use A/B tests as online evals to gate major prompt or model changes.

Weekly review 100 random sessions by an accountable owner
Create an offline eval harness and retain traces for 90 days
Gate major changes behind A/B or canary evals

Engineering

Step Down Before Fine-tuning

Prove value with a slower, stronger model, then tighten prompts and step down to smaller, faster models. Fine-tune only after exhausting prompt specialization and routing to preserve quality while collapsing latency and cost.

Maintain an offline eval harness from day one
Target 2–5x latency and 3–10x cost cuts with under 2% quality loss
Lock prompts per task and route by task, not a mega-prompt

Contain Injection With Egress Controls

There is no safe prompt when untrusted data enters context. Use hardened renderers with allow-listed link and image prefixes, enforce HTTP egress allow-lists at host and path, sandbox tool execution, and audit tool outputs for secondary injection.

Block remote images and links by default; enforce prefix allow-lists
Enforce egress policies at HTTP host and path, not IP ranges
Log tool inputs and outputs; sample weekly for review

Scale IO-Bound Agents

Most agents are IO-bound. Use compute or pricing that does not bill while waiting, and add a workflow engine once tasks span hours or days. Build for retries and idempotency from the start.

Set per-step SLAs with dead-man timers and backoff
Persist step state and make each step idempotent
Expose pause, resume, and cancellation hooks

Crawl Cheap, Render Selectively

For web-researching agents, run a two-stage pipeline: cheap, large-scale crawling plus classification, then render only high-value pages with a headless browser. This keeps accuracy while cutting compute and bandwidth.

Keep the render ratio under 5% of crawled pages
Cache aggressively and dedupe by URL or content hash
Re-render on content or layout change signals

Avoid Premature Protocols

Agent-to-agent and tool protocols add drag when you own both sides. Treat another agent as a tool until you feel pain across multiple independent integrations, then add formal protocols with a thin adapter layer to avoid lock-in.

Defer protocols until at least three external integrations
Measure developer cycle time before and after abstraction
Keep adapters thin to reduce spec lock-in

Customer Success

Measure Agent ROI In The Queue

Internal agents that pre-digest context for humans can unlock big gains in triage-style queues. Throughput is the metric. Teams have seen roughly 50% more tickets processed per hour once agents gather artifacts and propose classifications, with humans making the final call.

Instrument tickets per hour and handle time before and after
Auto-attach evidence like scrapes and classifications to each ticket
Keep the human decision-maker and evaluate deflection monthly