
Field Notes - Sept 27, '25
Executive Signals
- Seconds beat scope: first signal in seconds earns trust; deep checks can trail.
- Outcomes over tokens: price by resolution; protect margins against model and browser costs.
- Start big, then small: evals enable step-down routing without noticeable quality loss.
- Safety is containment: injection is unsolved; enforce egress and hardened renderers.
- Streams beat gates: show value live; capture emails on completion, not upfront.
- IO is the bottleneck: choose non-billing wait compute and workflow engines for long tasks.
CEO
Price Outcomes, Protect Margins
Willingness to pay is rising, but inference, browsers, and orchestration inflate COGS. Charge per outcome, not per token or minute. Design for healthy gross margins from day one and require fast payback on internal agents.
- Tie tiers to measurable outputs; audit unit economics monthly
- Target healthy gross margins after model, browser, and workflow costs
- For internal agents, require sub-6-month payback on build and run
Marketing
Stream Value, Capture Email on Completion
Security buyers bounce on honeypot-style gates. Stream core results live and offer a practical “email me when complete” for multi-minute portions. Keep a sticky “Schedule a walkthrough” CTA. Blur only product-only sections, not the core value.
- Remove upfront walls; convert on utility, not friction
- Add “notify on completion” to capture work emails without stalling
- Use blur-gates surgically for premium insights, not basics
Pre-cached Examples Kill Waiting
Zero initial wait with pre-rendered examples across common verticals. Show clean and messy cases, and explain scoring so prospects learn the rubric before scanning their own site.
- Provide 6–10 examples across news, retail, healthcare, and finance
- Label good and bad elements; mirror the live report UI
- Rotate examples quarterly to reflect real-world patterns
Consent Audit Lite, Not Cookie Lists
A crisp variant outperforms raw cookie dumps: scan one page in parallel under Accept versus Reject from a few regions, then diff vendors that still load. Summarize offenders and severity, with a remediation path into the full product.
- Parallelize regions and consent choices without deep clicking
- Output vendors-by-choice-by-region with severity labels
- Make it shareable with an executive summary
Product
Privacy Lite in 10 Seconds
Front-door value should be a privacy summary that returns first findings in seconds, then streams the rest. Deep PCI or HIPAA checks often require payment or login flows, so run them in the background or reserve for demos.
- Target time-to-first-signal under 10 seconds
- Complete light crawls in under 60 seconds
- Reveal PCI or HIPAA checks only when the agent finishes
Stream Findings by Buyer Priority
Sequence outputs for decision value: scripts and vendors first, then response headers, data transfer map, cookies, and cookie categorization. Use WebSockets to stream and render the map early with local geo or IP lookups. Push LLM-heavy cookie classification to the background.
- Ensure first payload in 5–10 seconds with visible progress
- Emit raw cookie lists first; add categories later
- Keep latency-sensitive enrichments local or cached
Payment Page Risk Scoring
On payment pages, classify third parties as authorized processors or risk. Unknown equals risk by default. Present a simple grade tied to unauthorized vendors, missing headers, and data egress destinations, with an exportable audit summary.
- Maintain an editable allowlist; flag everything else
- Show counts and top offenders with linked rationale
- Offer an exportable summary for audits
Design the Unhappy Path First
Make the one-shot path excellent and use agentic loops only to auto-repair failures. Cap retries and wall-clock time, and surface partial results with a repairing state so users are not stuck in silence.
- Limit to 2–3 repair attempts or 60–90 seconds wall-clock
- Detect repeated error states and escalate or fall back
- Emit structured error traces for post-mortems
Evals Plus Vibe Checks
Evals add to, not replace, fast human spot-checks. Build offline evals from real usage, retain traces and artifacts, and use A/B tests as online evals to gate major prompt or model changes.
- Weekly review 100 random sessions by an accountable owner
- Create an offline eval harness and retain traces for 90 days
- Gate major changes behind A/B or canary evals
Engineering
Step Down Before Fine-tuning
Prove value with a slower, stronger model, then tighten prompts and step down to smaller, faster models. Fine-tune only after exhausting prompt specialization and routing to preserve quality while collapsing latency and cost.
- Maintain an offline eval harness from day one
- Target 2–5x latency and 3–10x cost cuts with under 2% quality loss
- Lock prompts per task and route by task, not a mega-prompt
Contain Injection With Egress Controls
There is no safe prompt when untrusted data enters context. Use hardened renderers with allow-listed link and image prefixes, enforce HTTP egress allow-lists at host and path, sandbox tool execution, and audit tool outputs for secondary injection.
- Block remote images and links by default; enforce prefix allow-lists
- Enforce egress policies at HTTP host and path, not IP ranges
- Log tool inputs and outputs; sample weekly for review
Scale IO-Bound Agents
Most agents are IO-bound. Use compute or pricing that does not bill while waiting, and add a workflow engine once tasks span hours or days. Build for retries and idempotency from the start.
- Set per-step SLAs with dead-man timers and backoff
- Persist step state and make each step idempotent
- Expose pause, resume, and cancellation hooks
Crawl Cheap, Render Selectively
For web-researching agents, run a two-stage pipeline: cheap, large-scale crawling plus classification, then render only high-value pages with a headless browser. This keeps accuracy while cutting compute and bandwidth.
- Keep the render ratio under 5% of crawled pages
- Cache aggressively and dedupe by URL or content hash
- Re-render on content or layout change signals
Avoid Premature Protocols
Agent-to-agent and tool protocols add drag when you own both sides. Treat another agent as a tool until you feel pain across multiple independent integrations, then add formal protocols with a thin adapter layer to avoid lock-in.
- Defer protocols until at least three external integrations
- Measure developer cycle time before and after abstraction
- Keep adapters thin to reduce spec lock-in
Customer Success
Measure Agent ROI In The Queue
Internal agents that pre-digest context for humans can unlock big gains in triage-style queues. Throughput is the metric. Teams have seen roughly 50% more tickets processed per hour once agents gather artifacts and propose classifications, with humans making the final call.
- Instrument tickets per hour and handle time before and after
- Auto-attach evidence like scrapes and classifications to each ticket
- Keep the human decision-maker and evaluate deflection monthly