
Field Notes - Nov 14, '25
Executive Signals
- Finite state over boolean: a lifecycle enum prevents human-robot collisions; timestamps and run IDs make retries traceable
- Observability before integration: logs, screenshots, and error routes precede CRM hooks
- Schedules are contracts: frequent windows, backpressure, and predictable SLAs beat run-anytime jobs
- Telemetry gates expansion: one-logo pilots, strict exit criteria, kill switches ready
CEO
Pilot One Logo With Hard Exits
Treat the compliance flow as a measured experiment, not a soft launch. Expansion is gated by telemetry, with scope frozen until the system proves itself. Every defect gets a fix or a guardrail, and a kill switch routes new items to manual handling if error rates spike.
- Define exit criteria (example: >95% automated pass rate, <1% false positives, zero P1s for 7 days)
- Enforce a change freeze during the pilot window
- Maintain a kill switch to divert to manual on error spikes
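The exit criteria and kill switch above can be sketched as two small checks. This is a minimal illustration, not a prescribed implementation: the `PilotStats` fields, the choice of total items as the denominator for both rates, and the 5% error-rate and 20-item sample thresholds on the kill switch are all assumptions; only the 95% / 1% / zero-P1 figures come from the example criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotStats:
    # Hypothetical telemetry snapshot for the pilot window.
    total_items: int
    automated_passes: int
    false_positives: int
    p1_incidents_last_7_days: int


def meets_exit_criteria(stats: PilotStats) -> bool:
    """True only when every example threshold from the notes is met."""
    if stats.total_items == 0:
        return False  # no evidence yet, so no expansion
    pass_rate = stats.automated_passes / stats.total_items
    false_positive_rate = stats.false_positives / stats.total_items
    return (
        pass_rate > 0.95
        and false_positive_rate < 0.01
        and stats.p1_incidents_last_7_days == 0
    )


def route_new_item(recent_errors: int, recent_total: int,
                   max_error_rate: float = 0.05, min_sample: int = 20) -> str:
    """Kill switch: divert to manual when the recent error rate spikes.

    max_error_rate and min_sample are assumed values, not from the notes.
    """
    if recent_total >= min_sample and recent_errors / recent_total > max_error_rate:
        return "manual"
    return "automated"
```

Requiring a minimum sample keeps a single early failure from tripping the switch; whether that trade-off is acceptable depends on how costly one bad automated item is.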
Product
Time Windows and SLAs Beat “Run It Whenever”
Ops needs predictability and machines need guardrails. Publish a schedule and SLA, bias to small frequent sweeps with backpressure, and make the next run visible so humans know when to wait versus intervene.
- Target 95% of jobs completing in under 30 minutes; cap every job at 2 hours
- Constrain concurrency per partner; use exponential backoff on 429/5xx
- Expose next_run_at and eta in CRM list views
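A rough sketch of exponential backoff on 429/5xx responses, with the HTTP call abstracted away. The function names, the 1-second base, the 60-second cap, the five-attempt limit, and the use of full jitter are assumptions for illustration; `send` stands in for whatever client actually calls the partner and is expected to return an HTTP status code.

```python
import random
import time
from typing import Callable


def is_retryable(status: int) -> bool:
    """Retry only on rate limiting (429) and server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound in seconds before retry number `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)


def call_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Callable[[], float] = random.random) -> int:
    """Call `send` up to max_attempts times; return the last status seen."""
    status = send()
    for attempt in range(max_attempts - 1):
        if not is_retryable(status):
            break
        # Full jitter: wait a random fraction of the exponential bound.
        sleep(backoff_delay(attempt) * jitter())
        status = send()
    return status
```

Injecting `sleep` and `jitter` keeps the retry policy testable without real waits. Per-partner concurrency limits would sit outside this function, for example as one semaphore per partner.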
Engineering
Replace the CRM Boolean With a Finite State
A single “is automated” flag collapses the real lifecycle and invites human-robot collisions. Use a status enum that mirrors reality: queued → processing → succeeded/failed, plus manual_override. Add started_at, completed_at, and run_id to trace retries and measure performance.
- Replace booleans with a status enum plus manual_override and timestamps
- Mark any job stuck in “processing” for more than 2x the P95 run time as stale; alert and requeue
- Lock human edits while processing; make transitions idempotent with single ownership
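One way the lifecycle could look in code, as a sketch. The status names, `started_at`, `completed_at`, and `run_id` come from the notes; the `Job` class, the transition table (including which states may move to manual_override), and the helper names are assumptions. A real system would also need the transition to be atomic in the datastore to enforce single ownership.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    MANUAL_OVERRIDE = "manual_override"


# Assumed transition table; processing -> queued covers stale requeues.
ALLOWED = {
    Status.QUEUED: {Status.PROCESSING, Status.MANUAL_OVERRIDE},
    Status.PROCESSING: {Status.SUCCEEDED, Status.FAILED, Status.QUEUED},
    Status.FAILED: {Status.QUEUED, Status.MANUAL_OVERRIDE},
    Status.SUCCEEDED: set(),
    Status.MANUAL_OVERRIDE: set(),
}


@dataclass
class Job:
    status: Status = Status.QUEUED
    run_id: Optional[str] = None
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None


def transition(job: Job, target: Status, now: datetime) -> bool:
    """Apply a transition. Repeating the current state is a no-op (False)."""
    if job.status == target:
        return False
    if target not in ALLOWED[job.status]:
        raise ValueError(f"illegal transition {job.status.value} -> {target.value}")
    if target is Status.PROCESSING:
        job.run_id = uuid.uuid4().hex  # new run_id per attempt traces retries
        job.started_at, job.completed_at = now, None
    elif target in (Status.SUCCEEDED, Status.FAILED):
        job.completed_at = now
    job.status = target
    return True


def is_stale(job: Job, now: datetime, p95: timedelta) -> bool:
    """True when a job has been processing for more than 2x the P95 run time."""
    return (job.status is Status.PROCESSING and job.started_at is not None
            and now - job.started_at > 2 * p95)


def human_can_edit(job: Job) -> bool:
    """Lock human edits while the robot owns the record."""
    return job.status is not Status.PROCESSING
```

Because every attempt gets a fresh `run_id`, a requeued job can be told apart from its earlier run in logs, and `completed_at - started_at` gives the per-run duration needed to compute P95.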
Ship Observability Before You Integrate
Hold the gate until runtime evidence exists. Log every step, capture screenshots on success and failure, wire error tracking, and pass basic security and quality gates before connecting anything to the CRM. Define page-worthy errors and on-call routing ahead of the first external trigger.
- Block integration until there are zero Critical vulnerabilities and the quality gate passes
- Persist per-job logs and screenshots with 7–30 day retention
- Predefine error categories, alerting thresholds, and escalation paths
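As an illustration of predefined error categories and per-job logging, the sketch below emits one structured log line per step and maps each category to an alert threshold and route. Every category name, threshold, and route target here is hypothetical, as are the function names and log fields; the notes only say these should be defined before the first external trigger.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    page_after: int  # errors per window that trigger escalation
    route_to: str    # hypothetical escalation target


# Hypothetical categories and thresholds, decided before going live.
RULES = {
    "auth_failure": AlertRule(page_after=1, route_to="on-call"),
    "partner_timeout": AlertRule(page_after=5, route_to="on-call"),
    "validation_error": AlertRule(page_after=20, route_to="ops-queue"),
}


def log_step(run_id: str, step: str, outcome: str,
             screenshot_path: Optional[str] = None,
             now: Optional[datetime] = None) -> str:
    """Build one JSON log line per step, keyed by run_id."""
    record = {
        "ts": (now or datetime.now(timezone.utc)).isoformat(),
        "run_id": run_id,
        "step": step,
        "outcome": outcome,
        "screenshot": screenshot_path,
    }
    return json.dumps(record, sort_keys=True)


def escalation_for(category: str, count_in_window: int) -> Optional[str]:
    """Return where to escalate, or None if below the threshold."""
    rule = RULES.get(category)
    if rule is None:
        return "on-call"  # assumption: unknown categories escalate by default
    return rule.route_to if count_in_window >= rule.page_after else None
```

Keeping the rules in one table makes the paging policy reviewable in a single place, and routing unknown categories to on-call errs on the side of visibility during a pilot.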