
Field Notes - Nov 12, '25
Executive Signals
- CSV is the new tech debt: temporary exports become permanent detours that cost twice, once to build and once to remove
- Gates beat dates: binary readiness replaces optimism with controlled integration risk
- Telemetry before throughput: observability parity prevents silent regressions at scale
- Single trunk, fewer scars: one integration path preserves engineering attention
- Latency budgets, not vibes: per-flow SLAs guard quality as you parallelize
CEO
One Path to Production
When an external integration is not ready, CSV workarounds feel fast, but they force double builds, harsh context switching, and later removal of the stopgap. Slip the date instead, and keep a single trunk to production to protect engineering focus and long-term velocity.
- Approve a workaround only if it will stay in use for two or more quarters or unlocks at least half the launch value
- Make the primary integration the sole trunk; block stopgaps that will not ship to prod
- Write down the opportunity cost before approving any detour
Product
Gate Launches on External Readiness, Not Hope
Depending on brand-new middleware or a brand-new API turns every date into a risk. Replace optimism with binary gates co-owned with the partner. Require sandbox credentials, a documented schema and rate limits, sample payloads, and a passing end-to-end run on dev. Hold a readiness review, and track the partner on a separate risk register with weekly notes on slips and mitigations.
- Set no calendar date until every gate is checked off in a partner review
- Maintain an external risk register with a named owner and weekly updates
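The gates above can be tracked as a plain checklist. The following is a minimal Python sketch, not a prescribed tool: the class name `ReadinessGates` and its method names are hypothetical, and the fields simply mirror the gates listed in this section.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ReadinessGates:
    """Binary readiness gates for one external partner (hypothetical structure)."""
    sandbox_credentials: bool = False
    schema_documented: bool = False
    rate_limits_documented: bool = False
    sample_payloads: bool = False
    e2e_run_passing_on_dev: bool = False

    def missing(self) -> List[str]:
        # Names of the gates that are still unchecked, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def can_set_date(self) -> bool:
        # A calendar date is allowed only when every gate is checked.
        return not self.missing()


gates = ReadinessGates(sandbox_credentials=True, schema_documented=True)
print(gates.can_set_date())  # False
print(gates.missing())       # the three gates still open
```

Because each gate is a boolean, the review has nothing to debate: the date is either unlocked or it is not, and `missing()` names what is still open.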
Define “Ready to Test” With the Stakeholder First
Date debates go nowhere without a shared definition of “ready to test.” Co-author the test plan now: scope, success criteria, and evidence. Use a single testing ticket with per-flow coverage and expected artifacts. Agree in advance on failure handling and signoff roles. Publish throughput targets ahead of UAT so everyone aligns on measured pace rather than opinion.
- Create one test ticket covering flows, data sets, artifacts, and pass/fail gates
- Document how each failure is handled (automatic retry, manual retry, or bug report) and who signs off
- Set targets for runs per hour and acceptable failures per day
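One way to make the test ticket concrete is to record, per flow, the agreed failure path, the signoff owner, and the throughput targets. This is a rough Python sketch under stated assumptions: `FlowTestPlan`, `FailurePath`, the flow name, the owner, and all numbers are illustrative placeholders, not values from these notes.

```python
from dataclasses import dataclass
from enum import Enum


class FailurePath(Enum):
    """The three failure-handling options named in the plan."""
    AUTO_RETRY = "auto_retry"
    MANUAL_RETRY = "manual_retry"
    BUG = "bug"


@dataclass
class FlowTestPlan:
    """One flow's entry on the shared test ticket (hypothetical structure)."""
    flow: str
    failure_path: FailurePath
    signoff_owner: str
    min_runs_per_hour: int
    max_failures_per_day: int

    def meets_targets(self, runs_per_hour: float, failures_today: int) -> bool:
        # Both the pace target and the failure ceiling must hold.
        return (runs_per_hour >= self.min_runs_per_hour
                and failures_today <= self.max_failures_per_day)


# Placeholder values for illustration only.
plan = FlowTestPlan("order_submit", FailurePath.AUTO_RETRY, "qa_lead", 12, 3)
print(plan.meets_targets(runs_per_hour=15, failures_today=2))  # True
print(plan.meets_targets(runs_per_hour=15, failures_today=5))  # False
```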
Engineering
Light Up Observability Before You Add Scale
Environment parity and telemetry beat heroics. Move flows from local to a dev server early, add real error handling, and wire logs and screenshots to a channel. Land Slack and on-call alerts before adding more OEMs or users. Promote from dev to stage to prod only when each tier meets the same observability bar.
- Define “green” for dev: headless runs pass, logs and screenshots are captured, and alerts fire on the first failure
- Ship alerting before onboarding additional OEMs or users
- Promote only when each tier clears the same observability bar
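The promotion rule can be sketched as a small check. This Python example is an illustration only: `TierStatus` and `can_promote` are hypothetical names, and it assumes one reading of the rule, namely that a tier may be promoted onward only when it and every earlier tier are green.

```python
from dataclasses import dataclass
from typing import Dict

TIERS = ["dev", "stage", "prod"]  # promotion order from the notes


@dataclass
class TierStatus:
    """The observability bar for one tier (hypothetical structure)."""
    headless_runs_pass: bool
    logs_captured: bool
    screenshots_captured: bool
    alerts_on_first_failure: bool

    def is_green(self) -> bool:
        # Green means every item on the bar is met.
        return all((self.headless_runs_pass, self.logs_captured,
                    self.screenshots_captured, self.alerts_on_first_failure))


def can_promote(source: str, statuses: Dict[str, TierStatus]) -> bool:
    """Allow promotion out of `source` only if it and all earlier tiers are green."""
    idx = TIERS.index(source)
    if idx == len(TIERS) - 1:
        return False  # prod is the last tier; nothing to promote to
    return all(t in statuses and statuses[t].is_green()
               for t in TIERS[: idx + 1])


statuses = {
    "dev": TierStatus(True, True, True, True),
    "stage": TierStatus(True, True, True, False),  # alerting not wired yet
}
print(can_promote("dev", statuses))    # True
print(can_promote("stage", statuses))  # False
```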
Set Runtime SLAs Per Automation Flow
Early runs show sub-minute local latency. Use that as a baseline to set per-OEM P50 and P95 targets, maximum timeouts, and parallelism budgets. Fail fast and surface rich context in alerts. Batch or parallelize to hold P95 within target, and treat regressions as release blockers. Add a daily latency dashboard and require every delta to be explained before promotion.
- Establish P50, P95, and max timeout per flow; fail fast with context
- Batch or parallelize to maintain P95 within target bands
- Review daily latency deltas on a dashboard prior to any promotion
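A per-flow latency budget can be checked against recorded run times before promotion. The sketch below is a minimal Python illustration, assuming a nearest-rank percentile and hypothetical names (`LatencyBudget`, `violations`); the sample values and thresholds are placeholders, not measurements from these notes.

```python
from dataclasses import dataclass
from typing import List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


@dataclass
class LatencyBudget:
    """Per-flow targets in seconds (hypothetical structure)."""
    p50_s: float
    p95_s: float
    max_timeout_s: float

    def violations(self, samples: List[float]) -> List[str]:
        # Return the name of every target the samples exceed.
        found = []
        if percentile(samples, 50) > self.p50_s:
            found.append("p50")
        if percentile(samples, 95) > self.p95_s:
            found.append("p95")
        if max(samples) > self.max_timeout_s:
            found.append("max_timeout")
        return found


budget = LatencyBudget(p50_s=20, p95_s=40, max_timeout_s=60)  # placeholder targets
print(budget.violations([10, 20, 30, 40]))      # []
print(budget.violations([10, 20, 30, 40, 70]))  # ['p50', 'p95', 'max_timeout']
```

A non-empty result is the kind of regression that would block a release under the rule above, and the list itself gives the alert some context on which target slipped.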