
Field Notes - Dec 15, '25
Executive Signals
- Contracts beat code: enforce adapter schemas; block literals at deploy
- Capabilities, not surprises: model vendor quirks become first-class flags in workflows
- Blast radius over bravado: email submissions default to dry runs, explicit recipient control
- Hygiene beats heroics: idempotent schedulers with 30-minute E2E SLAs trump minute-cron chaos
- Deprecations are incidents: treat vendor config changes as rehearsed cutovers, not surprise outages
CEO
Deprecations Are Incidents
A monitoring config deprecation broke flows. Treat dependency and vendor deprecations as operational risk with a rehearsed cutover path, not ad hoc fire drills. Pin versions, surface deprecation notices in CI, and keep rollback artifacts for critical integrations.
- Run a weekly dependency risk review; block breaking or critical diffs in CI
- Stage config changes in pre-prod for 48 hours with drift detection
- Maintain rollback artifacts per critical integration: config and last-known-good version
Engineering
Kill Hard-Coded Drift in Adapters
Adapters were reading static values instead of config, passing small tests but failing at scale and across tenants. Make adapter behavior contractual. Define explicit JSON or YAML schemas, snaptest across OEM variants, and instrument for literal reads to catch drift quickly.
- Block deploys if any required field resolves to a literal
- Add snapshot tests across 3–5 OEM variants; fail CI on diff
- Emit config-miss and literal-read counters; page on any non-zero in production
Model Vendor Quirks as Capabilities
Some portals omit reference IDs or status checks. Model these as explicit capabilities so workflows branch deterministically. When absent, emit a synthetic submission_id and persist flags back to the CRM. For low-observability combos, route to human recon.
- Generate submission_id when missing: hash(portal, dealer, submitted_at)
- Write capability flags to the CRM to drive downstream branching
- Queue human review on “no reference ID + no status check” cases
Control the Blast Radius of Email-Based Submissions
Email-triggered submissions can surprise external recipients. Default to dry runs, pre-notify test dealers, and store recipient lists as data. Validate at submit time and throttle first live sends to reduce risk.
- Gate live sends behind notify_recipients=true; default to dry-run logs
- Validate recipients at submit-time (MX check, max CC ≤ 10) and persist on the submission
- Maintain a test-dealer routing table; enforce 24-hour pre-notice before first live
Scheduler Hygiene Beats Heroics
Minute-level crons create duplicate work and noisy alerts. Prefer a 15-minute, idempotent cadence with deterministic keys, safe retries, and a single end-to-end latency SLA. Alert on SLA breach, not every failure, and test against CRM sandboxes.
- Use deterministic job keys; target dead-letter ratio < 0.5 percent
- Enforce an E2E latency SLA ≤ 30 minutes; alert on breach
- Purge with backoff policies; validate scheduler behavior in sandboxes before production