
Field Notes - Dec 12, '25
Executive Signals
- Cron is the new footgun: separate schedule from work; isolate retries and failures
- Evidence beats logs: before/after proof shortens audits and customer escalations
- Release gates, not vibes: five clean runs de-risk integrations before scale
- Concurrency risk: queue age is the alarm bell
- IDs everywhere: tag jobs and artifacts to collapse investigation time
Customer Success
Prove Submission with Evidence and Reference Capture
Each run should screenshot pre‑submit and post‑confirm, parse the confirmation page’s reference number, and write it with status back to the CRM. This closes the audit trail and turns escalations into lookups, not hunts.
- Fail the job if reference ID or post‑submit evidence is missing; retry with backoff
- Persist artifacts with run ID and timestamp; link from the CRM record
- Version selectors and regex per OEM so UI changes don’t silently break capture
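A minimal sketch of the capture check, assuming the confirmation page text is already available as a string and screenshots are saved to paths. The names `REFERENCE_PATTERNS`, `RunEvidence`, `MissingEvidence`, `extract_reference`, `backoff_delays`, the "acme" OEM, and the regex itself are illustrative assumptions, not any real adapter's values.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns keyed by (OEM, pattern version): a UI change means a
# new version entry rather than a silent edit to the old one.
REFERENCE_PATTERNS = {
    ("acme", "v2"): re.compile(r"Reference No:\s*([A-Z0-9-]+)"),
}


class MissingEvidence(Exception):
    """Raised when a run cannot prove submission; the caller should fail the job."""


@dataclass
class RunEvidence:
    run_id: str
    pre_submit_path: Optional[str]    # screenshot taken before submit
    post_confirm_path: Optional[str]  # screenshot taken after confirmation
    confirmation_text: str            # text of the confirmation page


def extract_reference(evidence: RunEvidence, oem: str, version: str) -> str:
    """Return the reference number, or raise if any proof is missing."""
    if not evidence.pre_submit_path or not evidence.post_confirm_path:
        raise MissingEvidence(f"{evidence.run_id}: screenshot missing")
    match = REFERENCE_PATTERNS[(oem, version)].search(evidence.confirmation_text)
    if match is None:
        raise MissingEvidence(f"{evidence.run_id}: reference number not found")
    return match.group(1)


def backoff_delays(attempts: int, base_seconds: float = 1.0):
    """Yield exponentially growing retry delays: base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        yield base_seconds * (2 ** attempt)
```

A caller would catch `MissingEvidence`, mark the job failed, and sleep for the next value from `backoff_delays` before retrying; on success it would write the reference and status to the CRM alongside links to the stored artifacts.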
Engineering
Cron Schedules, Queue Execution
Separate scheduling from execution. A stateless Kubernetes CronJob only enqueues OEM tasks; workers consume from the queue and run adapters. You gain per‑OEM cadence control, safe retries, and failure isolation without blocking the schedule. For the primitives you’ll lean on, see the Kubernetes CronJob documentation (kubernetes.io) and the Amazon SQS dead‑letter queue documentation.
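The split can be sketched in a few lines, assuming an in-memory `queue.Queue` as a stand-in for a real broker. `Task`, `schedule_tick`, `drain`, and the `max_attempts` limit are hypothetical names chosen for illustration; a production setup would rely on the broker's own redelivery and dead-letter features.

```python
import queue
from dataclasses import dataclass


@dataclass
class Task:
    oem: str
    dealer_id: str
    attempts: int = 0


def schedule_tick(task_queue, targets):
    """What the CronJob does: enqueue one task per target, then exit.

    No adapter work happens here, so a slow or failing OEM cannot
    block the schedule.
    """
    for oem, dealer_id in targets:
        task_queue.put(Task(oem, dealer_id))


def drain(task_queue, dead_letters, run_adapter, max_attempts=3):
    """What a worker does: consume tasks, retry failures, dead-letter the rest."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        try:
            run_adapter(task)
        except Exception as exc:
            task.attempts += 1
            if task.attempts >= max_attempts:
                # Record a reason code next to the task for later triage.
                dead_letters.append((task, type(exc).__name__))
            else:
                task_queue.put(task)
```

Because retries live entirely in the worker loop, one failing OEM ends up in `dead_letters` with a reason code while the other tasks complete normally.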
- Tag jobs with OEM and dealer IDs; dead‑letter failures with reason codes
- Alert if queue age > 2× schedule interval or a worker dies mid‑run
- Call the scheduler “done” only once it has run for 24 hours with zero orphaned jobs and no overlapping locks
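The queue-age rule above reduces to one comparison. This is a sketch assuming you can read the enqueue time of the oldest waiting task from your broker; `queue_age_alert` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def queue_age_alert(oldest_enqueued_at, schedule_interval, now=None, factor=2.0):
    """Return True when the oldest waiting task is older than factor x interval."""
    now = now or datetime.now(timezone.utc)
    age = now - oldest_enqueued_at
    return age > schedule_interval * factor
```

With a 15-minute schedule, a task that has waited 31 minutes trips the alert, while one that has waited exactly 30 minutes does not, since the comparison is strict.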
Release Gate: Five Clean End‑to‑End Runs per OEM
Define “done” as five consecutive clean runs per adapter covering fill → submit → evidence → reference capture → CRM update. This forces method reuse, surfaces edge cases early, and stabilizes before scaling cadence.
- If any step fails, reset the count and fix root cause before retrying
- Baseline time‑to‑submit and success rate during verification; watch for drift after go‑live
- After templating common methods, expect ~5 verification runs to bring a new OEM to parity
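One way to track the gate is a streak counter that resets on any failed step. This is a sketch under the assumption that each run reports a pass/fail flag per step; `ReleaseGate`, `STEPS`, and the step keys are illustrative names.

```python
STEPS = ("fill", "submit", "evidence", "reference_capture", "crm_update")


class ReleaseGate:
    """Counts consecutive clean end-to-end runs for one OEM adapter."""

    def __init__(self, required=5):
        self.required = required
        self.streak = 0

    def record(self, step_results):
        """Record one run; a missing or failed step resets the streak to zero."""
        clean = all(step_results.get(step, False) for step in STEPS)
        self.streak = self.streak + 1 if clean else 0
        return self.passed

    @property
    def passed(self):
        return self.streak >= self.required
```

Keeping one `ReleaseGate` per adapter means a failure in one OEM's run never affects another OEM's count.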