Field Notes - Jan 12, '26

Executive Signals

Failure paths are features: design graceful handoffs before automation masks real onboarding gaps
Edge cases over happy paths: corpus-driven QA beats per-brand demos and surfaces hidden regressions
Timeouts aren't causes: classify first failure, route humans only when necessary
CRM is the black box: write canonical reasons, not noise, on every attempt
Retries are a budget: spend them on transients, pause on human fixes

Customer Success

Design Onboarding Failure Paths

New accounts without a recipient email will fail predictably. Treat this as a designed path: detect it early, surface a precise status in the system of record, and hand off cleanly to operations. This keeps automation reliable without prematurely building a full onboarding flow.

Mirror the vendor error in the CRM to remove ambiguity, then let ops add the missing recipient and requeue. Clear states and one-click remediation reduce cycle time and noisy alerting.

Implement a preflight: if recipient is missing, set status to Needs Recipient and skip auto-retries
Write the vendor error verbatim to CRM notes on every attempt
Provide a one-click Add recipient → Retry flow; log the resolution reason
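
The preflight and remediation steps above could be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the `Submission` record, its field names, and the `preflight` and `add_recipient_and_retry` helpers are all hypothetical, and the CRM is stood in for by a plain list of notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NEEDS_RECIPIENT = "Needs Recipient"  # status name taken from the notes above
READY = "Ready"                      # hypothetical status for a passing preflight


@dataclass
class Submission:
    """Hypothetical record for one automated submission attempt."""
    account_id: str
    recipient_email: Optional[str] = None
    status: str = "New"
    auto_retry: bool = True
    crm_notes: List[str] = field(default_factory=list)  # stand-in for CRM notes


def preflight(sub: Submission) -> bool:
    """Return True if the submission may proceed to the vendor."""
    if not (sub.recipient_email or "").strip():
        # Designed failure path: precise status, no retries, hand off to ops.
        sub.status = NEEDS_RECIPIENT
        sub.auto_retry = False
        sub.crm_notes.append("Preflight: recipient email missing; assigned to ops")
        return False
    sub.status = READY
    return True


def add_recipient_and_retry(sub: Submission, email: str, resolved_by: str) -> bool:
    """One-click remediation: add the recipient, log the resolution, requeue."""
    sub.recipient_email = email
    sub.auto_retry = True
    sub.crm_notes.append(f"Resolved by {resolved_by}: recipient added")
    return preflight(sub)
```

Running the check before any vendor call means the missing-recipient case never consumes a retry, and the CRM notes record both the failure and who resolved it.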

Engineering

Edge-Case Corpus Beats Per-Brand Happy Paths

Phase 1 overfit to a single test per vendor and missed edge conditions. For Phase 2, build a test corpus that mixes valid and invalid emails across portal and email patterns so status checks are fully testable without vendor involvement. Treat resubmission as a lower-testability path and isolate it.

Codify the surface area before implementation and hold releases to corpus results, not per-vendor demos. This shifts quality from anecdote to evidence.

Lock an OEM-by-channel matrix (portal vs email) before implementation
Create fixtures for malformed, missing, and delayed emails; target at least 80% edge-path coverage
Release only on corpus pass, not per-vendor “green path” checks
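
One way to turn the corpus rule into a release gate is sketched below. It rests on assumptions: "edge-path coverage" is read here as the share of channel-by-condition cells that have a fixture result, the cells come from crossing the two channels with the three fixture types named above, and `release_gate` is a hypothetical helper name.

```python
from itertools import product
from typing import Dict, Tuple

CHANNELS = ("portal", "email")
EDGE_CONDITIONS = ("malformed", "missing", "delayed")

# Assumed matrix: every channel crossed with every edge condition.
REQUIRED_EDGE_PATHS = set(product(CHANNELS, EDGE_CONDITIONS))


def release_gate(
    results: Dict[Tuple[str, str], bool], min_coverage: float = 0.8
) -> Tuple[bool, float]:
    """Decide whether the corpus run permits a release.

    `results` maps (channel, condition) to whether that fixture passed.
    Returns (release_ok, edge_path_coverage).
    """
    covered = REQUIRED_EDGE_PATHS & set(results)
    coverage = len(covered) / len(REQUIRED_EDGE_PATHS)
    all_passed = all(results.values())
    # Release only when enough edge paths are exercised and nothing failed.
    return (coverage >= min_coverage and all_passed), coverage
```

Under this reading, five of the six cells passing clears the 80% target, while a single failing fixture blocks the release regardless of coverage.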

Timeouts Hide Root Cause: Classify Before Retrying

Generic “timed out” notes inflate MTTR and bounce issues between teams. If the run fails waiting for an email field, the root cause is a missing recipient, not an unknown timeout. Classify on the first failure, write the reason to the CRM, then decide between retry and human action.

Retries should annotate every attempt so history is diagnostic, not decorative. Only transient failures earn another try.

Map first failure to canonical reason codes (Missing Recipient, Portal Auth, Rate Limit)
Persist the code plus human-readable detail to the system of record on each attempt
Auto-retry only for transients; otherwise pause and assign to ops
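
The classify-then-decide flow above might look like the sketch below. Several parts are assumptions rather than anything stated in these notes: the keyword matching in `classify`, the extra `UNKNOWN` code, the choice to treat only Rate Limit as transient, and the list used in place of the CRM.

```python
from enum import Enum
from typing import Dict, List


class Reason(Enum):
    MISSING_RECIPIENT = "Missing Recipient"
    PORTAL_AUTH = "Portal Auth"
    RATE_LIMIT = "Rate Limit"
    UNKNOWN = "Unknown"  # assumption: fallback code not listed in the notes


# Assumption: only rate limiting is treated as transient in this sketch.
TRANSIENT = {Reason.RATE_LIMIT}


def classify(first_error: str) -> Reason:
    """Map the first failure message to a canonical reason code."""
    text = first_error.lower()
    if "recipient" in text or "email field" in text:
        return Reason.MISSING_RECIPIENT
    if "auth" in text or "login" in text:
        return Reason.PORTAL_AUTH
    if "rate limit" in text or "429" in text:
        return Reason.RATE_LIMIT
    return Reason.UNKNOWN


def handle_failure(
    first_error: str, attempt: int, max_retries: int, crm_log: List[Dict]
) -> str:
    """Record the attempt, then return "retry" or "assign_to_ops"."""
    reason = classify(first_error)
    # Persist the code plus human-readable detail on every attempt.
    crm_log.append(
        {"attempt": attempt, "reason": reason.value, "detail": first_error}
    )
    if reason in TRANSIENT and attempt < max_retries:
        return "retry"
    return "assign_to_ops"
```

With this shape, a message such as "Timed out waiting for email field" is logged as Missing Recipient and routed to ops on the first attempt instead of being retried as a generic timeout.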

