
Field Notes - Nov 20, '25
Executive Signals
- Contexts over processes: durable scale without memory blowups in headless automation
- Latency is capacity: shaved seconds beat new servers under concurrency caps
- Ephemeral workers, smaller blast radius: clearer autoscaling and better spot economics under bursty loads
- Demos as gates: end-to-end runs set contracts, not feature parades
Product
Demo Retro As An Integration Gate
Use a near-term demo retro as an integration checkpoint, not a feature parade. Run the real flow end-to-end to surface failure modes early, then lock the next tranche behind SLAs and schema contracts. The goal is observable readiness: p95 runtime within thresholds, error rates under control, and no manual shims outside the runbook.
- Freeze field and enum names ≥5 business days prior; publish a mapping doc
- Keep 2–3 toggleable test records per integration to validate idempotence and retries
- Define pass/fail: p95 runtime target, <2% errors, zero manual steps
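The pass/fail criteria above can be encoded as a tiny gate check so the retro result is mechanical, not a debate. A minimal sketch; `GateResult` and the 60s p95 budget are illustrative assumptions, not an existing tool:

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    p95_runtime_s: float   # observed p95 end-to-end runtime
    error_rate: float      # fraction of failed runs, 0.0-1.0
    manual_steps: int      # manual shims used outside the runbook

def passes_gate(r: GateResult, p95_budget_s: float = 60.0) -> bool:
    """Pass/fail per the criteria above: p95 within budget, <2% errors, zero manual steps."""
    return (
        r.p95_runtime_s <= p95_budget_s
        and r.error_rate < 0.02
        and r.manual_steps == 0
    )

print(passes_gate(GateResult(p95_runtime_s=48.0, error_rate=0.01, manual_steps=0)))  # True
print(passes_gate(GateResult(p95_runtime_s=48.0, error_rate=0.03, manual_steps=0)))  # False
```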
Engineering
One Browser, Many Contexts For Scale
Multiple headless browser processes per VM exhaust memory and crash under load. Instead, pool a single browser instance and schedule jobs across isolated contexts (lightweight per-session sandboxes, far cheaper than full processes). Drive concurrency from the queue, enforce strict per-job timeouts, and aggressively recycle contexts to keep RSS predictable.
- Cap concurrent contexts to ~1–2 per vCPU; tune down if p95 RSS climbs
- Recreate a context after N tasks or any crash; capture crash/timeout telemetry
- Disable GPU and extensions; enforce queue backpressure over fire-and-forget
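The pooling pattern above, queue-driven concurrency with recycle-after-N and recycle-on-crash, can be sketched with plain asyncio. `FakeContext` is a hypothetical stand-in for a real browser context (e.g. Playwright's `BrowserContext`); the bounded queue supplies backpressure instead of fire-and-forget:

```python
import asyncio

class FakeContext:
    """Hypothetical stand-in for a real browser context; tracks task count for recycling."""
    def __init__(self):
        self.tasks_run = 0
    async def run(self, job):
        self.tasks_run += 1
        return f"done:{job}"
    async def close(self):
        pass  # a real context would release its pages and memory here

async def worker(queue, results, recycle_after=3):
    ctx = FakeContext()
    try:
        while True:
            job = await queue.get()
            if job is None:                      # sentinel: this worker is done
                queue.task_done()
                return
            try:
                # strict per-job timeout; real code would emit crash/timeout telemetry
                results.append(await asyncio.wait_for(ctx.run(job), timeout=10))
            except Exception:
                await ctx.close()                # recycle on any crash or timeout
                ctx = FakeContext()
            finally:
                queue.task_done()
            if ctx.tasks_run >= recycle_after:   # recycle after N tasks to keep RSS flat
                await ctx.close()
                ctx = FakeContext()
    finally:
        await ctx.close()

async def main(jobs, n_workers=2):
    queue, results = asyncio.Queue(maxsize=8), []   # bounded queue = backpressure
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for j in jobs:
        await queue.put(j)
    for _ in workers:
        await queue.put(None)
    await asyncio.gather(*workers)
    return results

print(asyncio.run(main(range(7))))  # all 7 jobs processed across 2 pooled workers
```

In production the worker count would come from the ~1-2 contexts per vCPU cap, and `FakeContext` would be replaced by a real context created from the shared browser instance.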
Cut Step Time Before Adding Servers
When infra concurrency is capped, cutting latency compounds into capacity faster than adding hardware. Dropping a key search step from ~30s+ to ~5s yielded ~6x step throughput. Apply the same pass across adapters before scaling machines.
- Set per-step budgets (target p95 ≤ 8s; end-to-end ≤ 60s) and fail fast on regressions
- Cache hot selectors/results, remove duplicate queries, and pre-warm sessions
- Track jobs/hour = concurrency × 3600 ÷ avg_job_seconds; fix the top offender first
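The throughput formula above makes the ~6x claim easy to check. The 10-slot concurrency cap below is an illustrative assumption, and the arithmetic assumes the search step dominates the job:

```python
def jobs_per_hour(concurrency: int, avg_job_seconds: float) -> float:
    """jobs/hour = concurrency x 3600 / avg_job_seconds (formula from the note above)."""
    return concurrency * 3600 / avg_job_seconds

# Same concurrency cap, before and after cutting the ~30s search step to ~5s.
before = jobs_per_hour(10, 30)   # 1200.0
after = jobs_per_hour(10, 5)     # 7200.0
print(after / before)            # 6.0 -- the ~6x step-throughput gain
```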
Isolate Headless Jobs With Ephemeral Workers
A single shared VM creates a noisy-neighbor failure domain. Move headless work to ephemeral workers (containers or spot instances) behind a queue. You gain fault isolation, simpler autoscaling, and better unit economics for bursty workloads.
- Scale workers 0→N on backlog and CPU; hard-cap per-worker contexts and memory
- Use preemptible/spot only with idempotent jobs and checkpointed progress
- Kill-and-replace any worker breaching time/memory limits; alert if p95 errors >2%
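The kill-and-replace rule above reduces to a small reconciliation loop. `Worker`, the 300s time budget, and the 1024 MB cap are hypothetical placeholders; a real supervisor would read these from the orchestrator and requeue the interrupted job (which is why jobs must be idempotent and checkpointed):

```python
from dataclasses import dataclass

@dataclass
class Worker:
    """Hypothetical stand-in for an ephemeral container/spot worker."""
    started_at: float   # epoch seconds
    rss_mb: float       # current resident set size

def breaches_limits(w: Worker, now: float, max_seconds: float = 300, max_rss_mb: float = 1024) -> bool:
    """Kill-and-replace criterion: over the time budget OR over the memory cap."""
    return (now - w.started_at) > max_seconds or w.rss_mb > max_rss_mb

def reconcile(workers, now, spawn):
    """Replace any worker breaching limits; keep the fleet size constant."""
    return [spawn() if breaches_limits(w, now) else w for w in workers]

fleet = [Worker(started_at=0, rss_mb=512), Worker(started_at=0, rss_mb=2048)]
fleet = reconcile(fleet, now=60, spawn=lambda: Worker(started_at=60, rss_mb=0))
print([w.rss_mb for w in fleet])  # [512, 0] -- the 2048 MB worker was replaced
```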