Playbook: Advanced Reliability
Load when work involves fanout, paging through large datasets, concurrency,
checkpoints, retries, waits, long-running jobs, or large artifacts.
Default Loop
- Decide whether the shape is a loop, fanout, child flow, table query, or
  external async job.
- Bound the work:
  - max pages
  - max items
  - timeout
  - retry policy
  - concurrency limit
  - checkpoint key
- Persist data that can grow:
  - each page
  - final row set
  - generated files/media
  - external exports
- Return compact proof from each unit:
  - counts
  - cursor/checkpoint
  - resource refs
  - failed ids
  - retryable/non-retryable status
- Give each side-effectful unit an idempotency key or dedupe key before adding
  retries, fanout, or reruns.
- Test one page or one shard first, then a representative bounded batch, then
  the full intended run.
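The checklist above can be sketched as one bounded unit runner. This is a minimal Python illustration, not the runtime's API: `run_units`, `Bounds`, `Proof`, `RetryableError`, and the `side_effect(unit, key)` callback are hypothetical names, the in-memory `done_keys` set stands in for a durable dedupe store, and timeouts and backoff between attempts are left out. It caps items, derives an idempotency key before any retry, retries only errors marked retryable, and returns counts and failed ids rather than the data.

```python
import hashlib
import json
from dataclasses import dataclass, field


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, rate limits)."""


@dataclass(frozen=True)
class Bounds:
    max_items: int = 100   # hard cap on units per run
    max_attempts: int = 3  # retry policy: total tries per unit


@dataclass
class Proof:
    # Compact proof returned instead of the processed data itself.
    processed: int = 0
    skipped_duplicates: int = 0
    failed_ids: list = field(default_factory=list)     # non-retryable failures
    retryable_ids: list = field(default_factory=list)  # retries exhausted


def idempotency_key(unit: dict) -> str:
    # Stable key derived from the unit's content, so a rerun maps to the same key.
    payload = json.dumps(unit, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def run_units(units, side_effect, done_keys: set, bounds: Bounds) -> Proof:
    proof = Proof()
    for unit in units[: bounds.max_items]:
        key = idempotency_key(unit)
        if key in done_keys:  # already applied in this or an earlier run
            proof.skipped_duplicates += 1
            continue
        for attempt in range(1, bounds.max_attempts + 1):
            try:
                side_effect(unit, key)  # the callee should also honor the key
                done_keys.add(key)
                proof.processed += 1
                break
            except RetryableError:
                if attempt == bounds.max_attempts:
                    proof.retryable_ids.append(unit["id"])
            except Exception:
                proof.failed_ids.append(unit["id"])
                break
    return proof
```

Because the dedupe check runs before the retry loop, rerunning the same batch skips units that already succeeded instead of repeating their side effects.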
Paging Shapes
Use explicit loops for larger datasets. Suitable shapes include:
- a packaged step or :function-backed step that pages a provider API up to
  max-pages or max-items
- table resource queries with explicit :page
- child flows for page-level work when each page is heavy or side-effectful
- checkpoint state that advances only after durable writes succeed
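A sketch of the last shape, checkpoint state that advances only after durable writes succeed. It assumes a hypothetical `fetch_page(cursor)` provider call returning a `Page`, a hypothetical `write_page(page_no, items)` durable write that raises on failure, and a plain dict standing in for persisted checkpoint state. max-items is checked between pages, so a run can overshoot it by at most one page rather than splitting a page across runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    items: List[object]
    next_cursor: Optional[str]  # None means the provider has no more pages


def page_with_checkpoint(
    fetch_page: Callable[[Optional[str]], Page],
    write_page: Callable[[int, List[object]], None],
    checkpoint: dict,
    max_pages: int,
    max_items: int,
) -> dict:
    pages = items = 0
    while not checkpoint.get("exhausted") and pages < max_pages and items < max_items:
        page_no = checkpoint.get("pages_done", 0)
        page = fetch_page(checkpoint.get("cursor"))
        write_page(page_no, page.items)  # durable write first; raises on failure
        # Reached only if the write succeeded: now advance the checkpoint.
        checkpoint["cursor"] = page.next_cursor
        checkpoint["pages_done"] = page_no + 1
        checkpoint["exhausted"] = page.next_cursor is None
        pages += 1
        items += len(page.items)
    # Compact proof for this run: counts and cursor, not the rows.
    return {
        "pages": pages,
        "items": items,
        "cursor": checkpoint.get("cursor"),
        "exhausted": checkpoint.get("exhausted", False),
    }
```

If the process dies after `write_page` but before the checkpoint update, the next run refetches and rewrites that page; keying writes by page number makes that replay an overwrite rather than a duplicate.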
Use flow/poll for external async job completion, not generic data paging.
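flow/poll remains the primitive to use for this; the Python below only illustrates the shape such a wait should have (a hard deadline plus capped exponential backoff) and is not its implementation. `get_status`, the "succeeded"/"failed" terminal states, and `PollTimeout` are assumptions. `clock` and `sleep` are injectable so the loop can be tested without real waiting, and a "failed" status is returned to the caller rather than raised.

```python
import time


class PollTimeout(Exception):
    """Raised when the job is still not terminal at the deadline."""


TERMINAL = {"succeeded", "failed"}  # assumed terminal job states


def poll_until_done(get_status, job_id, timeout_s, initial_delay_s=1.0,
                    max_delay_s=30.0, clock=time.monotonic, sleep=time.sleep):
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise PollTimeout(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(min(delay, remaining))         # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)  # capped exponential backoff
```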
Fanout And Child Flows
Use fanout or child flows when units should be isolated, retried, observed, or
installed independently. Keep parent flows focused on orchestration and summary
output. Use grouped flow metadata for workspace display; it does not link flows
at runtime. Runtime linkage comes from flow/call-flow, :fanout, or another
orchestration primitive.
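A local analogy for bounded fanout, using Python's `concurrent.futures.ThreadPoolExecutor` in place of :fanout or flow/call-flow, which this sketch does not model. `fanout` and `run_child` are hypothetical names, and each child is assumed to return a compact proof dict with a `count`. Each unit's failure is caught and recorded per unit so siblings keep running, and the parent returns only a summary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fanout(units, run_child, max_concurrency=4):
    """Run each unit in isolation with at most max_concurrency in flight.

    Returns a summary only; per-unit data stays with the child.
    """
    succeeded, failed, items = [], {}, 0
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(run_child, unit): unit["id"] for unit in units}
        for future in as_completed(futures):
            unit_id = futures[future]
            try:
                proof = future.result()  # re-raises the child's exception, if any
            except Exception as exc:
                # One failing child does not abort its siblings.
                failed[unit_id] = type(exc).__name__
                continue
            succeeded.append(unit_id)
            items += proof.get("count", 0)
    return {
        "total": len(units),
        "succeeded": sorted(succeeded),
        "failed": failed,
        "items": items,
    }
```

The failed ids in the summary give the parent what it needs to rerun only those units, which is where the per-unit idempotency key matters.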
Work Unit Contract
Before expanding a loop or fanout, name the unit and keep the parent/child
contract compact:
- input map
- output map
- persisted artifacts
- idempotency or dedupe key
- side effects
- retryable failure shape
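One way to pin the contract down is a pair of typed records. The sketch below is hypothetical: `UnitInput`, `UnitOutput`, and their field names are illustrative mappings of the six items above (input map, output map, persisted artifacts by reference, dedupe key, side effects, and a retryable failure shape), not a schema the runtime defines.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass(frozen=True)
class UnitInput:
    unit_id: str           # names the unit, e.g. "page-3" or "shard-07"
    cursor: Optional[str]  # where this unit starts reading, if paged
    dedupe_key: str        # idempotency/dedupe key, fixed before any retry


@dataclass
class UnitOutput:
    unit_id: str
    count: int                                             # compact proof, not the rows
    artifact_refs: List[str] = field(default_factory=list)  # persisted artifacts, by reference
    side_effects: List[str] = field(default_factory=list)   # what changed outside the flow
    status: Literal["ok", "retryable", "fatal"] = "ok"      # retryable failure shape
    error: Optional[str] = None

    def __post_init__(self):
        # A failed unit must say why, so the parent can decide whether to retry.
        if self.status != "ok" and not self.error:
            raise ValueError("non-ok UnitOutput needs an error message")
```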