Persisted Results And Resource Refs

Breyta persistence guide for storing large step outputs as res:// references and retrieving artifacts safely.

Goal

Store large step outputs as resource refs and pass compact references across steps.

Quick Answer

Use :persist when output size is uncertain, return :ref in flow output, and inspect content with breyta resources read <res://...>.
In practice, :persist is a common default for data-producing steps because many real outputs exceed inline thresholds quickly.
For streaming HTTP downloads and other temporary HTTP response blobs, prefer :persist {:type :blob :tier :ephemeral} on the :http step instead of relying on the retained default.

Why Use :persist

Use persistence when a step can produce large or unbounded output:

  • avoid bloating inline workflow state
  • keep downstream step params shareable and small
  • surface retrievable artifacts via breyta resources ...
  • avoid frequent rework from crossing the 256 KB inline threshold during iteration

Default Posture

For non-trivial flows, default to :persist for steps that can return variable or growing payloads (:db, :http, :llm), then pass refs downstream.
Treat persist tier as a separate choice:

  • use :tier :ephemeral for temporary streamed HTTP downloads, exports, and other short-lived response blobs created by :http
  • keep the retained default for non-streaming persists and for artifacts that should remain durable, reusable, or user-discoverable beyond the immediate run

In practice, many flows need both decisions:

  • :persist answers "should this stay out of inline workflow state?"
  • :tier answers "is this a temporary artifact or a durable retained artifact?"
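
As a combined sketch of both decisions (assuming :persist can be set directly on the producing :db and :http steps, as this guide recommends; step names and connections are illustrative): the :db step keeps the retained default, while the streamed :http download opts into the ephemeral tier.

'(let [rows (flow/step :db :load-events
              {:connection :warehouse
               :database :postgres
               :sql "select * from events where created_at >= now() - interval '1 day'"
               :persist {:type :blob}})            ; retained default tier
       export (flow/step :http :download-export
                {:connection :export-api
                 :method :get
                 :path "/exports/latest"
                 :persist {:type :blob
                           :tier :ephemeral}})]    ; temporary streamed blob
   {:rows-ref (:ref rows)
    :export-ref (:ref export)})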

Minimal Pattern

{:flow
 '(let [rows (flow/step :db :query-orders
                {:connection :warehouse
                 :database :postgres
                 :sql "select * from orders where created_at >= now() - interval '1 day'"})
        persisted (flow/step :function :persist-rows
                    {:input {:rows rows}
                     :code '(fn [{:keys [rows]}] rows)
                     :persist {:type :blob}})]
    {:rows-ref (:ref persisted)})}

The persisted step returns:

{:ref "res://..."}
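
Downstream steps can then consume the compact ref instead of the full payload. A sketch, assuming :load restores function-persisted blobs the same way it restores persisted HTTP responses (step names are illustrative):

'(let [persisted (flow/step :function :persist-rows
                   {:input {:rows rows}
                    :code '(fn [{:keys [rows]}] rows)
                    :persist {:type :blob}})
       counted (flow/step :function :count-rows
                 {:input {:persisted persisted}
                  :load [:persisted]              ; restore the blob before the fn runs
                  :code '(fn [{:keys [persisted]}]
                           {:n (count persisted)})})]
   {:rows-ref (:ref persisted)
    :row-count counted})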

Loading Persisted HTTP Responses In Function Steps

When an :http step persists a large response as a blob, pass the whole step result into the downstream function step and mark that field in :load. This restores the persisted HTTP response before your function code runs.

'(let [resp (flow/step :http :generate-image
               {:connection :image-api
                :method :post
                :path "/images/generations"
                :persist {:type :blob}})
       img (flow/step :function :decode-image
              {:input {:resp resp}
               :load [:resp]
               :persist {:type :blob
                         :filename "image.jpeg"
                         :content-type "image/jpeg"}
               :code '(fn [{:keys [resp]}]
                        (-> resp
                            :body
                            :data
                            first
                            :b64_json
                            breyta.sandbox/base64-decode-bytes))})]
   {:image img})

Use this pattern when the HTTP response body is too large to survive inline transfer to the next step.

Prefer :tier :ephemeral on the persisted HTTP response when the response is a temporary workflow artifact. The downstream function step can still persist the derived file, but that derived persist stays on the retained tier today:

'(let [resp (flow/step :http :generate-image
               {:connection :image-api
                :method :post
                :path "/images/generations"
                :persist {:type :blob
                          :tier :ephemeral}})
       img (flow/step :function :decode-image
              {:input {:resp resp}
               :load [:resp]
               :persist {:type :blob
                         :filename "image.jpeg"
                         :content-type "image/jpeg"}
               :code '(fn [{:keys [resp]}]
                        (-> resp
                            :body
                            :data
                            first
                            :b64_json
                            breyta.sandbox/base64-decode-bytes))})]
   {:image img})

Blob Path Templates

Use :path to express the relative storage subpath and keep :filename as the leaf name:

(flow/step :function :persist-report
  {:input {:tenant-id tenant-id
           :report-id report-id
           :rows rows}
   :code '(fn [{:keys [rows]}] rows)
   :persist {:type :blob
             :path "exports/{{input.tenant-id}}"
             :filename "report-{{input.report-id}}.json"}})

For plain :persist writes without :slot, the runtime stores the artifact under its managed prefix:

workspaces/<ws>/persist/<flow>/<step>/<uuid>/exports/<tenant-id>/report-<report-id>.json

Notes:

  • :path is relative only; do not include a leading / or ..
  • :path and :filename support {{...}} interpolation from resolved step params (input.*, data.*, query.*, etc.) plus runtime fields like workspace-id, flow-slug, and step-id
  • Existing slash-bearing :filename flows still work, but new flows should prefer the explicit :path + :filename split
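
A sketch combining input interpolation with the runtime fields listed above (the step name, input key, and exact template keys are illustrative):

(flow/step :function :persist-snapshot
  {:input {:run-date run-date}
   :code '(fn [_] {:ok true})
   :persist {:type :blob
             :path "snapshots/{{flow-slug}}/{{input.run-date}}"
             :filename "snapshot-{{step-id}}.json"}})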

Installer-Configured Storage Scopes

When installers should control where persisted artifacts land, declare a :blob-storage slot and point :persist :slot at that slot.
For connected persists, the installer-configured storage root becomes the write base under the runtime workspace:

workspaces/<ws>/storage/<configured-root>/<persist-path>/<filename>

That is the full platform path shape for connected persists. Breyta does not add hidden <flow>/<step>/<uuid> segments after the configured storage root.

Author the slot once:

{:requires [{:slot :archive
             :type :blob-storage
             :label "Archive storage"
             :config {:prefix {:default "reports"
                               :label "Folder prefix"
                               :description "Stored under this folder in the selected storage connection."
                               :placeholder "reports/customer-a"}}}]}

Use it from :persist:

(flow/step :http :download-report
  {:connection :reports-api
   :path "/exports/latest"
   :response-as :bytes
   :persist {:type :blob
             :slot :archive
             :path "{{input.tenant-id}}/{{input.run-date}}"
             :filename "summary-{{input.report-id}}.pdf"}})

With storage root reports/customer-a, that write lands at:

workspaces/<ws>/storage/reports/customer-a/<tenant-id>/<run-date>/summary-<report-id>.pdf

Use the same slot from a runtime resource picker:

{:kind :form
 :collect :run
 :fields [{:key :report
           :label "Archived report"
           :field-type :resource
           :slot :archive
           :accept ["application/pdf"]}]}

Notes:

  • every installer-owned :blob-storage slot automatically adds a required setup control for the storage root
  • authors can customize that control with :config {:prefix ...}, but cannot disable it
  • the chosen root is saved in bindings.<slot>.config.root
  • :persist :path stays relative to the configured root rather than repeating it in the step
  • connected persists write exactly under workspaces/<ws>/storage/<configured-root>/...
  • runtime resource pickers reuse the same resolved connection + root, so the author does not wire the prefix twice
  • current writes remain platform-backed; the slot is the authored contract and the runtime binding controls which storage target backs it
  • end-user installations derive a private default root such as installations/<profile-id>/reports; shared roots require an explicit override
  • sharing is an installer choice: two flows share when installers point them at the same backend and storage root
  • slot names stay local to each flow; the concrete storage location is the actual sharing boundary
  • persisted blob resources are canonical :file resources, so resource fields default correctly without an explicit :resource-types
  • :source remains as a legacy/internal picker-routing field, but the preferred authored model is to bind pickers by :slot

End-To-End Producer And Consumer Example

Use this pattern when Flow A writes files and Flow B later works on those files.

Producer flow:

{:requires [{:slot :archive
             :type :blob-storage
             :label "Archive storage"
             :config {:prefix {:default "reports"
                               :label "Folder prefix"}}}]
 :flow
 '(let [download (flow/step :http :download-report
                   {:connection :reports-api
                    :response-as :bytes
                    :persist {:type :blob
                              :slot :archive
                              :path "{{input.customer-id}}/{{input.run-date}}"
                              :filename "summary-{{input.report-id}}.pdf"}})]
    {:download download})}

Consumer flow:

{:requires [{:slot :archive
             :type :blob-storage
             :label "Archive storage"
             :prefers [{:flow :report-producer
                        :slot :archive}]
             :config {:prefix {:default "reports"
                               :label "Folder prefix"}}}
            {:kind :form
             :collect :run
             :fields [{:key :report
                       :label "Archived report"
                       :field-type :resource
                       :slot :archive
                       :accept ["application/pdf"]}]}]
 :flow
 '(let [input (flow/input)]
    {:report (:report input)})}

By default, two end-user installations stay isolated because each one derives its own private root, such as installations/<producer-profile-id>/reports and installations/<consumer-profile-id>/reports.
They share only when both installations are explicitly configured with the same root. For example, if both use:

{:bindings {:archive {:binding-type :connection
                      :connection-id "platform"
                      :config {:root "reports/acme"}}}}

and the producer run uses:

{:customer-id "cust-77"
 :run-date "2026-03-24"
 :report-id "rep-42"}

then the stored object path is:

workspaces/<ws>/storage/reports/acme/cust-77/2026-03-24/summary-rep-42.pdf

The consumer does not need to know that path. Its runtime picker simply scopes to the same concrete storage location behind :slot :archive.

If you know one producer flow is the intended upstream lane, add :prefers to the consumer slot.
That records the intended sharing relationship, but it does not auto-select or persist the consumer root.
To share, the installer still must explicitly save the same connection + root on both installations.

Keep these boundaries in mind:

  • the producer writes through its own local installer-owned slot such as :archive
  • the consumer reads through its own local installer-owned slot such as :archive
  • the installer decides whether those slots share by choosing the same or different storage roots
  • if two flows point at the same storage location, they share both the utility and the overwrite risk

Resource Types

Use the resource-type split like this:

  • :persist {:type :blob ...} creates :file resources
  • uploads create :file resources
  • :persist {:type :kv ...} creates :result resources
  • captured run and step outputs stay :result

Because persisted blobs and uploads are both :file, most resource picker fields can omit :resource-types entirely.
Add :resource-types only when you need something narrower than the default file picker, for example [:result] for structured run outputs.
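
For example, a run-form field narrowed to structured run outputs might look like this (a sketch reusing the form shape shown earlier in this guide; the field key and label are illustrative):

{:kind :form
 :collect :run
 :fields [{:key :prior-run
           :label "Prior run output"
           :field-type :resource
           :resource-types [:result]}]}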

Reading Persisted Content

breyta resources workflow list <workflow-id>
breyta resources read <res://...>
breyta resources search "transcript"

Commands that help:

  • breyta resources search "<query>" [--type result|file] [--content-sources file,result]
  • breyta resources search "<query>" [--storage-backend gcs] [--storage-root reports/acme] [--path-prefix exports/2026]
  • breyta resources list [--types file] [--storage-root reports/acme] [--path-prefix exports/2026]
  • breyta resources workflow step <workflow-id> <step-id>
  • breyta resources get <res://...>
  • breyta resources url <res://...>

Use storage filters like this:

  • storage-backend narrows by backend family, such as gcs
  • storage-root narrows to the installer-configured root, such as reports/acme
  • path-prefix narrows further inside that root, such as exports/2026; it is relative to the configured root, not the full workspaces/<ws>/storage/... object path

That means a platform-backed persisted file stored at:

workspaces/ws-acme/storage/reports/acme/exports/2026/summary.pdf

is searchable with:

  • --storage-backend gcs
  • --storage-root reports/acme
  • --path-prefix exports/2026
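
Putting those filters together, a single search over that stored file could look like this (the query text is illustrative):

breyta resources search "summary" --storage-backend gcs --storage-root reports/acme --path-prefix exports/2026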

Resource Search Indexing

How persisted artifacts become searchable in breyta resources search:

  • search indexes metadata fields (display name, URI/path context, tags, source label)
  • connected persists also index normalized storage scope fields so search and pickers can filter by backend, root, and relative path
  • text content indexing is enabled only for text-like payloads
  • :tier :ephemeral blobs are metadata-indexed by default (raw content is not extracted)
  • binary blobs are discoverable by metadata/path context, but raw binary content is not full-text indexed
  • indexed text is bounded by size/character limits for stability

For connected persists, Breyta stores both the full path and normalized storage fields:

  • path: the full physical object path, useful for broad search/debug context (e.g. workspaces/ws-acme/storage/reports/acme/exports/2026/summary.pdf)
  • storage_backend: the backend family (e.g. platform)
  • storage_root: the installer-configured root inside that backend (e.g. reports/acme)
  • path_under_root: the relative path below the root (e.g. exports/2026/summary.pdf)

That split is intentional:

  • free-text search can still match the full path
  • storage-root and path-prefix use the normalized fields instead of requiring the full workspace storage path
  • the same backend/root/relative-path contract can extend to future storage backends without changing authored filters

Persist Blob Tiers

  • :retained (default): 50 MB max persisted write size, 12-month default retention
  • :ephemeral: 4 GB max persisted write size, short-lived streaming tier (optimized for HTTP downloads)

Which Tier Should You Use?

Use :tier :ephemeral when the blob is a temporary streamed HTTP artifact:

  • HTTP downloads and exports
  • large API responses that are only being handed to a downstream step
  • streamed response bodies that do not need durable retention

Keep the retained default when the blob is meant to last as a durable product, or when the persist is coming from a non-streaming step:

  • user-facing deliverables that should remain searchable later
  • derived files persisted from :function, :db, or other non-streaming steps
  • installer-managed shared storage under a stable business path
  • artifacts that another flow or operator is expected to revisit after the run completes

Short rule of thumb:

  • :persist without :tier defaults to retained storage
  • use :tier :ephemeral only on temporary streamed HTTP persists
  • if you need the artifact to behave like a durable file, keep the retained default or use an explicit storage slot/root

:persist :search-index Overrides

Use :search-index under :persist to customize indexed text/metadata for persisted artifacts (especially binary blobs), without changing stored payload bytes.

Target shape:

{:persist {:type :blob
           :path "invoices/{{input.customer-id}}"
           :filename "invoice.pdf"
           :search-index {:text "invoice-id=INV-123 vendor=Acme total=4500"
                          :tags ["invoice" "acme" "emea"]
                          :source-label "Invoice PDF from SAP import"
                          :include-raw-content? false}}}

Intended precedence:

  • :search-index.text overrides default indexed content text
  • :search-index.tags overrides/augments indexed tags
  • :search-index.source-label overrides derived source label
  • :search-index.include-raw-content? controls whether default extracted text is also included when available

Open The Same Artifact In Breyta Web

In API mode JSON output, resource responses can include optional webUrl links that point to the artifact context in Breyta Web:

  • breyta resources workflow list <workflow-id> --format json -> data.items[].webUrl (and meta.webUrl for a primary destination)
  • breyta resources search "<query>" --format json -> data.items[].webUrl and data.items[].display-name
  • breyta resources get <res://...> --format json -> data.webUrl (and usually meta.webUrl)
  • breyta resources url <res://...> --format json -> signed data.url plus optional data.webUrl/meta.webUrl

Quick extraction pattern:

breyta resources get <res://...> --format json | jq -r '.meta.webUrl // .data.webUrl // empty'

Cross-Flow State Handoff With KV

For shared state between runs/flows, pair result persistence with KV writes:

  1. persist large step output as res://...
  2. write a compact KV record that points to that ref
  3. read KV in downstream flows and resolve ref only when needed

'(let [payload (flow/step :http :collect
                 {:connection :source-api
                  :method :get
                  :path "/records"
                  :persist {:type :blob}})
       _kv (flow/step :kv :record-latest
             {:type :kv
              :operation :set
              :key "records:latest"
              :value {:ref (:ref payload)}
              :ttl 604800})
       latest (flow/step :kv :load-latest
                {:type :kv
                 :operation :get
                 :key "records:latest"})]
   {:latest (:value latest)})

This keeps orchestration payloads small while still giving operators a durable pointer to the latest artifact.

Design Rules

  • persist early when output size is uncertain
  • return refs instead of heavy payloads in final output
  • pass refs explicitly; don’t hide them in nested structures
  • treat persisted artifacts as durable run history
  • persist when payloads can grow, are reused across steps, or need operator inspection after completion
  • for cross-run/cross-flow lookup, store lightweight pointers in KV instead of duplicating large objects

Troubleshooting

  • downstream steps fail with large payloads: persist the producer output and re-run
  • resource not found: list workflow resources and match step-id to workflowId
  • resources commands require authenticated API mode
  • debug persisted refs by listing workflow resources, finding the producing step-id, and reading the target res:// URI
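
The debug sequence in the last bullet maps onto the documented commands like this (placeholders stay as placeholders):

breyta resources workflow list <workflow-id>
breyta resources workflow step <workflow-id> <step-id>
breyta resources read <res://...>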

As of Mar 30, 2026