
Persisted Results And Resource Refs

Breyta persistence guide for storing large step outputs as res:// references and retrieving artifacts safely.

Goal

Store large step outputs as resource refs and pass compact references across steps.

Quick Answer

Use :persist when output size is uncertain, pass refs downstream, and inspect content with breyta resources read <res://...>.
For rows, :persist {:type :table ...} creates a queryable table resource for later :table steps and breyta resources table ....
For blobs, choose the tier deliberately: retained/default for durable or user-visible artifacts; :tier :ephemeral for temporary streamed HTTP responses.

Persistence is storage, not presentation. A res://... ref is a compact handle
for downstream steps and debugging. When a person should see the result, render
the resource through a final output viewer: usually a Markdown report with
breyta-resource fences, or a deliberate :table, :image, :video,
:download, or :raw viewer. Persisted JSON resources can also render through
:view :json inside Markdown. See Output Artifacts.

Why Use :persist

Use persistence when a step can produce large or unbounded output:

  • avoid bloating inline workflow state
  • keep downstream step params shareable and small
  • surface retrievable artifacts via breyta resources ...
  • avoid repeated rework when an output crosses the 512 KB inline threshold during iteration
  • keep row-oriented operational data in a queryable table resource instead of pushing whole rowsets through workflow history

The important hard numbers for authors are:

  • inline step results are intended to stay under 512 KB
  • unpersisted step results hard-fail around 1 MB
  • database result payloads are capped at 1 MB
  • retained blob persists can write up to 50 MB
  • ephemeral streamed blob persists can write up to 4 GB
  • HTTP body loads from refs are capped at 10 MB retained or 20 MB ephemeral

So for data-heavy outputs that can exceed inline limits, return :persist refs
instead of passing the whole value through workflow state.
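To make the inline thresholds concrete, here is a small plain-Clojure sketch for checking a candidate result during iteration. approx-size-bytes and persist-advice are hypothetical helpers, not Breyta APIs. They assume 1 KB = 1024 bytes and use the UTF-8 length of pr-str output, which only approximates how the runtime measures step results.

```clojure
;; Thresholds from the list above, assuming 1 KB = 1024 bytes.
(def inline-target-bytes (* 512 1024))
(def inline-hard-fail-bytes (* 1024 1024))

(defn approx-size-bytes
  "UTF-8 length of the value's EDN form. Approximation only: the runtime may
  serialize and measure step results differently."
  [value]
  (count (.getBytes ^String (pr-str value) "UTF-8")))

(defn persist-advice
  "Hypothetical helper: suggest :inline or :persist for a candidate step result."
  [value]
  (let [n (approx-size-bytes value)]
    (cond
      (>= n inline-hard-fail-bytes) {:bytes n :advice :persist :reason "near the ~1 MB hard-fail point"}
      (>= n inline-target-bytes)    {:bytes n :advice :persist :reason "over the 512 KB inline target"}
      :else                         {:bytes n :advice :inline})))

;; Example: a small map stays inline, a ~600 KB string should be persisted.
(persist-advice {:status "ok"})
(persist-advice (apply str (repeat (* 600 1024) "x")))
```

Anything the sketch flags as :persist is a candidate for a :persist map on the producing step rather than a value to pass inline.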

Storage Tier Decision

:persist has two independent settings: :type chooses :blob, :table, or :kv, and :tier chooses the blob storage tier where supported.

  • retained: omit :tier or use :tier :retained. Use when the artifact is durable, reusable, searchable, user-visible, or needed after the run. Typical cap: 50 MB write cap.
  • ephemeral: use :tier :ephemeral. Use for temporary streamed HTTP downloads, exports, generated media, or API response bodies. Typical cap: 4 GB streaming write cap.

Current support boundary: :tier :ephemeral is for streaming HTTP blob persists.
Function, table, and KV persists use the retained/default path today. Retain the
final curated artifact or table; keep intermediate HTTP blobs ephemeral.
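The tier rules above reduce to a small decision. The sketch below is hypothetical plain Clojure: the :step-kind, :persist-type, and :temporary? keys are made up for illustration. It returns the :tier fragment to merge into a :persist map, omitting :tier for the retained/default path, and only picks :ephemeral when the output is explicitly marked temporary.

```clojure
(defn choose-tier
  "Hypothetical decision helper mirroring the tier list and support boundary above."
  [{:keys [step-kind persist-type temporary?]}]
  (if (and (= :http step-kind)
           (= :blob persist-type)
           (true? temporary?))
    ;; Temporary streamed HTTP blobs are the only current :ephemeral case.
    :ephemeral
    ;; Function, table, and KV persists, and anything durable or user-visible,
    ;; use the retained/default path.
    :retained))

(defn tier-opts
  "Fragment to merge into a :persist map. Retained is the default, so :tier is omitted."
  [opts]
  (if (= :ephemeral (choose-tier opts))
    {:tier :ephemeral}
    {}))

;; Example: a temporary HTTP export.
(merge {:type :blob :filename "orders.csv"}
       (tier-opts {:step-kind :http :persist-type :blob :temporary? true}))
```

Defaulting to retained when :temporary? is absent matches the guidance to retain anything that might be needed after the run.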

Document Fetch, Preserve, Extract, Display

For PDFs and other document files, separate four jobs:

  • fetch: use :http with :response-as :bytes
  • preserve: add :persist {:type :blob ...} so the file has a res:// ref
  • extract: use an external document extraction API or an LLM/tool that explicitly supports that file type
  • display: return a Markdown report with a download/resource embed

Breyta does not currently expose a built-in PDF text extraction primitive.
Do not assume that persisting a PDF makes its raw text available to functions or
search. Persisted PDFs are discoverable by metadata/path context; add
:persist {:search-index {:text ...}} when you already have trusted extracted
text.

If an HTTP step returns binary bytes, do not pass those bytes to table rows as
text. Persist the blob, pass the resource ref, then call an explicit extraction
service before writing extracted fields to a table.
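One way to enforce that rule while building rows is a guard that refuses raw bytes and stores a resource ref cell instead. This is a hypothetical plain-Clojure sketch: text-cell and document-row are illustrative names, and the ref shape follows the canonical {:type :resource-ref :uri ...} cell value described under Table Resources.

```clojure
(defn text-cell
  "Hypothetical guard: return v unchanged unless it is a raw byte array, which
  belongs in a :blob persist rather than a table cell."
  [v]
  (if (bytes? v)
    (throw (ex-info "Binary payload: persist it as :blob and store the resource ref instead"
                    {:byte-count (alength ^bytes v)}))
    v))

(defn document-row
  "Build a table row from a persisted document ref plus text that an explicit
  extraction service already returned. All names here are illustrative."
  [doc-id resource-uri extracted-text]
  {:doc_id doc-id
   :document {:type :resource-ref :uri resource-uri}
   :summary (text-cell extracted-text)})
```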

Default Posture

For non-trivial flows, default to :persist for steps that can return variable or growing payloads (:db, :http, :llm), then pass refs downstream.
When the result is a collection of rows that should stay editable or queryable later, prefer :persist {:type :table ...} over keeping the full rowset inline.

For derived tables, prefer flow/step :table with {:op :materialize-join ...}
over pulling rows into :function and hand-writing joins.

Minimal Pattern

Blob persist:

{:flow
 '(let [download (flow/step :http :download-orders
                   {:url "https://api.example.com/orders.csv"
                    :response-as :bytes
                    :persist {:type :blob
                              :tier :ephemeral
                              :filename "orders.csv"
                              :content-type "text/csv"}})]
    {:download-uri (:uri download)})}

Persisted blob results include the resource ref fields used by resource APIs,
viewers, tables, and downstream loaders:

{:uri "res://..."
 :resource-uri "res://..."
 :blob-ref {...}}

Table persist:

{:flow
 '(let [orders (flow/step :http :fetch-orders
                 {:url "https://api.example.com/orders"
                  :accept :json
                  :persist {:type :table
                            :table "orders"
                            :rows-path [:body :items]
                            :write-mode :upsert
                            :key-fields [:id]}})]
    {:orders-table orders
     :orders-table-uri (:uri orders)})}

Table persists return a resource ref with table/write metadata:

{:type :resource-ref
 :uri "res://v1/ws/ws-123/result/table/tbl_..."
 :content-type "application/vnd.breyta.table+json"
 :preview {:table-name "orders"
           :write-mode :upsert
           :rows-written 100}
 :write {:mode :upsert
         :rows-written 100}}

upsert is incremental: it updates matching key rows and inserts new key rows,
but it does not remove rows omitted from a later write. For "latest snapshot"
tables, include a run/batch key in :key-fields or :partitioning, then query
the current batch. There is no scoped replace/delete-by-group mode yet.
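The difference between plain upsert and a batch-keyed snapshot is easiest to see in a small in-memory simulation. This plain-Clojure sketch only models the documented behavior and is not the runtime implementation. It replaces whole rows on a key match; whether real upserts merge or replace fields is an assumption here.

```clojure
(defn upsert-rows
  "In-memory model of :write-mode :upsert. The table is a map from key tuple to row.
  Matching keys are replaced, new keys are inserted, and keys missing from
  this write are left untouched."
  [table key-fields rows]
  (reduce (fn [t row] (assoc t (mapv row key-fields) row)) table rows))

(defn rows-in-batch
  "Read only the rows written by one run/batch."
  [table batch-id]
  (filter #(= batch-id (:batch_id %)) (vals table)))

;; Plain upsert: the second, smaller write leaves row 3 behind.
(def plain
  (-> {}
      (upsert-rows [:id] [{:id 1 :v "a"} {:id 2 :v "b"} {:id 3 :v "c"}])
      (upsert-rows [:id] [{:id 1 :v "a2"} {:id 2 :v "b2"}])))

;; Batch-keyed snapshot: include the batch in :key-fields, then query the latest batch.
(def batched
  (-> {}
      (upsert-rows [:batch_id :id] [{:batch_id "run-1" :id 1}
                                    {:batch_id "run-1" :id 2}
                                    {:batch_id "run-1" :id 3}])
      (upsert-rows [:batch_id :id] [{:batch_id "run-2" :id 1}
                                    {:batch_id "run-2" :id 2}])))

(rows-in-batch batched "run-2")
```

The batched table keeps every run's rows, so the snapshot view comes from filtering on the latest batch rather than from deleting old rows.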

Human-Readable Table Output Recipe

Use this pattern when the final output should open as a real Breyta table artifact, not just show a text report with a table inside it.

  1. Build row maps with stable storage keys.
  2. Add human-facing column metadata with :columns.
  3. Persist the rows with :persist {:type :table ...}.
  4. Return the persisted table step result as the :breyta.viewer/value for a :table viewer.
'(let [run-id (str "run-" (flow/now-ms))
       comparison-table
       (flow/step :function :build-comparison-table
                  {:input {:rows comparison-rows
                           :run-id run-id}
                   :code '(fn [{:keys [rows run-id]}]
                            {:rows
                             (mapv (fn [row]
                                     {:run_id run-id
                                      :paragraph (:paragraph row)
                                      :original (:original row)
                                      :cleaned (:cleaned row)
                                      :changed (:changed row)})
                                   rows)})
                   :persist {:type :table
                             :table (str "transcript-comparison-" run-id)
                             :rows-path [:rows]
                             :write-mode :upsert
                             :key-fields [:run_id :paragraph]
                             :indexes [{:field :run_id}
                                       {:field :changed}]
                             :columns [{:column :paragraph
                                        :display-name "Paragraph"}
                                       {:column :original
                                        :display-name "Original"}
                                       {:column :cleaned
                                        :display-name "Cleaned"}
                                       {:column :changed
                                        :display-name "Changed"}]}})]
   {:breyta.viewer/kind :table
    :breyta.viewer/options {:title "Original vs cleaned"}
    :breyta.viewer/value comparison-table})

Inline maps like {:rows [...] :columns [...]} are not table artifacts. A real table artifact has a table content type and a res://.../result/table/... URI.

When the table belongs inside a narrative report, keep the final output as a
Markdown viewer and embed the persisted table with a breyta-resource fence.
That lets the surrounding text, filtered table snapshot, aggregate chart, and
download affordance render in document order without exposing the res:// URI
to end users. Use Output Artifacts for the full
Markdown resource embed syntax.

Verification loop:

breyta runs show <workflow-id> --pretty
breyta resources read <table-uri> --limit 25 --offset 0

Check that the final output table item contains :type :resource-ref, :content-type "application/vnd.breyta.table+json", and non-zero :preview :rows-written or :preview :row-count.
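The same checks can be written as a predicate when inspecting run output from a script or test. table-artifact? is a hypothetical helper that encodes only the conditions listed above: resource-ref type, table content type, a .../result/table/... URI, and a non-zero row count in :preview.

```clojure
(def table-content-type "application/vnd.breyta.table+json")

(defn table-artifact?
  "Hypothetical check for a real persisted table artifact, per the conditions above."
  [x]
  (let [{:keys [rows-written row-count]} (:preview x)]
    (boolean
     (and (map? x)
          (= :resource-ref (:type x))
          (= table-content-type (:content-type x))
          (string? (:uri x))
          (.contains ^String (:uri x) "/result/table/")
          ;; Either preview counter may carry the row count.
          (some #(and (number? %) (pos? %)) [rows-written row-count])))))
```

An inline {:rows [...] :columns [...]} map fails the first check, which is exactly the distinction drawn earlier in this section.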

materialize-join remains incremental in v1:

  • destination writes use :append or :upsert
  • there is no snapshot/replace mode yet
  • joins read the current materialized row state of source tables, so run :recompute first when derived source values must be refreshed

The same rule applies to ordinary table persists: a smaller rerun does not
delete rows from an earlier larger run unless the flow models each run as its own
batch/partition and reads the latest batch.

Loading Persisted HTTP Responses In Function Steps

For large persisted HTTP responses, pass the whole step result into the
downstream function input and mark that field in :load. Use
:tier :ephemeral for temporary intermediates; omit :tier when the HTTP blob
itself is durable.

'(let [resp (flow/step :http :generate-image
               {:connection :image-api
                :method :post
                :path "/images/generations"
                :persist {:type :blob
                          :tier :ephemeral}})
       img (flow/step :function :decode-image
              {:input {:resp resp}
               :load [:resp]
               :persist {:type :blob
                         :filename "image.jpeg"
                         :content-type "image/jpeg"}
               :code '(fn [{:keys [resp]}]
                        (-> resp
                            :body
                            :data
                            first
                            :b64_json
                            breyta.sandbox/base64-decode-bytes))})]
   {:image img})

Use this pattern when the HTTP response body is too large to survive inline transfer to the next step.
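To exercise the :decode-image logic outside Breyta, for example in a REPL with a captured response, you can swap breyta.sandbox/base64-decode-bytes for the JDK decoder. This sketch assumes the sandbox helper decodes standard base64 the same way java.util.Base64's basic decoder does; decode-first-image is a hypothetical name.

```clojure
(import '(java.util Base64))

(defn decode-first-image
  "Local stand-in for the :decode-image :code body above. Assumes
  breyta.sandbox/base64-decode-bytes matches java.util.Base64's basic decoder."
  [resp]
  (when-let [b64 (-> resp :body :data first :b64_json)]
    (.decode (Base64/getDecoder) ^String b64)))

;; Example with the same response shape: {:body {:data [{:b64_json "..."}]}}
(def sample-resp
  {:body {:data [{:b64_json (.encodeToString (Base64/getEncoder) (.getBytes "hello" "UTF-8"))}]}})

(String. ^bytes (decode-first-image sample-resp) "UTF-8")
```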

Blob Path Templates

Use :path to express the relative storage subpath and keep :filename as the leaf name:

(flow/step :function :persist-report
  {:input {:tenant-id tenant-id
           :report-id report-id
           :rows rows}
   :code '(fn [{:keys [rows]}] rows)
   :persist {:type :blob
             :path "exports/{{input.tenant-id}}"
             :filename "report-{{input.report-id}}.json"}})

For plain :persist writes without :slot, the runtime stores the artifact under its managed prefix:

workspaces/<ws>/persist/<flow>/<step>/<uuid>/exports/<tenant-id>/report-<report-id>.json

Notes:

  • :path is relative only; do not include a leading / or ..
  • :path and :filename support {{...}} interpolation from resolved step params (input.*, data.*, query.*, etc.) plus runtime fields like workspace-id, flow-slug, and step-id
  • Existing slash-bearing :filename flows still work, but new flows should prefer the explicit :path + :filename split

Installer-Configured Storage Scopes

When installers should control where persisted artifacts land, declare a :blob-storage slot and point :persist :slot at that slot.
For connected persists, the installer-configured storage root becomes the write base under the runtime workspace:

workspaces/<ws>/storage/<configured-root>/<persist-path>/<filename>

That is the full platform path shape for connected persists. Breyta does not add hidden <flow>/<step>/<uuid> segments after the configured storage root.

Author the slot once:

{:requires [{:slot :archive
             :type :blob-storage
             :label "Archive storage"
             :config {:prefix {:default "reports"
                               :label "Folder prefix"
                               :description "Stored under this folder in the selected storage connection."
                               :placeholder "reports/customer-a"}}}]}

Use it from :persist:

(flow/step :http :download-report
  {:connection :reports-api
   :path "/exports/latest"
   :response-as :bytes
   :persist {:type :blob
             :slot :archive
             :path "{{input.tenant-id}}/{{input.run-date}}"
             :filename "summary-{{input.report-id}}.pdf"}})

With storage root reports/customer-a, that write lands at:

workspaces/<ws>/storage/reports/customer-a/<tenant-id>/<run-date>/summary-<report-id>.pdf
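The connected path shape is a plain concatenation, which the hypothetical helper below spells out. It assumes segments are joined with / and empty segments are dropped, and it adds nothing after the configured root beyond the persist :path and :filename.

```clojure
(require '[clojure.string :as str])

(defn connected-object-path
  "Hypothetical helper: workspaces/<ws>/storage/<root>/<persist-path>/<filename>.
  Empty segments are dropped; no <flow>/<step>/<uuid> segments are added."
  [ws-id storage-root persist-path filename]
  (->> (concat ["workspaces" ws-id "storage"]
               (str/split storage-root #"/")
               (str/split persist-path #"/")
               [filename])
       (remove str/blank?)
       (str/join "/")))

(connected-object-path "ws-1" "reports/customer-a" "cust-77/2026-03-24" "summary-rep-42.pdf")
```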

Use the same slot from a runtime resource picker:

{:invocations {:default
               {:inputs [{:name :report
                          :label "Archived report"
                          :type :resource
                          :slot :archive
                          :accept ["application/pdf"]}]}}}

Notes:

  • installer-owned :blob-storage slots add a required setup control for the storage root
  • authors can customize the default/prefix with :config {:prefix ...}, but cannot disable setup
  • :persist :path stays relative to the configured root
  • resource pickers reuse the same resolved connection/root, so authors do not wire the prefix twice
  • end-user installations default to isolated roots; shared roots require explicit installer configuration
  • persisted blob resources are canonical :file resources
  • prefer binding invocation resource inputs by :slot

End-To-End Producer And Consumer Example

Use this pattern when Flow A writes files and Flow B later works on those files.
For example, an influencer research flow can write a retained CSV to a private
installer-scoped folder, and an outreach flow can let the same user pick that
CSV from the run form resource picker instead of downloading and uploading CSV files manually.

Producer flow:

{:requires [{:slot :archive
             :type :blob-storage
             :label "Archive storage"
             :config {:prefix {:default "reports"
                               :label "Folder prefix"}}}]
 :flow
 '(let [download (flow/step :http :download-report
                   {:connection :reports-api
                    :response-as :bytes
                    :persist {:type :blob
                              :slot :archive
                              :path "{{input.customer-id}}/{{input.run-date}}"
                              :filename "summary-{{input.report-id}}.pdf"}})]
    {:download download})}

Consumer flow:

{:requires [{:slot :archive
             :type :blob-storage
             :label "Archive storage"
             :prefers [{:flow :report-producer
                        :slot :archive}]
             :config {:prefix {:default "reports"
                               :label "Folder prefix"}}}]
 :invocations {:default
               {:inputs [{:name :report
                          :label "Archived report"
                          :type :resource
                          :slot :archive
                          :accept ["application/pdf"]}]}}
 :flow
 '(let [input (flow/input)]
    {:report (:report input)})}

By default, two end-user installations stay isolated because each one derives its own private root, such as installations/<producer-installation-id>/reports and installations/<consumer-installation-id>/reports.
They share only if both installations are explicitly configured with the same root. For example, if both installations save this binding:

{:bindings {:archive {:binding-type :connection
                      :connection-id "platform"
                      :config {:root "reports/acme"}}}}

and the producer run uses:

{:customer-id "cust-77"
 :run-date "2026-03-24"
 :report-id "rep-42"}

the stored object path is:

workspaces/<ws>/storage/reports/acme/cust-77/2026-03-24/summary-rep-42.pdf

The consumer does not need to know that path. Its runtime picker simply scopes to the same concrete storage location behind :slot :archive.

For public UX, prefer this resource picker handoff when a downstream flow should
reuse an artifact from a prior run. Keep manual upload as a fallback, but do not
make users browse all workspace resources or copy res:// values by hand.

If you know one producer flow is the intended upstream lane, add :prefers to the consumer slot.
That records the intended sharing relationship, but it does not auto-select or persist the consumer root.
To share, the installer must still explicitly save the same connection + root on both installations.

Keep these boundaries in mind:

  • the producer writes through its own local installer-owned slot such as :archive
  • the consumer reads through its own local installer-owned slot such as :archive
  • the installer decides whether those slots share by choosing the same or different storage roots
  • if two flows point at the same storage location, they share both the utility and the overwrite risk

Resource Types

Use the resource-type split like this:

  • :persist {:type :blob ...} creates :file resources
  • uploads create :file resources
  • :persist {:type :kv ...} creates :result resources
  • :persist {:type :table ...} creates :result resources backed by the :persist-table adapter
  • captured run and step outputs stay :result

Because persisted blobs and uploads are both :file, most resource picker fields can omit :resource-types entirely.
Add :resource-types only when you need something narrower than the default file picker, for example [:result] for structured run outputs or persisted table resources.

Table Resources

Table resources are persisted results with a bounded table-like query/edit surface.

In the resource panel, partitioned table families render as grouped table resources:

  • the family remains the primary resource identity
  • when tablePartition is omitted, the panel defaults to the newest :date-bucket table or the first bounded table for other strategies
  • if an explicit tablePartition is missing, the panel shows a clear warning instead of silently falling back
  • the panel itself is read-only; use breyta resources table ... or flow/step :table for imports and mutations
  • the panel keeps CSV export for the currently previewed table
  • the panel keeps Copy Markdown for the currently visible preview page
  • table selection rerenders in place inside the current panel or sidepeek
  • most family metadata sits behind a compact info tooltip so the preview stays focused on rows and columns

On run pages, table resource refs open the primary table preview by default. Add artifactUri=... to open a different resource.

Human-readable table output should include the persisted table ref. Inline :rows / :columns / :schema / :query maps are not table resources.

Create on first write:

(flow/step :http :fetch-orders
  {:url "https://example.com/orders"
   :persist {:type :table
             :table "orders"
             :rows-path [:body :items]
             :write-mode :upsert
             :key-fields [:order-id]
             :indexes [{:field :status}
                       {:field :customer-id}]}})

Use the dedicated :table step later:

(flow/step :table :open-orders
  {:op :query
   :table {:ref orders-ref}
   :where [[:status := "open"]]
   :sort [[:order-id :asc]]
   :page {:mode :offset
          :limit 25
          :offset 0}})

Query paging contract:

  • :page is required for :table {:op :query ...}
  • :table {:ref <resource-ref>} is canonical; bare refs work for simple reads.
  • :page.mode must be explicit as :offset or :cursor
  • cursor paging requires explicit :sort
  • the first cursor page omits :page.cursor
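Those rules are easy to check before a flow runs. page-errors below is a hypothetical pre-flight validator in plain Clojure, not part of Breyta. It covers the required :page, the explicit mode, and the cursor/sort pairing, and first-cursor-page builds a first cursor page without a :cursor key, as the contract requires.

```clojure
(defn page-errors
  "Hypothetical pre-flight validator for a :table {:op :query ...} params map,
  following the paging contract above. An empty vector means no problems found."
  [{page :page order :sort}]
  (cond-> []
    (nil? page)
    (conj ":page is required")

    (and page (not (contains? #{:offset :cursor} (:mode page))))
    (conj ":page :mode must be :offset or :cursor")

    (and (= :cursor (:mode page)) (empty? order))
    (conj "cursor paging requires explicit :sort")))

(defn first-cursor-page
  "First page of a cursor scan: no :cursor key yet."
  [limit]
  {:mode :cursor :limit limit})

(page-errors {:op :query
              :sort [[:order-id :asc]]
              :page (first-cursor-page 25)})
```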

You can also author or evolve logical columns later:

(flow/step :table :define-customer-name
  {:op :set-column
   :table {:ref orders-ref}
   :column :customer-name
   :definition {:semantic-type :text
                :computed {:type :lookup
                           :reference-column :customer-id
                           :field :name}}})

set-column backfills bounded tables. Use :recompute only to rerun derived/reference values. For partitioned families, pass partition scope on :table.

Dynamic enum columns keep stored values stable while letting authors control rendered labels:

(flow/step :table :define-status-enum
  {:op :set-column
   :table {:ref orders-ref}
   :column :status
   :definition {:display-name "Status"
                :enum {:options [{:id "open"
                                  :name "Open"
                                  :aliases ["OPEN" "Open"]}
                                 {:id "in-progress"
                                  :name "In progress"
                                  :aliases ["IN_PROGRESS" "In Progress"]}]}}})

Enum behavior:

  • :enum implies type-hint "enum"
  • writes, :update-cell, CSV import, and :recompute normalize incoming scalar values to stable ids
  • matching accepts existing ids, names, and aliases
  • unknown values dynamically grow the enum definition with a normalized id and a derived display name
  • stored rows, :query, :get-row, and CSV export keep the normalized ids
  • the web table preview and Copy Markdown render enum names instead of raw ids
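The matching order above can be modeled as follows. This is a hypothetical plain-Clojure sketch: the exact normalization rule for new ids and the derived display name are not documented here, so normalize-id (lowercase, non-alphanumeric runs collapsed to -) and reusing the raw value as the name are assumptions.

```clojure
(require '[clojure.string :as str])

(defn normalize-id
  "Assumed rule for new ids: lowercase, non-alphanumeric runs collapsed to -."
  [s]
  (-> s
      str/trim
      str/lower-case
      (str/replace #"[^a-z0-9]+" "-")
      (str/replace #"^-|-$" "")))

(defn- find-option
  "Match v against each option's id, name, and aliases."
  [options v]
  (some (fn [{opt-id :id opt-name :name aliases :aliases :as opt}]
          (when (some #(= v %) (concat [opt-id opt-name] aliases)) opt))
        options))

(defn normalize-enum-value
  "Return [stored-id enum]. Unknown values grow the enum with a new option;
  this sketch reuses the raw value as the display name."
  [enum v]
  (if-let [opt (find-option (:options enum) (str v))]
    [(:id opt) enum]
    (let [id (normalize-id (str v))]
      (if (some #(= id (:id %)) (:options enum))
        [id enum]
        [id (update enum :options (fnil conj []) {:id id :name (str v)})]))))

(def status-enum
  {:options [{:id "open" :name "Open" :aliases ["OPEN" "Open"]}
             {:id "in-progress" :name "In progress" :aliases ["IN_PROGRESS" "In Progress"]}]})

(normalize-enum-value status-enum "IN_PROGRESS")
```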

Display formatting is render-only:

  • column :format metadata and sparse :update-cell-format overrides can render relative-time, date, timestamp / date-time, and currency
  • the web table preview and Copy Markdown apply those formats to the currently visible page
  • CLI/API query surfaces and CSV export keep canonical raw values

Resource refs are also first-class cell values:

  • store canonical {:type :resource-ref :uri ...} maps in row data when a cell should point at another resource
  • the web table preview renders those cells as clickable resource chips and opens the target resource in the same panel or sidepeek
  • Copy Markdown uses the rendered label for the currently visible page
  • CLI/API query surfaces and CSV export keep the canonical raw resource-ref value

Table resources can also be used as invocation inputs. Declare a :resource
input filtered to :result resources and, when you want only persisted tables,
add the table MIME type:

{:invocations {:default
               {:inputs [{:name :source-table
                          :label "Source table"
                          :type :resource
                          :resource-types [:result]
                          :accept ["application/vnd.breyta.table+json"]}]}}
 :flow
 '(let [input (flow/input)
        preview (flow/step :table :preview-source
                  {:op :query
                   :table (:source-table input)
                   :page {:mode :offset
                          :limit 25}})]
    {:source-table (:source-table input)
     :preview preview})}

Important boundaries:

  • paged by default
  • query-like operations stay bounded and scoped to one table or an explicit bounded partition subset
  • no implicit all-partitions scans from the family root
  • joins only through bounded :materialize-join
  • no raw SQL
  • no cross-workspace reads

Key v1 table-family limits:

  • 500 table resources (families) per workspace
  • 50_000 live rows per concrete table inside a family
  • 200 columns per table
  • 16 promoted/index fields per table
  • 128 partitions per family
  • 16 partitions touched per write
  • 12 selected partitions per query/aggregate/export
  • 24 selected partitions per preview/read/schema
  • 256 max partition key bytes
  • 64 KB max cell size
  • 256 KB max row payload
  • 256 MB max table size
  • 2 GB max workspace table DB size
  • 1_000 rows per write
  • 1_000 rows per query page
  • 10_000 max query scan window via page.offset + page.limit
  • 200 max aggregate groups

If you need materially more than that, use a dedicated database/query backend instead of expanding table-resource workarounds flow-by-flow.
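When a flow produces more rows than one write allows, splitting client-side keeps each write under the limit. The helpers below are hypothetical and hard-code two numbers from the limits list above: 1_000 rows per write or query page and the 10_000 page.offset + page.limit scan window.

```clojure
(def max-rows-per-write 1000)
(def max-rows-per-page 1000)
(def max-query-scan-window 10000)

(defn write-batches
  "Split rows into chunks that respect the rows-per-write limit."
  [rows]
  (partition-all max-rows-per-write rows))

(defn scan-window-ok?
  "True when a query page stays within the page size and scan window limits."
  [{:keys [offset limit] :or {offset 0}}]
  (and (<= limit max-rows-per-page)
       (<= (+ offset limit) max-query-scan-window)))

(map count (write-batches (range 2500)))
```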

Current design guidance when one logical dataset approaches bounded-table limits:

  • keep 50_000 live rows per concrete table or partition as a real boundary
  • use first-class :partitioning when the data naturally partitions by region, tenant, source, or a date bucket and most reads/writes stay within one partition or a small bounded subset
  • keep the family root as the schema/metadata owner and select partition scope explicitly for query-like operations instead of expecting implicit all-partitions scans
  • use separate explicit tables when the data truly represents different datasets or lifecycles, not just as a workaround for missing partition support
  • if the workload mainly needs wide cross-partition scans, arbitrary joins, or general database behavior, prefer a dedicated :db step and external database/query backend

Reading Persisted Content

breyta resources workflow list <workflow-id>
breyta resources read <res://...>
breyta resources search "transcript" --limit 10

Commands that help:

  • breyta resources search "<query>" [--limit 10] [--type result|file] [--content-sources file,result]
  • breyta resources search "<query>" [--limit 10] [--storage-backend gcs] [--storage-root reports/acme] [--path-prefix exports/2026]
  • breyta resources list [--types file] [--storage-root reports/acme] [--path-prefix exports/2026]
  • breyta resources workflow step <workflow-id> <step-id>
  • breyta resources get <res://...>
  • breyta resources read <res://table-uri> [--limit 100] [--offset 0]
  • breyta resources table query <res://table-uri> --page-mode offset --limit 100
  • breyta resources table query <res://table-uri> --page-mode cursor --sort-json '[["order-id","asc"]]'
  • breyta resources table get-row <res://table-uri> --row-id <row-id> or --key order-id=ord-1
  • breyta resources table get-row <res://table-uri> --key meeting-key=m1 --key agenda-item-number=1
  • breyta resources table aggregate <res://table-uri> --group-by currency --metrics-json '[...]'
  • breyta resources table aggregate <res://table-uri> --group-by-json '[...]' --metrics-json '[...]'
  • breyta resources table schema <res://table-uri>
  • breyta resources table export <res://table-uri> [--out orders.csv]
  • breyta resources table import <res://table-uri> --file orders.csv --write-mode append|upsert
  • breyta resources table import orders-import --file orders.csv --write-mode upsert --key-fields order-id [--index-fields status]
  • breyta resources table update-cell <res://table-uri> --key order-id=ord-1 --column status --value closed
  • breyta resources table update-cell-format <res://table-uri> --key order-id=ord-1 --column amount --format-json '{"display":"currency","currency":"USD"}'
  • breyta resources table set-column <res://table-uri> --column customer-name --computed-json '{...}'
  • breyta resources table set-column <res://table-uri> --column status --enum-json '{...}'
  • breyta resources table recompute <res://table-uri> --limit 1000 --offset 0
  • breyta resources url <res://...>

For blobs, resources read returns a compact content preview by default. For table URIs, it returns a bounded preview page and pagination metadata. Use --full only when the complete payload is required.
Switch to resources table ... when you need the richer query, export, import, aggregate, or single-cell edit surface.
For enum columns, CLI/API query and export surfaces return the stored normalized ids; the web table preview and Copy Markdown render the configured names.
For partitioned families, single-cell edits stay within the selected partition and cannot change the partition-driving field; use a normal write/upsert when a row should land in a different table.

The bounded aggregate surface also supports:

  • group ordering via order-by-json
  • truncation visibility via hasMore
  • metric-local filters via where
  • scalar arg-max / arg-min metrics for "latest/highest row value per group" cases
  • having-json for post-group filtering
  • bounded collect-set metrics
  • group-by-json for date bucket and numeric-bin specs
  • percentile and median for bounded distribution reporting

Use storage filters like this:

  • storage-backend narrows by backend family, such as gcs
  • storage-root narrows to the installer-configured root, such as reports/acme
  • path-prefix narrows further inside that root, relative to it, such as exports/2026
  • path-prefix is relative to the configured root, not the full workspaces/<ws>/storage/... object path

That means a platform-backed persisted file stored at:

workspaces/<ws>/storage/reports/acme/exports/2026/summary.pdf

is searchable with:

  • --storage-backend gcs
  • --storage-root reports/acme
  • --path-prefix exports/2026

Resource Search Indexing

How persisted artifacts become searchable in breyta resources search:

  • search indexes metadata fields (display name, URI/path context, tags, source label)
  • connected persists also index normalized storage scope fields so search and pickers can filter by backend, root, and relative path
  • text content indexing is enabled only for text-like payloads
  • :tier :ephemeral blobs are metadata-indexed by default (raw content is not extracted)
  • binary blobs are discoverable by metadata/path context, but raw binary content is not full-text indexed
  • indexed text is bounded by size/character limits for stability

For connected persists, Breyta stores both the full path and normalized storage fields:

  • path: full physical object path, useful for broad search/debug context. Example: workspaces/<ws>/storage/reports/acme/exports/2026/summary.pdf
  • storage_backend: backend family. Example: platform
  • storage_root: installer-configured root inside that backend. Example: reports/acme
  • path_under_root: relative path below the root. Example: exports/2026/summary.pdf

That split is intentional:

  • free-text search can still match the full path
  • storage-root and path-prefix use the normalized fields instead of requiring the full workspace storage path
  • the same backend/root/relative-path contract can extend to future storage backends without changing authored filters
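The relationship between the full path and the normalized fields can be sketched as a string split. storage-index-fields and matches-path-prefix? are hypothetical helpers that assume the workspaces/<ws>/storage/<root>/ layout shown above. The plain string prefix match is a simplification; the real filter may compare whole path segments.

```clojure
(require '[clojure.string :as str])

(defn storage-index-fields
  "Hypothetical derivation of the normalized search fields from a connected
  object path, given the backend family and configured root. Returns nil when
  the path is not under workspaces/<ws>/storage/<root>/."
  [full-path backend root]
  (let [prefix-re (re-pattern (str "^workspaces/[^/]+/storage/"
                                   (java.util.regex.Pattern/quote root)
                                   "/"))]
    (when-let [prefix (re-find prefix-re full-path)]
      {:path full-path
       :storage_backend backend
       :storage_root root
       :path_under_root (subs full-path (count prefix))})))

(defn matches-path-prefix?
  "path-prefix is compared against path_under_root, not the full object path."
  [{:keys [path_under_root]} prefix]
  (boolean (some-> path_under_root (str/starts-with? prefix))))

(storage-index-fields "workspaces/ws-1/storage/reports/acme/exports/2026/summary.pdf"
                      "platform" "reports/acme")
```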

:persist :search-index Overrides

Use :search-index under :persist to customize indexed text/metadata for persisted artifacts (especially binary blobs), without changing stored payload bytes.

Target shape:

{:persist {:type :blob
           :path "invoices/{{input.invoice.customer-id}}"
           :filename "invoice.pdf"
           :search-index {:text "invoice-id=INV-123 vendor=Acme total=4500"
                          :tags ["invoice" "acme" "emea"]
                          :source-label "Invoice PDF from SAP import"
                          :include-raw-content? false}}}

Intended precedence:

  • :search-index.text overrides default indexed content text
  • :search-index.tags overrides/augments indexed tags
  • :search-index.source-label overrides derived source label
  • :search-index.include-raw-content? controls whether default extracted text is also included when available

Find the same persisted artifact later with flow/step :search:

'(let [artifact (flow/step :function :persist-invoice
                  {:input {:invoice invoice}
                   :code '(fn [{:keys [invoice]}] invoice)
                   :persist {:type :blob
                             :path "invoices/{{input.invoice.customer-id}}"
                             :filename "invoice.json"
                             :content-type "application/json"
                             :search-index {:text "invoice-id=INV-123 vendor=Acme total=4500"
                                            :tags ["invoice" "acme" "emea"]
                                            :source-label "Invoice bundle for Acme"
                                            :include-raw-content? true}}})
       hits (flow/step :search :find-invoice
              {:query "invoice-id=INV-123"
               :targets [:resources]
               :limit 5
               :hydrate {:enabled true
                         :top-k 1
                         :max-chars 12000}})]
   {:artifact artifact
    :hits hits})

Quick operator/debug loop:

breyta resources search "invoice-id=INV-123" --limit 10

Open The Same Artifact In Breyta Web

In API mode, JSON output from resource commands can include optional webUrl links that point to the artifact's context in Breyta Web:

  • breyta resources workflow list <workflow-id> --format json -> data.items[].webUrl (and meta.webUrl for a primary destination)
  • breyta resources search "<query>" --format json -> data.items[].webUrl and data.items[].displayName
  • breyta resources get <res://...> --format json -> data.webUrl (and usually meta.webUrl)
  • breyta resources url <res://...> --format json -> signed data.url plus optional data.webUrl/meta.webUrl

Quick extraction pattern:

breyta resources get <res://...> --format json | jq -r '.meta.webUrl // .data.webUrl // empty'
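If you consume the CLI's JSON from Clojure instead of jq, two small functions can express the same fallback. This sketch assumes the JSON has already been parsed into maps with keyword keys, using a parser of your choice. `web-url` and `item-web-urls` are hypothetical names.

```clojure
(ns web-url-sketch)

;; Hypothetical helpers that mirror the jq fallback above on JSON that has
;; already been parsed into Clojure maps with keyword keys.

(defn web-url
  "Single-resource responses (resources get/url): prefer meta.webUrl,
  fall back to data.webUrl, else nil (like jq's `empty`)."
  [response]
  (or (get-in response [:meta :webUrl])
      (get-in response [:data :webUrl])))

(defn item-web-urls
  "List responses (resources search, resources workflow list): collect
  data.items[].webUrl, skipping items without one."
  [response]
  (vec (keep :webUrl (get-in response [:data :items]))))
```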

Signed URLs Versus Public Preview Links

breyta resources url <res://...> returns a temporary signed URL for direct
resource access. It is useful for operators, runtime handoff, and debugging, but
it is not the production outreach/share mechanism for external viewers.

Use public artifact preview links when someone should open a read-only artifact
without logging in:

POST /api/resources/shares
GET /public/artifact-previews/:token
GET /public/artifact-previews/:token/download
DELETE /api/resources/shares/:token

Send the X-Breyta-Workspace: <workspace-id> header on the authenticated create
(POST) and revoke (DELETE) calls.
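Once a share token exists, the routes above follow from it mechanically. The helper below is a hypothetical sketch that derives the public preview and download paths and the authenticated revoke request from a token. The create request is not modeled because its body fields, apart from allowDownload, are not covered on this page. Paths are relative, and the token is assumed to be URL-safe.

```clojure
(ns share-routes-sketch)

;; Hypothetical helper: given a share token (as returned by the create call),
;; derive the documented public routes and the authenticated revoke request.
;; Paths are relative; prepend your Breyta base URL. The token is assumed to
;; be URL-safe, so it is not encoded here. Auth credentials are omitted.
(defn share-routes [token workspace-id]
  {:preview  (str "/public/artifact-previews/" token)
   :download (str "/public/artifact-previews/" token "/download")
   :revoke   {:method  :delete
              :path    (str "/api/resources/shares/" token)
              ;; only the authenticated create/revoke calls carry this header
              :headers {"X-Breyta-Workspace" workspace-id}}})
```

The public routes need no workspace header. The download route only serves content when the share permits downloads, as described below.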

Public preview links are unlisted, revocable, optionally expiring, and render a
sanitized artifact page. The page hides workspace/run/debug metadata, raw
resource refs, private resource-content proxy URLs, common signed storage URLs,
and private resource actions. Use this for creator outreach or external review
flows where the recipient should see the opportunity or report, not the
workspace internals.

When the share request sets allowDownload: true and the shared artifact is
text-like, such as EDN, Markdown, CSV, JSON, XML, JavaScript, form data, or
plain text, the share response includes a token-scoped publicDownloadUrl. That
URL serves a bounded attachment through the same revocable, optionally expiring
token. It does not expose private resource-content URLs or signed storage URLs,
and it does not serve binary media or sandboxed HTML previews.
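To predict whether a share response will carry publicDownloadUrl, you can check the rule client-side. This is a hypothetical sketch. The mapping from the listed formats to MIME types is an assumption, and "form data" is read as URL-encoded forms only. The server's own list is authoritative.

```clojure
(ns public-download-sketch
  (:require [clojure.string :as str]))

;; Assumption: MIME types for the text-like formats listed above. The
;; server's own list is authoritative; this only predicts the outcome.
(def text-like-types
  #{"application/edn" "text/markdown" "text/csv" "application/json"
    "application/xml" "text/xml" "text/javascript" "application/javascript"
    "application/x-www-form-urlencoded" "text/plain"})

(defn base-type
  "Strips parameters such as \"; charset=utf-8\" and normalizes case."
  [content-type]
  (-> content-type (str/split #";") first str/trim str/lower-case))

(defn expect-public-download-url?
  "Hypothetical predicate: true when a share response should include
  publicDownloadUrl, i.e. allowDownload is true and the type is text-like.
  HTML and binary media are not in the set, so they always return false."
  [{:keys [allowDownload]} content-type]
  (boolean (and (true? allowDownload)
                (some? content-type)
                (contains? text-like-types (base-type content-type)))))
```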

Cross-Flow State Handoff With KV

For shared state between runs/flows, pair result persistence with KV writes:

  1. persist the large step output as a res://... ref
  2. write a compact KV record that points to that ref
  3. read the KV record in downstream flows and resolve the ref only when needed

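'(let [;; 1. persist the response body; the step result is a compact ref
       payload (flow/step :http :collect
                 {:connection :source-api
                  :method :get
                  :path "/records"
                  :persist {:type :blob}})
       ;; 2. store only the pointer; 604800 = 7 days, assuming :ttl is in seconds
       _kv (flow/step :kv :record-latest
             {:operation :set
              :key "records:latest"
              :value {:uri (:uri payload)}
              :ttl 604800})
       ;; 3. normally done in a downstream flow: read the pointer, not the payload
       latest (flow/step :kv :load-latest
                {:operation :get
                 :key "records:latest"})]
   {:latest (:value latest)})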

This keeps orchestration payloads small while still giving operators a durable pointer to the latest artifact.
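To see why the pointer stays cheap while the payload can grow, here is a toy in-memory model of the three steps above. Everything in it is hypothetical and stands in for :persist and flow/step :kv, including `persist!`, `record-latest!`, `load-latest`, `resolve-ref`, the atoms, and the res://sketch/ URIs. It illustrates the pattern, not Breyta's storage.

```clojure
(ns kv-pointer-sketch)

;; Toy in-memory stand-ins (hypothetical, not Breyta APIs): `resources`
;; plays the persisted-resource store, `kv` plays flow/step :kv.
(def resources (atom {}))
(def kv (atom {}))

(defn persist!
  "Step 1: store a payload and return only a compact ref, like a :persist result."
  [payload]
  (let [uri (str "res://sketch/" (java.util.UUID/randomUUID))]
    (swap! resources assoc uri payload)
    {:uri uri}))

(defn record-latest!
  "Step 2: write a small KV record that points at the ref."
  [k ref]
  (swap! kv assoc k {:value {:uri (:uri ref)}}))

(defn load-latest
  "Step 3 (downstream): read the pointer without touching the payload."
  [k]
  (get @kv k))

(defn resolve-ref
  "Fetch the heavy payload only when a step actually needs it."
  [uri]
  (get @resources uri))
```

Only `resolve-ref` ever touches the large value; everything passed between steps is the few-dozen-byte pointer.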

Design Rules

  • persist early when output size is uncertain
  • return refs instead of heavy payloads in final output
  • pass refs explicitly; don’t hide them in nested structures
  • treat retained (default-tier) persisted artifacts as durable run history; :tier :ephemeral blobs are temporary
  • persist when payloads can grow, are reused across steps, or need operator inspection after completion
  • use :persist {:type :table ...} when row-shaped data should stay queryable/editable as a resource later
  • for cross-run/cross-flow lookup, store lightweight pointers in KV instead of duplicating large objects
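While iterating, it can help to compare a representative sample result against the 512 KB inline target before deciding whether to add :persist. The sketch below is a hypothetical authoring aid. It estimates size from the EDN printout, which is an assumption because the platform may measure results differently. It treats crossing half the budget as the signal to persist, a threshold chosen only for illustration.

```clojure
(ns persist-check-sketch)

;; Inline step results are intended to stay under 512 KB.
(def inline-budget-bytes (* 512 1024))

(defn edn-size-bytes
  "Approximate serialized size as UTF-8 EDN. Assumption: this is only an
  estimate; the platform may measure results differently."
  [value]
  (count (.getBytes ^String (pr-str value) "UTF-8")))

(defn persist-recommended?
  "Hypothetical authoring check: true when a sample result already uses more
  than `fraction` of the inline budget, leaving little headroom for growth."
  ([value] (persist-recommended? value 0.5))
  ([value fraction]
   (> (edn-size-bytes value) (* fraction inline-budget-bytes))))
```

When the output size is genuinely unbounded, skip the check and persist anyway; a passing sample does not guarantee that production data will stay small.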

Troubleshooting

  • downstream steps fail with large payloads: persist the producer output and re-run
  • resource not found: list workflow resources and match step-id to workflowId
  • resources commands fail: they require authenticated API mode
  • debug persisted refs by listing workflow resources, finding the producing step-id, and reading the target res:// URI


As of May 20, 2026