Step Table (`:table`)

Quick Answer

Use flow/step :table to query, inspect, export, edit, and evolve an existing table resource without exposing raw SQL.

Table creation happens through :persist {:type :table ...} on another step. The :table step is the bounded runtime surface for working with that persisted table later.

Worker mental model:

a normal flow step returns row-shaped data
:persist {:type :table ...} tells the runtime to write those rows into a table family, creating it on first write
the persisted step result becomes the canonical table {:type :resource-ref :uri ...} handle
later :table steps consume that ref; include it in final flow output when you want the run/resource UI to expose it directly

For partitioned table families, query-like ops should select partition scope explicitly:

use the family root for :schema
use {:ref <resource-uri> :partitions {:key "..."}} or {:ref <resource-uri> :partitions {:keys ["..." "..."]}} for :query, :get-row, :aggregate, :export, :update-cell, :update-cell-format, :set-column, and :recompute
do not rely on implicit all-partitions scans
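A minimal sketch of these scoping rules, assuming `orders-ref` is the resource ref returned by an earlier persisted step and that the family is partitioned by month with keys such as "month-2026-03". The step ids and partition keys are illustrative only.

```clojure
;; :schema targets the family root, so no :partitions selector is given.
(flow/step :table :orders-schema
  {:op :schema
   :table {:ref orders-ref}})

;; Query-like ops name their partition scope explicitly: one partition...
(flow/step :table :open-orders-march
  {:op :query
   :table {:ref orders-ref
           :partitions {:key "month-2026-03"}}
   :where [[:status := "open"]]
   :page {:mode :offset
          :limit 25
          :offset 0}})

;; ...or a small explicit set of partitions.
(flow/step :table :order-count-spring
  {:op :aggregate
   :table {:ref orders-ref
           :partitions {:keys ["month-2026-03" "month-2026-04"]}}
   :metrics [{:op :count :as :count}]})
```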

Canonical Shape

Common fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `:type` | keyword | Yes | Must be `:table` |
| `:op` | keyword/string | Yes | One of `:query`, `:get-row`, `:aggregate`, `:schema`, `:export`, `:update-cell`, `:update-cell-format`, `:set-column`, `:recompute`, `:materialize-join` |
| `:table` | map/ref | Usually | Single-table target. Use `{:ref <resource-uri>}`; add `:partitions` when needed. Bare refs work for simple ops. `:materialize-join` uses `:left`, `:right`, and `:into`. |
| `:expect` | keyword | No | Standard step output expectation |
| `:provider-opts` | map | No | Escape hatch options for advanced runtime integration |

Per-op fields:

| Op | Required fields | Optional fields | Notes |
| --- | --- | --- | --- |
| `:query` | `:table`, `:page` | `:select`, `:where`, `:sort` | Paged by default; `:page.mode` is explicit |
| `:get-row` | `:table` plus `:row-id` or `:key` | none | Fetch one row by stable id or key fields |
| `:aggregate` | `:table` | `:where`, `:group-by`, `:metrics`, `:having`, `:order-by`, `:limit` | Single-table aggregates only |
| `:schema` | `:table` | none | Returns columns, key/index fields, and stats |
| `:export` | `:table` | `:format`, `:select`, `:where`, `:sort` | V1 export format is `:csv` |
| `:update-cell` | `:table`, `:column`, `:value`, plus `:row-id` or `:key` | none | Updates one canonical cell value |
| `:update-cell-format` | `:table`, `:column`, plus `:row-id` or `:key` | `:format` | Sparse formatting override; omit/clear format to remove override |
| `:set-column` | `:table`, `:column` | `:definition` | Create/update one logical column definition, including semantic/computed/reference/enum metadata; existing rows are backfilled automatically |
| `:recompute` | `:table` | `:where`, `:limit`, `:offset` | Recompute materialized computed/reference columns for existing rows |
| `:materialize-join` | `:left`, `:right`, `:on`, `:into` | `:join-type`, `:project`, `:op-id` | Build or refresh a destination table from a bounded key-based join; the same contract is also exposed through `breyta resources table materialize-join` for end-to-end validation |

Predicate, Sort, And Metric Shapes

;; Predicates
[:status := "open"]
[:amount :>= 100]
[:title :contains "invoice"]
;; Agent/tool JSON may use {"field":"status","op":"=","value":"open"}.

;; Sort
[:updated-at :desc]
;; Agent/tool JSON may use {"field":"updated-at","direction":"desc"}.

;; Metrics
{:op :count :as :count}
{:op :sum :field :amount :as :total-amount}
{:op :count :where [[:status := "open"]] :as :open-count}
{:op :arg-max :field :order-id :order-field :amount :as :largest-order-id}
{:op :collect-set :field :currency :limit 5 :as :currencies}
{:op :percentile :field :amount :p 0.95 :as :p95-amount}
{:op :median :field :amount :as :median-amount}

;; Group-by bucket spec
{:field :created-at
 :bucket {:op :date-trunc :unit :month}
 :as :created-month}
{:field :amount
 :bucket {:op :numeric-bin :size 10}
 :as :amount-bin}

Supported predicate ops:

:=
:!=
:>
:>=
:<
:<=
:contains

Supported aggregate metrics:

:count
:sum
:avg
:min
:max
:count-distinct
:arg-max
:arg-min
:collect-set
:percentile
:median

Aggregate notes:

:order-by can reference group keys and metric aliases
:having can reference group keys and metric aliases
aggregate responses include :limit and :has-more when truncation is possible
metric-local :where enables bounded conditional metrics such as count-if and sum-if
:arg-max / :arg-min return the metric :field value from the row with the highest/lowest :order-field, with deterministic row-id tie-breaking
:collect-set returns bounded distinct values in deterministic order
:percentile uses continuous interpolation over the numeric values in the metric field with :p in the range 0.0..1.0
:median is the 0.5 percentile over the numeric metric field
bucketed :group-by supports {:bucket {:op :date-trunc :unit :day|:week|:month}} and {:bucket {:op :numeric-bin :size <positive-number>}}
numeric-bin group keys return the inclusive lower bound of the bucket as a number
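To make the interpolation and binning notes concrete, here is a small plain-Clojure sketch of the arithmetic. It assumes the runtime uses the common linear-interpolation definition (rank = p * (n - 1) over the sorted values) and that bins start at multiples of :size. Neither detail is confirmed by this page, and `percentile-cont`, `median`, and `numeric-bin-lower` are hypothetical helper names, not part of the flow API.

```clojure
(defn percentile-cont
  "Continuous (linearly interpolated) percentile of numeric `values`
  for `p` in the range 0.0..1.0. Returns nil for an empty collection."
  [values p]
  (let [sorted (vec (sort values))
        n      (count sorted)]
    (when (pos? n)
      (let [rank (* p (dec n))              ; fractional index into `sorted`
            lo   (long (Math/floor rank))
            hi   (long (Math/ceil rank))
            frac (- rank lo)]
        (+ (nth sorted lo)
           (* frac (- (nth sorted hi) (nth sorted lo))))))))

(defn median
  "The 0.5 percentile, matching the :median note above."
  [values]
  (percentile-cont values 0.5))

(defn numeric-bin-lower
  "Inclusive lower bound of the bin of width `size` that contains `x`,
  assuming bins are aligned to multiples of `size`."
  [x size]
  (* size (Math/floor (/ (double x) size))))

(median [10 20 30 40])      ;; => 25.0
(numeric-bin-lower 37 10)   ;; => 30.0
```

Under these assumptions, an amount of 37 with `:size 10` lands in the group whose key is 30, and the median of four values falls halfway between the two middle ones.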

Limits And Behavior

Table resources (families) per workspace max: 500
Live rows per concrete table max: 50_000
Columns per table max: 200
Promoted/index fields per table max: 16
Partitions per family max: 128
Partitions touched per write max: 16
Selected partitions per query/aggregate/export max: 12
Selected partitions per read/schema max: 24
Partition key bytes max: 256
Cell max: 64 KB
Row payload max: 256 KB
Table max: 256 MB
Workspace table DB max: 2 GB
Rows per write max: 1000
Query page size max: 1000
Query scan window max: 10000 rows via page.offset + page.limit
Aggregate group max: 200
collect-set default item max per metric: 10
collect-set absolute item max per metric: 25
percentile requires numeric :p between 0.0 and 1.0
Single-table only
No arbitrary joins
:materialize-join is the only bounded join-like exception, and it always materializes into a destination table
No cross-workspace reads
No arbitrary SQL

Dedicated :materialize-join limits:

Inline :left {:rows ...} max: 1000 rows
Table-source window max per side: 10000 rows via :limit and :offset
Output row max: 10000
Join key max: 4
Projected right-field max: 64

The :table step is intentionally bounded. It is meant to feel like a table resource primitive, not a general database query engine.

Partitioned table families are first-class. Use :partitioning on :persist {:type :table ...} when data naturally splits by region, tenant, source, or date bucket and most reads/writes stay in one small partition set.

Design guidance when a dataset approaches bounded-table limits:

treat the 50_000 live-row limit per concrete table or partition as a hard boundary, not a soft target
keep the family root as the schema/metadata owner and select partition scope explicitly for query-like operations instead of expecting implicit all-partitions scans
use separate explicit tables when the data truly represents different datasets or lifecycles, not just as a workaround for missing partition support
if the workload mainly needs wide cross-partition scans, arbitrary joins, or general database behavior, prefer a dedicated :db step and an external database/query backend

The most common failures here are:

write rejected because the table would exceed 50_000 rows
write rejected because the table family would exceed 128 partitions
write rejected because a new observed column would exceed 200 columns
cell rejected because it exceeds 64 KB
query rejected because limit > 1000
query rejected because page.offset + page.limit > 10000
query/export rejected because the selected partition subset exceeds the per-operation limit (12 partitions for query/aggregate/export)
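The two query-window failures are plain arithmetic on :page, so a caller can pre-check a page request before issuing the step. A minimal sketch using the limits stated above; `page-within-bounds?` and the two constants are hypothetical names, not runtime functions.

```clojure
(def max-page-limit 1000)     ; query page size max
(def max-scan-window 10000)   ; page.offset + page.limit max

(defn page-within-bounds?
  "True when an offset-mode page request stays inside both query limits."
  [{:keys [limit offset] :or {offset 0}}]
  (and (<= limit max-page-limit)
       (<= (+ offset limit) max-scan-window)))

(page-within-bounds? {:limit 1000 :offset 9000}) ;; => true  (9000 + 1000 = 10000)
(page-within-bounds? {:limit 1000 :offset 9001}) ;; => false (scan window exceeded)
(page-within-bounds? {:limit 1001 :offset 0})    ;; => false (page too large)
```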

Create-On-Write Companion Pattern

Create the table by persisting rows on first write:

(flow/step :http :fetch-orders
  {:url "https://example.com/orders"
   :persist {:type :table
             :table "orders"
             :rows-path [:body :items]
             :write-mode :upsert
             :key-fields [:order-id]
             :indexes [{:field :status}
                       {:field :customer-id}]
             :columns [{:column :customer-id
                        :semantic-type :reference
                        :reference {:table "customers"
                                    :remote-field :customer-id}}
                       {:column :customer-name
                        :semantic-type :text
                        :computed {:type :lookup
                                   :reference-column :customer-id
                                   :field :name}}]}})

That step returns a table resource ref:

{:type :resource-ref
 :uri "res://v1/ws/ws-123/result/table/tbl_..."
 :preview {:table-name "orders"
           :write-mode :upsert
           :rows-written 100}
 :write {:mode :upsert
         :rows-written 100}}

Creation notes:

there is no separate "create table resource" step or registration call for workers
the runtime creates or updates the table family as part of the persisted write
downstream steps should keep passing the returned resource ref, not the original raw rows, when they mean "the persisted table"

Return A Table As Final Output

When the user should see a real table artifact in the run output, return the persisted table step result wrapped in a :breyta.viewer/kind :table viewer envelope. This is different from returning a map that merely contains :rows or :columns.

'(let [run-id (str "run-" (flow/now-ms))
       comparison-table
       (flow/step :function :build-comparison-table
                  {:input {:rows comparison-rows
                           :run-id run-id}
                   :code '(fn [{:keys [rows run-id]}]
                            {:rows
                             (mapv (fn [row]
                                     {:run_id run-id
                                      :paragraph (:paragraph row)
                                      :original (:original row)
                                      :cleaned (:cleaned row)
                                      :changed (:changed row)})
                                   rows)})
                   :persist {:type :table
                             :table (str "transcript-comparison-" run-id)
                             :rows-path [:rows]
                             :write-mode :upsert
                             :key-fields [:run_id :paragraph]
                             :columns [{:column :paragraph
                                        :display-name "Paragraph"}
                                       {:column :original
                                        :display-name "Original"}
                                       {:column :cleaned
                                        :display-name "Cleaned"}
                                       {:column :changed
                                        :display-name "Changed"}]}})]
   {:breyta.viewer/kind :table
    :breyta.viewer/options {:title "Original vs cleaned"}
    :breyta.viewer/value comparison-table})

The table viewer value should be the resource ref returned by the persisted step:

{:type :resource-ref
 :uri "res://v1/ws/ws-123/result/table/tbl_..."
 :content-type "application/vnd.breyta.table+json"
 :preview {:rows-written 44}}

If the output page or run sidepeek says that no tables are available, verify the final output is not just inline preview data. Use breyta resources read <table-uri> to confirm that the table resource has rows.

If the table is part of a larger written report, use the Markdown output
pattern instead: return a :markdown viewer envelope and embed the persisted
table with a fenced breyta-resource block. Markdown table embeds can select
columns, filter/sort rows, render aggregate charts, and add a separate
:view :download fence for CSV source export. See
Output Artifacts.

Query the persisted table later with a :table step:

(flow/step :table :open-orders
  {:op :query
   :table {:ref orders-ref}
   :select [:order-id :status :amount]
   :where [[:status := "open"]]
   :sort [[:order-id :asc]]
   :page {:mode :offset
          :limit 25
          :offset 0}})

Cursor-paged forward scan:

(flow/step :table :scan-orders
  {:op :query
   :table {:ref orders-ref}
   :select [:order-id :status]
   :sort [[:order-id :asc]]
   :page {:mode :cursor
          :limit 250}})

Query rules:

:page is required for :op :query
:table {:ref <resource-ref>} is canonical; bare refs work for simple ops.
:page.mode must be :offset or :cursor
:page.mode :offset accepts :offset and does not accept :cursor
:page.mode :cursor accepts :cursor, requires explicit :sort, and does not accept :offset
the first cursor page omits :page.cursor
in cursor mode, :page.total-count is optional and may be absent from the response
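The cursor rules can be sketched as a two-page scan. This assumes the step result has the cursor-page shape shown under Result Shapes (`:page :next-cursor` and `:page :has-more`) and that a flow body may issue the second step conditionally; the step ids are illustrative.

```clojure
(let [page-1 (flow/step :table :scan-orders-page-1
               {:op :query
                :table {:ref orders-ref}
                :select [:order-id :status]
                :sort [[:order-id :asc]]
                :page {:mode :cursor
                       :limit 250}})          ; first page: no :cursor
      next-cursor (get-in page-1 [:page :next-cursor])]
  (when (get-in page-1 [:page :has-more])
    (flow/step :table :scan-orders-page-2
      {:op :query
       :table {:ref orders-ref}
       :select [:order-id :status]
       :sort [[:order-id :asc]]               ; same explicit sort as page 1
       :page {:mode :cursor
              :limit 250
              :cursor next-cursor}})))
```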

Canonical Examples

Get one row:

(flow/step :table :load-order
  {:op :get-row
   :table {:ref orders-ref}
   :key {:order-id "ord-1"}})

Aggregate:

(flow/step :table :sales-by-currency
  {:op :aggregate
   :table {:ref orders-ref}
   :group-by [:currency]
   :metrics [{:op :count
              :where [[:status := "open"]]
              :as :open-count}
             {:op :sum :field :amount :as :total-amount}
             {:op :arg-max
              :field :order-id
              :order-field :amount
              :as :largest-order-id}]
   :order-by [[:total-amount :desc]]
   :limit 20})

(flow/step :table :sales-by-month
  {:op :aggregate
   :table {:ref orders-ref}
   :group-by [{:field :created-at
               :bucket {:op :date-trunc
                        :unit :month}
               :as :created-month}]
   :metrics [{:op :count :as :count}
             {:op :collect-set
              :field :currency
              :limit 5
              :as :currencies}]
   :having [[:count :>= 2]]
   :order-by [[:created-month :asc]]
   :limit 12})

(flow/step :table :amount-distribution
  {:op :aggregate
   :table {:ref orders-ref}
   :group-by [{:field :amount
               :bucket {:op :numeric-bin
                        :size 10}
               :as :amount-bin}]
   :metrics [{:op :count :as :count}
             {:op :percentile
              :field :amount
              :p 0.95
              :as :p95-amount}
             {:op :median
              :field :amount
              :as :median-amount}]
   :order-by [[:amount-bin :asc]]
   :limit 12})

Export:

(flow/step :table :export-orders
  {:op :export
   :table {:ref orders-ref}
   :format :csv
   :select [:order-id :status :amount]})

By default, in-flow :export returns the CSV text inline. Add top-level
:persist {:type :blob ...} when the export should become a downloadable
resource ref:

(flow/step :table :export-orders-csv
  {:op :export
   :table {:ref orders-ref}
   :format :csv
   :select [:order-id :status :amount]
   :persist {:type :blob
             :filename "orders.csv"
             :content-type "text/csv"}})

Materialize a joined destination table:

(flow/step :table :materialize-orders-with-customers
  {:op :materialize-join
   :left {:rows [{:order-id "ord-1" :customer-id "cust-1" :status "open"}]}
   :right {:table "customers"
           :select [:customer-id :name :domain]}
   :join-type :left
   :on [{:left-field :customer-id
         :right-field :customer-id}]
   :project {:keep-left :all
             :right-fields [{:field :name :as :customer-name}
                            {:field :domain :as :customer-domain}]}
   :into {:table "orders-enriched"
          :write-mode :upsert
          :key-fields [:order-id]
          :index-fields [:customer-name :status]}})

materialize-join is incremental materialization in v1:

:into :write-mode is :append or :upsert
there is no snapshot or :replace mode yet
existing destination rows are not deleted automatically when the source set shrinks

The same applies to normal table persists with :write-mode :upsert: matching
keys are updated, new keys are inserted, and omitted rows stay in the table. If a
flow represents a current snapshot, include a run/batch key in the table keys or
partitioning and query the latest batch. Do not expect a smaller rerun to delete
rows from an earlier larger extraction.
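A sketch of the batch-key pattern described above, reusing the shapes from the earlier examples. `order-rows`, the table name "orders-snapshots", the `:batch-id` field, and the step ids are all assumptions made for illustration.

```clojure
(let [batch-id (str "batch-" (flow/now-ms))   ; illustrative batch key
      orders-ref
      (flow/step :function :tag-orders-with-batch
                 {:input {:rows order-rows
                          :batch-id batch-id}
                  :code '(fn [{:keys [rows batch-id]}]
                           ;; stamp every row with the batch it belongs to
                           {:rows (mapv (fn [row]
                                          (assoc row :batch-id batch-id))
                                        rows)})
                  :persist {:type :table
                            :table "orders-snapshots"
                            :rows-path [:rows]
                            :write-mode :upsert
                            :key-fields [:batch-id :order-id]
                            :indexes [{:field :batch-id}]}})]
  ;; Read only the rows written by this run's batch.
  (flow/step :table :current-orders
    {:op :query
     :table {:ref orders-ref}
     :where [[:batch-id := batch-id]]
     :sort [[:order-id :asc]]
     :page {:mode :offset
            :limit 100
            :offset 0}}))
```

Rows from earlier batches stay in the table, so they still count toward the row limits listed under Limits And Behavior.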

materialize-join also uses the current materialized row state of source tables:

computed/reference column values already materialized into the source table are joinable/projectable
the join does not re-evaluate computed expressions dynamically
run :recompute first if source-table derived values need to be refreshed before the join
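A :recompute step issued before the join might look like the following sketch. The optional :limit and :offset values mirror the CLI example on this page; the step id is illustrative.

```clojure
;; Refresh materialized computed/reference values in the source table
;; before a :materialize-join reads them.
(flow/step :table :refresh-order-lookups
  {:op :recompute
   :table {:ref orders-ref}
   :limit 1000
   :offset 0})
```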

Update one value:

(flow/step :table :close-order
  {:op :update-cell
   :table {:ref orders-ref}
   :key {:order-id "ord-1"}
   :column :status
   :value "closed"})

For partitioned table families, :update-cell cannot modify the partition-driving field. Single-cell updates stay within the explicitly selected partition; moving a row to a different partition requires a normal write/upsert into the target table.

Update one formatting override:

(flow/step :table :format-amount
  {:op :update-cell-format
   :table {:ref orders-ref}
   :key {:order-id "ord-1"}
   :column :amount
   :format {:display "currency"
            :currency "USD"}})

Display formatting is render-only:

column :format metadata and sparse :update-cell-format overrides can render relative-time, date, timestamp / date-time, and currency
the web table preview and Copy Markdown apply those formats to the currently visible page
:query, :get-row, and CSV export keep canonical raw values

Resource refs are also first-class cell values:

store canonical {:type :resource-ref :uri ...} maps in row data when a cell should point at another resource
the web table preview renders those cells as clickable resource chips and opens the target resource in the same panel or sidepeek
Copy Markdown uses the rendered label for the currently visible page
:query, :get-row, and CSV export keep the canonical raw resource-ref value
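As a sketch, a row can carry a resource ref in an ordinary cell. Here `orders` and `invoice-refs` (a map from order id to a resource ref returned by some earlier persisted step) are hypothetical inputs, and `:invoice` is an illustrative column name.

```clojure
(flow/step :function :build-order-rows
  {:input {:orders orders
           :invoice-refs invoice-refs}   ; hypothetical: order-id -> resource ref
   :code '(fn [{:keys [orders invoice-refs]}]
            {:rows (mapv (fn [order]
                           ;; the cell value is the canonical resource-ref map itself
                           (assoc order :invoice
                                  (get invoice-refs (:order-id order))))
                         orders)})
   :persist {:type :table
             :table "orders"
             :rows-path [:rows]
             :write-mode :upsert
             :key-fields [:order-id]}})
```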

Author one logical column:

(flow/step :table :define-order-summary
  {:op :set-column
   :table {:ref orders-ref}
   :column :order-summary
   :definition {:semantic-type :text
                :computed {:type :expr
                           :expr {:op :concat
                                  :args [{:field :customer-name}
                                         " / "
                                         {:field :status}]}}}})

set-column automatically recomputes existing rows for bounded tables. Use :recompute later only when you want to rerun derived/reference values after some other change.

Dynamic enum columns:

(flow/step :table :define-status-enum
  {:op :set-column
   :table {:ref orders-ref}
   :column :status
   :definition {:display-name "Status"
                :enum {:options [{:id "open"
                                  :name "Open"
                                  :aliases ["OPEN" "Open"]}
                                 {:id "in-progress"
                                  :name "In progress"
                                  :aliases ["IN_PROGRESS" "In Progress"]}]}}})

Enum behavior:

:enum implies type-hint "enum"
writes, :update-cell, CSV import, and :recompute normalize incoming scalar values to stable ids
matching accepts existing ids, names, and aliases
unknown values dynamically grow the enum definition with a normalized id and a derived display name
stored row values, :query, :get-row, and CSV export keep the normalized ids
the web table preview and Copy Markdown render enum names instead of raw ids
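The normalization rule can be illustrated with the enum defined above: writing an alias and reading the row back. This is a sketch based on the stated rules rather than captured output, and the step ids are illustrative.

```clojure
;; "IN_PROGRESS" is a declared alias of the "in-progress" option.
(flow/step :table :mark-in-progress
  {:op :update-cell
   :table {:ref orders-ref}
   :key {:order-id "ord-1"}
   :column :status
   :value "IN_PROGRESS"})

;; Reading the row back returns the stored value.
(flow/step :table :load-order-status
  {:op :get-row
   :table {:ref orders-ref}
   :key {:order-id "ord-1"}})
;; Per the rules above, :status should hold the stable id "in-progress",
;; while the web table preview would render the name "In progress".
```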

Result Shapes

Representative responses:

;; :query
{:table-name "orders"
 :rows [{:order-id "ord-1" :status "open"}]
 :count 1
 :page {:mode :offset
        :limit 25
        :offset 0
        :total-count 107
        :has-more true
        :next-offset 25
        :prev-offset nil}}

;; :query cursor page
{:table-name "orders"
 :rows [{:order-id "ord-1" :status "open"}]
 :count 1
 :page {:mode :cursor
        :limit 250
        :page-size 1
        :has-more true
        :next-cursor "..."}}

;; :aggregate
{:results [{:currency "USD" :count 2 :total-amount 150.0}]
 :count 1}

;; :schema
{:table-name "orders"
 :key-fields ["order-id"]
 :index-fields ["status" "customer-id"]
 :columns [...]}

;; :update-cell
{:row {:order-id "ord-1" :status "closed"}}

;; :update-cell-format
{:format {:display "currency" :currency "USD"}}

;; :set-column
{:column {:column-name "order-summary"
          :semantic-type "text"
          :computed {:type "expr"}}
 :rows-updated 100
 :auto-recomputed true}

;; :recompute
{:rows-scanned 100
 :rows-updated 100
 :limit 1000
 :offset 0}

;; :materialize-join
{:table-name "orders-enriched"
 :rows-written 100
 :join {:join-type :left
        :matched-rows 90
        :unmatched-left-rows 10}
 :preflight {:left-row-count 100
             :right-row-count 50
             :right-duplicate-key-count 0
             :destination-duplicate-key-count 0}}

CLI Pairing

The CLI mirrors the shipped table operations:

breyta resources read <res://table-uri> --limit 25 --offset 0 --partition-key month-2026-03
breyta resources table query <res://table-uri> --limit 25 --offset 0 --partition-keys month-2026-03,month-2026-04
breyta resources table query <res://table-uri> --page-mode cursor --limit 250 --sort-json '[["order-id","asc"]]' --partition-key month-2026-03
breyta resources table get-row <res://table-uri> --key order-id=ord-1 --partition-key month-2026-03
breyta resources table aggregate <res://table-uri> --group-by currency --metrics-json '[{"op":"count","where":[["status","=","open"]],"as":"open-count"},{"op":"sum","field":"amount","as":"total-amount"},{"op":"arg-max","field":"order-id","order-field":"amount","as":"largest-order-id"}]' --order-by-json '[["total-amount","desc"]]' --partition-key month-2026-03
breyta resources table aggregate <res://table-uri> --group-by-json '[{"field":"created-at","bucket":{"op":"date-trunc","unit":"month"},"as":"created-month"}]' --metrics-json '[{"op":"count","as":"count"},{"op":"collect-set","field":"currency","limit":5,"as":"currencies"}]' --having-json '[["count",">=",2]]' --order-by-json '[["created-month","asc"]]'
breyta resources table aggregate <res://table-uri> --group-by-json '[{"field":"amount","bucket":{"op":"numeric-bin","size":10},"as":"amount-bin"}]' --metrics-json '[{"op":"count","as":"count"},{"op":"percentile","field":"amount","p":0.95,"as":"p95-amount"},{"op":"median","field":"amount","as":"median-amount"}]' --order-by-json '[["amount-bin","asc"]]'
breyta resources table schema <res://table-uri> --partition-key month-2026-03
breyta resources table export <res://table-uri> --out orders.csv --partition-key month-2026-03
breyta resources table import <res://table-uri> --file orders.csv --write-mode append --partition-key month-2026-03
breyta resources table import orders-import --file orders.csv --write-mode upsert --key-fields order-id --index-fields status
breyta resources table update-cell <res://table-uri> --key order-id=ord-1 --column status --value closed --partition-key month-2026-03
breyta resources table update-cell-format <res://table-uri> --key order-id=ord-1 --column amount --format-json '{"display":"currency","currency":"USD"}' --partition-key month-2026-03
breyta resources table set-column <res://table-uri> --column order-summary --semantic-type text --computed-json '{"type":"expr","expr":{"op":"concat","args":[{"field":"customer-name"}," / ",{"field":"status"}]}}' --partition-keys month-2026-03,month-2026-04
breyta resources table set-column <res://table-uri> --column status --enum-json '{"options":[{"id":"open","name":"Open","aliases":["OPEN","Open"]},{"id":"in-progress","name":"In progress","aliases":["IN_PROGRESS","In Progress"]}]}' --partition-keys month-2026-03,month-2026-04
breyta resources table recompute <res://table-uri> --limit 1000 --offset 0 --partition-key month-2026-03
breyta resources table materialize-join --left-json '{"table":{"ref":"res://...orders"}}' --right-json '{"table":{"ref":"res://...customers"}}' --on-json '[{"left-field":"customer-id","right-field":"customer-id"}]' --project-json '[{"field":"name","as":"customer-name"}]' --into-json '{"table":"joined-orders","write-mode":"upsert","key-fields":["order-id"]}'
