#What is streaming export?
Hi @naive bluff. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Streaming Export in Convex
What is it?
Streaming export allows you to continuously sync your Convex data to external databases or analytics platforms. It's useful for workloads Convex isn't optimized for, such as: [Streaming Export]
- Analytics (e.g., Databricks, Snowflake)
- Flexible/complex querying (e.g., ElasticSearch)
- Machine learning training
⚠️ Streaming export requires a Convex Pro plan.
How does it work under the hood?
The streaming export API works in two phases: [Streaming Export API]
- Full snapshot (GET /api/list_snapshot): First, it walks a consistent point-in-time snapshot of your documents, paginating through all records.
- Delta sync (GET /api/document_deltas): After the snapshot, it continuously fetches changes (inserts, updates, deletions) ordered by a _ts (timestamp) field.
Deletions are represented as JSON objects with _id, _ts, and _deleted: true. The delta endpoint is cursor-based — you pass a cursor from the previous call to get the next batch of changes. [Document Deltas]
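To make the delta format concrete, here is a minimal sketch of how a consumer might apply one batch of deltas to a local copy of the data. The in-memory dict keyed by _id and the apply_deltas helper are illustrative assumptions, not part of the Convex API. Only the _id, _ts, and _deleted fields come from the description above, and the sketch assumes each non-deleted delta carries the full new version of the document.

```python
from typing import Any

Document = dict[str, Any]


def apply_deltas(store: dict[str, Document], deltas: list[Document]) -> dict[str, Document]:
    """Apply one batch of deltas to a local copy keyed by _id (hypothetical helper)."""
    # Process in _ts order so a later change to the same document wins.
    for doc in sorted(deltas, key=lambda d: d["_ts"]):
        if doc.get("_deleted"):
            # Deletion marker: drop the document if we hold a copy of it.
            store.pop(doc["_id"], None)
        else:
            # Insert or update (assumed to be the document's full new version).
            store[doc["_id"]] = doc
    return store


# Example data, made up for illustration.
store = {"a": {"_id": "a", "_ts": 1, "name": "old"}}
batch = [
    {"_id": "a", "_ts": 2, "name": "new"},
    {"_id": "b", "_ts": 3, "name": "added"},
    {"_id": "b", "_ts": 4, "_deleted": True},
]
apply_deltas(store, batch)
```

Treating a delta as "upsert or delete by _id" keeps the consumer idempotent: replaying the same batch leaves the store unchanged.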
So to answer your specific questions:
- It does not stream one change at a time — it uses paginated batches via cursor.
- The delay/latency is not explicitly specified in the documentation.
Can you build it manually (without Airbyte/Fivetran)?
Yes! The underlying HTTP APIs are public and documented, so you can call them directly. Here's the pseudocode pattern: [Document Deltas]
# Step 1: Full snapshot
def list_full_snapshot():
    snapshot_values = []
    snapshot = None  # None on the first call; later calls reuse the value from the response
    cursor = None
    while True:
        result = api.list_snapshot(cursor, snapshot)
        snapshot_values.extend(result.values)
        (cursor, snapshot) = (result.cursor, result.snapshot)
        if not result.hasMore:
            break
    # The snapshot value is the starting cursor for delta sync
    return (snapshot_values, result.snapshot)

# Step 2: Ongoing delta sync
def delta_sync(delta_cursor):
    delta_values = []
    while True:
        result = api.document_deltas(delta_cursor)
        delta_values.extend(result.values)
        # Advance the cursor so the next page (and the next run) resumes here
        delta_cursor = result.cursor
        if not result.hasMore:
            break
    return (delta_values, delta_cursor)

(snapshot_values, delta_cursor) = list_full_snapshot()
(delta_values, delta_cursor) = delta_sync(delta_cursor)
# Save delta_cursor for the next sync
You authenticate these requests using your deployment's Deploy key via the header Authorization: Convex <access_key>. [Streaming Export API]
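As a rough illustration of the authentication step, the sketch below builds an authenticated GET request with Python's standard library. The helper names (build_request, fetch_json), the example deployment URL, the deploy key, and the cursor value are all placeholders, and passing the cursor as a query parameter named cursor is an assumption to check against the Streaming Export API reference. Only the endpoint path and the Authorization header format come from the answer above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_request(deployment_url: str, path: str, deploy_key: str,
                  params: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build an authenticated GET request for a streaming export endpoint."""
    # Drop unset parameters (e.g. no cursor on the very first call).
    query = urllib.parse.urlencode(
        {k: v for k, v in (params or {}).items() if v is not None}
    )
    url = deployment_url.rstrip("/") + path + ("?" + query if query else "")
    return urllib.request.Request(
        url, headers={"Authorization": f"Convex {deploy_key}"}, method="GET"
    )


def fetch_json(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON body (performs a real network call)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder values; substitute your own deployment URL and deploy key.
# The "cursor" parameter name and its value are assumptions for illustration.
req = build_request("https://example.convex.cloud", "/api/document_deltas",
                    "my-deploy-key", {"cursor": 1700000000000000000})
```

Keeping request construction separate from the network call makes the URL and header logic easy to check without touching a real deployment.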
So if your destination isn't supported by Airbyte or Fivetran, you can write a custom integration by polling these HTTP endpoints and pushing data to your destination yourself.
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
@naive bluff Kapa.ai is still learning and improving, please let me know how I did by reacting below.
So this means Convex isn't the one doing the syncing (e.g. via workpool, actions, etc.), and the destination pulls the data instead? @mystic mauve
Let me check the knowledge base for you again. One moment! :books:
Yes, that's correct. The streaming export API is a pull-based model — the destination (or your custom integration) is responsible for calling Convex's HTTP endpoints (/api/list_snapshot and /api/document_deltas) to fetch data. Convex exposes these endpoints, and the consumer polls them using cursors to get new changes. [Streaming Export API]
This is exactly how Fivetran and Airbyte work with Convex — they act as the puller, calling these APIs on a schedule and syncing the results to the destination. [Streaming Export]
So if you're building a custom integration for an unsupported destination, you would be responsible for:
- Periodically calling the Convex HTTP endpoints
- Persisting the cursor between runs
- Writing the fetched data to your destination
Convex itself does not push data anywhere in this setup.
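The three responsibilities listed above could be sketched roughly as follows. This is one possible shape, not a reference implementation: sync_once, run_forever, the JSON cursor file, and the fetch_deltas / write_batch callables are all hypothetical names. fetch_deltas stands in for a call to /api/document_deltas that returns a page with values, cursor, and hasMore, and initial_cursor is assumed to come from the snapshot phase.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Optional

Page = dict[str, Any]  # expected keys: "values", "cursor", "hasMore"


def load_cursor(path: Path) -> Optional[Any]:
    """Return the cursor saved by the previous run, or None if there is none."""
    return json.loads(path.read_text())["cursor"] if path.exists() else None


def save_cursor(path: Path, cursor: Any) -> None:
    """Persist the cursor so the next run resumes where this one stopped."""
    path.write_text(json.dumps({"cursor": cursor}))


def sync_once(fetch_deltas: Callable[[Any], Page],
              write_batch: Callable[[list], None],
              cursor_path: Path, initial_cursor: Any) -> int:
    """Drain all pending deltas once; returns the number of changes written."""
    cursor = load_cursor(cursor_path)
    if cursor is None:
        cursor = initial_cursor
    written = 0
    while True:
        page = fetch_deltas(cursor)
        write_batch(page["values"])
        # Persist only after the write succeeded, so a crash replays the page
        # instead of skipping it (the destination must tolerate duplicates).
        cursor = page["cursor"]
        save_cursor(cursor_path, cursor)
        written += len(page["values"])
        if not page["hasMore"]:
            return written


def run_forever(sync: Callable[[], int], interval_seconds: float = 60.0) -> None:
    """Poll on a fixed interval; the interval bounds how stale the destination gets."""
    while True:
        sync()
        time.sleep(interval_seconds)
```

Saving the cursor after each successful write gives at-least-once delivery, which pairs well with a destination that upserts by _id.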
And is this direction better, or a push model from Convex? I mean pushing changes through Convex with workpool, etc.? @mystic mauve
Let me check the knowledge base for you again. One moment! :books:
Great question. The knowledge sources actually discuss this tradeoff directly in the context of push-based vs. pull-based (work stealing) architectures. Here's how it breaks down:
Pull-based (Destination pulls from Convex) — Benefits
- No load balancing or service discovery needed — the consumer simply calls the API when ready. [Work Stealing]
- Workers control their own pace — they decide when and how much data to fetch, which helps with backpressure and avoiding overload.
- Multiplexed subscriptions — multiple consumers can independently pull the same data stream.
Pull-based — Challenges
- Polling overhead — you need to periodically call the endpoints even if nothing has changed.
- Failures are harder to detect — unlike an open HTTP connection, there's no automatic signal if the consumer dies. [Work Stealing]
- Latency — there's an inherent delay between a change happening and the consumer polling for it.
Push-based (Convex pushes changes via actions, scheduled functions, etc.) — Benefits
- Lower latency: you can react to changes as soon as they are written (e.g., a mutation that schedules an action when it writes the change).
- Standard request/response model: easier to reason about errors and attribution. [Work Stealing]
Push-based — Challenges
- Load balancing complexity — you need to manage where and how fast you push data.
- Connection lifecycle — if the destination is down, you need retry logic and error handling yourself.
Which is better?
It depends on your use case. The knowledge sources suggest: [Making the Call]
- Pull-based is better for analytics/data warehouse syncing (your streaming export use case), where throughput and eventual consistency matter more than low latency, and where the destination controls its own ingestion rate.
- Push-based is better for user-facing, low-latency workloads where you need to react immediately to changes.