What is streaming export? | Convex Community

naive bluff Mar 4, 2026, 1:19 PM


How does it work under the hood? Does it stream every change one by one, or does it wait for batches? What's the delay? Is it possible to build it manually? For example, Airbyte doesn't support one of my destinations, so what are my options?

mystic mauve (BOT) Mar 4, 2026, 1:19 PM


Hi @naive bluff. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:


Streaming Export in Convex

What is it?

Streaming export allows you to continuously sync your Convex data to external databases or analytics platforms. It's useful for workloads Convex isn't optimized for, such as: [Streaming Export]

Analytics (e.g., Databricks, Snowflake)
Flexible/complex querying (e.g., ElasticSearch)
Machine learning training

⚠️ Streaming export requires a Convex Pro plan.

How does it work under the hood?

The streaming export API works in two phases: [Streaming Export API]

Full snapshot (GET /api/list_snapshot): First, it walks a consistent point-in-time snapshot of your documents, paginating through all records.
Delta sync (GET /api/document_deltas): After the snapshot, it continuously fetches changes (inserts, updates, deletions) ordered by a _ts (timestamp) field.

Deletions are represented as JSON objects with _id, _ts, and _deleted: true. The delta endpoint is cursor-based — you pass a cursor from the previous call to get the next batch of changes. [Document Deltas]
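To make the deletion format concrete, here is a minimal sketch of how a consumer might fold a batch of delta records into a local copy keyed by _id, applying them in the order the endpoint returns them. It assumes each non-deleted record carries the full current document (as snapshot records do); apply_deltas and the dict-based store are hypothetical names, not part of any Convex SDK.

```python
from typing import Any, Dict, Iterable


def apply_deltas(store: Dict[str, Dict[str, Any]], deltas: Iterable[Dict[str, Any]]) -> None:
    """Apply delta records, in the order received, to a local _id -> document map.

    A record with _deleted set to true is a tombstone; any other record is
    treated as the full current version of the document (insert or update).
    """
    for doc in deltas:
        if doc.get("_deleted"):
            store.pop(doc["_id"], None)  # deleting an id we never saw is a no-op
        else:
            store[doc["_id"]] = doc
```

Because later records overwrite earlier ones, applying a page twice leaves the store unchanged, which makes re-delivery after a crash harmless.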

So to answer your specific questions:

It does not stream one change at a time — it uses paginated batches via cursor.
The delay/latency is not explicitly specified in the documentation.

Can you build it manually (without Airbyte/Fivetran)?

Yes! The underlying HTTP APIs are public and documented, so you can call them directly. Here's the pseudocode pattern: [Document Deltas]


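# `api` stands for a client wrapping the two HTTP endpoints
# (GET /api/list_snapshot and GET /api/document_deltas).

# Step 1: Full snapshot
def list_full_snapshot():
    snapshot_values = []
    snapshot = None
    cursor = None
    while True:
        # Page through one consistent snapshot; send back both cursor and snapshot.
        result = api.list_snapshot(cursor, snapshot)
        snapshot_values.extend(result.values)
        (cursor, snapshot) = (result.cursor, result.snapshot)
        if not result.hasMore:
            break
    # The snapshot timestamp is the starting cursor for delta sync.
    return (snapshot_values, snapshot)

# Step 2: Ongoing delta sync
def delta_sync(delta_cursor):
    delta_values = []
    while True:
        result = api.document_deltas(delta_cursor)
        delta_values.extend(result.values)
        # Advance the cursor; otherwise the same page is fetched forever.
        delta_cursor = result.cursor
        if not result.hasMore:
            break
    # Return the advanced cursor so the next run resumes where this one stopped.
    return (delta_values, delta_cursor)

(snapshot_values, delta_cursor) = list_full_snapshot()
(delta_values, delta_cursor) = delta_sync(delta_cursor)
# Save delta_cursor for the next sync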

You authenticate these requests using your deployment's Deploy key via the header Authorization: Convex <access_key>. [Streaming Export API]
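The pseudocode above leaves api abstract. Below is one way it could be filled in with only the Python standard library. This is a minimal sketch, assuming the deployment URL has the form https://<deployment>.convex.cloud and that the endpoints accept cursor, snapshot (list_snapshot only) and format=json as query parameters; StreamingExportClient is a hypothetical name, so check parameter names and response fields against the Convex HTTP API reference before relying on it.

```python
import json
import urllib.parse
import urllib.request
from types import SimpleNamespace
from typing import Any, Dict, Optional


class StreamingExportClient:
    """Hypothetical stdlib-only client for the streaming export endpoints.

    Assumed (verify against the Convex HTTP API reference): query parameters
    cursor, snapshot and format=json, and JSON responses with the fields
    values, cursor, snapshot (list_snapshot only) and hasMore.
    """

    def __init__(self, deployment_url: str, deploy_key: str, timeout: float = 30.0) -> None:
        self.base = deployment_url.rstrip("/")
        self.deploy_key = deploy_key
        self.timeout = timeout

    def build_request(self, path: str, params: Dict[str, Any]) -> urllib.request.Request:
        # Omit unset parameters, so the first snapshot page sends no cursor/snapshot.
        query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
        return urllib.request.Request(
            f"{self.base}{path}?{query}",
            headers={"Authorization": f"Convex {self.deploy_key}"},
        )

    def _get(self, path: str, params: Dict[str, Any]) -> SimpleNamespace:
        req = self.build_request(path, params)
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            # Attribute access (result.values, result.hasMore) matches the pseudocode.
            return SimpleNamespace(**json.load(resp))

    def list_snapshot(self, cursor: Optional[str], snapshot: Optional[int]) -> SimpleNamespace:
        return self._get("/api/list_snapshot",
                         {"cursor": cursor, "snapshot": snapshot, "format": "json"})

    def document_deltas(self, cursor: Any) -> SimpleNamespace:
        return self._get("/api/document_deltas", {"cursor": cursor, "format": "json"})
```

With this in place, api = StreamingExportClient("https://<deployment>.convex.cloud", "<deploy key>") should let list_full_snapshot and delta_sync run unchanged. Returning a SimpleNamespace is only a convenience so that attribute access such as result.hasMore lines up with the pseudocode.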

So if your destination isn't supported by Airbyte or Fivetran, you can write a custom integration by polling these HTTP endpoints and pushing data to your destination yourself.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:


@naive bluff Kapa.ai is still learning and improving, please let me know how I did by reacting below.

naive bluff Mar 4, 2026, 1:22 PM


So this means Convex doesn't do the syncing itself (e.g., via a workpool, actions, etc.); instead, the destination pulls the data? @mystic mauve

mystic mauve (BOT) Mar 4, 2026, 1:22 PM


Let me check the knowledge base for you again. One moment! :books:


Yes, that's correct. The streaming export API is a pull-based model — the destination (or your custom integration) is responsible for calling Convex's HTTP endpoints (/api/list_snapshot and /api/document_deltas) to fetch data. Convex exposes these endpoints, and the consumer polls them using cursors to get new changes. [Streaming Export API]

This is exactly how Fivetran and Airbyte work with Convex — they act as the puller, calling these APIs on a schedule and syncing the results to the destination. [Streaming Export]

So if you're building a custom integration for an unsupported destination, you would be responsible for:

Periodically calling the Convex HTTP endpoints
Persisting the cursor between runs
Writing the fetched data to your destination

Convex itself does not push data anywhere in this setup.
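Those three responsibilities fit in a small loop. This is a hedged sketch, not an official connector: fetch_page stands for any function that calls /api/document_deltas with a cursor and returns (values, next_cursor, has_more), write stands for your destination writer, and the JSON cursor file is just one hypothetical place to persist state.

```python
import json
import os
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical page fetcher: cursor -> (values, next_cursor, has_more).
FetchPage = Callable[[Any], Tuple[List[Dict[str, Any]], Any, bool]]


def load_cursor(path: str) -> Optional[Any]:
    """Return the saved delta cursor, or None if no sync has run yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["cursor"]


def save_cursor(path: str, cursor: Any) -> None:
    """Persist the cursor via write-then-rename so readers see the old or new value, never a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"cursor": cursor}, f)
    os.replace(tmp, path)


def sync_once(fetch_page: FetchPage,
              write: Callable[[List[Dict[str, Any]]], None],
              cursor_path: str) -> int:
    """Drain every pending delta page into the destination; return records written."""
    cursor = load_cursor(cursor_path)
    if cursor is None:
        raise RuntimeError("no saved cursor: take a full snapshot and save its timestamp first")
    written = 0
    while True:
        values, cursor, has_more = fetch_page(cursor)
        write(values)                     # 1. write the page to the destination
        save_cursor(cursor_path, cursor)  # 2. only then commit the new cursor
        written += len(values)
        if not has_more:
            return written
```

Writing a page before saving its cursor gives at-least-once delivery: if the process dies in between, the page is fetched and written again on restart, so destination writes should be idempotent (for example, upserts keyed on _id). Running sync_once on a timer or cron covers the "periodically calling" part.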


naive bluff Mar 4, 2026, 1:23 PM


And is this direction better, or is it better for Convex to push? I mean pushing changes out from Convex with a workpool, etc.? @mystic mauve

mystic mauve (BOT) Mar 4, 2026, 1:23 PM


Great question. The knowledge sources actually discuss this tradeoff directly in the context of push-based vs. pull-based (work stealing) architectures. Here's how it breaks down:

Pull-based (Destination pulls from Convex) — Benefits

No load balancing or service discovery needed — the consumer simply calls the API when ready. [Work Stealing]
Workers control their own pace — they decide when and how much data to fetch, which helps with backpressure and avoiding overload.
Multiplexed subscriptions — multiple consumers can independently pull the same data stream.
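A toy model can make the last two points concrete. ChangeLog below is an invented in-memory stand-in, not the Convex API: the producer keeps no per-consumer state, each consumer holds its own cursor, and each chooses its own batch size, which is what lets a slow consumer apply backpressure without affecting a fast one.

```python
from typing import Any, Dict, List, Tuple


class ChangeLog:
    """Toy in-memory change feed (illustration only, not the Convex API)."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, doc: Dict[str, Any]) -> None:
        self.entries.append(doc)

    def pull(self, cursor: int, limit: int) -> Tuple[List[Dict[str, Any]], int, bool]:
        """Return up to `limit` entries after `cursor`, the new cursor, and has_more."""
        batch = self.entries[cursor:cursor + limit]
        new_cursor = cursor + len(batch)
        return batch, new_cursor, new_cursor < len(self.entries)


log = ChangeLog()
for i in range(5):
    log.append({"_id": str(i)})

# Two independent consumers: the log stores neither cursor.
fast_cursor = slow_cursor = 0
fast_batch, fast_cursor, fast_more = log.pull(fast_cursor, limit=10)  # takes everything
slow_batch, slow_cursor, slow_more = log.pull(slow_cursor, limit=2)   # takes a small bite
```

The real streaming export cursor is opaque rather than a list index, but the division of responsibility is the same: the consumer owns its position in the stream.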

Pull-based — Challenges

Polling overhead — you need to periodically call the endpoints even if nothing has changed.
Failures are harder to detect — unlike an open HTTP connection, there's no automatic signal if the consumer dies. [Work Stealing]
Latency — there's an inherent delay between a change happening and the consumer polling for it.
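One common way to soften the polling-overhead versus latency tradeoff is adaptive polling: poll quickly while changes keep arriving and back off exponentially when a poll comes back empty. This is a generic technique, not something the Convex docs prescribe; the 1 s and 60 s bounds are arbitrary assumptions, and sync is a hypothetical callable that drains pending deltas and returns how many records it fetched.

```python
import time
from typing import Callable, Optional


def next_poll_interval(current: float, got_changes: bool,
                       min_s: float = 1.0, max_s: float = 60.0) -> float:
    """Reset to min_s after a non-empty poll; otherwise double, capped at max_s."""
    if got_changes:
        return min_s
    return min(max_s, current * 2)


def run_poller(sync: Callable[[], int],
               sleep: Callable[[float], None] = time.sleep,
               iterations: Optional[int] = None,
               min_s: float = 1.0, max_s: float = 60.0) -> None:
    """Call sync() repeatedly (forever if iterations is None), sleeping adaptively."""
    interval = min_s
    done = 0
    while iterations is None or done < iterations:
        interval = next_poll_interval(interval, sync() > 0, min_s, max_s)
        sleep(interval)
        done += 1
```

The cap bounds how stale the destination can get after an idle period (roughly max_s plus the time one sync takes), which is the latency you trade for fewer empty requests.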

Push-based (Convex pushes changes via actions, scheduled functions, etc.) — Benefits

Lower latency — you can forward each change as it happens (e.g., a mutation that writes a document can also schedule an action with ctx.scheduler.runAfter(0, ...) that sends the change to the destination).
Standard request/response model — easier to reason about errors and attribution. [Work Stealing]

Push-based — Challenges

Load balancing complexity — you need to manage where and how fast you push data.
Connection lifecycle — if the destination is down, you need retry logic and error handling yourself.
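On the push side, the retry logic mentioned above usually means capped exponential backoff with jitter. In a Convex deployment this would live in an action written in TypeScript; the Python sketch below only illustrates the retry pattern itself, and push_with_retry, send and the default delays are hypothetical.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def push_with_retry(send: Callable[[], T],
                    attempts: int = 5,
                    base_s: float = 0.5,
                    max_s: float = 30.0,
                    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() until it succeeds, backing off exponentially with full jitter.

    Only exceptions in retry_on are retried (OSError covers connection errors,
    timeouts, and urllib's URLError); anything else propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the caller log, alert, or dead-letter
            # Sleep a random time in [0, min(max_s, base_s * 2**attempt)].
            sleep(random.uniform(0, min(max_s, base_s * 2 ** attempt)))
    raise RuntimeError("unreachable")
```

Jitter spreads retries out so that many pushers recovering from the same outage do not hit the destination in lockstep.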

Which is better?


It depends on your use case. The knowledge sources suggest: [Making the Call]

Pull-based is better for analytics/data warehouse syncing (your streaming export use case), where throughput and eventual consistency matter more than low latency, and where the destination controls its own ingestion rate.
Push-based is better for user-facing, low-latency workloads where you need to react immediately to changes.

For syncing to an external data warehouse or BI tool, the pull-based streaming export API is the more natural fit — it's exactly what Fivetran and Airbyte use, and it decouples Convex's performance from the destination's availability.
