#?

28 messages · Page 1 of 1 (latest)

cold willow
#

i have an etl process that requires a table to be clear before data are loaded wgat is the best practice for it

modest ravenBOT
#

Hi @cold willow. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

For clearing a table before loading data in an ETL process, Convex offers a few approaches:

Option 1: CLI (Recommended for ETL)

Use npx convex import with the --replace flag and an empty file. This is the most efficient one-liner:

npx convex import --replace --table <tableName> --format jsonLines /dev/null -y

For production, add --prod:

npx convex import --prod --replace --table <tableName> --format jsonLines /dev/null -y

This effectively clears the table by "importing" an empty dataset in replace mode. [YOLO MVP Tips]

If you need to clear all tables at once:

for tableName in `npx convex data`; do npx convex import --table $tableName --replace -y --format jsonLines /dev/null; done

Option 2: Dashboard

You can also clear a table via the Convex Dashboard by clicking the overflow menu on the data page and selecting "Clear Table". This deletes all documents without removing the table itself. [Dashboard Data]

Option 3: Programmatic (via Mutation)

If you need to clear a table programmatically as part of your ETL logic, you can write a mutation that queries and deletes all documents:

export const clearAll = mutation(async (ctx) => {
  const docs = await ctx.db.query("yourTable").collect();
  await Promise.all(docs.map((doc) => ctx.db.delete(doc._id)));
});

[Testing with Local Backend]


#

For a typical ETL pipeline, the CLI approach (--replace with an empty file) is the most practical since it's scriptable and works well for both dev and production deployments.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@cold willow Kapa.ai is still learning and improving, please let me know how I did by reacting below.

cold willow
#

@modest raven assuming that there are hundred of thousands of rows to clear, what is the best practice

modest ravenBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

For clearing hundreds of thousands of rows, the CLI import approach is still the best practice, as it bypasses Convex's transaction limits entirely:

npx convex import --replace --table <tableName> --format jsonLines /dev/null -y

This is the recommended approach because it doesn't go through a mutation transaction, so it won't hit Convex's per-transaction read/write limits. [Data Import]


Why not a programmatic mutation?

If you try to delete hundreds of thousands of rows via a mutation, you will hit Convex's transaction limits. Specifically:

  • There is a 4,096 read limit per function execution, and each delete also requires a read, so you'd hit the ceiling quickly with large tables. [Too Many Reads]
  • Mutations enforce transaction limits to prevent long-lived transactions that could grind the database to a halt.

If you must do it programmatically

If for some reason you need to clear the table via code (e.g., as part of a larger workflow), the correct pattern is to use recursive scheduled mutations with pagination, processing a safe batch size (e.g., 500 documents) per mutation run:

import { internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";

const MiB = 1 << 20;
#

export const clearTable = internalMutation({
args: {},
handler: async (ctx) => {
const docs = await ctx.db.query("yourTable").paginate({ cursor: null, numItems: 500 });
for (const doc of docs.page) {
await ctx.db.delete(doc._id);
const metrics = await ctx.meta.getTransactionMetrics();
if (
metrics.bytesRead.used > 4 * MiB ||
metrics.bytesWritten.used > 2 * MiB ||
metrics.databaseQueries.remaining < 500
) {
await ctx.scheduler.runAfter(0, internal.yourFile.clearTable);
break;
}
}
},
});


[[Write Performance](<https://docs.convex.dev/database/writing-data#write-performance-and-limits>)]

---

**Bottom line:** For an ETL process with hundreds of thousands of rows, use the CLI `--replace` with an empty file. It's the most efficient, scalable, and safe approach.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
#

@cold willow Kapa.ai is still learning and improving, please let me know how I did by reacting below.

cold willow
#

@modest raven but how do imake the insert funcs wait till clear is done

modest ravenBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Great question. Since the CLI --replace import is a synchronous command, the simplest approach is to sequence your ETL steps in your shell script — the import command blocks until completion, so you just run your insert logic after it:

# Step 1: Clear the table (blocks until done)
npx convex import --replace --table <tableName> --format jsonLines /dev/null -y

# Step 2: Load new data (only runs after clear is complete)
npx convex import --table <tableName> newdata.jsonl

If you're doing this programmatically (via mutations)

If you need to orchestrate the clear + insert entirely in code, the key constraint is that each mutation is its own transaction — you cannot span a clear and an insert across one atomic transaction for hundreds of thousands of rows due to transaction limits. [Write Performance]

The recommended pattern is to use the Convex Workflow component, which lets you define sequential steps with durability guarantees — step 2 (insert) only runs after step 1 (clear) completes:

// Pseudocode using the Workflow component
export const etlWorkflow = workflow.define({
  handler: async (ctx) => {
    await ctx.runMutation(internal.yourFile.clearTable, {});  // Step 1
    await ctx.runMutation(internal.yourFile.bulkInsert, { data });  // Step 2
  },
});

The Workflow component tracks state and retries automatically, so if the clear step fails partway through, it won't proceed to the insert. [Durable Workflows]


#

Bottom line: For a CLI-based ETL, shell sequencing is sufficient since npx convex import is synchronous. For a fully programmatic approach, the Workflow component is the safest way to ensure the insert waits for the clear to finish.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@cold willow Kapa.ai is still learning and improving, please let me know how I did by reacting below.

cold willow
#

@modest raven how do i wrap this in a type:

modest ravenBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Your question seems incomplete — "how do i wrap this in a type" doesn't specify what you'd like to type. Could you clarify what you're referring to? For example, are you asking about:

  • Typing the workflow handler (e.g., annotating the return type of workflow.define)
  • Typing a mutation or action context (e.g., MutationCtx, ActionCtx)
  • Typing arguments using validators and Infer
  • Typing document shapes using Doc<"tableName">

Based on the conversation so far, here are the most likely things you might want to type:

Typing a workflow handler

The docs recommend always annotating the return type of the handler to avoid type cycles: [Workflow Usage]

export const etlWorkflow = workflow.define({
  args: { /* ... */ },
  handler: async (ctx, args): Promise<void> => {
    // ...
  },
});

Typing a mutation context helper

import { MutationCtx } from "./_generated/server";

export async function clearTableHelper(ctx: MutationCtx) {
  const docs = await ctx.db.query("yourTable").collect();
  for (const doc of docs) {
    await ctx.db.delete(doc._id);
  }
}

[TypeScript Helpers]

Typing arguments with Infer

import { Infer, v } from "convex/values";

const rowValidator = v.object({ name: v.string(), value: v.number() });
type Row = Infer<typeof rowValidator>;

[TypeScript Helpers]

Please share more details about what you're trying to type and I'll give you a more precise answer!

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@cold willow Kapa.ai is still learning and improving, please let me know how I did by reacting below.

cold willow
#

@modest raven how do i wrap this in a type:
routes: defineTable({
route_id: v.string(),
route_short_name: v.string(),
route_long_name: v.string(),
route_desc: v.nullable(v.string()),
route_type: v.number(),
agency_id: v.string(),
route_color: v.nullable(v.string()),
route_text_color: v.nullable(v.string()),
}).index("by_route_id", ["route_id"]),

modest ravenBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You can extract a TypeScript type from that table definition using the Infer type helper from convex/values, or by using the generated Doc type.

Option 1: Using Infer on a validator (recommended for reuse)

Define the validator separately, then use Infer to extract the type:

import { defineSchema, defineTable } from "convex/server";
import { v, Infer } from "convex/values";

const routeValidator = v.object({
  route_id: v.string(),
  route_short_name: v.string(),
  route_long_name: v.string(),
  route_desc: v.nullable(v.string()),
  route_type: v.number(),
  agency_id: v.string(),
  route_color: v.nullable(v.string()),
  route_text_color: v.nullable(v.string()),
});

// Extracted TypeScript type
export type Route = Infer<typeof routeValidator>;

export default defineSchema({
  routes: defineTable(routeValidator).index("by_route_id", ["route_id"]),
});

[Extracting TypeScript types]

Option 2: Using the generated Doc type

Once your schema is defined and you've run npx convex dev, Convex generates a Doc type you can use anywhere in your codebase:

import { Doc } from "../convex/_generated/dataModel";

// Includes system fields _id and _creationTime
type Route = Doc<"routes">;

[TypeScript types]


#

Which to use?

  • Use Infer when you want a type without system fields (e.g., for inserting data or passing partial objects around).
  • Use Doc<"routes"> when you want the full document type including _id and _creationTime (e.g., for displaying data returned from a query). [Doc type]

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@cold willow Kapa.ai is still learning and improving, please let me know how I did by reacting below.

cold willow
#

@modest raven why cant i do this:
import { Doc } from "./_generated/dataModel";

export const batchInsertRoutes = internalMutation({
args: {
routes: v.array(v.doc("routes")),
},
handler: async (ctx, args) => {
for (const route of args.routes) {
await ctx.db.insert("routes", route);
}
},
});

modest ravenBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

The issue is that v.doc("routes") is not a built-in validator in the standard convex/values v object. The standard v does not have a .doc() method — that's only available via the convex-helpers library or the typedV utility. [Helpers for schema]

How to fix it

Option 1: Use v.object(...) with your route fields directly

The cleanest approach is to export your route fields from your schema and reuse them:

// convex/schema.ts
export const routeFields = {
  route_id: v.string(),
  route_short_name: v.string(),
  route_long_name: v.string(),
  route_desc: v.nullable(v.string()),
  route_type: v.number(),
  agency_id: v.string(),
  route_color: v.nullable(v.string()),
  route_text_color: v.nullable(v.string()),
};

export default defineSchema({
  routes: defineTable(routeFields).index("by_route_id", ["route_id"]),
});
// convex/routes.ts
import { internalMutation } from "./_generated/server";
import { v } from "convex/values";
import { routeFields } from "./schema";

export const batchInsertRoutes = internalMutation({
  args: {
    routes: v.array(v.object(routeFields)),
  },
  handler: async (ctx, args) => {
    for (const route of args.routes) {
      await ctx.db.insert("routes", route);
    }
  },
});

[Argument Validation without Repetition]

Option 2: Use convex-helpers doc validator

If you want v.doc()-like behavior, install convex-helpers and use its doc helper:

import { doc } from "convex-helpers/validators";
import schema from "./schema";
#

export const batchInsertRoutes = internalMutation({
args: {
routes: v.array(doc(schema, "routes")),
},
handler: async (ctx, args) => {
for (const route of args.routes) {
await ctx.db.insert("routes", route);
}
},
});


Note that `doc(schema, "routes")` includes system fields (`_id`, `_creationTime`), so you'd need to strip those before inserting. [[Helpers for schema](<https://stack.convex.dev/argument-validation-without-repetition#helpers-for-schema-definition--validation>)]

---

**Bottom line:** Use **Option 1** (exporting `routeFields` from your schema) — it's the idiomatic Convex pattern with no extra dependencies and works perfectly for bulk inserts.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------