# What is the suggested way to do web scraping from within a server function?


wet elm Nov 18, 2025, 10:47 PM

What is the suggested way to do web scraping from within a server function, if at all reasonable?

weary cliff BOT Nov 18, 2025, 10:47 PM

Hi @wet elm. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Convex doesn’t have a single “scraping API”, but it does support doing web scraping from actions, and there are examples that show the recommended patterns.

1. Use an action (not a query/mutation)

Network requests like fetch are only allowed in actions, not in queries or mutations. Scraping examples in the Convex docs and blog are implemented as actions that:

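- Fetch HTML from a URL with `fetch`.
- Parse it (with cheerio or a custom HTML → text/markdown converter).
- Store the results by calling an internal mutation with `ctx.runMutation`. The mutation does the `ctx.db` writes, because actions have no direct database access. [Server functions; Runtimes]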
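The parse step does not have to use a library. As a rough illustration of the "custom HTML → text" option, here is a minimal sketch. `htmlToText` is a hypothetical name (it is not the `parsePage` helper from the Convex example, whose code isn't shown), and because it is regex-based it is not a real HTML parser; for anything non-trivial, a proper parser such as cheerio is the safer choice.

```typescript
// Hypothetical helper: naive HTML → plain text conversion.
// Regex-based, so it will mishandle edge cases (e.g. "<" inside attributes or comments).
export function htmlToText(html: string): string {
  return html
    // Drop <script> and <style> blocks entirely, including their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Turn <br> and closing block-level tags into line breaks.
    .replace(/<br\s*\/?>|<\/(p|div|li|h[1-6])>/gi, "\n")
    // Remove all remaining tags.
    .replace(/<[^>]+>/g, " ")
    // Decode a few common entities (not exhaustive); &amp; last to avoid double-decoding.
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse runs of spaces/tabs, then trim each line and drop empty lines.
    .replace(/[ \t]+/g, " ")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

Dropping `<script>`/`<style>` contents before stripping tags matters: otherwise inline JavaScript and CSS would end up in the stored text.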

Example pattern (scraping one page in an action):

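```ts
import { v } from "convex/values";
import { internal } from "./_generated/api";
import { internalAction } from "./_generated/server";

export const fetchSingle = internalAction({
  args: { url: v.string() },
  handler: async (ctx, { url }) => {
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
    }
    // parsePage: the example's own HTML → text helper (not shown here).
    const text = parsePage(await response.text());
    if (text.length > 0) {
      // Database writes happen in an internal mutation, not in the action.
      await ctx.runMutation(internal.ingest.load.updateDocument, { url, text });
    }
  },
});
```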

And the mutation does the database work (chunking, etc.). [Vector search ingest]
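The chunking step mentioned above might look roughly like the following. This is a hypothetical sketch (`chunkText` and its `maxChars` parameter are assumptions, not the code from the vector search ingest example): it packs paragraphs into chunks up to a character limit and hard-splits any paragraph that is too long to fit on its own.

```typescript
// Hypothetical helper: split text into chunks of at most `maxChars` characters,
// preferring paragraph boundaries (blank lines) and hard-splitting long paragraphs.
export function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n\s*\n/)) {
    const para = paragraph.trim();
    if (para.length === 0) continue;
    // A paragraph longer than the limit can never fit: flush, then slice it.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // Append to the current chunk if it still fits (counting the "\n\n" separator).
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on paragraphs first keeps related sentences together in one chunk, which tends to help when the chunks are later embedded for search.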

Another example uses cheerio to scrape GitHub pages inside an action: [GitHub scraping]

```ts
import * as cheerio from "cheerio";

const getGithubRepoPageData = async (owner: string, name: string) => {
  const html = await fetch(`https://github.com/${owner}/${name}`).then((res) => res.text());
  // Load the HTML into cheerio so it can be queried with CSS selectors.
  const $ = cheerio.load(html);
  // ...select & parse counts...
};
```
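Real sites fail in mundane ways (non-2xx responses, hung connections), and a bare `fetch(...).then(res => res.text())` happily returns an error page as if it were content. Here is a minimal sketch of a more defensive fetch wrapper that could be called from an action, assuming `fetch`, `AbortController`, and `setTimeout` are available in your runtime. `fetchHtml`, its `timeoutMs` default, and the injectable `fetchImpl` argument are all hypothetical; `fetchImpl` exists only to make the helper testable without the network.

```typescript
// Minimal structural view of a fetch Response (the real Response satisfies it).
interface MinimalResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

// Anything shaped like fetch(url, { signal }); defaults to the global fetch.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<MinimalResponse>;

// Hypothetical helper: fetch a page's HTML, throwing on non-2xx responses
// and aborting the request if it takes longer than `timeoutMs`.
export async function fetchHtml(
  url: string,
  timeoutMs = 10_000,
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) {
      throw new Error(`Fetching ${url} failed with HTTP ${res.status}`);
    }
    return await res.text();
  } finally {
    // Clear the timer whether the request succeeded, failed, or was aborted.
    clearTimeout(timer);
  }
}
```

In the GitHub example, `await fetch(url).then((res) => res.text())` could become `await fetchHtml(url)`, so a 404 or rate-limit page throws instead of being parsed as repo data.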

2. Prefer helpers over `ctx.runAction` when staying in one runtime

If your scraping logic lives entirely in actions that use the same runtime, Convex recommends putting the scraping logic into plain TypeScript helper functions and calling them directly, rather than chaining `ctx.runAction` calls, since each `runAction` call is a separate function invocation with extra overhead. `ctx.runAction` should mainly be used when you actually need to cross runtimes (Convex ↔ Node). [Use runAction only when using a different runtime]

Pattern:

- Action A uses `fetch` plus parsing helpers, then calls `ctx.runMutation(...)` to store the results.
- A shared helper, `scrapeSinglePage(ctx, { url })`, holds the scraping logic and is called directly from actions.
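A sketch of what the shared helper could look like. To keep it runnable outside Convex, it takes a small hypothetical `deps` object instead of the real action `ctx`; in an actual action you would build `deps` from `fetch`, your parser, and a `save` function that wraps `ctx.runMutation(...)` on an internal mutation. `ScrapeDeps`, `scrapeAll`, and this exact `scrapeSinglePage` signature are assumptions for illustration.

```typescript
// Hypothetical dependencies the scraping helper needs. In a real Convex action,
// `save` would wrap ctx.runMutation(...) on an internal mutation.
interface ScrapeDeps {
  fetchHtml: (url: string) => Promise<string>;
  parse: (html: string) => string;
  save: (doc: { url: string; text: string }) => Promise<void>;
}

// Shared helper: a plain async function, called directly (no ctx.runAction hop).
// Returns true if something was stored.
export async function scrapeSinglePage(
  deps: ScrapeDeps,
  { url }: { url: string },
): Promise<boolean> {
  const text = deps.parse(await deps.fetchHtml(url));
  if (text.length === 0) return false; // nothing worth storing
  await deps.save({ url, text });
  return true;
}

// What an action handler might do with a list of URLs: call the helper in-process.
export async function scrapeAll(deps: ScrapeDeps, urls: string[]): Promise<number> {
  let stored = 0;
  for (const url of urls) {
    if (await scrapeSinglePage(deps, { url })) stored++;
  }
  return stored;
}
```

Inside an action handler, `scrapeAll(deps, urls)` runs in the same invocation; the only cross-function call left is the mutation behind `save`, which is where the database write has to happen anyway.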

3. Reasonableness / limits

The docs and Stack posts explicitly use Convex actions for scraping websites and sitemaps, so this is a supported and reasonable pattern, as long as:

- It’s done from actions (not queries/mutations).
- You keep the work per action within Convex’s function resource limits (e.g. batch or paginate, as shown in the GitHub scraping tutorial). [Dealing with flaky data]
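One way to batch is to have each action run process a bounded slice of the work list and hand the remainder to a later run; in Convex, the hand-off could be the action scheduling itself with `ctx.scheduler.runAfter(0, ...)`. The sketch below shows only the batching logic, with the hand-off injected as a callback so it runs outside Convex. `takeBatch`, `processInBatches`, and `batchSize` are hypothetical names, and a suitable batch size depends on Convex's documented limits and how slow the target sites are.

```typescript
// Hypothetical helper: split a work list into the batch to process now
// and the remainder to hand off to a later run.
export function takeBatch<T>(items: readonly T[], batchSize: number): { batch: T[]; rest: T[] } {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  return { batch: items.slice(0, batchSize), rest: items.slice(batchSize) };
}

// One "run": handle a single batch, then enqueue the rest (if any).
// In Convex, `enqueue` could schedule the same action with ctx.scheduler.runAfter(0, ...).
export async function processInBatches<T>(
  items: readonly T[],
  batchSize: number,
  handle: (item: T) => Promise<void>,
  enqueue: (rest: T[]) => Promise<void>,
): Promise<void> {
  const { batch, rest } = takeBatch(items, batchSize);
  for (const item of batch) await handle(item);
  if (rest.length > 0) await enqueue(rest);
}
```

Passing the remaining URLs to the next run, rather than looping over everything in one action, keeps any single invocation short, so one slow site cannot push the whole job past the action time limit.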

In short: yes, web scraping from server functions is reasonable, and the suggested way is to implement it in actions using `fetch` + parsing, with database writes delegated to mutations, and helpers instead of nested `runAction` calls.


Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: