# Token efficiency for Facebook automation/scraping


nocturne notch

Hey everyone — been running OpenClaw on a Mac mini M4 for a few weeks and loving the concept, but I'm getting destroyed on API costs and need some advice.

The setup:

  • Scheduled cron task fires at 7am daily
  • Agent uses Chrome browser relay to scan 4 Facebook groups for construction leads
  • Takes 1-3 snapshots per group to read posts
  • Reads/writes a JSON dedup file
  • Sends results via Telegram
  • Model: Claude Sonnet (trying to move to Haiku)
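One way to keep the dedup step cheap is to do it in plain code, so the model never has to read and rewrite the whole JSON file through tool calls. A minimal sketch, assuming posts have already been extracted as plain text; the file name `seen_posts.json`, its schema, and the helper names are hypothetical:

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path

# HYPOTHETICAL file name and schema: a JSON object mapping post
# fingerprints to the group the post was first seen in.
DEDUP_PATH = Path("seen_posts.json")


def fingerprint(group: str, post_text: str) -> str:
    """Stable ID for a post: SHA-256 of group name + normalized text."""
    # Collapse whitespace and lowercase so trivial re-renders still match.
    normalized = " ".join(post_text.split()).lower()
    return hashlib.sha256(f"{group}\n{normalized}".encode("utf-8")).hexdigest()


def filter_new_posts(group: str, posts: list[str], path: Path = DEDUP_PATH) -> list[str]:
    """Return only posts not seen before and record them in the dedup file."""
    seen = json.loads(path.read_text()) if path.exists() else {}
    fresh = []
    for text in posts:
        key = fingerprint(group, text)
        if key not in seen:
            seen[key] = group  # also dedups repeats within this batch
            fresh.append(text)
    path.write_text(json.dumps(seen, indent=2))
    return fresh
```

Only the short list returned by `filter_new_posts` would then need to go into the Telegram message.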

The problem:
One scan run is costing me $5-6 every time. I just watched it burn $5.57 in a single run. For what should be a simple read-and-report task, that's insane.

What I suspect:
Full page snapshots of Facebook are dumping massive amounts of DOM/HTML into the context. 4 groups × 2-3 snapshots = potentially hundreds of thousands of tokens per run. I also don't know if conversation history is being carried across tool calls within the same session.
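A quick back-of-envelope check of that suspicion. This assumes Sonnet list prices of $3 per million input tokens and $15 per million output tokens (check current pricing before relying on it); the helper name is hypothetical:

```python
# ASSUMPTION: Sonnet list prices; verify against current Anthropic pricing.
SONNET_INPUT_PER_MTOK = 3.00
SONNET_OUTPUT_PER_MTOK = 15.00


def implied_input_tokens(run_cost_usd: float, output_tokens: int = 0) -> int:
    """Input tokens needed to reach run_cost_usd after subtracting output cost."""
    output_cost = output_tokens / 1_000_000 * SONNET_OUTPUT_PER_MTOK
    input_cost = run_cost_usd - output_cost
    return round(input_cost / SONNET_INPUT_PER_MTOK * 1_000_000)


if __name__ == "__main__":
    # Even allowing 10k output tokens, most of a $5.57 run is input.
    print(implied_input_tokens(5.57, output_tokens=10_000))
```

At those prices $5.57 works out to roughly 1.8 million input tokens, far more than a single snapshot, which suggests large snapshots are being re-read on repeated model calls within the run.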

My questions:

  1. Is there a way to limit what a snapshot captures — text only, visible content only, etc.?
  2. Is there a context trimming or transcript hygiene setting I should be using for scheduled tasks?
  3. Does each cron-triggered task start with a fresh context or does it carry history?
  4. Has anyone solved the browser snapshot token problem for scraping-style tasks?

Happy to share my full skill/prompt if helpful. Just trying to get this under $0.50 a run, which is where it should be. Thanks 🦞

stoic rampart (BOT)

Yep — $5–6/run is totally consistent with “Facebook-sized” browser snapshots getting shoved into the model context.

A few concrete answers + the fixes that usually get this down by an order of magnitude:


1) Can you limit what a snapshot captures?

Yes. The browser tool supports much smaller snapshot modes than the default “AI snapshot of the whole page”.

From the browser docs (snapshot options): https://docs.openclaw.ai/tools/browser

The big levers:

  • Use efficient mode (compact role snapshot preset):
    • CLI: openclaw browser snapshot --efficient
    • Config default: browser.snapshotDefaults.mode: "efficient" (so callers that don’t specify a mode stop doing huge snapshots)
  • Scope the snapshot to just the feed container (this is the real win on FB):

    • CLI example from docs: openclaw browser snapshot --selector "#main" --interactive
    • In practice for FB you want to find the element that contains the posts (often something like the [role="feed"] area) and snapshot only that, not the whole page chrome/sidebars/comments/etc.
  • Reduce depth / compact output

    • CLI: --compact --depth 4 (or similar)
    • This prevents the “entire universe of nested divs” problem.
  • Text/inspection-only mode

    • --format aria --limit 200 gives you a capped accessibility tree (no refs; read-only). Useful if you only need to read text and don’t need to click.
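If you want to compare these modes from a script rather than inside the agent, the flags quoted above can be assembled like this. A minimal sketch: the flags come from the docs excerpt above, but the `[role="feed"]` selector, the helper names, the assumption that the snapshot is printed to stdout, and combining `--efficient` with `--compact`/`--depth` are all unverified assumptions.

```python
import shlex
import subprocess

# ASSUMPTION: Facebook's post list lives under this element; verify in devtools.
FB_FEED_SELECTOR = '[role="feed"]'


def snapshot_argv(selector=None, efficient=True, compact=False, depth=None,
                  aria_limit=None):
    """Build an `openclaw browser snapshot` command from the documented flags."""
    argv = ["openclaw", "browser", "snapshot"]
    if aria_limit is not None:
        # Read-only capped accessibility tree; ignores the other options here.
        return argv + ["--format", "aria", "--limit", str(aria_limit)]
    if efficient:
        argv.append("--efficient")
    if selector:
        argv += ["--selector", selector]
    if compact:
        argv.append("--compact")
    if depth is not None:
        argv += ["--depth", str(depth)]
    return argv


def take_snapshot(**kwargs) -> str:
    """Run the CLI and return its stdout (ASSUMPTION: snapshot goes to stdout)."""
    result = subprocess.run(snapshot_argv(**kwargs), capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(shlex.join(snapshot_argv(selector=FB_FEED_SELECTOR)))
```

Run directly, this prints `openclaw browser snapshot --efficient --selector '[role="feed"]'`, which you can paste into a terminal to try the feed-scoped snapshot by hand.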

Key point: OpenClaw isn’t sending raw HTML, but snapshots can still be massive on FB because the accessibility/UI tree contains tons of text + controls.
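To see how large each variant actually is on your groups, one option is to save a few snapshots to text files and compare rough sizes. A sketch, assuming about 4 characters per token (a rule of thumb, not the model's real tokenizer) and that you can redirect the snapshot output to a file; the script and helper names are hypothetical:

```python
import math
import sys
from pathlib import Path

# ASSUMPTION: ~4 characters per token for English-like text; real
# tokenizer counts differ, so use this only to compare variants.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def report(paths):
    """Print an estimated token count for each saved snapshot file."""
    for p in map(Path, paths):
        text = p.read_text(encoding="utf-8", errors="replace")
        print(f"{estimate_tokens(text):>9,}  {p.name}")


if __name__ == "__main__":
    report(sys.argv[1:])
```

For example, `python snapshot_sizes.py full.txt efficient.txt feed_only.txt` would list one estimate per file, making it obvious which mode is worth using.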


2) Context trimming / transcript hygiene for scheduled tasks?

Two different problems:


A) “Does my job carry history across days?”

Depends how you scheduled it (see Q3). If you use isolated cron, you already get clean runs (no transcript carry-over).

B) “Within a single run, do multiple snapshots accumulate?”

Yes. Tool results become part of the turn context. So if your agent does:

  • snapshot group 1
  • snapshot group 1 again
  • snapshot group 2
  • snapshot group 2 again
…you’ve now stuffed all of that into the same run’s context.

So the main “hygiene” move is: don’t produce giant tool outputs in the first place (efficient + selector, or extract text via evaluate).
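A rough model of why accumulation hurts so much within one run. It is a sketch under stated assumptions: each snapshot is followed by another model call that re-reads the whole transcript (no prompt caching), there is a fixed 5,000-token system/skill prompt, the $3 per million input token price is assumed, and the snapshot sizes are illustrative rather than measured on Facebook.

```python
def run_input_tokens(snapshot_tokens, base_prompt_tokens=5_000):
    """Total input tokens billed across one agent run.

    Each model call re-reads the transcript so far; each snapshot's
    output is then appended, so every later call pays for it again.
    """
    total = 0
    context = base_prompt_tokens
    for size in snapshot_tokens:
        total += context   # model call that decides to take this snapshot
        context += size    # snapshot output joins the transcript
    total += context       # final call that writes the Telegram summary
    return total


def input_cost_usd(tokens, usd_per_mtok=3.0):
    """ASSUMPTION: $3 per million input tokens (Sonnet list price)."""
    return tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    for label, size in [("full-page", 20_000), ("feed-scoped", 2_000)]:
        tokens = run_input_tokens([size] * 8)  # 4 groups x 2 snapshots
        print(f"{label}: {tokens:,} tokens, ${input_cost_usd(tokens):.2f}")
```

With eight 20,000-token snapshots this comes to 765,000 input tokens (about $2.30); cutting each snapshot to 2,000 tokens drops it to 117,000 (about $0.35), because every token saved on a snapshot is also saved on each later call.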

If you want better cost visibility while tuning: /usage full will append token/$ footers to replies. Docs: https://docs.openclaw.ai/reference/api-usage-costs



3) Does each cron-triggered task start fresh?

If you’re using isolated cron jobs: yes — each run starts a fresh session id.

From the cron jobs docs (Isolated jobs section): https://docs.openclaw.ai/automation/cron-jobs

“Isolated jobs run a dedicated agent turn in session cron:<jobId> … Each run starts a fresh session id (no prior conversation carry-over).”

If you scheduled it as a main session cron job (system event + heartbeat), then it does run in (and grow) your main session context.


4) Has anyone solved the “browser snapshot token problem” for scraping tasks?

The patterns that actually work:

  1. Stop snapshotting the whole page.
    • --efficient + --selector <feed container> is usually the fastest win.