#Local firecrawl
Rex Horizon, first thing: please grab openclaw --version. Firecrawl behavior has moved around, and version mismatch is a common source of weird web-tool behavior.
I checked the current docs/source. The important bits:
- OpenClaw’s built-in Firecrawl config is under plugins.entries.firecrawl.config.webFetch.*.
- Firecrawl fallback only runs when a Firecrawl key is available via plugins.entries.firecrawl.config.webFetch.apiKey or FIRECRAWL_API_KEY.
- web_fetch order is: local readability → Firecrawl fallback → basic HTML cleanup.
- Firecrawl proxy mode is basic | stealth | auto. OpenClaw sends proxy: "auto" and storeInCache: true for Firecrawl scrape requests. If Firecrawl retries with stealth, that retry is inside Firecrawl, so OpenClaw may only show one Firecrawl request.
- Current docs/source restrict Firecrawl baseUrl to https://api.firecrawl.dev, so a plain local Firecrawl URL is not something the bundled plugin appears to accept. If you are using local Firecrawl, we should confirm how you wired it in.
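As a rough mental model of that web_fetch order, here is a minimal Python sketch. It is not OpenClaw's code: fetch_with_fallback, the three extractor callables, and the has_firecrawl_key flag are hypothetical names for illustration only. The returned labels are the extractor values you would see in the tool result.

```python
from typing import Callable, Optional, Tuple

# Hypothetical extractor type: takes a URL, returns text or None on failure.
Extractor = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str,
    readability: Extractor,
    firecrawl: Extractor,
    raw_html: Extractor,
    has_firecrawl_key: bool,
) -> Tuple[str, Optional[str]]:
    """Return (extractor_label, content), trying the steps in the described order."""
    content = readability(url)
    if content:
        return "readability", content

    # The Firecrawl fallback is only attempted when a key is configured.
    if has_firecrawl_key:
        content = firecrawl(url)
        if content:
            return "firecrawl", content

    # Last resort: basic HTML cleanup.
    return "raw-html", raw_html(url)
```

The point of the sketch is that a "raw-html" result can mean two different things: Firecrawl was skipped (no key), or it ran and returned nothing usable.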
Docs:
https://docs.openclaw.ai/tools/firecrawl
https://docs.openclaw.ai/tools/web-fetch
https://docs.openclaw.ai/cli/logs
Testing path:
- Confirm config:
openclaw --version
openclaw config get tools.web.fetch.provider
openclaw config get plugins.entries.firecrawl.config.webFetch
openclaw config validate
- Tail logs in one terminal:
openclaw logs --follow --plain --local-time
- In another terminal, force a web_fetch against a URL where plain fetch/readability is likely to fail or return junk. Ask OpenClaw very explicitly:
Use web_fetch on <URL>. Return the tool details: extractor, status, finalUrl, warning, and first 500 chars.
What you want to see in the tool result:
extractor: "readability" means Firecrawl was not needed.
extractor: "firecrawl" means Firecrawl fallback was used.
extractor: "raw-html" means Firecrawl did not run or did not return usable content.
- To test proxy behavior directly, ask for the explicit plugin tool, not generic web_fetch:
Use firecrawl_scrape on <URL> with proxy "basic", then use firecrawl_scrape on the same URL with proxy "stealth", storeInCache false, and compare status/extractor/warning.
That removes the ambiguity around whether local readability succeeded before Firecrawl was ever tried.
- If you are expecting local Firecrawl logs to show traffic, check whether OpenClaw is actually configured to hit local Firecrawl. Based on current docs/source, the bundled Firecrawl plugin only allows https://api.firecrawl.dev as the base host. If your local Firecrawl logs are empty, that is probably why.
Paste the output of the three config commands plus the relevant openclaw logs --plain lines around the test run, and I can help read whether it used readability, Firecrawl fallback, or explicit firecrawl_scrape.
I'm running version 2025.5.12 and it's a Firecrawl MCP server. The actual Firecrawl is running under Docker on a local Windows server called beelink.
Got it. That means my earlier built-in Firecrawl-provider notes are mostly the wrong layer. For your setup the flow is:
OpenClaw agent → MCP client → Firecrawl MCP server → Firecrawl API/container on beelink → optional proxy behavior
So the retry question depends on whether the agent makes a second MCP tool call or the Firecrawl MCP server/container retries internally.
Also: did you mean 2026.5.12? If it really is 2025.5.12, I’d treat that as very old and upgrade before spending too much time chasing behavior.
Do this first:
openclaw --version
openclaw mcp list
openclaw mcp show firecrawl --json
openclaw config get mcp.servers --json
That confirms whether OpenClaw is using stdio, SSE, or streamable HTTP, and whether it’s pointing at beelink / the MCP wrapper you expect. Docs for that bit: https://docs.openclaw.ai/cli/mcp
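If you would rather check that output programmatically than eyeball it, something like this Python sketch could work. It assumes the JSON is a single server entry with command/args/env keys and that the Firecrawl endpoint is set through a FIRECRAWL_API_URL env var. describe_mcp_entry and the "stdio"/"remote" labels are my own illustrative names, not OpenClaw output.

```python
import json
from urllib.parse import urlparse


def describe_mcp_entry(entry_json: str, expected_host: str) -> dict:
    """Summarise one MCP server entry (assumed command/args/env layout)."""
    entry = json.loads(entry_json)
    env = entry.get("env", {})
    api_url = env.get("FIRECRAWL_API_URL", "")
    host = urlparse(api_url).hostname if api_url else None
    return {
        # Assumption: a "command" key means a locally spawned (stdio) server.
        "transport_guess": "stdio" if "command" in entry else "remote",
        "api_host": host,
        "points_at_expected_host": host == expected_host,
        "enabled": entry.get("enabled", False),
    }
```

This only tells you what the config says. It does not prove the server is reachable.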
Then tail OpenClaw logs while running one controlled prompt:
openclaw logs --follow --plain --local-time
In the test prompt, remove ambiguity:
Use the Firecrawl MCP tool on <URL> once with the normal/basic proxy mode if available.
Then use it again with enhanced/stealth proxy mode if available.
Return the exact MCP tool names called and the arguments used, especially proxy/enhanced/proxyMode.
If the MCP server exposes a proxy argument, this should produce two visible tool calls with different args. If it doesn’t, OpenClaw can’t magically force proxy behavior except by whatever prompt language the MCP tool understands.
On the Windows/beelink side, watch the actual container logs during the same test:
docker ps
docker logs -f <firecrawl_container_name>
or if you’re using compose:
docker compose logs -f
What to look for:
- OpenClaw logs show one MCP call, Firecrawl logs show one request: no retry happened.
- OpenClaw logs show two MCP calls with different proxy args: OpenClaw/agent performed the retry.
- OpenClaw logs show one MCP call, Firecrawl logs show basic then stealth/enhanced internally: retry is happening inside Firecrawl/MCP, not OpenClaw.
- OpenClaw logs show the MCP tool never called: the model chose another tool or the MCP tools are not exposed to that agent/tool profile.
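The same decision table can be written as a small Python sketch, in case you want to apply it to counts pulled from the two logs. locate_retry and its two list inputs are hypothetical: you would fill them by hand (or with your own parsing) from the OpenClaw log and the Firecrawl container log.

```python
from typing import List


def locate_retry(mcp_call_proxies: List[str], firecrawl_proxies: List[str]) -> str:
    """Map the observed log pattern to where the retry happened.

    mcp_call_proxies: proxy arg of each MCP call seen in the OpenClaw logs.
    firecrawl_proxies: proxy mode of each request seen in the Firecrawl logs.
    """
    if not mcp_call_proxies:
        return "MCP tool never called"
    if len(mcp_call_proxies) >= 2 and len(set(mcp_call_proxies)) >= 2:
        return "agent performed the retry"
    if len(mcp_call_proxies) == 1 and len(firecrawl_proxies) >= 2:
        return "retry happened inside Firecrawl/MCP"
    if len(mcp_call_proxies) == 1 and len(firecrawl_proxies) == 1:
        return "no retry happened"
    # Anything else (e.g. an MCP call with no matching container log line).
    return "inconclusive"
```

An "inconclusive" result usually means one of the two logs was not captured during the test window.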
One nuance from the docs: openclaw mcp show/list only proves config exists; it doesn’t prove the server is reachable. The real proof is the paired OpenClaw log + MCP/Firecrawl container log during a forced two-call test.
Paste the openclaw mcp show firecrawl --json output with secrets removed, plus 20-30 log lines around the test, and I can help read exactly where the retry is or isn’t happening.
{
  "command": "npx",
  "args": [
    "-y",
    "firecrawl-mcp"
  ],
  "env": {
    "FIRECRAWL_API_URL": "http://beelink:3002",
    "FIRECRAWL_API_KEY": "local-bypass",
    "FIRECRAWL_SCRAPE_OPTIONS": "{\"onlyMainContent\":true,\"formats\":[\"markdown\"],\"proxy\":\"auto\"}",
    "FIRECRAWL_TIMEOUT": "90000"
  },
  "enabled": true
}
{
  "sql-production": {
    "command": "npx",
    "args": [
      "-y",
      "@bilims/mcp-sqlserver"
    ],
    "env": {
      "SQLSERVER_HOST": "beelink",
      "SQLSERVER_USER": "PiClaw",
      "SQLSERVER_PASSWORD": "OPENCLAW_REDACTED",
      "SQLSERVER_DATABASE": "NCSSQLStats"
    },
    "enabled": true
  },
  "firecrawl": {
    "command": "npx",
    "args": [
      "-y",
      "firecrawl-mcp"
    ],
    "env": {
      "FIRECRAWL_API_URL": "http://beelink:3002",
      "FIRECRAWL_API_KEY": "OPENCLAW_REDACTED",
      "FIRECRAWL_SCRAPE_OPTIONS": "{\"onlyMainContent\":true,\"formats\":[\"markdown\"],\"proxy\":\"auto\"}",
      "FIRECRAWL_TIMEOUT": "90000"
    },
    "enabled": true
  },
  "chart": {
    "command": "npx",
    "args": [
      "-y",
      "@antv/mcp-server-chart"
    ],
    "enabled": true
  }
}
This config explains why you’re not convinced it’s retrying: in the current firecrawl-mcp package I checked, FIRECRAWL_SCRAPE_OPTIONS is not read by the MCP server code. Same for FIRECRAWL_TIMEOUT as far as the MCP wrapper is concerned.
The MCP firecrawl_scrape tool does support a proxy argument directly:
"proxy": "basic" | "stealth" | "enhanced" | "auto"
But your env var:
"FIRECRAWL_SCRAPE_OPTIONS": "{\"onlyMainContent\":true,\"formats\":[\"markdown\"],\"proxy\":\"auto\"}"
probably is not doing anything unless the self-hosted Firecrawl container itself happens to read it, which I doubt from this path. The MCP server is passing only the actual MCP tool arguments to client.scrape(...).
So the practical test is:
Use the firecrawl_scrape MCP tool on <URL> with these exact args:
{
  "url": "<URL>",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "proxy": "basic"
}
Then call firecrawl_scrape again on the same URL with:
{
  "url": "<URL>",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "proxy": "enhanced"
}
Return the exact tool arguments used for both calls.
If you want Firecrawl’s internal auto behavior, use:
{
  "proxy": "auto"
}
But if you want to prove enhanced is being attempted, force the second call to "enhanced" or "stealth".
Also worth changing your MCP config so it doesn’t rely on ignored env defaults. The config can stay simple:
{
  "command": "npx",
  "args": ["-y", "firecrawl-mcp"],
  "env": {
    "FIRECRAWL_API_URL": "http://beelink:3002",
    "FIRECRAWL_API_KEY": "local-bypass"
  },
  "enabled": true
}
Then put scrape options in the prompt/tool call, not env. After changing MCP config, restart the gateway/session so the stdio MCP server is respawned.
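To make "options in the tool call, not env" concrete, here is a small Python sketch that builds the explicit firecrawl_scrape arguments. scrape_args is a hypothetical helper, and the allowed proxy values are simply the ones listed for the MCP tool earlier in this thread. Nothing here talks to Firecrawl; it only produces the argument payload you would expect to see in the logs.

```python
# Proxy values listed for the MCP firecrawl_scrape tool in this thread.
ALLOWED_PROXIES = {"basic", "stealth", "enhanced", "auto"}


def scrape_args(url: str, proxy: str = "basic") -> dict:
    """Build explicit firecrawl_scrape arguments instead of relying on env defaults."""
    if proxy not in ALLOWED_PROXIES:
        raise ValueError(f"unsupported proxy mode: {proxy!r}")
    return {
        "url": url,
        "formats": ["markdown"],
        "onlyMainContent": True,
        "proxy": proxy,
    }
```

Calling scrape_args(url, "basic") and then scrape_args(url, "enhanced") gives the two payloads for the forced two-call test, and each one carries its proxy mode visibly.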
One more pragmatic thing: npx -y firecrawl-mcp floats to latest each startup. For debugging, pin it temporarily so behavior doesn’t move under you:
"args": ["-y", "firecrawl-mcp@3.16.0"]
For logs, watch both sides during the forced two-call test:
OpenClaw:
openclaw logs --follow --plain --local-time
Beelink:
docker compose logs -f
Expected outcomes:
- Two MCP calls with proxy: basic then proxy: enhanced: the agent is doing the retry.
- One MCP call with proxy: auto: only Firecrawl/container can be doing internal retry.
- MCP args have no proxy: your env default is not being applied, so enhanced retry is not actually requested from OpenClaw/MCP.
Actually, I got OpenClaw to modify the MCP to accept these parameters. This is the result of the test:
Both scrapes completed:
Basic proxy ✅ — Full BBC homepage scraped successfully (200 OK, 1 credit used). Content includes all the main sections: politics, sport, entertainment, food, health, money, and trending iPlayer/Sounds picks.
Enhanced proxy ❌ — Failed with document_antibot error (aborted after exceeding retry limit). The enhanced proxy likely triggered BBC's anti-bot protections.
Key takeaway: For BBC, the basic proxy works fine. The enhanced proxy is more likely to get blocked by sites with aggressive anti-scraping measures.
Yep, that result is useful: it proves the MCP path is accepting the proxy parameter and making distinct Firecrawl requests.
I’d read it this way:
- The retry mechanism is now testable because basic and enhanced produce different outcomes.
- For BBC specifically, enhanced is not an upgrade. It’s a different fingerprint/path, and it’s tripping document_antibot.
- Your fallback rule should probably be basic first, enhanced only on specific failure classes, not “enhanced is always better.”
For your agent instruction, I’d make it explicit and conservative:
When using Firecrawl MCP:
1. First call firecrawl_scrape with proxy "basic".
2. If the result is empty, incomplete, HTTP 403/429/5xx, timeout, or a clear blocking/antibot error, retry once with proxy "enhanced".
3. If enhanced fails with document_antibot but basic succeeded, keep the basic result and do not keep retrying.
4. In the final answer, report which proxy modes were tried and the status/error for each.
For logging, the thing to preserve is the exact MCP args and result status/error. The summary you pasted is good, but for future debugging you want a line like:
firecrawl_scrape basic -> status 200, credits 1
firecrawl_scrape enhanced -> document_antibot, retry limit exceeded
That gives you enough evidence to tell whether OpenClaw retried, whether the MCP passed the proxy mode, and whether Firecrawl or the target site rejected it.
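If the agent or a wrapper script is writing those lines, a tiny formatter keeps them consistent. This is just a sketch in Python: format_attempt and its parameters are hypothetical names, and the output shape copies the two example lines above.

```python
from typing import Optional


def format_attempt(tool: str, proxy: str, status: Optional[int] = None,
                   credits: Optional[int] = None, error: Optional[str] = None) -> str:
    """Render one scrape attempt as a compact, greppable line."""
    if error is not None:
        outcome = error
    else:
        parts = [f"status {status}"]
        if credits is not None:
            parts.append(f"credits {credits}")
        outcome = ", ".join(parts)
    return f"{tool} {proxy} -> {outcome}"
```

For example, format_attempt("firecrawl_scrape", "basic", status=200, credits=1) yields the first line shown above.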
These are my Firecrawl rules, please update:
If Firecrawl is blocked by a site, access is denied, or the scrape/search returns a bot-protection error:
- retry the same Firecrawl request with enhanced/stealth proxy enabled, or proxy: "auto" if the tool supports proxy mode
- keep storeInCache: true if available
- only continue if the retry returns usable, cited results
I’d update the rule slightly so it matches what you just observed and avoids throwing away a good basic result:
Firecrawl retry rule:
- Start with the normal/basic Firecrawl request unless the user explicitly asks for enhanced/stealth.
- If Firecrawl is blocked by a site, access is denied, or the scrape/search returns a bot-protection/antibot error, retry the same Firecrawl request once with enhanced/stealth proxy enabled, or proxy: "auto" if that is the best supported proxy mode for the tool.
- Keep storeInCache: true if available.
- If the retry returns usable, cited results, use the retry result.
- If basic returned usable results and enhanced/stealth fails, keep the basic result and report that enhanced/stealth failed.
- If neither attempt returns usable, cited results, stop and report both failure modes instead of retry-looping.
The extra bit matters because BBC showed basic can work while enhanced gets document_antibot. Enhanced is a fallback, not automatically superior.
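In code terms, the rule comes out roughly like the Python sketch below. Treat it as an illustration of the control flow, not as how OpenClaw or firecrawl-mcp behave: ScrapeResult, ScrapeFn, and scrape_with_retry are hypothetical names, and the scrape callable stands in for whatever actually invokes the MCP tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ScrapeResult:
    proxy: str
    content: Optional[str] = None   # usable text, if any
    status: Optional[int] = None    # HTTP status, if known
    error: Optional[str] = None     # e.g. "document_antibot"

    @property
    def usable(self) -> bool:
        return bool(self.content) and self.error is None and (self.status or 200) < 400


# Hypothetical callable: (url, proxy) -> ScrapeResult
ScrapeFn = Callable[[str, str], ScrapeResult]


def scrape_with_retry(
    scrape: ScrapeFn, url: str, retry_proxy: str = "enhanced"
) -> Tuple[Optional[ScrapeResult], List[ScrapeResult]]:
    """Basic first; at most one retry; a usable basic result is never discarded."""
    attempts = [scrape(url, "basic")]
    if attempts[0].usable:
        return attempts[0], attempts
    attempts.append(scrape(url, retry_proxy))  # single retry, no loop
    best = attempts[1] if attempts[1].usable else None
    return best, attempts
```

The second return value is the full attempt list, so the final answer can report every proxy mode tried and its status or error even when nothing usable came back.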
I also have SearXNG as part of the Firecrawl Docker stack, so will Firecrawl search also make use of proxies? Is there any value in that?
Short version: Firecrawl search and Firecrawl scrape are different layers.
In the Firecrawl MCP package, firecrawl_search accepts scrapeOptions, and those scrape options can include proxy. That means proxy mode can apply when Firecrawl is also scraping the search results, for example:
{
  "query": "site:example.com thing to find",
  "limit": 5,
  "scrapeOptions": {
    "formats": ["markdown"],
    "onlyMainContent": true,
    "proxy": "auto"
  }
}
But that is not necessarily the same as “the SearXNG search query itself uses enhanced proxies.” If your self-hosted Firecrawl is using SearXNG internally, whether SearXNG’s outbound requests use proxies is controlled by the Firecrawl/SearXNG container setup, not by the MCP proxy argument unless your modified Firecrawl layer explicitly wires that through.
Is there value? Some, but only in specific cases:
- For normal search discovery: probably low value. SearXNG already exists to query/search across engines, and proxying every search can add latency, instability, and weird regional results.
- For search + scraped result content: yes, scrapeOptions.proxy can help if the result pages block content extraction.
- For search engines blocking SearXNG itself: maybe, but that should be fixed in SearXNG/outbound network config, not by Firecrawl scrape proxy settings.
My recommendation:
- Use firecrawl_search without scrapeOptions for discovery.
- Then call firecrawl_scrape on the chosen URLs with proxy: "basic".
- Retry scrape with proxy: "enhanced"/"stealth"/"auto" only on block/antibot/403/429/empty-content failures.
- Only use firecrawl_search.scrapeOptions.proxy when you intentionally want Firecrawl to scrape returned results as part of the search call.
That keeps search fast and makes proxy usage observable instead of hidden inside a combined search+scrape call.
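A short Python sketch of that default, for illustration: search_args is a hypothetical helper that leaves scrapeOptions out unless you opt in, using the same argument names as the firecrawl_search example above.

```python
from typing import Optional


def search_args(query: str, limit: int = 5, scrape_proxy: Optional[str] = None) -> dict:
    """Discovery-only by default; add scrapeOptions only when explicitly requested."""
    args = {"query": query, "limit": limit}
    if scrape_proxy is not None:
        # Opt-in: Firecrawl also scrapes the returned results with this proxy mode.
        args["scrapeOptions"] = {
            "formats": ["markdown"],
            "onlyMainContent": True,
            "proxy": scrape_proxy,
        }
    return args
```

With no scrape_proxy the payload carries only query and limit, so any proxy use shows up later as a separate, visible firecrawl_scrape call.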
So no need to make any changes then?
Probably no Firecrawl/SearXNG config change for search, no.
The one change I would make is to the agent rule, not the Docker stack:
Use Firecrawl search for discovery without scrapeOptions by default.
For selected result URLs, use firecrawl_scrape with proxy "basic".
If scrape is blocked/403/429/antibot/empty, retry once with proxy "enhanced" or "auto".
Report which proxy modes were tried.
And I’d remove or ignore FIRECRAWL_SCRAPE_OPTIONS in the MCP env unless your custom MCP definitely reads it. Put proxy, formats, onlyMainContent, and storeInCache in the actual tool args instead. That makes the behavior visible in logs and avoids thinking a default applied when it didn’t.
So: no change needed for SearXNG proxies unless you see SearXNG itself getting blocked. Keep the proxy fallback on the scrape step.
so that is in addition to the retry rule
Yes. Think of them as two rules:
Search/discovery rule
Use firecrawl_search for discovery without scrapeOptions by default.
Then scrape only the selected result URLs.
Scrape retry rule
For firecrawl_scrape, start with proxy "basic".
If blocked/access denied/403/429/antibot/empty or incomplete content, retry the same scrape once with proxy "enhanced", "stealth", or "auto" depending on what the tool supports.
Keep storeInCache: true if available.
Use the retry result only if it returns usable, cited content.
If basic worked and enhanced fails, keep the basic result and report that enhanced failed.
Do not retry-loop.
So yes: the search rule sits before the retry rule. It keeps broad discovery cheap and clean; the retry rule only kicks in once you’re scraping actual pages.
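Put together, the ordering of the two rules looks something like this Python sketch. It is only a sketch of the sequencing: discover_then_scrape, SearchFn, and ScrapeFn are hypothetical stand-ins for the MCP tool calls, and "usable" is reduced to "non-empty content" to keep it short.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callables standing in for the MCP tools.
SearchFn = Callable[[str], List[str]]            # query -> result URLs
ScrapeFn = Callable[[str, str], Optional[str]]   # (url, proxy) -> content or None


def discover_then_scrape(search: SearchFn, scrape: ScrapeFn, query: str,
                         max_urls: int = 3, retry_proxy: str = "enhanced") -> List[Dict]:
    report = []
    # Search/discovery rule: plain search, no scrapeOptions.
    for url in search(query)[:max_urls]:
        tried = ["basic"]
        content = scrape(url, "basic")
        # Scrape retry rule: one retry with a stronger proxy, then stop.
        if not content:
            tried.append(retry_proxy)
            content = scrape(url, retry_proxy)
        report.append({"url": url, "proxies_tried": tried, "content": content})
    return report
```

Each report entry records which proxy modes were tried, which is the part the agent should surface in its final answer.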