#Firecrawl and openclaw


queen sphinx
#

I have firecrawl running on a local machine called beelink and I can see openclaw sometimes asked it to run firecrawl agent, what is that?

wraith wharfBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

hallow yewBOT
# queen sphinx I have firecrawl running on a local machine called beelink and I can see opencla...

That’s likely not a separate OpenClaw “agent” in the personal-assistant sense. Firecrawl is a bundled web extraction/search provider that OpenClaw can call when an agent uses web tools.

OpenClaw can use Firecrawl in three places:

  • web_search provider
  • explicit tools: firecrawl_search / firecrawl_scrape
  • web_fetch fallback when local Readability extraction isn’t enough

So if you see traffic from OpenClaw to Firecrawl on your Beelink, it usually means some session asked to fetch/search/scrape a page and Firecrawl was selected or auto-detected from your web config / FIRECRAWL_API_KEY.

Docs:
https://docs.openclaw.ai/tools/firecrawl
https://docs.openclaw.ai/tools/web-fetch

One nuance: current docs say Firecrawl baseUrl overrides are restricted to the official https://api.firecrawl.dev host, so if you’re self-hosting Firecrawl locally and OpenClaw is actually calling that local instance, I’d want your openclaw --version and the relevant web config before saying exactly which code path you’re on.

queen sphinx
#

Yes I have firecrawl set up as an MCP, it works really well for scrapes, but I'm not sure that firecrawl agent is working

hallow yewBOT
# queen sphinx Yes I have firecrawl set up as an MCP, it works really well for scrapes, but I'm...

If scrapes work through the MCP, the MCP server itself is probably reachable. The “Firecrawl agent” bit is likely one of two things:

  • OpenClaw native Firecrawl plugin tools: firecrawl_search / firecrawl_scrape
  • A tool exposed by your Firecrawl MCP server, with whatever name that MCP package gives it

OpenClaw’s docs don’t describe a separate built-in “Firecrawl agent” as its own agent. Native Firecrawl support is web search, scrape, and web_fetch fallback:
https://docs.openclaw.ai/tools/firecrawl

For troubleshooting, first thing: what does openclaw --version show?

Then I’d check:

  • run /mcp show firecrawl if your install has commands.mcp enabled, or inspect your mcp.servers.firecrawl config
  • check the gateway logs around the failed call for the exact MCP tool name and error
  • make sure your tool allowlist/profile is not hiding bundle MCP tools; docs mention bundle-mcp can be denied/allowed as a plugin key: https://docs.openclaw.ai/plugins/bundles

Don’t paste API keys, but the server name, transport shape, OpenClaw version, and the exact log line would tell us whether this is OpenClaw not exposing the MCP tool, or the Firecrawl MCP tool failing after it’s called.
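
If it helps, here is a minimal Python sketch for producing a shareable version of one MCP server entry. The config shape (command / args / env), the example values, and the list of secret-looking key fragments are all assumptions for illustration, not something taken from your install.

```python
import json

# Fragments that mark an env key as secret; this list is an assumption, extend as needed.
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact_server_config(name: str, server: dict) -> dict:
    """Return a copy of one MCP server entry that is safe to paste into a thread."""
    env = server.get("env", {})
    safe_env = {
        key: "<redacted>" if any(hint in key.upper() for hint in SECRET_HINTS) else value
        for key, value in env.items()
    }
    return {
        "name": name,
        "command": server.get("command"),
        "args": list(server.get("args", [])),
        "env": safe_env,
    }


# Hypothetical entry, only here to show the shape of the output.
example = {
    "command": "npx",
    "args": ["-y", "firecrawl-mcp"],
    "env": {
        "FIRECRAWL_API_URL": "http://localhost:3002",
        "FIRECRAWL_API_KEY": "not-a-real-key",
    },
}

print(json.dumps(redact_server_config("firecrawl", example), indent=2))
```

That keeps the server name and transport shape visible while masking anything that looks like a credential.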

queen sphinx
#

firecrawl is running within docker, does that usually contain the agent service?

hallow yewBOT
# queen sphinx firecrawl is running within docker, does that usually contain the agent service...

Usually, no. The Docker self-host setup gives you the Firecrawl API/worker/browser-scraping pieces, but Firecrawl’s own docs say self-hosting does not support all cloud endpoints; specifically /agent and /browser are listed as not supported for self-hosting.

So if your MCP scrape tools work but anything named “agent” does not, that fits the expected Firecrawl self-host limitation rather than an OpenClaw problem.

Firecrawl self-host docs:
https://docs.firecrawl.dev/contributing/self-host

For OpenClaw, the reliable path with self-hosted Firecrawl is the scrape/search-style MCP tools you already have working. If a model tries to call a Firecrawl “agent” tool, it may be seeing a tool exposed by the MCP package but the backing Firecrawl Docker API can’t actually fulfill that endpoint.

queen sphinx
#

Is it possible to add a similar agent service, say using LangChain, CrewAI, or Autogen?

hallow yewBOT
# queen sphinx Is it possible to add a similar agent service, say using LangChain, CrewAI, or A...

Yes, but I’d frame it as “add a tool/service OpenClaw can call,” not “bolt another autonomous agent into OpenClaw’s core.”

Practical options:

  • MCP wrapper: best fit. Run your LangChain/CrewAI/AutoGen service separately and expose a small MCP server with tools like research_site, crawl_and_summarize, compare_pages, etc. OpenClaw can then call those tools from sessions.
  • HTTP API + plugin/tool: if you want tighter OpenClaw integration, expose your service over HTTP and write/register an OpenClaw plugin tool around it.
  • External ACP-style agent: possible for heavier agent runtimes, but that’s more moving parts and usually overkill unless you need full bidirectional agent sessions.

I’d avoid exposing a generic “run arbitrary agent task” tool at first. Give it narrow, typed tools with bounded inputs/outputs, otherwise you get slow nested-agent behavior and weird failure modes. For Firecrawl-like use, a focused MCP tool that does “scrape + extract + maybe summarize/structure” is the sane version.
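
To make "narrow, typed, bounded" concrete, here is a rough Python sketch of the input and output contract for a crawl_and_summarize tool. The field names and both limits are assumptions I picked for illustration; it only shows the validation, not the crawling or the MCP wiring.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

MAX_PAGES_LIMIT = 20       # assumed ceiling on pages per call
MAX_SUMMARY_CHARS = 2000   # assumed cap on the returned summary


@dataclass(frozen=True)
class CrawlAndSummarizeInput:
    url: str
    max_pages: int = 5

    def __post_init__(self) -> None:
        # Reject anything that is not an absolute http(s) URL.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"url must be an absolute http(s) URL, got {self.url!r}")
        # Keep the amount of work per call bounded.
        if not 1 <= self.max_pages <= MAX_PAGES_LIMIT:
            raise ValueError(f"max_pages must be between 1 and {MAX_PAGES_LIMIT}")


@dataclass(frozen=True)
class CrawlAndSummarizeOutput:
    url: str
    pages_read: int
    summary: str


def bound_output(url: str, pages_read: int, summary: str) -> CrawlAndSummarizeOutput:
    """Truncate the summary so the caller always gets a predictable size."""
    return CrawlAndSummarizeOutput(url, pages_read, summary[:MAX_SUMMARY_CHARS])
```

A call with max_pages=500 fails fast at the boundary instead of turning into a long nested-agent run.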

Relevant OpenClaw docs:
https://docs.openclaw.ai/plugins/bundles
https://docs.openclaw.ai/tools/slash-commands#mcp-updates
https://docs.openclaw.ai/tools/acp-agents

And version still matters here: check openclaw --version before copying configs from docs or examples.

queen sphinx
#

Would it be possible to add it in to the firewall container instead so that the MCP server thinks it is the firewall agent?

hallow yewBOT
# queen sphinx Would it be possible to add it in to the firewall container instead so that the ...

Assuming “firewall” = “Firecrawl”: I wouldn’t put it inside the Firecrawl container or try to make the MCP server “think” it’s the Firecrawl agent.

Technically you could fork/extend the Docker image, add routes, or run a reverse proxy that answers /agent-style calls. But that’s brittle: you’d be pretending to implement a Firecrawl cloud endpoint that self-hosted Firecrawl explicitly doesn’t support, and updates may break it.

Cleaner pattern:

  • keep Firecrawl as-is for scrape/search
  • run your LangChain/CrewAI/AutoGen service as a sidecar container
  • expose it as its own MCP server/tool, or put a tiny MCP facade in front of both Firecrawl + your agent service

So instead of “Firecrawl agent,” make tools like:

  • firecrawl_scrape
  • research_with_agent
  • crawl_then_extract_entities
  • crawl_then_answer

That gives OpenClaw a real tool contract without spoofing Firecrawl internals. If you want one endpoint for convenience, use Docker Compose networking and an MCP gateway/facade container that calls Firecrawl at http://firecrawl:3002 and your agent at http://agent-service:....
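
A rough Python sketch of the routing part of such a facade, under stated assumptions: the environment variable names and the agent-side paths are hypothetical, and /v1/scrape is assumed to be the scrape path your self-hosted build serves, so check that against your instance. It only resolves a tool name to a backend URL and makes no HTTP calls.

```python
import os

# tool name -> (backend, path). The agent paths are hypothetical placeholders.
ROUTES = {
    "firecrawl_scrape": ("firecrawl", "/v1/scrape"),
    "research_with_agent": ("agent", "/research"),
    "crawl_then_extract_entities": ("agent", "/extract-entities"),
    "crawl_then_answer": ("agent", "/answer"),
}


def resolve(tool: str, bases: dict) -> str:
    """Map an advertised tool name to the backend URL that should serve it."""
    if tool not in ROUTES:
        raise KeyError(f"unknown tool: {tool}")
    backend, path = ROUTES[tool]
    base = bases.get(backend)
    if not base:
        raise RuntimeError(f"no base URL configured for backend {backend!r}")
    return base.rstrip("/") + path


# Base URLs come from the environment; the agent URL has no default on purpose,
# because its port depends on how you deploy the sidecar.
bases = {
    "firecrawl": os.environ.get("FIRECRAWL_BASE_URL", "http://firecrawl:3002"),
    "agent": os.environ.get("AGENT_SERVICE_URL"),
}
print(resolve("firecrawl_scrape", bases))
```

Anything not in the table, such as an "agent" tool the backend cannot serve, is rejected before a request is made.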

queen sphinx
#

Which one of LangChain/CrewAI/AutoGen is better?

hallow yewBOT
# queen sphinx Which one of LangChain/CrewAI/AutoGen is better?

For this use case, I’d pick LangChain/LangGraph first.

Short version:

  • LangChain / LangGraph: best default if you want a reliable service behind MCP. More plumbing, but better control over state, retries, tool calls, structured outputs, and production-ish workflows.
  • CrewAI: nice for quick “team of agents” demos and role-based workflows. I’d use it when the workflow is naturally a few named roles, but I’m less excited about it as a durable backend service.
  • AutoGen: strongest if you specifically want multi-agent conversation/research loops. Powerful, but easier to overbuild and harder to bound cleanly behind a simple MCP tool.

My recommendation: build a small LangGraph service as a sidecar, expose 2-4 narrow MCP tools, and keep Firecrawl doing the crawling. Example tools: crawl_then_summarize, crawl_then_extract_schema, research_topic.

The important bit is not the framework, it’s keeping the MCP contract boring and bounded. OpenClaw should call a clear tool and get a clear result, not supervise a whole second unbounded agent swarm.
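
As a framework-neutral illustration of "bounded", here is a small Python sketch that runs a list of steps under a step budget and a time budget. The function name, the default limits, and the status strings are assumptions, and the steps are plain callables standing in for whatever your sidecar does.

```python
import time
from typing import Callable, Iterable


def run_bounded(
    steps: Iterable[Callable[[], str]],
    max_steps: int = 8,
    timeout_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run steps in order, stopping at a step budget or a time budget."""
    started = clock()
    results = []
    status = "completed"
    for index, step in enumerate(steps):
        if index >= max_steps:
            status = "step_limit"
            break
        # The deadline is only checked between steps; a single slow step is not interrupted.
        if clock() - started > timeout_s:
            status = "timeout"
            break
        results.append(step())
    return {"status": status, "steps_run": len(results), "results": results}


print(run_bounded([lambda: "search", lambda: "scrape", lambda: "summarize"], max_steps=2))
```

The caller always gets a result with an explicit status, even when the work was cut short.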

queen sphinx
#

Does langchain work with proxies?

hallow yewBOT
# queen sphinx Does langchain work with proxies?

Yes. LangChain can work with proxies, but the proxy is usually configured at the underlying client/tool layer, not as one universal LangChain switch.

Common cases:

  • LLM/API calls: set standard env vars like HTTP_PROXY, HTTPS_PROXY, and NO_PROXY, or configure the provider’s SDK/client if it supports explicit proxy settings.
  • Web scraping/fetch tools: configure the scraper/browser/HTTP client directly. For Python this is often requests, httpx, Playwright, Selenium, Firecrawl, etc.
  • Firecrawl specifically: Firecrawl has its own proxy behavior; OpenClaw’s native Firecrawl path uses Firecrawl’s proxy: "auto" behavior for scrape/fetch-style requests.

So yes, a LangGraph/LangChain sidecar can sit behind a corporate proxy, use outbound proxies for model APIs, or call Firecrawl through your Docker network. I’d keep proxy config explicit in the sidecar container env, for example HTTPS_PROXY=... and NO_PROXY=firecrawl,localhost,127.0.0.1, so local Docker calls don’t accidentally route out through the proxy.
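
To sanity-check a NO_PROXY value before deploying, a simplified Python sketch like this can help. It is my own approximation (exact host match or domain-suffix match, port ignored, no IPv6 handling); real HTTP clients each have their own matching rules, so treat it as a rough check only.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Simplified NO_PROXY check: exact host match or domain-suffix match."""
    host = host.lower().split(":")[0]  # drop a trailing :port (not IPv6-safe)
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return True
    return False


NO_PROXY = "firecrawl,localhost,127.0.0.1"

for host in ("firecrawl:3002", "localhost", "api.example.com"):
    route = "direct" if bypasses_proxy(host, NO_PROXY) else "via proxy"
    print(f"{host}: {route}")
```

With that value, calls to the firecrawl container stay on the Docker network and only external hosts go through the proxy.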

queen sphinx
#

is there any overlap between firecrawl and langchain?

hallow yewBOT
# queen sphinx is there any overlap between firecrawl and langchain?

A little overlap, but they sit at different layers.

Firecrawl is mainly a web data/extraction service:

  • scrape pages
  • search/crawl sites
  • handle JS-heavy or bot-hostile pages
  • return markdown/text/structured page content

LangChain/LangGraph is mainly an orchestration framework:

  • call LLMs
  • route between tools
  • keep workflow state
  • retry/fallback
  • turn scraped content into summaries, answers, structured JSON, reports, etc.

The overlap is “web research”: LangChain can use loaders/tools to fetch webpages, and Firecrawl has higher-level extraction/search features. But I’d still use them together rather than choose one to replace the other.

queen sphinx
#

I have an openclaw agent that searches job boards for leads, it's working fine just using firecrawl MCP, would there be a benefit of adding langchain too?

hallow yewBOT
# queen sphinx I have an openclaw agent that searches job boards for leads, it's working fine j...

If it’s already working well with Firecrawl MCP, I probably would not add LangChain yet.

LangChain/LangGraph helps when you need workflow structure that OpenClaw + Firecrawl alone is not handling cleanly, for example:

  • deduping leads across runs with more explicit state
  • scoring/ranking leads with a repeatable pipeline
  • extracting a strict schema from messy job posts
  • multi-step enrichment, like job post -> company site -> LinkedIn/about page -> fit score
  • retries/fallbacks when one board blocks or returns weak data
  • producing a normalized output for a CRM/database

If the current OpenClaw agent is: search job boards -> scrape postings -> summarize/filter leads, Firecrawl MCP is enough. Adding LangChain now would mostly add another service to run, debug, and secure.

My rule of thumb: wait until you can name the pain. If the pain is “the workflow has too many conditional steps/state/retries,” add LangGraph. If the pain is “I need better scraping,” keep improving Firecrawl/proxies/selectors/prompts first.
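
For scale, the first item on that list (deduping leads across runs) does not need a framework. Here is a minimal Python sketch, assuming each lead is a dict with company, title, and url fields and that a small JSON file is acceptable as the state store; both are assumptions about your data, not something I know about your agent.

```python
import hashlib
import json
from pathlib import Path


def lead_key(lead: dict) -> str:
    """Stable key built from normalized company, title, and URL."""
    parts = [str(lead.get(field, "")).strip().lower() for field in ("company", "title", "url")]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def filter_new_leads(leads: list[dict], seen_path: Path) -> list[dict]:
    """Return only leads not seen in earlier runs, then persist the updated key set."""
    seen = set(json.loads(seen_path.read_text())) if seen_path.exists() else set()
    fresh = []
    for lead in leads:
        key = lead_key(lead)
        if key not in seen:
            seen.add(key)
            fresh.append(lead)
    seen_path.write_text(json.dumps(sorted(seen)))
    return fresh
```

If something this small covers the pain, that is a sign LangGraph can wait.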

queen sphinx
#

Would it help me get past job boards that are blocking?

hallow yewBOT
# queen sphinx Would it help me get past job boards that are blocking?

Not really. LangChain won’t magically get past blocks; it only orchestrates steps. The blocking happens at the scraping/browser/network layer.

For job boards, I’d be careful here: if a site is intentionally blocking scraping or its terms disallow it, don’t try to bypass that. Better options:

  • use official APIs, RSS feeds, job board export/search alerts, or partner feeds where available
  • reduce request rate, cache aggressively, and avoid repeated scraping of the same pages
  • target company career pages directly when they permit crawling
  • use Firecrawl/browser tooling for pages that are just JS-heavy or flaky, not for evading access controls
  • keep a denylist for boards that block or prohibit automated access

Firecrawl can help with normal extraction pain, but LangChain mostly helps after content is available: dedupe, rank, enrich, and structure leads. If “blocked by the board” is the main issue, adding LangChain is the wrong lever.
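
The rate-limit, cache, and denylist points can be sketched in a few lines of Python. This is a minimal illustration with assumed names and an assumed 10-second default interval; it only decides what to do with a URL and leaves the actual fetching to whatever tool you already use.

```python
import time
from urllib.parse import urlparse


class FetchPolicy:
    """Decide whether a URL should be skipped, served from cache, delayed, or fetched."""

    def __init__(self, denylist, min_interval_s=10.0, clock=time.monotonic):
        self.denylist = {domain.lower() for domain in denylist}
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_fetch = {}  # host -> time of the last allowed fetch
        self.cache = {}       # url -> previously fetched content

    def decide(self, url: str) -> str:
        host = (urlparse(url).hostname or "").lower()
        if host in self.denylist:
            return "skip_denied"   # board blocks or prohibits automated access
        if url in self.cache:
            return "use_cache"     # avoid re-scraping the same page
        now = self.clock()
        last = self.last_fetch.get(host)
        if last is not None and now - last < self.min_interval_s:
            return "wait"          # too soon since the last request to this host
        self.last_fetch[host] = now
        return "fetch"

    def store(self, url: str, content: str) -> None:
        self.cache[url] = content
```

The agent would call decide() before each scrape and store() after a successful one.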

queen sphinx
#

ok, though for some reason the agent decided to use the firecrawl agent service, I'm just trying to understand why and see whether there is an equivalent I can run in docker

hallow yewBOT
# queen sphinx ok, though for some reason the agent decided to use the firecrawl agent service,...

That makes sense. The likely reason is: the model saw an available MCP tool that looked semantically useful, probably something like an “agent”/deep-research/browser-style Firecrawl tool, so it chose it. OpenClaw doesn’t necessarily know that your self-hosted Firecrawl backend can’t satisfy that endpoint; it just sees the tool schema exposed by the MCP server.

So the fix is usually on the MCP/tool exposure side:

  • if the Firecrawl MCP server lets you choose tools, disable/hide the agent/browser tool
  • expose only the tools your Docker Firecrawl actually supports, like scrape/search/crawl/map/extract depending on your MCP package
  • if needed, put a tiny wrapper MCP server in front of Firecrawl that only advertises the supported tools
  • add agent instructions like “use Firecrawl scrape/search only; do not use Firecrawl agent/browser tools”

As for an equivalent Docker service: yes, you can run a sidecar service, but I’d name it separately rather than pretend it is Firecrawl’s cloud /agent endpoint. A LangGraph sidecar behind MCP is the clean version if you want an “agentic research” tool. But it won’t be a drop-in Firecrawl-agent clone unless you implement the same API/tool contract.

The main thing I’d check next is the exact tool list your MCP server is exposing. If you can share the names only, no secrets, we can tell which one is baiting the agent into the unsupported path.
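
Once you have that list, the wrapper idea boils down to an allowlist filter. A minimal Python sketch, assuming tools arrive as dicts with a name field and that the names below match what your MCP package advertises (the set is an assumption, adjust it to your actual list):

```python
# Tools the self-hosted backend is assumed to support; adjust to your real tool list.
SUPPORTED = {
    "firecrawl_scrape",
    "firecrawl_search",
    "firecrawl_crawl",
    "firecrawl_map",
    "firecrawl_extract",
}


def advertised_tools(upstream_tools: list[dict], supported: set[str]) -> list[dict]:
    """Keep only the tools the backend can actually serve."""
    return [tool for tool in upstream_tools if tool.get("name") in supported]


def hidden_tools(upstream_tools: list[dict], supported: set[str]) -> list[str]:
    """Names that would be dropped, useful for a one-time sanity check."""
    return [tool["name"] for tool in upstream_tools if tool.get("name") not in supported]


upstream = [
    {"name": "firecrawl_scrape"},
    {"name": "firecrawl_search"},
    {"name": "firecrawl_agent"},
]
print([tool["name"] for tool in advertised_tools(upstream, SUPPORTED)])
print(hidden_tools(upstream, SUPPORTED))
```

A wrapper MCP server would apply the same filter to the list it reports to OpenClaw.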

queen sphinx
#

Yes, I can run as a sidecar but first I'm trying to understand if there is an equivalent of the firecrawl agent service

hallow yewBOT
# queen sphinx Yes, I can run as a sidecar but first I'm trying to understand if there is an eq...

Yes — there is now an official-ish equivalent to look at: Firecrawl web-agent.

Important distinction:

  • hosted Firecrawl /agent / FIRE-1 is Firecrawl’s cloud agent capability
  • Firecrawl’s normal self-host Docker docs still say /agent and /browser are not supported in the standard self-hosted Firecrawl API
  • but Firecrawl has released an open-source web-agent stack that is meant to be run/customized yourself

Blog:
https://www.firecrawl.dev/blog/firecrawl-agent-open-source

Repo:
https://github.com/firecrawl/web-agent

Caveat from their own blog: it says this is not a port of the hosted /agent, it’s a lighter foundation, and it still uses Firecrawl’s hosted scrape/search/interact endpoints for web access. So it may not solve “I want everything fully local behind my self-hosted Firecrawl Docker,” but it is the closest equivalent to the Firecrawl agent service.

For your setup, I’d compare two paths:

queen sphinx
#

ok so what does Firecrawl web-agent sidecar add that my firecrawl does not already have

hallow yewBOT
# queen sphinx ok so what does Firecrawl web-agent sidecar add that my firecrawl does not alrea...

Firecrawl itself is the web data layer. It can search, scrape, crawl, map, extract, and sometimes interact with pages depending on endpoint/support.

Firecrawl web-agent adds the agent/workflow layer on top:

  • decides a plan for a research task
  • breaks work into steps or parallel sub-tasks
  • calls Firecrawl search/scrape/interact as tools
  • uses an LLM to reason over intermediate results
  • applies reusable playbooks/skills for a domain
  • returns structured output from a multi-step job

So for your job-board lead agent:

Firecrawl MCP alone:
search job board -> scrape pages -> return content

Firecrawl web-agent sidecar:

given a lead-finding goal -> plan searches -> fan out across boards/companies -> scrape/enrich -> dedupe/rank/structure -> return leads

But if OpenClaw is already doing that orchestration well, web-agent may not add much. It mainly helps if you want that whole research loop packaged as one external service/tool, or if you want reusable Firecrawl-specific agent workflows outside OpenClaw too.

One caveat: from Firecrawl’s own blog, web-agent still relies on Firecrawl hosted web endpoints for scrape/search/interact, so it is not simply “extra missing features inside your self-hosted Firecrawl Docker.”

queen sphinx
#

ok, yes, openclaw is already doing that

hallow yewBOT
# queen sphinx ok, yes, openclaw is already doing that

Then I’d keep the stack as-is.

If OpenClaw is already planning, searching, scraping, filtering, and ranking leads, adding Firecrawl web-agent or LangChain would mostly duplicate the orchestration layer. The useful change is narrower: make sure the Firecrawl MCP server only exposes tools your backend supports, so OpenClaw stops choosing the unsupported “agent” path.

So the next practical move is: hide/disable the Firecrawl agent/browser tool from MCP, or add an instruction like “use Firecrawl scrape/search/crawl only; do not use Firecrawl agent.”

queen sphinx
#

how would I hide the agent

hallow yewBOT
# queen sphinx how would I hide the agent

There are two levels.

Best: hide it at the MCP server
If you’re using the official firecrawl-mcp, its README lists the tool as firecrawl_agent / “agent”, alongside firecrawl_scrape, firecrawl_search, etc. I didn’t see a documented env var in the README for “disable only agent”, so if the server itself has no tool filter, the clean fix is a small wrapper/fork that only registers the supported tools.

OpenClaw-side filter
OpenClaw can filter MCP tools by exposed tool name. Bundle MCP tools are named like:

serverName__toolName

So if your MCP server is named firecrawl, the exposed tool is probably:

firecrawl__firecrawl_agent

Then deny that one tool in config, roughly:

{
  tools: {
    deny: ["firecrawl__firecrawl_agent"]
  }
}

If the server is named firecrawl-mcp, the prefix may be sanitized, so the name may be closer to:

{
  tools: {
    deny: ["firecrawl-mcp__firecrawl_agent"]
  }
}
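
To see which exposed names a deny list would actually catch, here is a tiny Python sketch of the serverName__toolName convention described above. It deliberately does not model any prefix sanitization, since I don't know the exact rule, so treat the output as a guess to verify against your logs.

```python
def exposed_tool_name(server: str, tool: str) -> str:
    """Build the bundle MCP tool name as serverName__toolName (no sanitization modeled)."""
    return f"{server}__{tool}"


def visible_tools(server: str, tools: list[str], deny: list[str]) -> list[str]:
    """Exposed names that remain after applying the deny list."""
    denied = set(deny)
    return [name for name in (exposed_tool_name(server, t) for t in tools) if name not in denied]


tools = ["firecrawl_scrape", "firecrawl_search", "firecrawl_agent"]
print(visible_tools("firecrawl", tools, deny=["firecrawl__firecrawl_agent"]))
```

If the agent tool still shows up after the deny entry, the real exposed name probably differs from this guess.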
queen sphinx
#

This is the current config

  "firecrawl": {
    "command": "npx",
    "args": [
      "-y",
      "firecrawl-mcp"
    ],
    "env": {
      "FIRECRAWL_API_URL": "http://beelink:3002",
      "FIRECRAWL_API_KEY": "local-bypass",
      "FIRECRAWL_SCRAPE_OPTIONS": "{\"onlyMainContent\":true,\"formats\":[\"markdown\"],\">
      "FIRECRAWL_TIMEOUT": "90000"
    },
    "enabled": true
  },
hallow yewBOT
# queen sphinx This is the current config "firecrawl": { "command": "npx", ...

That config explains it: your MCP server name is firecrawl, so OpenClaw should expose the MCP tools with a firecrawl__... prefix. The tool to hide is most likely:

firecrawl__firecrawl_agent

Add a tool deny entry at the OpenClaw tool-policy level, not inside the MCP server block:

{
  mcp: {
    servers: {
      firecrawl: {
        command: "npx",
        args: ["-y", "firecrawl-mcp"],
        env: {
          FIRECRAWL_API_URL: "http://beelink:3002",
          FIRECRAWL_API_KEY: "...",
          FIRECRAWL_TIMEOUT: "90000"
        },
        enabled: true
      }
    }
  },
  tools: {
    deny: ["firecrawl__firecrawl_agent"]
  }
}

Two things I notice in the pasted snippet:

  • FIRECRAWL_SCRAPE_OPTIONS looks malformed/truncated: it ends with ,"> and appears to be missing a closing quote/comma. If that’s really in the config, fix that first.
  • Don’t paste real API keys here. If local-bypass is just a local dummy value, fine; if it’s meaningful, rotate it.
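
A quick way to check the first point is to parse the env value as JSON before starting the server. This is a minimal Python sketch; the sample strings are only the visible part of your pasted value (I'm not guessing at the truncated remainder), and which env keys hold JSON is an assumption.

```python
import json


def check_json_env(env: dict, json_keys=("FIRECRAWL_SCRAPE_OPTIONS",)) -> dict:
    """Return {key: error message} for env values that should be JSON but do not parse."""
    problems = {}
    for key in json_keys:
        if key not in env:
            continue
        try:
            json.loads(env[key])
        except json.JSONDecodeError as exc:
            problems[key] = f"{exc.msg} at position {exc.pos}"
    return problems


# Well-formed value built from the visible part of the pasted setting.
good = {"FIRECRAWL_SCRAPE_OPTIONS": '{"onlyMainContent": true, "formats": ["markdown"]}'}
# Same value cut off after the trailing comma, to show what a truncated string does.
bad = {"FIRECRAWL_SCRAPE_OPTIONS": '{"onlyMainContent": true, "formats": ["markdown"],'}

print(check_json_env(good))
print(check_json_env(bad))
```

An empty result means the value parses; anything else tells you roughly where the string breaks.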