#firecrawl_search fails with "Firecrawl baseUrl must use https" with self-hosted Firecrawl server

plucky chasm
#

I feel like I've heard of people using a self-hosted Firecrawl server, or other search services, with OpenClaw, but I'm running into issues when configuring mine. Note: I am using OpenClaw from within a NemoClaw sandbox, which is why I have to use the host.openshell.internal URL.

OpenClaw 2026.4.8 within a NemoClaw v0.0.9 sandbox

firecrawl_search fails with a "Firecrawl baseUrl must use https" error when configured with "baseUrl": "http://host.openshell.internal:3002/v1"

I'm trying to get OpenClaw to use my self-hosted Firecrawl server for web searches, but it fails regardless of whether I use http or https:

  1. Firecrawl service is up and running on the same host machine
  2. Configure the web search plugin to use the local Firecrawl service over http (http://host.openshell.internal:3002/v1)
  "tools": {
    "web": {
      "search": {
        "provider": "firecrawl"
      },
      "fetch": {
        "firecrawl": {
          "apiKey": "dummy",
          "baseUrl": "http://host.openshell.internal:3002/v1",
          "onlyMainContent": true,
          "maxAgeMs": 86400000,
          "timeoutSeconds": 60
        }
      }
    }
  },
  "plugins": {
    "entries": {
      "firecrawl": {
        "enabled": true,
        "config": {
          "webSearch": {
            "apiKey": "dummy",
            "baseUrl": "http://host.openshell.internal:3002/v1"
          }
        }
      }
    }
  }
  3. Ask OpenClaw to do a basic web search; it fails with the "Firecrawl baseUrl must use https" error
  4. Try again with the Firecrawl service behind an https reverse proxy (https://host.openshell.internal:444/v1); this time it fails with the "Firecrawl baseUrl host is not allowed: host.openshell.internal" error

stoic junco BOT
#

You’re not misconfiguring it; current OpenClaw blocks that on purpose.

In 2026.4.8, the bundled Firecrawl integration does not support self-hosted Firecrawl endpoints:

  • it requires https
  • it also hard-allowlists only api.firecrawl.dev

So these two errors line up exactly with the current code:

  • http://host.openshell.internal:3002/v1 → Firecrawl baseUrl must use https
  • https://host.openshell.internal:444/v1 → Firecrawl baseUrl host is not allowed: host.openshell.internal

The docs now say the same thing. Relevant bit: Firecrawl baseUrl overrides must stay on https://api.firecrawl.dev, and scrape/search base URLs are restricted to that host.


So, short version:

  1. Hosted Firecrawl works
  2. Self-hosted Firecrawl does not work with the built-in Firecrawl provider right now
  3. A reverse proxy does not help unless the hostname is still api.firecrawl.dev, because hostname validation happens before the request is sent
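
To make the order of those checks concrete, here is a minimal sketch of the validation as described in this thread. It is an illustration, not OpenClaw's actual source: the function names `validateFirecrawlBaseUrl` and `describe` are hypothetical, and only the allowlist constant and the two error strings come from the thread itself.

```typescript
// Sketch only: mirrors the two checks described in this thread, not OpenClaw's real code.
// validateFirecrawlBaseUrl and describe are hypothetical names used for illustration.
const ALLOWED_FIRECRAWL_HOSTS = new Set(["api.firecrawl.dev"]);

function validateFirecrawlBaseUrl(baseUrl: string): URL {
  const url = new URL(baseUrl);
  // Check 1: the scheme must be https, so any http:// URL is rejected here first.
  if (url.protocol !== "https:") {
    throw new Error("Firecrawl baseUrl must use https");
  }
  // Check 2: the hostname must be on the allowlist; port and path are not considered.
  if (!ALLOWED_FIRECRAWL_HOSTS.has(url.hostname)) {
    throw new Error(`Firecrawl baseUrl host is not allowed: ${url.hostname}`);
  }
  return url;
}

// Returns "ok" or the error message, so the two failure cases are easy to compare.
function describe(baseUrl: string): string {
  try {
    validateFirecrawlBaseUrl(baseUrl);
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}
```

Under these assumptions, `describe("http://host.openshell.internal:3002/v1")` yields the https error and `describe("https://host.openshell.internal:444/v1")` yields the host-not-allowed error. Both checks run on the configured URL before any request is made, which is why a reverse proxy alone changes nothing.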

If you want a self-hosted search provider today, the supported option is SearXNG.

If you need self-hosted Firecrawl specifically, that’s a GitHub issue / feature request.

If you want, I can help you convert your current config to a SearXNG setup for NemoClaw.

plucky chasm
#

@stoic junco I will try out the SearXNG option for search. But I was hoping to use Firecrawl for its scraping functionality, so that, for example, after performing a web search with SearXNG, OpenClaw would be able to get each page's content in markdown format from my self-hosted Firecrawl instance. What would be the best way to accomplish this?

stoic junco BOT
#

Best path today:

  • Search and fetch are separate knobs, so yes, you can do SearXNG for search and still use web_fetch on the result URLs.
  • web_fetch already returns markdown by default, so if your goal is just “search result → page content in markdown”, you may not need Firecrawl for many sites at all.

The catch is this part:

  • Built-in Firecrawl fetch/scrape is also locked to https://api.firecrawl.dev
  • so SearXNG + self-hosted Firecrawl is not possible via config only in 2026.4.8

So your options are basically:

  1. Simple / supported
    • web_search = searxng
    • web_fetch = default Readability/basic fetch
    • good for normal pages, markdown works
  2. Supported mixed setup

  3. Fully self-hosted, cleanest real solution

So if you specifically want self-hosted Firecrawl markdown extraction, the honest answer is: you need a custom plugin or a code patch right now.

If you want, I can sketch the minimal plugin shape for:

  • SearXNG for search
  • local Firecrawl for fetch/scrape
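
As a rough idea of what the fetch/scrape half of such a plugin or patch would have to do, here is a standalone sketch that calls a self-hosted Firecrawl instance directly. It deliberately does not use any OpenClaw plugin API, since that API is not shown in this thread; `buildScrapeRequest` and `scrapeToMarkdown` are hypothetical helper names. It assumes the instance exposes Firecrawl's v1 `POST /scrape` endpoint under the same `/v1` base URL used in the config earlier in the thread.

```typescript
// Sketch only: a standalone helper, not an OpenClaw plugin. All names are hypothetical.
interface ScrapeRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildScrapeRequest(baseUrl: string, apiKey: string, pageUrl: string): ScrapeRequest {
  // baseUrl is assumed to already include the /v1 prefix; trailing slashes are trimmed.
  const endpoint = `${baseUrl.replace(/\/+$/, "")}/scrape`;
  return {
    endpoint,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Ask for markdown output of the main page content only.
      body: JSON.stringify({ url: pageUrl, formats: ["markdown"], onlyMainContent: true }),
    },
  };
}

async function scrapeToMarkdown(baseUrl: string, apiKey: string, pageUrl: string): Promise<string> {
  const { endpoint, init } = buildScrapeRequest(baseUrl, apiKey, pageUrl);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`Firecrawl scrape failed: HTTP ${res.status}`);
  }
  const json = (await res.json()) as { success?: boolean; data?: { markdown?: string } };
  if (!json.success || typeof json.data?.markdown !== "string") {
    throw new Error("Firecrawl scrape returned no markdown");
  }
  return json.data.markdown;
}
```

A call such as `scrapeToMarkdown("http://host.openshell.internal:3002/v1", "dummy", resultUrl)` would then be run once per search result URL. How this gets wired into OpenClaw as a tool is exactly the part that needs the custom plugin or code patch.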
plucky chasm
#

I filed the inability to use a locally hosted Firecrawl instance as an issue, and it looks like it is being addressed: https://github.com/openclaw/openclaw/pull/63941

@stoic junco for the SearXNG configuration, can OpenClaw be configured to use http for my local SearXNG server? Or will it also complain and require https like it did for Firecrawl?

stoic junco BOT
#

Yep, SearXNG can use http:// for a local/private instance.

Unlike Firecrawl, it does not hardcode a single allowed host. The rule is:

  • https:// is always fine
  • http:// is allowed only if the host resolves to a trusted private or loopback address

So for something like:

{
  tools: {
    web: {
      search: {
        provider: "searxng",
      },
    },
  },
  plugins: {
    entries: {
      searxng: {
        config: {
          webSearch: {
            baseUrl: "http://host.openshell.internal:8080",
          },
        },
      },
    },
  },
}

that should work if host.openshell.internal resolves inside NemoClaw to a private/loopback IP.

Docs: https://docs.openclaw.ai/tools/searxng-search

If it fails, the error should be more like:

SearXNG HTTP base URL must target a trusted private or loopback host

So the quick answer is:

  • Firecrawl: no, currently locked down
  • SearXNG: yes, http:// is supported for local/private hosts
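
For illustration, the SearXNG rule above can be sketched like this. It is not OpenClaw's actual code: both function names are hypothetical, and "trusted private or loopback" is assumed here to mean IPv4 loopback plus the RFC 1918 private ranges. The real check may cover more (for example IPv6). The resolved IP is passed in explicitly so the sketch stays free of DNS lookups.

```typescript
// Sketch only: illustrates the rule as described in this thread, not OpenClaw's real code.
// Assumption: "trusted" means IPv4 loopback (127.0.0.0/8) or an RFC 1918 private range.
function isPrivateOrLoopbackIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false;
  }
  const octets = parts.map(Number);
  if (octets.some((o) => o > 255)) {
    return false;
  }
  const [a, b] = octets;
  return (
    a === 127 || // 127.0.0.0/8 loopback
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

// resolvedIp is whatever the hostname resolves to inside the sandbox (looked up separately).
function checkSearxngBaseUrl(baseUrl: string, resolvedIp: string): void {
  const url = new URL(baseUrl);
  if (url.protocol === "https:") {
    return; // https:// is always fine
  }
  if (url.protocol === "http:" && isPrivateOrLoopbackIPv4(resolvedIp)) {
    return; // http:// is allowed for private or loopback targets
  }
  throw new Error("SearXNG HTTP base URL must target a trusted private or loopback host");
}
```

So whether `http://host.openshell.internal:8080` passes depends entirely on what that name resolves to inside the sandbox. Running something like `getent hosts host.openshell.internal` there would show the address to compare against these ranges.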

Also, nice, glad there’s already a PR moving on the Firecrawl bit 🙂

wheat shadow
#

Has self-hosted Firecrawl been fixed to work with OpenClaw yet?

stoic junco BOT
#

Not yet, at least not in a released build.

I checked current main, and it still has the Firecrawl restriction hardcoded, so:

  • 2026.4.8: definitely no
  • current main right now: also still no

That PR exists, but it looks open, not merged yet.

So the practical answer is:

  • if you need it today, you’d need a custom/patched build
  • if you want the normal path, you’ll need to wait for that PR (or equivalent fix) to land and then update to a newer release

Once it merges, I’d expect this thread’s workaround advice to change.

wheat shadow
#

How about now? The GitHub PR just closed.

stoic junco BOT
#

Still not yet, unfortunately.

I checked GitHub main directly just now, and it still has the old restriction:

  • ALLOWED_FIRECRAWL_HOSTS = new Set(["api.firecrawl.dev"])
  • docs still say baseUrl must stay on https://api.firecrawl.dev

So even though that PR is now closed, the change is not on main right now and it’s not in the released docs/config behavior yet.

So the status right now is:

  • released OpenClaw: no
  • current main: also no

That usually means the PR was either:

  • closed without merge, or
  • replaced by a different follow-up that hasn’t landed yet