# Skip JS rendering and get raw content

ruby sandal

Hi, it seems like Firecrawl cloud always runs JS rendering. Is there a way to skip/disable it per request?

Also, if an XML/RSS document URL is requested with formats set to only [ "rawHtml" ], the response content is wrapped in HTML instead of being returned as raw XML. For example:

```json
{
  "rawHtml": "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"><meta name=\"color-scheme\" content=\"light dark\"></head><body><pre style=\"word-wrap: break-word; white-space: pre-wrap;\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;\n</pre></body></html>"
}
```

Is there a way to return the raw XML instead of HTML? If not, what's the best way to extract the (decoded) XML from the returned HTML?

Thank you.

sick crag
onyx solstice
ruby sandal
pulsar peak (BOT)

If you want Firecrawl to do a fresh scrape, just pass maxAge=0.

Our cache is global, but it has nothing to do with fastMode.
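A minimal sketch of such a fresh-scrape request, using only Python's standard library. The endpoint URL (`https://api.firecrawl.dev/v1/scrape`) and the `Authorization: Bearer` header are assumptions based on Firecrawl's public API, so check the API reference for the version you use; `build_scrape_request` is a hypothetical helper name. It only builds the request; passing the result to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request

# Assumption: v1 scrape endpoint; confirm against the current API reference.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a scrape request that bypasses the cache."""
    body = {
        "url": url,
        "formats": ["rawHtml"],  # only the raw HTML, as in the original question
        "maxAge": 0,             # 0: never accept a cached copy, always scrape fresh
    }
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For example, `build_scrape_request("https://example.com/feed.xml", "fc-YOUR-KEY")` yields a POST request whose JSON body carries `"maxAge": 0`.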

barren drift

Hi @ruby sandal! Firecrawl uses rendering to handle dynamic sites, and there isn’t a documented option to disable rendering. For a fully static site, you can just fetch the page yourself with a simple HTTP request.
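For the plain-HTTP route, a minimal sketch with Python's standard-library `urllib` might look like this. `fetch_raw` and the User-Agent string are hypothetical placeholders. Because no browser is involved, an XML/RSS feed comes back as exactly the bytes the server sent, so there is no HTML wrapper to strip.

```python
import urllib.request


def fetch_raw(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL with a plain GET and return the body bytes untouched.

    No JavaScript runs, so an XML/RSS feed is returned as the server sent it.
    """
    # Hypothetical User-Agent; some servers reject requests without one.
    req = urllib.request.Request(url, headers={"User-Agent": "raw-fetch-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Returning bytes rather than text leaves decoding to the XML parser: `xml.etree.ElementTree.fromstring(fetch_raw(feed_url))` reads the encoding from the XML declaration itself.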

If the XML is wrapped inside HTML, parse the HTML first and extract the XML from the element that contains it (the <pre> element in your example). In Node.js, Cheerio is a good choice for that.
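The thread suggests Cheerio in Node.js; this sketch swaps in Python's standard-library `html.parser` to do the same job. It assumes the XML sits in the first `<pre>` element, as in the example response in the question. `HTMLParser` with `convert_charrefs=True` decodes entities such as `&lt;` and `&gt;` before passing text on, so the result is the original XML. `extract_xml_from_raw_html` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _PreTextExtractor(HTMLParser):
    """Collect the text of the first <pre> element, with entities decoded."""

    def __init__(self) -> None:
        # convert_charrefs=True turns &lt; &gt; &quot; ... into < > " ...
        super().__init__(convert_charrefs=True)
        self._inside = False  # True while inside the first <pre>
        self._done = False    # True once that <pre> has closed
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "pre" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_xml_from_raw_html(raw_html: str) -> str:
    """Return the decoded text of the first <pre>, or "" if there is none."""
    parser = _PreTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.chunks)
```

Applied to the `rawHtml` string from the question, this returns `<?xml version="1.0" encoding="UTF-8"?>` followed by a newline, ready for an XML parser.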

ruby sandal
> onyx solstice: Hey! I also noticed another way to do the same. By setting waitFor parameter to ...

Thanks, it looks like the default value of waitFor is 0 (see https://docs.firecrawl.dev/api-reference/endpoint/scrape#body-wait-for), so it won't wait anyway, which is fine. But if the default wait time is zero, why does the website say "Firecrawl intelligently waits for content to load" (see https://www.firecrawl.dev/#:~:text=Firecrawl%20intelligently%20waits%20for%20content%20to%20load)? Ideally it would wait for an event such as DOMContentLoaded and network idle (as in Puppeteer). Any clarification on this?
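To make the contrast with a fixed delay concrete: Puppeteer's `networkidle0` and `networkidle2` options treat navigation as finished once there have been no more than 0 (or 2) network connections for at least 500 ms. The toy function below, `network_idle_time` (a hypothetical name), replays hypothetical request start/end times in milliseconds and returns when such an idle condition would first be met. It only illustrates the heuristic; it is not a description of how Firecrawl's own waiting works.

```python
def network_idle_time(
    requests: list[tuple[float, float]],
    max_inflight: int = 0,
    idle_ms: float = 500,
) -> float:
    """Return the time (ms) at which a 'network idle' wait would resolve.

    The wait resolves once at most `max_inflight` requests have been open
    for `idle_ms` consecutive milliseconds (0 ~ networkidle0, 2 ~ networkidle2).
    """
    events = []
    for start, end in requests:
        events.append((start, +1))  # request opens
        events.append((end, -1))    # request closes
    # Tuples sort by time, then -1 before +1, so closes win ties with opens.
    events.sort()

    inflight = 0
    quiet_since = 0.0  # the page starts at t=0 with nothing in flight
    for t, delta in events:
        # Already quiet long enough before this event? Then the wait resolved.
        if inflight <= max_inflight and t - quiet_since >= idle_ms:
            return quiet_since + idle_ms
        was_quiet = inflight <= max_inflight
        inflight += delta
        if inflight <= max_inflight and not was_quiet:
            quiet_since = t  # a new quiet period starts now
    return quiet_since + idle_ms
```

For example, requests spanning (0, 100), (50, 300) and (600, 700) ms resolve an idle-0 wait at 1200 ms (500 ms after the last request ends), but an idle-2 wait already at 500 ms.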

pulsar peak (BOT)

Oh, waitFor just adds extra wait time. We already have a "smart wait" feature on our side.

ruby sandal

Thanks for clarifying. Maybe the docs could mention that waitFor adds extra wait time on top of the "smart wait" time.