#Getting a different page when scraping vs. loading the link in my browser

short scarab
#

When I scrape a link with Firecrawl (for this domain, anyway), I get back a relatively simple page, probably one that is server-side rendered for bots/scrapers, with only some of the information on it. When I load the same link myself, I get the full page.

Any way I can configure Firecrawl so it will be served the full page containing all the information I need on it?

The link in question.

Attached is the HTML I get back. It is very short and definitely not what I see in my browser, which leads me to believe Firecrawl was served a page meant for scrapers that contains only the important info, likely for social sharing etc.

Thanks in advance!

quick cradleBOT
#

Can you share your full request parameters?

short scarab
#

Yup! On it

#
const options: ScrapeOptions = {
    formats: [
        {
            type: 'json',
            schema: scrapeSchema,
            prompt,
        },
        'html',
    ],
    actions: [
        {
            type: 'wait',
            selector: 'div.header__content',
        },
    ],
    waitFor: 2_500,
};

firecrawl.scrape(scrapeLink, options);
#

Oh sorry, but with proxy: 'stealth' as well.

#

I had removed it to try something else.

#
const options: ScrapeOptions = {
    formats: [
        {
            type: 'json',
            schema: scrapeSchema,
            prompt,
        },
        'html',
    ],
    actions: [
        {
            type: 'wait',
            selector: 'div.header__content',
        },
    ],
    waitFor: 2_500,
    proxy: 'stealth',
};
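
One quick way to confirm which variant of the page came back is to check the returned HTML for something that only exists on the full page, such as the `header__content` class the wait action above targets. This is a minimal sketch, assuming the scrape result gives you the HTML as a string. `looksLikeFullPage` and both sample strings are hypothetical, and the substring check is deliberately crude rather than a real HTML parse.

```typescript
// Hypothetical helper: returns true only if every marker string
// appears somewhere in the returned HTML.
function looksLikeFullPage(html: string, markers: string[]): boolean {
    return markers.every((marker) => html.indexOf(marker) !== -1);
}

// Made-up sample documents standing in for the two variants.
const strippedHtml =
    '<html><head><meta property="og:title" content="Example"></head><body></body></html>';
const fullHtml =
    '<html><body><div class="header__content">Example</div></body></html>';

console.log(looksLikeFullPage(strippedHtml, ['header__content']));
console.log(looksLikeFullPage(fullHtml, ['header__content']));
```

Running a check like this on each scrape result makes it easier to tell whether a change to the options (for example toggling `proxy: 'stealth'`) actually changed which page was served.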
short scarab
#

Hey @mild wren, sorry to bug you, but any ideas about where it's saying the engine doesn't support it?

mild wren
#

When scraping, we try a bunch of different scraping engines, and not all of them support all parameters. You can safely ignore the warning.
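
As a rough illustration of why such a warning can show up even when the scrape succeeds, here is a minimal sketch of a fallback loop over engines. It is not Firecrawl's actual implementation: the `Engine` type, the engine names, and `scrapeWithFallback` are all hypothetical, and the engines here are stubs that return fixed values.

```typescript
// Hypothetical sketch only: NOT Firecrawl's real engine list or internals.
type Params = { [key: string]: unknown };

interface Engine {
    name: string;
    supported: string[];                       // parameter names this engine understands
    run: (params: Params) => string | null;    // returns HTML, or null on failure
}

interface FallbackResult {
    html: string | null;
    engine: string | null;
    warnings: string[];
}

function scrapeWithFallback(engines: Engine[], params: Params): FallbackResult {
    const warnings: string[] = [];
    for (const engine of engines) {
        const usable: Params = {};
        for (const key of Object.keys(params)) {
            if (engine.supported.indexOf(key) !== -1) {
                usable[key] = params[key];
            } else {
                // The parameter is dropped for this engine and a warning is recorded.
                warnings.push(`engine ${engine.name} does not support "${key}"`);
            }
        }
        const html = engine.run(usable);
        if (html !== null) {
            return { html, engine: engine.name, warnings };
        }
    }
    return { html: null, engine: null, warnings };
}

// Stub engines: the first one fails, the second one succeeds.
const engines: Engine[] = [
    { name: 'basic-fetch', supported: ['waitFor'], run: () => null },
    {
        name: 'browser',
        supported: ['waitFor', 'actions', 'proxy'],
        run: () => '<html>full page</html>',
    },
];

const result = scrapeWithFallback(engines, { waitFor: 2500, actions: [], proxy: 'stealth' });
console.log(result.engine, result.warnings);
```

In this sketch the warnings come from an engine that was tried and skipped, while the HTML comes from a later engine that accepted every parameter, which is one way a warning can be harmless.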

short scarab
#

OK, thanks. Any idea how I can scrape the real page, and not what seems to be a simple server-side-rendered page meant for bots?

mild wren
#

If stealth isn't working and you have a high volume of scrapes for this site, send us an email at [email protected] and we could look at getting it working