# Blocking network requests with Crawlee PuppeteerCrawler


severe brook

I'm trying to block network requests from specific domains within PuppeteerCrawler but can't get it to work.

I'd like to run something like this:

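```
// Interception must be enabled before abort()/continue() can be called
await page.setRequestInterception(true);
page.on('request', (req) => {
    // Abort any request whose URL contains our keyword; let everything else through
    if (req.url().includes('bouncex')) {
        req.abort();
        return;
    }
    req.continue();
});
```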
But the listener has to be registered before `page.goto` runs.

I tried adding it to `preNavigationHooks` like so:
```
preNavigationHooks: [
    async ({ page }, goToOptions) => {
        goToOptions!.waitUntil = "networkidle2";
        goToOptions!.timeout = 3600000;
        await blocker.enableBlockingInPage(page);
        page.on('request', (req) => {
            // Abort any request whose URL contains our keyword; let everything else through
            if (req.url().includes('bouncex')) {
                req.abort();
                return;
            }
            req.continue();
        });
        await page.setViewport(viewportConfig);
    },
],
```
But this throws `Error: Request is already handled!`

Is there a way to do this with `PuppeteerCrawler`?
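
For context on the error: Puppeteer throws `Request is already handled!` when `abort()`, `continue()` or `respond()` is called on a request that another listener has already resolved. In the hook above, a likely cause (an assumption, since `blocker` is not shown) is that `blocker.enableBlockingInPage(page)` registers its own `request` listener, so the second listener ends up resolving the same request again. A minimal sketch of a guarded listener follows. `InterceptedRequest`, `shouldBlock` and `makeBlockingHandler` are hypothetical names introduced here; the interface only mirrors the Puppeteer `HTTPRequest` methods the sketch relies on.

```typescript
// Structural stand-in for the Puppeteer HTTPRequest methods used below (local assumption).
interface InterceptedRequest {
    url(): string;
    isInterceptResolutionHandled(): boolean;
    abort(): Promise<void>;
    continue(): Promise<void>;
}

// Hypothetical helper: true when the URL contains any of the blocked keywords.
function shouldBlock(url: string, keywords: string[]): boolean {
    return keywords.some((keyword) => url.includes(keyword));
}

// Builds a 'request' listener that leaves already-resolved requests alone.
function makeBlockingHandler(keywords: string[]) {
    return async (req: InterceptedRequest): Promise<void> => {
        // Another listener already called abort()/continue()/respond(): do nothing.
        if (req.isInterceptResolutionHandled()) return;
        if (shouldBlock(req.url(), keywords)) {
            await req.abort();
            return;
        }
        await req.continue();
    };
}
```

Inside the hook this would be wired up as `page.on('request', makeBlockingHandler(['bouncex']))`. If the other listener resolves requests asynchronously, the guard can still run too early, in which case handling both jobs in a single listener is probably the safer route.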
near dock
snow crescent

Just be aware that request interception disables the browser cache, which makes large crawls perform much worse.
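
One way to sidestep the cache penalty may be Crawlee's `blockRequests` helper, which is exposed on the Puppeteer crawling context and, per the Crawlee docs, blocks by URL pattern inside the browser rather than through request interception. The sketch below assumes that helper and its `urlPatterns` option. `BlockingContext` and `makeBlockHook` are hypothetical names introduced here, and the interface is only a local stand-in for the part of the context the hook touches.

```typescript
// Local stand-in for the slice of Crawlee's Puppeteer crawling context used here (assumption).
interface BlockingContext {
    blockRequests(options?: { urlPatterns?: string[]; extraUrlPatterns?: string[] }): Promise<void>;
}

// Builds a pre-navigation hook that blocks only the given URL patterns.
// Passing urlPatterns replaces the helper's default pattern list instead of extending it.
function makeBlockHook(urlPatterns: string[]) {
    return async ({ blockRequests }: BlockingContext): Promise<void> => {
        await blockRequests({ urlPatterns });
    };
}
```

In the crawler options this would be used as `preNavigationHooks: [makeBlockHook(['bouncex'])]`. It only blocks requests; it cannot modify or inspect them, so request interception is still needed for anything beyond blocking.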