#How to avoid requesting some static resources?

marble creek
#

When crawling with Playwright or Puppeteer, a lot of static assets (e.g. JS, CSS, PNG, JPG) are loaded.

Is it possible to request static resources only the first time, and then reuse the cached data on later crawls without making a new request?

upbeat talon
#

You can create an array of resourceTypes that you'd like to block.

Example for Playwright:
const BLOCKED = ['image', 'stylesheet', 'media', 'font', 'other'];

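Then, within the preNavigationHooks of your crawler, add this function:

async ({ page }) => {
    // Intercept every request the page makes.
    await page.route('**/*', (route) => {
        // Abort requests for blocked resource types; let everything else through.
        if (BLOCKED.includes(route.request().resourceType())) return route.abort();
        return route.continue();
    });
};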
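
For reference, here is a minimal sketch of how the hook above might be split into a named handler and plugged into a crawler. The names `blockStaticResources` and `blockHook` are made up for illustration, and the commented-out wiring assumes the `crawlee` package is installed.

```javascript
// Resource types to drop; same list as in the example above.
const BLOCKED = ['image', 'stylesheet', 'media', 'font', 'other'];

// Route handler: abort blocked resource types, let everything else through.
function blockStaticResources(route) {
    if (BLOCKED.includes(route.request().resourceType())) return route.abort();
    return route.continue();
}

// preNavigationHook that registers the handler for all URLs on the page.
const blockHook = async ({ page }) => {
    await page.route('**/*', blockStaticResources);
};

// Wiring sketch (assumes `crawlee` is installed):
// const { PlaywrightCrawler } = require('crawlee');
// const crawler = new PlaywrightCrawler({
//     preNavigationHooks: [blockHook],
//     requestHandler: async ({ page }) => { /* extract data here */ },
// });
```

Note that 'script' is not in the list, so JS files are still loaded; adding it would block them too, but many pages will not render without their scripts.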

Alternatively, you can use the Crawlee blockRequests utility functions (also inside the preNavigationHooks option):
https://crawlee.dev/api/3.0/playwright-crawler/namespace/playwrightUtils#blockRequests
https://crawlee.dev/api/3.0/puppeteer-crawler/namespace/puppeteerUtils#blockRequests
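
A rough sketch of what using the utility could look like. `makeBlockRequestsHook` is a hypothetical helper, not part of Crawlee; it just wraps whatever `blockRequests` function you pass in (e.g. `playwrightUtils.blockRequests`). The `extraUrlPatterns` option is taken from the linked docs, where it is described as extending the default block list; check the docs for your Crawlee version before relying on it.

```javascript
// Hypothetical factory: takes a blockRequests-style function and returns
// a preNavigationHook that calls it with extra URL patterns to block.
function makeBlockRequestsHook(blockRequests, extraUrlPatterns = []) {
    return async ({ page }) => {
        await blockRequests(page, { extraUrlPatterns });
    };
}

// Wiring sketch (assumes `crawlee` is installed):
// const { PlaywrightCrawler, playwrightUtils } = require('crawlee');
// const crawler = new PlaywrightCrawler({
//     preNavigationHooks: [
//         makeBlockRequestsHook(playwrightUtils.blockRequests, ['.mp4']),
//     ],
//     requestHandler: async ({ page }) => { /* extract data here */ },
// });
```

Unlike the resourceType approach, this one matches on URL patterns, so it only catches assets whose URLs contain one of the patterns.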

marble creek
#

@upbeat talon thanks for the help ❤️. Is it possible to request static resources only the first time, and then serve the cached data for later responses without making a request?
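
One way the caching asked about here might be approached with Playwright is sketched below. It is not from the thread and is only a starting point: it keeps an in-memory Map keyed by URL, fetches each static asset once with `route.fetch()`, and answers later requests with `route.fulfill()`. The names `assetCache`, `CACHEABLE` and `cachingRouteHandler` are made up for illustration, `route.fetch()` needs a reasonably recent Playwright version, and the cache only lives as long as the Node process.

```javascript
// In-memory cache: URL -> { status, headers, body }. Lost when the process exits.
const assetCache = new Map();

// Resource types treated as static assets (an assumption; adjust as needed).
const CACHEABLE = ['image', 'stylesheet', 'font', 'script'];

async function cachingRouteHandler(route) {
    const request = route.request();
    // Only cache GET requests for static asset types.
    if (request.method() !== 'GET' || !CACHEABLE.includes(request.resourceType())) {
        return route.continue();
    }
    const url = request.url();
    let entry = assetCache.get(url);
    if (!entry) {
        // First time: perform the real request and remember the result.
        const response = await route.fetch();
        const headers = { ...response.headers() };
        // Precaution: the stored body is already decoded, so drop headers
        // that describe the original wire encoding and length.
        delete headers['content-encoding'];
        delete headers['content-length'];
        entry = { status: response.status(), headers, body: await response.body() };
        assetCache.set(url, entry);
    }
    // Serve from the cache without hitting the network again.
    return route.fulfill(entry);
}

// Usage sketch inside a preNavigationHook:
// async ({ page }) => { await page.route('**/*', cachingRouteHandler); }
```

This ignores cache expiry and per-request variations (cookies, headers), so it is only reasonable for assets that rarely change.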