#mixed headful and headless in a PlaywrightCrawler


peak osprey

I want to check the content of some requests in headful mode, approve it, and then let the crawler scrape it in headless mode.
I've tried @crawlee/browser-pool, but it doesn't seem to have an autoscaledPool.

muted tundra

Hi @peak osprey, I am not sure I understand. First of all, how would you like to "confirm it"? You may want to run two crawlers: one in headful mode and a second one in headless mode. In the first crawler you can keep a counter of handled requests and abort it once those requests have been processed.


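import { createPlaywrightRouter, Dataset } from 'crawlee';

const router = createPlaywrightRouter();

// Counter of 'detail' requests seen by the headful crawler (assumed to start at 0).
let i = 0;

router.addHandler('detail', async ({ request, page, log, crawler }) => {
    const title = await page.title();

    console.log(i);
    // i is 0 and 1 for the first two requests, so the pool is aborted on the third one.
    if (i++ > 1) {
        await crawler.autoscaledPool.abort();
        // crawler.headless = true;
    }
    log.info(`${title}`, { url: request.loadedUrl });

    await Dataset.pushData({
        url: request.loadedUrl,
        title,
    });
});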

Then the second crawler starts up and continues in headless mode.
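
A rough sketch of how the two crawlers could be chained in one script, in case it helps. It assumes both crawlers share one named request queue and that the headful phase is capped with `maxRequestsPerCrawl` instead of a manual counter. `REVIEW_LIMIT`, `phaseOptions`, `main` and the queue name `mixed-mode` are made up for illustration and are not part of Crawlee.

```javascript
// Number of pages to open in a visible browser (hypothetical value).
const REVIEW_LIMIT = 2;

// Build crawler options for one phase; only `headless` and the request cap differ.
function phaseOptions(phase, requestQueue, requestHandler) {
    const review = phase === 'review';
    return {
        requestQueue,
        requestHandler,
        headless: !review,
        // Cap the headful phase and keep a single visible page at a time.
        ...(review ? { maxRequestsPerCrawl: REVIEW_LIMIT, maxConcurrency: 1 } : {}),
    };
}

async function main(startUrls) {
    // Loaded lazily so the helper above can be used without Crawlee installed.
    const { PlaywrightCrawler, RequestQueue, Dataset } = await import('crawlee');

    // Hypothetical queue name; both phases read from the same queue.
    const requestQueue = await RequestQueue.open('mixed-mode');
    await requestQueue.addRequests(startUrls.map((url) => ({ url })));

    const requestHandler = async ({ request, page, log }) => {
        const title = await page.title();
        log.info(title, { url: request.loadedUrl });
        await Dataset.pushData({ url: request.loadedUrl, title });
    };

    // Phase 1: visible browser, limited to REVIEW_LIMIT requests.
    await new PlaywrightCrawler(phaseOptions('review', requestQueue, requestHandler)).run();

    // Phase 2: headless browser works through what is still pending in the queue.
    await new PlaywrightCrawler(phaseOptions('bulk', requestQueue, requestHandler)).run();
}
```

You would call `main([...])` with your own start URLs. Named queues are kept between script runs, so you may need to drop the queue or pick a new name when testing repeatedly.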

tidal bison

Another way is to use two request queues.
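
A minimal sketch of what the two-queue idea might look like. The approval step is purely an assumption here: the headful crawler opens each page and asks for a y/N answer in the terminal, and approved URLs are copied into a second queue that a headless crawler then processes. The queue names `review` and `approved`, the `isApproval` helper and `main` are hypothetical.

```javascript
// Interpret a terminal answer as an approval (hypothetical convention: y or yes approves).
function isApproval(answer) {
    return ['y', 'yes'].includes(String(answer).trim().toLowerCase());
}

async function main(startUrls) {
    // Loaded lazily so the helper above can be used without Crawlee installed.
    const { PlaywrightCrawler, RequestQueue, Dataset } = await import('crawlee');
    const { createInterface } = await import('node:readline/promises');

    // Hypothetical queue names: one for manual review, one for approved URLs.
    const reviewQueue = await RequestQueue.open('review');
    const approvedQueue = await RequestQueue.open('approved');
    await reviewQueue.addRequests(startUrls.map((url) => ({ url })));

    const rl = createInterface({ input: process.stdin, output: process.stdout });

    // Headful crawler: shows each page and waits for a decision in the terminal.
    const reviewer = new PlaywrightCrawler({
        requestQueue: reviewQueue,
        headless: false,
        maxConcurrency: 1, // one visible page at a time
        requestHandlerTimeoutSecs: 600, // leave time for a manual look
        requestHandler: async ({ request }) => {
            const answer = await rl.question(`Approve ${request.url}? [y/N] `);
            if (isApproval(answer)) {
                await approvedQueue.addRequest({ url: request.url });
            }
        },
    });
    await reviewer.run();
    rl.close();

    // Headless crawler: scrapes only the approved URLs.
    const scraper = new PlaywrightCrawler({
        requestQueue: approvedQueue,
        headless: true,
        requestHandler: async ({ request, page, log }) => {
            const title = await page.title();
            log.info(title, { url: request.loadedUrl });
            await Dataset.pushData({ url: request.loadedUrl, title });
        },
    });
    await scraper.run();
}
```

The terminal prompt is only one possible way to "confirm" a page; any other check could fill the same slot as long as it ends with adding the request to the second queue.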

muted tundra