#userAgent in different crawlers


lone stirrup

How do I set the User-Agent in different crawlers?

plain wraith

In different crawlers? Or in different requests for the same crawler?

lone stirrup

Like Cheerio and Puppeteer. The APIs are not always the same.

In my case I need to set a different UA for Cheerio than for Puppeteer.

reef wharf

You should use preNavigationHooks for it:
https://crawlee.dev/api/puppeteer-crawler/interface/PuppeteerCrawlerOptions#preNavigationHooks

example for Cheerio:

preNavigationHooks: [
    (crawlingContext, requestAsBrowserOptions) => {
        requestAsBrowserOptions.headers = {
            'User-Agent': 'La Centrale/6.17.1 (iPhone; iOS 13.6; Scale/2.00)',
            'accept-language': 'en-US;q=1',
            Accept: 'application/json',
        };
    },
],
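
A minimal sketch of the same hook that merges the UA into whatever headers are already set instead of replacing the whole headers object. `makeUserAgentHook` and the UA string are made-up names for illustration, and it assumes the second hook argument is a plain options object with an optional `headers` field, as in the example above.

```javascript
// Hypothetical helper (not part of Crawlee): builds a preNavigationHook
// that sets the User-Agent and keeps any other headers already present.
function makeUserAgentHook(userAgent) {
    return (crawlingContext, requestOptions) => {
        requestOptions.headers = {
            ...requestOptions.headers,
            'User-Agent': userAgent,
        };
    };
}

// Assumed usage: spread this into the CheerioCrawler options.
const cheerioCrawlerOptions = {
    preNavigationHooks: [makeUserAgentHook('my-cheerio-bot/1.0')],
};
```

Note that header names are case-sensitive as object keys here, so if something else already set `user-agent` in lowercase, it is probably best to stick to one casing throughout.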

For Puppeteer, use the page object.

You can try setExtraHTTPHeaders() (also inside preNavigationHooks):
https://pptr.dev/next/api/puppeteer.page.setextrahttpheaders

example:

preNavigationHooks: [
    // `page` comes from the crawling context passed to the hook
    async ({ page }) => {
        await page.setExtraHTTPHeaders({
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36',
            'upgrade-insecure-requests': '1',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'en-US,en;q=0.9,en;q=0.8',
        });
    },
],
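
If the goal is only the UA, Puppeteer also has `page.setUserAgent()`, which should change both the request header and `navigator.userAgent`, while extra headers only affect the HTTP request. A rough sketch with one UA per crawler type follows; `USER_AGENTS`, `puppeteerUserAgentHook` and the UA strings are placeholders, and it assumes the hook receives a context object with a `page` property.

```javascript
// Placeholder UA strings, one per crawler type (made up for this example).
const USER_AGENTS = {
    cheerio: 'my-cheerio-bot/1.0',
    puppeteer: 'Mozilla/5.0 (X11; Linux x86_64) my-puppeteer-bot/1.0',
};

// preNavigationHook for PuppeteerCrawler: sets the UA on the page
// before the crawler navigates to the request URL.
async function puppeteerUserAgentHook({ page }) {
    await page.setUserAgent(USER_AGENTS.puppeteer);
}

// Assumed usage: new PuppeteerCrawler({ ...puppeteerCrawlerOptions, requestHandler })
const puppeteerCrawlerOptions = {
    preNavigationHooks: [puppeteerUserAgentHook],
};
```

Keeping the strings in one map makes it easy to see which UA each crawler sends; the `cheerio` entry would go into the header-based hook for CheerioCrawler.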

green iris

Or just add it to the request object: { url, headers: { 'user-agent': '[UA-STRING]' } }
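
As a sketch, a small helper can stamp the UA onto each request before it goes to the crawler. `withUserAgent` and the URLs are made up for the example. This should work for CheerioCrawler, since it sends the request headers itself; it is less clear that browser crawlers apply `headers` from the request, so for Puppeteer the hook approach is probably safer.

```javascript
// Hypothetical helper: turns a list of URLs into plain request objects
// that each carry the given User-Agent header.
function withUserAgent(urls, userAgent) {
    return urls.map((url) => ({
        url,
        headers: { 'user-agent': userAgent },
    }));
}

const requests = withUserAgent(
    ['https://example.com/a', 'https://example.com/b'],
    'my-cheerio-bot/1.0',
);
// Assumed usage: await crawler.run(requests) or await crawler.addRequests(requests)
```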