# Add certificates to Playwright crawler using Chromium


brittle jewel
#

Hey folks, we're trying to integrate a proxy into our crawlers. The issue is that the proxy requires a certificate to be present before it lets us authenticate, and I couldn't find any option for this in the documentation.

Is there a way to add those certs in Crawlee/Playwright? Or, if Crawlee exposes `agentOptions` from Playwright anywhere (I couldn't find it in the docs), that would also work, as per https://github.com/microsoft/playwright/issues/1799#issuecomment-959011162


#

P.S. I have added that certificate on my server, and curl works fine, but Crawlee does not.

#

so I'm assuming Crawlee is not picking it up.

#

and the error I'm getting is `page.goto: net::ERR_PROXY_CONNECTION_FAILED at`

ornate pulsar
#

Hi @brittle jewel,
Crawlee uses Playwright under the hood, so you should be able to intercept requests the usual way. Here is an example for Playwright itself: https://github.com/microsoft/playwright/issues/1799#issuecomment-959011162

Can you put together a minimal working example using only Playwright (without Crawlee) to confirm that the issue is in Crawlee and not in Playwright itself? I found a lot of issues about using certificates in Playwright.


brittle jewel
#

but with Crawlee it's not working

#

I think I got it wrong: I don't need to use the proxy's certificate with Playwright/Crawlee.

#

It's just a proxy config issue.

#

Here's the full error: `page.goto: net::ERR_PROXY_CONNECTION_FAILED`

#

We bypassed it by avoiding the cert route, and it's working fine for us.

ornate pulsar
#

Does the same proxy configuration work for other websites?

brittle jewel
#

It worked with Crawlee once we were fully onboarded with the proxy provider, and we didn't need to use their cert.

ornate pulsar
#

@brittle jewel Can you please provide a code snippet with your current configuration for Crawlee?

brittle jewel
#
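```javascript
// Assumed imports: chromium.use() comes from playwright-extra.
import { Log, PlaywrightCrawler, ProxyConfiguration, RequestQueue, log } from 'crawlee';
import { chromium } from 'playwright-extra';
import stealthPlugin from 'puppeteer-extra-plugin-stealth';
// initRouter, CrawlerLogger, resume, numPages, initialPage, useProxy and proxy
// are defined elsewhere in our project.

chromium.use(stealthPlugin());

// Start every run from an empty queue.
let queue = await RequestQueue.open('crawler');
await queue.drop();
queue = await RequestQueue.open('crawler');

const startUrls = [`url`];
const router = await initRouter({ resume, numPages, initialPage });
const crawler = new PlaywrightCrawler({
    requestHandler: router,
    maxRequestsPerMinute: 100,
    log: new Log({
        logger: new CrawlerLogger(log.getOptions(), 'CRAWLER_1'), // please ignore, custom logger implementation
        level: log.LEVELS.DEBUG,
    }),
    requestQueue: queue,
    launchContext: {
        launcher: chromium,
        launchOptions: {
            args: ['--ignore-certificate-errors'],
        },
    },
    // Proxy and session options are only added when useProxy is true.
    ...(useProxy && {
        proxyConfiguration: new ProxyConfiguration({
            proxyUrls: [proxy],
        }),
        useSessionPool: true,
        persistCookiesPerSession: true,
    }),
});
await crawler.run(startUrls, {});
```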
ornate pulsar
#

@brittle jewel Thank you for your feedback; I'm currently investigating this with the Crawlee developer team.

Could you also share the pure Playwright code that is currently working for you? Is the certificate taken from the system, or are you importing it at the application level?


brittle jewel
#

I was trying to figure out how to do it at the app level but couldn't make it work,

#

but in the end, the system level worked fine.

#

Here's the pure Playwright code:

#
```javascript
import { chromium } from 'playwright';

const browser = await chromium.launch({
    proxy: {
        server: 'proxy_url',
        username: 'username',
        password: 'pwd',
    },
    args: ['--ignore-certificate-errors'],
});

const page = await browser.newPage();

await page.goto('https://google.com');
const html = await page.innerHTML('body');
console.log(html);

// Release the browser process once done.
await browser.close();
```
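For anyone who wants these same credentials in Crawlee's `ProxyConfiguration` instead: its `proxyUrls` option takes whole proxy URLs, so separate `server`/`username`/`password` values have to be folded into a single `scheme://user:pass@host:port` string. Below is a minimal sketch; `buildProxyUrl` is a hypothetical helper (not a Crawlee or Playwright API), and it assumes plain HTTP when the server string has no scheme. It relies on Node's built-in WHATWG `URL`, whose setters percent-encode characters such as `@` and `:` in credentials. An unencoded `@` in a password can make the host part parse wrongly, so that is one thing worth ruling out behind an `ERR_PROXY_CONNECTION_FAILED`.

```javascript
// Hypothetical helper (not part of Crawlee or Playwright): fold separate proxy
// settings into one URL string of the form scheme://user:pass@host:port.
function buildProxyUrl({ server, username, password }) {
    // Assumption: a bare "host:port" means a plain HTTP proxy.
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(server);
    const url = new URL(hasScheme ? server : `http://${server}`);

    // The URL setters percent-encode reserved characters such as '@' and ':',
    // so they cannot be mistaken for the userinfo delimiters.
    if (username) url.username = username;
    if (password) url.password = password;

    // toString() adds "/" for an empty path; drop it to keep the usual form.
    return url.toString().replace(/\/$/, '');
}

// Usage with placeholder values; the result could go into proxyUrls: [proxy].
const proxy = buildProxyUrl({
    server: 'proxy.example.com:8000',
    username: 'user',
    password: 'p@ss',
});
console.log(proxy);
```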
#

I think it was an issue on our end: after full account activation with the proxy provider, it worked just fine. The only issue we're currently facing is that a lot of our requests fail through the proxy, but that's unrelated to this and is probably a config issue.

ornate pulsar
#

@brittle jewel You should be able to replicate this setup in Crawlee:

```javascript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // ... ,
    launchContext: {
        launchOptions: {
            proxy: {
                server: 'http://proxy_url',
                username: 'username',
                password: 'password',
            },
            args: ['--ignore-certificate-errors'],
        },
    },
});
```

and drop the `proxyConfiguration` attribute.

And please let me know if it helped 🙂
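If the proxy is kept as a single URL string (like the `proxy` variable passed to `proxyUrls` in the earlier Crawlee snippet), it can be split into the object shape that `launchOptions.proxy` takes. A minimal sketch, assuming Node's built-in `URL`; `toPlaywrightProxy` is a hypothetical name, not an existing Crawlee or Playwright function, and the URL below is a placeholder.

```javascript
// Hypothetical helper (not part of Crawlee or Playwright): split a proxy URL
// such as http://user:pass@host:port into the { server, username, password }
// object used by Playwright's `proxy` launch option.
function toPlaywrightProxy(proxyUrl) {
    const url = new URL(proxyUrl);
    // protocol + host keeps the scheme and port but drops credentials and path.
    const proxy = { server: `${url.protocol}//${url.host}` };
    // URL keeps credentials percent-encoded, so decode them back to plain text.
    if (url.username) proxy.username = decodeURIComponent(url.username);
    if (url.password) proxy.password = decodeURIComponent(url.password);
    return proxy;
}

// Usage (placeholder URL): the result plugs into launchContext.launchOptions.
const launchOptions = {
    proxy: toPlaywrightProxy('http://user:p%40ss@proxy.example.com:8000'),
    args: ['--ignore-certificate-errors'],
};
console.log(launchOptions.proxy);
```

Building `server` from `protocol` + `host` rather than `url.origin` keeps the helper usable for schemes like `socks5:`, where `origin` is the string `"null"`.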

brittle jewel