# Hey, why do I get web scraping of the first URL when I have passed another URL?


snow rose

I have implemented a Playwright crawler to parse a URL. I sent a single request to the crawler with the first URL. While that request was still being processed, I passed another URL to the crawler and sent a second request. The crawler processed content from the first URL both times, instead of the second URL. Can you please help?

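# Import path assumes a recent Crawlee for Python release;
# older releases exposed these names from crawlee.playwright_crawler.
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

# PW_SCRAPING_CODE (a JavaScript snippet evaluated in the page) is defined elsewhere.


async def run_crawler(url, domain_name, save_path=None):
    print("doc url inside crawler file====================================>", url)
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        browser_type='firefox',
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # Use the URL of the request being handled, not the start URL
        # captured from the enclosing function.
        current_url = context.request.url
        context.log.info(f'Processing {current_url} ...')

        # Collect the links on the page that point to the given domain.
        links = await context.page.evaluate(f'''() => {{
            return Array.from(document.querySelectorAll('a[href*="{domain_name}"]'))
                .map(a => a.href);
        }}''')

        await context.enqueue_links(urls=links)

        elements = await context.page.evaluate(PW_SCRAPING_CODE)

        data = {
            'url': current_url,
            'title': await context.page.title(),
            'content': elements
        }
        print("data =================>", data)

        await context.push_data(data)

    await crawler.run([url])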

I am calling the crawler using

turbid ibex

Someone will reply to you shortly. In the meantime, this might help:

ebon pollen

Hi, could you please try to rephrase your question? I don't understand what the problem is.
If you create a new crawler for each URL, each with a different request queue/list, they won't share requests or deduplication state, so more than one of them might process the same URL.
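
To make the point about separate versus shared queues concrete, here is a minimal sketch using only the Python standard library. It is a toy model, not Crawlee's API: `ToyRequestQueue`, `toy_crawl`, `demo`, and the example.com URL are hypothetical names invented for illustration, and the only behavior assumed is that a queue deduplicates requests by URL.

```python
import asyncio
from collections import deque
from typing import List, Optional


class ToyRequestQueue:
    """Hypothetical in-memory queue that deduplicates requests by URL."""

    def __init__(self) -> None:
        self._pending = deque()  # URLs waiting to be processed
        self._seen = set()       # every URL ever added to this queue

    def add(self, url: str) -> bool:
        """Add a URL unless this queue has already seen it."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def pop(self) -> Optional[str]:
        """Return the next pending URL, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


async def toy_crawl(start_url: str, queue: ToyRequestQueue) -> List[str]:
    """Stand-in for a crawler run: drain the queue and record what was handled."""
    queue.add(start_url)
    processed = []
    while (url := queue.pop()) is not None:
        await asyncio.sleep(0)  # placeholder for the real page fetch
        processed.append(url)
    return processed


async def demo():
    url = "https://example.com/a"

    # Two crawlers, each with its own queue: both process the same URL.
    separate_1 = await toy_crawl(url, ToyRequestQueue())
    separate_2 = await toy_crawl(url, ToyRequestQueue())

    # Two crawlers sharing one queue: the second finds the URL already seen.
    shared = ToyRequestQueue()
    shared_1 = await toy_crawl(url, shared)
    shared_2 = await toy_crawl(url, shared)

    return separate_1, separate_2, shared_1, shared_2


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model, whether two runs influence each other depends entirely on whether they were handed the same queue object. Whether the crawlers in the question end up sharing a queue depends on how they are configured, which the thread does not show.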