# Parallel Login Scraping


zealous barn

Hello, I want to build a scraper that scales: it logs in to a site and scrapes data, runs as multiple instances, and each instance should appear to come from a unique device/location via proxies. Can you help me visualize a high-level overview of the project and how I should go about solving this problem?


The site is dynamic and requires solving px-captchas, and logging in requires some paramTokens, which seem to be JWTs. I just need to get the cookies after logging in to the site. Any suggestions on whether this process can be broken down into a one-request solution, without having to emulate human interactions to reach that endpoint?



bronze elk

Hi there, there are a few different ways you can tackle this, but it comes down to the site’s constraints.

The first thing to figure out, in order to decide how to proceed, is what needs to stay the same across each of your sessions: proxy, cookies, fingerprint, etc.

If you keep track of cookies/proxies, it’s possible that you can load them into the session and do a “one-request solution” after the initial login. But it depends on the site.
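A minimal sketch of keeping track of cookies between runs, assuming Python's standard-library `http.cookiejar` and `urllib.request` (any HTTP client with a cookie jar works similarly). The file name `session_cookies.txt` and the helpers `make_opener`, `save_session`, and `load_session` are hypothetical. Whether a restored session keeps working depends on the site, since it may also be tied to the proxy or fingerprint it was created with.

```python
import http.cookiejar
import urllib.request

# Hypothetical file name: one cookie file per session/instance.
COOKIE_FILE = "session_cookies.txt"


def make_opener(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    """Build an opener that sends cookies from `jar` and stores any cookies the server sets."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def save_session(jar: http.cookiejar.MozillaCookieJar, path: str = COOKIE_FILE) -> None:
    """Write the jar to disk in Netscape cookie-file format."""
    # ignore_discard=True also keeps session cookies (no expiry), which login cookies often are.
    jar.save(path, ignore_discard=True, ignore_expires=True)


def load_session(path: str = COOKIE_FILE) -> http.cookiejar.MozillaCookieJar:
    """Read a previously saved jar back from disk."""
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return jar
```

After the initial login, call `save_session(jar)`; later runs can call `make_opener(load_session())` and reuse the saved cookies instead of logging in again.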

zealous barn

Yes, I tried that. It works for some iterations and then gets blocked,

and I have to do it again.

bronze elk

Typically, though, the answer is yes: this is very doable and is how most sites work. If you can keep track of the right session data, you should be able to use the API the same way the website does.


What indicates a block in this case?

zealous barn

Probably the auth token gets changed?

zealous barn
bronze elk

Well, it sounds like you need to figure out what is causing the block first.

zealous barn

Like 100-500 instances at least.

zealous barn
bronze elk

Try analyzing the site in Chrome DevTools.

zealous barn
bronze elk

Are there any network requests that may refresh the auth token / cookies?
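One way to narrow this down, sketched in Python under assumptions: take two cookie snapshots, for example before and after a request you suspect (copied from DevTools or read from a cookie jar), and diff them. `jar_to_dict` and `diff_cookies` are hypothetical helper names, not part of any library.

```python
import http.cookiejar
from typing import Dict, List, Optional


def jar_to_dict(jar: http.cookiejar.CookieJar) -> Dict[str, Optional[str]]:
    """Flatten a cookie jar to {name: value}; same-named cookies on different domains collapse."""
    return {c.name: c.value for c in jar}


def diff_cookies(before: Dict[str, Optional[str]],
                 after: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Report which cookie names appeared, disappeared, or got a new value between snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


before = {"session": "a1", "auth": "old-token"}
after = {"session": "a1", "auth": "new-token", "refresh": "r9"}
print(diff_cookies(before, after))
# → {'added': ['refresh'], 'removed': [], 'changed': ['auth']}
```

If the auth cookie keeps showing up under `changed` just before the block starts, the request between those two snapshots is a good candidate for what refreshes the token.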