Web Scraping After Input, Button Click, and Extracting Text | Apify & Crawlee | Page 1

fringe inlet Aug 2, 2023, 9:01 AM

Hello @carmine ginkgo,

I believe you are using PuppeteerCrawler or PlaywrightCrawler.

context.jQuery only lets you use jQuery selectors to read data (as the comment says); it is not really meant for modifying the page, such as setting an input value. (Or is this a JSDOMCrawler? I don't have much experience with it...)

There is nothing like a JavaScript compiler to validate the code; it only checks the general syntax, and it even lets you run code that uses an undefined variable. I suggest using TypeScript for better control over the code.
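
To illustrate that point, here is a minimal sketch in plain Node.js: code that refers to an undeclared variable passes the syntax check and only fails when it actually runs. The variable name `notDefinedAnywhere` and the helper `runRisky` are made up for this example.

```javascript
// Building this function only checks syntax, so it succeeds even though
// `notDefinedAnywhere` (a hypothetical name) is never declared.
const risky = new Function('return notDefinedAnywhere + 1;');

function runRisky() {
    try {
        return { ok: true, value: risky() };
    } catch (err) {
        // The problem only surfaces at run time, as a ReferenceError.
        return { ok: false, errorName: err.name };
    }
}

console.log(runRisky()); // { ok: false, errorName: 'ReferenceError' }
```

A TypeScript build would typically flag the undeclared name before the code ever runs.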

There are differences between JS in the browser and JS in Node.js; for example, the document object is not accessible in Node.js.
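
A small sketch of how to tell the two environments apart, assuming nothing beyond standard JavaScript. The helper name `hasDom` is hypothetical.

```javascript
// Returns true only where a DOM `document` global exists (for example, inside a browser page).
function hasDom() {
    // `typeof` is safe on undeclared globals, so this never throws in Node.js.
    return typeof document !== 'undefined';
}

console.log(hasDom() ? 'browser context' : 'Node.js context, no document');
```

Code that touches `document` directly therefore has to run inside the page, not in the Node.js part of the crawler.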

I am not sure this is valid syntax for the jQuery selectors. Don't you mean something like:
$('input[type="password"]') and $('input[type="text"]')

If you want to stick with this browser-like implementation, you need to wrap it in context.page.evaluate, like this:

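// page.evaluate runs the callback in the browser and returns a Promise, so await it.
const token = await context.page.evaluate(() => {
    document.querySelector('input[type="password"]').value = "abc";
    document.querySelector('input[type="text"]').value = "abc";

    // etc ...
    // Read .value: it reflects the textarea's current content.
    return document.querySelector('textarea').value;
});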
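
For the full "fill inputs, click, read the result" flow, a rough sketch using the page API could look like the following. It assumes a PuppeteerCrawler (or PlaywrightCrawler) handler where `page` is available. The function name `submitAndRead` and the `'button'` and `'textarea'` selectors are guesses, since the thread does not show the target page.

```javascript
// Sketch: fill two inputs, click a button, then read the textarea.
// `page` is a Puppeteer (or Playwright) Page; the 'button' and 'textarea'
// selectors are assumptions about the target site.
async function submitAndRead(page, { password, text }) {
    await page.type('input[type="password"]', password);
    await page.type('input[type="text"]', text);
    await page.click('button');

    // Wait until the page has put something into the textarea.
    await page.waitForFunction(() => {
        const el = document.querySelector('textarea');
        return el !== null && el.value.length > 0;
    });

    // `.value` holds the textarea's current content.
    return page.$eval('textarea', (el) => el.value);
}
```

Inside a request handler this would be called as `const token = await submitAndRead(page, { password: 'abc', text: 'abc' });`. Typing through `page.type` also fires the keyboard events that some forms rely on, which setting `.value` directly does not.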

carmine ginkgo Aug 3, 2023, 6:30 PM

Hmm, still getting errors.

fringe inlet Aug 4, 2023, 3:52 PM

Sorry, I cannot help you more unless I see the run or the errors you are getting.

carmine ginkgo Aug 4, 2023, 7:30 PM

fringe inlet Sorry, I cannot help you more unless I see the run or the errors you are getting.

Here are the errors. Sorry if this is too much or looks amateurish. I'm not sure why it isn't working. It's a simple workflow of "enter data into these inputs", "click a button", and "extract from this text area", all on one page. Any help would be appreciated. This is the built-in Web Scraper tool.

tired crow Aug 7, 2023, 1:14 AM

@carmine ginkgo take a snapshot. It looks like the targeted site is bot-protected, and when you open it under the crawler you are not getting the web form controls as expected.

carmine ginkgo Aug 7, 2023, 1:30 PM

tired crow @carmine ginkgo take snapshot, looks like targeted site is bot-protected a...

I was able to extract a simple h2 heading from the site using Apify (it's a simple static HTML site), but it seems I'm using the wrong Apify product or syntax to fill the inputs, click the button, and extract the text.

Is Puppeteer the recommended product? Thanks for any help.

tired crow Aug 7, 2023, 11:28 PM

Headers are usually not blocked by anti-bot scripts; the page content is blocked because that is what matters. If Puppeteer is blocked, then the next best bet is Playwright. To check what is happening, you need to save a snapshot: what you see in your browser is not necessarily what the scraper gets at run time.
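
One possible way to check what the scraper actually received, sketched under the assumption that a Puppeteer or Playwright `page` object is available. The helper name `inspectPage` is hypothetical; it grabs the run-time HTML and reports which of the expected form controls are missing.

```javascript
// Sketch: capture what the scraper received and list the expected
// selectors that did not match anything on the page.
async function inspectPage(page, selectors) {
    const html = await page.content(); // full HTML as seen at run time
    const missing = [];
    for (const selector of selectors) {
        const handle = await page.$(selector); // null when nothing matches
        if (handle === null) missing.push(selector);
    }
    return { html, missing };
}
```

For example, `await inspectPage(page, ['input[type="password"]', 'input[type="text"]', 'textarea'])` returning a non-empty `missing` list would suggest the form controls were never served to the crawler, and the returned `html` can be saved and compared with what the browser shows.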
