#Scrape different layouts


white turtle
#

Hi,

I am just getting started with Apify web scraping.

I am trying to scrape a page with different page layouts. There are many pages of listed items with links to the item pages. First I scrape the links from the list and then I need to fetch the actual data from each item page.

How can I manage this in Apify? My current solution is to wrap it in an if/else depending on the content of the URL. However, this causes issues when I try to add new requests, as apparently the await statement can only be used in async functions and at the top level of modules.

inland junco
#

Hello @white turtle,
Generally speaking if the logic of scraping one page is different from others it is implemented as another standalone actor.

But even your if-else solution should work.

Could you share the block of code where the await statement cannot be used? We may be able to figure it out.

white turtle
#

Does this mean it is possible to use the output from one actor as input to another actor? 😊

This is essentially what I am trying to do right now that doesn't work:

if(context.request.url.includes("https://madensverden.dk/category/")) {
        const currentPageNo = $('.page-numbers.current').text();
        const nextPageNo = parseInt(currentPageNo) + 1;

        $('.listing-item a').each((index, el) => {
            const link = $(el).attr('href');
            if (link) {
                await context.enqueueRequest({ url: link });
            }
        })

        // Print some information to actor log
        context.log.info(`URLs: ${links}, PageNo: ${currentPageNo}`);

        // Manually add a new page to the queue for scraping.
        await context.enqueueRequest({ url: context.request.url + nextPageNo });
    }

The error I get is this:

ERROR Compilation of pageFunction failed.
await is only valid in async functions and the top level bodies of modules

#

sorry about the shitty formatting in that codeblock - it fucked up when I copied it

inland junco
#

There are several ways to deal with this. The easiest to understand is probably to move the await outside of the each callback:

// ...
    if (context.request.url.includes("https://madensverden.dk/category/")) {
        const currentPageNo = $('.page-numbers.current').text();
        const nextPageNo = parseInt(currentPageNo) + 1;
    
        const requests = [];
        $('.listing-item a').each((index, el) => {
            const link = $(el).attr('href');
            if (link) {
                requests.push({ url: link });
            }
        });
    
        await context.enqueueRequests(requests);
    
        // ...
    }

This would also generate fewer requests to the Apify API (only one, with all the URLs at once).
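
To illustrate why collecting first helps: the callback passed to `.each` is an ordinary, non-async function, so `await` is not allowed inside it, while the surrounding page function is async and can await once the loop is done. Below is a minimal, self-contained sketch of that pattern. `fakeContext`, its `enqueueBatch` method, and the `links` array are stand-ins made up for this example; they are not the Apify API, and whether a real batch method is available depends on the SDK version.

```javascript
// Hypothetical stand-in for the scraper context (NOT the real Apify object).
// It only records how often it is called and what was queued.
const fakeContext = {
    calls: 0,
    queued: [],
    async enqueueBatch(requests) { // hypothetical batch method
        this.calls += 1;
        this.queued.push(...requests);
    },
};

// Stand-in for the hrefs found by $('.listing-item a');
// undefined mimics an anchor without an href attribute.
const links = ['https://example.com/a', undefined, 'https://example.com/b'];

async function pageFunction(context) {
    const requests = [];

    // Synchronous callback: no await in here, only collect.
    links.forEach((link) => {
        if (link) {
            requests.push({ url: link });
        }
    });

    // Back in the body of the async function, so await is allowed.
    await context.enqueueBatch(requests);
    return requests.length;
}

const done = pageFunction(fakeContext).then((count) => {
    console.log(`queued ${count} requests in ${fakeContext.calls} call(s)`);
    return count;
});
```

With the stub above this logs `queued 2 requests in 1 call(s)`.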

white turtle
#

that makes sense. So the issue is the foreach. Before changing it I just appended it to a list. I'll go back to doing that 🙂 Thanks man

#

Is it possible to use the output of an actor as input in another actor?

#

Also I found another issue. context.EnqueueRequests isn't a function that exists when I try it out. Is there any other way to queue multiple requests at the same time?

inland junco
#

Which version of Apify do you use? (You can see it in package.json.)
In the latest it could be await context.addRequests(requests)

white turtle
#

but addRequests doesn't work either. Again I just get the message that no such function exists

#

I found the apify docs, but I still don't see anything there indicating I can add multiple requests to the queue

inland junco
#

In that case, I am not deeply familiar with that version of the Apify SDK, but:

for (const request of requests) {
     await context.enqueueRequest(request);
}

should also work.
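
The difference from the `.each` version is where the `await` sits: a `for...of` loop is not a callback, so its body still belongs to the enclosing async function and `await` compiles there. It still makes one call per request. If the calls do not need to run one after another, `Promise.all` can issue them concurrently, which is also one call per request but without waiting for each in turn. A rough sketch, where `recordingContext` is a made-up stub whose `enqueueRequest` only records URLs (it merely borrows the method name used in this thread and is not the real Apify object):

```javascript
// Hypothetical stub (NOT the real Apify context); it only records URLs.
const recordingContext = {
    queued: [],
    async enqueueRequest(request) {
        this.queued.push(request.url);
    },
};

// Example requests; the URLs are placeholders.
const itemRequests = [
    { url: 'https://example.com/item/1' },
    { url: 'https://example.com/item/2' },
    { url: 'https://example.com/item/3' },
];

// One awaited call per request, strictly in order.
async function enqueueSequentially(context, reqs) {
    for (const request of reqs) {
        await context.enqueueRequest(request);
    }
}

// One call per request as well, but all started before any is awaited.
async function enqueueConcurrently(context, reqs) {
    await Promise.all(reqs.map((request) => context.enqueueRequest(request)));
}

const finished = (async () => {
    await enqueueSequentially(recordingContext, itemRequests);
    await enqueueConcurrently(recordingContext, itemRequests);
    return recordingContext.queued.length;
})();
```

Neither variant reduces the number of calls; only a batch method, if the SDK version in use provides one, would do that.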

white turtle
#

Hi again @inland junco
I spent the rest of yesterday trying to figure out what to do from here.

The for-loop seems to be the same solution as my foreach solution? With the same amount of calls?

#

What I want to do from here is to move my solution into a local application and make API calls to Apify. But I still can't see anywhere in the documentation that I can add multiple requests in one call. It would greatly reduce the number of calls I have to make to the API, so it would be much appreciated if we could figure out whether this actually exists despite it not being obvious from the docs. 😊

white turtle
#

@inland junco you don't have to answer me anymore. I have given up on Apify and will just build my own scraper in python. I realize the documentation is quite shit for python and close to non-existent, and it won't take me long to build my own. Thanks for your help though - I am sorry that Apify isn't mature enough for proper python usage and in depth documentation in that area.