# How do I make Crawlee refetch a request?

finite jolt

If the response from the HTTP API I'm crawling doesn't meet my expectations but the HTTP status is 200, how can I mark the request as failed and have Crawlee fetch it again with the next proxy?

hybrid locust

From what I understand, you want to make a new request based on the data you receive from the initial request? If so, you can use the context object in the `requestHandler` to send a new request right away or to enqueue a new one, like this:

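```ts
import { HttpCrawler, type RequestOptions } from '@crawlee/http';

const crawler = new HttpCrawler({
    async requestHandler({ crawler, sendRequest, request }) {
        // Send a request right away and get the response
        // (inspect `body` here to decide whether a refetch is needed)
        const { body } = await sendRequest({
            url: request.url,
        });

        // RequestOptions with a custom uniqueKey, so Crawlee doesn't treat it as a duplicate request.
        // The URL is part of the key so two handlers running in the same millisecond don't collide.
        // Note: as written, this re-enqueues on every run; guard it with a condition in real code.
        const newRequest: RequestOptions = {
            url: request.url,
            uniqueKey: `${request.url}-${Date.now()}`,
        };

        // Enqueue the request
        await crawler.addRequests([newRequest]);
    },
});

await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
]);
```

waxen bear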

Wouldn't that make Crawlee think it's a duplicate?

hybrid locust

Good point. From my understanding, it shouldn't be a problem if you use `sendRequest` for the new request, because it sends the request directly instead of going through the request queue. But if you use `crawler.addRequests`, you have to generate a `uniqueKey` for each `RequestOptions` manually to prevent the request from being marked as a duplicate. I've updated my snippet to show how to do this.
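A timestamp-based key avoids deduplication, but nothing stops the handler from re-enqueueing the same URL indefinitely. A minimal sketch of an alternative, assuming you want a cap on manual refetches: the hypothetical helper `buildRetryRequest` embeds an attempt counter in both the `uniqueKey` and `userData`. `RetryRequestOptions` is a hypothetical local type that mirrors only a subset of the fields Crawlee's `RequestOptions` accepts (`url`, `uniqueKey`, `userData`), and `MAX_MANUAL_RETRIES` is an arbitrary limit.

```typescript
// Hypothetical local type: a subset of the fields Crawlee's RequestOptions accepts.
interface RetryRequestOptions {
    url: string;
    uniqueKey: string;
    userData: { manualRetry: number };
}

// Assumption: an arbitrary cap on manual refetches; tune it for your API.
const MAX_MANUAL_RETRIES = 3;

// Hypothetical helper: returns options for re-enqueueing `url`, or null once the cap is hit.
// The attempt number makes each key distinct to the queue's deduplication, while
// enqueueing the same attempt twice would still be deduplicated (unlike Date.now()).
function buildRetryRequest(url: string, previousAttempt = 0): RetryRequestOptions | null {
    const attempt = previousAttempt + 1;
    if (attempt > MAX_MANUAL_RETRIES) {
        return null;
    }
    return {
        url,
        uniqueKey: `${url}#manual-retry-${attempt}`,
        userData: { manualRetry: attempt },
    };
}

const firstRetry = buildRetryRequest('http://www.example.com/page-1');
console.log(firstRetry?.uniqueKey); // http://www.example.com/page-1#manual-retry-1
console.log(buildRetryRequest('http://www.example.com/page-1', 3)); // null
```

Inside the `requestHandler`, you would pass `request.userData.manualRetry ?? 0` as the previous attempt and call `crawler.addRequests([options])` only when the helper does not return `null`.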

waxen bear

thanks

cursive thunder

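Throw an error from the `requestHandler`, e.g. `throw new Error("REASONOFRETRY")`. Crawlee then treats the request as failed and retries it, up to `maxRequestRetries` times.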
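A minimal sketch of the validation side, assuming the API returns JSON and that "meets expectations" means the payload has an `items` array (both hypothetical; adapt to your API). The hypothetical `parseExpectedPayload` function throws an error prefixed with `REASONOFRETRY` whenever the body is not what you expect.

```typescript
// Hypothetical shape of a response that "meets expectations"; adapt to your API.
interface ApiPayload {
    items: unknown[];
}

// Hypothetical validator: returns the parsed payload, or throws if it looks wrong.
// Called inside Crawlee's requestHandler, the thrown error marks the request as failed.
function parseExpectedPayload(body: string): ApiPayload {
    let data: unknown;
    try {
        data = JSON.parse(body);
    } catch {
        throw new Error('REASONOFRETRY: response body is not valid JSON');
    }
    if (
        typeof data !== 'object' ||
        data === null ||
        !Array.isArray((data as { items?: unknown }).items)
    ) {
        throw new Error('REASONOFRETRY: response has no "items" array');
    }
    return data as ApiPayload;
}

console.log(parseExpectedPayload('{"items":[1,2]}').items.length); // 2
```

In the crawler you would call it as `parseExpectedPayload(body.toString())` inside the `requestHandler`, without wrapping it in `try`/`catch`: letting the error propagate is what makes Crawlee retry the request.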

fallen whale

You can also call `session.retire()` before the `throw` to make sure the session is discarded. Normally, a thrown error only increases the session's error score.
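Putting the last two suggestions together, a minimal sketch of an `HttpCrawler` that retires the session and throws when a 200 response has the wrong content. Assumptions: `@crawlee/http` is installed, the proxy URLs are placeholders, the `items` check is hypothetical, and `maxRequestRetries: 5` is an arbitrary choice. Because proxy URLs are assigned per session, retiring the session should make the retry run on a fresh session, which usually means a different proxy, though that is not guaranteed with a small proxy list.

```typescript
import { HttpCrawler, ProxyConfiguration } from '@crawlee/http';

// Assumption: placeholder proxy URLs; replace them with your own.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new HttpCrawler({
    proxyConfiguration,
    useSessionPool: true, // on by default; proxies are tied to sessions
    maxRequestRetries: 5, // arbitrary; how many times a failed request is retried
    async requestHandler({ request, body, session, log }) {
        let data: unknown;
        try {
            data = JSON.parse(body.toString());
        } catch {
            data = undefined;
        }
        // Hypothetical check: adapt to whatever "meets expectations" means for your API.
        const ok = typeof data === 'object' && data !== null && 'items' in data;
        if (!ok) {
            // Discard this session (and the proxy bound to it) instead of only raising its error score.
            session?.retire();
            // Throwing marks the request as failed, so Crawlee retries it.
            throw new Error(`REASONOFRETRY: unexpected payload from ${request.url}`);
        }
        log.info(`Got a valid payload from ${request.url}`);
    },
    // Runs once all retries are used up.
    failedRequestHandler({ request, log }) {
        log.error(`Giving up on ${request.url} after ${request.retryCount} retries`);
    },
});

// Start it with e.g.: await crawler.run(['http://www.example.com/api/page-1']);
```

The `failedRequestHandler` is optional, but it makes requests that still fail after the last retry visible instead of silently dropping them.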