# How to make Crawlee try to refetch?
From what I understand, you want to make a new request based on the data you receive from the initial request? If so, you can use the context object in the `requestHandler` to either send a new request immediately or enqueue one, like this:
```ts
import { HttpCrawler, type RequestOptions } from '@crawlee/http';

const crawler = new HttpCrawler({
    async requestHandler({ crawler, sendRequest, request }) {
        // Option 1: send a request right away and get the response
        const { body } = await sendRequest({
            url: request.url,
        });
        // ...use `body` to decide what to fetch next

        // Option 2: enqueue a new request. Without a custom uniqueKey, Crawlee
        // would treat it as a duplicate of the current request and skip it.
        // The userData flag stops the copy from re-enqueueing itself forever.
        if (!request.userData.refetched) {
            const newRequest: RequestOptions = {
                url: request.url,
                uniqueKey: `${request.uniqueKey}-refetch-${Date.now()}`,
                userData: { refetched: true },
            };
            await crawler.addRequests([newRequest]);
        }
    },
});

await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
]);
```
Wouldn't it make Crawlee think it's a duplicate?
Good point. From my understanding, it isn't a problem if you use `sendRequest` for the new request, since that sends the HTTP request directly and bypasses the request queue. But if you use `crawler.addRequests`, you have to generate a distinct `uniqueKey` for each `RequestOptions` yourself to keep it from being marked as a duplicate. I have updated my snippet to show how to do this.
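To illustrate why the custom `uniqueKey` matters, here is a toy in-memory queue that deduplicates by key. It is a hypothetical model (`ToyRequestQueue` is not a Crawlee class), and it assumes the default key is simply the URL; Crawlee actually derives its default key from a normalized version of the URL.

```ts
// Toy model of uniqueKey-based deduplication (NOT Crawlee's implementation).
interface ToyRequest {
    url: string;
    uniqueKey?: string;
}

class ToyRequestQueue {
    // Keys of every request ever added, used to detect duplicates.
    private seen = new Set<string>();
    private pending: { url: string; uniqueKey: string }[] = [];

    // Returns true if the request was enqueued, false if it was a duplicate.
    add(req: ToyRequest): boolean {
        // Simplification: the default key is the raw URL string.
        const key = req.uniqueKey ?? req.url;
        if (this.seen.has(key)) return false;
        this.seen.add(key);
        this.pending.push({ url: req.url, uniqueKey: key });
        return true;
    }

    size(): number {
        return this.pending.length;
    }
}

const queue = new ToyRequestQueue();
const first = queue.add({ url: "http://www.example.com/page-1" });
// Same URL, no custom key: rejected as a duplicate.
const dup = queue.add({ url: "http://www.example.com/page-1" });
// Same URL, custom key: accepted, so the page gets fetched again.
const refetch = queue.add({
    url: "http://www.example.com/page-1",
    uniqueKey: "http://www.example.com/page-1-refetch-1",
});
console.log(first, dup, refetch, queue.size()); // true false true 2
```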
thanks
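Another option is to throw an error inside the `requestHandler`. Crawlee then treats the request as failed and retries it, up to `maxRequestRetries` times:

```ts
throw new Error("REASONOFRETRY");
```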
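The throw-to-retry behaviour can be sketched with a small retry loop. This is a toy model under stated assumptions: `runWithRetries` is a hypothetical helper, not Crawlee code, and its `maxRetries` parameter is assumed to play the role of the crawler's `maxRequestRetries` option, so one request gets at most `1 + maxRetries` attempts.

```ts
// Toy model of "throw to retry" (NOT Crawlee's implementation).
type Handler = (attempt: number) => void;

// Runs the handler, retrying on any thrown error up to maxRetries extra times.
// Returns the number of attempts made and whether the request finally succeeded.
function runWithRetries(handler: Handler, maxRetries: number): { attempts: number; ok: boolean } {
    let attempts = 0;
    while (attempts <= maxRetries) {
        attempts++;
        try {
            handler(attempts);
            return { attempts, ok: true };
        } catch (err) {
            // Crawlee, by contrast, logs the error and puts the request back in the queue.
            console.log(`attempt ${attempts} failed: ${(err as Error).message}`);
        }
    }
    return { attempts, ok: false };
}

// Hypothetical handler: the page only "loads" on the third attempt.
const result = runWithRetries((attempt) => {
    if (attempt < 3) throw new Error("REASONOFRETRY");
}, 3);
console.log(result); // { attempts: 3, ok: true }

// Hypothetical handler that always fails: gives up after 1 + 3 attempts.
const failing = runWithRetries(() => {
    throw new Error("REASONOFRETRY");
}, 3);
console.log(failing); // { attempts: 4, ok: false }
```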
You can also call `session.retire()` before throwing to make sure the session is discarded. Otherwise, the failure only increases the session's error score.
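The difference between a plain throw and `session.retire()` can be modelled like this. `ToySession` is a hypothetical stand-in for Crawlee's `Session`, and the threshold of 3 is an assumption chosen to match Crawlee's default `maxErrorScore`; the real class also tracks usage count, expiry and cookies.

```ts
// Toy model of session error scoring (NOT Crawlee's Session class).
class ToySession {
    readonly id: string;
    readonly maxErrorScore: number;
    errorScore = 0;

    constructor(id: string, maxErrorScore = 3) {
        this.id = id;
        this.maxErrorScore = maxErrorScore;
    }

    // A normal failure: bump the score, but the session may still be reused.
    markBad(): void {
        this.errorScore += 1;
    }

    // Retiring pushes the score straight to the limit so the session is discarded.
    retire(): void {
        this.errorScore += this.maxErrorScore;
    }

    isUsable(): boolean {
        return this.errorScore < this.maxErrorScore;
    }
}

const plainFailure = new ToySession("a");
plainFailure.markBad(); // roughly what a bare `throw` leads to
console.log(plainFailure.isUsable()); // true: it can be picked again

const retired = new ToySession("b");
retired.retire(); // session.retire() before the throw
retired.markBad();
console.log(retired.isUsable()); // false: it will not be reused
```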