I'm using CheerioCrawler and I'd like to mark sessions as bad when the request either times out or there's a proxy error. Those cases trigger an error before reaching requestHandler and the request is added back to the queue without me having the opportunity to mark the session. Is there a hook somewhere that I can use? Or should I override _requestFunctionErrorHandler?
#Mark session as bad when request times out or proxy responds with 502
I would like to know this as well
You can mark a session as bad by calling session.markBad() inside errorHandler, which runs every time a request fails (as opposed to failedRequestHandler, which runs only once a request has reached its max retries):
const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestHandler: router,
    errorHandler: ({ session }) => {
        session.markBad();
    },
});
But if you just want a session to be thrown away if it fails once, you can do this instead in the sessionPoolOptions:
const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestHandler: router,
    sessionPoolOptions: {
        sessionOptions: {
            maxErrorScore: 1,
        },
    },
});
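To see why maxErrorScore: 1 discards a session after a single failure: each markBad() call raises the session's error score by 1, and the session is dropped once the score reaches maxErrorScore. The class below is a toy model of that rule for illustration only. ToySession and isUsable are made-up names, not Crawlee's Session API.

```typescript
// Toy model of the scoring rule; this is NOT Crawlee's Session class.
class ToySession {
    private errorScore = 0;
    private readonly maxErrorScore: number;

    constructor(maxErrorScore: number) {
        this.maxErrorScore = maxErrorScore;
    }

    // Each failure raises the score by 1.
    markBad(): void {
        this.errorScore += 1;
    }

    // The session stays in rotation only while the score is below the limit.
    isUsable(): boolean {
        return this.errorScore < this.maxErrorScore;
    }
}

const strict = new ToySession(1);
strict.markBad();
console.log(strict.isUsable()); // false: one failure is enough

const lenient = new ToySession(3);
lenient.markBad();
console.log(lenient.isUsable()); // true: two more failures are tolerated
```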
Amazing, thank you @tropic warren! I didn't know about errorHandler.
One more question: how can I access the error in errorHandler? Is it passed as a parameter?
All good I found my answer in the docs!
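For anyone else landing here: errorHandler receives the crawling context as its first argument and the error as its second. Below is a rough sketch that marks the session bad only when the error looks like a timeout or a proxy failure. The helper name looksLikeProxyOrTimeout and the message patterns are assumptions for illustration, not strings Crawlee guarantees, so check them against the errors in your own logs.

```typescript
// Minimal structural type so the sketch runs without importing crawlee.
interface SessionLike {
    markBad(): void;
}

// Hypothetical patterns: adjust them to the error messages you actually see.
const BAD_SESSION_PATTERNS: RegExp[] = [/timed? ?out/i, /proxy/i, /\b502\b/];

// Hypothetical helper: true when the error looks like a timeout or proxy failure.
function looksLikeProxyOrTimeout(error: Error): boolean {
    return BAD_SESSION_PATTERNS.some((pattern) => pattern.test(error.message));
}

// Same argument order as errorHandler: (context, error).
function handleError({ session }: { session?: SessionLike }, error: Error): void {
    if (session && looksLikeProxyOrTimeout(error)) {
        session.markBad();
    }
}
```

It would then be passed to the crawler as errorHandler: handleError, next to proxyConfiguration and requestHandler.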
@tropic warren can I prevent the request from being retried depending on the error from the errorHandler?
I've tried that without success; the request still ends up being retried.
Is there any other way to prevent a retry? Maybe throwing a NonRetryableError?
As you can see in the logs, I print the request right after setting request.noRetry to true in errorHandler, and the request is still retried right afterwards.
Hmm, that means the crawler is going off the old value, so reassigning it in errorHandler does nothing. Let me look into it.
Thanks!
This feature doesn’t seem to exist yet. I’m making a PR on Crawlee’s GitHub to fix this
Thank you @tropic warren let me know if there is a link to the issue that I can follow
GitHub
See this Discord post to fully understand the use case: https://discord.com/channels/801163717915574323/1019936393235017769
Didn't want to make big changes to existing code so kept the else sta...
@dapper grotto It was merged into master
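With the fix merged, setting request.noRetry = true inside errorHandler should stop the request from being put back in the queue, assuming you are on a Crawlee version that includes the change. Here is a rough sketch of skipping retries based on the error. The helper isPermanentFailure and the 404/410 pattern are made up for illustration; decide for yourself which errors count as permanent.

```typescript
// Minimal structural type so the sketch runs without importing crawlee.
interface RequestLike {
    url: string;
    noRetry?: boolean;
}

// Hypothetical rule: treat "not found" / "gone" style errors as permanent.
function isPermanentFailure(error: Error): boolean {
    return /\b(404|410)\b/.test(error.message);
}

// Same argument order as errorHandler: (context, error).
function skipRetryOnPermanentFailure({ request }: { request: RequestLike }, error: Error): void {
    if (isPermanentFailure(error)) {
        // Tell the crawler not to retry this request.
        request.noRetry = true;
    }
}
```

It would be wired in as errorHandler: skipRetryOnPermanentFailure, or combined with the session.markBad() logic in a single handler.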
Thank you for the follow up @tropic warren