#Mark session as bad when request times out or proxy responds with 502


dapper grotto
#

I'm using CheerioCrawler and I'd like to mark sessions as bad when the request either times out or there's a proxy error. Those cases trigger an error before reaching requestHandler and the request is added back to the queue without me having the opportunity to mark the session. Is there a hook somewhere that I can use? Or should I override _requestFunctionErrorHandler?

warm sonnet
#

I would like to know this as well

tropic warren
#

You can mark a session as bad by calling session.markBad() inside the errorHandler function, which runs every time a request fails and is about to be retried (as opposed to failedRequestHandler, which runs only once a request has reached its max retries):

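import { CheerioCrawler } from 'crawlee';

// proxyConfiguration and router are assumed to be defined elsewhere.
const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestHandler: router,
    // Runs after each failed attempt, before the request is retried.
    errorHandler: ({ session }) => {
        // session is undefined when the session pool is disabled.
        session?.markBad();
    },
});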

But if you just want a session to be thrown away after it fails once, you can set maxErrorScore to 1 in sessionPoolOptions instead:

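import { CheerioCrawler } from 'crawlee';

// proxyConfiguration and router are assumed to be defined elsewhere.
const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestHandler: router,
    sessionPoolOptions: {
        sessionOptions: {
            // Retire a session as soon as it has been marked bad once.
            maxErrorScore: 1,
        },
    },
});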
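
To illustrate why maxErrorScore: 1 has that effect, here is a toy model of the scoring. It is not Crawlee's Session class; ToySession is a made-up stand-in, and it assumes that each markBad() adds 1 to the error score, that a session counts as blocked once the score reaches maxErrorScore, and that the default threshold is 3:

```javascript
// Simplified stand-in for a session's error scoring; NOT Crawlee's actual class.
class ToySession {
    constructor({ maxErrorScore = 3 } = {}) {
        this.maxErrorScore = maxErrorScore; // assumed default of 3
        this.errorScore = 0;
    }

    // Each failure bumps the score by one.
    markBad() {
        this.errorScore += 1;
    }

    // The session is no longer handed out once the score hits the threshold.
    isBlocked() {
        return this.errorScore >= this.maxErrorScore;
    }
}

const strict = new ToySession({ maxErrorScore: 1 });
strict.markBad();
console.log(strict.isBlocked()); // true: one failure is enough

const lenient = new ToySession();
lenient.markBad();
console.log(lenient.isBlocked()); // false: still below the threshold
```

So the lower threshold saves you from calling markBad() several times before the session is retired.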

dapper grotto
#

Amazing, thank you @tropic warren! I didn't know about errorHandler.

#

One more question: how can I access the error in errorHandler? Is it passed as a parameter?

dapper grotto
#

All good, I found my answer in the docs!
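
For anyone reading later: errorHandler receives the crawling context as its first argument and the error as its second. Here is a minimal sketch of using that to mark the session bad only for timeouts and proxy errors. The function names and the matched substrings are my own assumptions, so check them against the error messages you actually see:

```javascript
// Hypothetical helper: guesses whether an error is a timeout or a proxy failure.
// The substrings below are assumptions, not messages guaranteed by Crawlee.
function looksLikeProxyOrTimeoutError(error) {
    const message = String(error?.message ?? '').toLowerCase();
    return message.includes('timed out') || message.includes('502');
}

// Intended to be passed as the `errorHandler` option of the crawler.
// First argument: crawling context; second argument: the thrown error.
function markBadOnProxyOrTimeout({ session }, error) {
    if (looksLikeProxyOrTimeoutError(error)) {
        // `session` is undefined when the session pool is disabled.
        session?.markBad();
    }
}
```

It would be wired up with errorHandler: markBadOnProxyOrTimeout in the CheerioCrawler options.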

#

@tropic warren from within the errorHandler, can I prevent the request from being retried depending on the error?

tropic warren
#

Setting request.noRetry to true in the errorHandler should work.

dapper grotto
#

I've tried that, but without success: the request still ends up being retried.

#

Is there any other way to prevent a retry? Maybe throwing a NonRetryableError?

#

As you can see in the logs, I print the request right after setting request.noRetry to true in errorHandler, and the request is still retried right after.

tropic warren
#

Hmm, that means the crawler is going off the old value, so reassigning it in errorHandler does nothing. Let me look into it.

dapper grotto
#

Thanks!

tropic warren
#

Setting request.noRetry in errorHandler doesn’t seem to be supported yet. I’m making a PR on Crawlee’s GitHub to fix this.

dapper grotto
#

Thank you @tropic warren! Let me know if there is a link to the issue that I can follow.

tropic warren
#

@dapper grotto The PR was merged into master.
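
With that change in place, the pattern discussed earlier in this thread could look roughly like the sketch below. It assumes a Crawlee version that includes the merged fix. isPermanentFailure and the status codes it matches are hypothetical placeholders for whatever condition you care about:

```javascript
// Hypothetical predicate: treats errors mentioning 403 or 404 as not worth retrying.
// Replace this with your own condition.
function isPermanentFailure(error) {
    return /\b(403|404)\b/.test(String(error?.message ?? ''));
}

// Intended to be passed as the `errorHandler` option of the crawler.
// Setting request.noRetry tells the crawler not to retry this request.
function skipRetryOnPermanentFailure({ request }, error) {
    if (isPermanentFailure(error)) {
        request.noRetry = true;
    }
}
```

Any other error leaves request.noRetry untouched, so the normal retry logic still applies.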

dapper grotto
#

Thank you for the follow-up @tropic warren!