#Crawler issues - Cannot crawl static document

lapis charm
#

I am trying to update the documents of https://explore.wolt.com/en/deu/terms. However, when crawling the document, the crawler reports that there is no text.
The website serves the terms of service as plain HTML, so it has nothing to do with JavaScript loading the document later.

#

@craggy harness I think you are the right person to mention in this thread?

craggy harness
#

@lapis charm it could be that the website is displayed differently to the crawler, e.g. behind an "are you a robot?" wall

harsh spear
#

That is happening with almost every crawl right now, sometimes with the same doc that worked seconds before

#

I reported that somewhere already

craggy harness
#

Hmmm

#

I'll check the logs

harsh spear
craggy harness
#

Sadly not on us as initial HEAD requests are required

lapis charm
#

Thanks for clarifying, I’ll send an email regarding the request

lapis charm
craggy harness
#

I'll pass it on to the phoenix team 👍🏻

lapis charm
#

Furthermore, I can't seem to find where the issue is. I see no request header in the HEAD request when running it manually.

You state that it does an invalid redirect, and the error you sent shows that the protocols mismatch, but for me the HEAD request also seems to check out.

#

Only when making an http:// request do we get a similar issue
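
For anyone who wants to reproduce the http:// versus https:// comparison by hand, here is a minimal sketch in Python. It is not the crawler's actual code; how the crawler issues its initial HEAD request is not shown in this thread. The function names `head_once` and `redirect_changes_scheme` are made up for illustration. The idea is to send one HEAD request without following redirects, then check whether the `Location` header would switch the protocol.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head_once(url, timeout=10.0):
    """Send a single HEAD request and do NOT follow redirects.

    Returns (status_code, location_header_or_None).
    Hypothetical helper, not taken from the crawler.
    """
    parts = urlsplit(url)
    # Pick the connection class that matches the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirect_changes_scheme(url, location):
    """Return True if following `location` from `url` switches http <-> https."""
    if not location:
        # No Location header means there is no redirect to compare.
        return False
    # Resolve relative Location values against the requested URL.
    target = urljoin(url, location)
    return urlsplit(target).scheme != urlsplit(url).scheme
```

Calling `head_once("http://explore.wolt.com/en/deu/terms")` and then the same with `https://`, and passing each result's `Location` value to `redirect_changes_scheme`, should show whether only the http:// variant triggers a protocol switch.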

lapis charm
#

@craggy harness sorry for pinging again, but could you repeat the request using https:// instead?

craggy harness