#scrape job status

frozen pecan

Hi!

I noticed that when scraping websites with a lot of pages (>1000), the scrape job (or at least its job status) gets stuck in a state where I can no longer tell what is going on.

For example, I currently have a running job (limited to a maximum of 1000 scraped URLs), and when I fetch its status through the API, I get the following:

  • status: active
  • current: 1000
  • total: 1000
  • data: 0 items in the list
  • partial_data: 50 items in the list

This state has been unchanged for hours. The items in partial_data haven't changed either: they are still indices 951 to 1000.
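One way to avoid waiting indefinitely on a job in this state is to detect the stall on the client side. Below is a minimal Python sketch, not part of any Firecrawl SDK: the class name StallTracker and the "unchanged for N seconds" threshold are hypothetical, and it assumes only the status fields listed above (status, current, total, data, partial_data). It compares item counts rather than item contents.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class StallTracker:
    """Hypothetical client-side helper: flags an 'active' job whose
    progress fields have stopped changing for too long."""

    max_unchanged_seconds: float
    _last_fingerprint: Optional[tuple] = None
    _unchanged_since: Optional[float] = None

    @staticmethod
    def fingerprint(payload: Mapping[str, Any]) -> tuple:
        # Summarise only the progress-related fields. data / partial_data
        # may be None or a list, so count their items defensively.
        return (
            payload.get("status"),
            payload.get("current"),
            payload.get("total"),
            len(payload.get("data") or []),
            len(payload.get("partial_data") or []),
        )

    def observe(self, payload: Mapping[str, Any], now: float) -> bool:
        """Record one status poll taken at time `now` (in seconds).

        Returns True if the job is still 'active' but its progress fields
        have not changed for longer than max_unchanged_seconds.
        """
        fp = self.fingerprint(payload)
        if fp != self._last_fingerprint:
            # Something moved: reset the "unchanged since" timestamp.
            self._last_fingerprint = fp
            self._unchanged_since = now
            return False
        stalled_for = now - self._unchanged_since
        return payload.get("status") == "active" and stalled_for > self.max_unchanged_seconds
```

Calling observe() on every poll and stopping (or alerting) once it returns True turns "stuck for hours" into an explicit, bounded condition.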

And that's it. Nothing is arriving at the webhook, and the job isn't listed in the logs (https://www.firecrawl.dev/app/logs).

Is this kind of behaviour expected?
Should we really wait for hours to get the complete data?
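Rather than waiting for hours, a client can cap how long it polls. Below is a minimal Python sketch, assuming a hypothetical fetch_status callable that returns the parsed status payload, and assuming "completed" and "failed" are the terminal states (only "active" and "failed" appear in this thread).

```python
import time
from typing import Any, Callable, Mapping

# Assumption: terminal job states. Only "active" and "failed" appear in
# this thread; "completed" is assumed for the success case.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], Mapping[str, Any]],
    max_wait_seconds: float,
    poll_interval_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll a job until it reaches a terminal state or the deadline passes.

    fetch_status is a placeholder for whatever wraps the status API call;
    it is not a real SDK function.
    """
    deadline = clock() + max_wait_seconds
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if clock() >= deadline:
            # Give up with the last known progress instead of waiting forever.
            raise TimeoutError(
                f"job still {payload.get('status')!r} after {max_wait_seconds}s "
                f"(current={payload.get('current')}, total={payload.get('total')})"
            )
        sleep(poll_interval_seconds)
```

The clock and sleep parameters are injectable so the loop can be exercised with a fake clock instead of real waiting.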

frozen pecan

The job finally failed: its status is now failed.
But this only happened after a very long time, and nothing was sent to our webhook or reported anywhere else.

ivory girder

Hm, that's odd, @frozen pecan. Looking into it.

Can you dm me your email so we can analyze the logs and see what happened?

It shouldn't have behaved like this.

frozen pecan

dm sent

covert rapids

I'm seeing a very similar issue on several websites I'm trying to crawl. I don't even get current or total; the job is just stuck in active, yet I can download the documents from the dashboard. Job ID: 52204c2f-6360-4851-97a0-a353fd2f4569

The status request returns:
{
  "success": true,
  "status": "active",
  "data": null,
  "partial_data": []
}
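A response like this can break a client that expects current and total to be present or data to be a list. Below is a minimal Python sketch of normalising the payload, assuming only the field names seen in this thread; CrawlStatus and parse_status are hypothetical names, not part of any Firecrawl SDK.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass
class CrawlStatus:
    """Hypothetical normalised view of a crawl status payload."""
    status: str
    current: Optional[int]
    total: Optional[int]
    data: List[Any] = field(default_factory=list)
    partial_data: List[Any] = field(default_factory=list)

    @property
    def progress_known(self) -> bool:
        # False when the API omitted current and/or total.
        return self.current is not None and self.total is not None


def parse_status(payload: Mapping[str, Any]) -> CrawlStatus:
    # current/total may be absent and data may be null (as in the response
    # above), so fall back to None / empty lists instead of raising KeyError.
    return CrawlStatus(
        status=str(payload.get("status", "unknown")),
        current=payload.get("current"),
        total=payload.get("total"),
        data=list(payload.get("data") or []),
        partial_data=list(payload.get("partial_data") or []),
    )
```

With this, "active with no progress information" becomes a distinct case (progress_known is False) that the client can log or report instead of silently waiting.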

For other jobs I get no activity log at all, so I have nothing to go on when trying to find the cause of the failure.

dreamy agate

Hey all! Wanted to give y'all an update. We've built a fix for this and we're currently testing it. This behavior is an edge case that should only occur occasionally right now. If all goes well, the fix will be live soon.

frozen pecan

Hi @dreamy agate, did it go well?

dreamy agate
frozen pecan

thanks 👍