# Crawl "Include Only Paths" not working?


low lodge

I'm trying to scrape the products listed on a catalog page. To do so, I'm setting up a crawl with Include Only Paths (`includesPath`) set, but the crawl only returns the original catalog URL.

- Catalog page / main crawl URL: https://www.ssense.com/en-us/men/sale/clothing
- Example product page 1: https://www.ssense.com/en-us/men/product/essentials/black-patch-hoodie/14616841
- Example product page 2: https://www.ssense.com/en-us/men/product/auralee/brown-pleated-trousers/14085441
- Include Only Paths: `en-us/men/product/`
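As a quick sanity check on the pattern itself, here is a minimal sketch that takes the path of each URL above and tests it against `en-us/men/product/`. It *assumes* the crawler treats Include Only Paths as a regular expression searched anywhere in the URL path. The thread doesn't say what matching rule the crawler actually uses, so `path_matches` is a hypothetical helper, not the crawler's code.

```python
import re
from urllib.parse import urlparse

CATALOG = "https://www.ssense.com/en-us/men/sale/clothing"
PRODUCTS = [
    "https://www.ssense.com/en-us/men/product/essentials/black-patch-hoodie/14616841",
    "https://www.ssense.com/en-us/men/product/auralee/brown-pleated-trousers/14085441",
]
INCLUDE = "en-us/men/product/"


def path_matches(url: str, pattern: str) -> bool:
    """Hypothetical rule (an assumption): regex search against the URL path only."""
    return re.search(pattern, urlparse(url).path) is not None


for url in [CATALOG, *PRODUCTS]:
    print(path_matches(url, INCLUDE), url)
```

Under that assumption both product paths pass and the catalog path does not. If the product pages still never show up, the pattern alone probably isn't what's filtering them out.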

In this case I expect to get all 3 pages back, but I only get the catalog page. I've even tried allowing backward links. Is this a bug, or am I missing something?

lavish mirage

Did you try `/en-us/men/product/*` (note the leading forward slash and the trailing asterisk)?
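Whether the trailing `*` helps depends on whether the crawler reads the pattern as a glob or as a regex, and the thread doesn't say which. The sketch below compares the readings using only Python's standard library. Each reading here is an assumption about crawler behaviour, not a description of the crawler's implementation.

```python
import fnmatch
import re
from urllib.parse import urlparse

PATTERN = "/en-us/men/product/*"
url = "https://www.ssense.com/en-us/men/product/auralee/brown-pleated-trousers/14085441"
path = urlparse(url).path

# Glob reading: "*" matches any run of characters, including "/".
glob_ok = fnmatch.fnmatchcase(path, PATTERN)

# Regex reading, searched anywhere in the path: "/*" means "zero or more
# slashes", so the "/en-us/men/product/" prefix alone is enough to match.
regex_search_ok = re.search(PATTERN, path) is not None

# Regex reading, anchored to the whole path: fails, because "/*" can only
# consume slashes, not the rest of the path ("auralee/...").
regex_full_ok = re.fullmatch(PATTERN, path) is not None

print(glob_ok, regex_search_ok, regex_full_ok)  # True True False
```

So the suggested pattern matches product paths as a glob or as an unanchored regex, but not as a regex that must match the entire path.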

low lodge

Hi, yes. See the attached screenshot.