karmic sky Feb 15, 2024, 11:13 AM

What is this?
This plugin lets you scrape an entire website and ingest all of its pages and PDFs into the Rabbit Hole.

Usage
After installing the plugin, type @scrapycat followed by the URL in the chat.

The URL must be the website's root URL (homepage). The ingest phase may take a long time; wait for the Cat's response, which reports the number of URLs/PDFs ingested.

Settings
In the plugin settings you can set "Ingest PDF": if this setting is enabled, the plugin also ingests the PDFs present on the website.
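
As a rough illustration of what an "Ingest PDF" style switch does, here is a minimal sketch, not the plugin's actual code. The function name select_urls, the ingest_pdf flag, and the example.com URLs are hypothetical; the only assumption is that PDFs are recognized by a .pdf path extension.

```python
from urllib.parse import urlparse


def select_urls(urls, ingest_pdf):
    """Return the URLs to ingest: pages always, PDFs only when ingest_pdf is True."""
    selected = []
    for url in urls:
        # Assumption: a link is treated as a PDF when its path ends in ".pdf".
        is_pdf = urlparse(url).path.lower().endswith(".pdf")
        if is_pdf and not ingest_pdf:
            continue
        selected.append(url)
    return selected


# Hypothetical links discovered on a site.
found = [
    "https://example.com/",
    "https://example.com/guide.pdf",
    "https://example.com/about",
]

print(select_urls(found, ingest_pdf=False))  # pages only
print(select_urls(found, ingest_pdf=True))   # pages and PDFs
```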

Example
"@scrapycat https://cheshire-cat-ai.github.io/docs/"

Plugin repo:
https://github.com/team-sviluppo/cc_scrapycat

languid elbow Feb 19, 2024, 3:10 AM

Loving the plugin. Awesome work

knotty dirge Jul 8, 2024, 1:55 PM

Could someone explain how exactly the cheshire-cat can use all the information it scraped into its current knowledge? I can't seem to get it to work, I installed the plugin and tried a few variations regarding the prompt and I could not get any information from the cat regarding the website the plugin scraped.

Also, the process completed successfully.

knotty dirge Jul 8, 2024, 2:10 PM

From what I have gathered the embedder settings are off but I cannot understand how to fix it

knotty dirge Jul 9, 2024, 12:52 PM

It was an embedder problem, for anyone in the future, I configured the embedder and everything connected. The problem seemed to be that the RAG was getting the data from the ScrapyCat plugin but it could not basically connect with the LLM to actually use that data.

steep python Oct 20, 2024, 3:48 PM

I've been waiting for this for months, really appreciated!

karmic sky Sep 4, 2025, 11:54 AM

⚡ Scrapycat Version 2.0.0 Released

The new version adds an option to use Crawl4AI [https://github.com/unclecode/crawl4ai] as the HTML and PDF scraper. This open source library scrapes resources and converts their content to Markdown (useful for complex websites and PDFs).

To use Crawl4AI, update the plugin, enable the option "Use Crawl4ai" and type this command in chat:

@scrapycat crawl4ai-setup

This command installs all the packages needed for Crawl4AI; wait a few minutes until you receive the "Crawl4AI setup completed successfully." message. You only need to run the command the first time.

woeful lark Sep 26, 2025, 7:29 AM

Hello everybody!!!
Is this plugin working?
I tried it and it always imports just one URL. I attached 2 images:

user message:
@scrapycat https://cheshire-cat-ai.github.io/docs/

cat message:
1 of 1 URLs successfully imported in rabbit hole!
one is the logger, and the other is the answer of the logger.

No difference in the output with or without Crawl4AI.

Any help on how to make it work?

karmic sky Sep 26, 2025, 12:03 PM

woeful lark Hello everybody!!! Is this plugin working ? I tryied it and it imports always ju...

The number of URLs is controlled by 2 plugin settings: max_depth & max_pages. How have you set these settings?
Also try a URL without the trailing slash: @scrapycat https://cheshire-cat-ai.github.io/docs
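
To show why depth and page limits can leave you with a single imported URL, here is a minimal sketch of a breadth-first crawl bounded by max_depth and max_pages. It is not the plugin's implementation: the SITE link graph, the normalize and crawl helpers, and the meaning of depth 0 (start page only) are all assumptions made for illustration, and the site is an in-memory dict so the example runs without network access.

```python
from collections import deque

# Hypothetical in-memory site: each page maps to the links found on it.
SITE = {
    "https://example.com/docs": [
        "https://example.com/docs/a",
        "https://example.com/docs/b",
    ],
    "https://example.com/docs/a": ["https://example.com/docs/a/1"],
    "https://example.com/docs/b": [],
    "https://example.com/docs/a/1": [],
}


def normalize(url):
    """Drop trailing slashes so /docs/ and /docs count as the same page."""
    return url.rstrip("/")


def crawl(start, max_depth, max_pages):
    """Breadth-first crawl that stops at max_depth levels and max_pages pages."""
    start = normalize(start)
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # assumption: links on pages at the depth limit are not followed
        for link in SITE.get(url, []):
            link = normalize(link)
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


print(len(crawl("https://example.com/docs/", max_depth=0, max_pages=10)))  # 1
print(len(crawl("https://example.com/docs/", max_depth=2, max_pages=10)))  # 4
print(len(crawl("https://example.com/docs/", max_depth=2, max_pages=2)))   # 2
```

In this sketch a depth limit of 0 or a page limit of 1 both produce a "1 of 1 URLs" style result, which is why checking those two settings first is worthwhile.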
