#How to clear named KeyStores before every run?


tough vault
#

I have this function

async function initialSetup() {
    // Clear previous data sets
    const storeKeys = ["categories_store", "details_store"]

    for (const storeKey of storeKeys) {
        try {
            const store = await KeyValueStore.open(storeKey)
            await store.drop()
        } catch (error) {
            console.log(error)
            continue
        }
    }

    // Create outputs directory if it doesn't exist
    if (!fs.existsSync("outputs")) {
        fs.mkdirSync("outputs")
    }
}

It is the first thing that runs before I start my crawler. It works as expected and drops both of the keystores.

But when I try to write fresh data to these stores again with this code

const categoriesStore = await KeyValueStore.open("categories_store")
await categoriesStore.setValue("categories", categories)

I get this error

INFO  PuppeteerCrawler: Starting the crawl
INFO  PuppeteerCrawler: enqueueing new URLs
Error: Key-value store with id: 6c47506c-2c01-4a6a-9eaa-567ee6f58e96 does not exist.
    at KeyValueStoreClient.throwOnNonExisting (C:\scrape_crawlee\node_modules\@crawlee\src\resource-clients\common\base-client.ts:11:15)
    at KeyValueStoreClient.setRecord (C:\scrape_crawlee\node_modules\@crawlee\src\resource-clients\key-value-store.ts:222:18)

I was expecting it to create a new store if it didn't exist, but for some reason it doesn't, and I am kinda lost with this error.

Any help would be appreciated!

night finch
#

This looks like a bug, will report

limpid furnace
#

It doesn't seem to be a bug. Here is an example showing that it works as expected

tough vault
#

I should've mentioned this in the original post: I don't have an inline request handler. Instead, I am importing a router from a routes.ts file, which looks like this

import { createPuppeteerRouter, RequestOptions, createHttpRouter, KeyValueStore, useState } from 'crawlee';

export const categorySlugRouter = createPuppeteerRouter();

categorySlugRouter.addDefaultHandler(async ({ page, log }) => {
    log.info(`enqueueing new URLs`);
    
    const categories = await page.evaluate(() => {
        const slugs: string[] = []

        try {
            // Web scraping code...

            return slugs
        } catch (error) {
            return []
        }
    })
    
    // Where the error occurs
    const categoriesStore = await KeyValueStore.open("categories_store")
    await categoriesStore.setValue("categories", categories)
});

Then in main.ts I am setting it up like this

import { PuppeteerCrawler } from 'crawlee';
import { categorySlugRouter } from './routes.js';

async function scrapeCategorySlugs() {
    const startUrls = ['http://www.example.com'];

    // The imported router is passed directly as the request handler
    const categorySlugCrawler = new PuppeteerCrawler({
        requestHandler: categorySlugRouter,
    });

    await categorySlugCrawler.run(startUrls);
}
tough vault
#

I'll test this with a new project and see if I can still reproduce it
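
While the cause is still unconfirmed, one possible workaround is to empty each store in place instead of dropping it, so the store itself is never deleted. Below is a minimal sketch of that idea, not a confirmed fix. `ClearableStore`, `clearStore` and `InMemoryStore` are hypothetical names introduced here for illustration. The interface is assumed to mirror the `forEachKey()` and `setValue()` methods of Crawlee's `KeyValueStore`, where setting a key to `null` deletes the record. The in-memory class is only a stand-in so the sketch runs without Crawlee installed.

```typescript
// Assumed minimal shape of the store methods this sketch relies on.
// Crawlee's KeyValueStore is expected to be structurally compatible with it.
interface ClearableStore {
    forEachKey(iteratee: (key: string) => void | Promise<void>): Promise<void>;
    setValue(key: string, value: unknown): Promise<void>;
}

// Hypothetical helper: empties a store in place and returns the number of removed records.
async function clearStore(store: ClearableStore): Promise<number> {
    // Collect the keys first so that records are not deleted while iterating.
    const keys: string[] = [];
    await store.forEachKey((key) => {
        keys.push(key);
    });

    for (const key of keys) {
        // Setting a key to null deletes that record.
        await store.setValue(key, null);
    }
    return keys.length;
}

// Hypothetical in-memory stand-in that follows the same null-deletes-record rule.
class InMemoryStore implements ClearableStore {
    private records = new Map<string, unknown>();

    async forEachKey(iteratee: (key: string) => void | Promise<void>): Promise<void> {
        for (const key of Array.from(this.records.keys())) {
            await iteratee(key);
        }
    }

    async setValue(key: string, value: unknown): Promise<void> {
        if (value === null) {
            this.records.delete(key);
        } else {
            this.records.set(key, value);
        }
    }

    get size(): number {
        return this.records.size;
    }
}
```

In the real project, the loop in `initialSetup()` could call `await clearStore(await KeyValueStore.open(storeKey))` in place of `store.drop()`, assuming the structural match holds. The trade-off is that the store's id stays the same across runs; only its records are removed.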
