#Accessing collection data at build time

round berry
#

Hi.

I have a content collection defined in src/content.config.ts like this:

import { defineCollection } from 'astro/content/config';
import { glob } from 'astro/loaders';
import { z } from 'astro/zod';

const appSchema = z.object({ name: z.string() });

const apps = defineCollection({
    loader: glob({
        base: './src/content/apps',
        pattern: '**/*.md',
    }),
    schema: appSchema,
});

export const collections = { apps };

As you can see, it's a collection of markdown files located in src/content/apps, each having specific properties such as name defined in the schema.

Now I want to integrate Pagefind, but not in the traditional way where it scans the build output. Instead, I need to access my collection at build time and send that data to the Pagefind API programmatically, so that it generates its own files for me.

But the question is, how do I do that? Right now I'm trying to create a custom integration like this:

import type { AstroIntegration } from 'astro';
import { getCollection } from 'astro:content';

function testIntegration(): AstroIntegration {
    return {
        name: 'pagefind-integration',
        hooks: {
            'astro:build:done': async ({ assets, dir, logger, pages }) => {
                const apps = await getCollection('apps');
            },
        },
    };
}

export { testIntegration };

// then in my astro.config.ts
export default defineConfig({
    integrations: [testIntegration()],
});

However, I am getting errors, and I suspect I'm not allowed to use getCollection in integrations. Is that correct? How do I tackle this problem?

lyric salmon BOT
#

Hi @round berry. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

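You're correct that getCollection() is not designed to be used inside integration hooks. It is exported from astro:content, a virtual module that is only available to code Astro processes as part of your site (pages, components, and endpoints, at request or build-time rendering). An integration is loaded from astro.config.ts, outside that context, so the import cannot be resolved there.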

The knowledge sources confirm that getCollection() is used in Astro component frontmatter like this: [getCollection docs]

---
import { getCollection } from 'astro:content';
const allApps = await getCollection('apps');
---

Unfortunately, the provided knowledge sources do not contain enough information about how to access collection data from within an integration hook (such as astro:build:done). The integration API reference only describes what hooks like astro:build:done provide (e.g., pages, dir, assets, logger), and none of those give direct access to collection entries. [build:done hook]

Possible workarounds to consider

Since the knowledge sources don't cover this directly, here are some directions based on what is documented:

  1. Use a custom build-time loader that, as part of its load() method, also calls the Pagefind API. The loader runs at build time and has access to your data before it's stored. [custom loaders]

  2. Write the collection data to a file during the build from within an Astro page or static file endpoint (using getCollection()), and then read that file in your astro:build:done hook using Node's fs APIs. The dir option in astro:build:done gives you the build output directory as a URL. [build:done hook]
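
As a rough sketch of the second workaround, assume the entries were written to a JSON file in the build output. The file name apps-index.json, the /apps/ URL prefix, and both helper names below are hypothetical. The hook could then locate that file relative to dir and reshape each entry into a record for the search indexer. The record fields follow what Pagefind's Node API is understood to accept for custom records, so check them against the Pagefind docs before relying on this.

```typescript
// Shape of one exported entry; mirrors the `apps` schema from the question.
interface ExportedApp {
    id: string;
    data: { name: string };
    body?: string;
}

// Assumed shape of a custom search record (verify against Pagefind's Node API docs).
interface SearchRecord {
    url: string;
    content: string;
    language: string;
    meta: Record<string, string>;
}

// `dir` from astro:build:done is a file: URL. Assuming it ends with a slash,
// a relative file name resolves inside it. Node's fs functions accept URL objects.
function exportedFileUrl(dir: URL, fileName: string): URL {
    return new URL(fileName, dir);
}

// Turn exported entries into records, skipping entries with no text to index.
function toSearchRecords(apps: ExportedApp[], basePath = '/apps/'): SearchRecord[] {
    const records: SearchRecord[] = [];
    for (const app of apps) {
        const content = (app.body ?? '').trim();
        if (content.length === 0) continue;
        records.push({
            url: `${basePath}${app.id}/`,
            content,
            language: 'en', // assumption: all entries are in English
            meta: { title: app.data.name },
        });
    }
    return records;
}
```

Inside the hook, the file could be read with something like JSON.parse(await readFile(exportedFileUrl(dir, 'apps-index.json'), 'utf8')), and the resulting records handed to the indexer one by one.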

#

For a definitive answer on accessing collection data inside integration hooks, I'd recommend checking the Astro Discord or the Astro GitHub discussions, as this is a more advanced use case not covered in the provided documentation.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@round berry Kapa.ai is still learning and improving, please let me know how I did by reacting below.

round berry
#

@lyric salmon I have a follow-up question.

I have a custom AstroIntegration, which is a function that accepts parameters. I want one of those parameters to be the dev server address and port, i.e. the address used during astro dev. How do I get this address and port?

lyric salmon BOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You can get the dev server address and port using the astro:server:start hook, which fires just after the server's listen() event and provides an address option of type AddressInfo (from Node.js's net module, containing address, family, and port). [server:start hook]

Here's how you can use it in your integration:

import type { AstroIntegration } from 'astro';

function testIntegration(): AstroIntegration {
    return {
        name: 'my-integration',
        hooks: {
            'astro:server:start': ({ address, logger }) => {
                const host = address.address;
                const port = address.port;
                logger.info(`Dev server running at ${host}:${port}`);
                // use host and port as needed
            },
        },
    };
}

export { testIntegration };

Note that this hook only runs in dev mode (i.e., when using astro dev), not during builds or preview. [server:start hook]

If you need to pass this address/port to the function that creates your integration, you could store it in a shared variable within the integration's closure and populate it when the hook fires, rather than passing it as a parameter upfront (since the address isn't known until the server starts).
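
The closure idea can be sketched as follows. The factory name, the getDevServerUrl accessor, and the local type declarations are illustrative stand-ins (a real integration would import AstroIntegration from 'astro'). Only the astro:server:start hook and its address option come from the answer above. The sketch also assumes the dev server speaks plain HTTP.

```typescript
// Minimal local stand-ins for the types involved, to keep the sketch self-contained.
interface AddressInfo {
    address: string;
    family: string;
    port: number;
}

interface MinimalIntegration {
    name: string;
    hooks: {
        'astro:server:start': (options: { address: AddressInfo }) => void;
    };
}

// IPv6 literals must be wrapped in brackets inside a URL.
function formatDevUrl({ address, family, port }: AddressInfo): string {
    const host = family === 'IPv6' ? `[${address}]` : address;
    return `http://${host}:${port}`; // assumption: no HTTPS in dev
}

function createDevAddressIntegration() {
    // Unknown until the dev server is listening, so it starts out undefined.
    let devServerUrl: string | undefined;

    const integration: MinimalIntegration = {
        name: 'dev-address-integration',
        hooks: {
            'astro:server:start': ({ address }) => {
                devServerUrl = formatDevUrl(address);
            },
        },
    };

    // Hypothetical accessor for other code that needs the address later.
    return { integration, getDevServerUrl: () => devServerUrl };
}
```

Other hooks defined in the same factory can read devServerUrl directly. The accessor is only needed when code outside the integration wants the value.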

round berry
#

@lyric salmon I have a pagefind integration which outputs a bunch of scripts in the public directory after the astro build and astro dev commands. I then have an .astro component file in which I am importing a react .tsx file.

My question is, how do I load the pagefind js files inside of my react file?

lyric salmon BOT
#

Based on the knowledge sources, since your pagefind scripts are in the public/ directory, they are served as static files and not processed by Astro's bundler. To load them in your React .tsx file, you have a couple of options:

Option 1: Use a <script> tag with is:inline in your .astro component

Since you can't use .astro components inside React files, you can load the script in the parent .astro file using is:inline with an absolute path (since the file is in public/):

<!-- In your .astro component -->
<script is:inline src="/pagefind/pagefind.js"></script>

This exposes whatever globals the script defines, and your React component can then access them via window or the global scope. [external scripts]

Option 2: Dynamic import inside your React component

You can use a dynamic import() inside your React .tsx file to load the script at runtime. However, note that files in public/ are not processed by Astro/Vite, so a static ESM import statement won't work for them. A dynamic import or window-level access after the script is loaded via the .astro parent is the recommended approach.

Key constraint to be aware of

You cannot import .astro components inside a React .tsx file — the React component is a self-contained framework island. Any script loading that relies on Astro's processing must happen in the .astro parent file. [framework components]

#

Also, keep in mind that files placed in public/ are not bundled or optimized by Astro, so they must be referenced by their absolute URL path (e.g., /pagefind/pagefind.js). [public directory]

The knowledge sources don't go into further detail about dynamic runtime loading patterns specific to pagefind, so for more advanced integration patterns I'd recommend checking the Astro Discord.

round berry
#

@lyric salmon Why can't I use this:

import { useEffect, useState } from 'react';

async function getPagefind() {
    const pagefind = await import('/pagefind/pagefind.js');
    await pagefind.init();
    return pagefind;
}

export const TestComponent = () => {
    const [pagefind, setPagefind] = useState(null);

    console.log(pagefind);

    useEffect(() => {
        void getPagefind().then(setPagefind);
    }, []);

    return (
        <div>
            <h1>Test Component</h1>
            <p>This is a test component to verify React integration.</p>
        </div>
    );
};

I want to use a dynamic import here, but I'm getting an error:

Cannot import non-asset file /pagefind/pagefind.js which is inside /public. JS/CSS files inside /public are copied as-is on build and can only be referenced via <script src> or <link href> in html. If you want to get the URL of that file, use /pagefind/pagefind.js?url instead.

lyric salmon BOT
#

The error message itself explains the issue: Vite (which Astro uses under the hood) refuses any import() whose specifier it can resolve to a JS file in the public/ directory. Files in public/ are copied as-is and are not processed or bundled, so Vite only lets you reference them via <script src> or <link href> in HTML, not via import statements it analyzes. The restriction is enforced by Vite, not by the browser. [public directory]

The error message also hints at a workaround: using ?url to get the URL of the file rather than importing it as a module. However, for actually executing the script, the recommended approach based on the knowledge sources is to load it via a <script> tag.

Recommended approach

Load the pagefind script in your parent .astro file using is:inline and an absolute path:

<script is:inline src="/pagefind/pagefind.js"></script>

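This only helps if the script registers itself on the global window object. Pagefind's documentation loads pagefind.js with import(), which suggests the file is an ES module that a classic <script> tag cannot execute and that defines no global, so verify this before relying on it. If a global is available, your React component can access it at runtime like this: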

import { useEffect, useState } from 'react';

export const TestComponent = () => {
    const [pagefind, setPagefind] = useState(null);

    useEffect(() => {
        // Access pagefind from the global scope after the script has loaded
        if (window.pagefind) {
            window.pagefind.init().then(() => setPagefind(window.pagefind));
        }
    }, []);

    return (
        <div>
            <h1>Test Component</h1>
        </div>
    );
};
#

The key point is that is:inline src="/..." is the correct way to load scripts from public/ — it skips Astro/Vite's processing and references the file by its absolute URL path, which is exactly what files in public/ require. [external scripts]
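
A possible alternative, offered as an untested sketch rather than a documented Astro pattern: the error comes from Vite's import analysis, so a dynamic import() whose specifier is only known at runtime, marked with Vite's /* @vite-ignore */ comment, is left for the browser to resolve. The helper name loadPagefind, the PagefindLike type, and the injectable importer parameter are illustrative assumptions. init() is the call used in the question's own code.

```typescript
// Only the part of the module this sketch relies on; the real API is larger.
interface PagefindLike {
    init: () => Promise<void> | void;
}

type Importer = (url: string) => Promise<PagefindLike>;

// The specifier is a variable, so the bundler cannot resolve it at build time.
// The @vite-ignore comment silences Vite's warning about that.
const browserImport: Importer = (url) => import(/* @vite-ignore */ url);

// Cache the promise so repeated calls (e.g. React re-mounts) load and init once.
let cached: Promise<PagefindLike> | undefined;

function loadPagefind(
    url = '/pagefind/pagefind.js',
    importer: Importer = browserImport,
): Promise<PagefindLike> {
    if (!cached) {
        cached = importer(url)
            .then(async (mod) => {
                await mod.init();
                return mod;
            })
            .catch((error) => {
                // Forget the failed attempt so a later call can retry.
                cached = undefined;
                throw error;
            });
    }
    return cached;
}
```

Inside the component, useEffect(() => { void loadPagefind().then(setPagefind); }, []) would then take the place of the window lookup, and no separate script tag would be needed.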

round berry
#

@lyric salmon OK, but then how can I guarantee that the Pagefind code will execute before the React code?

<Layout title='Home page'>
    <Header />
    <Main>
        <TestComponent client:load />
    </Main>
</Layout>

<script is:inline src='/pagefind/pagefind.js'></script>

TestComponent is the React component which previously tried to load the pagefind library, and keep in mind I'm also using the client:load directive there.

lyric salmon BOT
#

This is a timing challenge: with client:load, the React component's JavaScript loads and executes immediately on page load, and there's no built-in guarantee that the <script is:inline src="/pagefind/pagefind.js"> will have finished loading before your React component tries to access window.pagefind.

The knowledge sources do not explicitly cover this specific timing scenario. However, based on what they do say:

  • client:load loads and hydrates the component JavaScript immediately on page load [client:load]
  • is:inline scripts are rendered exactly where they are authored in the final HTML output [is:inline directive]

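Since is:inline scripts are rendered in place, the order in your .astro file matters. If you place the <script is:inline src="/pagefind/pagefind.js"> before the component that uses it, it will appear earlier in the HTML output. A classic <script src> without async or defer also blocks HTML parsing until it has run, so in that position it should execute before the component further down the page is hydrated.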

That said, the safest approach within your React component is to poll or wait for window.pagefind to be available rather than assuming it's already there:

useEffect(() => {
    const interval = setInterval(() => {
        if (window.pagefind) {
            clearInterval(interval);
            window.pagefind.init().then(() => setPagefind(window.pagefind));
        }
    }, 50);

    return () => clearInterval(interval);
}, []);

This way, regardless of exact script execution order, your React component will wait until pagefind is available on window before proceeding.
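
The interval above polls forever if the script never loads. A variation with a timeout could look like the sketch below. waitForValue, its option names, and the default timings are hypothetical, and the reader function is injected so the helper does not assume any particular global.

```typescript
interface WaitOptions {
    intervalMs?: number;
    timeoutMs?: number;
}

// Resolve with the first defined value returned by `read`,
// or reject once `timeoutMs` has elapsed without one.
function waitForValue<T>(
    read: () => T | undefined,
    { intervalMs = 50, timeoutMs = 5000 }: WaitOptions = {},
): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        // Check once up front so an already-available value resolves without delay.
        const immediate = read();
        if (immediate !== undefined) {
            resolve(immediate);
            return;
        }

        const startedAt = Date.now();
        const timer = setInterval(() => {
            const value = read();
            if (value !== undefined) {
                clearInterval(timer);
                resolve(value);
            } else if (Date.now() - startedAt >= timeoutMs) {
                clearInterval(timer);
                reject(new Error(`Value not available after ${timeoutMs} ms`));
            }
        }, intervalMs);
    });
}
```

In the component this might be called as waitForValue(() => (window as any).pagefind), with a .catch() to surface a script that failed to load. The sketch does not cancel the wait when the component unmounts, so it polls until the timeout at most.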

#

The knowledge sources don't go deeper into cross-script timing guarantees in this specific scenario, so for a more robust solution I'd recommend checking the Astro Discord.
