#Changes in source code invalidate npm install cache


limber blade
#

When I create two Dagger directories as below and change a file in the source code (e.g. a translation), the npm install stays cached, which is expected.

...
    sourceCode := dag.Host().Directory("cube-apps-admin", dagger.HostDirectoryOpts{
        Exclude: []string{
            "**/node_modules",
            "package-lock.json",
            "package.json",
        },
    })

    packageFiles := dag.Host().Directory("cube-apps-admin", dagger.HostDirectoryOpts{
        Include: []string{"package*.json"},
    })

    packageJson := packageFiles.File("package.json")
    packageLockJson := packageFiles.File("package-lock.json")
    daggerContainer = daggerContainer.
      WithMountedFile("./package.json", packageJson).
      WithMountedFile("./package-lock.json", packageLockJson).
      With(nodeOpts.Npm.WithNpmInstall()).
      WithDirectory(".", sourceCode).
      With(nodeOpts.Npm.WithNpmBuild())
...
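
A toy model of why the step order in this example keeps the install cached. It assumes each step's cache key depends only on the previous step's key plus that step's own input. The `chainKeys` and `pipeline` helpers and the step strings are hypothetical; this is not Dagger's actual cache implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chainKeys returns one cache key per step. Each key is derived from the
// previous key and the step's own input, so a change only affects the step
// it belongs to and every step after it.
func chainKeys(inputs []string) []string {
	keys := make([]string, len(inputs))
	prev := ""
	for i, in := range inputs {
		sum := sha256.Sum256([]byte(prev + "\x00" + in))
		prev = hex.EncodeToString(sum[:])
		keys[i] = prev
	}
	return keys
}

// pipeline lists hypothetical step inputs in the same order as the example:
// package files first, then the install, then the source, then the build.
func pipeline(packageJSON, lockJSON, source string) []string {
	return []string{
		"file:package.json:" + packageJSON,
		"file:package-lock.json:" + lockJSON,
		"exec:npm install",
		"dir:.:" + source,
		"exec:npm run build",
	}
}

func main() {
	before := chainKeys(pipeline(`{"name":"app"}`, `{"lockfileVersion":3}`, "translation v1"))
	after := chainKeys(pipeline(`{"name":"app"}`, `{"lockfileVersion":3}`, "translation v2"))
	for i := range before {
		fmt.Printf("step %d reused: %v\n", i, before[i] == after[i])
	}
}
```

In this model only the source step and the build after it get new keys; the install step is reused because nothing before it changed.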

But if I do it like this, any change in the source code (not in any of the package files) invalidates the cache.


    sourceCode := dag.Host().Directory("cube-apps-admin", dagger.HostDirectoryOpts{
        Exclude: []string{
            "**/node_modules",
        },
    })

    packageJson := sourceCode.File("package.json")
    packageLockJson := sourceCode.File("package-lock.json")
    daggerContainer = daggerContainer.
        WithMountedFile("./package.json", packageJson).
        WithMountedFile("./package-lock.json", packageLockJson).
        With(nodeOpts.Npm.WithNpmInstall()).
        WithDirectory(".", sourceCode.
            WithoutFile("package.json").
            WithoutFile("package-lock.json"),
        ).
        With(nodeOpts.Npm.WithNpmBuild())

my question is, is that the right way to do it? Or is there a better way?

EDIT: add the dag on each example to make clearer what is going on

scarlet viper
#

hm, that does seem kind of unexpected to me 🤔
is this just with the local cache on the same instance?

limber blade
#

All tests are currently done locally on my laptop, yes

untold fog
#

Can you see if it makes a difference to get package.json and the lock file from dag.Host().Directory("cube-apps-admin", dagger.HostDirectoryOpts{Include: []string{"package*.json"}}) directly? You can revert sourceCode to exclude them so you don't have to exclude them later.

strong scarab
#

I personally prefer to use cache volumes instead of all this package*.json step order dancing. That way, running your pipeline works the same way as on your computer and you don't have to worry about the order of the steps.

limber blade
#

There isn't much point in doing that, since npm clean-install deletes node_modules anyway. (npm install does not honor package-lock.json)
I do use a cache volume for .npm tho, which helps a lot (from 2+ min down to 30 sec)

strong scarab
#

wouldn't performing an npm ci be better, so a cache volume can be used though?

limber blade
#

Ah right, just because I see it in my IDE doesn't mean everyone else would LOL!
Will run some tests with include/exclude tho, to see how it behaves.

#

npm ci is short for clean-install 😄

limber blade
#

I was re-reading this now and it confused me 😄

If I use Include, it will only have the package* files in the fetched directory.
Which means I would need a new directory with the sourceCode, but without the package files, right?

#

Which is what I do in the first example, I think?

#

(That works btw, and I'm "ok" with that)
Just took me a hot minute to figure out why cache was getting invalidated on the second example

#

Hmm, I see that the dag in the initial post might be confusing. Let me edit and update the actual dag for each example
Updated.

strong scarab
#

your second approach should still work as intended.. let me do a test really quickly

limber blade
#

Btw the same thing happens with git. But here I can't find a way to work around it.
Any changes in git will also invalidate the npm clean-install.

        sourceCode := dag.Git(git, dagger.GitOpts{SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK"))}).
            Branch(branch).Tree().
            WithoutDirectory("node_modules").
            WithoutFile("package-lock.json").
            WithoutFile("package.json")

        packageFile := dag.Git(git, dagger.GitOpts{
            SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK")),
        }).Branch(branch).Tree().File("package.json")

        packageLockFile := dag.Git(git, dagger.GitOpts{
            SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK")),
        }).Branch(branch).Tree().File("package-lock.json")

        daggerContainer = daggerContainer.
            WithMountedFile("./package.json", packageFile).
            WithMountedFile("./package-lock.json", packageLockFile).
            With(nodeOpts.Npm.RunCleanInstall()).
            WithDirectory(".", sourceCode)
strong scarab
#

AFAIK this "works as designed", since the git operation will invalidate all its dependent steps once the head commit of the branch changes.

I think there's a workaround you can use which is call Contents on the package.json and package-lock.json files respectively and then use WithNewFile instead of WithMountedFile. That should not invalidate the cache if the contents of the file don't change

#

give it a try and let us know how it goes
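
A toy model of the workaround described above, under the assumption stated there: a file taken from the git tree is identified by where it came from (the commit), while a file created from a string is identified only by its path and contents. `mountedFileID`, `newFileID`, and `digest` are hypothetical names, not Dagger APIs, and real cache behavior may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest hashes its parts with a separator so ("ab","c") != ("a","bc").
func digest(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// mountedFileID models a file whose identity includes the commit it came from.
func mountedFileID(commit, path string) string {
	return digest("git", commit, path)
}

// newFileID models a file whose identity is only its path and contents.
func newFileID(path, contents string) string {
	return digest("new", path, contents)
}

func main() {
	contents := `{"name":"app"}` // unchanged between the two commits

	// Same file, different head commit: the identity changes.
	fmt.Println(mountedFileID("commit-a", "package.json") == mountedFileID("commit-b", "package.json"))

	// Same path and contents, regardless of commit: the identity is stable.
	fmt.Println(newFileID("package.json", contents) == newFileID("package.json", contents))
}
```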

limber blade
#

That worked! Thanks.
Is there something similar for directories?

#

I was hoping to skip some npm builds too later on. But since the git will change, not sure how that would work.
eg if no code is changed but only a config file of a specific variant we build.
I'd want to skip building all variants but the changed one

scarlet viper
#

this isn't the first time we've gotten confused about the caching behavior of mounted directories/files lol 😢

scarlet viper
#

yup 🙂

#

i wonder if we need to update our docs to distinguish between mounted and non-mounted calls here

#

since to be fair, i'm not even 100% sure why you'd want the mounted equivalent over the non-mounted one

strong scarab
#

in that case I'd automatically use WithMountedDirectory since I wouldn't want my code to be present in the image that I'm working on

scarlet viper
#

ahhh true, yes

#

the pattern i tend to use instead is that i build into a directory, and then just copy those results into a new container anyways
since often the compilation environment looks very different than the runtime environment anyways

strong scarab
#

having said that, whenever I have to make the decision of using WithMountedDirectory vs WithDirectory I ask myself: "do I want this to be present in the overlay FS"? Without putting too much thought about caching

#

which goes back to your point about clarifying this in the docs from the caching perspective

limber blade
#

Hmm some nice information!
I'll play later.

I don't really build containers; there are currently 2 kinds of workflows.

  1. build an Electron application for a couple of architectures, then zip it up and send it to S3 and/or Jenkins for a quick grab
  2. build a few websites; before each build, 'npm run config $store' is run, which copies some JSON files over. Then build and upload to Cloudflare with wrangler, optionally saving a zip to Jenkins
    (I don't really care about caching in case 1, but case 2 ranges from 3-4 to 25-30 sites per run).

I'll report back later or tomorrow! Thanks for your time!

limber blade
#

So, back for some testing. My first examples are mostly "scratched", as I'd prefer to use git for both local and remote ops, since the git is cached anyway.

What I don't get is this

        sourceCode = dag.Git(git, dagger.GitOpts{SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK"))}).
            Branch(branch).Tree().
            WithoutDirectory("node_modules").
            WithoutFile("package-lock.json").
            WithoutFile("package.json")
        packageFileContent, err = dag.Git(git, dagger.GitOpts{
            SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK")),
        }).Branch(branch).Tree().File("package.json").Contents(ctx)
        if err != nil {
            log.Fatal().Err(err).Msgf("Failed to get package.json")
        }

        packageLockFileContent, err = dag.Git(git, dagger.GitOpts{
            SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK")),
        }).Branch(branch).Tree().File("package-lock.json").Contents(ctx)
        if err != nil {
            log.Fatal().Err(err).Msgf("Failed to get package-lock.json")
        }

sourceCode will have the cloned content from git without node_modules and package*.json files.
And package*FileContent will have the actual file content.

I would assume that, consuming them like this and assuming no changes are made to the package files,
it would use the "cached" npm clean-install data. But this does not seem to be the case.

    daggerContainer = daggerContainer.
        WithNewFile("./package.json", dagger.ContainerWithNewFileOpts{
            Contents: packageFileContent,
        }).
        WithNewFile("./package-lock.json", dagger.ContainerWithNewFileOpts{
            Contents: packageLockFileContent,
        }).
        With(nodeOpts.Npm.RunCleanInstall()).
        WithDirectory(".", sourceCode)

From what we talked so far, if I understood correctly, any change in git will invalidate everything below it.
Right?

#

TL;DR git caching only helps with subsequent clones/checkouts, but not with caching in "build" steps after it

strong scarab
#

From what we talked so far, if I understood correctly, any change in git will invalidate everything below it.

That shouldn't be the case if you're using WithFile and WithDirectory as @scarlet viper was previously mentioning. I need to double check this though

limber blade
#

I think I found a "gross" workaround lol.
Let me do a couple of tests to verify

#

Ahm 😄 Now it seems that the clean install IS cached with the latest example.
I'd swear that earlier it wasn't 😄

#

(btw my gross workaround was to .Export() source to host and use it from there :P)
But luckily git worked now!

strong scarab
#

What's the latest example?

limber blade
#
// Get the whole tree
source := dag.Git(url, dagger.GitOpts{
    SSHAuthSocket: dag.Host().UnixSocket(os.Getenv("SSH_AUTH_SOCK")),
}).Branch(branch).Tree()

// Extract content
pFile, err := source.File("package.json").Contents(ctx)
if err != nil {
    return "", "", err
}

pLockFile, err := source.File("package-lock.json").Contents(ctx)
if err != nil {
    return "", "", err
}

// Use
daggerContainer = daggerContainer.
    WithNewFile("./package.json", dagger.ContainerWithNewFileOpts{
        Contents: pFile,
    }).
    WithNewFile("./package-lock.json", dagger.ContainerWithNewFileOpts{
        Contents: pLockFile,
    }).
    With(nodeOpts.Npm.RunCleanInstall()).
    WithDirectory(".", source)
#

Now to figure this out:

After clean-install,
I run the following steps in goroutines for X number of stores:

  • npm run config $store which copies configs/$store to src/config
  • npm run build
  • npm run upload
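
The fan-out over stores could look roughly like the sketch below. `runStore` is a hypothetical stand-in for the three npm steps and `runAll` is an assumed helper; neither comes from the pipeline in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// runStore stands in for the per-store steps (config, build, upload).
// It only validates its input here; a real pipeline would run npm.
func runStore(store string) error {
	if store == "" {
		return fmt.Errorf("empty store name")
	}
	for _, step := range []string{"config " + store, "build", "upload"} {
		_ = step // placeholder for running `npm run <step>`
	}
	return nil
}

// runAll runs every store in its own goroutine and returns one error slot
// per store, in the same order as the input slice.
func runAll(stores []string, run func(string) error) []error {
	errs := make([]error, len(stores))
	var wg sync.WaitGroup
	for i, store := range stores {
		wg.Add(1)
		go func(i int, store string) {
			defer wg.Done()
			errs[i] = run(store) // each goroutine writes only its own slot
		}(i, store)
	}
	wg.Wait()
	return errs
}

func main() {
	stores := []string{"store-a", "store-b", "store-c"}
	for i, err := range runAll(stores, runStore) {
		fmt.Printf("%s: err=%v\n", stores[i], err)
	}
}
```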

A change in configs/$store/*.json will invalidate ALL cached results of npm run build of all stores.
Which I'd like to avoid.
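
One way to reason about this is to ask which inputs each store's build really depends on. The sketch below fingerprints the shared source plus only that store's own config files, so a change under one store's configs leaves the other stores' fingerprints alone. `storeKey` and the sample data are hypothetical, and whether Dagger reuses a cached build this way depends on how the directories are passed to each build, which is not verified here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// storeKey fingerprints the inputs of one store's build: the shared source
// fingerprint plus only that store's config files (file name -> contents).
func storeKey(sharedSource string, storeConfig map[string]string) string {
	names := make([]string, 0, len(storeConfig))
	for name := range storeConfig {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable key

	h := sha256.New()
	h.Write([]byte(sharedSource))
	for _, name := range names {
		h.Write([]byte{0})
		h.Write([]byte(name))
		h.Write([]byte{0})
		h.Write([]byte(storeConfig[name]))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	configs := map[string]map[string]string{
		"store-a": {"theme.json": `{"color":"red"}`},
		"store-b": {"theme.json": `{"color":"blue"}`},
	}
	before := map[string]string{}
	for store, cfg := range configs {
		before[store] = storeKey("src@v1", cfg)
	}

	// Only store-a's config changes.
	configs["store-a"]["theme.json"] = `{"color":"green"}`
	for store, cfg := range configs {
		fmt.Printf("%s needs rebuild: %v\n", store, storeKey("src@v1", cfg) != before[store])
	}
}
```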

Before I "bother" you (more): is there a way to print the reason for a cache invalidation?

#

I want to say that I have already accomplished a HUGE improvement,
considering that a normal run currently takes ~10m across 4 servers.
And I'm currently at ~3-5m on my laptop!