#withMountedDirectory behaviour


random vale
#

If I do this:

import * as dagger from '@dagger.io/dagger';

dagger.connect(
  async (client) => {
    const assets = client.host().directory('.', {
      include: ['foo.txt'],
    });

    await client
      .container()
      .from('node:16-slim')
      .withWorkdir('/root/foo')
      .withEntrypoint(['/bin/sh', '-c'])
      .withMountedDirectory('.', assets)
      // withExec takes an array of arguments; the /bin/sh -c entrypoint runs the string
      .withExec(['mkdir -p ./foo/bar'])
      .withExec(['touch ./foo/bar/file.txt'])
      .export('./foo.tar.gz');
  },
  {
    LogOutput: process.stdout,
    Workdir: '.',
  }
);

And load the docker image, the file.txt asset generated within the Dagger pipeline does not exist.

But if I do this:

import * as dagger from '@dagger.io/dagger';

dagger.connect(
  async (client) => {
    const assets = client.host().directory('.', {
      include: ['foo.txt'],
    });

    await client
      .container()
      .from('node:16-slim')
      .withWorkdir('/root/foo')
      .withEntrypoint(['/bin/sh', '-c'])
      .withDirectory('.', assets)
      // withExec takes an array of arguments; the /bin/sh -c entrypoint runs the string
      .withExec(['mkdir -p ./foo/bar'])
      .withExec(['touch ./foo/bar/file.txt'])
      .export('./foo.tar.gz');
  },
  {
    LogOutput: process.stdout,
    Workdir: '.',
  }
);

The file.txt exists.

I understand the differences between withDirectory and withMountedDirectory, but it makes no sense to me that what I choose to add after the mount, which in this case happens to be in the same root directory, would also wipe the assets that I made within the pipeline. The behaviour should be that it only removes the assets that were originally mounted.

[In terms of why I think this is bad: this cost me a couple of hours of my time to figure out why my final image was missing assets in a complex pipeline.]

twilit oar

WithMountedDirectory creates a mount point at the given path and whatever happens within that path does not affect the copy-on-write layer of the images you're working on.

If you're familiar with Dockerfiles, this is the same behavior you get with RUN --mount.

Having said that, even though I understand the behavior you're expecting, imagine the opposite scenario:

  • I want to run a command that uses a specific path to generate artifacts that could be located in multiple folders within that path, and I don't want to include them in the final image. If WithMountedDirectory worked the way you're expecting, this would be very painful to do.
#

The recommended way to handle your situation is to create multiple folders and mount things in separate places. For example:

/assets > mounted assets from local machine
/foo/bar > regular CoW files for the image
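
To make the distinction concrete, here is a toy model of the rule described above. It is only a sketch and not how Dagger or BuildKit is implemented: `ToyContainer`, `copyDirectory`, `mountDirectory`, `touch` and `exportedPaths` are made-up names standing in for `withDirectory`, `withMountedDirectory`, a `withExec` that creates a file, and an export.

```typescript
// Toy model only: it mimics the rule described above, not Dagger's or
// BuildKit's real implementation.
type Files = Record<string, string>; // absolute path -> file content

// True when `path` is `dir` itself or lies somewhere beneath it.
function isUnder(path: string, dir: string): boolean {
  return path === dir || path.slice(0, dir.length + 1) === dir + "/";
}

class ToyContainer {
  private layer: Files = {};          // copy-on-write layer: what an export contains
  private mountPoints: string[] = []; // directories backed by a mount
  private mounted: Files = {};        // files visible only while the mount is attached

  // Analogous to withDirectory: files are copied into the layer.
  copyDirectory(dir: string, names: string[]): this {
    for (const name of names) this.layer[dir + "/" + name] = "";
    return this;
  }

  // Analogous to withMountedDirectory: files are attached, not copied.
  mountDirectory(dir: string, names: string[]): this {
    this.mountPoints.push(dir);
    for (const name of names) this.mounted[dir + "/" + name] = "";
    return this;
  }

  // Analogous to a withExec that creates a file at `path`.
  touch(path: string): this {
    const inMount = this.mountPoints.some((dir) => isUnder(path, dir));
    const target = inMount ? this.mounted : this.layer;
    target[path] = "";
    return this;
  }

  // Paths that would be present in the exported image.
  exportedPaths(): string[] {
    return Object.keys(this.layer).sort();
  }
}

// First pipeline from the thread: mount at the workdir, then write beneath it.
const mountedAtWorkdir = new ToyContainer()
  .mountDirectory("/root/foo", ["foo.txt"])
  .touch("/root/foo/foo/bar/file.txt");

// Second pipeline from the thread: copy into the workdir instead of mounting.
const copiedIntoWorkdir = new ToyContainer()
  .copyDirectory("/root/foo", ["foo.txt"])
  .touch("/root/foo/foo/bar/file.txt");

// Suggested layout: mount the inputs elsewhere and write to a regular path.
const separateMount = new ToyContainer()
  .mountDirectory("/assets", ["foo.txt"])
  .touch("/root/foo/foo/bar/file.txt");

console.log(mountedAtWorkdir.exportedPaths());
console.log(copiedIntoWorkdir.exportedPaths());
console.log(separateMount.exportedPaths());
```

In this model the first container exports nothing, because every write landed under the mount point, while the third keeps file.txt and leaves the mounted foo.txt out of the image.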

random vale
#

So I could move the assets around within the pipeline, but copying a large node_modules from one directory into another is going to be very slow to do. I'll have to use withDirectory and then remove the assets I don't want to keep afterwards.

Ngl here, it was really annoying to have to figure this out through trial and error, and I've run into several footguns like this that take hours of my time to debug. I'm not saying your argument is wrong, but I do think that the current setup where Dagger happily exports an image that strips off assets that you assumed existed because you specified it in your pipeline isn't ideal. That could cause me to write pipelines that crash applications on startup in production.

twilit oar
#

So I could move the assets around within the pipeline, but copying a large node_modules from one directory into another is going to be very slow to do. I'll have to use withDirectory and then remove the assets I don't want to keep afterwards.

You're correct about this, but now that I have a better idea of your use case (node_modules handling), what generally works better here instead of WithMountedDirectory is a cache volume for your yarn / npm cache folder, and then letting yarn install generate the node_modules folder in the CoW filesystem as it normally does. Let me know if this makes sense and if you need some guidance on how to make this work.

Ngl here, it was really annoying to have to figure this out through trial and error, and I've run into several footguns like this that take hours of my time to debug.
but I do think that the current setup where Dagger happily exports an image that strips off assets that you assumed existed because you specified it in your pipeline isn't ideal. That could cause me to write pipelines that crash applications on startup in production.

I agree with your sentiment, and we really appreciate your input so we can improve our DX and docs about this behavior. As a final comment, keep in mind that Dagger is not stripping anything at export time. As stated before, this is the exact same UX limitation Dockerfiles have when building images. Things that are not in the CoW filesystem (like mounts) will not be present in the final image.

So in conclusion, the hard time you're having is mostly a consequence of our lack of examples and docs showing how to use the right pipeline features for what you're trying to do. cc @rigid berry

random vale
#

You're correct about this, but now that I have a better idea of your use case (node_modules handling), what generally works better here instead of WithMountedDirectory is a cache volume for your yarn / npm cache folder, and then letting yarn install generate the node_modules folder in the CoW filesystem as it normally does. Let me know if this makes sense and if you need some guidance on how to make this work.

If I'm understanding your train of thought, unfortunately cache volumes won't work either as node modules have to be shipped with the image; I spoke to Jeremy/Kyle on a call about this a month back. For more details see this issue I raised:

https://github.com/dagger/dagger/issues/5635

GitHub

What are you trying to do? Let's say I have a node app. At present, I can use withMountedCache to reduce the npm install times i.e. it does not require us to re-fetch the dependencies over the ...

twilit oar
# twilit oar What package manager are you using? npm, yarn or pnpm? I can put a quick example...

here's an example Ronan

import { connect } from "@dagger.io/dagger"

// initialize Dagger client
connect(
  async (client) => {

    const assets = client.host().directory(".", { exclude: ["node_modules", "out"] })

    const node = client.container().from("node:16")
      .withExec(["node", "-v"])
      .withMountedCache("/usr/local/share/.cache/yarn", client.cacheVolume("yarn_cache"))
      .withDirectory("/app", assets)
      .withWorkdir("/app")
      .withExec(["yarn", "install"])


    // execute
    await node.directory("/app").export("./out")
  },
  { LogOutput: process.stderr }
)

As you can see, I'm using withMountedCache and then letting yarn install create the node_modules folder by linking the yarn cache assets, so node_modules will be present when the application directory is exported. Would this work in your use case?

random vale
#

I'm using yarn with an offline install of tar.gz files. I'll have to look at this again tomorrow then, as my last attempt in the github issue failed.

twilit oar
#

oh, so the CI server can't access the internet to install packages?

twilit oar
#

@random vale this is how to make it work with an offline cache mirror:

import { connect } from "@dagger.io/dagger"

// initialize Dagger client
connect(
  async (client) => {

    const assets = client.host().directory(".", { exclude: ["node_modules", "out"] })
    const offline_cache = client.host().directory("/home/marcos/npm-packages-offline-cache/")

    const node = client.container().from("node:16")
      .withMountedDirectory("/npm-offline-cache", offline_cache)
      .withMountedCache("/usr/local/share/.cache/yarn", client.cacheVolume("yarn_cache"))
      .withDirectory("/app", assets)
      .withWorkdir("/app")
      .withExec(["yarn", "config", "set", "yarn-offline-mirror", "/npm-offline-cache"])
      .withExec(["yarn", "install", "--offline", "--frozen-lockfile"])

    // execute
    await node.directory("/app").export("./out")
  },
  { LogOutput: process.stderr }
)

^ here's a working example with an offline mirror. As you can see, I'm using both WithMountedDirectory and WithMountedCache, since I don't want /npm-offline-cache to be present in my final image but I do want node_modules to be there.

#

and since you're using a yarn_cache here, all subsequent builds will be very fast, since node_modules will be populated from the yarn cache directly

random vale
#

Thanks, I'll try and test this out on the project in question next Monday.

random vale
#

Hey, so that solution is yarn 1 specific, but I adapted it to yarn 3.


    const setupYarn = () => {
      return client
        .container()
        .from('node:16-slim')
        .withEntrypoint(['/bin/sh', '-c'])
        .withWorkdir('/root/titan')
        .withMountedCache(
          '/root/.yarn/berry/cache',
          client.cacheVolume('berry_cache')
        )
        .withMountedDirectory('/root/npm-packages-offline-cache', offlineCache)
        .withDirectory('.', packageJsons)
        // withExec takes an array of arguments; the /bin/sh -c entrypoint runs each string
        .withExec([
          'yarn config set cacheFolder /root/npm-packages-offline-cache',
        ])
        .withExec(['yarn config set enableNetwork false']);
    };

That partially solves the issue. I can then reuse it like so:

        .withDirectory(
          'node_modules',
          setupYarn()
            .withExec(['yarn install --immutable --immutable-cache'])
            .directory('/root/titan/node_modules')
        );

However, there are still a few issues.

To do the install you need the package.json files, but you can't use:

.withMountedDirectory('.', packageJsons)

because node_modules won't be emitted. Instead you have to do:

.withDirectory('.', packageJsons)

Secondly, transferring node modules from one container into another is very slow. I suspect it's because copying loads of tiny files is slower than copying one large file.

#

Dagger telemetry shows the slowdown, though weirdly it also says that an npm install is slow. 🤷‍♂️

twilit oar
#

Secondly, transferring node modules from one container into another is very slow. I suspect it's because copying loads of tiny files is slower than copying one large file.

This shouldn't be the case, because it's done internally through mergeOp, which works at the overlay level and doesn't involve file copying. Having said that, since I can't see the full pipeline, do you need to transfer the node_modules between containers?

#

As node_modules won't be emitted. Instead you have to do:

.withDirectory('.', packageJsons)

That's not an issue: since you're doing withDirectory("node_modules") afterwards, it's the correct way to add the packageJsons. That's what I did in my example above, so IDK why you changed it to withMountedDirectory

random vale
#

Having said that, since I can't see the full pipeline, do you need to transfer the node_modules between containers?

I could merge it into one client.container().from() chain, but then how are you supposed to have code reuse? I have to install dependencies in multiple places, but with slight variations, e.g. in one place install all dev deps, but in another only install production deps.
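
One possible way to get that reuse, sketched under assumptions rather than taken from the Dagger docs, is a plain function that takes a base container and returns a new one with the shared setup plus whichever install command the caller passes. `ContainerLike`, `withDependencies`, `RecordingContainer` and the `/app` path are hypothetical names for illustration; the interface only mirrors the shape of the builder methods already used in this thread, and the recording stub exists so the sketch can be dry-run without a Dagger engine.

```typescript
// Hypothetical minimal shape of the builder methods used here. In a real
// pipeline the Container type from @dagger.io/dagger would take its place.
interface ContainerLike<Dir> {
  withDirectory(path: string, dir: Dir): ContainerLike<Dir>;
  withWorkdir(path: string): ContainerLike<Dir>;
  withExec(args: string[]): ContainerLike<Dir>;
}

// Hypothetical helper: shared setup first, then the install variant the
// caller asks for. Each call returns a new chain; `base` is left untouched.
function withDependencies<Dir>(
  base: ContainerLike<Dir>,
  packageJsons: Dir,
  installArgs: string[],
): ContainerLike<Dir> {
  return base
    .withWorkdir("/app")
    .withDirectory(".", packageJsons)
    .withExec(["yarn"].concat(installArgs));
}

// Recording stub (not part of Dagger): each step returns a new object that
// remembers the calls made so far, like the immutable builder chain.
class RecordingContainer implements ContainerLike<string> {
  constructor(readonly calls: string[] = []) {}

  withDirectory(path: string, dir: string): RecordingContainer {
    return new RecordingContainer(this.calls.concat(`withDirectory ${path} <- ${dir}`));
  }

  withWorkdir(path: string): RecordingContainer {
    return new RecordingContainer(this.calls.concat(`withWorkdir ${path}`));
  }

  withExec(args: string[]): RecordingContainer {
    return new RecordingContainer(this.calls.concat(`withExec ${args.join(" ")}`));
  }
}

const base = new RecordingContainer();

// Full install, using the flags from the Yarn 3 snippet earlier in the thread.
const full = withDependencies(base, "packageJsons", [
  "install",
  "--immutable",
  "--immutable-cache",
]) as RecordingContainer;

// Production-only variant. Assumption: this command needs Yarn's
// workspace-tools plugin on Yarn 3; substitute whatever your setup supports.
const production = withDependencies(base, "packageJsons", [
  "workspaces",
  "focus",
  "--all",
  "--production",
]) as RecordingContainer;

console.log(full.calls);
console.log(production.calls);
```

Both variants branch off the same `base`, so the shared steps are written once and only the install arguments differ.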

That's not an issue: since you're doing withDirectory("node_modules") afterwards, it's the correct way to add the packageJsons. That's what I did in my example above, so IDK why you changed it to withMountedDirectory

I don't follow this, sorry. This will be an issue if you have pipelines that don't branch off into different containers to set up your dependencies, i.e. you don't want to ship the final image with fluff that's not required.

twilit oar
#

@random vale would you like to jump into #911305510882513037 whenever you have some time, so we can unblock you? I think the main blocker here is our lack of docs making this flow clearer. All your questions/concerns above have an answer, but it's hard for me to provide comprehensive input if I can't fully grasp your context. Each time we interact here there's a new input from your side about how you're attempting to do this, which is not allowing us to fully unblock you

#

or we can continue async if you prefer, if this is not a priority for you.