#Is there a more efficient way I could be

near copper
#

As a note, I used to be able to do this when using the SDK directly:


// Lint runs a yaml linter
func (Yaml) Lint(ctx context.Context, c *dagger.Client) error {
    client := c

    golang := client.Container().
        From("registry.gitlab.com/pipeline-components/yamllint:0.29.0").
        WithDirectory(".", client.Host().Directory(".",
            dagger.HostDirectoryOpts{
                Include: []string{".yamllint", "**/*.yaml", "**/*.yml"},
                Exclude: []string{"**/node_modules", ".cache"},
            },
        )).
        WithExec([]string{"yamllint", "."})

    output, err := golang.Stdout(ctx)
    if err != nil {
        // Return the error to the caller instead of panicking.
        return err
    }

    // print output
    fmt.Println(output)

    return nil
}
#

From a timing point of view,

time docker run --rm -v `pwd`:/code registry.gitlab.com/pipeline-components/yamllint:0.32.1 yamllint -c /code/.yamllint /code

takes around 3 seconds to run, yet the best time I can get out of my Dagger code is about 12 seconds, with the first execution taking around 50 seconds.

undone star
#

I wonder if you could replace all the logic around adding files to the empty directory with WithoutDirectory and WithoutFile as in this example.

near copper
#

It would have to have some sort of pattern that negates yaml / yml / .yamllint files?

#

I think it'd end up in the same place of finding each file to exclude rather than include

#

withoutDirectory / withoutFile each take a single path

I guess I'm looking for a WithFiles that matches the WithoutFiles signature

undone star
#

Something like this?

func (m *DaggerExample) Test(srcDir *dagger.Directory) *dagger.Directory {
    localFile := srcDir.File("README.md")
    return dag.Directory().WithFiles("", []*dagger.File{localFile})
}
#

Regarding the negation pattern, perhaps you could try a similar approach to this.

// +optional
// +defaultPath="."
// +ignore=["*", "!*.yaml", "!*.yml", "!.yamllint"]
srcDir *dagger.Directory,
#

Yeah, this seems to work for me:

func (m *DaggerExample) Test(
    ctx context.Context,
    // +optional
    // +defaultPath="."
    // +ignore=["*", "!*.yaml", "!*.yml", "!.yamllint"]
    srcDir *dagger.Directory,
) string {
    entries, err := srcDir.Entries(ctx)
    if err != nil {
        log.Fatal(err)
    }

    return strings.Join(entries, ",")
}
$ dagger call test

.yamllint,foo.yaml,foo.yml
covert breach
#

@near copper in addition to the other solutions suggested here, note that Directory.withDirectory accepts optional Include and Exclude arguments, which you can use to filter.

#
// Lint will run yamllint on the given directory via dagger. The directory will
// be filtered for .yaml and .yml files and the .yamllint file will be respected
func Lint(ctx context.Context, dag *dagger.Client, srcDir *dagger.Directory) (string, error) {
    dir := dag.Directory().WithDirectory("", srcDir, dagger.DirectoryWithDirectoryOpts{
        Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
    })
    return dag.Container().
        From("registry.gitlab.com/pipeline-components/yamllint:0.32.1").
        WithDirectory("/code", dir).
        WithWorkdir("/code").
        WithExec([]string{"yamllint", "-c", ".yamllint", "."}).
        Stdout(ctx)
}
near copper
#

Thank you

I've gone with Solomon's suggestion and it is giving much better cache performance

I don't think it's perfect as the following doesn't cache hit as I'd expect

  1. Run the lint - 36 seconds

❯ dagger call -m ci lint
✔ connect 0.3s
✔ initialize 5.3s
✔ prepare 0.0s
✔ ci: Ci! 0.0s
✔ Ci.lint: String! 30.7s

  2. Run the lint again

❯ dagger call -m ci lint
✔ connect 0.3s
✔ initialize 1.1s
✔ prepare 0.0s
✔ ci: Ci! 0.0s
✔ Ci.lint: String! 1.6s

  3. Edit a non-YAML file

    echo "test" > bash.sh

  4. Re-run the lint

❯ dagger call -m ci lint
✔ connect 0.3s
✔ initialize 1.0s
✔ prepare 0.0s
✔ ci: Ci! 0.0s
✔ Ci.lint: String! 28.1s

Some time is spent uploading the src dir; however, even once that is done, dagger re-runs yamllint.

Have I misunderstood something about the caching system? I would have thought the final run above would not re-run yamllint itself.

#
// Lint will run yamllint on the given directory via dagger. The directory will
// be filtered for .yaml and .yml files and the .yamllint file will be respected
func Lint(ctx context.Context, dag *dagger.Client, srcDir *dagger.Directory) (string, error) {
    dir := srcDir.WithDirectory("", srcDir, dagger.DirectoryWithDirectoryOpts{
        Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
        Exclude: []string{"**/node_modules", ".cache"},
    })

    return dag.Container().
        From("registry.gitlab.com/pipeline-components/yamllint:0.32.1").
        WithDirectory("/code", dir).
        WithWorkdir("/code").
        WithExec([]string{"yamllint", "-c", ".yamllint", "."}).
        Stdout(ctx)
}
#

OK, I don't think that WithDirectory("") overwrites the root dir, so all the updated files are still in place. Updating to this:

// Lint will run yamllint on the given directory via dagger. The directory will
// be filtered for .yaml and .yml files and the .yamllint file will be respected
func Lint(ctx context.Context, dag *dagger.Client, srcDir *dagger.Directory) (string, error) {
    dir := srcDir.WithDirectory("filtered", srcDir, dagger.DirectoryWithDirectoryOpts{
        Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
        Exclude: []string{"**/node_modules", ".cache", ".git"},
    }).Directory("filtered")

    // Debug: list the entries of the filtered directory.
    entries, err := dir.Entries(ctx)
    if err != nil {
        return "", err
    }
    for _, f := range entries {
        fmt.Println(f)
    }

    return dag.Container().
        From("registry.gitlab.com/pipeline-components/yamllint:0.32.1").
        WithDirectory("/code", dir).
        WithWorkdir("/code").
        WithExec([]string{"yamllint", "-c", ".yamllint", "."}).
        Stdout(ctx)
}

at least gives a dir with the correct files. It still triggers a cache miss.
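
A toy model may help show why merging the filtered copy onto srcDir itself leaves every file in place. The sketch below is only an illustration under stated assumptions, not Dagger's implementation: `dir`, `withDirectory`, and `digest` are hypothetical stand-ins, patterns are matched against base names only, and the digest merely mimics the idea of content-addressed inputs. It does not attempt to explain the remaining cache miss reported above.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path"
	"sort"
)

// dir is a toy stand-in for a directory snapshot: relative path -> content.
type dir map[string]string

// withDirectory copies the entries of src whose base name matches one of the
// include patterns on top of base and returns the merged result.
// Hypothetical helper; it only imitates the merge semantics discussed above.
func withDirectory(base, src dir, include []string) dir {
	out := dir{}
	for p, c := range base {
		out[p] = c
	}
	for p, c := range src {
		for _, pat := range include {
			if ok, _ := path.Match(pat, path.Base(p)); ok {
				out[p] = c
				break
			}
		}
	}
	return out
}

// digest hashes paths and contents in sorted order, so two directories with
// identical files produce the same value.
func digest(d dir) string {
	keys := make([]string, 0, len(d))
	for k := range d {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s\x00%s\x00", k, d[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	include := []string{"*.yml", "*.yaml", ".yamllint"}
	before := dir{".yamllint": "extends: default", "foo.yaml": "a: 1", "bash.sh": "echo hi"}
	after := dir{".yamllint": "extends: default", "foo.yaml": "a: 1", "bash.sh": "test"}

	// Merging onto the source itself keeps bash.sh, so the digest changes.
	fmt.Println(digest(withDirectory(before, before, include)) ==
		digest(withDirectory(after, after, include))) // false

	// Merging onto an empty directory drops bash.sh, so the digest is stable.
	fmt.Println(digest(withDirectory(dir{}, before, include)) ==
		digest(withDirectory(dir{}, after, include))) // true
}
```

In this model, only the variant that starts from an empty directory is insensitive to edits in non-YAML files.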

covert breach
#

@near copper glad you're unblocked! Note that the big advantage of using +ignore in the argument applies specifically when srcDir is uploaded from your local filesystem. The +ignore allows for pre-call filtering, so the engine knows to upload only the files you need. Hence the difference in upload speed. From a caching perspective, and for calls where srcDir is not uploaded from the client, there is no difference between the two methods.

#

So even though I showed you how to do it with Directory.withDirectory(include:), I think in your case using +ignore might be the superior solution, because of the more efficient uploads.
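
The difference between pre-call and post-upload filtering can be sketched with a toy model. Everything here is an assumption for illustration: `file`, `isYAML`, and `upload` are hypothetical names, the sizes are made up, and nothing in it reflects how the engine actually transfers data. It only shows that both approaches end with the same files while sending very different amounts.

```go
package main

import (
	"fmt"
	"path"
)

// file is a toy host file with a made-up size in bytes.
type file struct {
	path string
	size int64
}

// isYAML reports whether the base name looks like a file yamllint needs.
func isYAML(p string) bool {
	for _, pat := range []string{"*.yaml", "*.yml", ".yamllint"} {
		if ok, _ := path.Match(pat, path.Base(p)); ok {
			return true
		}
	}
	return false
}

// upload models sending files somewhere; keep decides what is sent.
func upload(files []file, keep func(string) bool) (sent []file, bytes int64) {
	for _, f := range files {
		if keep(f.path) {
			sent = append(sent, f)
			bytes += f.size
		}
	}
	return sent, bytes
}

func main() {
	host := []file{
		{".yamllint", 100},
		{"foo.yaml", 200},
		{"frontend/node_modules/pkg/index.js", 5000000},
	}

	// Pre-call filtering: only matching files leave the host.
	pre, preBytes := upload(host, isYAML)

	// Post-upload filtering: everything is sent, then filtered afterwards.
	all, allBytes := upload(host, func(string) bool { return true })
	post, _ := upload(all, isYAML)

	fmt.Println(len(pre) == len(post), preBytes, allBytes)
}
```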

near copper
#

Thanks for the suggestion

How would the +ignore work if I wanted this as a function (within a library) rather than as a stand alone module?

I'm thinking my main lint module (function) to do something like:

func (CI) Lint(ctx context.Context, srcDir *dagger.Directory) error {
  eg, ctx := errgroup.WithContext(ctx)
  eg.Go(func() error { return utils.LintYaml(ctx, dag, srcDir) })
  eg.Go(func() error { return utils.LintSpelling(ctx, dag, srcDir) })
  eg.Go(func() error { return utils.LintDockerfiles(ctx, dag, srcDir) })
  
  return eg.Wait()
}

In this case, I'd want my main module to have the entire src dir included, and then each util function having the ability to filter out the files that it needs?
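
The idea of each helper picking its own files from one shared source directory can be sketched without the Dagger SDK. This is a simplified, hypothetical model: `linter`, `selectFiles`, and `runAll` are invented names, matching is done on base names with `path.Match`, and a `sync.WaitGroup` stands in for the errgroup used above.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"sync"
)

// linter pairs a name with the base-name globs it cares about (hypothetical).
type linter struct {
	name    string
	include []string
}

// selectFiles returns the sorted subset of files matching any include glob.
func selectFiles(files, include []string) []string {
	var out []string
	for _, f := range files {
		for _, pat := range include {
			if ok, _ := path.Match(pat, path.Base(f)); ok {
				out = append(out, f)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

// runAll lets every linter filter the shared file list concurrently and
// records which files each one would receive.
func runAll(files []string, linters []linter) map[string][]string {
	results := make(map[string][]string, len(linters))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, l := range linters {
		wg.Add(1)
		go func(l linter) {
			defer wg.Done()
			selected := selectFiles(files, l.include)
			mu.Lock()
			results[l.name] = selected
			mu.Unlock()
		}(l)
	}
	wg.Wait()
	return results
}

func main() {
	files := []string{".yamllint", "foo.yaml", "ci/deploy.yml", "Dockerfile", "README.md", "bash.sh"}
	linters := []linter{
		{"yaml", []string{"*.yaml", "*.yml", ".yamllint"}},
		{"docker", []string{"Dockerfile*"}},
		{"spelling", []string{"*.md"}},
	}
	for name, selected := range runAll(files, linters) {
		fmt.Println(name, selected)
	}
}
```

With this shape, a change to bash.sh alters none of the per-linter file sets, which is the property the per-helper filtering is meant to give.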

Am I overcomplicating this, and should I instead have different CI jobs that trigger each (pseudo GitLab ci.yaml):

- name: yaml
  stage: lint
  cmd: dagger call -m ci lint-yaml 
- name: docker
  stage: lint
  cmd: dagger call -m ci lint-docker
- name: spelling
  stage: lint
  cmd: dagger call -m ci lint-spelling

My primary aim is to have as many cache hits as possible

#

could / should I re-org my code into a lint module that my main pipeline calls?

(I'm refactoring from using the SDK directly over to dagger modules, so apologies if my head space is a few months out of date)

zinc falcon
#

Heya, I was directed to this thread from a GitHub discussion. I have a pretty similar issue and ended up using both the +ignore directive and the DirectoryWithDirectoryOpts configuration.

#

However, for some reason, it's not excluding all YAML in my snapshot. In most of the folders it is, but in my frontend folder in this screenshot, it's still including everything (my node_modules, for example).

#

This is my ignore directive:

// Root directory where all modules/bases are accessible from. (Usually just the $GITHUB_WORKSPACE)
// +optional
// +defaultPath="."
// +ignore=["*", "!*.yaml", "!*.yml"]
rootDir *dagger.Directory,
undone star
#

Hmm, perhaps you need to use "!**/*.yaml"?
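
To make the difference between "!*.yaml" and "!**/*.yaml" concrete, here is a rough sketch of last-match-wins ignore semantics. It is a simplified approximation written for this thread, not the matcher Dagger actually uses: `matchOne` and `ignored` are hypothetical helpers, only a leading "**/" is supported, and a path counts as matched when any of its parent directories matches.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchOne reports whether pattern matches relPath or one of its parent
// directories. A leading "**/" means "at any depth"; the remainder is handed
// to path.Match, whose "*" never crosses a "/".
func matchOne(pattern, relPath string) bool {
	deep := strings.HasPrefix(pattern, "**/")
	pattern = strings.TrimPrefix(pattern, "**/")
	segs := strings.Split(relPath, "/")
	for end := len(segs); end >= 1; end-- {
		last := 0
		if deep {
			last = end - 1
		}
		for start := 0; start <= last; start++ {
			if ok, _ := path.Match(pattern, strings.Join(segs[start:end], "/")); ok {
				return true
			}
		}
	}
	return false
}

// ignored applies the patterns in order; the last matching pattern decides,
// and a leading "!" turns a pattern into a re-include.
func ignored(patterns []string, relPath string) bool {
	result := false
	for _, p := range patterns {
		negate := strings.HasPrefix(p, "!")
		if matchOne(strings.TrimPrefix(p, "!"), relPath) {
			result = !negate
		}
	}
	return result
}

func main() {
	shallow := []string{"*", "!*.yaml", "!*.yml", "!.yamllint"}
	deep := []string{"*", "!**/*.yaml", "!**/*.yml", "!.yamllint"}
	for _, p := range []string{"foo.yaml", "frontend/frontend.yaml", "bash.sh"} {
		fmt.Printf("%-24s shallow=%v deep=%v\n", p, ignored(shallow, p), ignored(deep, p))
	}
}
```

In this model, foo.yaml is kept by both pattern lists, while frontend/frontend.yaml is kept only by the list that uses the "**/" prefix.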

zinc falcon
#

No, the cache is still massive 😦. My confusion is why the +ignore directive still allows matching files to be uploaded in some snapshots.

undone star
#

Not sure about that. Let me explain what I did. I've removed the engine container and the volume to have a clean slate. Then I run the following:

func (m *DaggerExample) Test(
    ctx context.Context,
    // +optional
    // +defaultPath="."
    // +ignore=["*", "!**/*.yaml", "!**/*.yml", "!.yamllint"]
    srcDir *dagger.Directory,
) *dagger.Container {
    dir := srcDir.WithDirectory("filtered", srcDir, dagger.DirectoryWithDirectoryOpts{
        Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
        Exclude: []string{"**/node_modules", ".cache", ".git"},
    }).Directory("filtered")

    // Debug: list the entries of the filtered directory (error ignored here).
    entries, _ := dir.Entries(ctx)
    for _, f := range entries {
        fmt.Println(f)
    }

    ctr := dag.Container().
        From("alpine").
        WithDirectory("/src", dir).
        WithWorkdir("/src")

    return ctr
}

When calling the func with dagger call test with-exec --args "ls,-lR" stdout, I see only the YAML files from my root dir . and the one inside the frontend folder:

.:
total 4
-rw-r--r--    2 root     root             0 Sep 24 21:57 foo.yaml
-rw-r--r--    2 root     root             0 Sep 24 21:57 foo.yml
drwxr-xr-x    2 root     root          4096 Oct  1 16:39 frontend

./frontend:
total 0
-rw-r--r--    2 root     root             0 Oct  1 16:36 frontend.yaml
zinc falcon
#

I guess my question is about the other 45 snapshots on your volume. Do any of them contain non-YAML?

undone star
#

There appear to be files from the alpine filesystem (kind of expected, as I'm running an alpine container in the function) and Dagger-specific files (output.json, schema.json).

#

But most of the numeric folders contain the YAML files (not sure why there are that many folders with the same YAML files tbh)

zinc falcon
#

I think that's my #1 question lol
(1) why are there so many snapshots?
(2) why do some contain files I've explicitly ignored?