#Is there a more efficient way I could be
As a note, I used to be able to do this when using the SDK directly:
// Lint runs a YAML linter over the host directory.
func (Yaml) Lint(ctx context.Context, c *dagger.Client) error {
	linter := c.Container().
		From("registry.gitlab.com/pipeline-components/yamllint:0.29.0").
		WithDirectory(".", c.Host().Directory(".",
			dagger.HostDirectoryOpts{
				Include: []string{".yamllint", "**/*.yaml", "**/*.yml"},
				Exclude: []string{"**/node_modules", ".cache"},
			},
		)).
		WithExec([]string{"yamllint", "."})
	output, err := linter.Stdout(ctx)
	if err != nil {
		// Return the error instead of panicking so the caller can handle it.
		return err
	}
	// Print the linter output.
	fmt.Println(output)
	return nil
}
From a timing point of view,
time docker run --rm -v `pwd`:/code registry.gitlab.com/pipeline-components/yamllint:0.32.1 yamllint -c /code/.yamllint /code
takes around 3 seconds to run, yet the best time I can get out of my Dagger code is about 12 seconds, with the first execution taking around 50 seconds.
I wonder if you could replace all the logic around adding files to the empty directory with WithoutDirectory and WithoutFile as in this example.
It would have to have some sort of pattern that negates .yaml / .yml / .yamllint files?
I think it'd end up in the same place of finding each file to exclude rather than include
withoutDirectory / withoutFile each take a single path
I guess I'm looking for a WithFiles that matches the WithoutFiles signature
Something like this?
func (m *DaggerExample) Test(srcDir *dagger.Directory) *dagger.Directory {
	localFile := srcDir.File("README.md")
	return dag.Directory().WithFiles("", []*dagger.File{localFile})
}
Regarding the negation pattern, perhaps you could try a similar approach to this.
// +optional
// +defaultPath="."
// +ignore=["*", "!*.yaml", "!*.yml", "!.yamllint"]
srcDir *dagger.Directory,
Yeah, this seems to work for me:
func (m *DaggerExample) Test(
	ctx context.Context,
	// +optional
	// +defaultPath="."
	// +ignore=["*", "!*.yaml", "!*.yml", "!.yamllint"]
	srcDir *dagger.Directory,
) string {
	entries, err := srcDir.Entries(ctx)
	if err != nil {
		log.Fatal(err)
	}
	return strings.Join(entries, ",")
}
$ dagger call test
.yamllint,foo.yaml,foo.yml
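To see why that pattern list leaves exactly those three entries, here is a minimal sketch of an ordered ignore list where the last matching pattern wins and a leading "!" re-includes a file. It is a simplified stdlib model for top-level names only, not Dagger's actual matcher; the ignored helper is hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// ignored reports whether name is excluded by an ordered pattern list.
// Later patterns override earlier ones; a leading "!" re-includes a match.
// Simplified model for top-level names only (assumption, not Dagger's code).
func ignored(patterns []string, name string) bool {
	excluded := false
	for _, p := range patterns {
		negate := false
		if len(p) > 0 && p[0] == '!' {
			negate = true
			p = p[1:]
		}
		if ok, _ := path.Match(p, name); ok {
			excluded = !negate
		}
	}
	return excluded
}

func main() {
	patterns := []string{"*", "!*.yaml", "!*.yml", "!.yamllint"}
	for _, name := range []string{".yamllint", "foo.yaml", "foo.yml", "README.md", "bash.sh"} {
		fmt.Printf("%-10s ignored=%v\n", name, ignored(patterns, name))
	}
}
```

The leading "*" excludes everything, and each "!" pattern then carves out an exception, which is why order matters in the list.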
@near copper in addition to the other solutions suggested here, note that Directory.withDirectory accepts optional Include and Exclude arguments, which you can use to filter.
// Lint will run yamllint on the given directory via dagger. The directory will
// be filtered for .yaml and .yml files and the .yamllint file will be respected
func Lint(ctx context.Context, dag *dagger.Client, srcDir *dagger.Directory) (string, error) {
	dir := dag.Directory().WithDirectory("", srcDir, dagger.DirectoryWithDirectoryOpts{
		Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
	})
	return dag.Container().
		From("registry.gitlab.com/pipeline-components/yamllint:0.32.1").
		WithDirectory("/code", dir).
		WithWorkdir("/code").
		WithExec([]string{"yamllint", "-c", ".yamllint", "."}).
		Stdout(ctx)
}
Thank you
I've gone with Solomon's suggestion and it is giving much better cache performance
I don't think it's perfect as the following doesn't cache hit as I'd expect
- Run the lint - 36 seconds
❯ dagger call -m ci lint
✔ connect 0.3s
✔ initialize 5.3s
✔ prepare 0.0s
✔ ci: Ci! 0.0s
✔ Ci.lint: String! 30.7s
- Run the lint
❯ dagger call -m ci lint
✔ connect 0.3s
✔ initialize 1.1s
✔ prepare 0.0s
✔ ci: Ci! 0.0s
✔ Ci.lint: String! 1.6s
- Edit a non-YAML file
echo "test" > bash.sh
- Re-run the lint
❯ dagger call -m ci lint
✔ connect 0.3s
✔ initialize 1.0s
✔ prepare 0.0s
✔ ci: Ci! 0.0s
✔ Ci.lint: String! 28.1s
Some time is spent uploading the src dir; however, even once that is done, Dagger re-runs yamllint.
Have I misunderstood something about the caching system? I would have thought the 4th step above would not re-run yamllint itself.
// Lint will run yamllint on the given directory via dagger. The directory will
// be filtered for .yaml and .yml files and the .yamllint file will be respected
func Lint(ctx context.Context, dag *dagger.Client, srcDir *dagger.Directory) (string, error) {
	dir := srcDir.WithDirectory("", srcDir, dagger.DirectoryWithDirectoryOpts{
		Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
		Exclude: []string{"**/node_modules", ".cache"},
	})
	return dag.Container().
		From("registry.gitlab.com/pipeline-components/yamllint:0.32.1").
		WithDirectory("/code", dir).
		WithWorkdir("/code").
		WithExec([]string{"yamllint", "-c", ".yamllint", "."}).
		Stdout(ctx)
}
OK, I don't think withDirectory("") overwrites the root dir, so all the updated files are still in place. Updating to this:
// Lint will run yamllint on the given directory via dagger. The directory will
// be filtered for .yaml and .yml files and the .yamllint file will be respected
func Lint(ctx context.Context, dag *dagger.Client, srcDir *dagger.Directory) (string, error) {
	dir := srcDir.WithDirectory("filtered", srcDir, dagger.DirectoryWithDirectoryOpts{
		Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
		Exclude: []string{"**/node_modules", ".cache", ".git"},
	}).Directory("filtered")
	// Debug: print the entries that survived the filter.
	entries, err := dir.Entries(ctx)
	if err != nil {
		return "", err
	}
	for _, f := range entries {
		fmt.Println(f)
	}
	return dag.Container().
		From("registry.gitlab.com/pipeline-components/yamllint:0.32.1").
		WithDirectory("/code", dir).
		WithWorkdir("/code").
		WithExec([]string{"yamllint", "-c", ".yamllint", "."}).
		Stdout(ctx)
}
At least that gives a dir with the correct files. It still triggers a cache miss.
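The difference between writing the filtered copy onto the source directory itself and onto an empty one can be sketched with plain maps. This is a toy model that assumes merge semantics as observed in this thread (existing entries survive an overlay); the dir type and the overlay, yamlOnly and names helpers are hypothetical and not part of the Dagger API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dir is a toy directory: file name -> content.
type dir map[string]string

// overlay returns base with every entry of top written over it. Entries that
// exist only in base survive, which is the merge behaviour assumed here.
func overlay(base, top dir) dir {
	out := dir{}
	for name, content := range base {
		out[name] = content
	}
	for name, content := range top {
		out[name] = content
	}
	return out
}

// yamlOnly keeps entries whose names end in .yaml or .yml.
func yamlOnly(d dir) dir {
	out := dir{}
	for name, content := range d {
		if strings.HasSuffix(name, ".yaml") || strings.HasSuffix(name, ".yml") {
			out[name] = content
		}
	}
	return out
}

// names lists the entries of d in sorted order, comma separated.
func names(d dir) string {
	list := make([]string, 0, len(d))
	for name := range d {
		list = append(list, name)
	}
	sort.Strings(list)
	return strings.Join(list, ",")
}

func main() {
	src := dir{"foo.yaml": "a: 1\n", "bash.sh": "test\n"}
	fmt.Println("onto itself:", names(overlay(src, yamlOnly(src))))
	fmt.Println("onto empty: ", names(overlay(dir{}, yamlOnly(src))))
}
```

In this model the overlay onto the source still contains bash.sh, while the overlay onto an empty directory contains only the YAML file.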
@near copper glad you're unblocked! Note that the big advantage of using +ignore in the argument, is specifically when srcDir is uploaded from your local filesystem. The +ignore allows for pre-call filtering, so the engine knows only to upload the files you need. Hence the difference in upload speed. From a caching perspective, and for calls where srcDir is not uploaded from the client, there is no difference between the two methods
So even though I showed you how to do it with Directory.withDirectory(include:), I think in your case using +ignore might be the superior solution, because of the more efficient uploads
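To illustrate the caching side of this, here is a minimal sketch that assumes the lint step is keyed on the content of the directory it receives. That is a deliberate simplification of Dagger's real cache, and the keep and digest helpers are hypothetical. With either filtering method, a change to a non-YAML file leaves the filtered input, and so the key, unchanged.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
	"sort"
)

// keep reports whether the linter needs a file. The rule mirrors the
// thread's include list, matched on the base name (an assumption).
func keep(name string) bool {
	base := path.Base(name)
	for _, p := range []string{"*.yaml", "*.yml", ".yamllint"} {
		if ok, _ := path.Match(p, base); ok {
			return true
		}
	}
	return false
}

// digest hashes file names and contents in sorted order, standing in for a
// content-addressed cache key. With filter set, only kept files are hashed.
func digest(files map[string]string, filter bool) string {
	names := make([]string, 0, len(files))
	for n := range files {
		if !filter || keep(n) {
			names = append(names, n)
		}
	}
	sort.Strings(names)
	h := sha256.New()
	for _, n := range names {
		h.Write([]byte(n))
		h.Write([]byte{0})
		h.Write([]byte(files[n]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := map[string]string{"foo.yaml": "a: 1\n", ".yamllint": "extends: default\n"}
	after := map[string]string{"foo.yaml": "a: 1\n", ".yamllint": "extends: default\n", "bash.sh": "test\n"}
	fmt.Println("unfiltered key changes:", digest(before, false) != digest(after, false))
	fmt.Println("filtered key changes:  ", digest(before, true) != digest(after, true))
}
```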
Thanks for the suggestion
How would the +ignore work if I wanted this as a function (within a library) rather than as a standalone module?
I'm thinking of having my main lint module (function) do something like:
func (CI) Lint(ctx context.Context, srcDir *dagger.Directory) error {
	eg, ctx := errgroup.WithContext(ctx)
	eg.Go(func() error { return utils.LintYaml(ctx, dag, srcDir) })
	eg.Go(func() error { return utils.LintSpelling(ctx, dag, srcDir) })
	eg.Go(func() error { return utils.LintDockerfiles(ctx, dag, srcDir) })
	return eg.Wait()
}
In this case, I'd want my main module to have the entire src dir included, and then each util function having the ability to filter out the files that it needs?
Am I overcomplicating this, and should I instead have different CI jobs that trigger each one (pseudo GitLab ci.yaml):
- name: yaml
stage: lint
cmd: dagger call -m ci lint-yaml
- name: docker
stage: lint
cmd: dagger call -m ci lint-docker
- name: spelling
stage: lint
cmd: dagger call -m ci lint-spelling
My primary aim is to have as many cache hits as possible
Could / should I reorganize my code into a lint module that my main pipeline calls?
(I'm refactoring from using the SDK directly over to dagger modules, so apologies if my head space is a few months out of date)
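The idea of one entry point that receives the whole source tree while each helper narrows it to the files it needs can be modelled with the standard library alone. This is only a sketch of the shape, not Dagger code: the linter type, the pattern lists and the selectFiles and fanOut helpers are hypothetical, and sync.WaitGroup stands in for errgroup to keep the example dependency-free. In a module, each helper would presumably apply its own include list to the shared directory before building its container.

```go
package main

import (
	"fmt"
	"path"
	"sync"
)

// linter pairs a hypothetical lint helper name with the base-name patterns it needs.
type linter struct {
	name    string
	include []string
}

// selectFiles returns the files whose base name matches any include pattern.
func selectFiles(files, include []string) []string {
	var out []string
	for _, f := range files {
		for _, p := range include {
			if ok, _ := path.Match(p, path.Base(f)); ok {
				out = append(out, f)
				break
			}
		}
	}
	return out
}

// fanOut runs the selection for each linter concurrently and returns the
// results in the same order as the linters slice.
func fanOut(files []string, linters []linter) [][]string {
	results := make([][]string, len(linters))
	var wg sync.WaitGroup
	for i, l := range linters {
		wg.Add(1)
		go func(i int, l linter) {
			defer wg.Done()
			results[i] = selectFiles(files, l.include)
		}(i, l)
	}
	wg.Wait()
	return results
}

func main() {
	files := []string{"ci.yaml", "README.md", "Dockerfile", "frontend/app.yml", "main.go"}
	linters := []linter{
		{"yaml", []string{"*.yaml", "*.yml"}},
		{"spelling", []string{"*.md"}},
		{"dockerfiles", []string{"Dockerfile*"}},
	}
	for i, got := range fanOut(files, linters) {
		fmt.Println(linters[i].name, got)
	}
}
```

Because each helper sees only its own subset, a change to main.go in this model would not alter the input of any of the three linters.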
Heya, I was directed to this thread from a GitHub discussion. I have a pretty similar issue and ended up using both the +ignore directive and the DirectoryWithDirectoryOpts configuration.
However, for some reason, it's not excluding all non-YAML files in my snapshot. In most of the folders it is, but in my frontend folder in this screenshot, it's still including everything (my node_modules, for example).
This is my ignore directive:
// Root directory where all modules/bases are accessible from. (Usually just the $GITHUB_WORKSPACE)
// +optional
// +defaultPath="."
// +ignore=["*", "!*.yaml", "!*.yml"]
rootDir *dagger.Directory,
Hmm, perhaps you need to use "!**/*.yaml"?
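The reasoning behind that suggestion, sketched under the assumption of dockerignore-style matching where "*" never crosses a path separator and a leading "**/" matches at any depth. The match helper is a hypothetical simplification, not the engine's matcher.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// match is a simplified matcher. A "**/" prefix lets the rest of the pattern
// match at any depth (tested against the base name); otherwise the pattern
// must match the whole relative path, and "*" does not cross a "/".
func match(pattern, name string) bool {
	if strings.HasPrefix(pattern, "**/") {
		ok, _ := path.Match(strings.TrimPrefix(pattern, "**/"), path.Base(name))
		return ok
	}
	ok, _ := path.Match(pattern, name)
	return ok
}

func main() {
	for _, name := range []string{"foo.yaml", "frontend/frontend.yaml"} {
		fmt.Printf("%-24s *.yaml=%v  **/*.yaml=%v\n",
			name, match("*.yaml", name), match("**/*.yaml", name))
	}
}
```

Under these assumptions "!*.yaml" only re-includes YAML files at the root, while "!**/*.yaml" also re-includes nested ones such as frontend/frontend.yaml.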
No, the cache is still massive 😦. My confusion is why the +ignore directive still allows files it should be ignoring to be uploaded in some snapshots.
Maybe something to do with this? https://github.com/dagger/dagger/blob/main/engine/buildkit/filesync.go#L73-L83
Not sure about that. Let me explain what I did. I removed the engine container and the volume to have a clean slate. Then I ran the following:
func (m *DaggerExample) Test(
	ctx context.Context,
	// +optional
	// +defaultPath="."
	// +ignore=["*", "!**/*.yaml", "!**/*.yml", "!.yamllint"]
	srcDir *dagger.Directory,
) *dagger.Container {
	dir := srcDir.WithDirectory("filtered", srcDir, dagger.DirectoryWithDirectoryOpts{
		Include: []string{"**/*.yml", "**/*.yaml", ".yamllint"},
		Exclude: []string{"**/node_modules", ".cache", ".git"},
	}).Directory("filtered")
	// Debug: print the entries that survived the filter.
	entries, _ := dir.Entries(ctx)
	for _, f := range entries {
		fmt.Println(f)
	}
	ctr := dag.Container().
		From("alpine").
		WithDirectory("/src", dir).
		WithWorkdir("/src")
	return ctr
}
When calling the func with dagger call test with-exec --args "ls,-lR" stdout, I see only the YAML files from my root dir . and the one inside the frontend folder:
.:
total 4
-rw-r--r-- 2 root root 0 Sep 24 21:57 foo.yaml
-rw-r--r-- 2 root root 0 Sep 24 21:57 foo.yml
drwxr-xr-x 2 root root 4096 Oct 1 16:39 frontend
./frontend:
total 0
-rw-r--r-- 2 root root 0 Oct 1 16:36 frontend.yaml
I guess my question is about the other 45 snapshots on your volume. Do any of them contain non-YAML?
There appear to be files from the alpine filesystem (kind of expected, as I'm running an alpine container in the function) and Dagger-specific files (output.json, schema.json).
But most of the numeric folders contain the YAML files (not sure why there are that many folders with the same YAML files tbh)
I think that's my #1 question lol
(1) why are there so many snapshots?
(2) why do some contain files I've explicitly ignored?