#TF Cache
So that's creating an explicit buildkit cache to store the providers between runs and speed up subsequent calls to init, a bit like if you did RUN --mount=type=cache in a Dockerfile. The ID is a label for the cache volume, so that the same cache gets mounted between runs and set as the TF_PLUGIN_CACHE_DIR. IIRC the .terraform directory (i.e. TF_DATA_DIR) will then symlink plugins from the cache.
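A minimal sketch of what that setup might look like with the Dagger Python SDK. `cache_volume`, `with_mounted_cache` and `with_env_variable` are the SDK calls as I understand them; the function name, the `terraform-plugins` ID and the cache path are placeholders made up for illustration, not necessarily what the gist uses.

```python
def with_provider_cache(client, container,
                        cache_id="terraform-plugins",
                        cache_dir="/root/.terraform.d/plugin-cache"):
    """Mount a named cache volume and point Terraform's plugin cache at it.

    `client` is a dagger client and `container` a dagger Container.
    The cache_id and cache_dir defaults are illustrative assumptions.
    """
    # The same ID resolves to the same volume on every run.
    plugin_cache = client.cache_volume(cache_id)
    return (
        container
        .with_mounted_cache(cache_dir, plugin_cache)
        .with_env_variable("TF_PLUGIN_CACHE_DIR", cache_dir)
    )
```

Because the volume is looked up by ID rather than by the container's inputs, a different main.tf that uses the same providers should still find them already downloaded.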
lol I think I’m in over my head
Isn’t the dagger eng supposed to cache the image as a whole? Is this just like an additional cache in the event providers change?
Idk I have a simple main.tf just trying to spit out an output variable right now and I can’t even get the plan to run. I can send my code, but it’s definitely trash lol. Do you have like a full working example of terraform plan/apply?
The exact mechanism of dagger caching still eludes me, if I'm quite honest. But my reasoning here is:
- I don't want to cache the full image, as the credentials I'm passing in are temporary. That's why the "cachebuster" step is there.
- This ensures the providers are cached even if I'm e.g loading in a different main.tf that uses the same providers.
But there could be some cargo cult logic there I'm more than willing to admit. If someone from the dagger team could check I'd be grateful!
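A rough sketch of what a "cachebuster" step could look like, assuming it works by setting an env var to a value that changes on every run. The helper name and the `CACHEBUSTER` variable name are hypothetical; `with_env_variable` is the Dagger Python SDK call.

```python
import time


def bust_cache(container, name="CACHEBUSTER"):
    """Set an env var to a fresh value so later steps are not served from cache.

    The variable name is an arbitrary placeholder; any value that differs
    between runs would do.
    """
    return container.with_env_variable(name, str(time.time_ns()))
```

The idea would be to apply this after the tools are installed and the cache volume is mounted, but before the temporary credentials and the terraform commands, so only the steps after it are re-executed.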
Hmm, let me see what I can share.
Yeah that makes sense, my idea was to have the image with all the tools cached but env variables and commands would be run with dynamic values…but I am by no means a developer, that's just how I basically operate through the CLI when running terraform lol
A bit of clarity
Is this just like an additional cache in the event providers change?
Almost. If your actual tf changes (or any of the files that you've put in the container), then the entire cache is busted for those operations. The cache volume persists regardless of the inputs, so it's perfect for caching dependencies, or providers in this case, so they don't have to be redownloaded.
With terraform, you probably never want your actual commands to be cached since they rely on remote state (you want terraform to actually execute and check the status of resources), so cache volumes are really useful here to make the setup faster
Tbh I’m stumped. The way I understand it, terraform init pulls down all the providers and only pulls them down/purges again if there has been a change to them or to modules etc., so the init command dynamically handles the cache itself, no?
Kind of. For simplicity let's say you have 1 terraform file, your main.tf, and that gets put into the container and then you terraform init. If that main.tf never changes, the init and all following commands (up to a cache busting env var if you're doing that) will be cached.
When the main.tf inevitably changes, that will bust all of the cache. So without a volume, terraform init would need to redownload all providers. With a volume, it can have those providers persisted and when init runs it can determine what needs to be downloaded or not.
Does that make more sense?
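To make the invalidation part concrete, here is a toy model in plain Python. It is not Dagger's or BuildKit's actual implementation, just an assumed simplification in which each step's cache key depends on the previous step's key plus that step's own inputs. The step strings are illustrative only.

```python
import hashlib


def step_key(parent_key: str, operation: str) -> str:
    """Cache key for a step: previous step's key plus this step's inputs."""
    return hashlib.sha256(f"{parent_key}|{operation}".encode()).hexdigest()


def pipeline_keys(main_tf: str) -> list[str]:
    """Keys for: base image -> copy main.tf -> terraform init -> terraform plan."""
    steps = [
        "from hashicorp/terraform",
        "copy main.tf " + hashlib.sha256(main_tf.encode()).hexdigest(),
        "exec terraform init",
        "exec terraform plan",
    ]
    keys, key = [], ""
    for op in steps:
        key = step_key(key, op)
        keys.append(key)
    return keys


before = pipeline_keys('output "x" { value = 1 }')
after = pipeline_keys('output "x" { value = 2 }')

# The base image step is still a hit; everything from the copy onwards misses.
print([a == b for a, b in zip(before, after)])  # [True, False, False, False]
```

In this model the init step misses as soon as main.tf changes, which is exactly where a cache volume helps: it sits outside the key chain, so the providers are still on disk when init re-runs.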
Here's a case where I use the function from the gist above to create my TF image, and then run some commands in it:
tf_image = await create_terraform_image(client, b3_session)
for tf_identifier, resource_id in generate_lambda_imports():
    try:
        await (
            tf_image
            .with_exec(['-chdir=/terraform', 'import',
                        tf_identifier, resource_id])
            .sync()
        )
        logger.info(
            f"[green]Imported {tf_identifier} {resource_id}![/]"
        )
    except dagger.ExecError as err:
        result = err.stderr
        if 'already managed' in result:
            logger.info(
                f"[yellow]{tf_identifier} already imported!"
            )
        else:
            logger.error(
                f"[bold red]Caught error importing {tf_identifier}: " +
                f"[/]\n {result}"
            )
Kinda, I guess I have to just get a working state for me to test with and without the volume cache to understand the flow more. I just feel like init already handles the cache logic
Thanks, I’ll try to put this into my code and see if I can get it to run. Appreciate the help to everyone on here! I’m just a break fix type of guy, and reverse engineer stuff to learn it, so the limited examples have been a learning curve for me haha
I guess a question I have never quite got an answer to is: what gets cached and when? Is each function call a "layer" that gets cached separately? Or is it an overall result? So would the init step be appropriately cached (as long as main didn't change) if we called sync at that step?
Oh I think I got it now, it doesn’t get cached because the init is a sub command more or less from the build command(s) of the image? So that’s why you’re forcing dagger to update the cache when init updates?
Sorry I’m really struggling to understand this since it’s referencing the generate_lambda_imports function that I don’t see.
Really maybe dagger just isn’t built for using terraform the simple way? Like I want to be able to just clone/transfer terraform files into the container and then run any terraform I pass into a python function. Can’t seem to see anyone using this pipeline in that fashion
Apologies; that's just a generator that yields some extra string variables I substitute into my terraform command.
What you want to do sounds 100% reasonable; the cache discussion maybe derailed things because it's about optimising things to get the best of dagger / buildkit. But what I do in code is functionally:
- Create my base terraform image using tf_image = await create_terraform_image(). This has the terraform folder from my project baked in, plus some extras I need like the git socket for fetching terraform modules.
- Tell dagger I want to run commands against the image using with_exec, and then trigger the execution using sync or stdout or a similar method.
So, using my function from the gist, you could do:
async def run_arbitrary_tf_commands(*tf_args):
    tf_image = await create_terraform_image()
    try:
        await (
            tf_image
            # -chdir is here because my create_terraform_image function mounts
            # a 'terraform' folder in the container at `/terraform`.
            # tf_args is a list of arguments, much like you would pass to subprocess.run
            .with_exec(["-chdir=/terraform", *tf_args])
            .sync()
        )
    except dagger.ExecError as err:
        # This exception is raised if the 'with_exec' step returns a non-zero exit code
        logger.error(f'Terraform errored: \n{err.stderr}')
Is that helpful at all?
Here's another example where I run either a plan or an apply against my terraform, again just using the same base image from the gist...
from typing import Literal

import dagger
import rich

# logger, aws_session and create_terraform_image come from the rest of my project / the gist


async def execute_terraform_deploy(config: dagger.Config,
                                   command: Literal['plan', 'apply']) -> bool:
    async with dagger.Connection(config) as client:
        tf_args = ['-chdir=/terraform', command]
        tf_image = (
            await create_terraform_image(client, aws_session)
        )
        if command == 'apply':
            tf_args.append('-auto-approve')
        logger.info(f'Executing terraform {command}')
        try:
            result = await (
                tf_image
                .with_exec(tf_args)
                .stdout()
            )
            print(result)
        except dagger.ExecError as err:
            rich.print("[bold red]Error running terraform:\n" +
                       f"{err}")
            return False
        return True
Ok the need for cache has come lol. Now I need the ability to leverage multiple different tools in my container. Instead of starting from Ubuntu and installing everything from scratch, what I want to do is use a cache that has terraform, vault, awscli, and whatever other tools in it, then I can just mount that cache to the ubuntu container and run subsequent terraform commands like init etc. without doing a full build. Any idea how I can tackle that?