
Support an array of compression formats #29

Open · wants to merge 3 commits into main

Conversation

@joestump commented Jul 20, 2018

What does this PR do?

This is a pretty invasive PR. I've got the basics working to support all sorts of compression formats using the archiver package by @mholt. Because archiver detects the compression format from the file path, the type attribute has been marked as Deprecated; the compression format is now derived from the extension of output_path (a rough sketch of the idea follows the list below).

Supported formats/extensions:

  • .zip
  • .tar
  • .tar.gz & .tgz
  • .tar.bz2 & .tbz2
  • .tar.xz & .txz
  • .tar.lz4 & .tlz4
  • .tar.sz & .tsz
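
As a rough illustration of detection-by-extension (not the provider's or archiver's actual code; formatForPath and its dispatch table are hypothetical), in Go:

package main

import (
	"fmt"
	"strings"
)

// formatForPath maps an output path's suffix to an archive format name.
// The compound suffixes (.tar.gz etc.) cannot collide with .tar, since
// HasSuffix only matches the very end of the string.
func formatForPath(outputPath string) (string, error) {
	suffixes := map[string]string{
		".zip": "zip", ".tar": "tar",
		".tar.gz": "tar.gz", ".tgz": "tar.gz",
		".tar.bz2": "tar.bz2", ".tbz2": "tar.bz2",
		".tar.xz": "tar.xz", ".txz": "tar.xz",
		".tar.lz4": "tar.lz4", ".tlz4": "tar.lz4",
		".tar.sz": "tar.sz", ".tsz": "tar.sz",
	}
	for suffix, format := range suffixes {
		if strings.HasSuffix(outputPath, suffix) {
			return format, nil
		}
	}
	return "", fmt.Errorf("unsupported archive extension in %q", outputPath)
}

func main() {
	format, err := formatForPath("slug.tgz")
	fmt.Println(format, err) // tar.gz <nil>
}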

Why is this change being proposed?

I'm a maintainer of the Heroku Terraform provider and I'd like to add the ability to deploy a Heroku "slug" from Terraform. A slug is just a .tgz file with an app/ directory in it. I wanted to avoid a null_resource with a local-exec provisioner, and adding more compression formats to archive_file felt like the most correct answer.

Once I dug in, I saw that the existing code was structured quite similarly to the archiver package, except that archiver supports all sorts of goodies.

What's left?

I need to flesh out source_content, source_content_file, and source blocks. As it stands, archiver only supports archiving a list of files/directories, so I'm thinking of using ioutil.TempDir to write the contents out to a temporary directory before passing the file list off to archiver (rough sketch below).
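
A minimal sketch of that temp-directory idea. It assumes the pre-v3 archiver API (archiver.MatchingFormat and Make) that was current when this PR was written, so treat those calls as approximate; archiveFromContent is a hypothetical helper:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"

	"github.com/mholt/archiver"
)

// archiveFromContent stages inline source_content on disk (archiver only
// takes file paths) and then archives the staged file to outputPath.
func archiveFromContent(filename, content, outputPath string) error {
	dir, err := ioutil.TempDir("", "archive-file")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir) // drop the staging directory once archived

	staged := filepath.Join(dir, filename)
	if err := ioutil.WriteFile(staged, []byte(content), 0644); err != nil {
		return err
	}

	// Assumed pre-v3 API: MatchingFormat picks a format from the
	// destination's extension, Make builds the archive from a file list.
	format := archiver.MatchingFormat(outputPath)
	if format == nil {
		return fmt.Errorf("unsupported archive extension in %q", outputPath)
	}
	return format.Make(outputPath, []string{staged})
}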

Maybe update to dep? I switched the Heroku provider to it and like it quite a bit more than govend and govendor.

TODO

  • Documentation
  • Haven't run tests outside of an ad-hoc HCL file
  • Flesh out source_content and source blocks
  • Add new dependencies to vendor/

@mholt commented Jul 20, 2018

The archiver library does check file headers, at least for some formats: https://github.com/mholt/archiver/blob/e4ef56d48eb029648b0e895bb0b6a393ef0829c3/tar.go#L24
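
For readers following along, "checking file headers" means sniffing magic bytes rather than trusting the filename. A self-contained illustration (not archiver's actual code; see the linked tar.go for that):

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// isGzip reports whether the file starts with the gzip magic bytes
// 0x1f 0x8b, regardless of what its name claims.
func isGzip(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	header := make([]byte, 2)
	if _, err := io.ReadFull(f, header); err != nil {
		return false, err
	}
	return bytes.Equal(header, []byte{0x1f, 0x8b}), nil
}

func main() {
	ok, err := isGzip("slug.tgz")
	fmt.Println(ok, err)
}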

@apparentlymart (Contributor)

Thanks for working on this, @joestump!

I feel a little conflicted about this, to be honest. The archive_file data source was originally added primarily to work around the fact that AWS Lambda at launch did not have a good way to pass in provision-time values without embedding them in the source zip file. Until that point, we had always maintained the position that deployment artifacts ought to be constructed in a build step prior to running Terraform, and that Terraform itself should not be used to construct such artifacts.

AWS Lambda eventually added support for passing environment variables, which gave us the missing feature, comparable to an aws_instance's user_data or a heroku_app's config_vars. That removed the need for archive_file altogether as far as we were concerned.

Unfortunately, in the meantime some users had begun using archive_file to build the entire zip file to deploy to Lambda, regardless of whether any Terraform-generated files were present in the archive, and so we've ended up retaining it to avoid breaking those users.

We have precedent for Terraform provider features that let Terraform step slightly into "build phase" use-cases even though we don't consider those part of Terraform's core scope, such as aws_ami, because we want to be pragmatic and support users with slightly unusual needs. But this is a tricky tradeoff, because such resources can mislead users about what Terraform is for and what sorts of things are best achieved with other tools.

Ordinarily I'd be happy to retain the archive_file feature for the same reason that we have aws_ami, but there are some further extenuating circumstances here: this data source does not obey the usual rules for data sources because it has a side-effect of writing a local file to disk. In practice this leads to a number of surprising consequences, such as complicating the ability to run plan and apply on separate machines, and having no way to clean up the effectively-temporary file after Terraform is finished running.

Given that this data source is already in a rather troublesome state, I have reservations about expanding its scope to any additional use-cases, particularly if that would involve making a breaking change to its interface.

One way to address this would be to export the archive data in a base64-encoded attribute as discussed in #27, but that would be unsuitable for archives of any non-trivial size since it would cause the entire file to be loaded into memory (illustrated below). It would also require switching back to an explicit type argument, because there would no longer be an output filename to derive the format from. And the archiver package seems to be built to work with files on disk, so it probably wouldn't be the best choice for in-memory operations.
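
To make the memory concern concrete, here is a minimal stdlib-only sketch (zip for simplicity) of what a base64-encoded attribute implies: the whole archive is buffered in memory and then duplicated as an even larger string:

package main

import (
	"archive/zip"
	"bytes"
	"encoding/base64"
	"fmt"
)

func main() {
	// Build the archive entirely in memory.
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)

	w, err := zw.Create("hello.txt")
	if err != nil {
		panic(err)
	}
	if _, err := w.Write([]byte("hello")); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	// The attribute value: the full archive again, now as a string
	// roughly 4/3 the size of the raw bytes.
	encoded := base64.StdEncoding.EncodeToString(buf.Bytes())
	fmt.Println(len(encoded))
}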

So with that all said, I find myself leaning towards restricting the current scope of this data source to compatibility with the emergent AWS Lambda use-case, even though we don't recommend it. We may even eventually deprecate it explicitly, though that's not a decision to be made lightly today.

With no other changes, I assume this would require Heroku slugs to be built and uploaded to Heroku by a prior build step, and the id passed in to the Terraform configuration for provisioning. That would be the approach we'd generally recommend, but if you think an analog to aws_ami for Heroku is valuable then I'd lean towards having a specialized resource within the Heroku provider which could build the necessary archive and upload it to Heroku in a single action, avoiding the need for it to be passed around in Terraform expressions or to leave a temporary file behind on disk. For example:

resource "heroku_slug" "example" {
  source_dir = "${path.module}/heroku_slug"
}

resource "heroku_app_release" "foobar-release" {
    app = "${heroku_app.example.name}"
    slug_id = "${heroku_slug.example.id}"
}

Since this would be specialized to the Heroku provider, it can abstract away the details of exactly what file format is required, which feels to me like a smoother user experience anyway.

(If we do eventually decide to deprecate archive_file for the AWS Lambda use-case, I expect we'd take a similar approach there: an AWS-provider-specific resource that encapsulates the fact that the source code needs to be provided as a zip file, emulating the Lambda web UI's abstraction of just adding individual source files directly.)

@hashicorp-cla commented Nov 22, 2020

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes


Joe Stump seems not to be a GitHub user.
You need a GitHub account to be able to sign the CLA.
If you already have a GitHub account, please add the email address used for this commit to your account.

Have you signed the CLA already but the status is still pending? Recheck it.

@Jasper-Ben commented Apr 6, 2022

Just adding another use case for this.

We want to do the following within a single terraform apply, while avoiding the local-exec provisioner:

  1. create an AWS ECR repository
  2. create an S3 bucket
  3. upload the container image source code to the S3 bucket
  4. build the container image using kaniko as a Kubernetes Job within our Kubernetes cluster; the source code in S3 has to be provided as a tar.gz, see https://github.com/GoogleContainerTools/kaniko#kaniko-build-contexts
  5. push the built image to the newly created ECR repository
  6. use the image when creating a Kubernetes Deployment/Job/whatever resource
  7. profit 😉

@jkroepke mentioned this pull request Nov 22, 2023
@jkroepke (Contributor)

In case you are looking for tar.gz support, take a look here and vote for it: #277

@mholt commented Mar 12, 2024

FYI this branch should be updated to use archiver v4 ("github.com/mholt/archiver/v4"), which has much better APIs for streams (rough sketch below).
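
A rough sketch of the v4 stream-oriented usage, assuming the FilesFromDisk and CompressedArchive API shown in the archiver v4 README:

package main

import (
	"context"
	"os"

	"github.com/mholt/archiver/v4"
)

func main() {
	// Map on-disk paths to the names they should have inside the archive.
	files, err := archiver.FilesFromDisk(nil, map[string]string{
		"app/": "app",
	})
	if err != nil {
		panic(err)
	}

	out, err := os.Create("slug.tar.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Compose tar archival with gzip compression; Archive streams straight
	// to the writer instead of materializing the archive in memory.
	format := archiver.CompressedArchive{
		Compression: archiver.Gz{},
		Archival:    archiver.Tar{},
	}
	if err := format.Archive(context.Background(), out, files); err != nil {
		panic(err)
	}
}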
