Bazel: provide pre-built binary toolchain for protoc #19558

Open
alexeagle opened this issue Dec 6, 2024 · 9 comments

@alexeagle
Contributor

From https://protobuf.dev/news/2024-10-01/#end-goal:

> Once the rules are in the protobuf repo, we intend to address common user requests, such as using prebuilts for the proto compiler where possible.

This is that request.

What language does this apply to?
All

Describe the problem you are trying to solve.

All Bazel users are expected to build protoc from source as a cc_binary. This leads to problems which are often reported on the Bazel Slack:

  1. Bazel doesn't include a hermetic C++ toolchain, so the compilation fails for a subset of developers due to the host toolchain on their computer. This can be easily reproduced by registering a non-functional toolchain. Many users have no C++ code of their own, so fixing this hermeticity failure gains them nothing beyond getting protoc to build.
  2. protoc frequently gets recompiled rather than being a cache hit - example report, issue. This makes Bazel builds slow.

Describe the solution you'd like

Bazel's toolchain feature allows it to download the pre-built binaries from protobuf releases instead of compiling protoc from source.

bazelbuild/rules_proto#205 was part of my earlier work to provide this capability. The ruleset mirrors its own integrity hashes as part of each release.

https://github.com/aspect-build/toolchains_protoc/ is a user-land implementation of this proposal; however, it was broken by changes in Bazel 8 and rules_proto described by https://protobuf.dev/news/2024-10-01/.
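
To make the idea concrete, here is a rough sketch (in Python rather than Starlark) of the selection step such a toolchain would perform: map the host platform to a release asset and build its download URL. The platform table and the URL pattern are assumptions based on how protobuf release assets appear to be named, and `protoc_asset_url` is a hypothetical helper, not part of any ruleset.

```python
# Assumed mapping from (OS, CPU) to the platform suffix used in protoc
# release asset names. Verify against the release page before relying on it.
PROTOC_PLATFORMS = {
    ("linux", "x86_64"): "linux-x86_64",
    ("linux", "aarch64"): "linux-aarch_64",
    ("darwin", "x86_64"): "osx-x86_64",
    ("darwin", "arm64"): "osx-aarch_64",
    ("windows", "amd64"): "win64",
}


def protoc_asset_url(version: str, system: str, machine: str) -> str:
    """Return the assumed download URL of a prebuilt protoc for one platform."""
    key = (system.lower(), machine.lower())
    try:
        suffix = PROTOC_PLATFORMS[key]
    except KeyError:
        # No prebuilt binary known: a real toolchain would fall back to
        # building protoc from source here.
        raise LookupError(f"no prebuilt protoc known for {key}") from None
    return (
        "https://github.com/protocolbuffers/protobuf/releases/download/"
        f"v{version}/protoc-{version}-{suffix}.zip"
    )
```

For the current machine this could be called as `protoc_asset_url("29.1", platform.system(), platform.machine())` after `import platform`; the version string is only an illustration. A real toolchain would also pin an integrity hash for each asset.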

Additional context
Some user reports:

I describe some GitHub Actions workflows for automating this pattern on https://blog.aspect.build/releasing-bazel-rulesets-rust (and the earlier https://blog.aspect.build/releasing-bazel-rulesets)

alexeagle added the untriaged label Dec 6, 2024
shaod2 self-assigned this Dec 9, 2024
shaod2 added the protoc label and removed the untriaged label Dec 9, 2024
@shaod2
Member

shaod2 commented Dec 9, 2024

Yes, we have changes pending to provide prebuilt protocs in the open source realm. I don't have an exact timeline, though; we currently expect H1 2025.

@vorburger
Contributor

Having to constantly rebuild protoc from scratch everywhere causes all sorts of weird surprises; e.g. grpc/grpc-java#11790!

It also makes builds more "heavy" than they could otherwise be, which is a problem in certain more "light" (lite?) build environments; note e.g. jitpack/jitpack.io#3129.

It would be really cool if using Protocol Buffers with Bazel no longer required building protoc.

@mering
Contributor

mering commented Dec 30, 2024

In setups with RBE available, this is not a problem since it is mostly just downloaded from the remote cache anyway. In fact, in such setups it is usually preferred to build from source instead of downloading random binaries from the internet, for compliance reasons. So building from source should definitely be kept as an option, and downloading pre-built binaries should be optional.

@vorburger
Contributor

> In setups with RBE available, this is not a problem since it is mostly just downloaded from the remote cache anyway.

Sure, but not every user of Protobuf with Bazel has RBE set up for every project.

> In fact, in such setups it is usually preferred to build from source instead of downloading random binaries from the internet, for compliance reasons.

This can of course easily be solved with some sort of checksum / hash that's verified on the download; à la http_archive's sha256 or the HTML's Subresource Integrity (SRI) or that ?hl= "standard" idea for Cryptographic Hyperlinks from draft-sporny-hashlink-07 (what a shame that never gained more widespread traction) et al.
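
A minimal sketch of that kind of download check, assuming the expected value is written in the SRI-style `sha256-<base64>` form (the same format that http_archive's `integrity` attribute accepts). The function names are illustrative only.

```python
import base64
import hashlib
import hmac


def sri_sha256(data: bytes) -> str:
    """Return the SRI-style integrity string for the given bytes."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def verify_download(data: bytes, expected_integrity: str) -> None:
    """Raise ValueError if the downloaded bytes do not match the pinned value."""
    actual = sri_sha256(data)
    # compare_digest avoids leaking the position of the first mismatch.
    if not hmac.compare_digest(actual, expected_integrity):
        raise ValueError(
            f"integrity mismatch: expected {expected_integrity}, got {actual}"
        )
```

Note that this only proves the bytes are the ones that were pinned; it says nothing about who produced them or from which sources.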

> So building from source should definitely be kept as an option, and downloading pre-built binaries should be optional.

An "opt-in" flag (?) would already be huge progress over the current situation!

@mering
Contributor

mering commented Dec 31, 2024

> This can of course easily be solved with some sort of checksum / hash that's verified on the download; à la http_archive's sha256 or the HTML's Subresource Integrity (SRI) or that ?hl= "standard" idea for Cryptographic Hyperlinks from draft-sporny-hashlink-07 (what a shame that never gained more widespread traction) et al.

This is only a small part of the story. How do you verify that the binary doesn't contain malicious code? You need reproducible builds first. Then you need to build from source and verify that the binary you download is actually the artifact built from the source you are expecting. You need to do this for every version as inspecting binary diffs is not handy. When you build from source, you only need to check the source diff which is much easier to review.
Dependency attacks are a thing. This is how companies get hacked.
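
The verification step described above could look roughly like the sketch below, assuming the build is bit-for-bit reproducible and that you already have both the downloaded binary and one you rebuilt from the reviewed source. `sha256_of` and `is_reproduced` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_reproduced(downloaded: Path, rebuilt: Path) -> bool:
    """True only if the downloaded binary is byte-identical to the local rebuild."""
    return sha256_of(downloaded) == sha256_of(rebuilt)
```

If the build is not reproducible, this comparison fails even for an honest binary, which is why reproducible builds have to come first.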

@alexeagle
Contributor Author

@mering yes, supply chain security is an important consideration here, both under Bazel and any other build system.

I think you're pointing out #16165 again - since protoc downloads aren't provided along with a checksum, users have to compute one themselves. Any Bazel rules that fetch should include a checksum - this is true whether they fetch sources and then compile them, or a binary that's compiled by someone else. https://github.com/aspect-build/toolchains_protoc/blob/main/protoc/private/versions.bzl#L54 for example.

Then yes, it would be nice to have a proof of provenance, via some attestation published on protoc releases. We are adding these right now to modules on the BCR. It will be trivial for tools like tar where we use GitHub Actions and they have a feature for this.

> setups with RBE available, this is not a problem

This isn't true for most Bazel users, since they ship Macs to their developers and the remote cache contains Linux binaries only.

@mering
Contributor

mering commented Jan 7, 2025

> @mering yes, supply chain security is an important consideration here, both under Bazel and any other build system.
>
> I think you're pointing out #16165 again - since protoc downloads aren't provided along with a checksum, users have to compute one themselves. Any Bazel rules that fetch should include a checksum - this is true whether they fetch sources and then compile them, or a binary that's compiled by someone else. https://github.com/aspect-build/toolchains_protoc/blob/main/protoc/private/versions.bzl#L54 for example.
>
> Then yes, it would be nice to have a proof of provenance, via some attestation published on protoc releases. We are adding these right now to modules on the BCR. It will be trivial for tools like tar where we use GitHub Actions and they have a feature for this.

This requires trusting whoever is specifying the checksums. If you don't want to (or are not allowed to) blindly trust someone to provide correct checksums and would rather review the code yourself, this is much easier when you build from source than when you compare binaries (which usually also requires building the code in the first place).

> > setups with RBE available, this is not a problem
>
> This isn't true for most Bazel users, since they ship Macs to their developers and the remote cache contains Linux binaries only.

Why are they not just using Linux for development in the first place if this is what they test and ship with their CI/CD? It's usually a bad idea to test something different from what you ship...

@michaelschuett-tomtom

> Why are they not just using Linux for development in the first place if this is what they test and ship with their CI/CD? It's usually a bad idea to test something different from what you ship...

Most of the time, before Bazel is even introduced to the environment, you have existing machines, and for at least the last 10 years of my career those have been Macs. I would personally like a Linux machine that matches my deploy environment; however, that is just not the reality for most workplaces.

@mering
Contributor

mering commented Jan 7, 2025

> > Why are they not just using Linux for development in the first place if this is what they test and ship with their CI/CD? It's usually a bad idea to test something different from what you ship...
>
> Most of the time, before Bazel is even introduced to the environment, you have existing machines, and for at least the last 10 years of my career those have been Macs. I would personally like a Linux machine that matches my deploy environment; however, that is just not the reality for most workplaces.

Interesting. In the various companies I have worked at across different industries, I have seen more Linux machines than Macs over the past 15 years (mostly without using Bazel).


5 participants