Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental github actions cache cannot communicate with enterprise github server using AWS S3 as backing storage #21764

Open
DLukeNelson opened this issue Dec 16, 2024 · 13 comments · May be fixed by #21835
Labels

Comments

@DLukeNelson
Copy link
Contributor

Describe the bug
My CI runs have a number of errors in the logs indicating:

2024-12-13T23:21:23.8756542Z 23:21:23.87 �[33m[WARN]�[0m Failed to read from remote cache (1 occurrences so far): failed to read pants_ci_cache/action-cache/24/03/24038b6a590014ebfe65e4e2c860fd89bdb44932ae9de388522776a7b9a78935: Unexpected (persistent) at read, context: { uri: https://ghe-actions-prod-qhqyjglk.s3.amazonaws.com/actions-69c8c8939b70/9bb02394953d4d45a28a6ccad6554933/28b64a3ef36b141086d40095c0eef846?AWSAccessKeyId=AKIAQ3EGVTWOBBE5CS57&Expires=1734135684&Signature=icOXbi0Y%2FMIoIvBUvVTtJwn3z5k%3D, response: Parts { status: 400, version: HTTP/1.1, headers: {"x-amz-request-id": "21TP2FZZWN6QPB02", "x-amz-id-2": "oEQXgcqlyaknVIamKBLcouhA+2IrySLl8LXOIkTPBJgqDO98sJCwB+ehDpbM38D48J0LSd04lvg=", "x-amz-region": "us-east-1", "content-type": "application/xml", "transfer-encoding": "chunked", "date": "Fri, 13 Dec 2024 23:21:23 GMT", "connection": "close", "server": "AmazonS3"} }, service: ghac, path: pants_ci_cache/action-cache/24/03/24038b6a590014ebfe65e4e2c860fd89bdb44932ae9de388522776a7b9a78935, range: 0- } => <?xml version="1.0" encoding="UTF-8"?>
2024-12-13T23:21:23.8763324Z <Error><Code>InvalidArgument</Code><Message>Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>null</ArgumentValue><RequestId>21TP2FZZWN6QPB02</RequestId><HostId>oEQXgcqlyaknVIamKBLcouhA+2IrySLl8LXOIkTPBJgqDO98sJCwB+ehDpbM38D48J0LSd04lvg=</HostId></Error>

The stats output indicates that there were some successful reads, but the test output indicates that all tests were run (not pulled from cache). Also, remote_cache_total_time_saved_ms: 0, so the cache seems to be not working, despite a few successes recorded in the stats.
Pants version
2.22

OS
Linux

Additional info
Googling the error message led me to https://stackoverflow.com/questions/74293491/requests-specifying-server-side-encryption-with-aws-kms-managed-keys-require-aws which indicates that this could be solved (if the erroneous request originated in python/boto3) by just adding an explicit configuration line. I couldn't find as much info on Rust/openDAL, but I would expect the fix to be similar.

I did search into openDAL a bit, and found that it has a Ghac backend (which pants is using), and that this backend already makes some consideration for GHES backed by AWS S3, so it seems that the issue may be related to the particular configuration used by my org.

Searching openDAL source indicates that it has implemented AWSV4Signer (https://github.com/search?q=repo%3Aapache%2Fopendal+AWSV4Signer&type=code). It was not clear to me how to inject this signer into the GHAC backend.

@DLukeNelson
Copy link
Contributor Author

pants.ci.toml

[GLOBAL]
colors = true
remote_provider = "experimental-github-actions-cache"
remote_cache_read = true
remote_cache_write = true
remote_instance_name = "pants_ci_cache"

[stats]
log = true

@huonw
Copy link
Contributor

huonw commented Dec 16, 2024

Thanks for filing an issue. It looks like apache/opendal#3985 is a fix for S3-backed GHAC, but that only landed in opendal 0.44.2. Pants currently uses 0.41.0.

So, maybe a fix here is to update the version of OpenDAL. @DLukeNelson, would you like to contribute that fix?

@DLukeNelson
Copy link
Contributor Author

I'd love to! I'll open up a PR right away

@DLukeNelson
Copy link
Contributor Author

#21779

@DLukeNelson
Copy link
Contributor Author

@huonw Unfortunately, our above changes were not enough. The new error I'm seeing in CI is like this:

Failed to read from remote cache (1 occurrences so far): failed to read pants_ci_cache/action-cache/1f/62/1f6292e82842e0bc4146825c85f31299878c1c05bf26ee2fa15fea4764bdfab8: Unexpected (temporary) at read, context: { url: https://<github_url>/_services/artifactcache/ocrHE8ZScyxq8pTiT1IyKRHNwdDtYL7HOvJRB2oXuk7JCGeFXS/_apis/artifactcache/cache?keys=pants_ci_cache/action-cache/1f/62/1f6292e82842e0bc4146825c85f31299878c1c05bf26ee2fa15fea4764bdfab8&version=pants-1, called: http_util::Client::send, service: ghac, path: pants_ci_cache/action-cache/1f/62/1f6292e82842e0bc4146825c85f31299878c1c05bf26ee2fa15fea4764bdfab8, range: 0- } => send http request, source: error sending request for url (https://<github_url>/_services/artifactcache/ocrHE8ZScyxq8pTiT1IyKRHNwdDtYL7HOvJRB2oXuk7JCGeFXS/_apis/artifactcache/cache?keys=pants_ci_cache/action-cache/1f/62/1f6292e82842e0bc4146825c85f31299878c1c05bf26ee2fa15fea4764bdfab8&version=pants-1): error trying to connect: invalid URL, scheme is not http

any thoughts?

@huonw
Copy link
Contributor

huonw commented Jan 14, 2025

Hm, that error message looks like it comes from the hyper library (https://github.com/hyperium/hyper-util/blob/5f055e36d6f945b51b4d4b023290f551ae7eb4b6/src/client/legacy/connect/http.rs#L392), which is used by OpenDAL (Pants -> OpenDAL -> Reqwest -> Hyper). It looks like the default setting of that library is to only support HTTP, no HTTPS.

Maybe this is a bug in OpenDAL and its usage of Reqwest?

@DLukeNelson
Copy link
Contributor Author

I did find this: apache/opendal#5205
which seems to be the same set of errors that we have. (and we too have disabled all default features from opendal. I'm not sure I understand the ask in that thread though. (to enable reqwest's https support).

Pants' Cargo.toml does have reqwest = { version = "0.12", default-features = false, features = ["stream", "rustls-tls"] } which seems like tls/https ought to be enabled.

@Xuanwo
Copy link
Contributor

Xuanwo commented Jan 14, 2025

Hi, pants is using opendal 0.45.1 now: https://github.com/pantsbuild/pants/blob/b9587acfb8c2f6323cd00b4ec88e2a487a758114/src/rust/engine/Cargo.toml#L278C1-L282

But opendal 0.45.1 is relaying on reqwest 0.11 instead of 0.12: https://crates.io/crates/opendal/0.45.1/dependencies

So specify reqwest = { version = "0.12", default-features = false, features = ["stream", "rustls-tls"] } doesn't work.

To get this fixed, we should either:

  • upgrade opendal to latest (recommanded)
  • specify reqwest 0.11 in cargo.toml

@DLukeNelson
Copy link
Contributor Author

I'm unsure if pants may be relying on reqwest for any other reason, so its probably best to upgrade opendal to latest. (The earlier PR in this thread attempted this, but ran into some trouble with breaking changes, so I cut the scope and only upgraded to 0.45.1)

Since we now know that this is insufficient, we ought to fully upgrade. @huonw do you agree?

@huonw huonw linked a pull request Jan 14, 2025 that will close this issue
@huonw
Copy link
Contributor

huonw commented Jan 14, 2025

Thanks for the analysis @Xuanwo, very helpful! (And thanks for maintaining OpenDAL!)

I've posted #21835 with the full upgrade, let's see what CI thinks of it!

upgrade opendal to latest (recommanded)

As a bit of feedback from your users, we initially tried this in #21779, but found it was quite difficult due to breaking changes with non-obvious resolutions (e.g. not mentioned or explained in the changelog).

For instance, I ended up having to do a full git bisect on the OpenDAL code itself to work out what was causing a behaviour change and thus deduce a resolution (#21779 (review)).

@Xuanwo
Copy link
Contributor

Xuanwo commented Jan 15, 2025

As a bit of feedback from your users, we initially tried this in #21779, but found it was quite difficult due to breaking changes with non-obvious resolutions (e.g. not mentioned or explained in the changelog).

Hi, @huonw, sorry about that. I tried my best to split the changes across several versions and implement them gradually, but it becomes frustrating when multiple breaking changes appear simultaneously. We’ve maintained an upgrade guide for each version, which I believe will be helpful: https://docs.rs/opendal/latest/opendal/docs/upgrade/index.html.

I’m more than happy to help address the breaking changes introduced by OpenDAL. I expect that users who only use the public API (not touching the opendal::raw mods) should find it easy to upgrade. This is also an opportunity for me to validate my design and the upgrade path.

@huonw
Copy link
Contributor

huonw commented Jan 15, 2025

Ah cool, I had not seen that page, nor had I noticed there's more specific upgrade contents available in the GitHub releases: https://github.com/apache/opendal/releases

Is it easy to have each section in the CHANGELOG file include a bit of text like See [the GitHub release](...) for how to upgrade or similar? (Or some link to the upgrade page too)

I’m more than happy to help address the breaking changes introduced by OpenDAL

All good, I think I've resolved them in #21835.

@Xuanwo
Copy link
Contributor

Xuanwo commented Jan 15, 2025

Ah cool, I had not seen that page, nor had I noticed there's more specific upgrade contents available in the GitHub releases: apache/opendal/releases

Is it easy to have each section in the CHANGELOG file include a bit of text like See [the GitHub release](...) for how to upgrade or similar? (Or some link to the upgrade page too)

Sure, I'll ensure they're included in the changelog.

All good, I think I've resolved them in #21835.

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants