The SDK doesn't use gzip compression #3105

Open
andrejlevkovitch opened this issue Sep 10, 2024 · 7 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@andrejlevkovitch

Describe the bug

Despite the claims in the documentation, the AWS SDK does not perform gzip compression under any circumstances.

Expected Behavior

It is expected that the SDK will compress request data according to the client configuration. However, I also want to point out that this configuration lacks a list of content types allowed for compression - we don't want to compress files such as jpg or mp4.
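For context, this is roughly the configuration I would expect to control compression. A minimal sketch, assuming the `requestCompressionConfig` member and its `useRequestCompression` / `requestMinCompressionSizeBytes` fields as I read them in the SDK headers; treat the exact names as unverified:

```cpp
#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/s3/S3Client.h>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config;
        // Request compression is advertised in ClientConfiguration; one would
        // expect these settings to make the client gzip request bodies larger
        // than the threshold. In practice (see below) they have no effect.
        config.requestCompressionConfig.useRequestCompression =
            Aws::Client::UseRequestCompression::ENABLE;
        config.requestCompressionConfig.requestMinCompressionSizeBytes = 10240; // 10 KB

        Aws::S3::S3Client s3(config);
        // ... upload as usual and observe that the payload is sent uncompressed
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```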

Current Behavior

Currently, if you upload a file with the TransferManager or the S3 client, no compression happens. You can also try compressing the file manually and then uploading it with the TransferManager or the S3 client, but that does not work properly either. First, the TransferManager provides no option to set the content-encoding, so it obviously cannot work. Second, the S3 client also does not work properly, because the signer rewrites the content-encoding, unless you set the checksum algorithm to NOT_SET and use MD5 instead. A sketch of that workaround follows.
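For illustration, a minimal sketch of that manual workaround, assuming a file that is already gzip-compressed on disk; bucket, key, and file names are placeholders, and the MD5 is supplied explicitly via HashingUtils so the flexible-checksum path (which rewrites Content-Encoding) is bypassed:

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/HashingUtils.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>
#include <fstream>
#include <memory>

// Upload an already gzip-compressed file, advertising the encoding ourselves.
void UploadPreCompressed(const Aws::S3::S3Client& s3)
{
    auto body = Aws::MakeShared<Aws::FStream>("upload",
        "payload.json.gz", std::ios_base::in | std::ios_base::binary);

    Aws::S3::Model::PutObjectRequest request;
    request.SetBucket("my-bucket");          // placeholder
    request.SetKey("payload.json");          // placeholder
    request.SetContentType("application/json");
    request.SetContentEncoding("gzip");      // gets rewritten unless checksums are off
    request.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);

    // Provide an MD5 of the (compressed) body instead of a flexible checksum.
    request.SetContentMD5(Aws::Utils::HashingUtils::Base64Encode(
        Aws::Utils::HashingUtils::CalculateMD5(*body)));
    body->clear();                      // reset flags after hashing the stream
    body->seekg(0, std::ios_base::beg); // rewind so the full body is uploaded

    request.SetBody(body);
    auto outcome = s3.PutObject(request);
    // check outcome.IsSuccess() ...
}
```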

Reproduction Steps

Just use the default config for the TransferManager, try to upload a large JSON file (bigger than 10 KB!), and check GetBytesTransferred() for that operation - it will be equal to (or slightly larger than) the size of the file, so no compression is applied. A sketch follows below.
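A minimal sketch of the reproduction, assuming a local JSON file larger than the 10 KB threshold (bucket, key, and file names are placeholders):

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>
#include <iostream>

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        auto s3 = Aws::MakeShared<Aws::S3::S3Client>("repro");
        auto executor =
            Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("repro", 4);

        // Default TransferManager configuration: no compression settings touched.
        Aws::Transfer::TransferManagerConfiguration tmConfig(executor.get());
        tmConfig.s3Client = s3;
        auto transferManager = Aws::Transfer::TransferManager::Create(tmConfig);

        // Upload a JSON file that is comfortably bigger than 10 KB.
        auto handle = transferManager->UploadFile(
            "big.json", "my-bucket", "big.json", "application/json", {});
        handle->WaitUntilFinished();

        // Bytes sent on the wire match the on-disk size: nothing was compressed.
        std::cout << "transferred: " << handle->GetBytesTransferred() << std::endl;
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```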

Possible Solution

So, first of all, we should enable compression - the relevant line always returns false.

But even if we change it explicitly to GZIP, it still will not work, because the compression step leaves the previous content-length in place and does not set the decoded content-length, which the backend service requires for decompression. So let's add it (and remove content-length - the proper one will be set later by AddContentBodyToRequest).

But even after that it will not work, because the checksum is still calculated over the old body, not the compressed one. So let's change that too and use the actual body (from httpRequest).

But it still will not work, because when we calculate the hash we expect the stream to be in a good state, which may not be the case, especially after some reading and moving the cursor back and forth. So we should explicitly clear the stream before starting to read.
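The stream reset mentioned here is plain iostream handling; a generic sketch of what is meant (not the SDK's actual code):

```cpp
#include <istream>

// Reset a stream before hashing it: clear any eof/fail bits left over from
// earlier reads, then rewind to the beginning so the whole body is hashed.
void ResetForHashing(std::istream& body)
{
    body.clear();                       // drop eofbit/failbit from previous reads
    body.seekg(0, std::ios_base::beg);  // rewind to the start of the payload
}
```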

And only now, if you explicitly set the checksum algorithm to NOT_SET and enable computing MD5, will compression finally work.

Additional Information/Context

No response

AWS CPP SDK version used

1.11.398

Compiler and Version used

gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

Operating System and version

Ubuntu 18.04.6 LTS

@andrejlevkovitch andrejlevkovitch added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 10, 2024
@DmitriyMusatkin
Contributor

S3 does not support payload compression on the service side, so it would not be possible to do it through the SDK; i.e. if you upload as GZIP, S3 will keep it as GZIP in storage. AFAIK CloudWatch is the only service the SDK currently supports compression for.

With that said, overriding the content encoding when aws-chunked is used does seem like a bug; it should append instead.

@jmklix jmklix added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Sep 10, 2024
@andrejlevkovitch
Author

S3 does not support payload compression service side, so it would not be possible to do it through sdk. i.e. if you upload as GZIP, s3 will keep it as GZIP in storage

Hm, are you sure? We actually upload gzip files that way, and we see unpacked files on S3.

@andrejlevkovitch
Author

Sorry, yeah, you are correct - S3 doesn't decompress data. It just handles it in a nice way, so from the user's perspective the difference is not visible, but yeah, the data is stored compressed.

@jmklix
Member

jmklix commented Oct 4, 2024

@andrejlevkovitch did you have any other questions about compression and this sdk?

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Oct 4, 2024
@andrejlevkovitch
Author

did you have any other questions about compression and this sdk?

Nope, except the problems related to the SDK that I already described:

  1. rewriting of Content-Encoding
  2. gzip compression doesn't work correctly when requested (Content-Length is not updated after compression, and the checksum is calculated incorrectly after compression)
  3. a small bug in hash calculation (the stream needs to be cleared before it is used)
  4. the compression params are never used, despite being present in the API. Also, those params are not flexible - they can only be set for all file types at once. IMHO, it would be better to set them per Content-Type (so you do not need to compress video, but for a JSON file it would be useful); a hypothetical shape is sketched below.
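Purely to illustrate the per-Content-Type idea, a hypothetical config shape (not an existing SDK API):

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical extension of the request-compression settings: only bodies
// whose Content-Type appears in this allow-list would be gzip-compressed.
struct ContentTypeAwareCompressionConfig
{
    bool useRequestCompression = true;
    std::size_t requestMinCompressionSizeBytes = 10240; // 10 KB threshold
    std::set<std::string> compressibleContentTypes = {
        "application/json",
        "text/plain",
        "text/csv",
    };

    bool ShouldCompress(const std::string& contentType, std::size_t bodySize) const
    {
        return useRequestCompression
            && bodySize >= requestMinCompressionSizeBytes
            && compressibleContentTypes.count(contentType) > 0;
    }
};
```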

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Oct 6, 2024
@jmklix
Member

jmklix commented Nov 12, 2024

Are you still expecting the SDK/S3 to compress your data? Your most recent questions seem to indicate that you think the SDK should be doing some type of compression, but this is not done. None of the compression params will be used, as @DmitriyMusatkin mentioned.

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 12, 2024
@andrejlevkovitch
Author

@jmklix

  1. there is still a problem with the rewriting of content-encoding, which seems like a bug (@DmitriyMusatkin also mentioned that)
  2. there is a RequestCompressionConfig in ClientConfiguration that is enabled by default and never used - that is confusing, at the very least

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 15, 2024