aws dynamodb update-item adding unnecessary base64 encoding to binary attributes #1097
Comments
Thanks for the detailed writeup. I can see what's going on here. This is due to our generic processing for binary/blob types. To give more background: whenever something is modeled as a blob type, we automatically base64 encode the input value. This is a general construct; it applies to any input for any service/operation that has binary input. However, on output we don't automatically base64 decode values, because we don't want to write arbitrary binary data to stdout. In the case of top level parameters where you can input binary content directly (via [...]). I'll discuss this with the team and update when we have something to propose. Thanks for bringing this to our attention.
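(A minimal sketch of the double encoding this produces, assuming the caller has already base64 encoded the value as the DynamoDB JSON format requires:)

```python
import base64

raw = b"test!"
supplied = base64.b64encode(raw)     # b'dGVzdCE=' -- what goes in the JSON input file
stored = base64.b64encode(supplied)  # b'ZEdWemRDRT0=' -- encoded again by the CLI's generic blob handling
print(supplied, stored)
```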
Thanks for the update, and also for providing such a great tool in the first place. :) I look forward to seeing this fixed; let me know if there's anything else I can do to help.
Any news on this issue?
I believe this issue should be assigned a higher priority than it currently has, since using this CLI to interact with DynamoDB on EC2 instances is one of its most fundamental features. We are suffering from this exact same issue here, too. Note that this base64 issue does not manifest only in [...]
@asieira, using the term "gentlemen" to address developers is presumptuous, sexist, and unappreciated.
By default, when you `put_item` to DynamoDB using the binary data type, it will do the base64 encoding for you. So I think there are two options here.

1: pass the original bytes and let boto3 handle the encoding (the `Item` construction below is an assumption; only fragments of the original snippet survived):

```python
import boto3

client = boto3.client('dynamodb')

# note: b'test!' is the original bytes, not a base64-encoded string
data = b'test!'
item = {'key': {'S': 'test'}, 'version_part': {'S': 'v1'}, 'data': {'B': data}}
response = client.put_item(TableName='testbug', Item=item)
```

2: use data type "S", which is String, and AWS won't do an extra base64 encode of your data.
This behaviour is really weird. For example, a one-liner for copying all items from one DynamoDB table to another will fail because of that extra base64 encoding on binary data (see the sketch after this comment). There should be a way of disabling the re-encoding when the data is coming from JSON. Is there any plan to address this?
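(The original one-liner was not preserved; a sketch of the kind of copy pipeline meant here, with hypothetical table names and assuming `jq` is available. The 25-item `batch-write-item` limit is ignored for brevity.)

```sh
# Items scanned out of src already carry base64 strings in their "B" fields,
# and the v1 CLI base64 encodes them AGAIN on write, corrupting the copy.
aws dynamodb scan --table-name src --query 'Items[*]' --output json \
  | jq -c '.[] | {PutRequest: {Item: .}}' \
  | jq -s '{dst: .}' > batch.json
aws dynamodb batch-write-item --request-items file://batch.json
```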
I have just hit this issue as well. Can any of the devs on here please comment on whether or not this is going to be addressed?
This is still an issue. Please give me the option not to re-encode data I queried directly from DynamoDB.
I am affected by this too. Is there a fix/workaround?
Could you please add a flag to the command to disable the default encoding behaviour when necessary?
The only workaround I found is to use boto3. If you have the AWS CLI, then you already have Python and boto3 installed. Then you can use a one-liner; a sketch is below.
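(The actual one-liner was not preserved; a minimal sketch, assuming the table and key names used elsewhere in this issue:)

```python
import boto3

# boto3 takes raw bytes for a "B" attribute and performs the base64 transport
# encoding internally, exactly once, so nothing is double encoded.
boto3.client('dynamodb').put_item(
    TableName='testbug',
    Item={'key': {'S': 'test'},
          'version_part': {'S': 'v1'},
          'data': {'B': b'test!'}},
)
```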
It's worth noting that this is not a problem with the 2.x awscli.
Even I hit upon this issue now. Is there an existing solution/workaround using boto3?
We experience the same issue with [...]
This commit partially reverts db4a344; specifically, it disables the writing of compressed config and reverts back to writing JSON config as before.

This is being reverted due to an issue noticed during testing. It seems that the written items are ending up with *two* layers of base64 encoding, which is not intended. The boto docs[1] use base64 strings as example arguments, giving the impression the caller is expected to take care of base64 encoding, but in fact botocore internally does the encoding; if the client also encodes, we end up with two layers of encoding. There is also a bug filed relating to this[2].

The code here still seems to "work", since the same mistake is made on both the writing and reading ends, but the goal is to make the config smaller, and double encoding works against that. It should be easy enough to fix, but I'd like some time to confirm my understanding of how it works, check whether exodus-lambda needs a fix, and also check whether localstack and AWS are behaving the same. Hence I'll revert this for now and keep writing the old style of config.

[1] https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/put_item.html
[2] aws/aws-cli#1097
Thanks, all, for your patience. It was noted here earlier that this is not an issue in v2 of the AWS CLI. We highly recommend migrating to v2 of the AWS CLI; please refer to these instructions for migrating: https://docs.aws.amazon.com/cli/latest/userguide/cliv2-migration-instructions.html
This issue is now closed. Comments on closed issues are hard for our team to see.
It seems to me that `dynamodb update-item` is adding an additional and unnecessary layer of base64 encoding to binary attribute values.

I'm using a table with a string hash key called `key` (very original) and a string range key called `version_part`.

Let's follow this example. I start by adding data that contains a binary field with the UTF-8 string "test!":
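(The original command snippet was not preserved in this copy of the issue; a plausible reconstruction follows, with a hypothetical table name `testtable`:)

```sh
aws dynamodb update-item \
    --table-name testtable \
    --key file://key.json \
    --attribute-updates file://attribute-updates.json
```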
These are the contents of the key file:
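(Reconstructed; the key values shown are assumptions consistent with the schema described above.)

```json
{
    "key": {"S": "test"},
    "version_part": {"S": "v1"}
}
```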
These are the contents of the attribute-updates file:
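(Reconstructed; `dGVzdCE=` is the actual base64 encoding of `test!`, matching the description below.)

```json
{
    "data": {
        "Action": "PUT",
        "Value": {"B": "dGVzdCE="}
    }
}
```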
Notice the value of the `data` attribute: it is the base64 representation of the string `test!`, as confirmed by the command-line tool `base64` (see the reconstructed check just below). So the operation succeeds, and I proceed to read the string back.
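(A reconstruction of that check; `printf` is used so no trailing newline gets encoded.)

```sh
$ printf 'test!' | base64
dGVzdCE=
```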
This is what the request-items file contains:
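(Reconstructed to match the key file above; the read is presumably done with `aws dynamodb batch-get-item --request-items file://request-items.json`, since the file follows that command's shape.)

```json
{
    "testtable": {
        "Keys": [
            {
                "key": {"S": "test"},
                "version_part": {"S": "v1"}
            }
        ]
    }
}
```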
This is what the JSON output is:
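(Reconstructed; `ZEdWemRDRT0=` is the actual value of `base64encode("dGVzdCE=")`, i.e. the double-encoded string discussed below.)

```json
{
    "Responses": {
        "testtable": [
            {
                "key": {"S": "test"},
                "version_part": {"S": "v1"},
                "data": {"B": "ZEdWemRDRT0="}
            }
        ]
    },
    "UnprocessedKeys": {}
}
```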
Notice how the value of `data` does not match what was written, which was the result of `base64encode('test!')`. In fact, it is the equivalent of `base64encode(base64encode("test!"))`, as the simple test reconstructed below confirms.

So it seems that the update-item operation is incorrectly assuming that a binary attribute value is not yet base64 encoded and is applying an additional layer of base64 encoding. In fact, binary values must already be base64 encoded, or it wouldn't be possible to represent them properly in a JSON format in the first place.
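(A reconstruction of that test; both outputs are the real base64 values for these inputs.)

```sh
$ printf 'test!' | base64
dGVzdCE=
$ printf 'dGVzdCE=' | base64
ZEdWemRDRT0=
```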
All of this was tested on Mac OS X Mavericks.