aws dynamodb update-item adding unnecessary base64 encoding to binary attributes #1097

Closed
asieira opened this issue Jan 14, 2015 · 17 comments
Labels
bug This issue is a bug. dynamodb p3 This is a minor priority issue v1

Comments

@asieira
Contributor

asieira commented Jan 14, 2015

It seems to me that dynamodb update-item is adding an additional and unnecessary layer of base64 encoding to binary attribute values.

I'm using a table with a string hash key called key (very original) and a string range key called version_part.

Let's follow this example. I start by adding data that contains a binary field with the UTF-8 string "test!":

aws --no-paginate --output json --region '<redacted>' dynamodb update-item --table-name '<redacted>' --return-values 'UPDATED_OLD' --key 'file:///<redacted>.tmp' --attribute-updates 'file:///<redacted>.tmp'

These are the contents of the key file:

{  
   "key":{  
      "S":"test"
   },
   "version_part":{  
      "S":"index"
   }
}

These are the contents of the attribute-updates file:

{  
   "version":{  
      "Action":"ADD",
      "Value":{  
         "N":"1"
      }
   },
   "timestamp":{  
      "Action":"PUT",
      "Value":{  
         "S":"20150114 17:19:09"
      }
   },
   "parts":{  
      "Action":"PUT",
      "Value":{  
         "N":"0"
      }
   },
   "data":{  
      "Action":"PUT",
      "Value":{  
         "B":"dGVzdCE="
      }
   },
   "compress":{  
      "Action":"PUT",
      "Value":{  
         "S":"none"
      }
   }
}

Notice the value of the data attribute: it is the base64 representation of the string test!, as confirmed by the command-line tool base64:

| => echo dGVzdCE= | base64 -D
test!

So the operation succeeds, and I proceed to read the string back.

aws --no-paginate --output json --region '<redacted>' dynamodb batch-get-item --request-items 'file:///<redacted>.tmp'

This is what the request-items file contains:

{  
   "<redacted>":{  
      "ConsistentRead":false,
      "Keys":[  
         {  
            "key":{  
               "S":"test"
            },
            "version_part":{  
               "S":"index"
            }
         }
      ]
   }
}

This is what the JSON output is:

{  
   "UnprocessedKeys":{  

   },
   "Responses":{  
      "<redacted>":[  
         {  
            "version":{  
               "N":"2"
            },
            "timestamp":{  
               "S":"20150114 17:19:09"
            },
            "compress":{  
               "S":"none"
            },
            "version_part":{  
               "S":"index"
            },
            "parts":{  
               "N":"0"
            },
            "key":{  
               "S":"test"
            },
            "data":{  
               "B":"ZEdWemRDRT0="
            }
         }
      ]
   }
}

Notice how the value of data does not match what was written, which was the result of base64encode('test!'). In fact, it is the equivalent of base64encode(base64encode("test!")) as this simple test confirms:

| => echo ZEdWemRDRT0= | base64 -D
dGVzdCE=
| => echo ZEdWemRDRT0= | base64 -D | base64 -D
test!

So it seems that the update-item operation is incorrectly assuming that a binary attribute value is not yet base 64 encoded and is applying an additional layer of base64 encoding. In fact, binary values must already be base64 encoded, or it wouldn't be possible to represent them properly in a JSON format in the first place.
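The relationship between the written and returned values can be reproduced outside the CLI. This is a minimal sketch using only Python's standard `base64` module; the variable names are illustrative and not part of the original report.

```python
import base64

raw = b"test!"

# One layer of base64: the value placed in the attribute-updates file.
once = base64.b64encode(raw)

# Two layers of base64: the value that batch-get-item printed back.
twice = base64.b64encode(once)

print(once.decode("ascii"))
print(twice.decode("ascii"))

# Decoding twice recovers the original bytes.
print(base64.b64decode(base64.b64decode(twice)))
```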

All of this was tested on Mac OS X Mavericks:

| => aws --version
aws-cli/1.7.0 Python/2.7.8 Darwin/14.0.0
@asieira asieira changed the title aws dynamodb put-item and extra base64 aws dynamodb update-item and extra base64 Jan 14, 2015
@asieira asieira changed the title aws dynamodb update-item and extra base64 aws dynamodb update-item adding unnecessary base64 encoding to binary attributes Jan 14, 2015
@jamesls
Member

jamesls commented Jan 15, 2015

Thanks for the detailed writeup. I can see what's going on here. This is due to our generic processing for binary/blob types. Just to give more background here, whenever something is modeled as a blob type, we automatically base64 encode the input value. This is a general construct. It applies to any input for any service/operation that has binary input.

However on output, we don't automatically base64 decode values. This is because we don't want to write arbitrary binary data to stdout.

In the case of top level parameters where you can input binary content directly (via fileb://), this seems reasonable. However you raise a good point when the binary type is nested such that the input required is JSON. When this happens, it's not actually possible to enter binary content via JSON, so we need to handle this case.

I'll discuss this with the team and update when we have something to propose. Thanks for bringing this to our attention.
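The behaviour described in this comment (encode every blob on input, never decode on output) can be modelled in a few lines. This is a toy model inferred from the description above, not the CLI's actual code; all three function names are hypothetical.

```python
import base64


def cli_v1_send_blob(user_input: str) -> str:
    """Hypothetical model: treat the input text as raw bytes and base64-encode it for the wire."""
    return base64.b64encode(user_input.encode("utf-8")).decode("ascii")


def service_store(wire_value: str) -> bytes:
    """The service decodes the wire value once and stores the resulting bytes."""
    return base64.b64decode(wire_value)


def cli_v1_print_blob(stored: bytes) -> str:
    """On output, the base64 wire form of the stored bytes is printed without decoding."""
    return base64.b64encode(stored).decode("ascii")


# The user supplies an already-encoded value inside a JSON document.
stored = service_store(cli_v1_send_blob("dGVzdCE="))
printed = cli_v1_print_blob(stored)

print(stored)   # the table now holds the base64 text, not the bytes of "test!"
print(printed)  # matches the value seen in the batch-get-item output
```

Under this model the stored attribute is the ASCII text of the base64 string, which is why the value read back carries two layers of encoding.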

@jamesls jamesls added bug This issue is a bug. accepted labels Jan 15, 2015
@asieira
Contributor Author

asieira commented Jan 16, 2015

Thanks for the update, and also for providing such a great tool in the first place. :)

Look forward to seeing this fixed, and let me know if there's anything else I can do to help.

@asieira
Contributor Author

asieira commented Jun 5, 2015

Any news on this issue?

@fumin

fumin commented Jun 29, 2015

I believe this issue should be assigned a higher priority than it currently is, since using this CLI to interact with DynamoDB on EC2 instances is one of its most fundamental features.

We are suffering from this exact same issue here, too. Note that this Base64 issue manifests not only in update-item but in every operation, including put-item and delete-item.

@schleary

@asieira, using the term "gentlemen" to address developers is presumptuous, sexist, and unappreciated.

@ishallbethat

By default, when you put_item to DynamoDB using the binary data type, the value is base64-encoded for you. So I think there are two options.
1: Stop base64-encoding the binary value yourself and let the DynamoDB client do the work. Below is my code sample.

`import boto3

client = boto3.client('dynamodb', region_name='ap-southeast-2')

# note: b'test!' is the original, unencoded byte string, not a base64 string.
raw = b'test!'
item = {
    'testbug': {'S': 'testbug'},
    'testbytes': {'B': raw},
}

# The raw bytes are passed as-is; no manual base64 step is applied here.
response = client.put_item(TableName='testbug', Item=item)
print(response)`

2: Use data type "S" (String), and AWS won't apply extra base64 encoding to your data.

@artur-jablonski

This behaviour is really weird.

For example, this one-liner for copying all items from one DynamoDB table to another will fail because of the extra base64 encoding on binary data.

aws dynamodb scan --table-name from_table | jq -rc '.Items[]' | tr '\n' '\0' | xargs -0i aws dynamodb put-item --table-name to_table --item '{}'

There should be a way of disabling that when data is coming from json.

Is there any plan to address this?
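One possible workaround is to decode the binary attributes in the scan output once before writing the items with a client that does its own encoding. This is a minimal sketch, assuming the input is DynamoDB JSON as printed by `aws dynamodb scan`; the helper names `decode_binary` and `decode_item` are hypothetical and the sample document is made up for illustration.

```python
import base64
import json


def decode_binary(attr: dict) -> dict:
    """Return a copy of one DynamoDB attribute value with B/BS payloads decoded to bytes."""
    type_tag, value = next(iter(attr.items()))
    if type_tag == "B":
        return {"B": base64.b64decode(value)}
    if type_tag == "BS":
        return {"BS": [base64.b64decode(v) for v in value]}
    if type_tag == "M":  # map: recurse into each member
        return {"M": {k: decode_binary(v) for k, v in value.items()}}
    if type_tag == "L":  # list: recurse into each element
        return {"L": [decode_binary(v) for v in value]}
    return {type_tag: value}  # S, N, BOOL, NULL, SS, NS pass through unchanged


def decode_item(item: dict) -> dict:
    """Decode every attribute of one item from the scan output."""
    return {name: decode_binary(attr) for name, attr in item.items()}


# Illustrative stand-in for the JSON printed by the scan command.
scan_output = json.loads(
    '{"Items": [{"key": {"S": "test"}, "data": {"B": "dGVzdCE="}}]}'
)
items = [decode_item(item) for item in scan_output["Items"]]
print(items)
```

Each decoded item could then be passed to boto3's low-level `put_item(TableName=..., Item=item)`, which, according to other comments in this thread, base64-encodes binary values itself.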

@Mufasa

Mufasa commented Apr 5, 2018

I have just hit this issue as well. Can any of the devs on here please comment on whether or not this is going to be addressed?

@KAJed82

KAJed82 commented Jun 21, 2018

This is still an issue. Please give me the option to not re-encode data I queried directly from DynamoDB

@ngortheone

I am affected by this too. Is there a fix/workaround?

@mibollma

Could you please add some flag to the command to disable the default encoding behaviour when necessary?

@jenyayel

The only workaround I found is to use boto3. If you have the AWS CLI, then you already have Python, along with boto3, installed, so you can use a one-liner: python -c 'from boto3 import client; c=client(\"dynamod\"....'.

@jeberle

jeberle commented Jun 26, 2020

It's worth noting that this is not a problem w/ the 2.x awscli.

@kdaily kdaily added the dynamodb label Nov 9, 2020
@mithun-mohan

I have just hit this issue too. Is there an existing solution/workaround using boto3?

@kdaily kdaily added the needs-review This issue or pull request needs review from a core team member. label Aug 16, 2021
@tim-finnigan tim-finnigan added v1 p3 This is a minor priority issue and removed needs-review This issue or pull request needs review from a core team member. labels Nov 4, 2022
@mcopik

mcopik commented Mar 31, 2023

We experience the same issue with boto3. We have a pipeline using DynamoDB streams - the Lambda trigger receives base64 encoded data, and then we perform an unnecessary decoding to put into another DynamoDB table. Otherwise, we end up with multiple rounds of base64 encoding applied to our binary data.

rohanpm added a commit to rohanpm/exodus-gw that referenced this issue Aug 5, 2024
This commit partially reverts db4a344; specifically, it
disables the writing of compressed config and reverts back to writing
JSON config as before.

This is being reverted due to an issue noticed during testing. It seems
that the written items are ending up with *two* layers of base64
encoding, which is not intended.
The boto docs[1] use base64 strings as example arguments, giving the
impression the caller is expected to take care of base64 encoding, but
in fact botocore internally does the encoding; if the client also
encodes, we end up with two layers of encoding.

There is also a bug filed relating to this[2].

The code here still seems to "work" since the same mistake is made
on both the writing and reading end, but the goal is to make the
config smaller and having double-encoding works against that.

It should be easy enough to fix, but I'd like some time to confirm my
understanding of how it works, check whether exodus-lambda needs a fix
and also check whether localstack and AWS are behaving the same.
Hence I'll revert this for now and keep writing the old style of config.

[1] https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/put_item.html
[2] aws/aws-cli#1097
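The size cost mentioned in the commit message can be estimated directly: each base64 layer turns every 3 bytes into 4 characters, so an accidental second layer inflates the payload by roughly another third. This is a minimal sketch; the config contents and the use of `zlib` are assumptions for illustration, not details taken from exodus-gw.

```python
import base64
import json
import zlib

# Hypothetical config payload, compressed before writing.
config = json.dumps({"entry-%d" % i: i for i in range(200)}).encode("utf-8")
compressed = zlib.compress(config)

once = base64.b64encode(compressed)  # the single intended layer
twice = base64.b64encode(once)       # the accidental second layer


def b64_len(n: int) -> int:
    """Length of the padded base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)


print(len(compressed), len(once), len(twice))
print("overhead of the second layer: %.2fx" % (len(twice) / len(once)))
```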
@tim-finnigan
Contributor

Thanks all for your patience. It was noted here earlier that this is not an issue in v2 of the AWS CLI. We highly recommend migrating to v2 of the AWS CLI; please refer to these migration instructions: https://docs.aws.amazon.com/cli/latest/userguide/cliv2-migration-instructions.html


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
