Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit tests for publish2cloud.py #162

Merged
merged 8 commits into from
Aug 18, 2020

Conversation

boolean5
Copy link
Contributor

@boolean5 boolean5 commented Jul 27, 2020

Implement unit tests for chunk_metadata, new_data_to_publish_to_s3 and publish_to_s3.
Will close #130.

@boolean5 boolean5 marked this pull request as draft July 27, 2020 16:29
@boolean5 boolean5 force-pushed the add-tests-for-p2c branch from af55d55 to 6daca9e Compare July 30, 2020 10:34
@boolean5
Copy link
Contributor Author

boolean5 commented Aug 3, 2020

I added tests for new_data_to_publish_to_s3. Also, I refactored this function and fixed some minor bugs. For a full description of the changes see the commit message of 5aaf927.

Note: One of the changes was returning True right away when dealing with a new list that has not been uploaded to S3 yet. Previously, we were creating an S3 key with dummy list contents and comparing against its checksum. We should test this on staging by temporarily introducing a new section in the configuration file and confirming that the corresponding list is added to S3.

@boolean5
Copy link
Contributor Author

boolean5 commented Aug 4, 2020

Just added unit tests for publish_to_s3

@boolean5 boolean5 marked this pull request as ready for review August 4, 2020 14:32
@boolean5 boolean5 mentioned this pull request Aug 13, 2020
Return the checksum of the contents of the list file starting right
after the chunk header. Previously, we were always starting from the
26th byte, regardless of the size of the header.
Move the code that uses the configuration file to determine if a list
should be uploaded to S3 outside of `new_data_to_publish_to_s3` and into
`publish_to_cloud`. Rename `check_upload_remote_settings_config` to
`check_upload_config` and adjust it to work both for S3 and remote
settings. In `publish_to_cloud` make sure that
`new_data_to_publish_to_s3` is only called when `upload_to_s3` is True,
to avoid the cost of unnecessarily accessing S3.

In addition, when the key corresponding to a list does not yet exist on
S3, return True right away instead of creating a key with dummy list
contents and comparing against its checksum.

Also, fix a minor bug in getting the name of the S3 key from the
configuration file. Previously, if the section did not have the `s3_key`
option, `new_data_to_publish_to_s3` would raise a `NoOptionError`
instead of falling back to using the `output` option as the S3 key name.
Raise a ValueError when the `s3_key` option exists but is empty.

Finally, add a docstring to describe the functionality of
`new_data_to_publish_to_s3`.
Add moto as a requirement and implement unit tests for
new_data_to_publish_to_s3.
Add unit tests to check that the expected S3 key permissions are set in
new_data_to_publish_to_s3 and publish_to_s3. These tests are currently
expected to fail because setting key permissions with add_user_grant()
does not work in moto.
Copy link
Contributor

@say-yawn say-yawn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor change requested.

eoh = header.find('\n')
chunktype, chunknum, hash_size, data_len = header[:eoh].split(':')
header = fp.readline().rstrip('\n')
chunktype, chunknum, hash_size, data_len = header.split(':')
return dict(
type=chunktype, num=chunknum, hash_size=hash_size, len=data_len,
checksum=hashlib.sha256(fp.read()).hexdigest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this update we will now be getting a more accurate checksum that excludes the header but includes the checksum of the rest of the file. When I merge this to Stage I expect most, if not all, lists to be updated. When this is confirmed I should talk to the ops team to let them know there may be large requests coming in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make use of the updated list better, so that scaling our service is done to deliver real updates and not changes to our list creation script, we may want to coordinate the changes from shavar-prod-lists along with shipping this change to prod.

publish2cloud.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants