From d1e9ce5add2f0df45580c52bfb885d5488ac4a88 Mon Sep 17 00:00:00 2001
From: James Mead
Date: Thu, 4 Jan 2018 16:00:59 +0000
Subject: [PATCH] Disable nightly sync of Asset Manager assets from asset-slave-2 to S3

The push-attachments-to-s3.sh script is run by the push_attachments_to_s3
cron job daily at 21:00 and is only enabled on asset-slave-2 in production.
It uses `s3cmd sync` to sync /mnt/uploads (i.e. all assets) to the
govuk-attachments-production S3 bucket.

In production the copy-attachments-to-slaves.sh script also writes to the
same S3 bucket. It's run by the copy-attachments-to-slaves cron job, which
runs every minute on asset-master-1. In production,
process_uploaded_attachments_to_s3 is set to true, so
copy-attachments-to-slaves.sh uses `s3cmd put` to copy virus-scanned assets
to the govuk-attachments-production S3 bucket. However, this only applies
to Whitehall assets, because Asset Manager virus scanning works differently.

Thus it seems that Asset Manager assets are currently only copied to the
govuk-attachments-production S3 bucket every night, and not continuously
like Whitehall assets.

It's not clear to me what purpose this S3 bucket is serving, given that
there are also Duplicity jobs creating off-site backups to a different S3
bucket and a cron job rsyncing files from the asset master to each of the
asset slaves. However, I suppose the Duplicity backups will be up to 1 day
out-of-date and the asset slaves are not off-site, so perhaps it's filling
that gap.

It's worth noting there's an attachments-s3-env-sync.sh script on the asset
master in staging & integration which looks like it was intended to sync
from the govuk-attachments-production S3 bucket, but it does not appear to
be called from anywhere.

Since this change [1] to Asset Manager, the files for new assets are
deleted from the filesystem once they have been virus scanned and uploaded
to S3, because Asset Manager now serves them from S3 via Nginx. Thus the
Asset Manager app should no longer be permanently adding asset files to the
filesystem, and there's no need to have the asset-manager sub-directory
under /mnt/uploads synced to S3 by the push-attachments-to-s3.sh script;
hence the change in this commit to exclude that directory.

I would have preferred to change the main source directory for the
`s3cmd sync` command to /mnt/uploads/whitehall, i.e. so that only the
whitehall sub-directory is synced.

We're about to delete the files for Asset Manager assets which have been
uploaded to S3, i.e. the vast majority of them. We plan to use this Asset
Manager Rake task [2] to delete the files via the Carrierwave uploader
mounted on Asset#file. This will delete the underlying file from the
uploads directory under the Rails root directory, which is sym-linked to
/data/uploads/asset-manager. The latter is where the asset-master
/mnt/uploads directory is mounted using NFS.

If we were to leave this script unchanged, its `s3cmd sync` command would
delete all the Asset Manager assets from the S3 bucket. By excluding the
asset-manager sub-directory, we can leave a recent set of Asset Manager
assets in the S3 bucket, acting as a kind of backup in case we run into any
unforeseen problems when deleting the assets.

The script should continue to run, so the push_attachments_to_s3_xxx Icinga
check should not report any alerts.
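For reference, the effect of the new exclude can be checked by hand with a
dry run before the cron job next fires. This is only a sketch: it assumes
@s3_bucket is govuk-attachments-production and DIRECTORY_TO_COPY is
/mnt/uploads, as described above, and s3cmd's --dry-run flag only reports
what would be transferred or deleted.

    # Sketch only: bucket name and source directory assumed from the
    # description above; --dry-run makes no changes to the bucket.
    envdir /etc/govuk/aws/env.d /usr/local/bin/s3cmd \
      --cache-file=/tmp/s3cmd_attachments.cache \
      --server-side-encryption sync \
      --exclude="lost+found" --exclude="asset-manager" \
      --skip-existing --delete-removed --dry-run \
      "/mnt/uploads/" "s3://govuk-attachments-production/mnt/uploads/"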
[1]: https://github.com/alphagov/asset-manager/pull/373
[2]: https://github.com/alphagov/asset-manager/blob/d803db930614a6063c0fc16730f6ba3eaf08e6d9/lib/tasks/govuk_assets.rake#L5
---
 .../templates/node/s_asset_base/push-attachments-to-s3.sh.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/govuk/templates/node/s_asset_base/push-attachments-to-s3.sh.erb b/modules/govuk/templates/node/s_asset_base/push-attachments-to-s3.sh.erb
index 9a7e63dd9d..63c98e4b6a 100644
--- a/modules/govuk/templates/node/s_asset_base/push-attachments-to-s3.sh.erb
+++ b/modules/govuk/templates/node/s_asset_base/push-attachments-to-s3.sh.erb
@@ -32,7 +32,7 @@ if [ ! "$DIRECTORY_TO_COPY" ]; then
   usage
 fi

-if envdir /etc/govuk/aws/env.d /usr/local/bin/s3cmd --cache-file=/tmp/s3cmd_attachments.cache --server-side-encryption sync --exclude="lost+found" --skip-existing --delete-removed "$DIRECTORY_TO_COPY/" "s3://<%= @s3_bucket -%>$DIRECTORY_TO_COPY/"; then
+if envdir /etc/govuk/aws/env.d /usr/local/bin/s3cmd --cache-file=/tmp/s3cmd_attachments.cache --server-side-encryption sync --exclude="lost+found" --exclude="asset-manager" --skip-existing --delete-removed "$DIRECTORY_TO_COPY/" "s3://<%= @s3_bucket -%>$DIRECTORY_TO_COPY/"; then
   echo "Attachments copied to S3 (<%= @s3_bucket -%>) successfully"
 else
   echo "Attachments errored while copying to S3 (<%= @s3_bucket -%>)"