Serve Whitehall's feature images from Asset Manager #403

Closed
5 tasks done
chrisroos opened this issue Jan 11, 2018 · 15 comments

@chrisroos
Contributor

chrisroos commented Jan 11, 2018

This has been extracted from #215 to make it easier to manage the remaining work. See that issue for lots more information.

Example asset: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg

Tasks

@chrisroos
Contributor Author

chrisroos commented Jan 15, 2018

I ran the following commands in integration. We need to run the same commands in production to give us confidence that the Whitehall NFS mount and Asset Manager database are in sync before we switch the config:

$ find /data/uploads/whitehall/clean/system/uploads/feature/image -type f | wc -l
477736

> WhitehallAsset.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 478219
> WhitehallAsset.deleted.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 433

Note that these figures aren't necessarily realistic - it's the commands we're interested in.
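A note on the query syntax, as a minimal sketch in plain Ruby with no database involved: `%r(...)` is a regular expression literal that avoids escaping the slashes, and the pattern is unanchored, so it matches any `legacy_url_path` that contains the feature image directory. Only the first sample path comes from this issue; the other two are hypothetical.

```ruby
# The pattern used in the console queries above.
FEATURE_IMAGE_PATH = %r(/government/uploads/system/uploads/feature/image/)

# Sample paths: the first is the example asset, the others are made up.
paths = [
  "/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg",
  "/government/uploads/system/uploads/feature/image/2/example.jpg",
  "/government/uploads/system/uploads/person/image/3/example.jpg",
]

# Array#grep keeps the elements that match the pattern.
feature_images = paths.grep(FEATURE_IMAGE_PATH)
puts feature_images.size
```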

@chrisroos
Contributor Author

@gpeng has run the commands above in production.

$ find /data/uploads/whitehall/clean/system/uploads/feature/image -type f | wc -l
478583

> WhitehallAsset.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 478541
> WhitehallAsset.deleted.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 607

NOTE. The number of files on the filesystem doesn't match the corresponding number of assets in the database. The difference of 42 matches the number of files that weren't migrated due to the problem described in "GdsApi::InvalidUrl exception when migrating assets #384". I'm hopeful that those failed jobs are still on the queue and that they'll be migrated soon. If that's not the case then we'll have to manually migrate them somehow.
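Comparing counts only shows that something is missing, not which files. A rough sketch of how the two lists could be diffed is below. It assumes the `legacy_url_path` is the path on disk with the NFS root swapped for `/government/uploads` (inferred from the commands above, not confirmed), and the sample data is hypothetical.

```ruby
require "set"

# Root of the Whitehall NFS mount, taken from the find command above.
CLEAN_ROOT = "/data/uploads/whitehall/clean"
# Assumed URL prefix that replaces the root to give the legacy_url_path.
URL_PREFIX = "/government/uploads"

# Convert a path on disk to the legacy_url_path we expect in Asset Manager.
def legacy_url_path_for(file_path)
  URL_PREFIX + file_path.delete_prefix(CLEAN_ROOT)
end

# Return the expected legacy_url_paths that have no matching database record.
def missing_from_database(file_paths, db_paths)
  known = db_paths.to_set
  file_paths.map { |path| legacy_url_path_for(path) }
            .reject { |url| known.include?(url) }
end

# Hypothetical stand-ins for the find output and the database query results.
files_on_disk = [
  "#{CLEAN_ROOT}/system/uploads/feature/image/1/Afghanistan-dfid.jpg",
  "#{CLEAN_ROOT}/system/uploads/feature/image/2/example.jpg",
]
paths_in_database = [
  "/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg",
]

missing = missing_from_database(files_on_disk, paths_in_database)
puts missing

# Difference between the production figures quoted above.
puts 478_583 - 478_541
```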

@chrisroos
Contributor Author

I've requested the example asset in the description from integration and used Kibana to confirm that it was served by Whitehall.

$ curl -v "https://assets-origin.integration.publishing.service.gov.uk/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CJR22321 HTTP/2
> Host: assets-origin.integration.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Tue, 16 Jan 2018 12:04:20 GMT
< content-type: image/jpeg
< content-length: 471923
< server: nginx
< accept-ranges: bytes
< cache-control: max-age=14400, public
< content-disposition: inline; filename="Afghanistan-dfid.jpg"
< etag: "576fca2c-73373"
< last-modified: Sun, 26 Jun 2016 12:27:24 GMT
< x-frame-options: SAMEORIGIN
< access-control-allow-origin: *
< access-control-allow-methods: GET, OPTIONS
< access-control-allow-headers: origin, authorization

# Kibana logs - searching for CJR22321
January 16th 2018, 12:04:21.018	 - 	 - 	whitehall-frontend-error
January 16th 2018, 12:04:21.000	 - 	 - 	assets-origin-json.event.access
January 16th 2018, 12:04:20.517	 - 	 - 	whitehall
January 16th 2018, 12:04:20.000	 - 	 - 	whitehall-frontend-json.event.access
January 16th 2018, 12:04:20.000	 - 	 - 	whitehall-admin-json.event.access

@chrisroos
Contributor Author

@andrewgarner has run the following commands in production. Unfortunately they still reveal a discrepancy between the number of files on disk and those in the database. I'm going to investigate further to see whether the original failing jobs are going to retry.

$ find /data/uploads/whitehall/clean/system/uploads/feature/image -type f | wc -l
478891

irb(main):001:0> WhitehallAsset.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 478849
irb(main):002:0> WhitehallAsset.deleted.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 607

@chrisroos
Contributor Author

I had hoped that the 42 assets missing from the Asset Manager database would've been uploaded since deploying the fix in #384. Unfortunately, I can see that they're still in the Sidekiq retry queue. I believe this is because we've deployed Whitehall since they were added to the queue and so they'll now be suffering from the problem described in #414.

I think this means that we need to run the following commands in production to migrate these remaining featured image assets:

rake asset_manager:migrate_assets[system/uploads/feature/image/56493]
rake asset_manager:migrate_assets[system/uploads/feature/image/57540]
rake asset_manager:migrate_assets[system/uploads/feature/image/57541]
rake asset_manager:migrate_assets[system/uploads/feature/image/58283]
rake asset_manager:migrate_assets[system/uploads/feature/image/58657]
rake asset_manager:migrate_assets[system/uploads/feature/image/58661]

@chrislo - Does my plan to re-run asset_manager:migrate_assets for the subset above sound like a good idea?
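For reference, a small sketch of how a list of unmigrated files could be reduced to one Rake invocation per directory. The file names are hypothetical; only the directory layout and the task name come from the commands above.

```ruby
# Extract the numeric directory ID from a feature image path, or nil if absent.
def feature_image_id(path)
  path[%r{system/uploads/feature/image/(\d+)/}, 1]
end

# Build one migrate_assets command per distinct directory.
def migrate_commands(paths)
  paths.map { |path| feature_image_id(path) }
       .compact
       .uniq
       .map { |id| "rake asset_manager:migrate_assets[system/uploads/feature/image/#{id}]" }
end

# Hypothetical unmigrated files; several can share one directory.
unmigrated = [
  "system/uploads/feature/image/56493/example.jpg",
  "system/uploads/feature/image/56493/s300_example.jpg",
  "system/uploads/feature/image/57540/example.jpg",
]

commands = migrate_commands(unmigrated)
puts commands
```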

My investigation

By looking at the events in the GdsApi::InvalidUrl exception in Sentry I can see the following retries:

  • Retry 18 | Jan 11 3:31 am - 4:06 am
  • Retry 19 | Jan 12 8:47 am - 9:23 am | 1 day 5 hours since retry 18
  • Retry 20 | Jan 13 9:03 pm - 9:44 pm | 1 day 12 hours since retry 19

Based on Sidekiq's error handling, I expected retry 21 to occur at around 4pm on Jan 15 (i.e. about 1 day 20 hours after retry 20). The Whitehall fix for the InvalidUrl problem was deployed on the morning of 15 Jan, so I expected retry 21 to succeed.

Although I can't see a GdsApi::InvalidUrl exception in Sentry caused by retry 21, the log entries in Kibana make me confident that it failed. It would appear that Sidekiq retries reuse the same JID, so I was able to use one of them (b00a43166ef4252bfba248a1) to search Kibana for information about the retry:

January 15th 2018, 18:16:20.000	 - 	 - 	whitehall	Failure! Retry 21 in 194584 seconds
January 15th 2018, 18:16:20.000	 - 	 - 	whitehall	fail: 0.775 sec
January 15th 2018, 18:16:20.000	 - 	 - 	whitehall	enqueued retry: <snipped>
January 15th 2018, 18:16:20.000	 - 	 - 	whitehall	start
January 13th 2018, 21:44:29.000	 - 	 - 	whitehall	start
January 13th 2018, 21:44:29.000	 - 	 - 	whitehall	fail: 0.012 sec
January 13th 2018, 21:44:29.000	 - 	 - 	whitehall	Failure! Retry 20 in 160309 seconds
January 13th 2018, 21:44:29.000	 - 	 - 	whitehall	enqueued retry: <snipped>
January 12th 2018, 09:23:48.000	 - 	 - 	whitehall	fail: 0.002 sec
January 12th 2018, 09:23:48.000	 - 	 - 	whitehall	start
January 12th 2018, 09:23:48.000	 - 	 - 	whitehall	enqueued retry: <snipped>
January 12th 2018, 09:23:48.000	 - 	 - 	whitehall	Failure! Retry 19 in 130796 seconds
January 11th 2018, 04:06:10.000	 - 	 - 	whitehall	Failure! Retry 18 in 105447 seconds
January 11th 2018, 04:06:10.000	 - 	 - 	whitehall	fail: 0.019 sec
January 11th 2018, 04:06:10.000	 - 	 - 	whitehall	enqueued retry: <snipped>
January 11th 2018, 04:06:10.000	 - 	 - 	whitehall	start

These failing jobs should eventually move to the dead queue on or around 29 Jan (in about 12 days).
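The delays in the Kibana log above can be sanity-checked against Sidekiq's default backoff of `(count ** 4) + 15 + (rand(30) * (count + 1))` seconds. The sketch below assumes Whitehall's workers use that default and treats the Kibana timestamps as UTC; both are assumptions rather than something confirmed in this issue.

```ruby
require "time"

# Smallest and largest delay (in seconds) that Sidekiq's default backoff
#   (count ** 4) + 15 + (rand(30) * (count + 1))
# can produce for a given retry count.
def delay_range(count)
  base = count**4 + 15
  base..(base + 29 * (count + 1))
end

# Delays copied from the "Failure! Retry N in X seconds" log lines above.
LOGGED_DELAYS = { 18 => 105_447, 19 => 130_796, 20 => 160_309, 21 => 194_584 }.freeze

LOGGED_DELAYS.each do |count, delay|
  range = delay_range(count)
  puts "retry #{count}: logged #{delay}s, formula allows #{range.min}..#{range.max}s, " \
       "consistent: #{range.cover?(delay)}"
end

# When the job was next due after the failure logged at 21:44:29 on 13 Jan.
failed_at = Time.utc(2018, 1, 13, 21, 44, 29)
next_attempt = failed_at + LOGGED_DELAYS[20]
puts next_attempt.iso8601
```

All four logged delays fall inside the range the formula allows, and the computed next attempt lands within a couple of seconds of the 18:16:20 entries on 15 Jan.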

@chrislo
Contributor

chrislo commented Jan 17, 2018 via email

@chrisroos
Contributor Author

I've asked 2ndline to run the Rake tasks to migrate the remaining feature images to Asset Manager.

@chrisroos
Contributor Author

chrisroos commented Jan 19, 2018

@andrewgarner has just run the following Rake tasks in production:

rake asset_manager:migrate_assets[system/uploads/feature/image/56493]
rake asset_manager:migrate_assets[system/uploads/feature/image/57540]
rake asset_manager:migrate_assets[system/uploads/feature/image/57541]
rake asset_manager:migrate_assets[system/uploads/feature/image/58283]
rake asset_manager:migrate_assets[system/uploads/feature/image/58657]
rake asset_manager:migrate_assets[system/uploads/feature/image/58661]

I suspect these jobs might need to wait until the uploads in #404 have finished.

@chrislo
Contributor

chrislo commented Jan 19, 2018

Yes, it looks like those jobs were probably added at the back of the queue. I'll keep an eye on them.

[Screenshot: screen shot 2018-01-19 at 08 58 15]

@chrislo chrislo self-assigned this Jan 22, 2018
@chrisroos
Contributor Author

I've asked 2ndline to compare the number of these assets on the filesystem to the number that have been created in the Asset Manager database.

@chrisroos
Contributor Author

chrisroos commented Jan 23, 2018

@h-lame ran the following commands in production to compare the assets on the filesystem to those in the Asset Manager database:

# Feature Images
$ find /data/uploads/whitehall/clean/system/uploads/feature/image -type f | wc -l
480025

$ govuk_app_console asset-manager
Loading production environment (Rails 5.1.4)
irb(main):001:0> WhitehallAsset.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 480025
irb(main):002:0> WhitehallAsset.deleted.where(legacy_url_path: %r(/government/uploads/system/uploads/feature/image/)).count
=> 607

The number of assets in the database matches the number on the filesystem so we're all good to open a PR to update the nginx config to serve these assets from asset-manager.

chrisroos added a commit to alphagov/govuk-puppet that referenced this issue Jan 23, 2018
See alphagov/asset-manager#403 for more
information.

We've been uploading all new feature images to asset-manager since
alphagov/whitehall#3602 was merged and deployed.

We uploaded all historical feature images on 5 Jan 2018[1].

[1]: alphagov/asset-manager#215 (comment)
@chrisroos
Contributor Author

I've opened alphagov/govuk-puppet#7128 to update the nginx config to start serving these assets from Asset Manager.

@chrisroos
Contributor Author

chrisroos commented Jan 23, 2018

I've tested the effect of this PR in integration and used Kibana to confirm that these assets are now being served by Asset Manager.

Note. We don't currently have a realistic set of assets or asset-manager data in integration so I've had to create a Whitehall asset to mirror the example asset in the description.

# Create asset
$ export BEARER_TOKEN=`cat /etc/govuk/manuals-publisher/env.d/ASSET_MANAGER_BEARER_TOKEN`

$ echo `date` > tmp.txt
$ curl \
  -H"Authorization: Bearer $BEARER_TOKEN" \
  -H"Accept: application/json" \
  https://asset-manager.integration.govuk-internal.digital/whitehall_assets \
  --form "asset[file]=@tmp.txt" \
  --form "asset[legacy_url_path]=/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg"

# Request the asset in integration
$ curl -v  "https://assets-origin.integration.publishing.service.gov.uk/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CJR20912 HTTP/2
> Host: assets-origin.integration.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Tue, 23 Jan 2018 12:47:16 GMT
< content-type: text/plain
< content-length: 29
< server: nginx
< vary: Accept-Encoding
< accept-ranges: bytes
< cache-control: max-age=14400, public
< content-disposition: inline; filename="tmp.txt"
< etag: "5a672e77-1d"
< last-modified: Tue, 23 Jan 2018 12:45:43 GMT
< strict-transport-security: max-age=31536000
< vary: Accept-Encoding
< vary: Accept-Encoding
< x-frame-options: SAMEORIGIN
< access-control-allow-origin: *
< access-control-allow-methods: GET, OPTIONS
< access-control-allow-headers: origin, authorization

# Search Kibana for CJR20912
January 23rd 2018, 12:47:16.908	 - 	 - 	asset-manager
January 23rd 2018, 12:47:16.000	 - 	 - 	assets-origin-json.event.access
January 23rd 2018, 12:47:16.000	 - 	 - 	asset-manager-json.event.access
January 23rd 2018, 12:47:16.000	 - 	 - 	static-json.event.access
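The check used here (tag the request with a random marker, then see which applications logged it) can be sketched as below. The log source names are copied from the two integration searches in this issue; classifying a source by its name prefix is my own simplification, not something Kibana does.

```ruby
# Build a cache-busting marker like the CJR$RANDOM suffix used with curl.
# Bash's $RANDOM yields 0..32767, which rand(32_768) mirrors.
def marker(prefix = "CJR")
  "#{prefix}#{rand(32_768)}"
end

# Given the log sources returned for a marker, name the app(s) that handled it.
def serving_apps(log_sources)
  log_sources.map { |source| source[/\A(asset-manager|whitehall)/, 1] }
             .compact
             .uniq
end

# Sources logged for CJR22321 (16 Jan, before the nginx change).
before_switch = [
  "whitehall-frontend-error",
  "assets-origin-json.event.access",
  "whitehall",
  "whitehall-frontend-json.event.access",
  "whitehall-admin-json.event.access",
]

# Sources logged for CJR20912 (23 Jan, after the nginx change).
after_switch = [
  "asset-manager",
  "assets-origin-json.event.access",
  "asset-manager-json.event.access",
  "static-json.event.access",
]

puts serving_apps(before_switch).inspect
puts serving_apps(after_switch).inspect
```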

@chrisroos
Contributor Author

For reference, I requested the example asset in the description to confirm that it's being served by Whitehall in production:

$ curl -v "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CJR$RANDOM" > /dev/null

> GET /government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CJR21443 HTTP/1.1
> Host: assets.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: image/jpeg
< Last-Modified: Wed, 27 Mar 2013 10:49:53 GMT
< Content-Disposition: inline; filename="Afghanistan-dfid.jpg"
< Cache-Control: max-age=14400, public
< ETag: "5152ced1-73373"
< X-Frame-Options: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization
< Fastly-Backend-Name: origin
< Content-Length: 471923
< Accept-Ranges: bytes
< Date: Tue, 23 Jan 2018 12:59:10 GMT
< Via: 1.1 varnish
< Age: 0
< Connection: keep-alive
< X-Served-By: cache-lcy19227-LCY
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1516712351.799813,VS0,VE114

# Kibana search results for CJR21443
January 23rd 2018, 12:59:10.875	 - 	 - 	whitehall
January 23rd 2018, 12:59:10.000	 - 	 - 	whitehall-admin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 12:59:10.000	 - 	 - 	whitehall-frontend.publishing.service.gov.uk-json.event.access
January 23rd 2018, 12:59:10.000	 - 	 - 	whitehall-admin.publishing.service.gov.uk-json.event.access
January 23rd 2018, 12:59:10.000	 - 	 - 	whitehall-frontend.publishing.service.gov.uk-json.event.access
January 23rd 2018, 12:59:10.000	 - 	 - 	assets-origin.publishing.service.gov.uk-json.event.access
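As an aside on reading these headers: the ETag values look like nginx's default format, which is the file's modification time and size, both in hex. Assuming that is what is in use here, a small sketch shows the production ETag agrees with the `Last-Modified` and `Content-Length` headers in the same response.

```ruby
require "time"

# Decode an nginx-style ETag ("<mtime hex>-<size hex>") into
# [last_modified (UTC Time), content_length (Integer)].
def decode_nginx_etag(etag)
  mtime_hex, size_hex = etag.delete('"').split("-")
  [Time.at(mtime_hex.to_i(16)).utc, size_hex.to_i(16)]
end

# ETag from the production response above.
last_modified, content_length = decode_nginx_etag('"5152ced1-73373"')
puts last_modified.httpdate
puts content_length
```

Under the same assumption, the 16 Jan integration ETag ("576fca2c-73373") encodes the same size with a later modification time, which is in line with the differing `Last-Modified` headers.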

@chrislo
Contributor

chrislo commented Jan 24, 2018

These assets are now being served by Asset Manager in production. I made the following request:

$ curl -v "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CRL$RANDOM" > /dev/null
> GET /government/uploads/system/uploads/feature/image/1/Afghanistan-dfid.jpg?CRL14433 HTTP/1.1
> Host: assets.publishing.service.gov.uk
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: image/jpeg
< Content-Disposition: inline; filename="Afghanistan-dfid.jpg"
< Cache-Control: max-age=14400, public
< ETag: "5152ced1-73373"
< Last-Modified: Wed, 27 Mar 2013 10:49:53 GMT
< X-Frame-Options: SAMEORIGIN
< Strict-Transport-Security: max-age=31536000
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: origin, authorization
< Fastly-Backend-Name: origin
< Content-Length: 471923
< Accept-Ranges: bytes
< Date: Wed, 24 Jan 2018 11:35:11 GMT
< Via: 1.1 varnish
< Age: 0
< Connection: keep-alive
< X-Served-By: cache-lcy19221-LCY
< X-Cache: MISS
< X-Cache-Hits: 0
< X-Timer: S1516793711.086294,VS0,VE284

I then checked Kibana for the random string:

[Screenshot: screen shot 2018-01-24 at 06 35 52]

@chrislo chrislo closed this as completed Jan 24, 2018