Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Full-Page Caching for 404s #46

Closed
wants to merge 54 commits into from
Closed

Conversation

mslinnea
Copy link
Member

@mslinnea mslinnea commented Jun 15, 2023

Summary

This feature addresses a common performance issue where numerous 404 requests can strain server resources, as each request bypasses the standard full-page cache and requires PHP to generate the 404 page anew.

Resolves #9

Key Features

  • Hourly cron job populates cache.
  • Stale cache also maintained (24 hour lifespan)
  • Hooks in on template_redirect hook at priority 1
  • Adds X-Alleyvate-404-Cache HTTP header
  • Logged in users bypass this cache

Requirements

  • Persistant Object Cache
  • SSL

Notes for reviewers

  • Looking for feedback on any potential compatibility issues or other concerns.

Changelog entries

Added

Changed

Deprecated

Removed

Fixed

Security

Summary by CodeRabbit

New Features:

  • Introduced a new feature, Full_Page_Cache_404, that enhances the website's performance by caching 404 pages and retrieving them when needed. This reduces server load and improves user experience during navigation.

Tests:

  • Added a new test suite, Test_Full_Page_Cache_404, to ensure the correct functionality of the 404 page caching feature. This includes tests for caching a 404 page and ensuring non-404 pages do not return the cache, ensuring the reliability of the new feature.

Copy link

coderabbitai bot commented Nov 1, 2023

Important

Auto Review Skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The changes introduce a new feature to the Alleyvate WordPress plugin, Full_Page_Cache_404, which enables caching of 404 pages. This feature is added to the list of features in the plugin's load file. Additionally, a new test suite is created to ensure the functionality of the caching feature, including tests for caching 404 pages and not caching non-404 pages.

Changes

File Path Summary
src/.../class-full-page-cache-404.php Introduced a new class Full_Page_Cache_404 that implements the Feature interface. This class adds functionality for caching 404 pages and retrieving them from the cache.
src/.../load.php Added the Full_Page_Cache_404 feature to the list of features in the plugin's load file.
tests/.../test-full-page-cache-404.php Created a new test suite Test_Full_Page_Cache_404 for the caching feature. It includes tests for caching 404 pages and not caching non-404 pages, with helper methods for setting and retrieving the cache content.

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai help to get help.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

@mslinnea
Copy link
Member Author

mslinnea commented Nov 1, 2023

@coderabbitai review

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 4665bec and 730dd63.
Files selected for processing (3)
  • src/alley/wp/alleyvate/features/class-full-page-cache-404.php (1 hunks)
  • src/alley/wp/alleyvate/load.php (1 hunks)
  • tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (1 hunks)
Files skipped from review due to trivial changes (1)
  • src/alley/wp/alleyvate/load.php
Additional comments: 7
src/alley/wp/alleyvate/features/class-full-page-cache-404.php (1)
  • 1-243: The class Full_Page_Cache_404 is well-structured and follows good practices. It implements the Feature interface and provides methods for managing the cache. The use of constants for cache keys, cache group, and cache times is a good practice. The methods are well-documented with comments. The use of WordPress hooks and filters is appropriate. The use of output buffering to capture the HTML of the 404 page for caching is a clever approach. The use of wp_cache_get, wp_cache_set, and wp_cache_delete functions for managing the cache is appropriate. The use of wp_schedule_event and wp_schedule_single_event for scheduling cache replenishment is a good practice. The use of wpcom_vip_file_get_contents for making a request to the guaranteed 404 page to populate the cache is appropriate. The use of header function for sending custom headers for debugging is a good practice. Overall, the code is well-written and follows good practices.
tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (6)
  • 35-37: The setUp method is correctly setting up the Feature instance.

  • 42-58: The test_full_page_cache_404_returns_cache method is correctly testing the caching of a 404 page.

  • 63-69: The test_full_page_cache_not_returned_for_non_404 method is correctly testing that a non-404 page does not return the cache.

  • 74-77: The set_404_cache helper method is correctly setting the cache content.

  • 84-86: The get_404_html helper method is correctly retrieving the cache content.

  • 91-94: The tearDown method is correctly deleting the cache.

@mslinnea mslinnea linked an issue Dec 28, 2023 that may be closed by this pull request
@mslinnea mslinnea marked this pull request as ready for review December 28, 2023 22:51
@mslinnea mslinnea requested a review from a team as a code owner December 28, 2023 22:51
@mslinnea mslinnea changed the title Serve cache for 404 pages - WIP Serve cache for 404 pages Dec 28, 2023
@mslinnea mslinnea changed the title Serve cache for 404 pages [Feature] Full-Page Caching for 404s Dec 29, 2023
@renatonascalves
Copy link
Contributor

@coderabbitai review

@mslinnea mslinnea requested a review from dlh01 February 15, 2024 03:03
@benpbolton
Copy link
Member

This is... probably not the best place for my off-hand observations, but I was working on a server recently (monolith, old school) re: PHP-fpm's inability to keep up with 404's, and I had two observations:

  1. it made me happy for this PR
  2. i noticed that the default WordPress behavior was to not allow caching 404s by passing the Cache-Control and Expires headers that basically prevent downstream caches from working.

In my specific case, nginx needed extra instrumentation to ignore WordPress' Cache (fastcgi_ignore_headers Cache-Control Expires;) ... and then I wondered ... does this PR allow the (php cached) 404 page to also be cached by upstream (nginx, varnish, CDNs of all types, etc)

If not, we should consider that...

@mslinnea
Copy link
Member Author

This is... probably not the best place for my off-hand observations, but I was working on a server recently (monolith, old school) re: PHP-fpm's inability to keep up with 404's, and I had two observations:

  1. it made me happy for this PR
  2. i noticed that the default WordPress behavior was to not allow caching 404s by passing the Cache-Control and Expires headers that basically prevent downstream caches from working.

In my specific case, nginx needed extra instrumentation to ignore WordPress' Cache (fastcgi_ignore_headers Cache-Control Expires;) ... and then I wondered ... does this PR allow the (php cached) 404 page to also be cached by upstream (nginx, varnish, CDNs of all types, etc)

If not, we should consider that...

This doesn't change the Cache-Control Expires HTTP header... It makes no change to the existing HTTP headers. It does add a new HTTP header X-Alleyvate-404-Cache for debugging purposes, to indicate whether the cache was hit...

A 404 is not necessarily permanent...It should be a 410 response to indicate "Gone" for a permanent one... I imagine that upstream caching is prevented because a 404 could be temporary.

@benpbolton
Copy link
Member

A 404 is not necessarily permanent...It should be a 410 response to indicate "Gone" for a permanent one... I imagine that upstream caching is prevented because a 404 could be temporary.

Absolutely this is surely the intention behind the existing directive no-cache headers. But... it might be 'cacheable' for, say, the next 2 seconds, or the next 10. It sounds like a silly distinction, but in certain scenarios, it's the difference between 1k hits to the 404 page or 999 hits to varnish and 1 to php within per second.

There is a purity of saying 'no 404 may last forever, so don't cache it' that I don't necessarily want to give up, but... logistically and realistically, allowing downstreams to cache that it's a 404 for a couple seconds can make a huge difference in some stacks.

Fine leaving it as-is... the only alternatives I can think of would be to:

  1. clear out (not specify) the cache-control and expires headers
  2. set them to an extremely short duration that would be 'universally acceptable' as being true (10s as a start)

ob_start( [ self::class, 'finish_output_buffering' ] );

// Clean up the buffer.
ob_get_clean();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, while testing this in a client site, I noticed PHP can run multiple buffers in a request. In my testing environment, there was only one, but in the client site, by the time the code gets here, it is buffer number 2. So it returns the wrong value (empty).

I'm inclined to remove this line since it will exit anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the following error while running the unit tests, but not while testing on a site. 🤔

Test code or tested code did not (only) close its own output buffers

Copy link
Contributor

@renatonascalves renatonascalves Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I learned that buffers can actually be nested and it is pretty hard to match the status with the initial value.

So here, we are clearing any previous buffers before starting our own.

8017c6a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be closing the output buffer that we're creating. We could also, potentially, keep the output buffer that already exists if output buffering is already active.

ob_get_status will tell us the current status of output buffering. If there is an active buffer, it will return a non-empty array. If there is not an active buffer, it will return an empty array. We could use that function to detect whether an output buffer is currently active and set a flag, which could control whether we spin up a new output buffer or use the current one, or whether we capture anything that's already in an existing output buffer for use later.

Also, the current implementation relies on a callback for when the buffer is naturally flushed, but we could perhaps hook into shutdown and capture the output there and add it to the cache.

Alternately, this doesn't need to be considered an Alleyvate bug, and could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page, as it would break this feature (as written). Having an active output buffering session on the 404 page that doesn't come from this plugin is likely to be an edge case and could better be handled in that codebase rather than trying to code around it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I revisited this morning and I had 3 buffers already present before this one in a regular site and client site. No idea yet how. ¯_(ツ)_/¯

But I'll take your notes and do a bit more digging and hopefully apply a solution that works by default.

Also, the current implementation relies on a callback for when the buffer is naturally flushed, but we could perhaps hook into shutdown and capture the output there and add it to the cache.

🤔 I'd expect more buffers to be available at this point, added by plugins, etc. I'll test this approach just in case.

Alternately, this doesn't need to be considered an Alleyvate bug, and could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page,

I'll try to find a way to make it work for "any" sites.

Copy link
Contributor

@renatonascalves renatonascalves Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick feedback after some testing:

  • Hooking into shutdown hook: the buffer always returns empty. My guess is that it is already cleared by the time it gets there. It's not that easy to track buffers.
  • Using ob_get_status is useful, but any attempt to clear the buffer after that, clears the html and it returns empty instead of the 404 page buffer. I tried a mixture of flushing the previous buffer and flushing ours later. I'm at a lost at why it is empty.

So far, any attempt to clear the buffer after ob_start( [ self::class, 'finish_output_buffering' ] );, it is essentially clearing the HTML from the buffer and returning "". Meaning that we are not caching the html page.

@renatonascalves
Copy link
Contributor

Something to consider for implementation here:

  • The feature assumes that the site uses HTTPS;
  • The cache doesn’t differentiate based on mobile or desktop (user agent);
  • The cache doesn’t differentiate based on geo region.

The first one is an easy fix. But the others are more complicated since it depends on the site and how it is built.

@renatonascalves
Copy link
Contributor

For supporting different caches based on the user agent. It’s technically possible to create and serve different caches using the wp_is_mobile helper function. I'd suggest we add it behind a filter hook.

…equires-ssl

The "Full-Page Caching for 404s" feature requires ssl
@dlh01
Copy link
Member

dlh01 commented Mar 5, 2024

I don't have any feedback about the code in particular, but I've had some lingering questions after taking in the comments in the PR. They're not showstoppers, just things about which I think it would be helpful to have our current thinking on the record.

First, are we certain that Alleyvate is still the best place for this feature?

I've been wondering this as more code has been added to the feature as I've seen comments like:

the others are more complicated since it depends on the site and how it is built

could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page, as it would break this feature (as written)

I'll try to find a way to make it work for "any" sites

Alleyvate is meant for the "essential" customizations whose features would be kept enabled by project leads without a second thought, but this particular feature seems to need more consideration before it's adopted by a site. Is it better for it to be somewhere that's opt-in rather than opt-out?

Second, what kind performance benefit can sites expect to see once this feature is enabled?

I understand why the default behavior for 404 pages can be a performance problem, as outlined in #9, but this question has been nagging me since the switch was made in 4d57b69 to serve the cached template on template_redirect.

The caching means that WordPress no longer has to find the 404 template and render it, from here to here: https://github.com/WordPress/wordpress-develop/blob/7002bce8abe6f6f1858bdd2f02cd1168ef50e4d8/src/wp-includes/template-loader.php#L13-L106

That's not nothing, but the WordPress bootstrapping process still occurs beforehand — plugins are loaded, the main query runs to find the requested content, init and wp_loaded fire, etc. Does the bootstrap represent a lot of the processing for a 404 request? A little?

For me, this question is a companion to the question above regarding the relative complexity of this feature. If the feature presents more edge cases than the usual Alleyvate feature but brings a 40% performance improvement with it, that's a tradeoff that most project leads would probably take. (I picked "40%" at random — that's not a target I'm trying to suggest!)

Copy link
Member

@dlh01 dlh01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment, but this seems like it's been tested pretty thoroughly, so I don't have much to add from a code perspective.

@renatonascalves
Copy link
Contributor

This feature was moved into https://github.com/alleyinteractive/wp-404-caching

@renatonascalves renatonascalves deleted the feature/9/caching-404s branch March 13, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Full-Page Caching for 404s
5 participants