-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Full-Page Caching for 404s #46
Conversation
Important Auto Review SkippedDraft detected. Please check the settings in the CodeRabbit UI or the To trigger a single review, invoke the WalkthroughThe changes introduce a new feature to the Alleyvate WordPress plugin, Changes
TipsChat with CodeRabbit Bot (
|
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (3)
- src/alley/wp/alleyvate/features/class-full-page-cache-404.php (1 hunks)
- src/alley/wp/alleyvate/load.php (1 hunks)
- tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (1 hunks)
Files skipped from review due to trivial changes (1)
- src/alley/wp/alleyvate/load.php
Additional comments: 7
src/alley/wp/alleyvate/features/class-full-page-cache-404.php (1)
- 1-243: The class
Full_Page_Cache_404
is well-structured and follows good practices. It implements theFeature
interface and provides methods for managing the cache. The use of constants for cache keys, cache group, and cache times is a good practice. The methods are well-documented with comments. The use of WordPress hooks and filters is appropriate. The use of output buffering to capture the HTML of the 404 page for caching is a clever approach. The use ofwp_cache_get
,wp_cache_set
, andwp_cache_delete
functions for managing the cache is appropriate. The use ofwp_schedule_event
andwp_schedule_single_event
for scheduling cache replenishment is a good practice. The use ofwpcom_vip_file_get_contents
for making a request to the guaranteed 404 page to populate the cache is appropriate. The use ofheader
function for sending custom headers for debugging is a good practice. Overall, the code is well-written and follows good practices.tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (6)
35-37: The
setUp
method is correctly setting up theFeature
instance.42-58: The
test_full_page_cache_404_returns_cache
method is correctly testing the caching of a 404 page.63-69: The
test_full_page_cache_not_returned_for_non_404
method is correctly testing that a non-404 page does not return the cache.74-77: The
set_404_cache
helper method is correctly setting the cache content.84-86: The
get_404_html
helper method is correctly retrieving the cache content.91-94: The
tearDown
method is correctly deleting the cache.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…nit-tests [Feature] Unit tests for the Full-Page Caching for 404s feature
@coderabbitai review |
This is... probably not the best place for my off-hand observations, but I was working on a server recently (monolith, old school) re: PHP-fpm's inability to keep up with 404's, and I had two observations:
In my specific case, nginx needed extra instrumentation to ignore WordPress' Cache ( If not, we should consider that... |
This doesn't change the Cache-Control Expires HTTP header... It makes no change to the existing HTTP headers. It does add a new HTTP header A 404 is not necessarily permanent...It should be a 410 response to indicate "Gone" for a permanent one... I imagine that upstream caching is prevented because a 404 could be temporary. |
Absolutely this is surely the intention behind the existing directive no-cache headers. But... it might be 'cacheable' for, say, the next 2 seconds, or the next 10. It sounds like a silly distinction, but in certain scenarios, it's the difference between 1k hits to the 404 page or 999 hits to varnish and 1 to php within per second. There is a purity of saying 'no 404 may last forever, so don't cache it' that I don't necessarily want to give up, but... logistically and realistically, allowing downstreams to cache that it's a 404 for a couple seconds can make a huge difference in some stacks. Fine leaving it as-is... the only alternatives I can think of would be to:
|
ob_start( [ self::class, 'finish_output_buffering' ] ); | ||
|
||
// Clean up the buffer. | ||
ob_get_clean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, while testing this in a client site, I noticed PHP can run multiple buffers in a request. In my testing environment, there was only one, but in the client site, by the time the code gets here, it is buffer number 2. So it returns the wrong value (empty).
I'm inclined to remove this line since it will exit
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the following error while running the unit tests, but not while testing on a site. 🤔
Test code or tested code did not (only) close its own output buffers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I learned that buffers can actually be nested and it is pretty hard to match the status with the initial value.
So here, we are clearing any previous buffers before starting our own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be closing the output buffer that we're creating. We could also, potentially, keep the output buffer that already exists if output buffering is already active.
ob_get_status will tell us the current status of output buffering. If there is an active buffer, it will return a non-empty array. If there is not an active buffer, it will return an empty array. We could use that function to detect whether an output buffer is currently active and set a flag, which could control whether we spin up a new output buffer or use the current one, or whether we capture anything that's already in an existing output buffer for use later.
Also, the current implementation relies on a callback for when the buffer is naturally flushed, but we could perhaps hook into shutdown
and capture the output there and add it to the cache.
Alternately, this doesn't need to be considered an Alleyvate bug, and could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page, as it would break this feature (as written). Having an active output buffering session on the 404 page that doesn't come from this plugin is likely to be an edge case and could better be handled in that codebase rather than trying to code around it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I revisited this morning and I had 3 buffers already present before this one in a regular site and client site. No idea yet how. ¯_(ツ)_/¯
But I'll take your notes and do a bit more digging and hopefully apply a solution that works by default.
Also, the current implementation relies on a callback for when the buffer is naturally flushed, but we could perhaps hook into shutdown and capture the output there and add it to the cache.
🤔 I'd expect more buffers to be available at this point, added by plugins, etc. I'll test this approach just in case.
Alternately, this doesn't need to be considered an Alleyvate bug, and could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page,
I'll try to find a way to make it work for "any" sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick feedback after some testing:
- Hooking into
shutdown
hook: the buffer always returns empty. My guess is that it is already cleared by the time it gets there. It's not that easy to track buffers. - Using
ob_get_status
is useful, but any attempt to clear the buffer after that, clears the html and it returns empty instead of the 404 page buffer. I tried a mixture of flushing the previous buffer and flushing ours later. I'm at a lost at why it is empty.
So far, any attempt to clear the buffer after ob_start( [ self::class, 'finish_output_buffering' ] );
, it is essentially clearing the HTML from the buffer and returning ""
. Meaning that we are not caching the html page.
Something to consider for implementation here:
The first one is an easy fix. But the others are more complicated since it depends on the site and how it is built. |
For supporting different caches based on the user agent. It’s technically possible to create and serve different caches using the |
…equires-ssl The "Full-Page Caching for 404s" feature requires ssl
I don't have any feedback about the code in particular, but I've had some lingering questions after taking in the comments in the PR. They're not showstoppers, just things about which I think it would be helpful to have our current thinking on the record. First, are we certain that Alleyvate is still the best place for this feature? I've been wondering this as more code has been added to the feature as I've seen comments like:
Alleyvate is meant for the "essential" customizations whose features would be kept enabled by project leads without a second thought, but this particular feature seems to need more consideration before it's adopted by a site. Is it better for it to be somewhere that's opt-in rather than opt-out? Second, what kind performance benefit can sites expect to see once this feature is enabled? I understand why the default behavior for 404 pages can be a performance problem, as outlined in #9, but this question has been nagging me since the switch was made in 4d57b69 to serve the cached template on The caching means that WordPress no longer has to find the 404 template and render it, from here to here: https://github.com/WordPress/wordpress-develop/blob/7002bce8abe6f6f1858bdd2f02cd1168ef50e4d8/src/wp-includes/template-loader.php#L13-L106 That's not nothing, but the WordPress bootstrapping process still occurs beforehand — plugins are loaded, the main query runs to find the requested content, For me, this question is a companion to the question above regarding the relative complexity of this feature. If the feature presents more edge cases than the usual Alleyvate feature but brings a 40% performance improvement with it, that's a tradeoff that most project leads would probably take. (I picked "40%" at random — that's not a target I'm trying to suggest!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment, but this seems like it's been tested pretty thoroughly, so I don't have much to add from a code perspective.
This feature was moved into https://github.com/alleyinteractive/wp-404-caching |
Summary
This feature addresses a common performance issue where numerous 404 requests can strain server resources, as each request bypasses the standard full-page cache and requires PHP to generate the 404 page anew.
Resolves #9
Key Features
X-Alleyvate-404-Cache
HTTP headerRequirements
Notes for reviewers
Changelog entries
Added
Changed
Deprecated
Removed
Fixed
Security
Summary by CodeRabbit
New Features:
Full_Page_Cache_404
, that enhances the website's performance by caching 404 pages and retrieving them when needed. This reduces server load and improves user experience during navigation.Tests:
Test_Full_Page_Cache_404
, to ensure the correct functionality of the 404 page caching feature. This includes tests for caching a 404 page and ensuring non-404 pages do not return the cache, ensuring the reliability of the new feature.