[Feature] Full-Page Caching for 404s #46

mslinnea · 2023-06-15T20:04:45Z

Summary

This feature addresses a common performance issue where numerous 404 requests can strain server resources, as each request bypasses the standard full-page cache and requires PHP to generate the 404 page anew.

Resolves #9

Key Features

Hourly cron job populates cache.
Stale cache also maintained (24-hour lifespan)
Hooks into the template_redirect action at priority 1
Adds X-Alleyvate-404-Cache HTTP header
Logged-in users bypass this cache
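The sketch below shows the lookup order described in the list above. It is plain PHP with no WordPress loaded. An in-memory array stands in for the persistent object cache. The names `Sketch_404_Store`, `sketch_warm_404_cache`, `sketch_lookup_404`, `404_html` and `404_html_stale` are hypothetical. So are the one-hour primary TTL and the HIT/STALE/MISS header values. None of this is the feature's actual code.

Logged-in and non-404 requests fall through to normal rendering. Otherwise the fresh copy is preferred, then the 24-hour stale copy.

```php
<?php
/**
 * Hypothetical in-memory stand-in for the persistent object cache.
 * Each entry stores a value plus an absolute expiry timestamp.
 */
final class Sketch_404_Store {
	/** @var array<string, array{value: string, expires: int}> */
	private array $entries = [];

	public function set( string $key, string $value, int $ttl, int $now ): void {
		$this->entries[ $key ] = [ 'value' => $value, 'expires' => $now + $ttl ];
	}

	public function get( string $key, int $now ): ?string {
		if ( ! isset( $this->entries[ $key ] ) || $this->entries[ $key ]['expires'] <= $now ) {
			return null; // Missing or expired.
		}
		return $this->entries[ $key ]['value'];
	}
}

const SKETCH_TTL       = 3600;  // Assumed: primary copy lives about as long as the hourly refresh interval.
const SKETCH_STALE_TTL = 86400; // Stale copy kept for 24 hours as a fallback.

/**
 * Warm both copies, as the hourly cron job would. An empty page is never stored.
 */
function sketch_warm_404_cache( Sketch_404_Store $store, string $html, int $now ): void {
	if ( '' === $html ) {
		return;
	}
	$store->set( '404_html', $html, SKETCH_TTL, $now );
	$store->set( '404_html_stale', $html, SKETCH_STALE_TTL, $now );
}

/**
 * Decide what to serve. Returns [ html or null, debug header value or null ].
 */
function sketch_lookup_404( Sketch_404_Store $store, bool $is_404, bool $logged_in, int $now ): array {
	if ( ! $is_404 || $logged_in ) {
		return [ null, null ]; // Let WordPress render normally; no debug header.
	}
	$html = $store->get( '404_html', $now );
	if ( null !== $html ) {
		return [ $html, 'HIT' ];
	}
	$stale = $store->get( '404_html_stale', $now );
	if ( null !== $stale ) {
		return [ $stale, 'STALE' ];
	}
	return [ null, 'MISS' ];
}
```

In WordPress, the second element would presumably feed the X-Alleyvate-404-Cache debugging header. That wiring is not shown here.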

Requirements

Persistent Object Cache
SSL

Notes for reviewers

Looking for feedback on any potential compatibility issues or other concerns.

Changelog entries

Added

Changed

Deprecated

Removed

Fixed

Security

Summary by CodeRabbit

New Features:

Introduced a new feature, Full_Page_Cache_404, that enhances the website's performance by caching 404 pages and retrieving them when needed. This reduces server load and improves user experience during navigation.

Tests:

Added a new test suite, Test_Full_Page_Cache_404, to verify the 404 page caching feature. It includes tests that a 404 page is served from the cache and that non-404 pages do not return the cached 404 page.

coderabbitai · 2023-11-01T14:32:35Z

Important

Auto Review Skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The changes introduce a new feature to the Alleyvate WordPress plugin, Full_Page_Cache_404, which enables caching of 404 pages. This feature is added to the list of features in the plugin's load file. Additionally, a new test suite is created to ensure the functionality of the caching feature, including tests for caching 404 pages and not caching non-404 pages.

Changes

File Path	Summary
`src/.../class-full-page-cache-404.php`	Introduced a new class `Full_Page_Cache_404` that implements the `Feature` interface. This class adds functionality for caching 404 pages and retrieving them from the cache.
`src/.../load.php`	Added the `Full_Page_Cache_404` feature to the list of features in the plugin's load file.
`tests/.../test-full-page-cache-404.php`	Created a new test suite `Test_Full_Page_Cache_404` for the caching feature. It includes tests for caching 404 pages and not caching non-404 pages, with helper methods for setting and retrieving the cache content.

mslinnea · 2023-11-01T14:33:08Z

@coderabbitai review

coderabbitai

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 4665bec and 730dd63.

Files selected for processing (3)

src/alley/wp/alleyvate/features/class-full-page-cache-404.php (1 hunks)
src/alley/wp/alleyvate/load.php (1 hunks)
tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (1 hunks)

Files skipped from review due to trivial changes (1)

src/alley/wp/alleyvate/load.php

Additional comments: 7

src/alley/wp/alleyvate/features/class-full-page-cache-404.php (1)

1-243: The class Full_Page_Cache_404 is well-structured and follows good practices. It implements the Feature interface and provides methods for managing the cache. The use of constants for cache keys, cache group, and cache times is a good practice. The methods are well-documented with comments. The use of WordPress hooks and filters is appropriate. The use of output buffering to capture the HTML of the 404 page for caching is a clever approach. The use of the wp_cache_get, wp_cache_set, and wp_cache_delete functions for managing the cache is appropriate. The use of wp_schedule_event and wp_schedule_single_event for scheduling cache replenishment is a good practice. The use of wpcom_vip_file_get_contents for making a request to the guaranteed 404 page to populate the cache is appropriate. The use of the header function for sending custom debugging headers is a good practice. Overall, the code is well-written and follows good practices.

tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (6)

35-37: The setUp method is correctly setting up the Feature instance.

42-58: The test_full_page_cache_404_returns_cache method is correctly testing the caching of a 404 page.

63-69: The test_full_page_cache_not_returned_for_non_404 method is correctly testing that a non-404 page does not return the cache.

74-77: The set_404_cache helper method is correctly setting the cache content.

84-86: The get_404_html helper method is correctly retrieving the cache content.

91-94: The tearDown method is correctly deleting the cache.

renatonascalves · 2024-02-14T23:56:30Z

@coderabbitai review

benpbolton · 2024-02-15T15:23:46Z

This is... probably not the best place for my off-hand observations, but I was working on a server recently (monolith, old school) re: PHP-FPM's inability to keep up with 404s, and I had two observations:

it made me happy for this PR
I noticed that WordPress's default behavior is to prevent caching of 404s by sending Cache-Control and Expires headers that basically stop downstream caches from working.

In my specific case, nginx needed extra configuration to ignore WordPress's cache headers (fastcgi_ignore_headers Cache-Control Expires;)... and then I wondered: does this PR allow the (PHP-cached) 404 page to also be cached upstream (nginx, Varnish, CDNs of all types, etc.)?

If not, we should consider that...

mslinnea · 2024-02-16T23:14:44Z

This doesn't change the Cache-Control or Expires HTTP headers; it makes no change to the existing HTTP headers at all. It does add a new HTTP header, X-Alleyvate-404-Cache, for debugging purposes, to indicate whether the cache was hit.

A 404 is not necessarily permanent; a permanent one should be a 410 ("Gone") response. I imagine that upstream caching is prevented because a 404 could be temporary.

benpbolton · 2024-02-19T14:18:12Z

Absolutely, this is surely the intention behind the existing no-cache directive headers. But... it might be 'cacheable' for, say, the next 2 seconds, or the next 10. It sounds like a silly distinction, but in certain scenarios it's the difference between 1,000 hits per second to the PHP-rendered 404 page and 999 hits to Varnish plus 1 to PHP.

There is a purity of saying 'no 404 may last forever, so don't cache it' that I don't necessarily want to give up, but... logistically and realistically, allowing downstreams to cache that it's a 404 for a couple seconds can make a huge difference in some stacks.

Fine leaving it as-is... the only alternatives I can think of would be to:

clear out (not specify) the cache-control and expires headers
set them to an extremely short duration that would be 'universally acceptable' as being true (10s as a start)
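The sketch below illustrates the second alternative in plain PHP. A hypothetical helper builds short-lived Cache-Control and Expires values for a 404. A second helper gives a rough estimate of origin load. That estimate assumes steady traffic hitting a single cache node that misses at most once per TTL window. Stacks with several edge nodes would see proportionally more origin hits.

How these values would replace the no-cache headers WordPress sends on a 404 is left open.

```php
<?php
/**
 * Hypothetical helper: headers that let a downstream cache (nginx, Varnish,
 * a CDN) keep a 404 for a few seconds instead of not at all.
 */
function sketch_short_ttl_404_headers( int $ttl_seconds, int $now ): array {
	$ttl_seconds = max( 0, $ttl_seconds );
	return [
		'Cache-Control' => sprintf( 'public, max-age=%d', $ttl_seconds ),
		// Expires is for older caches; max-age takes precedence when both are present.
		'Expires'       => gmdate( 'D, d M Y H:i:s', $now + $ttl_seconds ) . ' GMT',
	];
}

/**
 * Rough origin load for one cache node under steady traffic:
 * at most one miss per TTL window within the observed window.
 */
function sketch_origin_hits( int $requests, int $window_seconds, int $ttl_seconds ): int {
	if ( $ttl_seconds <= 0 ) {
		return $requests; // Nothing is cached, so every request reaches PHP.
	}
	return min( $requests, (int) ceil( $window_seconds / $ttl_seconds ) );
}
```

Take 1,000 requests in one second with a 10-second TTL. `sketch_origin_hits( 1000, 1, 10 )` gives 1, which matches the 999-to-1 split described in the comment.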

renatonascalves · 2024-02-22T00:56:38Z

src/alley/wp/alleyvate/features/class-full-page-cache-404.php

+			ob_start( [ self::class, 'finish_output_buffering' ] );
+
+			// Clean up the buffer.
+			ob_get_clean();


So, while testing this on a client site, I noticed PHP can run multiple (nested) output buffers in a single request. In my testing environment there was only one, but on the client site, by the time the code gets here, ours is buffer number 2. So it returns the wrong value (empty).

I'm inclined to remove this line since it will exit anyway.

I get the following error while running the unit tests, but not while testing on a site. 🤔

Test code or tested code did not (only) close its own output buffers

So, I learned that buffers can actually be nested and it is pretty hard to match the status with the initial value.

So here, we are clearing any previous buffers before starting our own.

8017c6a
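Here is a self-contained PHP illustration of the nesting described above. It uses plain PHP with no WordPress, and the echoed strings are placeholders. `ob_get_clean()` only ever returns and closes the innermost buffer. Whatever an earlier buffer holds stays where it is.

```php
<?php
$base = ob_get_level(); // Buffers already open before this demo (often 0 in CLI).

// Pretend another plugin opened a buffer and wrote to it.
ob_start();
echo 'from another plugin; ';

// Our buffer is nested one level deeper.
ob_start();
echo '<h1>Page not found</h1>';
$depth_with_ours = ob_get_level() - $base; // 2: the other buffer plus ours.

// ob_get_clean() returns and closes only the innermost buffer (ours).
$ours = ob_get_clean();

// The other plugin's buffer is still open and still holds its content.
$still_open = ob_get_level() - $base; // 1
$theirs     = ob_get_clean();
```

Suppose "clearing any previous buffers" means discarding them, for example with `ob_end_clean()`. Then the other buffer's `from another plugin; ` text would be thrown away along with it.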

We should be closing the output buffer that we're creating. We could also, potentially, keep the output buffer that already exists if output buffering is already active.

ob_get_status will tell us the current status of output buffering. If there is an active buffer, it will return a non-empty array. If there is not an active buffer, it will return an empty array. We could use that function to detect whether an output buffer is currently active and set a flag, which could control whether we spin up a new output buffer or use the current one, or whether we capture anything that's already in an existing output buffer for use later.

Also, the current implementation relies on a callback for when the buffer is naturally flushed, but we could perhaps hook into shutdown and capture the output there and add it to the cache.

Alternately, this doesn't need to be considered an Alleyvate bug, and could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page, as it would break this feature (as written). Having an active output buffering session on the 404 page that doesn't come from this plugin is likely to be an edge case and could better be handled in that codebase rather than trying to code around it here.
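The sketch below applies the ob_get_status() idea from the comment above in plain PHP. A hypothetical `Sketch_404_Capture` helper works in three steps:

- It notes whether a buffer was already active.
- It remembers the current ob_get_level().
- On finish, it closes only the buffers at or above its own level.

Pre-existing buffers are left open. That is also what PHPUnit's "did not (only) close its own output buffers" check expects. This is not the feature's actual class.

```php
<?php
/**
 * Hypothetical capture helper: remembers how deep output buffering already
 * was, so it can later close only the buffer(s) it is responsible for.
 */
final class Sketch_404_Capture {
	private int $baseline = 0;
	private bool $had_existing_buffer = false;
	private string $captured = '';

	public function start(): void {
		// ob_get_status() returns an empty array when no buffer is active.
		$this->had_existing_buffer = [] !== ob_get_status();
		$this->baseline            = ob_get_level();
		ob_start( [ $this, 'capture' ] );
	}

	/** Output callback: keep a copy of the page and pass it through unchanged. */
	public function capture( string $buffer, int $phase ): string {
		$this->captured .= $buffer;
		return $buffer;
	}

	public function finish(): string {
		// Close ours (and anything opened after it); earlier buffers stay open.
		while ( ob_get_level() > $this->baseline ) {
			ob_end_flush();
		}
		return $this->captured;
	}

	public function had_existing_buffer(): bool {
		return $this->had_existing_buffer;
	}
}
```

`finish()` flushes rather than cleans. The page therefore still reaches any outer buffer (or the browser) after being copied.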

So I revisited this this morning, and I had 3 buffers already present before this one on both a regular site and a client site. No idea yet how. ¯\_(ツ)_/¯

But I'll take your notes and do a bit more digging and hopefully apply a solution that works by default.

Also, the current implementation relies on a callback for when the buffer is naturally flushed, but we could perhaps hook into shutdown and capture the output there and add it to the cache.

🤔 I'd expect more buffers to be available at this point, added by plugins, etc. I'll test this approach just in case.

Alternately, this doesn't need to be considered an Alleyvate bug, and could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page,

I'll try to find a way to make it work for "any" sites.

Quick feedback after some testing:

Hooking into the shutdown hook: the buffer always returns empty. My guess is that it has already been cleared by the time it gets there. It's not that easy to track buffers.

Using ob_get_status is useful, but any attempt to clear the buffer after that clears the HTML, and it returns empty instead of the 404 page buffer. I tried a mixture of flushing the previous buffer and flushing ours later. I'm at a loss as to why it is empty.

So far, any attempt to clear the buffer after ob_start( [ self::class, 'finish_output_buffering' ] ); essentially clears the HTML from the buffer and returns "", meaning that we are not caching the HTML page.
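Here is a small plain-PHP demonstration of why clearing right after `ob_start()` yields "". An output callback only ever sees what was printed into its buffer before that buffer was closed. At that point, the 404 template has not printed anything yet. The callback here is a placeholder for `finish_output_buffering`.

On the shutdown observation, there is one likely explanation, though it is an assumption not verified against this codebase. WordPress core hooks `wp_ob_end_flush_all()` to `shutdown` at priority 1. Open buffers may therefore already be flushed before a later shutdown callback runs.

```php
<?php
$received = [];
$callback = static function ( string $buffer, int $phase ) use ( &$received ): string {
	$received[] = $buffer; // What the cache would have stored.
	return $buffer;
};

$base = ob_get_level();
ob_start(); // Stand-in for the page's outer buffer, so nothing reaches stdout.

// 1) Closing our buffer right after opening it: nothing has been printed yet.
ob_start( $callback );
ob_get_clean();
echo '<h1>Not found</h1>'; // Rendered after our buffer is already gone.
$immediate = implode( '', $received );

// 2) Closing our buffer only after the template has printed.
$received = [];
ob_start( $callback );
echo '<h1>Not found</h1>';
ob_end_flush();
$deferred = implode( '', $received );

ob_end_clean(); // Discard the stand-in outer buffer.
```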

renatonascalves · 2024-02-28T11:58:36Z

Something to consider for implementation here:

The feature assumes that the site uses HTTPS;
The cache doesn’t differentiate based on mobile or desktop (user agent);
The cache doesn’t differentiate based on geo region.

The first one is an easy fix. But the others are more complicated since it depends on the site and how it is built.

renatonascalves · 2024-02-28T14:47:37Z

To support different caches based on the user agent, it's technically possible to create and serve different caches using the wp_is_mobile() helper function. I'd suggest we put that behind a filter hook.
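The sketch below shows the filter-gated variant in plain PHP. The opt-in flag and `$is_mobile` are passed as arguments. They stand in for a hypothetical filter and for `wp_is_mobile()`. The function name and the key suffixes are made up for illustration.

```php
<?php
/**
 * Hypothetical cache-key builder. In WordPress, $vary_by_device would come
 * from a filter and $is_mobile from wp_is_mobile(); here both are arguments.
 */
function sketch_404_cache_key( string $base_key, bool $vary_by_device, bool $is_mobile ): string {
	if ( ! $vary_by_device ) {
		return $base_key; // Default: one shared copy for every visitor.
	}
	return $base_key . ( $is_mobile ? '_mobile' : '_desktop' );
}
```

With the flag on, the cron job would need to warm both keys. It would presumably do this by requesting the guaranteed-404 URL once with a mobile and once with a desktop User-Agent. Any edge cache in front of PHP would also have to vary on the same signal. Geo variants would follow the same pattern but need a region signal the site can rely on.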

dlh01 · 2024-03-05T13:00:23Z

I don't have any feedback about the code in particular, but I've had some lingering questions after taking in the comments in the PR. They're not showstoppers, just things about which I think it would be helpful to have our current thinking on the record.

First, are we certain that Alleyvate is still the best place for this feature?

I've been wondering this as more code has been added to the feature as I've seen comments like:

the others are more complicated since it depends on the site and how it is built

could instead be incumbent upon sites that use this plugin to ensure they don't have an active output buffering session on the 404 page, as it would break this feature (as written)

I'll try to find a way to make it work for "any" sites

Alleyvate is meant for the "essential" customizations whose features would be kept enabled by project leads without a second thought, but this particular feature seems to need more consideration before it's adopted by a site. Is it better for it to be somewhere that's opt-in rather than opt-out?

Second, what kind of performance benefit can sites expect to see once this feature is enabled?

I understand why the default behavior for 404 pages can be a performance problem, as outlined in #9, but this question has been nagging me since the switch was made in 4d57b69 to serve the cached template on template_redirect.

The caching means that WordPress no longer has to find the 404 template and render it, from here to here: https://github.com/WordPress/wordpress-develop/blob/7002bce8abe6f6f1858bdd2f02cd1168ef50e4d8/src/wp-includes/template-loader.php#L13-L106

That's not nothing, but the WordPress bootstrapping process still occurs beforehand — plugins are loaded, the main query runs to find the requested content, init and wp_loaded fire, etc. Does the bootstrap represent a lot of the processing for a 404 request? A little?

For me, this question is a companion to the question above regarding the relative complexity of this feature. If the feature presents more edge cases than the usual Alleyvate feature but brings a 40% performance improvement with it, that's a tradeoff that most project leads would probably take. (I picked "40%" at random — that's not a target I'm trying to suggest!)

dlh01

I left a comment, but this seems like it's been tested pretty thoroughly, so I don't have much to add from a code perspective.

renatonascalves · 2024-03-13T14:40:03Z

This feature was moved into https://github.com/alleyinteractive/wp-404-caching

Serve cache for 404 pages - WIP

81b09b9

mslinnea commented Jun 15, 2023

src/alley/wp/alleyvate/features/class-full-page-cache-404.php (outdated, resolved)

mslinnea commented Jun 15, 2023

src/alley/wp/alleyvate/features/class-full-page-cache-404.php (outdated, resolved)

mslinnea added 5 commits June 21, 2023 19:25

add stale cache, switch to using a cron job

4d57b69

phpcs and work on tests

4598a31

use output buffering to save the cache

bc52963

Merge remote-tracking branch 'origin/main' into feature/9/caching-404s

1c97d49

schedule single event. remove stderror flag because test failures weren't happening

b6e709f

mslinnea mentioned this pull request Oct 11, 2023

[FEATURE] Full-Page Caching for 404s #9

Closed

Merge branch 'main' into feature/9/caching-404s

730dd63

coderabbitai bot reviewed Nov 1, 2023

tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (outdated, resolved)

tests/alley/wp/alleyvate/features/test-full-page-cache-404.php (outdated, resolved)

mslinnea and others added 5 commits November 1, 2023 07:55

Update tests/alley/wp/alleyvate/features/test-full-page-cache-404.php

c8c9f8b

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into feature/9/caching-404s

466282f

prevent outputting headers if already sent

38ddf23

Merge branch 'feature/9/caching-404s-local' into feature/9/caching-404s

6e3946d

Avoid setting cache to empty string

951296c

mslinnea linked an issue Dec 28, 2023 that may be closed by this pull request

[FEATURE] Full-Page Caching for 404s #9

Closed

mslinnea added 3 commits December 28, 2023 14:32

phpcs

71b88bb

php cs fixer

20ad3e8

Server 404 page early

a51bcc9

mslinnea marked this pull request as ready for review December 28, 2023 22:51

mslinnea requested a review from a team as a code owner December 28, 2023 22:51

Logged in users should bypass cache

4a2ae4a

mslinnea changed the title ~~Serve cache for 404 pages - WIP~~ Serve cache for 404 pages Dec 28, 2023

Fix issue where HTTP header was set incorrectly

b776b39

mslinnea changed the title ~~Serve cache for 404 pages~~ [Feature] Full-Page Caching for 404s Dec 29, 2023

renatonascalves requested changes Jan 2, 2024

dlh01 reviewed Jan 2, 2024

renatonascalves added 9 commits February 13, 2024 16:36

Set INSTALL_OBJECT_CACHE via env

6336e62

Set INSTALL_OBJECT_CACHE via env

83b95d2

Set INSTALL_OBJECT_CACHE

5b2021b

Remove INSTALL_OBJECT_CACHE: true

6bacb48

Skip tests if object cache is not available

6f9c270

Disable tests if object cache is not in use

7a8c1ee

Adding CR suggestions

f6f985f

Minor tweak

cd33707

Merge pull request #76 from alleyinteractive/feature/9/caching-404s-unit-tests

[Feature] Unit tests for the Full-Page Caching for 404s feature

f628f26

mslinnea requested a review from dlh01 February 15, 2024 03:03

renatonascalves reviewed Feb 22, 2024

renatonascalves added 3 commits February 22, 2024 10:18

Sync with the latest

1590143

Dot not clean buffer too early

ac984cd

Clean up any previous buffer before starting our own

8017c6a

The "Full-Page Caching for 404s" feature requires ssl

a2c04df

renatonascalves mentioned this pull request Feb 28, 2024

The "Full-Page Caching for 404s" feature requires ssl #77

Merged

php-cs-fixer fixes

370860d

Merge pull request #77 from alleyinteractive/feature/9/caching-404s-requires-ssl

The "Full-Page Caching for 404s" feature requires ssl

006f900

dlh01 reviewed Mar 5, 2024

renatonascalves mentioned this pull request Mar 7, 2024

Cache Months Dropdown in Admin #82

Merged

renatonascalves closed this Mar 13, 2024

renatonascalves deleted the feature/9/caching-404s branch March 13, 2024 14:40
