Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Refactor LOAD_GLOBAL specializations to avoid reloading {globals, builtins} keys #124953

Merged
merged 6 commits into from
Oct 9, 2024

Conversation

mpage
Copy link
Contributor

@mpage mpage commented Oct 3, 2024

Each of the LOAD_GLOBAL specializations is implemented roughly as:

  1. Load keys version.
  2. Load cached keys version.
  3. Deopt if (1) and (2) don't match.
  4. Load keys.
  5. Load cached index into keys.
  6. Load object from (4) at offset from (5).

This is not thread-safe in free-threaded builds; the keys object may be replaced in between steps (3) and (4).

This change refactors the specializations to avoid reloading the keys object and instead pass the keys object from guards to be consumed by downstream uops.

…uiltins} keys

Each of the `LOAD_GLOBAL` specializations is implemented roughly as:

1. Load keys version.
2. Load cached keys version.
3. Deopt if (1) and (2) don't match.
4. Load keys.
5. Load cached index into keys.
6. Load object from (4) at offset from (5).

This is not thread-safe in free-threaded builds; the keys object may be replaced
in between steps (3) and (4).

This change refactors the specializations to avoid reloading the keys object and
instead pass the keys object from guards to be consumed by downstream uops.
@rruuaanng
Copy link
Contributor

Can you split the change? This makes it difficult for us to review.

@mpage mpage requested a review from colesbury October 7, 2024 18:05
Python/bytecodes.c Show resolved Hide resolved
Python/executor_cases.c.h Show resolved Hide resolved
mpage added 3 commits October 7, 2024 16:32
Ensure we update the stack to reflect that we've popped the keys.
There should be nothing on the stack if we deopt.
@mpage mpage requested a review from colesbury October 7, 2024 23:57
Maybe this will stop tickling the msvc compiler bug, too?
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The x86_64-pc-windows-msvc/msvc (Release) JIT failure is not new.

@colesbury colesbury enabled auto-merge (squash) October 9, 2024 14:50
@colesbury colesbury merged commit f978fb4 into python:main Oct 9, 2024
54 of 55 checks passed
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the review I did yesterday, but didn't quite finish in time 🙂

if (incorrect_keys(inst, builtins)) {
OPT_STAT_INC(remove_globals_incorrect_keys);
return 0;
}
if (interp->rare_events.builtin_dict >= _Py_MAX_ALLOWED_BUILTINS_MODIFICATIONS) {
continue;
}
if (!check_next_uop(buffer, buffer_size, pc,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the optimizer passes to be (as much as possible) simple, fast scans over the uop sequence.
So, I'd like to avoid this sort of non-local check if possible.

Generally we want each pass to be a linear scan which maintains a small set of knowledge, like function_checked, etc. above.
Each case should then either update that knowledge or perform a simple optimization based on that knowledge.

FYI, I plan to merge this pass into optimizer_bytecodes.c which is also a linear pass with similar design principles (at least it should be).

@@ -4871,6 +4884,26 @@ dummy_func(
DEOPT_IF(func->func_version != func_version);
}

tier2 op(_LOAD_GLOBAL_MODULE, (index/1 -- res, null if (oparg & 1))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is effectively pushes the global's keys then does _LOAD_GLOBAL_MODULE_FROM_KEYS.

Maybe add a tier2 op that only pushes the keys? It might make the optimizer simpler as well.

null = PyStackRef_NULL;
}

tier2 op(_LOAD_GLOBAL_BUILTINS, (index/1 -- res, null if (oparg & 1))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise

@markshannon
Copy link
Member

No need to revert anything, but I would like to remove _LOAD_GLOBAL_MODULE/_LOAD_GLOBAL_BUILTINS and replace them with _LOAD_GLOBAL_KEYS/_LOAD_BUILTINS_KEYS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants