-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-122548: Implement branch taken and not taken events for sys.monitoring #122564
GH-122548: Implement branch taken and not taken events for sys.monitoring #122564
Conversation
Thanks! I will try this out in the next few days. |
Playing with it on some simple code, the basics look really good here. I'll work on adapting coverage.py to get some timings. |
I've add |
Is def foo():
for i in range(3):
print(i)
This is without requesting any Edit: mmm, I don't see |
I'm missing some branch events for this code: def foo(n):
while n<3:
print(n)
n += 1
return None
if __name__ == "__main__":
foo(0) Here's what happens:
I was expecting branch events for iterations 1 and 2. This is without ever returning |
No it should be immutable, like
It is a bug. I'll take a look. Fixed now |
This is mostly likely because I haven't merged in the fix in #122934 yet. |
Should all be fixed now. |
It is great to see progress on this, thanks for pushing it forward. I have some questions about the behavior of This is my program for seeing the sys.monitoring events: run_sysmon.py# Licensed under the Apache License: http://www.apache.org/licenses/LICENSE-2.0
# For details: https://github.com/nedbat/coveragepy/blob/master/NOTICE.txt
"""Run sys.monitoring on a file of Python code."""
import functools
import sys
print(sys.version)
the_program = sys.argv[1]
code = compile(open(the_program).read(), filename=the_program, mode="exec")
my_id = sys.monitoring.COVERAGE_ID
sys.monitoring.use_tool_id(my_id, "run_sysmon.py")
register = functools.partial(sys.monitoring.register_callback, my_id)
events = sys.monitoring.events
def bytes_to_lines(code):
"""Make a dict mapping byte code offsets to line numbers."""
b2l = {}
cur_line = 0
for bstart, bend, lineno in code.co_lines():
for boffset in range(bstart, bend, 2):
b2l[boffset] = lineno
return b2l
def show_off(label, code, instruction_offset):
if code.co_filename == the_program:
b2l = bytes_to_lines(code)
print(f"{label}: {code.co_filename}@{instruction_offset} #{b2l[instruction_offset]}")
def show_line(label, code, line_number):
if code.co_filename == the_program:
print(f"{label}: {code.co_filename} #{line_number}")
def show_off_off(label, code, instruction_offset, destination_offset):
if code.co_filename == the_program:
b2l = bytes_to_lines(code)
print(
f"{label}: {code.co_filename}@{instruction_offset}->{destination_offset} "
+ f"#{b2l[instruction_offset]}->{b2l[destination_offset]}"
)
def sysmon_py_start(code, instruction_offset):
show_off("PY_START", code, instruction_offset)
sys.monitoring.set_local_events(
my_id,
code,
events.PY_RETURN
| events.PY_RESUME
| events.LINE
| events.BRANCH_TAKEN
| events.BRANCH_NOT_TAKEN
| events.JUMP,
)
def sysmon_py_resume(code, instruction_offset):
show_off("PY_RESUME", code, instruction_offset)
return sys.monitoring.DISABLE
def sysmon_py_return(code, instruction_offset, retval):
show_off("PY_RETURN", code, instruction_offset)
return sys.monitoring.DISABLE
def sysmon_line(code, line_number):
show_line("LINE", code, line_number)
return sys.monitoring.DISABLE
def sysmon_branch(code, instruction_offset, destination_offset):
show_off_off("BRANCH", code, instruction_offset, destination_offset)
return sys.monitoring.DISABLE
def sysmon_branch_taken(code, instruction_offset, destination_offset):
show_off_off("BRANCH_TAKEN", code, instruction_offset, destination_offset)
return sys.monitoring.DISABLE
def sysmon_branch_not_taken(code, instruction_offset, destination_offset):
show_off_off("BRANCH_NOT_TAKEN", code, instruction_offset, destination_offset)
return sys.monitoring.DISABLE
def sysmon_jump(code, instruction_offset, destination_offset):
show_off_off("JUMP", code, instruction_offset, destination_offset)
return sys.monitoring.DISABLE
sys.monitoring.set_events(
my_id,
events.PY_START | events.PY_UNWIND,
)
register(events.PY_START, sysmon_py_start)
register(events.PY_RESUME, sysmon_py_resume)
register(events.PY_RETURN, sysmon_py_return)
# register(events.PY_UNWIND, sysmon_py_unwind_arcs)
register(events.LINE, sysmon_line)
#register(events.BRANCH, sysmon_branch)
register(events.BRANCH_TAKEN, sysmon_branch_taken)
register(events.BRANCH_NOT_TAKEN, sysmon_branch_not_taken)
register(events.JUMP, sysmon_jump)
exec(code) My test program (withbranches.py): from contextlib import nullcontext
def ok():
with nullcontext():
return "good"
def bad():
with nullcontext():
1/0
print(ok())
try:
print(bad())
except:
print("whoops")
print("done") I get this output:
I have two questions:
|
With statements are complicated things https://peps.python.org/pep-0343/#specification-the-with-statement. with mngr: # line 1
BODY # line 2 is equivalent to:
It looks like we decided (well, I decided and no one disagreed 🙂) that the call to |
I see, thanks. I mistakenly thought the branch was "exception happened or it didn't," but it's "exception happened and was handled by the context manager or it wasn't." I don't think that's a branch that a coverage.py user would want exposed to them, but I could be swayed. The re-visiting of the |
@nedbat def foo(a,b):
return (a +
b) Execution order:
|
I've updated this branch to use I've left the |
I'm trying to use the new events to track what branches have happened. I'm having a problem because the events report This is forloop.py, where I'm focused on the if statement on line 3: def forloop(seq):
for n in seq:
if n > 5: # line 3
break
print('Done')
forloop([1,9]) Disassembled it looks like this:
When I run it with sys.monitoring, I get these events:
The @XX notations are bytecode offsets, the #xx notations are line numbers obtained by mapping the offsets through co_lines(). There are two branch events from line 3. The first is BRANCH_NOT_TAKEN 22->28, but that maps to line 3->3. Bytecode 28 is a jump backward, but how can I properly interpret this as a branch not taken from line 3 to line 2? The second event is BRANCH_TAKEN 22->32, which maps to line 3->None. What should I do with a NOP with no line number? |
Sorry, I guess my example is duplicating #123050? |
@markshannon @iritkatriel Is there some way we can work together to push this forward? |
75bed9c
to
c5a4b17
Compare
@markshannon I'm pleased to see more activity here. What's the best way to talk about how it's behaving? I tried a simple program:
Using a build of this PR, I disassembled it as:
Looking at the sys.monitoring events, I see this:
(@ are byte offsets, # are line numbers) The BRANCH_LEFT is indicating a branch from line 1 to line 1, so we never "see" the branch from 1 to 2. Is that something you can fix, or is there some way I should be adjusting for that? |
There is no branch from line 1 to 2, the "left" branch is from line 1 to line 1. This code: 1. for i in s:
2. A
3. B this is roughly equivalent to: 1. tmp1 = iter(i); tmp2 = next(tmp1); if tmp2 is NULL goto end; s = tmp2
2. A
end:
3. B The "left" branch is from |
Python/codegen.c
Outdated
} | ||
VISIT(c, expr, (expr_ty)asdl_seq_GET(e->v.Compare.comparators, n)); | ||
ADDOP_COMPARE(c, LOC(e), asdl_seq_GET(e->v.Compare.ops, n)); | ||
ADDOP(c, LOC(e), TO_BOOL); | ||
ADDOP_JUMP(c, LOC(e), cond ? POP_JUMP_IF_TRUE : POP_JUMP_IF_FALSE, next); | ||
ADDOP(c, NO_LOCATION, NOT_TAKEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The peephole optimiser replaces a conditional jump with const condition by an unconditional jump and a NOP. We probably need another peepholer rule to remove any NOT_TAKEN this leaves behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, could we not add NOT_TAKEN in codegen, and have the assembler inject it where it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry this might be not work if the optimizer merges are rearranges branches.
It is possible that the correspondence to the original source could be lost.
Sorry for the delay in responding: I am about to leave on a holiday trip and things tend to get hectic then. I'd love to have this to be merged in! Being able to disable just each direction of a branch event at a time and getting a list of the possible branches without having to go through the bytecode are great additions and make these events much easier to use. Where are we with #123050 and other related issues? Not that I think resolving them should block what's been done so far. |
Let's discuss that on those issues. Being able to disable the branch events is orthogonal to mapping branch locations back to source. |
Python/codegen.c
Outdated
case POP_JUMP_IF_TRUE: | ||
case POP_JUMP_IF_NONE: | ||
case POP_JUMP_IF_NOT_NONE: | ||
case FOR_ITER: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a macro for conditional branches next to this one
#define IS_UNCONDITIONAL_JUMP_OPCODE(opcode) \ |
Maybe first we should decide: from the user's point of view, does line 1 have two possible destinations (line 2 and line 3) or does it not? Should line 1 be considered a branch for the purposes of branch coverage? If we can't map the collected data back to a user-understandable interpretation, then we haven't finished the feature. Can you help me puzzle through this? I'd like to get efficient branch coverage measurement. So far, we seem to be talking about different things (byte codes vs lines) and we aren't meeting in the middle. |
Can we discuss this on the issue? This PR is about adding the ability to disable the two branches independently.
Indeed. This PR is about disabling branches and offsets. Issue #122548 or any of the related issues that @jaltmayerpizzorno opened would the place to discuss that. |
@@ -262,6 +262,7 @@ Known values: | |||
Python 3.14a1 3607 (Add pseudo instructions JUMP_IF_TRUE/FALSE) | |||
Python 3.14a1 3608 (Add support for slices) | |||
Python 3.14a2 3609 (Add LOAD_SMALL_INT and LOAD_CONST_IMMORTAL instructions, remove RETURN_CONST) | |||
Python 3.14a3 3610 (Add NOT_TAKEN instruction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says 3610, but the implementation says 3611.
This says a3, but a3 is already released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for spotting that.
This is a draft PR.
Before merging, it needs:
@nedbat
@jaltmayerpizzorno