Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-122548: Implement branch taken and not taken events for sys.monitoring #122564

Merged
merged 13 commits into from
Dec 19, 2024

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Aug 1, 2024

This is a draft PR.

Before merging, it needs:

  • Feedback on whether this fixes the problem
  • Docs
  • Benchmarking results

@nedbat
@jaltmayerpizzorno

@nedbat
Copy link
Member

nedbat commented Aug 1, 2024

Thanks! I will try this out in the next few days.

@nedbat
Copy link
Member

nedbat commented Aug 5, 2024

Playing with it on some simple code, the basics look really good here. I'll work on adapting coverage.py to get some timings.

@markshannon
Copy link
Member Author

I've add co_branches() iterator to this PR, which provides an iterator of src, not_taken, taken triples.
To do this I've changed the BRANCH_NOT_TAKEN event source offset so that it matches the BRANCH_TAKEN source offset.
That way you can track offsets during recording and only map back to locations during reporting, which should make things a bit more efficient.

@jaltmayerpizzorno
Copy link

jaltmayerpizzorno commented Aug 14, 2024

Is co_branches() supposed to change contents as the function is executed? I am not sure I mind if it changes, but it's surprising to me -- I thought it'd be a static static list of the branches present. I'm seeing it change for this:

def foo():
    for i in range(3):
        print(i)
>>> list(foo.__code__.co_branches())
[(24, 30, 58)]
>>> foo()
0
1
2
>>> list(foo.__code__.co_branches())
[]

This is without requesting any sys.monitoring.

Edit: mmm, I don't see co_branches() change in other cases, this may be a bug.

@jaltmayerpizzorno
Copy link

jaltmayerpizzorno commented Aug 14, 2024

I'm missing some branch events for this code:

def foo(n):
    while n<3:
        print(n)
        n += 1

    return None

if __name__ == "__main__":
    foo(0)

Here's what happens:

  1          0       RESUME                   0

  2          2       LOAD_FAST                0 (n)
             4       LOAD_CONST               1 (3)
             6       COMPARE_OP              18 (bool(<))
            10       POP_JUMP_IF_FALSE       26 (to L3)
            14       NOT_TAKEN

  3   L1:   16       LOAD_GLOBAL              1 (print + NULL)
            26       LOAD_FAST                0 (n)
            28       CALL                     1
            36       POP_TOP

  4         38       LOAD_FAST                0 (n)
            40       LOAD_CONST               2 (1)
            42       BINARY_OP               13 (+=)
            46       STORE_FAST               0 (n)

  2         48       LOAD_FAST                0 (n)
            50       LOAD_CONST               1 (3)
            52       COMPARE_OP              18 (bool(<))
            56       POP_JUMP_IF_FALSE        2 (to L2)
            60       JUMP_BACKWARD           24 (to L1)
      L2:   64       NOT_TAKEN

  6   L3:   66       RETURN_CONST             0 (None)

line ex3.py:1
line ex3.py:8
brch ex3.py 8:3-8:25 "__name__ == "__main__"" (@16) -> 9:4-9:7 "foo" (@22)
line ex3.py:9
line ex3.py:2
brch ex3.py 2:10-2:13 "n<3" (foo@10) -> 3:8-3:13 "print" (foo@16)
line ex3.py:3
0
line ex3.py:4
line ex3.py:2
line ex3.py:3
1
line ex3.py:4
line ex3.py:2
line ex3.py:3
2
line ex3.py:4
line ex3.py:2
brch ex3.py 2:10-2:13 "n<3" (foo@56) -> 2:10-2:13 "n<3" (foo@64)
brch ex3.py 2:10-2:13 "n<3" (foo@60) -> 6:11-6:15 "None" (foo@66)
line ex3.py:6

I was expecting branch events for iterations 1 and 2. This is without ever returning DISABLE (and with the same handler for both BRANCH_TAKEN and BRANCH_NOT_TAKEN).

@markshannon
Copy link
Member Author

markshannon commented Aug 15, 2024

Is co_branches() supposed to change contents as the function is executed?

No it should be immutable, like co_lines.

... this may be a bug.

It is a bug. I'll take a look.


Fixed now

@markshannon
Copy link
Member Author

I'm missing some branch events for this code:

def foo(n):
    while n<3:
    ...

This is mostly likely because I haven't merged in the fix in #122934 yet.

@markshannon
Copy link
Member Author

Should all be fixed now.

@nedbat
Copy link
Member

nedbat commented Aug 15, 2024

It is great to see progress on this, thanks for pushing it forward. I have some questions about the behavior of with statements.

This is my program for seeing the sys.monitoring events:

run_sysmon.py
# Licensed under the Apache License: http://www.apache.org/licenses/LICENSE-2.0
# For details: https://github.com/nedbat/coveragepy/blob/master/NOTICE.txt

"""Run sys.monitoring on a file of Python code."""

import functools
import sys

print(sys.version)
the_program = sys.argv[1]

code = compile(open(the_program).read(), filename=the_program, mode="exec")

my_id = sys.monitoring.COVERAGE_ID
sys.monitoring.use_tool_id(my_id, "run_sysmon.py")
register = functools.partial(sys.monitoring.register_callback, my_id)
events = sys.monitoring.events


def bytes_to_lines(code):
    """Make a dict mapping byte code offsets to line numbers."""
    b2l = {}
    cur_line = 0
    for bstart, bend, lineno in code.co_lines():
        for boffset in range(bstart, bend, 2):
            b2l[boffset] = lineno
    return b2l


def show_off(label, code, instruction_offset):
    if code.co_filename == the_program:
        b2l = bytes_to_lines(code)
        print(f"{label}: {code.co_filename}@{instruction_offset} #{b2l[instruction_offset]}")

def show_line(label, code, line_number):
    if code.co_filename == the_program:
        print(f"{label}: {code.co_filename} #{line_number}")

def show_off_off(label, code, instruction_offset, destination_offset):
    if code.co_filename == the_program:
        b2l = bytes_to_lines(code)
        print(
            f"{label}: {code.co_filename}@{instruction_offset}->{destination_offset} "
            + f"#{b2l[instruction_offset]}->{b2l[destination_offset]}"
        )

def sysmon_py_start(code, instruction_offset):
    show_off("PY_START", code, instruction_offset)
    sys.monitoring.set_local_events(
        my_id,
        code,
        events.PY_RETURN
        | events.PY_RESUME
        | events.LINE
        | events.BRANCH_TAKEN
        | events.BRANCH_NOT_TAKEN
        | events.JUMP,
    )


def sysmon_py_resume(code, instruction_offset):
    show_off("PY_RESUME", code, instruction_offset)
    return sys.monitoring.DISABLE


def sysmon_py_return(code, instruction_offset, retval):
    show_off("PY_RETURN", code, instruction_offset)
    return sys.monitoring.DISABLE


def sysmon_line(code, line_number):
    show_line("LINE", code, line_number)
    return sys.monitoring.DISABLE


def sysmon_branch(code, instruction_offset, destination_offset):
    show_off_off("BRANCH", code, instruction_offset, destination_offset)
    return sys.monitoring.DISABLE


def sysmon_branch_taken(code, instruction_offset, destination_offset):
    show_off_off("BRANCH_TAKEN", code, instruction_offset, destination_offset)
    return sys.monitoring.DISABLE


def sysmon_branch_not_taken(code, instruction_offset, destination_offset):
    show_off_off("BRANCH_NOT_TAKEN", code, instruction_offset, destination_offset)
    return sys.monitoring.DISABLE


def sysmon_jump(code, instruction_offset, destination_offset):
    show_off_off("JUMP", code, instruction_offset, destination_offset)
    return sys.monitoring.DISABLE


sys.monitoring.set_events(
    my_id,
    events.PY_START | events.PY_UNWIND,
)
register(events.PY_START, sysmon_py_start)
register(events.PY_RESUME, sysmon_py_resume)
register(events.PY_RETURN, sysmon_py_return)
# register(events.PY_UNWIND, sysmon_py_unwind_arcs)
register(events.LINE, sysmon_line)
#register(events.BRANCH, sysmon_branch)
register(events.BRANCH_TAKEN, sysmon_branch_taken)
register(events.BRANCH_NOT_TAKEN, sysmon_branch_not_taken)
register(events.JUMP, sysmon_jump)

exec(code)

My test program (withbranches.py):

from contextlib import nullcontext

def ok():
    with nullcontext():
        return "good"

def bad():
    with nullcontext():
        1/0

print(ok())

try:
    print(bad())
except:
    print("whoops")
print("done")

I get this output:

% python3.14 run_sysmon.py withbranches.py
3.14.0a0 (heads/pr/122564:538da381d47, Aug 15 2024, 08:09:47) [Clang 15.0.0 (clang-1500.3.9.4)]
PY_START: withbranches.py@0 #0
LINE: withbranches.py #1
LINE: withbranches.py #3
LINE: withbranches.py #7
LINE: withbranches.py #11
PY_START: withbranches.py@0 #3
LINE: withbranches.py #4
LINE: withbranches.py #5
LINE: withbranches.py #4
PY_RETURN: withbranches.py@58 #4
good
LINE: withbranches.py #13
LINE: withbranches.py #14
PY_START: withbranches.py@0 #7
LINE: withbranches.py #8
LINE: withbranches.py #9
LINE: withbranches.py #8
BRANCH_NOT_TAKEN: withbranches.py@80->86 #8->8
LINE: withbranches.py #15
LINE: withbranches.py #16
whoops
LINE: withbranches.py #17
done
PY_RETURN: withbranches.py@96 #17

I have two questions:

  1. When an exception happens in the with, there's a BRANCH_NOT_TAKEN event on the with line, but when there is no exception, there's no BRANCH_TAKEN event. A with statement isn't where I would expect a branch event at all. Is this correct?
  2. Although I return sys.monitoring.DISABLE for every event, I get two LINE events for the with lines (4 and 8).

@markshannon
Copy link
Member Author

markshannon commented Aug 15, 2024

With statements are complicated things https://peps.python.org/pep-0343/#specification-the-with-statement.

with mngr:  # line 1
    BODY    # line 2

is equivalent to:

__enter = mngr.__enter__,   # line 1
__exit = mngr.__exit__  # line 1
__enter()  # line 1
try:
    BODY  # line 2
except:
    __flag = __exit(*sys.exc_info())     # line 1
    if not __flag:   #  <--   this is the branch
        raise
else:
    __exit(None, None, None)     # line 1

It looks like we decided (well, I decided and no one disagreed 🙂) that the call to __exit__ when leaving the with statement should have the location of the with. #88099.

@nedbat
Copy link
Member

nedbat commented Aug 15, 2024

I see, thanks. I mistakenly thought the branch was "exception happened or it didn't," but it's "exception happened and was handled by the context manager or it wasn't." I don't think that's a branch that a coverage.py user would want exposed to them, but I could be swayed.

The re-visiting of the with line during exit I'm familiar with, was just surprised that a disabled line event was repeated. It's OK with me to leave it that way since it will only be two events for the line, no matter how many times the with is entered. We should update the sys.monitoring docs to indicate that it could happen.

@jaltmayerpizzorno
Copy link

#123044

@jaltmayerpizzorno
Copy link

#123048

@jaltmayerpizzorno
Copy link

#123050

@markshannon
Copy link
Member Author

@nedbat with statements aren't the only case where two line events will be generated for the same line:

def foo(a,b):
    return (a +
        b)

Execution order: a then b then + resulting in two line events for the line return (a +

DISABLE will disable each event separately, not all events for the same line.

@markshannon
Copy link
Member Author

markshannon commented Aug 16, 2024

I've updated this branch to use BRANCH_LEFT and BRANCH_RIGHT instead of BRANCH_NOT_TAKEN and BRANCH_TAKEN.

I've left the BRANCH_NOT_TAKEN and BRANCH_TAKEN names in sys.monitoring.events as pseudonyms, to not disrupt any experiments or tests.

@jaltmayerpizzorno
Copy link

#123076

@nedbat
Copy link
Member

nedbat commented Sep 3, 2024

I'm trying to use the new events to track what branches have happened. I'm having a problem because the events report
on bytecode offsets, but I need line numbers. Mapping from the offsets to the line numbers isn't giving me the data I need.

This is forloop.py, where I'm focused on the if statement on line 3:

def forloop(seq):
    for n in seq:
        if n > 5:     # line 3
            break
    print('Done')

forloop([1,9])

Disassembled it looks like this:

1:0-1:0             0       RESUME                   0

2:13-2:16           2       LOAD_FAST                0 (seq)
2:13-2:16           4       GET_ITER
2:13-2:16    L1:    6       FOR_ITER                14 (to L3)
2:13-2:16          10       NOT_TAKEN
2:8-2:9            12       STORE_FAST               1 (n)

3:11-3:12          14       LOAD_FAST                1 (n)
3:15-3:16          16       LOAD_CONST               1 (5)
3:11-3:16          18       COMPARE_OP             148 (bool(>))
3:11-3:16          22       POP_JUMP_IF_TRUE         3 (to L2)
3:11-3:16          26       NOT_TAKEN
3:11-3:16          28       JUMP_BACKWARD           13 (to L1)

  --         L2:   32       NOP

4:12-4:17          34       POP_TOP
4:12-4:17          36       JUMP_FORWARD             2 (to L4)

2:13-2:16    L3:   38       END_FOR
2:13-2:16          40       POP_TOP

5:4-5:9      L4:   42       LOAD_GLOBAL              1 (print + NULL)
5:10-5:16          52       LOAD_CONST               2 ('Done')
5:4-5:17           54       CALL                     1
5:4-5:17           62       POP_TOP
5:4-5:17           64       RETURN_CONST             0 (None)

When I run it with sys.monitoring, I get these events:

3.14.0a0 (heads/pr/122564:8fc38dd8506, Aug 25 2024, 07:11:06) [Clang 15.0.0 (clang-1500.3.9.4)]
PY_START: forloop.py@0 #0
LINE: forloop.py #1
LINE: forloop.py #7
PY_START: forloop.py@0 #1
LINE: forloop.py #2
BRANCH_NOT_TAKEN: forloop.py@6->12 #2->2
LINE: forloop.py #3
BRANCH_NOT_TAKEN: forloop.py@22->28 #3->3
JUMP: forloop.py@28->6 #3->2
LINE: forloop.py #2
BRANCH_TAKEN: forloop.py@22->32 #3->None
LINE: forloop.py #4
JUMP: forloop.py@36->42 #4->5
LINE: forloop.py #5
Done
PY_RETURN: forloop.py@64 #5
PY_RETURN: forloop.py@28 #7

The @XX notations are bytecode offsets, the #xx notations are line numbers obtained by mapping the offsets through co_lines().

There are two branch events from line 3. The first is BRANCH_NOT_TAKEN 22->28, but that maps to line 3->3. Bytecode 28 is a jump backward, but how can I properly interpret this as a branch not taken from line 3 to line 2?

The second event is BRANCH_TAKEN 22->32, which maps to line 3->None. What should I do with a NOP with no line number?

@nedbat
Copy link
Member

nedbat commented Sep 4, 2024

Sorry, I guess my example is duplicating #123050?

@nedbat
Copy link
Member

nedbat commented Sep 16, 2024

@markshannon @iritkatriel Is there some way we can work together to push this forward?

@jaltmayerpizzorno
Copy link

jaltmayerpizzorno commented Sep 16, 2024

Sorry, I guess my example is duplicating #123050?

Yes, I think so; with the current code in this pull, the example from #123050 also yields a NOP without a line number.

@markshannon markshannon force-pushed the monitoring-branch-taken branch from 75bed9c to c5a4b17 Compare December 11, 2024 17:40
@nedbat
Copy link
Member

nedbat commented Dec 15, 2024

@markshannon I'm pleased to see more activity here. What's the best way to talk about how it's behaving? I tried a simple program:

for i in range(10):
    a = i
assert a == 9

Using a build of this PR, I disassembled it as:

  0          0       RESUME                   0

  1          2       LOAD_NAME                0 (range)
             4       PUSH_NULL
             6       LOAD_SMALL_INT          10
             8       CALL                     1
            16       GET_ITER
      L1:   18       FOR_ITER                 6 (to L2)
            22       NOT_TAKEN
            24       STORE_NAME               1 (i)

  2         26       LOAD_NAME                1 (i)
            28       STORE_NAME               2 (a)
            30       JUMP_BACKWARD            8 (to L1)

  1   L2:   34       END_FOR
            36       POP_TOP

  3         38       LOAD_NAME                2 (a)
            40       LOAD_SMALL_INT           9
            42       COMPARE_OP              88 (bool(==))
            46       POP_JUMP_IF_TRUE         3 (to L3)
            50       NOT_TAKEN
            52       LOAD_COMMON_CONSTANT     0 (AssertionError)
            54       RAISE_VARARGS            1
      L3:   56       LOAD_CONST               0 (None)
            58       RETURN_VALUE

Looking at the sys.monitoring events, I see this:

3.14.0a2+ (heads/pr/122564:a604a085fb3, Dec 14 2024, 11:47:37) [Clang 16.0.0 (clang-1600.0.26.6)]
PY_START: foo.py@0 #0
LINE: foo.py #1
BRANCH_LEFT: foo.py@18->24 #1->1
LINE: foo.py #2
JUMP: foo.py@30->18 #2->1
LINE: foo.py #1
BRANCH_RIGHT: foo.py@18->38 #1->3
LINE: foo.py #3
BRANCH_RIGHT: foo.py@46->56 #3->3
PY_RETURN: foo.py@58 #3

(@ are byte offsets, # are line numbers)

The BRANCH_LEFT is indicating a branch from line 1 to line 1, so we never "see" the branch from 1 to 2. Is that something you can fix, or is there some way I should be adjusting for that?

@markshannon
Copy link
Member Author

There is no branch from line 1 to 2, the "left" branch is from line 1 to line 1.

This code:

1.    for i in s:
2.        A
3.    B

this is roughly equivalent to:

1.    tmp1 = iter(i); tmp2 = next(tmp1); if tmp2 is NULL goto end; s = tmp2
2.        A
   end:
3.    B

The "left" branch is from if tmp2 is NULL goto end to s = tmp2 which are both on line 1.

@markshannon markshannon marked this pull request as ready for review December 18, 2024 09:45
Doc/library/sys.monitoring.rst Show resolved Hide resolved
Python/codegen.c Outdated Show resolved Hide resolved
Python/codegen.c Outdated
}
VISIT(c, expr, (expr_ty)asdl_seq_GET(e->v.Compare.comparators, n));
ADDOP_COMPARE(c, LOC(e), asdl_seq_GET(e->v.Compare.ops, n));
ADDOP(c, LOC(e), TO_BOOL);
ADDOP_JUMP(c, LOC(e), cond ? POP_JUMP_IF_TRUE : POP_JUMP_IF_FALSE, next);
ADDOP(c, NO_LOCATION, NOT_TAKEN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peephole optimiser replaces a conditional jump with const condition by an unconditional jump and a NOP. We probably need another peepholer rule to remove any NOT_TAKEN this leaves behind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could we not add NOT_TAKEN in codegen, and have the assembler inject it where it's needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry this might be not work if the optimizer merges are rearranges branches.
It is possible that the correspondence to the original source could be lost.

@jaltmayerpizzorno
Copy link

Sorry for the delay in responding: I am about to leave on a holiday trip and things tend to get hectic then.

I'd love to have this to be merged in! Being able to disable just each direction of a branch event at a time and getting a list of the possible branches without having to go through the bytecode are great additions and make these events much easier to use.

Where are we with #123050 and other related issues? Not that I think resolving them should block what's been done so far.

@markshannon
Copy link
Member Author

Where are we with #123050 and other related issues?

Let's discuss that on those issues. Being able to disable the branch events is orthogonal to mapping branch locations back to source.

Python/codegen.c Outdated
case POP_JUMP_IF_TRUE:
case POP_JUMP_IF_NONE:
case POP_JUMP_IF_NOT_NONE:
case FOR_ITER:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a macro for conditional branches next to this one

#define IS_UNCONDITIONAL_JUMP_OPCODE(opcode) \

@nedbat
Copy link
Member

nedbat commented Dec 19, 2024

There is no branch from line 1 to 2, the "left" branch is from line 1 to line 1.

Maybe first we should decide: from the user's point of view, does line 1 have two possible destinations (line 2 and line 3) or does it not? Should line 1 be considered a branch for the purposes of branch coverage?

If we can't map the collected data back to a user-understandable interpretation, then we haven't finished the feature. Can you help me puzzle through this? I'd like to get efficient branch coverage measurement. So far, we seem to be talking about different things (byte codes vs lines) and we aren't meeting in the middle.

@markshannon
Copy link
Member Author

Maybe first we should decide: from the user's point of view, does line 1 have two possible destinations (line 2 and line 3) or does it not? Should line 1 be considered a branch for the purposes of branch coverage?

Can we discuss this on the issue? This PR is about adding the ability to disable the two branches independently.

So far, we seem to be talking about different things (byte codes vs lines) and we aren't meeting in the middle.

Indeed. This PR is about disabling branches and offsets.
The other discussion, about mapping offsets to lines, is just as important. But this PR isn't the place for that discussion.

Issue #122548 or any of the related issues that @jaltmayerpizzorno opened would the place to discuss that.

@markshannon markshannon merged commit d2f1d91 into python:main Dec 19, 2024
57 checks passed
@markshannon markshannon deleted the monitoring-branch-taken branch December 19, 2024 16:59
@@ -262,6 +262,7 @@ Known values:
Python 3.14a1 3607 (Add pseudo instructions JUMP_IF_TRUE/FALSE)
Python 3.14a1 3608 (Add support for slices)
Python 3.14a2 3609 (Add LOAD_SMALL_INT and LOAD_CONST_IMMORTAL instructions, remove RETURN_CONST)
Python 3.14a3 3610 (Add NOT_TAKEN instruction)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says 3610, but the implementation says 3611.

This says a3, but a3 is already released.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting that.

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Dec 23, 2024
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants