Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#6417 x86-32 client.flush: Fix tag for client events #6480

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

abhinav92003
Copy link
Contributor

@abhinav92003 abhinav92003 commented Nov 29, 2023

Fixes the client.flush test on x86-32 by passing the correct tag to the
bb instrumentation event.

The failure is due to a mismatch between the tag provided to the
instrumentation callback and that provided to the fragment delete
callback. For the vsyscall page fragment on x86-32, the instrumentation
event gets as the tag the pc of the first non-deleted instruction in
vsyscall but the fragment delete event gets the address of the first
instruction.

Issue: #6417

@abhinav92003
Copy link
Contributor Author

The client.flush failure seems related to hook_vsyscall. Doesn't happen when -no_hook_vsyscall. Here are some logs: https://gist.github.com/abhinav92003/38d38c7ad6e1d9921ae7a59bd6b389b2. tmate doesn't allow too long history in the terminal, so had to limit the output using grep.

@abhinav92003 abhinav92003 changed the title i#6417 x86-32 flakiness: debug client.flush i#6417 x86-32 flakiness: Use dr_fragment_app_pc in client.flush test Dec 13, 2023
@abhinav92003 abhinav92003 changed the title i#6417 x86-32 flakiness: Use dr_fragment_app_pc in client.flush test i#6417 flaky x86-32: Fix client.flush test failures Dec 13, 2023
@abhinav92003 abhinav92003 marked this pull request as ready for review December 13, 2023 18:34
@abhinav92003
Copy link
Contributor Author

abhinav92003 commented Dec 13, 2023

@derekbruening The CI shows green for 7c449a2, which runs only client.flush on x86-32.

@abhinav92003
Copy link
Contributor Author

Verified that client.flush didn't fail in the latest x86-32 CI run too.

@@ -79,7 +79,7 @@ find(void *tag)
}

static void
increment(void *tag)
increment(app_pc tag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a deeper issue with the DR interface. The tag is supposed to be the canonical identifier and should track the fragment from creation to deletion. I thought the tag was already tweaked for hooks and I was disturbed when you described it varying: that is not supposed to happen. I think we have to fix that underlying issue. The alternative of changing the docs to say "ignore the tag" seems too disruptive and I suspect quite a few tools, both our own code and 3rd-party, would have to be rewritten. Some very large and complex tools like Dr. Memory use tags for this type of tracking -- and they have been run extensively on Windows which has many more hooks than just this 32-bit-only vsyscall on Linux. That's what's so confusing: if the tag is really wrong at creation time, why didn't all these other tools break?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you mentioned that clients should be using dr_fragment_app_pc and not just using the tag directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If clients want to get a real PC (e.g., to attribute to a library/executable) they should use dr_fragment_app_pc: they should not assume a tag is anything but a random number.

There are numerous other interfaces which take tags: dr_replace_fragment, dr_fragment_exists_at, dr_mark_trace_head, etc. If tags really aren't reliable how did all of those work all this time? If we say no one should use tags, we have to change all those interfaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we document anywhere that the fragment tag can be used directly? On Windows, dr_fragment_app_pc seems to be doing more work and perhaps is more important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. you mean clients should still be able to uniquely identify fragments using tags. They shouldn't need to go through dr_fragment_app_pc for just identifying fragments. That I agree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two directions of conversion, tag to pc and pc to tag. The underlying code is get_app_pc_from_intercept_pc and get_intercept_pc_from_app_pc. I think we need to add the this Linux hok to the latter to solve this and get the right tag. See the call to that function -- I guess guarded by could_be_hook_occluded_pc() so that needs a look too. That's where bb->start_pc is updated. bb->start_pc becomes the tag. Err...later there is another game played with selfmod....I guess ignore that for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we document anywhere that the fragment tag can be used directly?

bb event says " * - \p tag is a unique identifier for the basic block fragment.

  • Use dr_fragment_app_pc() to translate it to an application address."

@abhinav92003 abhinav92003 changed the title i#6417 flaky x86-32: Fix client.flush test failures i#6417 x86-32 client.flush: Fix tag for client events Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants