Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new cache management instruction support to drmemtrace #7111

Open
derekbruening opened this issue Dec 9, 2024 · 6 comments
Open

Add new cache management instruction support to drmemtrace #7111

derekbruening opened this issue Dec 9, 2024 · 6 comments
Assignees

Comments

@derekbruening
Copy link
Contributor

My pending comment in #7109 which adds CLFLUSHOPT opcode decoding support:

drmemtrace needs to be udpated as well: it looks like CLFLUSHOPT also invalidates all cache levels and so should be treated like CLFLUSH in memtraces (or would a simulator want to know about the subtle differences?). So just adding to 2 spots in instru.cpp https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instru.cpp#L227 and https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instru.cpp#L244 should be enough (optionally extend invariant_checker_test). Hmm. Maybe enough questions this should be separated out.

@derekbruening
Copy link
Contributor Author

We may also want to mark other cache ops such as CLWB and CLDEMOTE.

derekbruening added a commit that referenced this issue Dec 10, 2024
Adds handling to drmemtrace for the x86 CLFLUSHOPT opcode to treat it
just like CLFLUSH; the two are similar enough for most uses.

Augments the existing CLFLUSH invariant_checker test as a sanity test
of the new opcode handling.

Issue: #7111
derekbruening added a commit that referenced this issue Dec 10, 2024
Adds handling to drmemtrace for the x86 CLFLUSHOPT opcode to treat it
just like CLFLUSH; the two are similar enough for most uses.

Augments the existing CLFLUSH invariant_checker test as a sanity test of
the new opcode handling.

Issue: #7111
@derekbruening derekbruening changed the title Add CLFLUSHOPT support to drmemtrace Add new cache management instruction support to drmemtrace Dec 11, 2024
@derekbruening
Copy link
Contributor Author

In fact, e24dbb0 has CLWB marked as a load: which it is not. If we do nothing, the trace will have it labeled as a read==load record. That highlights that this affects all analyzers and not just those that want to model cache behavior: those looking at loads will do the wrong thing. So this is not just about augmenting the trace but correcting the trace.

Today a functional cache simulator like dr$sim doesn't need any architecture-specific code: everything that affects the caches is a first-class record type. If we don't do the same with these operations, we're forcing dr$sim and others to add decoding and architecture-specific handling of the various cache operations. The complexity of figuring out what these operations do has to live somewhere: the question is, does it live in the trace itself and the consumers like dr$sim remain arch-agnostic and simple, or does it live in the consumers and they become complicated? It could live in a library to share among analyzers.

We added the whole series of TRACE_TYPE_PREFETCH_INSTR_L1, TRACE_TYPE_PREFETCH_INSTR_L1_NT, etc. record types, along with cache flush types, back when we didn't have embedded encodings: so there was no simple way to get that info (though you could map the binaries and decode it was rarely done). So is there less of an argument to continue in the pre-existing approach of having all cache-related operations as first-class record types because it is now a little easier to decode?

Since we have to correct the marking of this and related operations as loads, we need some kind of new record type. Trying to boil this down to these choices:

  • A. Add a catch-all "cache operation" first-class record type and:
    1. Add trace markers with the particular cache operations. This tries to avoid breaking analyzers/simulators by having to deal with new top-level record types: new markers are more ignorable as they are just hints.
    2. Add a library that provides an enum of cache operations produced from decoding: basically putting the subtypes into a library's enum instead of trace record types.
    3. Leave it up to each analyzer to decode with ISA-specific dispatch to figure out the exact semantics.
  • B. Add first-class record types for each type of operation: this aligns with what is already there for prefetches and cache flushes.

@khuey
Copy link
Contributor

khuey commented Dec 11, 2024

In fact, e24dbb0 has CLWB marked as a load: which it is not. If we do nothing, the trace will have it labeled as a read==load record. That highlights that this affects all analyzers and not just those that want to model cache behavior: those looking at loads will do the wrong thing. So this is not just about augmenting the trace but correcting the trace.

fwiw I have a similar issue. Our use of DR is confined to the decoding/encoding components. We examine blocks of instructions and capture their memory inputs so that the block can later be re-executed (with appropriate rewriting) outside of the original program by providing the initial register states and said memory inputs. We end up special casing the prefetch/cache flush instructions to avoid unnecessarily capturing their input memory operands precisely because they are not architectural loads.

@derekbruening
Copy link
Contributor Author

Should there be an IR API routine to query whether a load actually issues a regular load? A different version of opc_is_not_a_real_memory_load(); and maybe export that one too.

@khuey
Copy link
Contributor

khuey commented Dec 12, 2024

Yeah that might be nice. I looked at adding an equivalent of instr_is_prefetch() for these cache flushing instructions but it looked non-trivial for ARM because there's not a separate opcode there.

@derekbruening
Copy link
Contributor Author

Yeah that might be nice. I looked at adding an equivalent of instr_is_prefetch() for these cache flushing instructions but it looked non-trivial for ARM because there's not a separate opcode there.

Some of us have lobbied to separate the DR versions of some Arm opcodes to make these things easier. See the discussion at
#4386 (comment) on
opcode philosophy of separate opcodes for separate semantics; also xref #4388 on making drmemtrace's Arm prefetch operation dispatch cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants