Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drmemory 2.6.19984/cronbuild on Win10/Workstation on IBM 7945/XEON crashes with ASSERT FAILURE #2514

Open
gisburn opened this issue Sep 21, 2024 · 6 comments

Comments

@gisburn
Copy link

gisburn commented Sep 21, 2024

Describe the bug
drmemory crashes with the following assertion:
---- snip ----

$ drmemory -debug -- myhorribleapp.exe
~~Dr.M~~ Dr. Memory version 2.6.19984
~~Dr.M~~ Running "C:\myhorribleapp.exe"
~~Dr.M~~ Using system call file C:\Users\roland_mainz\AppData\Roaming\Dr. Memory\symcache\syscalls_x64.txt
~~Dr.M~~ ERROR: Failed to find "main" for limiting memory dump
ASSERT FAILURE
D:\a\drmemory\drmemory\ext\drx\scatter_gather_x86.c:122: proc_avx_enabled() (Scatter/gather instrs not available)

---- snip ----

To Reproduce
Steps to reproduce the behavior:

  1. Get a IBM 7945/XEON machine
  2. Install Win10/Workstation on it
  3. $ bcdedit.exe /set nx AlwaysOff ## (just to be sure)
  4. Run any app with $ drmemory -debug -- myhorribleapp.exe

Expected behaviour
drmemory works

** Actual behaviour:
Crash

Versions

  • Windows version:
    Microsoft Windows [Version 10.0.19045.4894]
  • What version of Dr. Memory are you using?
    $ drmemory -version
    Dr. Memory version 2.6.19984 -- build 0
  • Does the latest build from
    https://drmemory.org/page_download.html#sec_latest_build solve the problem?
    No
  • Is your application 32-bit or 64-bit?
    64bit
@gisburn
Copy link
Author

gisburn commented Sep 21, 2024

The ASSERT which gets triggered is https://github.com/DynamoRIO/dynamorio/blob/7f7544b6f593dcdbb2213fca3922055092b35af1/ext/drx/scatter_gather_x86.c#L122 - it seems it wants to use AVX on a machine which has no AVX.

@derekbruening Is AVX now mandatory for DrMemory on Windows 10/64bit AMD64 ?

@derekbruening
Copy link
Contributor

The default builds are whatever the compiler defaults are on the Github Actions VMs: which are generally as old as Github supports (VS2019, Ubuntu20); I don't think it's reasonable to ask the small set of existing maintainers to support more than that as maintaining the existing multiple platforms is challenging enough. This is an open-source project so you could certainly do a custom build with flags telling the compiler not to use AVX. You could submit a pull request adding CMake options for that and a regression test.

@cedricblancher
Copy link

cedricblancher commented Sep 24, 2024

The default builds are whatever the compiler defaults are on the Github Actions VMs: which are generally as old as Github supports (VS2019, Ubuntu20);

Just my 0,03 francs:

  1. Most universities and scientific institutes get their servers replaced every 8-10 years. So far ALL of our XEON machines do NOT support the avx extension, and still have a lifetime of 6 years left.
    It is a instruction set extension after all!!

  2. Most industry computers, e.g. SIMATIC PCs, use CPUs which have long-term availability (>20 years) by the vendor (Intel). As far I have tested this morning none of the current and up to date SIMATIC and Rockwell PC lineup supports the avx extension. Not being able to run drmemory on such systems is a BIG DEAL, as this is industry which actually throws money at stuff. But maybe not in this case, as valgrind is running perfectly fine on such machines.

  3. Misconception: @gisburn is wrong, you don't have to compile drmemory without avx support.
    Just drmemory should not make avx support fatally mandatory if no avx extension support is available in the cpu. No avx support in the cpu means it is not needed to expect avx instructions and bail out with an error for that. if(has_avx_support()) {...} related code, and the bug is gone

@derekbruening
Copy link
Contributor

Misconception: @gisburn is wrong, you don't have to compile drmemory without avx support.
Just drmemory should not make avx support fatally mandatory if no avx extension support is available in the cpu. No avx support in the cpu means it is not needed to expect avx instructions and bail out with an error for that. if(has_avx_support()) {...} related code, and the bug is gone

It sounds like you know how to fix it and could send a pull request?

If the compilers are not emitting AVX then it should indeed work. Looking at that drx code: looks like an oversight with no deliberate intention; looks like it has just #ifdef PLATFORM_SUPPORTS_SCATTER_GATHER guarding the scatter-gather setup when it should also have the proc_avx_enabled() for x86 (and SVE check for AArch64). This code is from December 2021 DynamoRIO/dynamorio#5252 so it sounds like non-AVX usage is actually pretty rare for any DynamoRIO-based tool as no one had reported any issue in the last 3 years. Pinging the author to raise awareness: @abhinav92003

This would affect any DynamoRIO tool using the drx library, so I would suggest filing a companion bug in the DynamoRIO tracker. It is best for you to file it to A) show this is a real issue hit in the real world; B) make it easier for you to confirm it solves issues on your machines; C) you will get updates to that issue.

The most important and maybe most difficult part is: how to add a regression test to prevent accidental breakage of non-AVX in the future? Ideally it would be run on a machine that actually has no AVX: a donated self-hosted Github Actions runner from someone with one of these machines? Just a build and running a couple of tests would be enough. Halfway solutions would be flags to pretend cpuid returns no AVX (only tests certain paths); run DR under itself to look for AVX opcodes (a la drcpusim; but again only tests certain aspects such as compiler emitting AVX, though it can also fake cpuid).

@derekbruening
Copy link
Contributor

derekbruening commented Sep 24, 2024

I have not tested it on a non-AVX machine, but that code in question looks like it will run just fine in release build: so this is a DR-debug-build-only issue, is that right? That may explain why no one else has hit it. That also means there is a workaround when using DR debug build: -ignore_assert_list "*", or -dr_ops "-ignore_assert_list *" from drmemory.

@abhinav92003
Copy link

The use of PLATFORM_SUPPORTS_SCATTER_GATHER without proc_avx_enabled to guard the scatter-gather setup existed even before DynamoRIO/dynamorio#5252 (see DynamoRIO/dynamorio#3955 where PLATFORM_SUPPORTS_SCATTER_GATHER was introduced). But looks like DynamoRIO/dynamorio#5252 added init- and exit-time logic (the get_mov_scratch_mm_opcode_and_size calls in drx_scatter_gather_thread_init and drx_scatter_gather_thread_exit) that would be invoked without seeing an actual scatter-gather instruction and would fail if !proc_avx_enabled.

but that code in question looks like it will run just fine in release build

Yes it does look like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants