Add valgrind constant-time test to make check
#723
Conversation
Ping @theuni: any ideas about libtool?
(force-pushed from e4af0f4 to 540dbaf)
@real-or-random libtool --mode=execute is needed because when libtool is used to build a shared library, the "binary" you run (e.g. the benches, or the valgrind executable) isn't a binary -- look at it, it's a shell script, and the real binary is hidden in .libs. The reason for this is so that it can run the real binary linked against your non-installed copy of the library rather than getting linked to the system library. If you valgrind the wrapper script, that won't be terribly informative! :) There is probably some official way to use libtool --mode=execute in a make check. I hadn't even noticed that there was a local copy, that's probably part of it.
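To make the wrapper-script behaviour concrete, here is a minimal illustration; the bench_verify name and the exact layout of .libs are just examples of what a libtool-built program typically looks like, not commands taken from this PR:

```sh
# In a libtool shared-library build, the top-level "binary" is a wrapper script;
# the real executable is hidden in .libs/ (sometimes named lt-<name>) and wired
# up to use the freshly built, not-yet-installed library:
file ./bench_verify          # typically reports: POSIX shell script
file ./.libs/bench_verify    # the actual executable

# Valgrinding the wrapper script itself is uninformative, so libtool is asked
# to run the real binary under valgrind instead:
./libtool --mode=execute valgrind ./bench_verify
```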
Ah thanks! We had just figured that out in IRC at the same time you replied. But we were wondering if there is a reason why we don't simply pass -static.
Yup, passing -static makes sense. I did the same thing just the other day for benches when it was interfering with my test wrapping: theuni@a9577fd
Then you're no longer testing the production libraries, and small compiler differences change the constant-time behaviour. The tests are compiled with special instrumentation and don't use the libraries in any case (and completely static builds are not supported by some OSes today, e.g. Fedora doesn't ship static versions of most libraries anymore).
ACK
(force-pushed from bf9c81e to 110bc4f)
I couldn't find anything but the current way seems to work, so I think we should use it.
Okay, yes, I suspected that this is the reason. Makes sense.
That's fine if it matters for instrumentation. But be aware (because it's really non-obvious) that in this case "-static" is a libtool flag, not a linker one. Libtool builds two versions of the lib regardless; -static just tells the particular binaries which to link against. It doesn't actually attempt to create a static binary.
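As a sketch of that distinction (file names are hypothetical and this is libtool's generic link mode, not a command from this PR):

```sh
# -static here is interpreted by libtool, not by the linker. libtool still builds
# both the shared and the static library; -static only makes this one program link
# against the static one, so the output is a real executable rather than a wrapper
# script. It does NOT produce a fully static binary (libc etc. stay dynamic).
./libtool --mode=link gcc -static -o example_test example_test.o libsecp256k1.la
file example_test            # an ordinary dynamically linked executable, no .libs wrapper
```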
(force-pushed from 110bc4f to c81ae4c)
Can you try using
Moreover, this changes the way Travis runs the constant-time tests by adding a `--error-exitcode=1` parameter to valgrind. Without that parameter, valgrind simply passes the exit code through, even in the case of valgrind errors. That is, Travis would succeed even if valgrind found problems.
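For example (the valgrind_ctime_test name matches the constant-time test binary discussed in this PR; the exit codes in the comments describe the expected behaviour):

```sh
# Without --error-exitcode, valgrind passes through the program's own exit status,
# so CI sees success even when valgrind reported errors:
valgrind ./valgrind_ctime_test
echo $?                      # 0 if the program exited cleanly, errors or not

# With --error-exitcode, valgrind overrides the exit status whenever it reports
# errors, so make check / Travis actually fails:
valgrind --error-exitcode=1 ./valgrind_ctime_test
echo $?                      # 1 if valgrind found problems
```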
(force-pushed from c81ae4c to 35a648f)
This has not been caught by the new constant-time tests because valgrind currently gives us a zero exit code even if it finds errors, see bitcoin-core#723 (comment). This commit also simplifies the arithmetic in memczero. Note that the timing leak here was the bit indicating whether a secret key was out of range. This leak is harmless and not exploitable; it is just our overcautious practice to prefer constant-time code even here.
FWIW, according to my tests in adding macOS to Travis, the local libtool works on macOS but the system's libtool doesn't. Apparently Apple has its own tool of the same name (see https://formulae.brew.sh/formula/libtool).
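For reference, a quick way to see the name clash on macOS (assuming Homebrew is installed; paths are illustrative):

```sh
# Apple ships its own, unrelated libtool (a static-library archiver), which is why
# Homebrew installs GNU libtool under the name glibtool:
command -v libtool           # /usr/bin/libtool -> Apple's tool, not GNU libtool
command -v glibtool          # Homebrew's GNU libtool, if installed
# The ./libtool script generated in the build tree is always the GNU one configured
# for this build, which is why it works where the system tool doesn't.
```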
Needs a rebase.
Summary:
* Suppress a harmless variable-time optimization by clang in memczero. This has not been caught by the new constant-time tests because valgrind currently gives us a zero exit code even if it finds errors, see bitcoin-core/secp256k1#723 (comment). This commit also simplifies the arithmetic in memczero. Note that the timing leak here was the bit indicating whether a secret key was out of range. This leak is harmless and not exploitable; it is just our overcautious practice to prefer constant-time code even here.
* Add test for memczero()
This is a backport of libsecp256k1 [[bitcoin-core/secp256k1#728 | PR728]]
Test Plan: ninja all check check-secp256k1
Reviewers: #bitcoin_abc, Fabien
Reviewed By: #bitcoin_abc, Fabien
Differential Revision: https://reviews.bitcoinabc.org/D6363
@real-or-random What's the status here?
I'm a little dubious about this running in make check in Bitcoin Core. You're going to get reports of errors which will be hard to address, e.g. there are ones with current GCC on ppc64. On one hand it's good to learn about them; OTOH there are already known ones which AFAIK no one is working on resolving. It's really bad to give users errors that you won't respond to, because it will habituate them to not reporting ones that you will respond to (including by them googling the error and then finding messages saying it's known/ignore it).
Won't documenting those properly in the README (or an issue) solve the problem? (And maybe someone will actually fix those.)
Unless the failure is extremely rare and specialized, documenting a failure thrown out by make check will just perpetually stop users from reporting other vaguely similar-looking issues, because they will find it and assume the authors already know or don't care. There is also plenty of collateral damage possible. A test failure here will cause some users to feel this library is unsafe and go pick some other code that not only doesn't have constant-time testing but doesn't make any effort to be constant time and isn't. In the past we had issues where Debian started refusing to upgrade their Bitcoin package after we added tests that failed on big-endian (at the time BE was totally unsupported by the software): Debian had been shipping a completely non-functional BE package; once make check failed, the packaging process failed, and they refused to upgrade to a newer version that failed, even though all the failure did was document the existing broken behaviour. As far as users submitting fixes go: if the world's foremost experts in the software won't fix an issue, what chance does anyone else have? These sorts of issues are also at high risk of being no one's bug: the language makes no promises about constant-time behaviour, so a compiler is not faulty just for making obviously constant-time code variable time. Simultaneously, if we can't get constant-time behaviour from the compiler, there is sometimes little we can do (other than write more of the impacted routines in asm). The test serves a purpose because it helps find the introduction of behaviours that are grossly non-constant or subtle but easily avoided -- as a tool for reviewers. If extensive effort had gone into making sure the tests passed on a huge variety of targets and all issues were fixed, such that the discovery was limiting additional improvements, then there would be a case for using users to expand the search. Without first having our own comprehensive-ish coverage we couldn't really tell that any proposed fix wasn't actually making the matter worse someplace else. But that isn't where this test is right now. I'd much rather see this PR make it a separate check-like target -- but while there are known issues I think it would be counterproductive to have uninformed end users running it. Besides, valgrind isn't normally a dependency: so if the test optionally runs when valgrind is present, you'll have make check fail or pass randomly based on whether valgrind is installed.
Hm, I'd really like to see this test enabled by default because then there's a chance that it will be included in the packaging process of distros. If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I'd really like to know about this. At the moment we're just blind. The fact that C does not guarantee constant-time compilation is exactly the reason why the user should run these tests.
Yes, given unlimited resources, we should do this extensive effort, and then I see merit in your argument. But I don't believe that's realistic. Unless someone funds another three cryptographers working full time on this library, this won't happen. With that said, I prefer to learn about issues now rather than never. As for users who switch to other libraries: okay, yes, there's a risk. But if we want to see adoption, there are other low-hanging fruits and we just need to get our hands dirty. There are so many open issues here that make people use other code: no example code, experimental ECDH, no public hash module (okay, debatable), no docs for manual building, and most importantly no release. And let's be honest, who's really working on this? A handful of people who have hundreds of other things that are more interesting than maintaining a library. We had better phases this year and last year, but at the moment we don't even have the resources to review incoming PRs.
edit: If we want, we could still disable the test in Core, e.g., by setting
ps: Urghs, I accidentally edited your comment instead of mine. Sorry, reverted...
As it stands now the test fails on platforms these distributions target! So the result would be what I pointed out above with Debian: suddenly the distribution refuses to upgrade to new versions because the tests fail. Having Bitcoin Core not run libsecp256k1 tests at all because a constant-time test fails is not a good tradeoff: to avoid a test that we know fails but only indicates behaviour that is usually inconsequential in most places Bitcoin Core is deployed, we'd disable detection of miscompilation that can cause a total consensus fault, direct leak of secret keys, etc. -- reports of issues that won't get ignored. Again: we already know the test fails on some systems. You say "If the Debian/Fedora/Arch Linux secp256k1 packaging process uses a compiler that produces variable-time code, then I'd really like to know about this" -- but this isn't true: we already know the Fedora compiler on PPC64LE fails this test, and no one is interested in doing anything about it. So what happens then when users show up reporting the same issue because make check fails? We already know it will fail in at least Fedora's build process (and likely Debian's, since it already supports PPC64LE). Even if it weren't the case that it already failed: if no one maintaining this library can afford the time to spend a couple of afternoons first getting this likely-to-fail test run one time on those distro targets (e.g. by finding people with relevant systems and begging them to run the tests, or asking a distro packager to do an experimental package build that runs the test), how can you expect that anyone is going to find the time to fix any additional reported issues? At best all this does is teach users not to run make check because it reports issues that the developers don't care about, and send off a signal that this library is unmaintained. Or it pins distributors to old, worse versions that don't fail, or worst of all it drives users to software which isn't even remotely constant time or doesn't even have tests (which would be pretty much every alternative to this library).
Could it be enabled by default just on platforms where we expect it to work?
When it fails, will someone fix the failures? If not, there are only downsides from running it.
I think this should be fixed; I'm very interested in doing something about it! But I didn't know about it. It's not documented anywhere (except that you hinted at it above). We need to write this down at least.
Why is it set in stone that it's a couple of afternoons? The last one took me a few minutes (#728). Sure, this was on a system I could test on, but in the end I looked more at compiler output than at running tests. But if we don't get these reports, we don't even know about the issues.
That sounds reasonable as a first thing to do.
I believe when it fails on a platform where we expect it to work, that's a bug, and sure, we'll fix bugs.
I reported it; see http://gnusha.org/secp256k1/2020-01-11.log at 11:29 and 11:37 (and probably several other places if I were to go search). Working on this and repeatedly getting zero response from anyone is extremely demoralizing. I expect that it also fails elsewhere, but why bother testing more exotic platforms when there is a known failure in a less exotic one that no one seems interested in? I stopped testing more systems once I saw no interest in this known failure. Similarly, why waste my time arguing with me in a loop over this without pointing out that you didn't know that it already failed? Your response is much less surprising to me now that I realize you didn't know that it was already known to fail. If I thought that the test passed everywhere I would have been somewhat more supportive of running it by default.
It usually doesn't take much effort to fix a constant-time issue ... except for complicated ones like "oh gee, we can't use < in constant-time code" -- for that I think we're facing a big complicated rewrite, potential performance loss, and having to retest all over to make sure the fix doesn't move the non-constant-time bug to another platform. But whatever effort it takes to fix an issue, it takes much less effort to get a single passing test result, which was really my point. If the project can't find the effort to get some tests run, then it's not going to be able to muster the effort to fix issues that crop up. Without the ability to retest fixes broadly, you can't be confident that a "fix" isn't making the problem worse except on platforms you can test. Having a year-long iteration cycle where you try a fix for a user's platform and then six months later learn you broke a different user's platform (potentially an established one with many users) isn't a convergent process. The best you can really do is have a set of platforms you're able to test (either directly or via community help) and be able to make some promises that the tests pass on those platforms. Anything you can't test a proposed fix on is just a crapshoot, because any change to the software, particularly any constant-time fix for another uncommon platform, might break them.
Needs a rebase?
Well, yeah, if we want this at all. Enabling it by default just on platforms where we expect it to work still sounds reasonable to me.
I think we should do this. If not, we should at least enable it in dev-mode, so that we contributors run tests on a regular basis. |
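A minimal sketch of what such gating could look like, assuming an automake-driven make check; the wrapper name and platform list are hypothetical and not part of this PR, while exit code 77 is automake's convention for a skipped test:

```sh
#!/bin/sh
# run_ctime_test.sh (hypothetical): only run the valgrind constant-time test
# where we expect it to pass, and report SKIP everywhere else.

command -v valgrind >/dev/null 2>&1 || exit 77   # valgrind not installed: skip

case "$(uname -m)" in
  x86_64|i?86) ;;                                # platforms assumed to be supported
  *) exit 77 ;;                                  # everything else: skip for now
esac

exec ./libtool --mode=execute valgrind --error-exitcode=1 ./valgrind_ctime_test
```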
Widely available versions of GCC and Clang beat our field asm on -O2. In particular, GCC 10.0.5, which is Bitcoin Core's current compiler for official x86_64 builds, produces code that is > 20% faster for fe_mul and > 10% faster for signature verification (see bitcoin-core#723).
These are the alternatives to this PR:
We could replace our current asm with the fastest compiler output that we can find. This is potentially faster, but it has multiple drawbacks:
- It's more coding work because it needs detailed benchmarks (e.g., with many compilers/options).
- It's more review work because we need to deal with inline asm (including clobbers etc.) and there's a lack of expert reviewers in this area.
- It's not unlikely that we'll fall behind again in a few compiler versions, and then we have to deal with this again, i.e., redo the benchmarks. Given our history here, I doubt that we'll resolve this in a timely manner.
We could change the default of the asm build option to off. But this will also disable the scalar asm, which is still faster.
We could split the build option into two separate options for field and scalar asm and only disable the field asm by default. But this adds complexity to the build and to the test matrix.
My conclusion is that this PR gets the low-hanging fruit in terms of performance. It simplifies our code significantly. It's clearly an improvement, and it's very easy to review. Whether re-introducing better asm (whether from a compiler or from CryptOpt) is worth the hassle can be evaluated separately, and should not hold up this improvement.
Solves bitcoin-core#723.
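A rough way to reproduce the comparison on one machine; the configure option names below match the autotools build as it exists while the asm is still present (an assumption worth double-checking), and absolute numbers will vary with compiler and CPU:

```sh
# Build with the hand-written x86_64 field asm and benchmark the internals:
./autogen.sh
./configure --with-asm=x86_64
make -j && ./bench_internal          # note the field_mul rows; ./bench covers verification

# Rebuild with the asm disabled so the C field code is used instead:
./configure --with-asm=no
make clean && make -j && ./bench_internal
```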
Tracked in #1560
I'm not sure about the second commit. The local ./libtool works for me and I see no reason why we should use the system libtool. Let's see what travis says. @gmaxwell, I don't really understand libtool --mode=execute, to be honest. Can you explain why we need this here?