Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Hacl_AES_128_GCM_NI and Hacl_AES_128_GCM_M32 #418

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from

Conversation

mamonet
Copy link
Member

@mamonet mamonet commented Jul 15, 2023

This patch adds the following changes:

  • Extracted C files of Hacl_AES_128_GCM_NI and Hacl_AES_128_GCM_M32 from Low* modules
  • Support x86 and AArch64 C intrinsics for AES and CLMUL/PMULL in libintvector.h
  • Detect cpu features of aes and pmull on AArch64
  • Integrate functions of both NI and M32 modules with EverCrypt_AEAD_encrypt/decrypt() conditionally using cpu features
  • Add BearSSL CT64 implementation of AES-GCM in third-party directory for benchmark reference
  • Impalement benchmark module for AES-GCM mode that tests encrypt and AAD functions for various message sizes

@mamonet mamonet requested a review from a team as a code owner July 15, 2023 12:54
@cla-bot
Copy link

cla-bot bot commented Jul 15, 2023

We require contributors to sign our Contributor License Agreement https://github.com/cryspen/hacl/blob/main/CLA.md ensuring that the contribution can be licensed under Apache 2.0 and MIT. In order for us to review and merge your code, please mention @cryspen/core in a comment below to get yourself added.

@mamonet mamonet closed this Jul 15, 2023
@mamonet mamonet reopened this Jul 15, 2023
@cla-bot
Copy link

cla-bot bot commented Jul 16, 2023

We require contributors to sign our Contributor License Agreement https://github.com/cryspen/hacl/blob/main/CLA.md ensuring that the contribution can be licensed under Apache 2.0 and MIT. In order for us to review and merge your code, please mention @cryspen/core in a comment below to get yourself added.

@coveralls
Copy link

coveralls commented Jul 16, 2023

Pull Request Test Coverage Report for Build 5808703668

  • 3051 of 7433 (41.05%) changed or added relevant lines in 12 files are covered.
  • 11 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-1.4%) to 51.361%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/Hacl_AES_128_GCM_CT64.c 105 153 68.63%
src/Hacl_AES_256_GCM_CT64.c 105 153 68.63%
src/Hacl_Gf128_CT64.c 1670 1756 95.1%
src/Hacl_AES_256_GCM_NI.c 0 129 0.0%
src/Hacl_AES_128_CTR32_BitSlice.c 698 976 71.52%
src/Hacl_Gf128_NI.c 0 297 0.0%
src/Hacl_AES_128_GCM_NI.c 0 353 0.0%
src/Hacl_AES_256_CTR32_BitSlice.c 206 570 36.14%
src/EverCrypt_AEAD.c 118 517 22.82%
src/Hacl_AES_128_CTR32_NI.c 0 1020 0.0%
Files with Coverage Reduction New Missed Lines %
src/EverCrypt_AEAD.c 11 26.49%
Totals Coverage Status
Change from base Build 5655291623: -1.4%
Covered Lines: 31975
Relevant Lines: 62256

💛 - Coveralls

@franziskuskiefer
Copy link
Member

Thanks! I'll try to look at this today or tomorrow.
It looks like some CI jobs are failing. Can you have a look at them?

@mamonet
Copy link
Member Author

mamonet commented Jul 18, 2023

MSVC specific tests fail along with some other tests, I'm going to investigate what triggers that.

@cla-bot cla-bot bot added the cla-signed label Jul 18, 2023
Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks really good already!

I left some comments mostly on the build and structure. I'll do another pass in the coming days on the actual code.

cpu-features/src/cpu-features.c Show resolved Hide resolved
@@ -186,15 +218,37 @@ hacl_init_cpu_features()
_sse42 = (ecx & ECX_SSE4_2) != 0;
#endif

#if defined(CPU_FEATURES_LINUX) && defined(CPU_FEATURES_ARM64) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has only been used for tests so far and is hence not entirely complete.
If we want to use it for the AES code we should add the Windows bits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I don't have arm64 machine with Windows on it neither have access to such instance. I'd like to know any possible options to achieve that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine not doing Windows on Arm, but we should use your new AESGCM implementation on Windows when we can't use the evercrypt version. So we should make sure that the feature detection here works on Windows x86/x64. Alternatively we'll always use the portable implementation on Windows, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this the default execution flow already, hacl_init_cpu_features() doesn't do any configuration on Windows which means the global variables are left with the initial value 0. It flows down ultimately to the portable implementation.

At first, I thought you would mean to implement feature detection on Windows ARM. It makes sense to have it works at least for x86 which is more important. I will see what I can do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. You can use the __cpuid function on Windows (https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-170)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__cpuid function is a good choice to avoid having assembly file that utilizes cpuid instruction since inline assembly isn't supported for MSVC x64. We still need to have inline assembly of cpuid for other compilers on Windows, fortunately we can use cpuid() that already exist for Linux but I don't understand why this function preserves ebx register for 32-bit version https://github.com/cryspen/hacl-packages/blob/main/cpu-features/src/cpu-features.c#L65 and not do that for x64 version. I disassembled x64 version of the function built with clang and the generated machine code does preserve rbx register since it's specified in the input operands https://github.com/cryspen/hacl-packages/blob/main/cpu-features/src/cpu-features.c#L51
Technically, both versions are supposed to work properly I just think it's good to have identical scheme for both if possible. 32-bit scheme seems more reasonable to me because I'm not sure why the compiler would preserve input operand of inline assembly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change #elif defined(__GNUC__) to #elif defined(__linux__) mamonet@4b5f3ed#diff-fff0c10580c2c65492a930241a62b51ac2cc56c8f0ff4ea59f5a42e38d43bd62L30
Let me know if it implies other conditions I didn't cover

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work.

config/toolchain.cmake Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
include/Hacl_AES_128_BitSlice.h Outdated Show resolved Hide resolved
include/Hacl_AES_128_BitSlice.h Outdated Show resolved Hide resolved
@mamonet
Copy link
Member Author

mamonet commented Jul 25, 2023

I am having difficulty reproducing opam build error, opam install . --verbose --with-test --yes proceeds successfully on my setup. Do you have any suggestions?

Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adressing the comments.
A few more thoughts.

@@ -186,15 +218,37 @@ hacl_init_cpu_features()
_sse42 = (ecx & ECX_SSE4_2) != 0;
#endif

#if defined(CPU_FEATURES_LINUX) && defined(CPU_FEATURES_ARM64) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine not doing Windows on Arm, but we should use your new AESGCM implementation on Windows when we can't use the evercrypt version. So we should make sure that the feature detection here works on Windows x86/x64. Alternatively we'll always use the portable implementation on Windows, right?

include/Hacl_AES_128_BitSlice.h Outdated Show resolved Hide resolved
include/Hacl_AES_128_BitSlice.h Outdated Show resolved Hide resolved
CMakeLists.txt Outdated
@@ -284,6 +321,37 @@ if(TOOLCHAIN_CAN_COMPILE_VALE)
set(HACL_CAN_COMPILE_VALE 1)
endif()

if(TOOLCHAIN_CAN_COMPILE_AESNI_PCLMUL AND TOOLCHAIN_CAN_COMPILE_VEC128 AND ((ARCHITECTURE MATCHES intel AND TOOLCHAIN_CAN_COMPILE_VEC256) OR HACL_TARGET_ARCHITECTURE MATCHES ${HACL_ARCHITECTURE_ARM64}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we break this line? It's a little long and hard to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

bytes ciphertext(state.range(0) + 16, 0);

for (auto _ : state) {
Lib_IntVector_Intrinsics_vec128 *ctx = (Lib_IntVector_Intrinsics_vec128 *)KRML_HOST_CALLOC((uint32_t)352U, sizeof (uint8_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be an alloc function such that the caller/user doesn't need to know the magic numbers here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a convenient way to add such function. I don't seem to be able to alloc/dealloc memory buffers at Low*-level, these functionalities are specific to this library which only utilizes NI and CT64 function from inside EverCrypt functions that the user are supposed to use instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can look at other alloc functions like

Hacl_Streaming_Keccak_state *Hacl_Streaming_Keccak_malloc(Spec_Hash_Definitions_hash_alg a)
or
Lib_IntVector_Intrinsics_vec128 *Hacl_Blake2s_128_blake2s_malloc(void)
.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented malloc/free for AES_GCM context on hacl side mamonet/hacl-star@c9880f1
I will update it for this library once the issue of verifying changes of EverCrypt.AEAD is solved.

Lib_IntVector_Intrinsics_vec128 *ctx = (Lib_IntVector_Intrinsics_vec128 *)KRML_HOST_CALLOC((uint32_t)352U, sizeof (uint8_t));
Hacl_AES_128_GCM_NI_aes128_gcm_init(ctx, key.data());
Hacl_AES_128_GCM_NI_aes128_gcm_encrypt(ctx, plaintext.size(), ciphertext.data(), plaintext.data(), 0, NULL, nonce.size(), nonce.data());
KRML_HOST_FREE(ctx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to the alloc, do we need a free function here.


#include "internal/Hacl_Lib.h"

typedef struct __uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type needs a better name :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a KaRaMeL artifact.
We can try to rename things in Low* to make this look a bit better, but this may need an upstream change in KaRaMeL.

Copy link
Collaborator

@msprotz msprotz Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pick a name for this, by writing

let meaningful_name = uint64 & uint64 & ... & uint64

(or whatever the structural type is, on the right-hand side)

this has been done for a while in HACL to improve code quality... hit me up if it doesn't work or if you need a different kind of fix

}
__uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t;

static __uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t_uint64_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@franziskuskiefer
Copy link
Member

The ocaml build seems to fail.

@mamonet
Copy link
Member Author

mamonet commented Aug 5, 2023

I updated the PR with the following changes

  • Add AES-256 modules
  • Implement GHASH CT64
  • Integrate NI and CT64 functions in EverCrypt_AEAD_encrypt/decrypt_expand_aes*_gcm() and EverCrypt_AEAD_encrypt/decrypt_expand_aes*_gcm_no_check()

New implementations have passed Wycheproof tests successfully.

GHASH performance measured on intel i5-10300H, built with clang -O3

Low* (CT64) BearSSL (CT64) OpenSSL (no-asm)
12.349148 cpb 12.715595 cpb 13.449064 cpb

@mamonet
Copy link
Member Author

mamonet commented Aug 6, 2023

The imported C files of AESGCM are generated from this branch https://github.com/mamonet/hacl-star/tree/pnmadelaine_aes that has the latest changes committed here. PreComp implementation of ghash is removed and replaced with CT64 which is ~2.5x faster than the former. Also, hardware-accelerated implementation and CT64 share the same algorithm of 128x128->256 multiplication phase and identical 256->128 reduction phase that increases the shared code base and remove a considerable amount of distinctive definitions, properties, lemmas, and pre-computed tables.

Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get the remaining comments resolved, then we can merge this.

@@ -186,15 +218,37 @@ hacl_init_cpu_features()
_sse42 = (ecx & ECX_SSE4_2) != 0;
#endif

#if defined(CPU_FEATURES_LINUX) && defined(CPU_FEATURES_ARM64) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. You can use the __cpuid function on Windows (https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-170)

#include "internal/Hacl_Spec.h"
#include "config.h"
#include "hacl-cpu-features.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we need a strategy for handling these changes.
I'm ok with landing this for now. But we need to resolve this quickly on the hacl-star side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jonathan has a clue to solve this issue, we are going to go over this tomorrow. I will keep you posted once an update comes out.

include/Hacl_AES_128_BitSlice.h Outdated Show resolved Hide resolved
@franziskuskiefer
Copy link
Member

Rust and ocaml appear to be missing the hacl CPU detection

  • OCaml: EverCrypt_AEAD.c:(.text+0x12d): undefined reference to hacl_aesgcm_support'`
  • Rust: Undefined symbols for architecture x86_64: "_hacl_aesgcm_support"

Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get another review you need re-request review.

Just a few comments here.

@@ -186,15 +218,37 @@ hacl_init_cpu_features()
_sse42 = (ecx & ECX_SSE4_2) != 0;
#endif

#if defined(CPU_FEATURES_LINUX) && defined(CPU_FEATURES_ARM64) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work.

bytes ciphertext(state.range(0) + 16, 0);

for (auto _ : state) {
Lib_IntVector_Intrinsics_vec128 *ctx = (Lib_IntVector_Intrinsics_vec128 *)KRML_HOST_CALLOC((uint32_t)352U, sizeof (uint8_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can look at other alloc functions like

Hacl_Streaming_Keccak_state *Hacl_Streaming_Keccak_malloc(Spec_Hash_Definitions_hash_alg a)
or
Lib_IntVector_Intrinsics_vec128 *Hacl_Blake2s_128_blake2s_malloc(void)
.

Copy link
Collaborator

@msprotz msprotz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mamone,

Thanks for this PR! I'm very excited to see support for AES-GCM improving in HACL.

All of the initial modules seem good. My main question revolves around the EverCrypt layer... did you edit those files by hand? If yes, I can help you make sure the changes are in the original F* source and generate correct C code, including suitable ifdef guards and runtime tests.

I'm generally available on the Everest slack... let me know how I can help!

Cheers,

Jonathan

#define Lib_IntVector_Intrinsics_vec128_shuffle32(x0, x1, x2, x3, x4) \
(_mm_shuffle_epi32(x0, _MM_SHUFFLE(x1,x2,x3,x4)))
((uint32x4_t){((uint32x4_t)x0)[x1],((uint32x4_t)x0)[x2],((uint32x4_t)x0)[x3],((uint32x4_t)x0)[x4]})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother casting x0 hre to uint32x4_t since vec128 is already defined to be that type?

Copy link
Member Author

@mamonet mamonet Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks!
I apologize for the late response, I only got notification for your main comment and just noticed the code comments.


void Hacl_Gf128_CT64_gcm_update_blocks(uint64_t *ctx, uint32_t len, uint8_t *text);

extern void
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably inadvertently used a let f = g binding in HACL* -- I would suggest adding the entire list of function arguments to to avoid generating an extern... hit me up on slack if this doesn't make sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Module 'Hacl.Gf128.CT64' is forked from 'Hacl.Gf128.NI' the latter seems to handle the case properly https://github.com/mamonet/hacl-star/blob/pnmadelaine_aes/code/gf128/Hacl.Gf128.NI.fst#L39-L41 but I don't understand the difference between the two modules.

#define Spec_Cipher_Expansion_AESNI_PCLMUL_AES128 3
#define Spec_Cipher_Expansion_AESNI_PCLMUL_AES256 4
#define Spec_Cipher_Expansion_CT64_AES128 5
#define Spec_Cipher_Expansion_CT64_AES256 6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see the corresponding source code change here: hacl-star/hacl-star@pnmadelaine_aes...mamonet:hacl-star:pnmadelaine_aes is this handwritten?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't know the F* source of this module resides in providers/evercrypt directory. However, I can tell reflecting the changes in hacl side is a little bit challenging, hopefully we can address this tomorrow.

@@ -26,8 +26,15 @@
#include "EverCrypt_AEAD.h"

#include "internal/Vale.h"
#ifdef HACL_CAN_COMPILE_AESNI_PCLMUL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the majority of the changes in this file are hand-written... is this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's no reason to hand-write the changes other than not noticing the F* source file.

@mamonet
Copy link
Member Author

mamonet commented Aug 16, 2023

@franziskuskiefer I discussed the issue of checking runtime cpu features inside Evercrypt module with Jonathan yesterday.
For x86 architectures that support AES-NI and CLMUL, the execution flow will go down Vale route and calling NI accelerated functions from inside Evercrypt only benefits armv8-a+ architectures which IMO not worth the trade-off with simplicity as there is no easy workaround for the issue. Calling AESGCM Vale functions conditionally with the portable ones in Evercrypt.AEAD is plain and simple way to make sure the function works on all architectures while unfolds Vale power on modern x86_64 processors. Let me know if this path makes sense to you, I'm working on the changes in F* side and will update the PR once I'm able to produce the generated C code of this design.

Copy link
Member

@franziskuskiefer franziskuskiefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Let me first summarise my understanding of your comment

You're saying that the new feature detection will only be for ARMv8, and thus won't give us much over using the portable when not on x86 with AES-NI.

We'll still want to use the ARMv8 version directly without evercrypt. We just don't expose it in evercrypt.
That's fine with me and will simplify the evercrypt integration significantly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants