-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Clean up b64encode
(1/N)
#3737
Conversation
- Revised `b64encode` to more closely align with the 2020 paper - Made trivial code look trivial Signed-off-by: Yiwu Chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup!
!sync |
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions, click here to learn more. |
DType.uint8, simd_width | ||
]: | ||
alias mask_2 = _repeat_until[simd_width]( | ||
SIMD[DType.uint8, 4](0b00000011, 0b11110000, 0, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gabrieldemarmiesse These bits are not consecutive on little-endian systems. Was it by design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you detail what you mean? When processing a stream of independent bytes, endianness doesn't matter. Endianness only becomes relevant when we’re interpreting multiple bytes as a single larger data type, such as a 16-bit, 32-bit, or 64-bit integer. So unless I'm missing something, things should work out, even with the bitcast afterwards.
In the bitcast after, i use rotate which does the same operation independly of endianness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Endianness only becomes relevant when we’re interpreting multiple bytes as a single larger data type.
Indeed, we bit-cast to u16
for the bit rotation since some operations cross byte boundaries. I initially thought your implementation worked because endianness effects canceled out on both ends, yielding a functional result. Now I see I was mistaken, and your code was correct all along.
The current setup does have a few subtle advantages. We start with:
|b₃b₂b₁b₀. . . . |a₅a₄a₃a₂a₁a₀b₅b₄|
and aim for:
|. . b₅b₄b₃b₂b₁b₀|. . a₅a₄a₃a₂a₁a₀|
In your original approach, they are shuffled into:
|. . . . . . . . |a₅a₄a₃a₂a₁a₀. . | # rotate right by 2
|b₃b₂b₁b₀. . . . |. . . . . . b₅b₄| # rotate right by 4
A 16-bit rotation positions them correctly, but since these bits aren’t consecutive, the compiler fails to find certain optimisations (like vpmulhuw
Lemire2018 or vpmultishiftqb
in Lemire2020). In the paper and the current setup, a
and b
are masked so that their bits are consecutive:
|a₅a₄a₃a₂a₁a₀. . |. . . . . . . . | # rotate right by 10
|. . . . . . b₅b₄|b₃b₂b₁b₀. . . . | # rotate left by 4
In any case, thank you for your work on this and for laying such a solid foundation for further optimizations!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it's possible I made a mistake there. If I understand what you are saying, the code I wrote previously shouldn't have worked in a little-endian system. But the unit test are passing and the CI is little endian. So I guess the code was still correct, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wasn’t very clear in my wording. Your code was indeed correct all along! The issue was my misunderstanding of why it worked (when I first ask the question here). I thought I was fixing a subtle bug, but in reality, there was nothing to fix.
[External] [stdlib] Clean up `b64encode` (1/N) - Revised `b64encode` to more closely align with the 2020 paper - Made trivial code look trivial - As a added bonus, this generates better assembly Co-authored-by: soraros <[email protected]> Closes #3737 MODULAR_ORIG_COMMIT_REV_ID: 5a8c10a6ef4f1155e4464ed4e349c0747d600e93
Landed in ecb37c0! Thank you for your contribution 🎉 |
[External] [stdlib] Clean up `b64encode` (1/N) - Revised `b64encode` to more closely align with the 2020 paper - Made trivial code look trivial - As a added bonus, this generates better assembly Co-authored-by: soraros <[email protected]> Closes #3737 MODULAR_ORIG_COMMIT_REV_ID: 5a8c10a6ef4f1155e4464ed4e349c0747d600e93
b64encode
to more closely align with the 2020 paper