[stdlib] Clean up `b64encode` (1/N) #3737

soraros · 2024-11-04T04:49:27Z

- Revised `b64encode` to more closely align with the 2020 paper
- Made trivial code look trivial
- As an added bonus, this generates better assembly
Note that we’re now using a different algorithm from @gabrieldemarmiesse's original port, specifically in how we order the 6-bit chunks.
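For orientation, here is a scalar Python sketch of the chunking that every vectorized variant has to reproduce. Three input bytes s0, s1, s2 become four 6-bit indices a, b, c, d, using the same a/b labels as the bit diagrams in the review thread. The names `split_triplet` and `encode_full_blocks` are hypothetical, padding is ignored, and this is only a reference model, not the Mojo implementation.

```python
import base64

# Standard base64 alphabet (RFC 4648).
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def split_triplet(s0: int, s1: int, s2: int) -> tuple:
    """Split three input bytes into four 6-bit chunks a, b, c, d, in output order."""
    a = s0 >> 2                            # s0 = a5..a0 b5 b4
    b = ((s0 & 0b11) << 4) | (s1 >> 4)     # s1 = b3..b0 c5..c2
    c = ((s1 & 0b1111) << 2) | (s2 >> 6)   # s2 = c1 c0 d5..d0
    d = s2 & 0b111111
    return a, b, c, d


def encode_full_blocks(data: bytes) -> bytes:
    """Encode input whose length is a multiple of 3 (padding is not handled)."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(data), 3):
        out += bytes(ALPHABET[x] for x in split_triplet(*data[i:i + 3]))
    return bytes(out)


print(encode_full_blocks(b"Man"))                              # b'TWFu'
print(encode_full_blocks(b"Man") == base64.b64encode(b"Man"))  # True
```

Note that b straddles the boundary between s0 and s1, which is why the SIMD versions work on 16-bit lanes rather than on individual bytes.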

- Revised `b64encode` to more closely align with the 2020 paper
- Made trivial code look trivial

Signed-off-by: Yiwu Chen <[email protected]>

JoeLoser

Nice cleanup!

JoeLoser · 2024-11-04T15:05:32Z

!sync

modularbot · 2024-11-04T16:28:44Z

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions.

soraros · 2024-11-04T17:15:37Z

`stdlib/src/base64/_b64encode.mojo`

```diff
-    DType.uint8, simd_width
-]:
-    alias mask_2 = _repeat_until[simd_width](
-        SIMD[DType.uint8, 4](0b00000011, 0b11110000, 0, 0)
```


@gabrieldemarmiesse These bits are not consecutive on little-endian systems. Was this by design?
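A minimal Python model of what the question refers to, assuming the first two bytes of `mask_2` are bit-cast to a `u16` on a little-endian machine (modelled here with `int.from_bytes(..., "little")`). The helpers `set_bit_positions` and `is_contiguous` are hypothetical. For contrast, `0x03F0` (a single run, bits 9..4) is checked as well.

```python
# First two bytes of one 4-byte group of mask_2: (0b00000011, 0b11110000, 0, 0).
mask_bytes = bytes([0b00000011, 0b11110000])

# Model a little-endian bit-cast to u16: the byte at the lower address
# becomes the LOW half of the word.
mask_u16 = int.from_bytes(mask_bytes, "little")


def set_bit_positions(word: int) -> list:
    """Indices of the set bits of a 16-bit word, most significant first."""
    return [i for i in range(15, -1, -1) if (word >> i) & 1]


def is_contiguous(word: int) -> bool:
    """True if the set bits of word form a single unbroken run."""
    if word == 0:
        return True
    while (word & 1) == 0:
        word >>= 1                      # drop trailing zeros
    return (word & (word + 1)) == 0     # what remains must be 2**k - 1


print(hex(mask_u16))                # 0xf003
print(set_bit_positions(mask_u16))  # [15, 14, 13, 12, 1, 0]
print(is_contiguous(mask_u16))      # False
print(is_contiguous(0x03F0))        # True
```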

Could you explain what you mean? When processing a stream of independent bytes, endianness doesn't matter. Endianness only becomes relevant when we’re interpreting multiple bytes as a single larger data type, such as a 16-bit, 32-bit, or 64-bit integer. So unless I'm missing something, things should work out, even with the bitcast afterwards.

In the bitcast afterwards, I use rotate, which performs the same operation independently of endianness.
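A small Python sketch of the distinction being discussed, using two arbitrary example bytes. Masking is done byte by byte and does not care about byte order, and a 16-bit rotate is the same arithmetic on any machine. However, the integer it rotates depends on which byte the bit-cast places in the high half. `rotr16` is a hypothetical helper, and `int.from_bytes` stands in for a little- or big-endian bit-cast.

```python
def rotr16(x: int, n: int) -> int:
    """Rotate a 16-bit value right by n bits; defined purely on the integer value."""
    n %= 16
    return ((x >> n) | (x << (16 - n))) & 0xFFFF


pair = bytes([0b10110100, 0b01101001])   # two arbitrary input bytes
mask = bytes([0b00000011, 0b11110000])

# Byte-wise masking: each byte is handled on its own, so byte order is irrelevant.
masked = bytes(b & m for b, m in zip(pair, mask))

# Reinterpreting those two bytes as one u16 is where byte order enters.
le = int.from_bytes(masked, "little")
be = int.from_bytes(masked, "big")

print(masked.hex())                            # 0060
print(hex(le), hex(be))                        # 0x6000 0x60
print(hex(rotr16(le, 4)), hex(rotr16(be, 4)))  # 0x600 0x6
```

The masks and rotation amounts are therefore written against a particular byte order, which is the little-endian one on the machines this code runs on.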

> Endianness only becomes relevant when we’re interpreting multiple bytes as a single larger data type.

Indeed, we bit-cast to `u16` for the bit rotation because some 6-bit chunks cross byte boundaries. I initially thought your implementation worked because endianness effects canceled out on both ends, yielding a functional result. Now I see I was mistaken, and your code was correct all along.

The current setup does have a few subtle advantages. We start with:

```
|b₃b₂b₁b₀. . . . |a₅a₄a₃a₂a₁a₀b₅b₄|
```

and aim for:

```
|. . b₅b₄b₃b₂b₁b₀|. . a₅a₄a₃a₂a₁a₀|
```

In your original approach, they are shuffled into:

```
|. . . . . . . . |a₅a₄a₃a₂a₁a₀. . |  # rotate right by 2
|b₃b₂b₁b₀. . . . |. . . . . . b₅b₄|  # rotate right by 4
```

A 16-bit rotation positions them correctly, but because these bits aren’t consecutive, the compiler fails to find certain optimisations (like `vpmulhuw` in Lemire2018 or `vpmultishiftqb` in Lemire2020). In the paper and in the current setup, a and b are masked so that their bits are consecutive:

```
|a₅a₄a₃a₂a₁a₀. . |. . . . . . . . |  # rotate right by 10
|. . . . . . b₅b₄|b₃b₂b₁b₀. . . . |  # rotate left by 4
```
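To check the diagrams, here is a scalar Python sketch that treats one 16-bit lane as an integer. It assumes s0 and s1 are the first two input bytes. As the diagrams imply, it also assumes the original lane is a little-endian load of `[s0, s1]` and the paper's lane a little-endian load of `[s1, s0]`. The a-mask `0x00FC` is read off the diagram rather than taken from the Mojo source, and all function names are hypothetical. The last variant rewrites the paper's rotations as 16-bit multiplies to illustrate why contiguous fields matter. It makes no claim about what the Mojo compiler actually emits.

```python
def rotr16(x: int, n: int) -> int:
    """Rotate a 16-bit value right by n bits."""
    n %= 16
    return ((x >> n) | (x << (16 - n))) & 0xFFFF


def rotl16(x: int, n: int) -> int:
    """Rotate a 16-bit value left by n bits."""
    return rotr16(x, 16 - n % 16)


def target(s0: int, s1: int) -> int:
    """|..b5..b0|..a5..a0|: b in the high byte, a in the low byte."""
    a = s0 >> 2
    b = ((s0 & 0b11) << 4) | (s1 >> 4)
    return (b << 8) | a


def original_approach(s0: int, s1: int) -> int:
    word = s0 | (s1 << 8)               # little-endian u16 over bytes [s0, s1]
    a_part = rotr16(word & 0x00FC, 2)   # a5..a0 in bits 7..2
    b_part = rotr16(word & 0xF003, 4)   # b3..b0 in bits 15..12, b5 b4 in bits 1..0
    return a_part | b_part


def paper_approach(s0: int, s1: int) -> int:
    word = s1 | (s0 << 8)               # little-endian u16 over bytes [s1, s0]
    a_part = rotr16(word & 0xFC00, 10)  # a5..a0 in bits 15..10
    b_part = rotl16(word & 0x03F0, 4)   # b5..b0 in bits 9..4
    return a_part | b_part


def paper_approach_mul(s0: int, s1: int) -> int:
    word = s1 | (s0 << 8)
    # Nothing wraps, so each rotation is a plain shift, i.e. a 16-bit multiply.
    a_part = ((word & 0xFC00) * 0x0040) >> 16     # high half of product: >> 10
    b_part = ((word & 0x03F0) * 0x0010) & 0xFFFF  # low half of product:  << 4
    return a_part | b_part


pairs = [(s0, s1) for s0 in range(256) for s1 in range(256)]
print(all(original_approach(*p) == target(*p) for p in pairs))   # True
print(all(paper_approach(*p) == target(*p) for p in pairs))      # True
print(all(paper_approach_mul(*p) == target(*p) for p in pairs))  # True
```

In `original_approach`, b₅b₄ wrap from bits 1..0 around to bits 13..12, so that step is a genuine rotation. In `paper_approach` no bits wrap. This is what allows the multiply-high / multiply-low form in `paper_approach_mul`, which is the per-lane operation that `vpmulhuw` (and `vpmullw` for the low half) performs.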

In any case, thank you for your work on this and for laying such a solid foundation for further optimizations!

Thanks, it's possible I made a mistake there. If I understand what you are saying, the code I wrote previously shouldn't have worked on a little-endian system. But the unit tests are passing, and the CI runs on little-endian machines. So I guess the code was still correct, no?

Sorry, I wasn’t very clear in my wording. Your code was indeed correct all along! The issue was my misunderstanding of why it worked (when I first asked the question here). I thought I was fixing a subtle bug, but in reality, there was nothing to fix.

[External] [stdlib] Clean up `b64encode` (1/N)

- Revised `b64encode` to more closely align with the 2020 paper
- Made trivial code look trivial
- As an added bonus, this generates better assembly

Co-authored-by: soraros <[email protected]>
Closes #3737
MODULAR_ORIG_COMMIT_REV_ID: 5a8c10a6ef4f1155e4464ed4e349c0747d600e93

modularbot · 2024-11-05T06:23:39Z

Landed in ecb37c0! Thank you for your contribution 🎉

soraros marked this pull request as ready for review November 4, 2024 04:53

soraros requested a review from a team as a code owner November 4, 2024 04:53

soraros force-pushed the cleanup-b64 branch from 7db6b0b to 6985f7b November 4, 2024 05:05

[stdlib] Clean up `b64encode` (1/N) · 7f0fcf3

soraros force-pushed the cleanup-b64 branch from 6985f7b to 7f0fcf3 November 4, 2024 05:31

JoeLoser approved these changes Nov 4, 2024

modular-automation bot assigned JoeLoser Nov 4, 2024

modularbot added the `imported-internally` label Nov 4, 2024 (signals that a given pull request has been imported internally)

modularbot added the `merged-internally` label Nov 4, 2024 (indicates that this pull request has been merged internally)

soraros commented Nov 4, 2024

modularbot added the `merged-externally` label Nov 5, 2024 (merged externally in public mojo repo)

modularbot closed this Nov 5, 2024

soraros deleted the cleanup-b64 branch November 5, 2024 08:50
