Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Clean up b64encode (1/N) #3737

Closed
wants to merge 1 commit into from

Conversation

soraros
Copy link
Contributor

@soraros soraros commented Nov 4, 2024

  • Revised b64encode to more closely align with the 2020 paper
  • Made trivial code look trivial
  • As a added bonus, this generates better assembly
  • Note that we’re now using a different algorithm from @gabrieldemarmiesse's original port, specifically in how we order the 6-bit chunks.

@soraros soraros marked this pull request as ready for review November 4, 2024 04:53
@soraros soraros requested a review from a team as a code owner November 4, 2024 04:53
- Revised `b64encode` to more closely align with the 2020 paper
- Made trivial code look trivial

Signed-off-by: Yiwu Chen <[email protected]>
Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup!

@JoeLoser
Copy link
Collaborator

JoeLoser commented Nov 4, 2024

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Nov 4, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Nov 4, 2024
DType.uint8, simd_width
]:
alias mask_2 = _repeat_until[simd_width](
SIMD[DType.uint8, 4](0b00000011, 0b11110000, 0, 0)
Copy link
Contributor Author

@soraros soraros Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabrieldemarmiesse These bits are not consecutive on little-endian systems. Was it by design?

Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you detail what you mean? When processing a stream of independent bytes, endianness doesn't matter. Endianness only becomes relevant when we’re interpreting multiple bytes as a single larger data type, such as a 16-bit, 32-bit, or 64-bit integer. So unless I'm missing something, things should work out, even with the bitcast afterwards.

In the bitcast after, i use rotate which does the same operation independly of endianness.

Copy link
Contributor Author

@soraros soraros Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Endianness only becomes relevant when we’re interpreting multiple bytes as a single larger data type.

Indeed, we bit-cast to u16 for the bit rotation since some operations cross byte boundaries. I initially thought your implementation worked because endianness effects canceled out on both ends, yielding a functional result. Now I see I was mistaken, and your code was correct all along.

The current setup does have a few subtle advantages. We start with:

|b₃b₂b₁b₀. . . . |a₅a₄a₃a₂a₁a₀b₅b₄|

and aim for:

|. . b₅b₄b₃b₂b₁b₀|. . a₅a₄a₃a₂a₁a₀|

In your original approach, they are shuffled into:

|. . . . . . . . |a₅a₄a₃a₂a₁a₀. . |  # rotate right by 2
|b₃b₂b₁b₀. . . . |. . . . . . b₅b₄|  # rotate right by 4

A 16-bit rotation positions them correctly, but since these bits aren’t consecutive, the compiler fails to find certain optimisations (like vpmulhuw Lemire2018 or vpmultishiftqb in Lemire2020). In the paper and the current setup, a and b are masked so that their bits are consecutive:

|a₅a₄a₃a₂a₁a₀. . |. . . . . . . . |  # rotate right by 10
|. . . . . . b₅b₄|b₃b₂b₁b₀. . . . |  # rotate left by 4

In any case, thank you for your work on this and for laying such a solid foundation for further optimizations!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's possible I made a mistake there. If I understand what you are saying, the code I wrote previously shouldn't have worked in a little-endian system. But the unit test are passing and the CI is little endian. So I guess the code was still correct, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn’t very clear in my wording. Your code was indeed correct all along! The issue was my misunderstanding of why it worked (when I first ask the question here). I thought I was fixing a subtle bug, but in reality, there was nothing to fix.

modularbot pushed a commit that referenced this pull request Nov 5, 2024
[External] [stdlib] Clean up `b64encode` (1/N)

- Revised `b64encode` to more closely align with the 2020 paper
- Made trivial code look trivial
- As a added bonus, this generates better assembly

Co-authored-by: soraros <[email protected]>
Closes #3737
MODULAR_ORIG_COMMIT_REV_ID: 5a8c10a6ef4f1155e4464ed4e349c0747d600e93
@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Nov 5, 2024
@modularbot
Copy link
Collaborator

Landed in ecb37c0! Thank you for your contribution 🎉

@modularbot modularbot closed this Nov 5, 2024
@soraros soraros deleted the cleanup-b64 branch November 5, 2024 08:50
modularbot pushed a commit that referenced this pull request Dec 17, 2024
[External] [stdlib] Clean up `b64encode` (1/N)

- Revised `b64encode` to more closely align with the 2020 paper
- Made trivial code look trivial
- As a added bonus, this generates better assembly

Co-authored-by: soraros <[email protected]>
Closes #3737
MODULAR_ORIG_COMMIT_REV_ID: 5a8c10a6ef4f1155e4464ed4e349c0747d600e93
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants