-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shift codepoint indices when trimming data from front #60
Shift codepoint indices when trimming data from front #60
Conversation
These codepoint indices are now invalid when trimming data from the front of the string. Shift these left to match the new byte offsets after trim. Fixes potential panic or invalid data when using UTF-8 encoded strings with nested struct members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sidkurella Thanks for taking the time to submit this PR!
I've requested a small change, but once that's complete I'd be happy to merge and cut a new release!
// We trimmed data from the front of the string. | ||
// We need to adjust the codepoint indices to reflect this, as they have shifted. | ||
removedFromFront := relevantIndices[0] | ||
newIndices := make([]int, 0, len(relevantIndices)) | ||
for _, idx := range relevantIndices { | ||
newIndices = append(newIndices, idx-removedFromFront) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this only needs to happen when the first relevantIndices
is non-zero. Can we add a check here to prevent doing an unneeded iteration over the indices?
// We trimmed data from the front of the string. | |
// We need to adjust the codepoint indices to reflect this, as they have shifted. | |
removedFromFront := relevantIndices[0] | |
newIndices := make([]int, 0, len(relevantIndices)) | |
for _, idx := range relevantIndices { | |
newIndices = append(newIndices, idx-removedFromFront) | |
} | |
// If we trimmed data from the front of the string. We need to adjust the | |
// codepoint indices to reflect this, as they have shifted. | |
if relevantIndices[0] > 0 { | |
removedFromFront := relevantIndices[0] | |
newIndices := make([]int, 0, len(relevantIndices)) | |
for _, idx := range relevantIndices { | |
newIndices = append(newIndices, idx-removedFromFront) | |
} | |
relevantIndices = newIndices | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ian, thanks for the prompt reply. You're right, we only need to do this in that case. However, this fix is actually not sufficient to solve the production issue we are encountering. I have another fix that together solves the problem, just trying to write a test case that exercises it right now.
Closing this pr and opening new one with updated fix |
These codepoint indices are now invalid when trimming data from the front of the string. Shift these left to match the new byte offsets after trim.
Fixes potential panic or invalid data when using UTF-8 encoded strings with nested struct members.