Shift codepoint indices when trimming data from front #60

sidkurella · 2024-02-08T14:48:04Z

The codepoint indices become invalid when data is trimmed from the front of the string. Shift them left to match the new byte offsets after the trim.

Fixes potential panic or invalid data when using UTF-8 encoded strings with nested struct members.
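To illustrate the failure mode described above, here is a minimal sketch. It assumes the codepoint indices record the byte offset at which each codepoint starts; `codepointOffsets` is a hypothetical helper written for this example, not a function from the library.

```go
package main

import "fmt"

// codepointOffsets returns the byte offset at which each codepoint of s starts.
// Hypothetical helper for illustration only.
func codepointOffsets(s string) []int {
	offsets := make([]int, 0, len(s))
	for i := range s { // ranging over a string yields each rune's starting byte index
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	full := "n\u00e9|abc" // U+00E9 occupies two bytes in UTF-8
	offsets := codepointOffsets(full)
	fmt.Println(offsets) // [0 1 3 4 5 6]

	// Keep only the codepoints from position 3 onward ("abc").
	relevant := offsets[3:]
	trimmed := full[relevant[0]:]
	fmt.Println(trimmed, len(trimmed)) // abc 3

	// The retained offsets still refer to the untrimmed string, so every one
	// of them is past the end of the 3-byte trimmed string.
	fmt.Println(relevant) // [4 5 6]
}
```

Slicing `trimmed` with any of the stale offsets would go out of range, which is consistent with the panic mentioned in the description.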


ianlopshire

@sidkurella Thanks for taking the time to submit this PR!

I've requested a small change, but once that's complete I'd be happy to merge and cut a new release!

ianlopshire · 2024-02-08T15:45:45Z

decode.go

+		// We trimmed data from the front of the string.
+		// We need to adjust the codepoint indices to reflect this, as they have shifted.
+		removedFromFront := relevantIndices[0]
+		newIndices := make([]int, 0, len(relevantIndices))
+		for _, idx := range relevantIndices {
+			newIndices = append(newIndices, idx-removedFromFront)
+		}


If I understand correctly, this only needs to happen when the first element of relevantIndices is non-zero. Can we add a check here to avoid an unneeded iteration over the indices?

Suggested change

-		// We trimmed data from the front of the string.
-		// We need to adjust the codepoint indices to reflect this, as they have shifted.
-		removedFromFront := relevantIndices[0]
-		newIndices := make([]int, 0, len(relevantIndices))
-		for _, idx := range relevantIndices {
-			newIndices = append(newIndices, idx-removedFromFront)
-		}
+		// If we trimmed data from the front of the string, we need to adjust the
+		// codepoint indices to reflect this, as they have shifted.
+		if relevantIndices[0] > 0 {
+			removedFromFront := relevantIndices[0]
+			newIndices := make([]int, 0, len(relevantIndices))
+			for _, idx := range relevantIndices {
+				newIndices = append(newIndices, idx-removedFromFront)
+			}
+			relevantIndices = newIndices
+		}
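For reference, the guarded shift can be sketched as a standalone function. This is only an illustration of the idea in the suggestion, not the code in decode.go: `rebaseIndices` is a hypothetical name, and the empty-slice check is an added assumption so the sketch is safe to call on its own.

```go
package main

import "fmt"

// rebaseIndices shifts byte offsets left so that the first one becomes 0.
// Hypothetical helper; the name and the empty-slice guard are assumptions.
func rebaseIndices(relevantIndices []int) []int {
	// Nothing was trimmed from the front (or there is nothing to shift),
	// so skip the extra allocation and iteration.
	if len(relevantIndices) == 0 || relevantIndices[0] == 0 {
		return relevantIndices
	}
	removedFromFront := relevantIndices[0]
	newIndices := make([]int, len(relevantIndices))
	for i, idx := range relevantIndices {
		newIndices[i] = idx - removedFromFront
	}
	return newIndices
}

func main() {
	fmt.Println(rebaseIndices([]int{4, 5, 6})) // [0 1 2]
	fmt.Println(rebaseIndices([]int{0, 1, 3})) // [0 1 3]
}
```

Building a new slice rather than subtracting in place avoids mutating a backing array that other values might still share; whether that matters depends on how the caller holds the indices.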

sidkurella

Hi Ian, thanks for the prompt reply. You're right, we only need to do this in that case. However, this fix alone is not sufficient to solve the production issue we are encountering. I have another fix that, together with this one, solves the problem; I'm just writing a test case that exercises it right now.

sidkurella · 2024-02-08T17:17:03Z

Closing this PR and opening a new one with an updated fix.

sidkurella mentioned this pull request Feb 8, 2024

Potential panic or invalid data when using UTF-8 codepoint boundaries when decoding into a nested struct #61

Closed

ianlopshire requested changes Feb 8, 2024


sidkurella closed this Feb 8, 2024
