-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly update and trim codepoint indices after trimming data #62
Conversation
These codepoint indices are now invalid when trimming data from the front of the string. Shift these left to match the new byte offsets after trim. Fixes potential panic or invalid data when using UTF-8 encoded strings with nested struct members.
…when-trim-front Shift codepoint indices when trimming data from front
Trim these indices to be consistent with the trimmed string. This prevents a potential index out of bounds when padding is trimmed.
…es-when-trimming-line-data Trim codepoint indices when trimming line data
@sidkurella I think there is still a bug in this where padding at the start of a nested struct will cause a misalignment. Consider this test case: func TestDecode_Nested(t *testing.T) {
type Nested struct {
First string `fixed:"1,3"`
Second string `fixed:"4,6"`
}
type Test struct {
Nested Nested `fixed:"1,6"`
}
for _, tt := range []struct {
name string
raw []byte
expected Test
}{
{
name: "Multi-byte with trimmed values",
raw: []byte(" ☃2B☃\n"),
expected: Test{
Nested: Nested{
First: "☃2",
Second: "B☃",
},
},
},
} {
t.Run(tt.name, func(t *testing.T) {
d := NewDecoder(bytes.NewReader(tt.raw))
d.SetUseCodepointIndices(true)
var s Test
err := d.Decode(&s)
if err != nil {
t.Errorf("Unexpected err: %v", err)
}
if !reflect.DeepEqual(tt.expected, s) {
t.Errorf("Decode(%v) want %v, have %v", tt.raw, tt.expected, s)
}
})
}
} A mechanism must be added to prevent trimming on values decoded into the nested struct. I've opened the PR draft #63 as an example. |
Yeah, I think you're right, if you're decoding into a nested struct I think you just don't want to do any padding trimming at all I feel like that's an issue also in the pure-ASCII case too, no? |
{
name: "Pure ASCII with trimmed values",
raw: []byte(" A2BC\n"),
expected: Test{
Nested: Nested{
First: "A2",
Second: "BC",
},
},
}, Yeah, just tested with this, this fails too. This may break existing users though? |
So it seems there are two issues at play: (a) the potential panic caused by not correcting the relevant indices and (b) trimming for nested structs. I feel fine fixing the first; any current users affected would be seeing a panic. I'm hesitant to change the behavior of the second without some sort of opt-in setting. Would adding type Nested struct {
First string `fixed:"1,3"`
Second string `fixed:"4,6"`
}
type Test struct {
Nested `fixed:"1,6,none"`
} |
Yeah, that works |
Cool, Let's use this PR to solve the first of those issues, and I can update the second PR to add the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trimming logic cluttering up the rawValueFromLine
and making it hard to understand. Can we factor out the trimming logic into methods on rawValue
?
func (rawValue) trimPrefix(prefix string) rawValue {}
func (rawValue) trimSuffix(suffix string) rawValue {}
func (rawValue) trim(cutset string) rawValue {}
Alternatively, you could use the newRawValue
function to rebuild the indices after trimming. I think that would be a performance hit, but I'd be fine with it for now.
…ethods Add raw value trim methods
Updated |
Shift codepoint indices when trimming from front of string
Trim codepoint indices to be consistent with trimmed data string
Closes #61