Shift codepoint indices when trimming data from front #60

Closed
11 changes: 10 additions & 1 deletion decode.go
@@ -227,9 +227,18 @@ func rawValueFromLine(value rawValue, startPos, endPos int, format format) rawVa
relevantIndices = value.codepointIndices[startPos-1 : endPos]
lineData = value.data[relevantIndices[0]:value.codepointIndices[endPos]]
}

+ // We trimmed data from the front of the string.
+ // We need to adjust the codepoint indices to reflect this, as they have shifted.
+ removedFromFront := relevantIndices[0]
+ newIndices := make([]int, 0, len(relevantIndices))
+ for _, idx := range relevantIndices {
+     newIndices = append(newIndices, idx-removedFromFront)
+ }
Comment on lines +231 to +237
Owner

If I understand correctly, this only needs to happen when the first element of relevantIndices is non-zero. Can we add a check here to avoid an unneeded iteration over the indices?

Suggested change
- // We trimmed data from the front of the string.
- // We need to adjust the codepoint indices to reflect this, as they have shifted.
- removedFromFront := relevantIndices[0]
- newIndices := make([]int, 0, len(relevantIndices))
- for _, idx := range relevantIndices {
-     newIndices = append(newIndices, idx-removedFromFront)
- }
+ // If we trimmed data from the front of the string, we need to adjust the
+ // codepoint indices to reflect this, as they have shifted.
+ if relevantIndices[0] > 0 {
+     removedFromFront := relevantIndices[0]
+     newIndices := make([]int, 0, len(relevantIndices))
+     for _, idx := range relevantIndices {
+         newIndices = append(newIndices, idx-removedFromFront)
+     }
+     relevantIndices = newIndices
+ }

Contributor Author

Hi Ian, thanks for the prompt reply. You're right, we only need to do this in that case. However, this fix on its own is not sufficient to solve the production issue we are encountering. I have another fix that, together with this one, solves the problem; I'm just trying to write a test case that exercises it right now.
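
For readers following the thread, here is a minimal standalone sketch of the guarded shift discussed above. It is not the library's code: `rebaseIndices` is a hypothetical helper name, and the sketch assumes the indices are byte offsets (one per codepoint) into a larger buffer, which is how the diff appears to use them.

```go
package main

import "fmt"

// rebaseIndices is a hypothetical helper, not part of go-fixedwidth.
// It takes byte offsets that refer to a larger buffer and returns offsets
// relative to the first one, so they can be used against a slice of the
// buffer that starts at indices[0].
func rebaseIndices(indices []int) []int {
	// Nothing was trimmed from the front, so there is nothing to shift
	// and the extra allocation and loop can be skipped.
	if len(indices) == 0 || indices[0] == 0 {
		return indices
	}
	removedFromFront := indices[0]
	shifted := make([]int, len(indices))
	for i, idx := range indices {
		shifted[i] = idx - removedFromFront
	}
	return shifted
}

func main() {
	// Byte offsets of the codepoints in "x☃x456" when it follows the
	// three-byte prefix "123"; the snowman occupies three bytes in UTF-8.
	indices := []int{3, 4, 7, 8, 9, 10}
	fmt.Println(rebaseIndices(indices)) // prints [0 1 4 5 6 7]
}
```

Returning the input slice unchanged in the zero case is the design choice the review comment asks for: the common case of a field starting at the beginning of the data pays no extra cost.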


return rawValue{
data: trimFunc(lineData),
- codepointIndices: relevantIndices,
+ codepointIndices: newIndices,
}
} else {
if len(value.data) == 0 || startPos > len(value.data) {
57 changes: 57 additions & 0 deletions decode_test.go
@@ -386,6 +386,63 @@ func TestDecodeSetUseCodepointIndices(t *testing.T) {

}

func TestDecodeSetUseCodepointIndices_Nested(t *testing.T) {
type Nested struct {
First string `fixed:"1,3"`
Second string `fixed:"4,6"`
}

type Test struct {
First string `fixed:"1,3"`
Second Nested `fixed:"4,9"`
Third string `fixed:"10,12"`
Fourth Nested `fixed:"13,18"`
Fifth string `fixed:"19,21"`
}

for _, tt := range []struct {
name string
raw []byte
expected Test
}{
{
name: "All ASCII characters",
raw: []byte("123ABC456DEF789GHI012\n"),
expected: Test{
First: "123",
Second: Nested{First: "ABC", Second: "456"},
Third: "DEF",
Fourth: Nested{First: "789", Second: "GHI"},
Fifth: "012",
},
},
{
name: "Multi-byte characters",
raw: []byte("123x☃x456x☃x789x☃x012\n"),
expected: Test{
First: "123",
Second: Nested{First: "x☃x", Second: "456"},
Third: "x☃x",
Fourth: Nested{First: "789", Second: "x☃x"},
Fifth: "012",
},
},
} {
t.Run(tt.name, func(t *testing.T) {
d := NewDecoder(bytes.NewReader(tt.raw))
d.SetUseCodepointIndices(true)
var s Test
err := d.Decode(&s)
if err != nil {
t.Errorf("Unexpected err: %v", err)
}
if !reflect.DeepEqual(tt.expected, s) {
t.Errorf("Decode(%v) want %v, have %v", tt.raw, tt.expected, s)
}
})
}
}

// Verify the behavior of Decoder.Decode at the end of a file. See
// https://github.com/ianlopshire/go-fixedwidth/issues/6 for more details.
func TestDecode_EOF(t *testing.T) {
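
The multi-byte case in TestDecodeSetUseCodepointIndices_Nested depends on the difference between codepoint positions and byte offsets. The sketch below illustrates that difference on the same input line. It is only an illustration under stated assumptions: `codepointOffsets` is a hypothetical helper, and the library may build its index differently.

```go
package main

import "fmt"

// codepointOffsets is a hypothetical helper, not part of go-fixedwidth.
// It returns the byte offset at which each codepoint of s starts.
func codepointOffsets(s string) []int {
	offsets := make([]int, 0, len(s))
	// Ranging over a string yields the starting byte index of each rune.
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	line := "123x☃x456x☃x789x☃x012"
	offsets := codepointOffsets(line)

	// A field tagged `fixed:"4,6"` is 1-based and inclusive, so it covers
	// codepoints 3..5 (0-based); the slice ends where codepoint 6 starts.
	start, end := offsets[3], offsets[6]
	fmt.Printf("%q\n", line[start:end]) // prints "x☃x"
}
```

Slicing by raw byte positions instead (`line[3:6]`) would cut the three-byte snowman in half, which is why nested fields need indices that stay consistent after data is trimmed from the front.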