Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wkt_decode: readWhitespace does not skip all white space. #87

Open
gdey opened this issue Nov 1, 2019 · 2 comments
Open

wkt_decode: readWhitespace does not skip all white space. #87

gdey opened this issue Nov 1, 2019 · 2 comments
Assignees

Comments

@gdey
Copy link
Member

gdey commented Nov 1, 2019

readWhitespace assumes all text is ASCII. The function should decode runes correctly and then check to see if they are spaces.

Can not test runes this way:

https://github.com/go-spatial/geom/blob/master/encoding/wkt/wkt_decode.go#L55-L57

You need to read all the bytes in the run before testing it.

Here is a test case with the failures:

func TestReadWhitespace(t *testing.T) {
	type tcase struct {
		in    string
		match bool
	}
	fn := func(tc tcase) func(*testing.T) {
		decoder := NewDecoder(strings.NewReader(tc.in + "."))

		return func(t *testing.T) {
			match, err := decoder.readWhitespace()
			if err != nil {
				t.Errorf("error, expected nil got: %v", err)
				return
			}
			if match != tc.match {
				t.Errorf("match, expected %t, got %t", tc.match, match)
			}
		}
	}

	tests := map[string]tcase{
		"regular0":     tcase{in: " ", match: true},
		"tab":          tcase{in: "\t", match: true},
		"newline":      tcase{in: "\n", match: true},
		"return":       tcase{in: "\r", match: true},
		"regular":      tcase{in: "\u0020", match: true},
		"nobreak":      tcase{in: "\u00A0", match: true},
		"ogham mark":   tcase{in: "\u1680", match: true},
		"en quad":      tcase{in: "\u2000", match: true},
		"em quad":      tcase{in: "\u2001", match: true},
		"en":           tcase{in: "\u2002", match: true},
		"em":           tcase{in: "\u2003", match: true},
		"three-per-em": tcase{in: "\u2004", match: true},
		"four-per-em":  tcase{in: "\u2005", match: true},
		"six-per-em":   tcase{in: "\u2006", match: true},
		"figure":       tcase{in: "\u2007", match: true},
		"punctuation":  tcase{in: "\u2007", match: true},
		"thin":         tcase{in: "\u2009", match: true},
		"hair":         tcase{in: "\u200A", match: true},
		"nnbsp":        tcase{in: "\u202F", match: true},
		"mmsp":         tcase{in: "\u205F", match: true},
		"ideographic":  tcase{in: "\u3000", match: true},
	}
	for key, test := range tests {
		t.Run(key, fn(test))
	}

}
@ear7h
Copy link
Contributor

ear7h commented Nov 2, 2019

Do you have a WKT string test case for this?

As far as I've read, the spec does not specify any unicode characters, which is why the Decoder can/does read bytes and not runes.

I could add a check to ensure the characters are within a valid ascii range in readByte

@gdey
Copy link
Member Author

gdey commented Nov 4, 2019

We are reading in UTF8 text, the spec does not say that it must be ASCII; so we should not assume ASCII only spaces. The other characters are unlikely to have an issue with UTF8 as they are equivalent, so I did not call it out -- though the errors can be funky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants