wkt_decode: readWhitespace does not skip all white space. #87

gdey · 2019-11-01T23:42:32Z

readWhitespace assumes all text is ASCII. The function should decode runes correctly and then check to see if they are spaces.

Can not test runes this way:

https://github.com/go-spatial/geom/blob/master/encoding/wkt/wkt_decode.go#L55-L57

You need to read all the bytes in the run before testing it.

Here is a test case with the failures:

func TestReadWhitespace(t *testing.T) {
	type tcase struct {
		in    string
		match bool
	}
	fn := func(tc tcase) func(*testing.T) {
		decoder := NewDecoder(strings.NewReader(tc.in + "."))

		return func(t *testing.T) {
			match, err := decoder.readWhitespace()
			if err != nil {
				t.Errorf("error, expected nil got: %v", err)
				return
			}
			if match != tc.match {
				t.Errorf("match, expected %t, got %t", tc.match, match)
			}
		}
	}

	tests := map[string]tcase{
		"regular0":     tcase{in: " ", match: true},
		"tab":          tcase{in: "\t", match: true},
		"newline":      tcase{in: "\n", match: true},
		"return":       tcase{in: "\r", match: true},
		"regular":      tcase{in: "\u0020", match: true},
		"nobreak":      tcase{in: "\u00A0", match: true},
		"ogham mark":   tcase{in: "\u1680", match: true},
		"en quad":      tcase{in: "\u2000", match: true},
		"em quad":      tcase{in: "\u2001", match: true},
		"en":           tcase{in: "\u2002", match: true},
		"em":           tcase{in: "\u2003", match: true},
		"three-per-em": tcase{in: "\u2004", match: true},
		"four-per-em":  tcase{in: "\u2005", match: true},
		"six-per-em":   tcase{in: "\u2006", match: true},
		"figure":       tcase{in: "\u2007", match: true},
		"punctuation":  tcase{in: "\u2007", match: true},
		"thin":         tcase{in: "\u2009", match: true},
		"hair":         tcase{in: "\u200A", match: true},
		"nnbsp":        tcase{in: "\u202F", match: true},
		"mmsp":         tcase{in: "\u205F", match: true},
		"ideographic":  tcase{in: "\u3000", match: true},
	}
	for key, test := range tests {
		t.Run(key, fn(test))
	}

}

The text was updated successfully, but these errors were encountered:

ear7h · 2019-11-02T21:57:32Z

Do you have a WKT string test case for this?

As far as I've read, the spec does not specify any unicode characters, which is why the Decoder can/does read bytes and not runes.

I could add a check to ensure the characters are within a valid ascii range in readByte

gdey · 2019-11-04T17:26:32Z

We are reading in UTF8 text, the spec does not say that it must be ASCII; so we should not assume ASCII only spaces. The other characters are unlikely to have an issue with UTF8 as they are equivalent, so I did not call it out -- though the errors can be funky.

gdey assigned ear7h Nov 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wkt_decode: readWhitespace does not skip all white space. #87

wkt_decode: readWhitespace does not skip all white space. #87

gdey commented Nov 1, 2019 •

edited

Loading

ear7h commented Nov 2, 2019

gdey commented Nov 4, 2019

wkt_decode: readWhitespace does not skip all white space. #87

wkt_decode: readWhitespace does not skip all white space. #87

Comments

gdey commented Nov 1, 2019 • edited Loading

ear7h commented Nov 2, 2019

gdey commented Nov 4, 2019

gdey commented Nov 1, 2019 •

edited

Loading