Fix PCRE with UTF-8 data on Windows #145
Comments
BTW, this issue was reported at regex-pcre-builtin.
I ran into issues using the PCRE back end:
*Main Lib Text.RE.PCRE.Text> "a first hello to everyone" *=~/ [ed|$(hello)///"$1"|]
"a first \"hello\" to everyone" -- OK
*Main Lib Text.RE.PCRE.Text> "a firﬆ hello to everyone" *=~/ [ed|$(hello)///"$1"|]
"a fir\64262 \"llo t\" to everyone" -- Uh oh
*Main Lib Text.RE.PCRE.ByteString> "a firﬆ hello to everyone" *=~/ [ed|$(hello)///"$1"|]
"a fir\ACK \"hello\" to everyone"
And
*Main Lib Text.RE.PCRE.String> "a firﬆ hello to everyone" *=~/ [ed|$(hello)///"$1"|]
"*** Exception: utf8_correct_bs: UTF-8 decoding error
CallStack (from HasCallStack):
error, called at .\Text\RE\ZeInternals\Types\Match.lhs:248:13 in regex-1.1.0.0-H1FPxX1khLGKIhuhwowTFL:Text.RE.ZeInternals.Types.Match
This does work correctly in the TDFA module; however, my use case requires non-greedy matching, which only appears to be supported by PCRE. My current workaround is to use TDFA where I can, and then fall back to manual non-regex search and replace where I require non-greedy behavior.
regex-pcre has never worked with UTF-8 data because of "regex-pcre not working properly with UTF-8 text" (#141), and it was never guaranteed to. Currently it is not working on Windows (at least on AppVeyor), and the Windows UTF-8/PCRE tests have been suspended.
The current method of fixing up the offsets in regex is hacky and inefficient.
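To illustrate what "fixing up the offsets" involves: PCRE reports match positions as byte offsets into the UTF-8 encoded subject, while Text and String are indexed by character. The following is only a minimal sketch of that conversion, not the code regex actually uses; `isContinuation` and `byteToCharOffset` are hypothetical names introduced here for illustration.

```haskell
import           Data.Bits       ((.&.))
import qualified Data.ByteString as B
import           Data.Word       (Word8)

-- | True for UTF-8 continuation bytes, which have the form 10xxxxxx.
-- (Hypothetical helper, not part of regex or regex-pcre.)
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80

-- | Convert a byte offset into a UTF-8 encoded buffer to a character
-- offset by counting the bytes before it that start a character.
-- Assumes the buffer is valid UTF-8 and the offset falls on a
-- character boundary. (Hypothetical helper, for illustration only.)
byteToCharOffset :: B.ByteString -> Int -> Int
byteToCharOffset bs n = B.foldl' step 0 (B.take n bs)
  where
    step count w
      | isContinuation w = count
      | otherwise        = count + 1
```

For a subject such as "a fir\64262 hello to everyone", U+FB06 occupies three bytes in UTF-8, so "hello" starts at byte offset 9 but character offset 7. Using the byte offset directly as a character index selects "llo t", which matches the Text result reported above. A scan like this costs time proportional to the offset for every match, which gives a sense of why a per-match fix-up can be inefficient.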