Unexpected behavior of regex parser with $ and | #457

Open
adastepkova opened this issue Nov 7, 2024 · 2 comments

When parsing the regex (a$|b), the resulting automaton has an edge labelled 302 and no edge labelled 97 (the byte code of a; 98 is b):

initial_states: [0]
final_states: [1, 2]
transitions:
0-[98]->1
1-[302]->2
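To make dumps like the one above easier to read, here is a small Python sketch, not part of the library, that parses the initial_states / final_states / transitions text format shown in this issue and decodes each symbol. It assumes the alphabet is bytes, so 97 and 98 decode to a and b, while values above 255 such as 301 and 302 cannot stand for a single input byte and are only labelled as non-byte symbols. What the parser actually intends them to mean is still open; `parse_dump` and `describe_symbol` are hypothetical helper names.

```python
import re

# A transition line in the dump, e.g. "1-[302]->2": source, symbol, target.
TRANSITION = re.compile(r"(\d+)-\[(\d+)\]->(\d+)")


def parse_dump(text: str):
    """Parse the plain-text automaton dump format used in this issue.

    Returns (initial, final, transitions), where transitions is a list of
    (source, symbol, target) integer triples.
    """
    initial, final, transitions = set(), set(), []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("initial_states:"):
            initial = {int(n) for n in re.findall(r"\d+", line)}
        elif line.startswith("final_states:"):
            final = {int(n) for n in re.findall(r"\d+", line)}
        else:
            m = TRANSITION.fullmatch(line)
            if m:  # the "transitions:" header and blank lines are skipped
                transitions.append(tuple(int(g) for g in m.groups()))
    return initial, final, transitions


def describe_symbol(symbol: int) -> str:
    """Decode a symbol, assuming a byte alphabet: 0-255 is a byte."""
    if 0 <= symbol <= 255:
        return repr(chr(symbol))
    return f"non-byte symbol {symbol}"


DUMP = """
initial_states: [0]
final_states: [1, 2]
transitions:
0-[98]->1
1-[302]->2
"""

initial, final, transitions = parse_dump(DUMP)
for src, sym, dst in transitions:
    print(f"{src} --{describe_symbol(sym)}--> {dst}")
```

On the (a$|b) dump this prints the b edge and flags 302 as a non-byte symbol; no edge corresponds to a.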

Parsing a$|b (without the parentheses) yields the following automaton, in which a is also ignored, but no 302 edge appears:

initial_states: [0]
final_states: [1]
transitions:
0-[98]->1
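One way to confirm that a is lost in both cases is a differential check: simulate the dumped automata on a few sample strings and compare with Python's re.fullmatch, which accepts both a and b for (a$|b). The sketch below assumes the dumped automata are meant to accept whole strings (which is how the b branch behaves) and treats any symbol above 255 as never matching an input byte; `accepts` and `disagreements` are hypothetical helpers, not library functions.

```python
import re

# Automata transcribed from the dumps in this issue, as
# (initial states, final states, [(source, symbol, target), ...]).
PAREN = ({0}, {1, 2}, [(0, 98, 1), (1, 302, 2)])  # parsed from (a$|b)
BARE = ({0}, {1}, [(0, 98, 1)])                   # parsed from a$|b


def accepts(automaton, word: bytes) -> bool:
    """Simulate a nondeterministic automaton on a byte string.

    A symbol above 255 never equals an input byte, so edges labelled
    with such symbols (e.g. 302) are never taken here.
    """
    initial, final, transitions = automaton
    current = set(initial)
    for byte in word:
        current = {dst for (src, sym, dst) in transitions
                   if src in current and sym == byte}
    return bool(current & final)


def disagreements(pattern: str, automaton, samples):
    """Return the samples on which the automaton and re.fullmatch disagree."""
    return [s for s in samples
            if accepts(automaton, s.encode("latin-1"))
            != (re.fullmatch(pattern, s) is not None)]


SAMPLES = ["a", "b", "", "ab"]
print(disagreements(r"(a$|b)", PAREN, SAMPLES))
print(disagreements(r"a$|b", BARE, SAMPLES))
```

Both calls print ['a']: the dumped automata accept b but reject a, whereas re.fullmatch accepts both.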

In other cases, 301 appears instead of 302. I also encountered this issue with the following regexes:

((\x13bittorrent protocol|azver\x01$|get /scrape\?info_hash=get /announce\?info_hash=|get /client/bitcomet/|GET /data\?fid=)|d1:ad2:id20:|\x08'7P\)[RP])[\x00-\x7f]*
(get (/[\x00-\x7f]download/[ -~]*|/[\x00-\x7f]supernode[ -~]|/[\x00-\x7f]status[ -~]|/[\x00-\x7f]network[ -~]*|/[\x00-\x7f]files|/[\x00-\x7f]hash=[0-9a-f]*/[ -~]*) http/1[\x00-\x7f]1|user-agent: kazaa|x-kazaa(-username|-network|-ip|-supernodeip|-xferid|-xferuid|tag)|^give [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]?[0-9]?[0-9]?)[\x00-\x7f]*
[\x00-\x7f]*(<peerplat>|^get /getfilebyhash\.cgi\?|^get /queue_register\.cgi\?|^get /getupdowninfo\.cgi\?)[\x00-\x7f]*
[\x00-\x7f]*(ver [0-9]+ msnp[1-9][0-9]? [\x09-\x0d -~]*cvr0\x0d\x0a$|usr 1 [!-~]+ [0-9\. ]+\x0d\x0a$|ans 1 [!-~]+ [0-9\. ]+\x0d\x0a)
GETMP3\x0d\x0aFilename|^\x01[\x00-\x7f]?[\x00-\x7f]?[\x00-\x7f]?(\x51\x3a\+|\x51\x32\x3a)|^\x10[\x14-\x16]\x10[\x15-\x17][\x00-\x7f]?[\x00-\x7f]?[\x00-\x7f]?[\x00-\x7f]?
(notify[\x09-\x0d ]\*[\x09-\x0d ]http/1\.1[\x09-\x0d -~]*ssdp:(alive|byebye)|^m-search[\x09-\x0d ]\*[\x09-\x0d ]http/1\.1[\x09-\x0d -~]*ssdp:discover)[\x00-\x7f]*
(t\x03ni[\x00-\x7f]?[\x01-\x06]?t[\x01-\x05]s[\x0a\x0b](glob|who are you$|query data))[\x00-\x7f]*
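Every regex listed above, like (a$|b) and a$|b, has a ^ or $ anchor inside one branch of an alternation, so that shape may be worth isolating as a possible trigger; whether it is actually the cause is unconfirmed. The following Python sketch is a rough triage heuristic for flagging that shape in other regexes. `anchor_in_alternation` is a hypothetical helper, not the library's parser, and it only understands escapes, character classes and parentheses.

```python
def anchor_in_alternation(pattern: str) -> bool:
    """Heuristic: does an unescaped ^ or $ occur inside a branch of an
    alternation, either directly or inside a nested group?"""
    # One frame per open group: has it seen its own '|', and does it
    # (or any group nested in it) contain an anchor?
    stack = [{"alt": False, "anchor": False}]
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\":
            i += 2  # skip the backslash and the character after it
            continue
        if c == "[":
            # Skip a character class: ^ and $ inside it are not anchors.
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1  # a leading ']' is a literal member of the class
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1  # step past the closing ']'
            continue
        if c == "(":
            stack.append({"alt": False, "anchor": False})
        elif c == ")" and len(stack) > 1:
            group = stack.pop()
            if group["alt"] and group["anchor"]:
                return True
            # An anchor nested in this group still sits in the parent's branch.
            stack[-1]["anchor"] = stack[-1]["anchor"] or group["anchor"]
        elif c == "|":
            stack[-1]["alt"] = True
        elif c in "^$":
            stack[-1]["anchor"] = True
        i += 1
    return any(f["alt"] and f["anchor"] for f in stack)


REPORTED = [
    r"(a$|b)",
    r"a$|b",
    r"(t\x03ni[\x00-\x7f]?[\x01-\x06]?t[\x01-\x05]s[\x0a\x0b](glob|who are you$|query data))[\x00-\x7f]*",
]
CONTROLS = [r"^(a|b)$", r"a\$|b", r"[$^|]x"]
print([anchor_in_alternation(p) for p in REPORTED])
print([anchor_in_alternation(p) for p in CONTROLS])
```

This prints [True, True, True] for the reported regexes and [False, False, False] for the controls, where the anchors are outside the alternation, escaped, or inside a character class.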

This issue may be related to #450, but I am just guessing.

Adda0 (Collaborator) commented Nov 7, 2024

These are some pretty wild and random bugs; thank you for the bug reports. We use an external parser for regexes, which was, and still is, the cause of many a headache for us. For our own regexes, the parser works reasonably well (after some bug fixing), but here it seems to fail miserably (presuming the problem is in the parser, that is). We will gradually fix the issues, but the parser seems too unreliable to use comfortably. We may have to consider either introducing our own parser or using a different parser for regexes.

Adda0 (Collaborator) commented Nov 18, 2024

When #461 is merged, we can test whether #459 fixed this issue.
