Hard crash parsing certain markdown files #59

Open
paul-gauthier opened this issue Aug 19, 2023 · 1 comment

paul-gauthier commented Aug 19, 2023

I have been using tree_sitter_languages to parse Markdown. Some of my .md files cause a hard crash in the parser.parse() call:

Assertion failed: (i == length), function deserialize, file scanner.cc, line 79.
Abort trap: 6

I have isolated a sample which can trigger the crash. I binary searched the file to find a single offending line. Then, I gradually replaced all the characters with X until the crash went away.

Some notes:

  • There is a single asterisk * remaining. The parser no longer crashes if you remove it.
  • There are still underscores _ remaining. Turning them into Xs resolves the crash.
  • Reducing the line length or splitting the text across multiple lines also seems to avoid the crash.
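For anyone who wants to repeat this kind of reduction on their own files, here is a minimal sketch of the two steps described above: bisecting to one offending line, then masking characters with X while the crash persists. It is not the exact script used for this report. The `crashes` predicate is hypothetical: it is assumed to return True when the given text triggers the crash, and in practice it would have to run the parser in a separate process, because the abort kills the interpreter. The bisection also assumes that exactly one line triggers the crash on its own.

```python
from typing import Callable, List


def find_offending_line(lines: List[str], crashes: Callable[[str], bool]) -> str:
    """Bisect `lines`, assuming exactly one line triggers the crash by itself."""
    lo, hi = 0, len(lines)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes("\n".join(lines[lo:mid])):
            hi = mid  # the offending line is in the first half
        else:
            lo = mid  # otherwise assume it is in the second half
    return lines[lo]


def mask_characters(line: str, crashes: Callable[[str], bool]) -> str:
    """Replace characters with X one at a time, keeping only those the crash needs."""
    chars = list(line)
    for i, ch in enumerate(chars):
        if ch == " " or ch == "X":
            continue  # keep word boundaries; X is already masked
        chars[i] = "X"
        if not crashes("".join(chars)):
            chars[i] = ch  # the crash went away, so this character matters
    return "".join(chars)
```

With a real predicate, `mask_characters(find_offending_line(text.splitlines(), crashes), crashes)` should yield a line similar in spirit to the sample in this issue, where only the characters that matter (here, `*` and `_`) survive.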

I have filed this issue with both of these projects, as I am not sure which is most likely to be able to resolve it:


code = '''
XXXXXX_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX *X XXXXXXXX XXXXXXX X XXXX X XXXXX XXXXXXX_XXXXX XXXXXXXXXXX X XXXX X XXXXX XXXXXXX_XXXXXXXXX XXXXX X XXXX X XXXXX XX_XXXXX XXXX X XXXX X XXXXX XXXXXX XXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXX X XXXX X XXXXX XXXXXXXX_XX_XXXXXXX XXXX X XXXX X XXXXX XX_XXXXXXXXX XXXX X XXXX X XXXXX XXX_XXXXXXXXX XXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXX XXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXX XXXXXXXXXX X XXXX X XXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXX X XXXXX XXXXXXX_XXXXXXX_XXXXXXXX_XXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXX X XXXX X XXXXX XXXX_XXXXX_XXX_XXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXX X XXXX X XXXXX XXXX_XXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXX_XXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXXX XXXXXXXXXXXXX X XXXX X XXXXX XXXX_XXXXXXXX XXXXXXXXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXX X XXXX X XXXXX XXXXXXXXXX XXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXX_XXXXXXXXXXX XXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXXXXXXXX XXXXXXXXXXXXXXXX X XXXX X XXXXX XXXXXXX_XXXXX_XXX_XXXXX XXX X XXXX X XXXXX XXXXXX_XXXXXX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXX_XXXX_XX_XXXXXXX XXXXXXXXXXXX X XXXX X XXXXX XXXXXXXX_XXXXXXX XXX X XXXXXXXXXXXX XXXX X XXXX X XXXXX XXXXXXXXX XXX X XXXXXXXXXXXX XXXX X XXXX X XXXXX XXXXXXXX XXX X XXX XXXXXX_XXXXXXXX XXXX X XXXXXX XXX_XXXX XXXXXXXXXXXX XXXXX X XXXX X XXXX
'''

import tree_sitter_languages

print(tree_sitter_languages.__version__) # 1.7.0

parser = tree_sitter_languages.get_parser('markdown')
parser.parse(bytes(code, "utf8"))
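Because the failure is an abort rather than a Python exception, a try/except around parser.parse() cannot catch it. One possible workaround while the bug is open is to run the parse in a child process and inspect its exit status. The sketch below is an illustration, not something provided by tree_sitter_languages: `CHILD_SCRIPT` and `crashes_in_subprocess` are hypothetical names, and the `returncode < 0` check assumes a POSIX system, where a negative value means the child was killed by a signal such as SIGABRT.

```python
import subprocess
import sys

# Hypothetical child script: reads bytes from stdin and parses them as Markdown,
# using the same tree_sitter_languages calls as the reproduction above.
CHILD_SCRIPT = """
import sys
import tree_sitter_languages

parser = tree_sitter_languages.get_parser('markdown')
parser.parse(sys.stdin.buffer.read())
"""


def crashes_in_subprocess(data: bytes, child_script: str = CHILD_SCRIPT) -> bool:
    """Return True if the child process was killed by a signal (POSIX only)."""
    result = subprocess.run(
        [sys.executable, "-c", child_script],
        input=data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code is the negated signal number (e.g. -6 for SIGABRT).
    return result.returncode < 0
```

Calling `crashes_in_subprocess(bytes(code, "utf8"))` would then report the crash without taking down the calling process. A missing package in the child shows up as a positive exit code, so it is not mistaken for the abort.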


aguynamedben commented Nov 16, 2023

The Emacs folks debugging this in emacs-tree-sitter/elisp-tree-sitter#253 have 3-4 people reporting that the crash seems to occur in Markdown files that contain long or wide tables. Here's a screenshot of the file that crashes for me.

[image: screenshot of the Markdown file that triggers the crash]

There are some other example Markdown files in that issue that might be helpful for debugging this. I believe the root cause is in this library.

(btw, thank you for providing this grammar, it works great most of the time and I love it!) 🙏
