Hard crash parsing certain markdown files #59

Open
paul-gauthier opened this issue Aug 19, 2023 · 1 comment

paul-gauthier commented Aug 19, 2023

I have been using tree_sitter_languages to parse markdown. Some of my md files are causing a hard crash of the parser.parse() call:

Assertion failed: (i == length), function deserialize, file scanner.cc, line 79.
Abort trap: 6

I have isolated a sample which can trigger the crash. I binary searched the file to find a single offending line. Then, I gradually replaced all the characters with X until the crash went away.

Some notes:

  • There is a single asterisk * remaining. The parser no longer crashes if you remove it.
  • There are still underscores _ remaining. Turning them into Xs resolves the crash.
  • Reducing the line length, or splitting the text onto multiple lines also seems to avoid the crash.

I have filed this issue with both of these projects, as I am not sure which is most likely to be able to resolve it:


code = '''

'''

import tree_sitter_languages

print(tree_sitter_languages.__version__) # 1.7.0

parser = tree_sitter_languages.get_parser('markdown')
parser.parse(bytes(code, "utf8"))
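
The reduction described above (binary search for the offending line, then replacing characters with X while the crash persists) could be automated roughly as follows. This is only a sketch under some assumptions: the helper names `crashes`, `find_offending_line` and `mask_chars` are hypothetical, a single line is assumed to be enough to trigger the crash, and the parse runs in a child process because the abort would otherwise kill the script doing the search. The negative return code check is POSIX-specific.

```python
import subprocess
import sys

# Child script: parse stdin with the same calls used in the repro above.
CHILD = """
import sys
import tree_sitter_languages
parser = tree_sitter_languages.get_parser('markdown')
parser.parse(sys.stdin.buffer.read())
"""


def crashes(text):
    """Return True if parsing `text` kills the child process with a signal.

    On POSIX, subprocess reports death by signal as a negative return code
    (for example -6 for SIGABRT). This check is an assumption about the
    platform, not something the parser library provides.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text.encode("utf8"),
        capture_output=True,
    )
    return proc.returncode < 0


def find_offending_line(lines, is_bad):
    """Binary search for one line that makes `is_bad` true.

    Assumes exactly one line is responsible and that it still triggers
    the failure when surrounded by fewer lines.
    """
    lo, hi = 0, len(lines)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad("\n".join(lines[lo:mid])):
            hi = mid  # failure is in the first half
        else:
            lo = mid  # otherwise assume it is in the second half
    return lines[lo]


def mask_chars(line, is_bad):
    """Replace each character with X, keeping only those the failure needs."""
    chars = list(line)
    for i, ch in enumerate(chars):
        if ch == "X" or ch.isspace():
            continue
        chars[i] = "X"
        if not is_bad("".join(chars)):
            chars[i] = ch  # this character matters, restore it
    return "".join(chars)
```

With the real parser installed, something like `mask_chars(find_offending_line(text.splitlines(), crashes), crashes)` should yield a minimized line. Because whitespace is left alone, the line length is preserved, which matters here since shortening the line also seems to avoid the crash.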

aguynamedben commented Nov 16, 2023

The Emacs folks debugging this in emacs-tree-sitter/elisp-tree-sitter#253 have 3-4 people reporting that the crash seems to occur in Markdown files that have long/wide tables. Here's a screenshot of the file that crashes for me.

image

There are some other example Markdown files in that issue that might be helpful for debugging this. I believe the root cause is in this library.

(btw, thank you for providing this grammar, it works great most of the time and I love it!) 🙏
