-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track line and column data incrementally when parsing strings #71
base: master
Are you sure you want to change the base?
Conversation
column: int | ||
|
||
|
||
@dataclass(frozen=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made Result
and Position
frozen dataclasses, because it seems there's no reason to treat them mutably, and they ought to be hashable if they can be.
@@ -516,6 +513,17 @@ def fail(expected: str) -> Parser: | |||
return Parser(lambda _, index: Result.failure(index, expected)) | |||
|
|||
|
|||
def make_index_update(consumed: str) -> Callable[[Position], Position]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is curried to avoid recomputing the count
and rfind
methods every single time someone uses the string
parser
Codecov ReportBase: 94.44% // Head: 94.43% // Decreases project coverage by
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## master #71 +/- ##
==========================================
- Coverage 94.44% 94.43% -0.02%
==========================================
Files 9 9
Lines 1027 1025 -2
==========================================
- Hits 970 968 -2
Misses 57 57
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Thanks for the PR, this looks like it could be a very valuable contribution for some users. I haven't had time to look at it in depth, but it looks like it is significantly backwards incompatible. The Would there be any way to get this PR to work without these backwards incompatibilities? For example, could you make your If not, that would probably be a stow stopper. A breaking change of this size would require a new major version at least, or a fork. |
Ah, yeah I see the problem. I think it should be possible to keep this backwards compatible, let me see what I can do |
One idea for better backwards compat: Store the three-tuple on a different attribute, e.g. |
This PR replaces the integer
index
into a stream with a triple of integers representingoffset
,line
andcolumn
information.offset
is whatindex
originally was: how many individual elements of the stream have been consumed.When parser input is a string
line
andcolumn
are updated incrementally with each character of the input string.When parser input is not a string
line
andcolumn
are both set to-1
and ignored.This adds a huge speedup on large inputs to parsers which make frequent use of the
line_info
parser. For reference, using the example JSON parser updated to track line/col data on each node, a 100k line JSON file goes from taking ~210 seconds to ~7 seconds on my machine. We've gone from quadratic time position tracking to linear time position tracking.I added no new tests, since this PR does not add a new feature, only changes the implementation of an existing one. I'm happy to add more tests if needed though.
Let me know if there's anything I can do to improve this!