Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric looking token fix #56

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nanobowers
Copy link
Contributor

Fixes #55 for numeric looking tokens.

I have some concerns over performance (relative to the previous implementation), as I constructed a String to capture relevant part of the chars iterator. This can probably be improved, but it wasnt obvious to me that I had the right data to do it. Also it is checking i32 || f64 conversion to str as a means of deciding. It's possible a custom int/float lexer could shortcut by identifying the more trivial cases before resorting to the conversion routines.

Some tests were added with positive and integers, numeric floats and exponential floats, plus a token that starts with a number and has letters.

@dan-fritchman
Copy link
Owner

Awesome!

Re performance - I imagine you're presumably worried about the string allocations right?

The prevailing wisdom - particularly that used by rustc itself (as of last time I looked) - is something like:

  • Read the whole input into one big string at the beginning
  • Never allocate again, just make references into that big input string

I'll check on whether that remains the case.
Lef21's parser of course doesn't have that performance focus in general, at least not yet. We allocate all kinda stuff, e.g. the String layer-names in every pin and obstruction. We accordingly don't have great performance tracking or regressions (again, at least not yet). There are plenty of intermediate answers, such as pre-allocating a big bucket of strings and pulling from there.

/// Lex a number, but if not truly a number, return token with ttype [TokenType::Name]
fn lex_number(&mut self, leadchar: char) -> LefResult<Option<Token>> {
let mut nstring = String::new();
nstring.push(leadchar);
Copy link
Contributor Author

@nanobowers nanobowers Aug 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numeric conversion routines are what they are but I am still a bit annoyed by the need to allocate a new String, prepend the the peeked char (line 210) and then tack on a str slice from self.chars.as_str().
For some reason that doesn't make sense to me - the .chars iterator is advanced one position so that I couldn't just grab a simple slice of it (without missing the necessary first character).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lef lexer issues with semi-numeric values
2 participants