Faster #7
…ot overridden in the mm database)
… similar to metamath.c's
…ains an inclusion statement to itself
… apparently had no consequences...
Thanks @tirix. It looks like @david-a-wheeler has no time for this now. So maybe I can either modify https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml to use my fork of mmverify.py, or maybe better as you suggest, put a copy of this faster mmverify.py in a new folder, say |
@benjub - How about I give you rights to push changes to this directly? Just give others a chance to review the change first.
Thanks @david-a-wheeler. As you propose, I'll merge this PR once it has been reviewed, maybe by @tirix or @digama0 if/when they have time.
Thank you. And again, my apologies for my delays. I've had a lot of personal issues that have prevented me from giving these issues the time I want to give them.
Ok, took the time to review!
Nothing breaking for sure, just a few questions.
self.tokbuf.reverse()
return self.tokbuf.pop()
Obviously this is not from your code, but the reverse here is a bit surprising at first sight. I assume it's more efficient to pop the last element than the first one. A queue might be even more efficient, but there might be overhead to build it.
As I understand it, there is no dedicated "stack" type in Python, but lists can be used as stacks, with pop/append happening at the end of the list (https://docs.python.org/3/tutorial/datastructures.html?highlight=stack#using-lists-as-stacks), hence the need to reverse it. It may not be worth importing Queue for that. Or maybe you had an intermediate solution in mind?
Looking at it again, if we're looking for performance, I guess the most efficient approach would probably be to use an iterator. That would be something like:
self.tokbuf = iter(line.split())
return next(self.tokbuf)
and testing for the StopIteration exception.
This would avoid reversing each line, so I suppose there could be another small performance gain.
But no need to do this in this PR, though.
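To make the three options from this thread concrete, here is a minimal sketch comparing them side by side. The class names and the drain helper are hypothetical and are not taken from mmverify.py; only the tokbuf attribute name comes from the snippet under review. Each reader returns None when the line is exhausted, which is an assumption made for the sake of the illustration.

```python
from collections import deque


class ListStackTokens:
    """Reverse the split line once, then pop tokens from the end of the list."""

    def __init__(self, line):
        self.tokbuf = line.split()
        self.tokbuf.reverse()

    def read(self):
        # list.pop() removes the last element in constant time.
        return self.tokbuf.pop() if self.tokbuf else None


class DequeTokens:
    """Keep the original order and pop from the left of a deque."""

    def __init__(self, line):
        self.tokbuf = deque(line.split())

    def read(self):
        # deque.popleft() is constant time, unlike list.pop(0).
        return self.tokbuf.popleft() if self.tokbuf else None


class IterTokens:
    """Walk the split line with an iterator; no reversal and no popping."""

    def __init__(self, line):
        self.tokbuf = iter(line.split())

    def read(self):
        try:
            return next(self.tokbuf)
        except StopIteration:
            return None


def drain(reader):
    """Collect every token a reader yields, in the order it yields them."""
    tokens = []
    tok = reader.read()
    while tok is not None:
        tokens.append(tok)
        tok = reader.read()
    return tokens
```

All three readers hand back the tokens in source order; they differ only in how much work is done per line (a reversal, a deque construction, or neither). Whether the difference is measurable on a real database would need timing.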
mmverify.py
Outdated
# If one allows Metamath databases with multiple $f-statements for a
# given var, then one should use "reversed" in the next two lines and
# use 'appendleft' from 'collections.deque' to get the latest f_hyp
# corresponding to the given var.
Aren't multiple $f statements for the same variable forbidden in the same scope?
In any case I think the standard, or readability, recommends that in case a floating variable is declared at several places, it shall always be with the same typecode.
You're right. I was trying too hard to justify the uses of reversed in the previous versions of the code.
(As for your second sentence, I would disagree for general mm databases. In many textbooks, it happens that, say, x denotes an element of an arbitrary group in one chapter, and then a real number in another chapter.)
Regardless of our opinions toward it, I'm pretty sure that the second sentence is part of the spec. From p.114:
A hypothesis is a $f or $e statement. The type declared by a $f statement for a given label is global even if the variable is not (e.g., a database may not have wff P in one local scope and class P in another).
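The rule quoted above could be enforced with a check along the following lines. This is only a sketch: the function name, the exception class, and the (label, typecode, var) tuple layout are hypothetical and do not reflect how mmverify.py stores floating hypotheses.

```python
class TypecodeConflict(Exception):
    """Raised when a variable is given two different typecodes by $f statements."""


def check_floating_typecodes(f_statements):
    """Check that each variable keeps one typecode across all $f statements.

    f_statements is an iterable of (label, typecode, var) tuples, listed in
    database order and possibly coming from different scopes (assumed layout).
    Returns the resulting mapping from variable to typecode.
    """
    seen = {}
    for label, typecode, var in f_statements:
        # setdefault records the first typecode seen and returns it afterwards.
        previous = seen.setdefault(var, typecode)
        if previous != typecode:
            raise TypecodeConflict(
                f"{label}: variable {var} declared as {typecode}, "
                f"but earlier as {previous}"
            )
    return seen
```

With this check, redeclaring wff P in a later scope passes, while wff P followed by class P raises TypecodeConflict.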
…ts in the couples (typecode, var)
@tirix: I hope this answers your remarks?
Regarding the latest (minor) commit: improve consistency: in floating hypotheses, the order is: first typecode, second variable, both in the database (e.g. |
Note: 0999f6a introduced a bug. Investigating...
Looks good to me.
Interesting to know that len(saved_stmts) is less performant.
Thanks for taking my remarks into account, I'm glad I could contribute to a performance improvement of a few percent!
Well, it seemed a good idea since the length of that array is updated rarely compared to the number of times it is called, and the updates are only of the form "+1". The timings I did did not show significant differences (although they always gave a small advantage to my method). I'm seeing in https://stackoverflow.com/questions/699177/python-do-python-lists-keep-a-count-for-len-or-does-it-count-for-each-call that the length is actually stored (C source code: https://github.com/python/cpython/blob/2.5/Objects/listobject.c#L379). On the other hand, calling len makes a global lookup, which is expensive in Python (compared to using local variables)... All in all, is this optimization worth it? For the moment, I leave it here.
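For anyone who wants to repeat the timing, a minimal sketch of such a comparison follows. The function names and the list size are hypothetical, and the loops are a stand-in for the real verifier code, so the numbers only hint at the relative cost of calling len (a name resolved outside the local scope on each call) versus reading a local variable.

```python
import timeit


def count_with_len(items, repeats):
    """Call len(items) on every iteration."""
    total = 0
    for _ in range(repeats):
        total += len(items)
    return total


def count_with_cached(items, repeats):
    """Read the length once and reuse a local variable afterwards."""
    n = len(items)
    total = 0
    for _ in range(repeats):
        total += n
    return total


def compare(repeats=100_000, number=5):
    """Return (seconds with len, seconds with cached length)."""
    items = list(range(1000))
    t_len = timeit.timeit(lambda: count_with_len(items, repeats), number=number)
    t_cached = timeit.timeit(lambda: count_with_cached(items, repeats), number=number)
    return t_len, t_cached


if __name__ == "__main__":
    t_len, t_cached = compare()
    print(f"len() each time: {t_len:.4f}s, cached length: {t_cached:.4f}s")
```

Results depend on the Python version and machine, so this is best run several times before drawing a conclusion.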
This could subsume PR #6. In addition to it (described in #4), it achieves a 3 to 4 times speedup by verifying compressed proofs directly without converting them to normal format.
Also some bug fixes (including the one mentioned in metamath/metamath-exe#81) and code documentation.
This could be used in https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml: probably no need to split set.mm into main/mathbox thanks to speedup, and also one should probably use normal arguments instead of bash redirections to avoid the bug described in metamath/metamath-exe#81.
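As background for the compressed-proof speedup, the letter block of a compressed proof encodes a sequence of integers: A to T are the final base-20 digit of a number, U to Y are leading base-5 digits, and Z marks the preceding step for later reuse. The sketch below decodes that letter block only; the function name is hypothetical, the -1 marker for Z is a convention chosen for this illustration, and the code is not the implementation from this PR.

```python
def decode_compressed(letters):
    """Decode the letter block of a compressed proof into a list of integers.

    Numbers are returned zero-based (A gives 0, T gives 19, UA gives 20).
    Each Z is returned as -1, meaning "tag the previous step for reuse".
    """
    result = []
    cur = 0
    for ch in letters:
        if ch == "Z":
            result.append(-1)
        elif "A" <= ch <= "T":
            # Final digit: closes the current number.
            cur = 20 * cur + (ord(ch) - ord("A") + 1)
            result.append(cur - 1)
            cur = 0
        elif "U" <= ch <= "Y":
            # Leading digit: keep accumulating.
            cur = 5 * cur + (ord(ch) - ord("U") + 1)
        else:
            raise ValueError(f"invalid character in compressed proof: {ch!r}")
    if cur != 0:
        raise ValueError("compressed proof ends in the middle of a number")
    return result
```

A verifier working directly on the compressed form would interpret each integer as a reference to a mandatory hypothesis, a label from the parenthesized list, or a previously tagged step, rather than first expanding the proof into a list of labels.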