Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster #7

Merged
merged 52 commits into from
May 6, 2022
Merged

Faster #7

merged 52 commits into from
May 6, 2022

Conversation

benjub
Copy link
Collaborator

@benjub benjub commented Apr 4, 2022

This could subsume PR #6. In addition to it (described in #4), it achieves a 3 to 4 times speedup by verifying compressed proofs directly without converting them to normal format.

Also some bug fixes (including the one mentionned in metamath/metamath-exe#81) and code documentation.

This could be used in https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml: probably no need to split set.mm into main/mathbox thanks to speedup, and also one should probably use normal arguments instead of bash redirections to avoid the bug described in metamath/metamath-exe#81.

benjub added 30 commits March 30, 2022 19:14
@benjub
Copy link
Collaborator Author

benjub commented May 2, 2022

Thanks @tirix. It looks like @david-a-wheeler has no time for this now. So maybe I can either modify https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml to use my fork of mmverify.py, or maybe better as you suggest, put a copy of this faster mmverify.py in a new folder, say metamath/set.mm/verifiers ? Maybe @digama0 has some advice ?

@david-a-wheeler
Copy link
Owner

@benjub - How about I give you rights to push changes to this directly? Just give others a chance to review the change first.

@david-a-wheeler
Copy link
Owner

@benjub @tirix - I agree that in the longer term this should probably move to the "Metamath" group instead. In the short term, I've added @benjub as a collaborator so merges can now happen directly.

Sorry I haven't been very available, it's quite out of my control.

@benjub
Copy link
Collaborator Author

benjub commented May 3, 2022

Thanks @david-a-wheeler. As you propose, I'll merge this PR after and if this is reviewed. Maybe by @tirix or @digama0 if/when they have time.

@david-a-wheeler
Copy link
Owner

Thank you. And again, my apologies for my delays. I've had a lot of personal issues that have prevented me from giving these issues the time I want to give them.

@david-a-wheeler
Copy link
Owner

Good. I've invited @tirix and @digama0 also as collaborators. Moving this to the Metamath org might be good long-term, but this will at least get things unstuck.

Copy link
Collaborator

@tirix tirix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, took the time to review!
Nothing breaking for sure, just a few questions.

Comment on lines 98 to 99
self.tokbuf.reverse()
return self.tokbuf.pop()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously this is not from your code, but the reverse here is a bit surprising at first sight.
I assume it's more efficient to pop the last element than the first one.
A queue might be even more efficient, but there might be overhead to build it.

Copy link
Collaborator Author

@benjub benjub May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, there is no "stack" in Python, but lists can be used as stacks and pop/append happen at the end of the list (https://docs.python.org/3/tutorial/datastructures.html?highlight=stack#using-lists-as-stacks), hence the need to reverse it. It may not be worth importing Queue for that. Or maybe you had an intermediate solution in mind ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it again, if we're looking for performance, I guess the most efficient would probably to use an iterator. That would be something like:

   self.tokbuf = line.split()
   return self.tokbuf.next()

and testing for the StopIteration exception.
This shall avoid reverting each line, I suppose there could be another small performance gain.

But no need to do this in this PR, though.

mmverify.py Outdated Show resolved Hide resolved
mmverify.py Outdated Show resolved Hide resolved
mmverify.py Outdated
Comment on lines 277 to 280
# If one allows Metamath databases with multiple $f-statements for a
# given var, then one should use "reversed" in the next two lines and
# use 'appendleft' from 'collections.deque' to get the latest f_hyp
# corresponding to the given var.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't multiple $f statements for the same variable forbidden in the same scope?
In any case I think the standard, or readability, recommends that in case a floating variable is declared at several places, it shall always be with the same typecode.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was trying too hard to justify the reverseds in the previous versions of the code.

(As for your second sentence, I would disagree for general mm databases. In many textbooks, it happens that, say, x denotes an element of an arbitrary group in one chapter, and then a real number in another chapter.)

Copy link
Collaborator

@digama0 digama0 May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of our opinions toward it, I'm pretty sure that the second sentence is part of the spec. From p.114:

A hypothesis is a $f or $e statement. The type declared by a $f statement for a given label is global even if the variable is not (e.g., a database may not have wff P in one local scope and class P in another).

mmverify.py Outdated Show resolved Hide resolved
mmverify.py Outdated Show resolved Hide resolved
@benjub
Copy link
Collaborator Author

benjub commented May 5, 2022

@tirix : I hope this answers your remarks ?
Note that mmverify.py (old and new) allows local $v's and $f's (which to me is fine), and also local $c's (which should be changed to global $c's: I'm preparing a PR for that).

@benjub
Copy link
Collaborator Author

benjub commented May 5, 2022

Regarding the latest (minor) commit: improve consistency: in floating hypotheses, the order is: first typecode, second variable, both in the database (e.g. $f set x $.) and in the internal data structure used by mmverify.py to store it.
This is opposite to the usual writing in type theory (e.g. x:set) but at least this is consistent internally, since the typecode comes first in all Metamath math expressions.

@benjub benjub requested a review from tirix May 5, 2022 16:20
@benjub
Copy link
Collaborator Author

benjub commented May 5, 2022

Note: 0999f6a introduced a bug. Investigating...

Copy link
Collaborator

@tirix tirix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Interesting to know that len(saved_stmts) is less performant.

Thanks for taking my remarks into account, I'm glad I could contribute to a few percents performance improvement!

@benjub
Copy link
Collaborator Author

benjub commented May 6, 2022

Interesting to know that len(saved_stmts) is less performant.

Well, it seemed a good idea since the length of that array is updated rarely compared to the number of times it is called, and the updates are only of the form "+1". The timings I did did not show significant differences (although they always gave a small advantage to my method). I'm seeing in https://stackoverflow.com/questions/699177/python-do-python-lists-keep-a-count-for-len-or-does-it-count-for-each-call that the length is actually stored (C source code: https://github.com/python/cpython/blob/2.5/Objects/listobject.c#L379). On the other hand, calling len makes a global lookup, which is expensive in Python (compared to using local variables)... All in all, is this optimization worth it ? For the moment, I leave it here.

@benjub benjub merged commit f8a00b4 into david-a-wheeler:master May 6, 2022
@benjub benjub deleted the faster branch May 6, 2022 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants