Faster #7

benjub · 2022-04-04T18:38:53Z

This could subsume PR #6: in addition to the changes in that PR (described in #4), it achieves a 3- to 4-fold speedup by verifying compressed proofs directly, without first converting them to normal format.
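Decoding the compressed letter string incrementally is what makes it possible to check a compressed proof without expanding it first. Below is a minimal sketch of the decoding rules from the Metamath specification, written as a generator so a verifier can act on each step number as soon as it is decoded. It is not the code from this PR, and the names `decode_compressed` and `SAVE` are hypothetical.

```python
from typing import Iterator

SAVE = -1  # hypothetical marker emitted for 'Z' (save the step just completed)


def decode_compressed(letters: str) -> Iterator[int]:
    """Yield the step numbers of a compressed Metamath proof one at a time.

    Per the Metamath spec: A-T give the last digit of a number (base 20,
    values 1-20), U-Y give the preceding digits (base 5, values 1-5), and
    Z tags the previous step so it can be reused later.
    """
    num = 0
    for ch in letters:
        if 'A' <= ch <= 'T':
            yield 20 * num + (ord(ch) - ord('A') + 1)
            num = 0
        elif 'U' <= ch <= 'Y':
            num = 5 * num + (ord(ch) - ord('U') + 1)
        elif ch == 'Z':
            yield SAVE
        elif ch.isspace():
            continue  # the letter string may be split over several tokens or lines
        else:
            # '?' (unknown step) and any other character are rejected in this sketch.
            raise ValueError(f"unexpected character {ch!r} in compressed proof")
    if num != 0:
        raise ValueError("compressed proof ends in the middle of a number")


if __name__ == "__main__":
    print(list(decode_compressed("ABUAZYT UUA")))  # [1, 2, 21, -1, 120, 121]
```

A verifier would then interpret each number directly: numbers up to the count of mandatory hypotheses refer to those hypotheses, the next ones refer to the labels listed in parentheses before the letters, and the remaining ones refer to previously saved (Z-tagged) steps.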

Also some bug fixes (including the one mentioned in metamath/metamath-exe#81) and code documentation.

This could be used in https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml: thanks to the speedup, there is probably no need to split set.mm into main and mathbox parts, and one should probably pass the file as a normal argument instead of using bash redirections, to avoid the bug described in metamath/metamath-exe#81.
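Passing the database as an ordinary command-line argument can be sketched with `argparse`, keeping stdin as a fallback. The option names and defaults below are assumptions modeled on the commit "add command arguments: optional verbosity level, and file to verify (default=stdin)"; this is not the actual mmverify.py interface.

```python
import argparse
import sys


def parse_args(argv=None):
    """Hypothetical CLI: the database is a positional argument, defaulting to stdin."""
    parser = argparse.ArgumentParser(description="Verify a Metamath database.")
    parser.add_argument(
        "database",
        nargs="?",                     # optional positional argument
        type=argparse.FileType("r"),   # opens the named file for reading
        default=sys.stdin,             # used as-is when no file is given
        help="database file to verify (default: stdin)",
    )
    parser.add_argument(
        "-v", "--verbosity", type=int, default=0,
        help="verbosity level (hypothetical option; default: 0)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.database.name, args.verbosity)
```

With such an interface, a workflow step would call something like `python3 mmverify.py set.mm` rather than `python3 mmverify.py < set.mm`.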


benjub · 2022-05-02T22:06:20Z

Thanks @tirix. It looks like @david-a-wheeler has no time for this now. So I could either modify https://github.com/metamath/set.mm/blob/develop/.github/workflows/verifiers.yml to use my fork of mmverify.py or, maybe better as you suggest, put a copy of this faster mmverify.py in a new folder, say metamath/set.mm/verifiers? Maybe @digama0 has some advice?

david-a-wheeler · 2022-05-03T15:20:47Z

@benjub - How about I give you rights to push changes to this directly? Just give others a chance to review the change first.

david-a-wheeler · 2022-05-03T15:24:38Z

@benjub @tirix - I agree that in the longer term this should probably move to the "Metamath" group instead. In the short term, I've added @benjub as a collaborator so merges can now happen directly.

Sorry I haven't been very available; it's quite out of my control.

benjub · 2022-05-03T17:23:37Z

Thanks @david-a-wheeler. As you propose, I'll merge this PR if and when it is reviewed, perhaps by @tirix or @digama0 when they have time.

david-a-wheeler · 2022-05-03T18:58:52Z

Thank you. And again, my apologies for my delays. I've had a lot of personal issues that have prevented me from giving these issues the time I want to give them.

david-a-wheeler · 2022-05-03T19:01:56Z

Good. I've invited @tirix and @digama0 also as collaborators. Moving this to the Metamath org might be good long-term, but this will at least get things unstuck.

tirix

Ok, took the time to review!
Nothing breaking for sure, just a few questions.

tirix · 2022-05-04T03:06:25Z

mmverify.py

```python
                self.tokbuf.reverse()
        return self.tokbuf.pop()
```


Obviously this is not your code, but the reverse here is a bit surprising at first sight.
I assume it's more efficient to pop the last element than the first one.
A queue might be even more efficient, but there might be overhead to build it.

benjub · May 5, 2022

As I understand it, there is no dedicated "stack" type in Python, but lists can be used as stacks, with append/pop acting at the end of the list (https://docs.python.org/3/tutorial/datastructures.html?highlight=stack#using-lists-as-stacks), hence the need to reverse it. It may not be worth importing Queue for that. Or maybe you had an intermediate solution in mind?
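For reference, the two options discussed can be compared side by side: reversing the split line once and popping from the end (`list.pop()` with no argument is O(1), whereas `list.pop(0)` shifts the whole list), versus `collections.deque`, whose `popleft()` is O(1) without any reversal. (`queue.Queue` is designed for thread synchronization; `deque` is the usual choice for a plain FIFO.) The function names are hypothetical.

```python
from collections import deque
from typing import List


def tokens_via_reverse(line: str) -> List[str]:
    """Reverse once, then pop from the end: list.pop() is O(1)."""
    buf = line.split()
    buf.reverse()
    out = []
    while buf:
        out.append(buf.pop())  # unlike pop(0), this does not shift the list
    return out


def tokens_via_deque(line: str) -> List[str]:
    """No reversal needed: deque.popleft() is O(1)."""
    buf = deque(line.split())
    out = []
    while buf:
        out.append(buf.popleft())
    return out
```

Both return the tokens in source order; which one is faster in practice would have to be timed.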

tirix · May 5, 2022

Looking at it again, if we're looking for performance, I guess the most efficient would probably be to use an iterator. That would be something like:

```python
self.tokbuf = iter(line.split())
return next(self.tokbuf)
```

and catching the StopIteration exception.
This would avoid reversing each line, so I suppose there could be another small performance gain.

No need to do this in this PR, though.
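A sketch of the iterator variant: it uses the built-in `next()` (Python 3 iterators have no `.next()` method) and refills the buffer from the next line on `StopIteration`. The `Tokens` class is hypothetical; it keeps the `tokbuf` name from the snippet above but omits file inclusion and everything else mmverify.py's reader does.

```python
import io
from typing import Iterator, Optional, TextIO


class Tokens:
    """Hypothetical token reader keeping an iterator over the current line."""

    def __init__(self, file: TextIO) -> None:
        self.file = file
        self.tokbuf: Iterator[str] = iter(())  # empty until the first line is read

    def read(self) -> Optional[str]:
        """Return the next token, or None at end of file."""
        while True:
            try:
                return next(self.tokbuf)
            except StopIteration:
                # Current line exhausted (or blank): move on to the next line.
                line = self.file.readline()
                if not line:
                    return None
                self.tokbuf = iter(line.split())


if __name__ == "__main__":
    reader = Tokens(io.StringIO("$c ( ) $.\n\nwff $.\n"))
    tok = reader.read()
    while tok is not None:
        print(tok)
        tok = reader.read()
```

Since tokens are never `None`, `next(self.tokbuf, None)` would be an alternative to the `try`/`except`. Whether either version beats reverse/pop would need to be measured.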


tirix · 2022-05-04T03:23:12Z

mmverify.py

```diff
+        # If one allows Metamath databases with multiple $f-statements for a
+        # given var, then one should use "reversed" in the next two lines and
+        # use 'appendleft' from 'collections.deque' to get the latest f_hyp
+        # corresponding to the given var.
```


Aren't multiple $f statements for the same variable forbidden in the same scope?
In any case, I think the standard (or at least readability) recommends that if a floating variable is declared in several places, it should always be with the same typecode.

benjub · May 5, 2022

You're right. I was trying too hard to justify the uses of "reversed" in the previous versions of the code.

(As for your second sentence, I would disagree for general mm databases. In many textbooks, it happens that, say, x denotes an element of an arbitrary group in one chapter, and then a real number in another chapter.)

digama0 · May 5, 2022

Regardless of our opinions about it, I'm pretty sure that the second sentence is part of the spec. From p. 114:

A hypothesis is a $f or $e statement. The type declared by a $f statement for a given label is global even if the variable is not (e.g., a database may not have wff P in one local scope and class P in another).
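The rule quoted above can be enforced with a single dictionary from variables to typecodes. A minimal sketch, assuming the $f statements are available as hypothetical (label, typecode, variable) triples in database order, with the typecode before the variable as adopted in this PR; it is not mmverify.py's actual code.

```python
from typing import Dict, Iterable, Tuple


def check_global_typecodes(f_hyps: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Check that every variable is always given the same typecode.

    `f_hyps` holds (label, typecode, variable) triples, e.g.
    ("vx", "setvar", "x") for `vx $f setvar x $.`.
    Returns the mapping variable -> typecode, or raises ValueError on a conflict.
    """
    typecode_of: Dict[str, str] = {}
    for label, typecode, var in f_hyps:
        previous = typecode_of.setdefault(var, typecode)  # first declaration wins
        if previous != typecode:
            raise ValueError(
                f"{label}: variable {var} has typecode {typecode}, "
                f"but was earlier declared with typecode {previous}"
            )
    return typecode_of
```

Because the dictionary ignores scoping, the same variable may be redeclared in different blocks as long as the typecode never changes, which matches the "global even if the variable is not" wording.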


benjub · 2022-05-05T16:10:47Z

@tirix: I hope this answers your remarks?
Note that mmverify.py (old and new) allows local $v's and $f's (which to me is fine), and also local $c's (which should be changed to global $c's: I'm preparing a PR for that).

benjub · 2022-05-05T16:20:16Z

Regarding the latest (minor) commit, "improve consistency": in floating hypotheses, the order is first the typecode, then the variable, both in the database (e.g. $f set x $.) and in the internal data structure mmverify.py uses to store them.
This is the opposite of the usual notation in type theory (e.g. x:set), but at least it is internally consistent, since the typecode comes first in all Metamath math expressions.

benjub · 2022-05-05T17:53:23Z

Note: 0999f6a introduced a bug. Investigating...

tirix

Looks good to me.
Interesting to know that len(saved_stmts) is less performant.

Thanks for taking my remarks into account; I'm glad I could contribute a few percent of performance improvement!

benjub · 2022-05-06T22:19:40Z

> Interesting to know that len(saved_stmts) is less performant.

Well, it seemed a good idea since the length of that list changes rarely compared to the number of times len is called on it, and the updates are only of the form "+1". My timings did not show significant differences (although they always gave a small advantage to my method). I see in https://stackoverflow.com/questions/699177/python-do-python-lists-keep-a-count-for-len-or-does-it-count-for-each-call that the length is actually stored (C source code: https://github.com/python/cpython/blob/2.5/Objects/listobject.c#L379). On the other hand, calling len involves a global name lookup, which is expensive in Python compared to using local variables... All in all, is this optimization worth it? For the moment, I'm leaving it in.
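The trade-off can be measured with a micro-benchmark using `timeit`. The sketch below is hypothetical (the function names are made up and it does not reproduce mmverify.py's code); it compares calling `len()` on each read, binding `len` to a local name, and reading a separately maintained counter.

```python
import timeit


def count_with_len(saved_stmts, n_reads):
    """Call len() on every read: the name 'len' is looked up in globals, then builtins."""
    total = 0
    for _ in range(n_reads):
        total += len(saved_stmts)
    return total


def count_with_local_len(saved_stmts, n_reads, len=len):
    """Same, but 'len' is bound as a local via a default argument (a common micro-optimization)."""
    total = 0
    for _ in range(n_reads):
        total += len(saved_stmts)
    return total


def count_with_counter(saved_stmts, n_reads):
    """Read a separately maintained counter (real code must increment it on every append)."""
    n_saved = len(saved_stmts)
    total = 0
    for _ in range(n_reads):
        total += n_saved
    return total


if __name__ == "__main__":
    stmts = list(range(1000))
    for f in (count_with_len, count_with_local_len, count_with_counter):
        seconds = timeit.timeit(lambda: f(stmts, 10_000), number=100)
        print(f"{f.__name__}: {seconds:.3f} s")
```

Results depend on the Python version and machine, so none are claimed here; the counter variant also carries the risk of drifting out of sync with the list.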

benjub added 30 commits March 30, 2022 19:14

code formatting (autopep)

caa3ac0

autopep8 --aggressive --aggressive

249282b

remove three unused variables and two superfluous 'reversed'

3e59796

streamline function 'readc'

f1bf63f

two comment typos

546a092

variable name consistency; logging message consistency

b461763

avoid duplicate fetching

92fbdc9

fix bug when begin_label is an -statement

491e277

apply_subst is not a method

889209c

refactor to avoid superfluous calls to make_assertion by decompress_proof

5d3c722

add function equal_subst; upgrade code from the deprecated optparse to argparse

562f671

add command arguments: optional verbosity level, and file to verify (default=stdin)

32c0247

update documentation

8de458b

remove need for deques (unchanged behavior provided -statements are not overridden in the mm database)

5a23858

minor

7c7373b

minor; remove experimental equal_subst since no gain; add logfile argument

06808c6

document code; some streamlining

187ce76

typo

1b01fa2

add comments; update add_f

9ea0d66

fix comment

262c5ce

fix comment

836e3a2

more precise error messages and comments; add entry/exit log messages similar to metamath.c's

1bf3408

remove two uses of 'reversed'; small speedup

0fd9779

check compressed proofs on the fly; ~55% speedup

ffb533b

formatting

a5f0abb

update code from os.path to pathlib; fix a bug when the database contains an inclusion statement to itself

73bd241

variable renaming for clarity

707a92e

fix bug when file ends on '$]'

01817a8

fix intermediate code mistake (empty list instead of empty set) which apparently had no consequences...

6c8ca06

fix bug that caused erroneous raised exception message

e2eab24

tirix reviewed May 4, 2022


benjub added 5 commits May 5, 2022 17:26

find_vars: return a set and use set comprehension; 2% faster on set.mm

96e6df1

replace .intersection() with set comprehension tricks; 1% faster on set.mm but ymmv

0b54499

remove some confusing comments; variable naming consistency

84cf129

variable naming consistency; consistency in the order of the components in the couples (typecode, var)

277fdd7

checking that enough subproofs have been saved

0999f6a

improve consistency: first: typecode, second: variable

d7d2985

benjub requested a review from tirix May 5, 2022 16:20

benjub added 3 commits May 5, 2022 18:32

clarify error message

72c2b50

raise if stack empty at end of proof; clear error messages

f1ea8f5

review and clarify all raised messages; aggressive autopep8

a96ed02

benjub added 3 commits May 5, 2022 20:37

fix bug and better logging

1e43d5f

improve logging

4713831

typo

ba7dd37

tirix approved these changes May 6, 2022


benjub merged commit f8a00b4 into david-a-wheeler:master May 6, 2022

benjub deleted the faster branch May 6, 2022 23:17

This was referenced May 11, 2022

Some unused variables #4

Closed

Frame in make_assertion appears to be unused. #3

Closed

What does stat_type do? #1

Closed
