Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sim642/ppx_deriving_hash #542

Merged
merged 23 commits into from
Jan 28, 2022
Merged

Add sim642/ppx_deriving_hash #542

merged 23 commits into from
Jan 28, 2022

Conversation

sim642
Copy link
Member

@sim642 sim642 commented Jan 18, 2022

Part of #31.

A while back I toyed around with ppx derivers and, as mentioned in the above issue, implemented https://github.com/sim642/ppx_deriving_hash to do [@@deriving hash]. This is simpler and without dependencies, unlike ppx_hash, which require's Jane Street's Base to work.

This revealed #541, which has nothing to do with hashing. It's just that previously by sheer luck, the tuples with such different precisions had different hashes, thus they never fell into the same hashtable buckets to need equal comparison.

Another open question is, what are the best hash functions to derive for primitives, products (tuples, records) and sums (variants). Our manual implementations include both Hashtbl.hash based things and arbitrary calculations with magic constants.

TODO

  • Benchmark with derived hash implementations.

@sim642 sim642 added cleanup Refactoring, clean-up type-safety Type-safety improvements labels Jan 18, 2022
@sim642 sim642 added pr-dependency Depends or builds on another PR, which should be merged before and removed pr-dependency Depends or builds on another PR, which should be merged before labels Jan 19, 2022
@sim642
Copy link
Member Author

sim642 commented Jan 21, 2022

I ran it before vs after on sv-benchmarks ConcurrencySafety and SoftwareSystems and to my surprise, it even got ~5% faster:
image
Full results table.

I haven't put any thought into optimizing ppx_deriving_hash, so I didn't expect this. If I were to guess, this improvement is just due to replacing some Hashtbl.hash with a proper implementation. IntDomain had a many of those accidentally overridden via include Std, so that might make an impact with integers being in all contexts of all nodes.

Copy link
Member

@michael-schwarz michael-schwarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! My only question is if we want to fork the ppx_deriver under our organization, such that all developers can potentially work on it?

@sim642
Copy link
Member Author

sim642 commented Jan 24, 2022

My only question is if we want to fork the ppx_deriver under our organization, such that all developers can potentially work on it>

I thought about that, but all the other ppx forks have been under personal accounts as well:

Given the obscurity of ppx deriver internals, I'm doubtful that anyone else would currently work on it anyway and deal with its publishing (which I'm planning on doing soon to get rid of the pin). If there's ever real need, then it's still possible to transfer or fork it under the organization.

@sim642
Copy link
Member Author

sim642 commented Jan 26, 2022

The package has been accepted to opam: ocaml/opam-repository#20555. Once it becomes available through the website, I will remove the pin, fix conflicts and do the merge.

@sim642 sim642 self-assigned this Jan 26, 2022
@sim642
Copy link
Member Author

sim642 commented Jan 27, 2022

There's some very weird crash in incremental tests now:

Fatal error: exception Invalid_argument("List.iter2")
Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
Called from UpdateCil.update_ids.reset_fun in file "src/incremental/updateCil.ml", line 48, characters 4-74
Called from Stdlib__List.iter in file "list.ml", line 110, characters 12-15
Called from UpdateCil.update_ids in file "src/incremental/updateCil.ml", line 126, characters 2-43
Called from Maingoblint.diff_and_rename in file "src/maingoblint.ml", line 454, characters 22-92
Called from Maingoblint.main in file "src/maingoblint.ml", line 506, characters 110-130
Called from Stdlib.at_exit.new_exit in file "stdlib.ml", line 560, characters 59-63
Called from Stdlib.do_at_exit in file "stdlib.ml" (inlined), line 566, characters 20-61
Called from Std_exit in file "std_exit.ml", line 18, characters 8-20

It's somehow related to Node.hash. The derived version causes the crash, but the slightly modified version does not:

let hash = function
  (* crashes *)
  (* | Statement x0 -> (31 * 0) + (CilType.Stmt.hash x0) *)
  (* does not crash *)
  | Statement x0 -> 1 + (CilType.Stmt.hash x0)
  | FunctionEntry x0 -> (31 * 1) + (CilType.Fundec.hash x0)
  | Function x0 -> (31 * 2) + (CilType.Fundec.hash x0)

Is CompareAST, CompareCFG or UpdateCil somehow assuming that the hash of a statement node does not equal the hash of the statement itself? @jerhard @stilscher I'm extremely confused how this could cause List.iter2 to go wrong.

@sim642
Copy link
Member Author

sim642 commented Jan 27, 2022

It's also somehow related to includes/pthread.c. If that is excluded, or even if just init_routine() is removed, the crash disappears. But what's confusing is that the crash happens during compare/update of main, not pthread_once.

@sim642
Copy link
Member Author

sim642 commented Jan 28, 2022

The opam package is now published: https://opam.ocaml.org/packages/ppx_deriving_hash/. I've removed the opam pin and am merging this.

@sim642 sim642 merged commit 15ea808 into master Jan 28, 2022
@sim642 sim642 deleted the ppx_deriving_hash branch January 28, 2022 12:19
sim642 added a commit that referenced this pull request Jan 28, 2022
@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Refactoring, clean-up type-safety Type-safety improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants