
Neater hashing interface #4524

Open · wants to merge 22 commits into main
Conversation

@widlarizer (Collaborator) commented Aug 6, 2024

Please read docs/source/yosys_internals/hashing.rst for what the interface is now.

We want to be able to plug-and-play hash functions to improve hashlib structure collision rates and hashing speed. Currently, hashes are handled incorrectly in some ways: for deep structure hashing, each substructure constructs a hash from scratch, and these hashes are then combined with addition or XOR, sometimes xorshifted to make this less iffy. Overall, this is risky, as it may degrade various hash functions to varying degrees. It seems to me that the correct way to combine hashes is to have a hash state that is mutated with each datum hashed in sequence. That's what this PR does.

  • check performance impact since this PR isn't NFC as it changes how structures are hashed
  • clean up things left over from experiments with inheriting from Hashable
  • fix pyosys
  • finish the plugin compatibility part of the doc

@widlarizer (Collaborator, Author):

Cool, I got hit in the face with a downright gcc bug

@widlarizer (Collaborator, Author):

With ibex and jpeg synthesized with ORFS, I'm seeing a 1% performance regression with this PR. This is probably because we're actually using the seed function more directly, with less xorshifting involved. I wonder if a quick swap of the hashing function would change the result. However, I'm also seeing a 1% memory usage improvement with jpeg, which is pretty interesting.

@widlarizer (Collaborator, Author):

Passes like extract_fa and opt_dff take 5-10% longer with this change. This is definitely a problem.

@povik (Member) commented Oct 29, 2024

Leaving a note so I don't forget it: we should provide a way for plugins to be compatible with Yosys both before and after this change. I am thinking of a define to advertise the new API.

@jix (Member) left a comment:

Despite the many comments I left, I really like the new hashing interface and think it's a big improvement over the previous one.

I found a bunch of nits, made a few suggestions on how to potentially improve some of the hash_eat implementations, am slightly confused by the current choice of state update in the used hasher implementation and found one instance of unconditionally recomputing an already cached hash.

edit: Oh, and to clarify, I'm fine with deferring my suggestions that would need some more benchmarking to a follow-up PR and merging this once everything that's a clear issue right now or a clear benefit to change is addressed.

if (hash_ != 0) return hash_;

void updhash() const {
DriveSpec *that = (DriveSpec*)this;

Member:

what's that used for? also why was the check whether the hash is already known removed?

Collaborator Author:

that is a dirty const-laundering trick Yosys had been using before as well. The idea is something like "const methods are allowed to modify members, as long as the data represented by the record is kept equivalent, typically in the operator== meaning of equivalency". I think I've previously successfully replaced this approach by declaring things mutable and should apply that here as well.

Member:

Ah, I somehow missed the const. Considering constness, this makes sense, but I'd also prefer replacing it with mutable.

bool operator==(DriveSpec const &other) const {
if (size() != other.size() || hash() != other.hash())
updhash();

Member:

These updhash calls are not guarded by hash_ != 0 checks, and the check within that function (or rather the function that previously contained the same logic) was removed, so this will unconditionally recompute the hash on every equality comparison.


inline Hasher DriveSpec::hash_eat(Hasher h) const
{
if (hash_ == 0)

Member:

Unlike the calls above, this updhash call is guarded, but I think it would be simpler to keep the check in updhash itself.

public:
void hash32(uint32_t i) {
state = djb2_xor(i, state);
state = mkhash_xorshift(fudge ^ state);

Member:

I was expecting to only see a djb hash update in here, but it is followed up by a xorshift. What's the reason for doing both? Without benchmarking I would be guessing that performing a djb hash update and a xorshift update is slower than either alone without necessarily having better overall runtime behavior than the better behaved one of both.

Collaborator Author:

In an intermediate state, I found it necessary for lowering collisions, and it doesn't cost so much as to cause a measurable regression against main on the ASIC designs I used to check. Without it, DJB2 doesn't have an avalanche property. Also see this observation, which is basically about patterns like hash = mkhash(xorshift(mkhash(a, b)), c).

return v;
static inline Hasher hash_eat(const T &a, Hasher h) {
if constexpr (std::is_same_v<T, bool>) {
h.hash32(a ? 1 : 0);

Member:

nit: I would find a cast to an int type cleaner for converting a bool to a 0/1 int here. I'd expect an optimizing compiler to generate the same code, but since I find the two variants equally easy to read I'd prefer the one that doesn't require an optimization to result in the code we want.

Collaborator Author:

Right, good to dispel another remnant of my C paranoia caused by embedded lore: bools with values other than zero or one don't exist with C++ bool or C stdbool.h bool.

#undef YOSYS_NO_IDS_REFCNT

// the global id string cache
struct RTLIL::IdString

Member:

Sadly moving this turns the diff into a mess. Are there any functional changes to this besides updating the hashing implementation?

Collaborator Author:

The problem is that rtlil.h both creates per-type partial specializations for hash_top_ops defined in hashlib.h as well as uses data structures defined in hashlib.h. No functional changes other than hash_top_ops and hash_eat.

inline Hasher hash_eat(Hasher h) const {
// TODO hash size
for (auto b : *this)
h.eat(b);

Member:

Assuming my suggestion of hashing strings in chunks is implemented, the vector of State enums could also be hashed as if it were a string. If this is using a string representation, the same is true. I'm assuming this would cause issues when we have two equal consts, one represented as states and one as a string. I still have to take a closer look at how the new string representation for constants is implemented to say more on this.

Collaborator Author:

We need a string represented as a string vs. as a bit vector to have equal hashes, so we have to go bit by bit. To hash a contiguous chunk of a string-represented Const, we'd first need to construct a new bit vector and run it on that. So it doesn't sound like it pays off.

size_t get_hash() const {
if (!hash_) hash();
return hash_;
log_assert(false && "deprecated");

Member:

If we log_assert(false) anyway, i.e. already remove support, why not outright remove it instead of deprecating it?

return hash_ops<uintptr_t>::hash_eat((uintptr_t) a, h);
} else if constexpr (std::is_same_v<T, std::string>) {
for (auto c : a)
h.hash32(c);

@jix (Member) commented Nov 14, 2024:

What I wrote for the C string hash_eat below also applies here, but I initially missed that std::string is handled here.

@@ -1181,7 +1111,8 @@ struct DriverMap
bool operator==(const DriveBitId &other) const { return id == other.id; }
bool operator!=(const DriveBitId &other) const { return id != other.id; }
bool operator<(const DriveBitId &other) const { return id < other.id; }
unsigned int hash() const { return id; }
// unsigned int hash() const { return id; }

Member:

nit: commented-out code

Let's first take a look at the external interface on a simplified level.
Generally, to get the hash for ``T obj``, you would call the utility function
``run_hash<T>(const T& obj)``, corresponding to ``hash_top_ops<T>::hash(obj)``,
the default implementation of which is ``hash_ops<T>::hash_eat(Hasher(), obj)``.

Member:

I much prefer hash_into as the name of the method here, while .eat stays as the method on the Hasher.


inline unsigned int T::hash() const {
Hasher h;
return (unsigned int)hash_eat(h).yield();

Member:

I don't understand: if I compile my plugin against v0.47 or earlier, I won't have Hasher.

current interface and redirecting the legacy one:

``void Hasher::eat(const T& t)`` hashes ``t`` into its internal state by also
redirecting to ``hash_ops<T>``

Member:

Is this done for legacy reasons? I am not clear on why this paragraph is below "Porting plugins from the legacy interface"

@@ -0,0 +1,153 @@
Hashing and associative data structures in Yosys
------------------------------------------------

Member:

Leaving a note that I read this file


DJB2 lacks these properties. Instead, since Yosys hashes large numbers of data
structures composed of incrementing integer IDs, Yosys abuses the predictability
of DJB2 to get lower hash collisions, with regular nature of the hashes

Member:

Is it to get lower hash collisions or to get better locality?

Collaborator Author:

Hash collisions

Member:

Does this come from observations, or is it something Claire mentioned was the intention? I know some of the primitives were used in a way to get better locality.

Collaborator Author:

This comes from my observations when counting hash collisions in hashlib per std::source_location, and from a clear correlation with extra runtime overhead in some opt and extract_fa passes where the collisions were happening.
