Neater hashing interface #4524
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
Hashing and associative data structures in Yosys | ||
------------------------------------------------ | ||
|
||
Container classes based on hashing | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Yosys uses ``dict<K, T>`` and ``pool<T>`` as main container classes. | ||
``dict<K, T>`` is essentially a replacement for ``std::unordered_map<K, T>`` | ||
and ``pool<T>`` is a replacement for ``std::unordered_set<T>``. | ||
The main characteristics are: | ||
|
||
* ``dict<K, T>`` and ``pool<T>`` are about 2x faster than the std containers | ||
(though this claim hasn't been verified for over 10 years) | ||
|
||
* references to elements in a ``dict<K, T>`` or ``pool<T>`` are invalidated by | ||
insert and remove operations (similar to ``std::vector<T>`` on ``push_back()``). | ||
|
||
* some iterators are invalidated by ``erase()``. Specifically, iterators | ||
that have not passed the erased element yet are invalidated. (``erase()`` | ||
itself returns a valid iterator to the next element.) | ||
|
||
* no iterators are invalidated by ``insert()``. Elements are inserted at | ||
``begin()``, i.e. only a new iterator that starts at ``begin()`` will see the | ||
inserted elements. | ||
|
||
* the method ``.count(key, iterator)`` is like ``.count(key)`` but only | ||
considers elements that can be reached via the iterator. | ||
|
||
* iterators can be compared. ``it1 < it2`` means that the position of ``it2`` | ||
can be reached via ``it1`` but not vice versa. | ||
|
||
* the method ``.sort()`` can be used to sort the elements in the container. | ||
The container stays sorted until elements are added or removed. | ||
|
||
* ``dict<K, T>`` and ``pool<T>`` will have the same order of iteration across | ||
all compilers, standard libraries and architectures. | ||
|
||
In addition to ``dict<K, T>`` and ``pool<T>`` there is also an ``idict<K>`` that | ||
creates a bijective map from ``K`` to the integers. For example: | ||
|
||
:: | ||
|
||
    idict<string, 42> si; | ||
    log("%d\n", si("hello")); // will print 42 | ||
    log("%d\n", si("world")); // will print 43 | ||
    log("%d\n", si.at("world")); // will print 43 | ||
    log("%d\n", si.at("dummy")); // will throw exception | ||
    log("%s\n", si[42].c_str()); // will print hello | ||
    log("%s\n", si[43].c_str()); // will print world | ||
    log("%s\n", si[44].c_str()); // will throw exception | ||
|
||
It is not possible to remove elements from an idict. | ||
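The behaviour shown above can be approximated with a small self-contained sketch. ``ToyIdict`` is a hypothetical stand-in written for illustration only; it is built on the standard containers and is not the Yosys ``idict`` implementation. It only mirrors the three access patterns from the example: call syntax to look up or insert, ``at()`` to look up without inserting, and ``operator[]`` for the reverse lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for illustration only; not the Yosys idict.
template <typename K, int offset = 0>
class ToyIdict {
    std::unordered_map<K, int> index_of_;  // key -> dense index (without offset)
    std::vector<K> keys_;                  // dense index -> key

public:
    // Look up a key, inserting it with the next free integer if it is new.
    int operator()(const K &key) {
        auto it = index_of_.find(key);
        if (it == index_of_.end()) {
            it = index_of_.emplace(key, static_cast<int>(keys_.size())).first;
            keys_.push_back(key);
        }
        return it->second + offset;
    }

    // Look up a key without inserting; throws std::out_of_range on a miss.
    int at(const K &key) const {
        return index_of_.at(key) + offset;
    }

    // Reverse lookup; throws std::out_of_range if the integer was never handed out.
    const K &operator[](int id) const {
        return keys_.at(static_cast<std::size_t>(id - offset));
    }
};
```

Because keys are only ever appended, every integer stays bound to the same key for the lifetime of the container, which is one plausible reason why removal is not offered.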
|
||
Finally, ``mfp<K>`` implements a merge-find set data structure (also known as disjoint-set | ||
or union-find) over the type ``K`` ("mfp" = merge-find-promote). | ||
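For readers unfamiliar with the data structure, here is a minimal sketch of the three operations the name suggests. ``ToyMergeFind`` is a hypothetical toy over plain integer indices, not the Yosys ``mfp<K>`` class, and it assumes that "promote" means making a given element the representative of its set.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical toy over integer indices 0..n-1; not the Yosys mfp<K>.
class ToyMergeFind {
    std::vector<int> parent_;  // parent_[i] == i marks a set representative

public:
    explicit ToyMergeFind(int n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), 0);  // every element starts alone
    }

    // Return the representative of the set containing i.
    int find(int i) {
        while (parent_[i] != i) {
            parent_[i] = parent_[parent_[i]];  // path halving keeps chains short
            i = parent_[i];
        }
        return i;
    }

    // Join the sets containing a and b.
    void merge(int a, int b) {
        a = find(a);
        b = find(b);
        if (a != b)
            parent_[a] = b;
    }

    // Make i the representative of its set (assumed meaning of "promote").
    void promote(int i) {
        int root = find(i);
        if (root != i) {
            parent_[i] = i;
            parent_[root] = i;
        }
    }
};
```

After ``merge(0, 1)`` and ``merge(1, 2)`` the elements 0, 1 and 2 share one representative, and ``promote(0)`` makes that representative 0.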
|
||
The hash function | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
The hash function generally used in Yosys is the XOR version of DJB2: | ||
|
||
:: | ||
|
||
state = ((state << 5) + state) ^ value | ||
|
||
This is an old-school hash designed to hash ASCII characters. Yosys doesn't hash | ||
a lot of ASCII text, but it still happens to be a local optimum due to factors | ||
described later. | ||
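As a small illustration, the formula above can be written as a single step function and folded over a byte string. The seed 5381 is the conventional DJB2 starting value and is an assumption here, as are the function names; this is a sketch, not code taken from ``kernel/hashlib.h``.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// One DJB2-XOR step: multiply the state by 33, then XOR in the new value.
inline std::uint32_t djb2_xor(std::uint32_t state, std::uint32_t value) {
    return ((state << 5) + state) ^ value;
}

// Fold a byte string into a hash, one character at a time.
// The default seed is the conventional DJB2 seed (an assumption of this sketch).
inline std::uint32_t hash_bytes(const std::string &text, std::uint32_t seed = 5381) {
    std::uint32_t state = seed;
    for (unsigned char c : text)
        state = djb2_xor(state, c);
    return state;
}
```

For example, ``hash_bytes("a")`` evaluates ``(5381 * 33) ^ 97``, which is 177604.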
|
||
Hash function quality is multi-faceted and highly dependent on what is being | ||
hashed. Yosys isn't concerned with cryptographic qualities; instead, the goal | ||
is to minimize the overall risk of hash collisions given the data patterns within Yosys. | ||
In general, a good hash function folds values into a state accumulator | ||
with a mathematical function that is fast to compute and has some beneficial | ||
properties. One of these is the avalanche property, which demands that a small | ||
change in the input, such as flipping a bit or incrementing by one, produces a | ||
large, unpredictable change in the output. Additionally, the bit independence | ||
criterion states that any pair of output bits should change independently when | ||
any single input bit is inverted. These properties help avoid collisions on | ||
common data patterns: for example, the hash of a sequence should not collide with | ||
the hash of its permutation, and hashing further elements should not erase from | ||
the state the information added by preceding elements. | ||
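One way to make the avalanche property concrete is to count how many output bits change when a single input bit is flipped. The sketch below does this for one DJB2-XOR step as defined earlier; the helper names are hypothetical. A function with good avalanche behaviour would change roughly half of the 32 output bits on average, whereas this step, because the value is XORed in last, changes exactly one.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// One DJB2-XOR step, repeated here so the sketch is self-contained.
inline std::uint32_t djb2_xor_step(std::uint32_t state, std::uint32_t value) {
    return ((state << 5) + state) ^ value;
}

// Number of output bits that change when input bit `bit` of `value` is flipped.
inline int flipped_output_bits(std::uint32_t state, std::uint32_t value, int bit) {
    std::uint32_t before = djb2_xor_step(state, value);
    std::uint32_t after = djb2_xor_step(state, value ^ (std::uint32_t{1} << bit));
    return static_cast<int>(std::bitset<32>(before ^ after).count());
}
```

Calling ``flipped_output_bits(5381, 1234, bit)`` returns 1 for every ``bit`` from 0 to 31; the state and value used here are arbitrary.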
|
||
DJB2 lacks these properties. Instead, since Yosys hashes large numbers of data | ||
structures composed of incrementing integer IDs, Yosys exploits the predictability | ||
of DJB2 to get fewer hash collisions, with the regular nature of the hashes | ||
Review thread:
* Is it to get lower hash collisions or to get better locality?
* Hash collisions
* Does this come from observations, or something Claire mentioned was intention? I know some of the primitives were used in a way to get better locality
* This comes from my observations when counting hash collisions in hashlib per |
||
surviving through the interaction with the "modulo prime" operations in the | ||
associative data structures. For example, some of the most common objects in Yosys are | ||
interned ``IdString``\ s with incrementing indices, or ``SigBit``\ s, which are | ||
typically bit offsets into a wire (represented by its unique ``IdString`` name). | ||
This is what makes DJB2 a local optimum. Additionally, the ADD version of | ||
DJB2 (like above but with addition instead of XOR) is used to this end for some | ||
types, abandoning the general pattern of folding values into a state value. | ||
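The effect can be sketched with a toy experiment: hash a run of consecutive IDs from the same starting state and count how many distinct buckets they occupy after reduction modulo a prime. Everything here is an assumption made for illustration (the seed 5381, the bucket count 17, the helper names); it does not reproduce the table sizes or hashing paths actually used in hashlib.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// XOR and ADD variants of one DJB2 step.
inline std::uint32_t djb2_xor_step(std::uint32_t state, std::uint32_t value) {
    return ((state << 5) + state) ^ value;
}
inline std::uint32_t djb2_add_step(std::uint32_t state, std::uint32_t value) {
    return ((state << 5) + state) + value;
}

// Count the distinct buckets used when IDs 0..count-1 are each hashed from the
// same state and reduced modulo `buckets`.
template <typename Step>
int distinct_buckets(Step step, std::uint32_t state, std::uint32_t count,
                     std::uint32_t buckets) {
    std::set<std::uint32_t> seen;
    for (std::uint32_t id = 0; id < count; id++)
        seen.insert(step(state, id) % buckets);
    return static_cast<int>(seen.size());
}
```

With 16 consecutive IDs and 17 buckets, both variants land every ID in its own bucket: the ADD step maps consecutive IDs to consecutive hashes, and the XOR step maps an aligned block of 16 IDs onto an aligned block of 16 consecutive hashes. A hash with strong avalanche behaviour would instead scatter the IDs pseudo-randomly, so some of them would be expected to share a bucket.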
|
||
Making a type hashable | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Let's first take a look at the external interface on a simplified level. | ||
Generally, to get the hash for ``T obj``, you would call the utility function | ||
``run_hash<T>(const T& obj)``, corresponding to ``hash_top_ops<T>::hash(obj)``, | ||
the default implementation of which is ``hash_ops<T>::hash_eat(Hasher(), obj)``. | ||
Review comment: I much prefer |
||
``Hasher`` is the class that actually implements the hash function. It hides its | ||
initialized internal state and hands the result out through ``hash_t yield()``, | ||
possibly after some finalization steps. | ||
|
||
``hash_ops<T>`` is the star of the show. By default it pulls the ``Hasher h`` | ||
through a ``Hasher T::hash_eat(Hasher h)`` method. That's the method you have to | ||
implement to make a record (class or struct) type easily hashable with Yosys | ||
hashlib associative data structures. | ||
|
||
``hash_ops<T>`` is specialized for built-in types like ``int`` or ``bool`` and | ||
treats pointers the same as integers, so it doesn't dereference pointers. Since | ||
many RTLIL data structures like ``RTLIL::Wire`` carry their own unique index | ||
``Hasher::hash_t hashidx_;``, there are specializations for ``hash_ops<Wire*>`` | ||
and others in ``kernel/hashlib.h`` that actually dereference the pointers and | ||
call ``hash_eat`` on the instances pointed to. | ||
|
||
``hash_ops<T>`` is also specialized for simple compound types like | ||
``std::pair<U, V>`` by calling ``hash_eat`` in sequence on its members. For | ||
variable-size containers like ``std::vector<U>`` the size of the container is hashed | ||
first. Hashing for a custom record data type should be implemented the same way: | ||
unless there is a strong reason to do otherwise, call ``h.eat(m)`` on | ||
the ``Hasher h`` you have received for each member in sequence and ``return | ||
h;``. If you do have a strong reason to do otherwise, look at how | ||
``hash_top_ops<RTLIL::SigBit>`` is implemented in ``kernel/rtlil.h``. | ||
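The pattern described in this section can be sketched without the Yosys headers. ``ToyHasher``, ``Port`` and ``toy_run_hash`` below are hypothetical names invented for this illustration; ``ToyHasher`` only imitates the ``eat``/``yield`` shape described above and is not the ``Hasher`` from ``kernel/hashlib.h``. The record type folds its members in sequence, and the container overload hashes the size first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Hasher described above; not kernel/hashlib.h.
class ToyHasher {
public:
    using hash_t = std::uint32_t;

    // Fold one integer into the state (DJB2-XOR step).
    void eat(hash_t value) { state_ = ((state_ << 5) + state_) ^ value; }

    // Fold a variable-size container: size first, then each element in order.
    void eat(const std::vector<int> &values) {
        eat(static_cast<hash_t>(values.size()));
        for (int v : values)
            eat(static_cast<hash_t>(v));
    }

    // Hand out the accumulated hash (no finalization in this toy).
    hash_t yield() const { return state_; }

private:
    hash_t state_ = 5381;  // assumed seed, chosen for illustration
};

// Hypothetical record type made hashable by folding its members in sequence.
struct Port {
    int wire_id;
    int offset;
    std::vector<int> fanout;

    ToyHasher hash_eat(ToyHasher h) const {
        h.eat(static_cast<ToyHasher::hash_t>(wire_id));
        h.eat(static_cast<ToyHasher::hash_t>(offset));
        h.eat(fanout);
        return h;
    }
};

// Toy counterpart of the top-level entry point: start fresh, eat, yield.
inline ToyHasher::hash_t toy_run_hash(const Port &p) {
    return p.hash_eat(ToyHasher()).yield();
}
```

Two ``Port`` values with equal members produce equal hashes, while swapping ``wire_id`` and ``offset`` changes the result because the members are folded in a fixed order.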
|
||
Porting plugins from the legacy interface | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Previously, the interface to implement hashing on custom types was just | ||
``unsigned int T::hash() const``. This meant hashes for members were computed | ||
independently and then combined ad hoc with the hash function, with some xorshift | ||
operations thrown in to mix the bits together somewhat. A plugin can stay compatible | ||
with Yosys versions both before and after the break by implementing the aforementioned | ||
current interface and redirecting the legacy one to it: | ||
|
||
``void Hasher::eat(const T& t)`` hashes ``t`` into its internal state, also by | ||
redirecting to ``hash_ops<T>``. | ||
Review comment: Is this done for legacy reasons? I am not clear on why this paragraph is below "Porting plugins from the legacy interface" |
||
|
||
.. code-block:: cpp | ||
:caption: Example hash compatibility wrapper | ||
:name: hash_plugin_compat | ||
|
||
inline unsigned int T::hash() const { | ||
Hasher h; | ||
return (unsigned int)hash_eat(h).yield(); | ||
Review comment: I don't understand, if I compile my plugin against v0.47 or earlier I won't have |
||
} | ||
|
||
To get hashes for Yosys types, you can temporarily keep using the deprecated, templated | ||
``mkhash`` function and live with the warnings until the majority of your plugin's | ||
users have switched to a newer version, or set up a custom ``#ifdef``-based solution | ||
if you really need to. | ||
Feel free to contact Yosys maintainers with related issues. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,3 +38,4 @@ as reference to implement a similar system in any language. | |
formats/index | ||
extending_yosys/index | ||
techmap | ||
hashing |
Leaving a note that I read this file