Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf-test get_used_symbols and its variants #324

Open
Adda0 opened this issue Sep 6, 2023 · 1 comment
Open

Perf-test get_used_symbols and its variants #324

Adda0 opened this issue Sep 6, 2023 · 1 comment
Assignees
Labels
For:library The issue is related to library (c++ implementation) Module:nfa The issue is related to Nondeterministic Finite Automata Priority:normal Work on this sooner rather than later. Type:required A required implementation/change necessary in near future

Comments

@Adda0
Copy link
Collaborator

Adda0 commented Sep 6, 2023

There are numerous variants of get_used_symbols(). We need to perf-test them and keep only the most performant one. The remaining variants should be removed.

@Adda0 Adda0 added For:library The issue is related to library (c++ implementation) Module:nfa The issue is related to Nondeterministic Finite Automata Type:required A required implementation/change necessary in near future Priority:normal Work on this sooner rather than later. labels Sep 6, 2023
@koniksedy koniksedy self-assigned this Nov 1, 2024
@koniksedy
Copy link
Collaborator

koniksedy commented Nov 21, 2024

The performance of different implementations of the method get_used_states (implemented by @kilohsakul) was tested on real-world and random automata. Interestingly, the performance improves with repeated calls to the method (to get the average performance). Sometimes it even changes which implementation is the fastest. Therefore we measured the average performance and the performance of the first call.

Tested implementations:

  1. vec - using get_used_symbols_vec() with std::vector
  2. set - using get_used_symbols_set() with std::set
  3. sps - get_used_symbols_sps() with utils::SparseSet
  4. bv1 - get_used_symbols_bv() with utils::BitVector and without preallocation of the returned vector
  5. bv2 - get_used_symbols_bv() with utils::BitVector and with preallocation of the returned vector
  6. chv - get_used_symbols_ch() with BoolVector that is inherited from std::vector<uint8_t>

Tested automata:

  1. Random Tree Automata
  2. Automata for Regualr Expression .*{n}
  3. Tabakov-Vardi Random Automata
  4. ARMC Automata
  5. Netbench Symbolic Automata
  6. Automata for Email Validation
  7. Automat for Regular Expressions

_STATIC_DATA_STRUCTURES_

Static data structures drastically improve the performance of the sparse set. However, we present only the results of benchmarks without the static data structure, as the sparse set is not our preferred choice for the get_used_states implementation.

Cactus Plot With and Without _STATIC_DATA_STRUCTURES_

Overall

The vector implementation (vec) we currently use seems to be the slowest variant. The bitvector (bv1, bv2) and BoolVector (chv) implementations appear to be the best. It is necessary to confirm this by running experiments with Z3-Noodler.

Results

Random Tree Automata

Automata were generated with 20 branches, each containing $|\Sigma|$ symbols with p symbols per transition. The depth of each branch was determined as $d \sim |\Sigma| / p$.

Runtime

Runtime based on the alphabet size $|\Sigma|$ and the number of symbols per transition $p$.
avg

Cactus Plot

Scatter Plot Matrix

scatter-avg

Automata for Regular Expression .*{n}

Automata were generated for different n and the size of the alphabet $|\Sigma|$.

Runtime

Runtime based on the alphabet size $|\Sigma|$ and the number of self-loops.
avg

Cactus Plot

Scatter Plot Matrix

scatter-avg

Tabakov-Vardi Random Automata

Automata were generated using the function crrate_random_nfa_tabakov_vardi() with different numbers of states, alphabet size $|\Sigma|$, and transition density.

Runtime

Runtime based on the alphabet size $|\Sigma|$, transition density, and the number of states.
avg

Cactus Plot

Scatter Plot Matrix

scatter-avg

ARMC Automata

Cactus Plot

Scatter Plot Matrix

scatter-avg

Netbench Symbolic Automata

Cactus Plot

Scatter Plot Matrix

scatter-avg

Automata for Email Validation

Cactus Plot

Scatter Plot Matrix

scatter-avg

Automat for Regular Expressions

Cactus Plot

Here, the **vec** variant was the best on average but not so good regarding the single run.

Scatter Plot Matrix

scatter-avg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
For:library The issue is related to library (c++ implementation) Module:nfa The issue is related to Nondeterministic Finite Automata Priority:normal Work on this sooner rather than later. Type:required A required implementation/change necessary in near future
Projects
None yet
Development

No branches or pull requests

2 participants