Skip to content

Commit

Permalink
Merge pull request #465 from VeriFIT/transducers
Browse files Browse the repository at this point in the history
Implement nondeterministic finite transducers (NFTs) #minor
  • Loading branch information
Adda0 authored Dec 2, 2024
2 parents c211c86 + 4edb9d7 commit c277508
Show file tree
Hide file tree
Showing 47 changed files with 14,742 additions and 122 deletions.
25 changes: 24 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
[![Python-Binding (build-&-test)](https://github.com/VeriFIT/mata/actions/workflows/python-binding.yml/badge.svg?branch=devel)](https://github.com/VeriFIT/mata/actions/workflows/python-binding.yml)
[![codecov](https://codecov.io/gh/VeriFIT/mata/branch/devel/graph/badge.svg?token=9VAVD19N4D)](https://codecov.io/gh/VeriFIT/mata)

Mata is an open source automata library that offers interface for different kinds of automata (NFA, etc.). Currently, Mata offers two interfaces:
Mata is an open source automata library that offers interface for different kinds of automata, e.g. (non-)deterministic finite automata (NFAs), (non-)deterministic finite transducers, etc. Currently, Mata offers two interfaces:

1. An efficient library implemented in C/C++
2. A flexible wrapper implemented in Python that uses the efficient library
Expand Down Expand Up @@ -215,6 +215,29 @@ The usage of the binding copies (to certain levels) the usage of the C++ library

You can either run your scripts directly using `python` or compile it using the `cython` project.

# Supported Automata Models

While we try to keep the provided libray interface stable, Mata is still a project in its infancy.
Hence, the interface is subject to change.
For further information, see section [Versioning](README.md#Versioning).

Mata currently supports (non-)deterministic finite automata (NFAs), defined in `include/mata/nfa/`.

> [!WARNING]
> Mata provides an experimental (unstable) support for (non-)deterministic finite transducers (NFTs), defined in `include/mata/nft/`.
> The provided interface may change.
# Versioning

Mata follows the versioning scheme `GENERATION.MAJOR.MINOR`, e.g. `1.2.3` where `1` is the generation number, and `2` the major version, and `3` the minor version.
The `MAJOR` and `MINOR` versions follow loosely the [Semantic Versioning](https://semver.org/) scheme.

- The `GENERATION` number represents our subjective generation of the Mata library.
Typically, we increment the `GENERATION` number when a substantial change to the library is introduced, such as adding support for new automata models, extensive rewrite of (a part of) the Mata library, or to denote a new scientific paper documenting the latest developments in the Mata library is published, etc.
- Generally, the changes in the `MINOR` version should not break the existing interface, but exceptions might occur.
- The changes in the `MAJOR` version might introduce breaking changes that modify the existing interface and might require changing the code in your projects that use the Mata library.


# Publications
- Chocholatý, D., Fiedor, T., Havlena, V., Holík, L., Hruška, M., Lengál, O., & Síč, J. (2023). [Mata, a Fast and Simple Finite Automata Library](https://doi.org/10.1007/978-3-031-57249-4_7). In *Proc. of TACAS'24*, volume 14571 of LNCS, pages 130-151, 2024. Springer.
- Chocholatý, D., Fiedor, T., Havlena, V., Holík, L., Hruška, M., Lengál, O., Síč, J.: [A replication package for reproducing the results of paper “Mata: A fast and simple finite automata library”](https://doi.org/10.5281/zenodo.10044515) (Oct 2023).
Expand Down
2 changes: 1 addition & 1 deletion bindings/python/libmata/nfa/nfa.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ cdef extern from "mata/nfa/nfa.hh" namespace "mata::nfa":
# Constructor
CNfa() except +
CNfa(unsigned long) except +
CNfa(unsigned long, StateSet, StateSet, CAlphabet*)
CNfa(unsigned long, CSparseSet[State], CSparseSet[State], CAlphabet*)
CNfa(const CNfa&)

# Public Functions
Expand Down
4 changes: 2 additions & 2 deletions bindings/python/libmata/nfa/nfa.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -197,11 +197,11 @@ cdef class Nfa:
:param alph.Alphabet alphabet: alphabet corresponding to the automaton
"""
cdef CAlphabet* c_alphabet = NULL
cdef StateSet empty_default_state_set
cdef CSparseSet[State] empty_default_sparse_set
if alphabet:
c_alphabet = alphabet.as_base()
self.thisptr = make_shared[CNfa](
mata_nfa.CNfa(state_number, empty_default_state_set, empty_default_state_set, c_alphabet)
mata_nfa.CNfa(state_number, empty_default_sparse_set, empty_default_sparse_set, c_alphabet)
)
self.label = label

Expand Down
26 changes: 18 additions & 8 deletions include/mata/alphabet.hh
Original file line number Diff line number Diff line change
Expand Up @@ -82,10 +82,15 @@ public:

bool operator==(const Alphabet &) const = delete;

/**
* Checks whether the alphabet has any symbols.
*/
virtual bool empty() const = 0;

virtual void clear() { throw std::runtime_error("Unimplemented"); }

protected:
virtual const void *address() const { return this; }
virtual const void* address() const { return this; }
}; // class Alphabet.

/**
Expand Down Expand Up @@ -117,6 +122,8 @@ public:

IntAlphabet& operator=(const IntAlphabet& int_alphabet) = delete;

bool empty() const override { return false; }

void clear() override { throw std::runtime_error("Nonsensical use of clear() on IntAlphabet."); }

protected:
Expand Down Expand Up @@ -175,17 +182,16 @@ public:
explicit EnumAlphabet(const EnumAlphabet* const alphabet): EnumAlphabet(*alphabet) {}
EnumAlphabet(EnumAlphabet&& rhs) = default;

EnumAlphabet& operator=(const EnumAlphabet& rhs) = default;
EnumAlphabet& operator=(EnumAlphabet&& rhs) = default;

utils::OrdVector<Symbol> get_alphabet_symbols() const override { return symbols_; }
utils::OrdVector<Symbol> get_complement(const utils::OrdVector<Symbol>& symbols) const override {
return symbols_.difference(symbols);
}

std::string reverse_translate_symbol(Symbol symbol) const override;

public:
EnumAlphabet& operator=(const EnumAlphabet& rhs) = default;
EnumAlphabet& operator=(EnumAlphabet&& rhs) = default;

/**
* @brief Expand alphabet by symbols from the passed @p symbols.
*
Expand Down Expand Up @@ -239,6 +245,8 @@ public:
*/
size_t get_number_of_symbols() const { return symbols_.size(); }

bool empty() const override { return symbols_.empty(); }

private:
mata::utils::OrdVector<Symbol> symbols_{}; ///< Map of string transition symbols to symbol values.
Symbol next_symbol_value_{ 0 }; ///< Next value to be used for a newly added symbol.
Expand Down Expand Up @@ -293,9 +301,6 @@ public:
explicit OnTheFlyAlphabet(const OnTheFlyAlphabet* const alphabet): OnTheFlyAlphabet(*alphabet) {}
explicit OnTheFlyAlphabet(StringToSymbolMap str_sym_map) : symbol_map_(std::move(str_sym_map)) {}

OnTheFlyAlphabet& operator=(const OnTheFlyAlphabet& rhs) = default;
OnTheFlyAlphabet& operator=(OnTheFlyAlphabet&& rhs) = default;

/**
* Create alphabet from a list of symbol names.
* @param symbol_names Names for symbols on transitions.
Expand All @@ -321,6 +326,9 @@ public:
std::string reverse_translate_symbol(Symbol symbol) const override;

public:
OnTheFlyAlphabet& operator=(const OnTheFlyAlphabet& rhs) = default;
OnTheFlyAlphabet& operator=(OnTheFlyAlphabet&& rhs) = default;

/**
* @brief Expand alphabet by symbols from the passed @p symbol_names.
*
Expand Down Expand Up @@ -391,6 +399,8 @@ public:
*/
const StringToSymbolMap& get_symbol_map() const { return symbol_map_; }

bool empty() const override { return symbol_map_.empty(); }

private:
StringToSymbolMap symbol_map_{}; ///< Map of string transition symbols to symbol values.
Symbol next_symbol_value_{}; ///< Next value to be used for a newly added symbol.
Expand Down
2 changes: 1 addition & 1 deletion include/mata/nfa/algorithms.hh
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ Nfa product(const Nfa& lhs, const Nfa& rhs, const std::function<bool(State,State
* Supports epsilon symbols when @p use_epsilon is set to true.
* @param[in] lhs First automaton to concatenate.
* @param[in] rhs Second automaton to concatenate.
* @param[in] epsilon Epsilon to be used co concatenation (provided @p use_epsilon is true)
* @param[in] epsilon Epsilon to be used for concatenation (provided @p use_epsilon is true)
* @param[in] use_epsilon Whether to concatenate over epsilon symbol.
* @param[out] lhs_state_renaming Map mapping lhs states to result states.
* @param[out] rhs_state_renaming Map mapping rhs states to result states.
Expand Down
58 changes: 29 additions & 29 deletions include/mata/nfa/delta.hh
Original file line number Diff line number Diff line change
Expand Up @@ -118,13 +118,13 @@ public:
using super::empty, super::size;
using super::to_vector;
// dangerous, breaks the sortedness invariant
using super::emplace_back;
// dangerous, breaks the sortedness invariant
using super::push_back;
using super::push_back, super::emplace_back;
// is adding non-const version as well ok?
using super::front;
using super::back;
using super::pop_back;
using super::filter;
using super::clear;

using super::erase;

Expand Down Expand Up @@ -301,50 +301,50 @@ public:
};

/**
* @brief Get constant reference to the state post of @p src_state.
* @brief Get constant reference to the state post of @p source.
*
* If we try to access a state post of a @p src_state which is present in the automaton as an initial/final state,
* If we try to access a state post of a @p source which is present in the automaton as an initial/final state,
* yet does not have allocated space in @c Delta, an @c empty_post is returned. Hence, the function has no side
* effects (no allocation is performed; iterators remain valid).
* @param state_from[in] Source state of a state post to access.
* @return State post of @p src_state.
* @param source[in] Source state of a state post to access.
* @return State post of @p source.
*/
const StatePost& state_post(const State src_state) const {
if (src_state >= num_of_states()) {
const StatePost& state_post(const State source) const {
if (source >= num_of_states()) {
return empty_state_post;
}
return state_posts_[src_state];
return state_posts_[source];
}

/**
* @brief Get constant reference to the state post of @p src_state.
* @brief Get constant reference to the state post of @p source.
*
* If we try to access a state post of a @p src_state which is present in the automaton as an initial/final state,
* If we try to access a state post of a @p source which is present in the automaton as an initial/final state,
* yet does not have allocated space in @c Delta, an @c empty_post is returned. Hence, the function has no side
* effects (no allocation is performed; iterators remain valid).
* @param state_from[in] Source state of a state post to access.
* @return State post of @p src_state.
* @param source[in] Source state of a state post to access.
* @return State post of @p source.
*/
const StatePost& operator[](State src_state) const { return state_post(src_state); }
const StatePost& operator[](const State source) const { return state_post(source); }

/**
* @brief Get mutable (non-constant) reference to the state post of @p src_state.
* @brief Get mutable (non-constant) reference to the state post of @p source.
*
* The function allows modifying the state post.
*
* BEWARE, IT HAS A SIDE EFFECT.
*
* If we try to access a state post of a @p src_state which is present in the automaton as an initial/final state,
* yet does not have allocated space in @c Delta, a new state post for @p src_state will be allocated along with
* If we try to access a state post of a @p source which is present in the automaton as an initial/final state,
* yet does not have allocated space in @c Delta, a new state post for @p source will be allocated along with
* all state posts for all previous states. This in turn may cause that the entire post data structure is
* re-allocated. Iterators to @c Delta will get invalidated.
* Use the constant 'state_post()' is possible. Or, to prevent the side effect from causing issues, one might want
* to make sure that posts of all states in the automaton are allocated, e.g., write an NFA method that allocate
* @c Delta for all states of the NFA.
* @param state_from[in] Source state of a state post to access.
* @return State post of @p src_state.
* @param source[in] Source state of a state post to access.
* @return State post of @p source.
*/
StatePost& mutable_state_post(State src_state);
StatePost& mutable_state_post(State source);

void defragment(const BoolVector& is_staying, const std::vector<State>& renaming);

Expand Down Expand Up @@ -383,15 +383,15 @@ public:
*/
size_t num_of_transitions() const;

void add(State state_from, Symbol symbol, State state_to);
void add(State source, Symbol symbol, State target);
void add(const Transition& trans) { add(trans.source, trans.symbol, trans.target); }
void remove(State src, Symbol symb, State tgt);
void remove(const Transition& trans) { remove(trans.source, trans.symbol, trans.target); }
void remove(State source, Symbol symbol, State target);
void remove(const Transition& transition) { remove(transition.source, transition.symbol, transition.target); }

/**
* Check whether @c Delta contains a passed transition.
*/
bool contains(State src, Symbol symb, State tgt) const;
bool contains(State source, Symbol symbol, State target) const;
/**
* Check whether @c Delta contains a transition passed as a triple.
*/
Expand Down Expand Up @@ -429,11 +429,11 @@ public:
/**
* @brief Add transitions to multiple destinations
*
* @param state_from From
* @param source From
* @param symbol Symbol
* @param states Set of states to
* @param targets Set of states to
*/
void add(const State state_from, const Symbol symbol, const StateSet& states);
void add(const State source, const Symbol symbol, const StateSet& targets);

using const_iterator = std::vector<StatePost>::const_iterator;
const_iterator cbegin() const { return state_posts_.cbegin(); }
Expand Down Expand Up @@ -498,7 +498,7 @@ public:
* @brief Get the maximum non-epsilon used symbol.
*/
Symbol get_max_symbol() const;
private:
protected:
std::vector<StatePost> state_posts_;
}; // class Delta.

Expand Down
44 changes: 34 additions & 10 deletions include/mata/nfa/nfa.hh
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
#include <utility>
#include <vector>
#include <queue>
#include <optional>

#include "mata/alphabet.hh"
#include "mata/parser/parser.hh"
Expand All @@ -44,9 +45,9 @@
namespace mata::nfa {

/**
* A struct representing an NFA.
* A class representing an NFA.
*/
struct Nfa {
class Nfa {
public:
/**
* @brief For state q, delta[q] keeps the list of transitions ordered by symbols.
Expand Down Expand Up @@ -77,9 +78,9 @@ public:
*
* @param[in] num_of_states Number of states for which to preallocate Delta.
*/
explicit Nfa(const unsigned long num_of_states, StateSet initial_states = {},
StateSet final_states = {}, Alphabet* alphabet = nullptr)
: delta(num_of_states), initial(initial_states), final(final_states), alphabet(alphabet) {}
explicit Nfa(const size_t num_of_states, utils::SparseSet<State> initial_states = {},
utils::SparseSet<State> final_states = {}, Alphabet* alphabet = nullptr)
: delta(num_of_states), initial(std::move(initial_states)), final(std::move(final_states)), alphabet(alphabet) {}

/**
* @brief Construct a new explicit NFA from other NFA.
Expand All @@ -105,6 +106,27 @@ public:
*/
State add_state(State state);

/**
* Inserts a @p word into the NFA from a source state @p source to a target state @p target.
* Creates new states along the path of the @p word.
*
* @param source The source state where the word begins. It must already be a part of the automaton.
* @param word The nonempty word to be inserted into the NFA.
* @param target The target state where the word ends.
* @return The state @p target where the inserted @p word ends.
*/
State insert_word(State source, const Word& word, State target);

/**
* Inserts a @p word into the NFA from a source state @p source to a new target state.
* Creates new states along the path of the @p word.
*
* @param source The source state where the word begins. It must already be a part of the automaton.
* @param word The nonempty word to be inserted into the NFA.
* @return The newly created target state where the inserted @p word ends.
*/
State insert_word(State source, const Word& word);

/**
* @brief Get the current number of states in the whole automaton.
*
Expand Down Expand Up @@ -280,14 +302,16 @@ public:
/**
* @brief Prints the automaton in DOT format
*
* @param[in] ascii Whether to use ASCII characters for the output.
* @return automaton in DOT format
*/
std::string print_to_dot() const;
std::string print_to_dot(const bool ascii = false) const;
/**
* @brief Prints the automaton to the output stream in DOT format
*
* @param[in] ascii Whether to use ASCII characters for the output.
*/
void print_to_dot(std::ostream &output) const;

void print_to_dot(std::ostream &output, const bool ascii = false) const;
/**
* @brief Prints the automaton to the file in DOT format
* @param filename Name of the file to print the automaton to
Expand Down Expand Up @@ -404,7 +428,7 @@ public:
* you can get all words by calling
* get_words(aut.num_of_states())
*/
std::set<Word> get_words(unsigned max_length) const;
std::set<Word> get_words(size_t max_length) const;

/**
* @brief Get any arbitrary accepted word in the language of the automaton.
Expand Down Expand Up @@ -471,7 +495,7 @@ public:
* @pre @c this is a deterministic automaton.
*/
Nfa& complement_deterministic(const mata::utils::OrdVector<Symbol>& symbols, std::optional<State> sink_state = std::nullopt);
}; // struct Nfa.
}; // class Nfa.

// Allow variadic number of arguments of the same type.
//
Expand Down
Loading

0 comments on commit c277508

Please sign in to comment.