-
Notifications
You must be signed in to change notification settings - Fork 38
Add minimization algorithm for Acyclic FST #49
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I left some comments, please make the changes and then I will review again.
Also:
- Add tests
- Add a benchmark so we can tell how changes effect times.
@@ -151,5 +151,8 @@ Graph viterbiScore(const Graph& g); | |||
*/ | |||
Graph viterbiPath(const Graph& g); | |||
|
|||
/** Minimize an Acyclic FST */ | |||
Graph minimizeAcyclicFST(const Graph& g); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just call this minimize
. Ideally we would throw an error if a cycle is detected in the graph.
|
||
|
||
//Initialization | ||
//a. Find all states with no outgoing arcs. (Since we are dealing with an acyclic FST, it is always possible.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I would do is rename the function to minimize
, and throw an error if this condition is not met (e.g. if we don't find any start or accept states with 0 outgoing nodes can assume that the graph is cyclic). In the future we can attempt to handle this case.
//a. Find all states with no outgoing arcs. (Since we are dealing with an acyclic FST, it is always possible.) | ||
//b. Split the resulting set into 4 sets according to their START and ACCEPT status. | ||
|
||
int nodeStartAccept = -1 , nodeStartNoAccept = -1, nodeNoStartAccept = -1, nodeNoStartNoAccept = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we assume in GTN that graphs are "trim" meaning there are no paths which lead to dangling nodes. You are welcome to make the same assumption, or alternatively if you find any node that is not accepting that has 0 outgoing arcs we should ignore it since it does nothing for the set of paths the graph allows and really shouldn't be part of a "minimal" graph.
Graph minimizeAcyclicFST(const Graph& g){ | ||
Graph graph; | ||
std::vector<int> oldToNew(g.numNodes(), -1); // a map between the nodes of g and the minimized graph. | ||
std::vector<int> oldProcessed; // store which nodes has been processed in the g graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should make this a vector<bool> processed
(the name old is redundant). Make it the size of the old graph number of nodes and initialize every element to false
. Then to check if a node is already processed check processed[n]
.
if (std::all_of(g.out(predNode).begin(), g.out(predNode).end(), | ||
[&g, &oldProcessed](int a) {return std::count(oldProcessed.begin(), oldProcessed.end(), g.dstNode(a)) > 0;})){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change oldProcessed
to processed
as mentioned above and simplify this accordingly.
if ( std::equal(g.out(node1).begin(), g.out(node1).end(), g.out(node2).begin(), [&g, &oldToNew](int a1, int a2){ | ||
return (g.ilabel(a1) == g.ilabel(a2) && | ||
g.olabel(a1) == g.olabel(a2) && | ||
oldToNew[g.dstNode(a1)] == oldToNew[g.dstNode(a2)]);}) | ||
){ | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: replace this block with return std::equal(...
The implementation follow the Bottom-Up Algorithm from http://www.cs.jhu.edu/~hajic/courses/cs226/alg.html
Tests are coming soon.