Overview
Adds the function debugMultiTokenize(), similar to the previously existing debugTokenize() but with support for multi-tokenization. The function generates a graph in DOT format.
Details
Each tokenization corresponds to a path in the graph. We assign a color to each such path and color its edges accordingly. If an edge is included in more than one path, it has more than one color.
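Below is a minimal sketch of the per-path coloring, written in Java on the assumption that the project is Java (requires Java 9+ for List.of). It is not the code from this PR. The class MultiPathColoring, its methods palette() and coloredEdges(), and the node ids are hypothetical. A path is modeled as a list of node ids. palette() follows the equidistant-hue scheme described under Colors, starting at green, and uses the standard java.awt.Color.HSBtoRGB. coloredEdges() relies on Graphviz's colon-separated color lists: when no fractions are given, an edge with several colors is drawn as parallel strokes, one per color.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-path edge coloring for a DOT lattice graph. */
public class MultiPathColoring {

    // Green is 120 degrees on the hue wheel, i.e. 1/3 on java.awt's [0, 1) hue scale.
    private static final float GREEN_HUE = 1f / 3f;

    /** Returns n fully saturated colors with equally spaced hues, starting at green. */
    public static List<String> palette(int n) {
        List<String> colors = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            float hue = (GREEN_HUE + (float) i / n) % 1f;
            // HSBtoRGB packs an opaque alpha into the top byte; mask it off for "#rrggbb".
            int rgb = Color.HSBtoRGB(hue, 1f, 1f) & 0xFFFFFF;
            colors.add(String.format("#%06x", rgb));
        }
        return colors;
    }

    /**
     * Emits one DOT edge per distinct (from, to) pair. An edge used by several
     * paths gets a colon-separated color list, e.g. color="#00ff00:#ff00ff".
     */
    public static List<String> coloredEdges(List<List<String>> paths, List<String> colors) {
        // Keyed by [from, to]; LinkedHashMap keeps the output order deterministic.
        Map<List<String>, List<String>> edgeColors = new LinkedHashMap<>();
        for (int p = 0; p < paths.size(); p++) {
            List<String> path = paths.get(p);
            for (int i = 0; i + 1 < path.size(); i++) {
                List<String> edge = List.of(path.get(i), path.get(i + 1));
                edgeColors.computeIfAbsent(edge, k -> new ArrayList<>()).add(colors.get(p));
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<List<String>, List<String>> e : edgeColors.entrySet()) {
            lines.add(String.format("  \"%s\" -> \"%s\" [color=\"%s\"];",
                    e.getKey().get(0), e.getKey().get(1), String.join(":", e.getValue())));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two hypothetical tokenizations of the same input that share the last edge.
        List<List<String>> paths = List.of(
                List.of("BOS", "n1", "n3", "EOS"),
                List.of("BOS", "n2", "n3", "EOS"));
        System.out.println("digraph lattice {");
        coloredEdges(paths, palette(paths.size())).forEach(System.out::println);
        System.out.println("}");
    }
}
```

In this sketch the shared edge "n3" -> "EOS" is emitted once and carries both path colors. Every other edge carries a single color.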
This feature also adds a legend to the graph to show which path corresponds to which color.
Screenshots
Possible Issues
There are a few issues that I would be happy to get opinions on.
Colors
The colors are generated by selecting equally spaced hue angles in the HSB color model, starting from the green previously used in the debugTokenize() function.
Pros
Cons
Legend
As far as I know, DOT does not have a simple way to make legends. The current legend is built as a custom subgraph cluster. Because DOT is left to handle the positions and lengths of its edges, I think the legend ends up wider than necessary. There may be a better way to create it.
The legend is placed in the bottom left, which might not be ideal.
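Below is a rough sketch of what a cluster-based legend can look like. It is written in Java as an assumption and is not the PR's implementation; the class DotLegend, the legend() method, and the node names are hypothetical. Each legend row is a plaintext label node joined by an arrowless edge, in that path's color, to an invisible point node. The dot layout only draws a bounding box and label around subgraphs whose names start with "cluster". As in the approach described above, ranks and edge lengths are still left to dot, so this sketch does not address the width or placement concerns.

```java
import java.util.List;

/** Hypothetical sketch of a legend built as a DOT subgraph cluster. */
public class DotLegend {

    /**
     * One row per path: a plaintext label node linked to an invisible point
     * node by an arrowless edge drawn in that path's color.
     */
    public static String legend(List<String> colors) {
        StringBuilder sb = new StringBuilder();
        // dot only draws a box around subgraphs whose names start with "cluster".
        sb.append("  subgraph cluster_legend {\n");
        sb.append("    label=\"Legend\";\n");
        for (int i = 0; i < colors.size(); i++) {
            String label = "legend_" + i;
            String anchor = "legend_" + i + "_end";
            sb.append(String.format("    \"%s\" [shape=plaintext, label=\"Path %d\"];\n", label, i + 1));
            sb.append(String.format("    \"%s\" [shape=point, style=invis];\n", anchor));
            sb.append(String.format("    \"%s\" -> \"%s\" [color=\"%s\", arrowhead=none];\n",
                    label, anchor, colors.get(i)));
        }
        sb.append("  }\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("digraph lattice {");
        System.out.print(legend(List.of("#00ff00", "#ff00ff")));
        System.out.println("}");
    }
}
```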