Skip to content

Latest commit

 

History

History
626 lines (447 loc) · 24 KB

FAQ.md

File metadata and controls

626 lines (447 loc) · 24 KB

% FAQ % Ivan Lazar Miljenovic

Fortuitously Anticipated Queries (FAQ)

Note that to distinguish it from Graphviz, the library shall be henceforth referred to as graphviz.

Graphviz vs graphviz

What is the difference between Graphviz and graphviz?

Graphviz is an open source library and collection of utility programs using that library to visualise graphs (which are specified using the Dot language).

graphviz is a library for the purely functional programming language Haskell that provides "bindings" to Graphviz's programs. It does so by allowing programmers to specify the layout of the graph and then converts that to Dot code before calling the appropriate program to perform the visualisation.

Why should I use graphviz over one of the other Haskell Graphviz libraries?

Various Haskell libraries have support for Graphviz to one extent or another; however graphviz has the most comprehensive support available out of all of them:

  • There are four different representations of Dot graphs:

     1. Canonical, which provides a clean separated definition of a
        Dot graph (that matches the former layout of `dot -Tcanon`).
     2. Generalised, which allows statements to be in any order.
     3. A graph-based one that allows manipulation of the Dot graph.
     4. A monadic interface for embedding relatively static graphs
        in Haskell.
    

    There are also conversion functions between them.

  • The ability to parse and generate most aspects of Dot syntax and attributes. This includes taking into account escaping and quoting rules where applicable.

  • The ability to use a custom node type for Dot graphs.

  • Support for all stated layout algorithm programs and all specified output formats as well as the ability to use custom programs, etc.

  • Functions to convert FGL graphs and other graph-like structures (albeit not as nicely) to and from the internal Dot representations. In future, this will be expanded to a much larger range of graph-like values once a suitable abstraction is available.

  • The ability to augment Dot and FGL graphs with positioning information by round-trip passing through Graphviz.

  • Pure Haskell implementations of dot -Tcanon and tred.

  • graphviz is continually being worked upon and expanded to better suit/match the requirements of Graphviz, to improve performance and to make it easier for the programmer to use.

Is the API of graphviz stable?

For the most part, yes: the only items that are likely to change in the future are those with bugs/errors or if a radically better way of doing things is found. For most uses, however, the API should not change for the foreseeable future.

Note that graphviz's version numbers follow the package versioning policy; this means that you can immediately tell when the API has had a backwards-incompatible change by comparing the first two elements of the version. However, these changes won't always affect most users.

What aspects of Dot syntax and attributes are covered?

It's easier to state which aspects of Dot syntax and attributes aren't covered:

Overall syntax items not covered

  • Cannot specify a sub-graph as an end point in an edge;

  • Comments, pre-processor lines and split lines are (currently) not supported within HTML-like labels.

  • graphviz only uses UTF-8 encoding for printing and parsing (whereas Graphviz allows Latin1 encoding with the charset attribute).

  • Graphviz is more liberal in accepting "invalid" values (e.g. accepting a floating-point value when only integer values are meant to be accepted); graphviz is more strict in this aspect (and will indeed throw an exception if it cannot parse something properly).

  • No extensions (e.g. postscript-specific attributes) are available.

Attribute and value items not covered

  • The global orientation attribute is not defined; however its behaviour is duplicated by the rotate attribute.

  • The deprecated overlap algorithms have not been defined, and the ability to specify an integer prefix for use with the fdp layout tool is not available.

  • The deprecated shapefile attribute is not available; instead, you should specify the file on the command line.

  • The deprecated z attribute is not available; use the optional third dimension for the pos attribute instead.

  • Only polygon-based shapes are available (i.e. no custom shapes as yet).

  • The charset attribute is not available as graphviz assumes that all Dot graphs will be in UTF-8 for simplicity; if Latin1-encoded graphs need to be parsed then you shall need to do all I/O for them by hand.

  • colorscheme attributes are parsed, but the behaviour is not quite the same: consider the following minimal Dot graph:

    digraph {
        a [ style=filled, fillcolor=gray, colorscheme=svg ]
    }
    

    Despite the fact that the color is specified before the colorscheme, Graphviz will use that colorscheme to parse the color (as an SVG gray differs from the X11 gray); graphviz, however, will use the default colorscheme of x11 to parse the color, and then set the colorscheme to be svg (despite it not being used after it is set).

Available items of note

There are a few items of note that are available that are worthy of special note (as they may not be immediately obvious from the generated documentation):

  • graphviz is able to parse (but not print) the following special aspects of specifying edges in Dot code:

    • The node:port method of specifying of head/tail portPos values.

    • Stating multiple edges with common interior nodes (e.g. a -> b -> c).

    • Stating edges with a grouping of nodes (e.g. a -> {b c}).

  • Sub-graphs are specified as being clusters when the subgraph name starts with either "cluster" or "cluster_"; note that this prefix is removed when determining the subraph's name for the internal datatypes.

  • Anonymous subgraphs (where not even the subgraph keyword is specified) are also parseable.

  • HTML-like and record labels are available, and feature proper escaping/unescaping when printing/parsing.

  • Other syntactic issues are taken care of for you automatically (such as escaping/unescaping quotation marks). Even newlines are automatically escaped (but not unescaped) for you, defaulting to centered lines.

Getting graphviz and more documentation

Where can I obtain graphviz?

The best place to get graphviz is from its HackageDB page.

Where can I find the API documentation for graphviz?

Also on its HackageDB page.

Is it safe to install and use graphviz from its git repository?

No; unlike other projects I make no guarantees as to the stability of the live version of graphviz. Whilst the git repository is usually stable, it's often in a state of flux and at times patches that break the repository are recorded (when it's simpler/cleaner to break one patch into several smaller patches).

How is graphviz licensed?

graphviz is licensed under a 3-Clause BSD License (note that the ColorBrewer Color Schemes found in Data.GraphViz.Attributes.Colors.Brewer are covered under their own license).

Simplistically, this means that you can do whatever you want with graphviz as long as you cite both myself and Matthew Sackman (the original author) as being the authors of graphviz. However, I would appreciate at least an email letting me know how graphviz is being used.

Where can I find more information on graphviz?

From its home page.

Are there any tutorials on how to use graphviz?

A basic tutorial on how to visualise graph-like data is available; more will come if people ask for it.

What other packages use graphviz?

This is a list of all known packages that use graphviz: if you know of any others please let me know and I'll add it to the list.

What is the history of graphviz?

graphviz was originally written by Matthew Sackman (if you want his reasons for doing so, you'll have to ask him yourself) with the first known release being on 10 July, 2008. In 2008 I (Ivan Miljenovic) needed a library that provided bindings to Graphviz with clustering support; at the time graphviz was the most fully featured and closest to what I wanted, so I submitted a patch that provided support for both clustering and undirected graphs.

In April 2009, Matthew wanted to step down from maintaining graphviz and asked if I wanted to take over. Since then the library has been almost completely re-written with greatly improved coverage of the Dot language and extra features. However, the original outline of the library still remains.

Using graphviz

Can I start using graphviz without knowing anything about Graphviz?

You can, but if you want to start doing anything more advanced then you should be reading Graphviz's documentation as well as graphviz's. This is because the layout and design of graphviz is heavily based upon the Dot language and the various attributes that Graphviz supports.

Can I just use graphviz without reading its documentation?

You should at least read the various messages about possible ambiguities, etc. at the top of each module and for the attributes you use before you use graphviz.

Do I need to have Graphviz installed to use graphviz?

Technically, no if you're only dealing with the Dot language aspects. However, usage of the functions in the Commands module, or the augmentation of pretty-printing functions in the GraphViz module do require Graphviz to be installed.

Why didn't you use FFI to bind to the Graphviz library?

Because I just kept working where Matthew Sackman left off and it was already using Graphviz's tools rather than the actual library. However, most other language bindings (for Python, Perl, etc.) seem to do the same: generate Dot code and pass that to the relevant tool.

This, however, does provide a fortunate side effect where the ability to print and parse Dot code means that graphviz can be used for more than just visualising graphs created solely in Haskell: it can also import pre-defined graphs, or else generate Dot code for use with other tools.

What's the difference between the different DotGraph types?

graphviz has four different "implementations" of Dot code:

Canonical:

: matches the (former) output of dot -Tcanon. Recommended for use when converting existing data into Dot (especially with the graphElemsToDot function in Data.GraphViz).

Generalised:

: most closely matches the layout of actual Dot code, as such this is preferred when parsing in arbitrary Dot graphs. Also useful in cases where you want to use the common Graphviz "hack" of specifying global attributes that don't apply to sub-graphs after the sub-graphs in question.

Graph:

: provides common graph operations on Dot graphs, based upon those found in the FGL.

Monadic:

: a nicer way of defining relatively static Dot graphs to embed within Haskell code, etc. Loosely based (with permission!) upon Andy Gill's dotgen library.

What's the best way to parse Dot code?

Use the parseDotGraph function (rather than the general parsing functions that are available) to parse your Dot code: this is will strip out comments and pre-processor lines and join together split lines (if any of these remain the parser will fail). Also, if you are not sure what the type of the nodes are, use either String or else the GraphID type as it explicitly caters for both Strings and numbers (whereas just assuming it being a String will result in numbers being stored internally as a String).

Unless you are very sure of the representation of the Dot code you have been provided, you should parse in any Dot code as the Generalised.DotGraph type. Afterwards you can use FromGeneralisedDot to convert to whichever representation you prefer.

There are too many attributes!!! Which ones should I use?

The Data.GraphViz.Attributes module contains a cut-down list of recommended and commonly used attributes.

The entire list of attributes can be found in Data.GraphViz.Attributes.Complete. In particular, the following attributes are not recommended for use:

  • Color for anything except edge colours or gradients for nodes, clusters and graphs when using Graphviz >= 2.29.0 (and if you must, the border colour for a node).

  • ColorScheme: just stick with X11 colours.

  • Comment: pretty useless. Enough said.

Can I use any attribute wherever I want?

No: attributes are all defined in one big datatype for the sake of simplicity, but not all attributes are valid in all places. Read the documentation (either for Graphviz or graphviz) to determine which is suitable where.

How can I use graphviz to visualise non-FGL graphs?

The graphElemsToDot function allows you to visualise any graph for which you can specify a list of labelled nodes and a list of labelled edges.

How can I use/process multiple graphs like Graphviz does?

At one stage, graphviz supported dealing with lists of DotGraphs; however, it was found to be faster to deal with each graph individually rather than try to get Graphviz to deal with them all in one go. In future, once the problem causing this has been tracked down and fixed this feature will be returned.

How can I use custom datatypes for node IDs?

The important thing here is to ensure that your custom datatype has defined instances of PrintDot and ParseDot. Probably the easiest way of doing this is to have functions that convert between your type and String or Text and let graphviz determine how to print and parse those. Here is an example of a more difficult type that should be printed like "1: Foo":

data MyType = MyType Int String

instance PrintDot MyType where
  unqtDot (MyType i s) = unqtDot i <> colon <+> unqtDot s

  -- We have a space in there, so we need quotes.
  toDot = doubleQuotes . unqtDot

instance ParseDot MyType where
  parseUnqt = MyType <$> parseUnqt
                     <*  character ':'
                     <*  whitespace1
                     <*> parseUnqt

  -- Has at least one space, so it will be quoted.
  parse = quotedParse parseUnqt

Things to note from this example:

  • Whilst PrintDot and ParseDot have default definitions for toDot and parse, they assume the datatype doesn't need quotes; as such if the value will need quoting, then you should do so explicitly.

  • It is better to use the PrintDot instances for common types such as Int and String rather than using the pretty-printers inbuilt conversion functions (int, text, etc.) to ensure that quotations, etc. are dealt with correctly.

  • Be as liberal as you can when parsing, especially with whitespace: when printing only one space is used, yet when parsing we use the whitespace1 parsing combinator that will parse all whitespace characters (but it must consume at least one; there is a variant that does not need to parse any).

When parsing Dot code, do I have to worry about the case?

Not at all: graphviz's parser is case-insensitive; however, the correct case is checked first so there is a slight degradation in performance when the wrong case is used.

How do I set portPos values for nodes in edges?

Graphviz allows you to specify edges such as from:a -> to:b where the nodes "from" and "to" are defined with either RecordLabel or Html.Label labels and have different sections; the edge is then drawn from the "a" section of the "from" node to the "b" section of the "to" node.

Whilst graphviz can parse this, you can't define this yourself; instead, do it the manual way:

DotEdge "from" "to" True [ TailPort (LabelledPort (PN "a") Nothing)
                         , HeadPort (LabelledPort (PN "b") Nothing)
                         ]

I realise that doing this manually isn't as convenient, but I am open to suggestions on how this can be improved.

Note where TailPort and HeadPort are used; the next question explains this.

Is there anything else I should know?

A few other things of note that you should know about:

  • For an edge a -> b, Graphviz terms "a" to be the tail node and "b" to be the head node.

  • When creating GraphID values for the graphs and sub-graphs, you should ensure that they won't clash with any of the nodeID values when printed to avoid possible problems.

  • It is a good idea to have unique IDs for sub-graphs to ensure that global attributes are applied only to items in that sub-graph and so that clusters aren't combined (it took me a long time to find out that this was the case).

  • You should specify an ID for the overall graph when outputting to a format such as SVG as it becomes the title of that image.

  • Graphviz allows a node to be "defined" twice with different attributes; in practice they are combined into one node. Running Dot code through dot -Tcanon before parsing removes this problem.

  • Several attributes are defined with taking a list of items; all of these assume that the provided lists are non-empty (sub-values are a different story).

  • If a particular Dot graph is not parseable, the parser throws an error rather than failing gracefully.

Design Decisions

Why does graphviz use Polyparse rather than Parsec?

Short answer: because graphviz was already using Polyparse when I started working on it (and I hadn't done any parsing before so I had no preference either way).

Longer answer: Polyparse has several advantages I feel over Parsec:

  • Simpler types.
  • Avoids the whole "but Parsec-3 is slower than Parsec-2" debate (with its associated baggage/problems).
  • Few inbuilt combinators: since there is no inbuilt character parsing combinator, there are no problems with graphviz using its own case-less one.
  • Easier backtracking

Why do you have four different representations of Dot graphs?

graphviz has four different representations of Dot graphs. Apart from the reasons given before, the canonical implementation was the original representation, whereas the generalised one was only introduced in the 2999.8.0.0 release and the other two in the 2999.12.0.0 release.

Note, however, that I was thinking of adding something like the generalised implementation back around the time of the 2999.0.0.0 release, yet people didn't like the idea.

The graph-based implementation was added solely so I could write an (as-yet finished) tutorial, and thought others might find it useful. The monadic implementation came about as an attempt to encourage more people to use graphviz rather than other libraries such as dotgen, and I thought a nicer way of writing Dot graphs might help (the initial plans involved complicated type-hackery to try and almost make it a DSL for actual Dot code; however it ended up being too complicated and unwieldy).

Why are only FGL graphs supported?

Love them or hate them, FGL currently provides the best graph datatype and library available for Haskell at this time. As such, if any one graph type had to be chosen to have conversion functions written for it then FGL is the best option. Furthermore, I needed FGL graph support (which is the much more important reason!).

Why are the version numbers so high?

To make sure the latest release has the highest version number: Matthew Sackman originally made releases with date-based versioning, but when I switched to using the package versioning policy I had to change this. I could have started with 2010.x.y.z or so, but at the time I had initial hopes of introducing compatibility with other graphs (not just FGL ones) soon and wanted to make that the 3000.0.0.0 release; however that has not yet come to pass.

Why do you use the American spelling of colour in graphviz?

Because that's how Graphviz spells it, and I was following upstream to avoid confusion.

Bugs, Feature Requests and Development

Do you have any future plans for graphviz?

Yes, I do! See the TODO file for more information.

Does graphviz have a test suite?

Yes, there is, using the in-built support for test suites in Cabal:

cabal install graphviz --enable-tests

Then run the graphviz-testsuite executable. This test suite uses QuickCheck to ensure that graphviz can parse the Dot code it generates (as well as a few other things). Note that it isn't perfect: there are no guarantees that the Dot graphs that are generated are indeed valid, and those more extensive tests are not yet available.

Furthermore, you can do more controlled testing to try and track down the source of a bug as the above flag will also expose several testing modules which give you access to the various tests used as well as the Arbitrary instances for use with QuickCheck.

For proper testing of real-life Dot code, there is also the TestParsing.hs script that comes in the graphviz tarball (but is not installed). Once you have graphviz installed you can just run this script, passing it any files containing Dot graphs you wish to test. It will attempt to parse each Dot graph as a Generalised.DotGraph, and then test to see if the canonicalised form is parseable as a DotGraph.

I've found a bug!

Oh-oh... please file a report at the GitHub repository to tell me the specifics of what you were doing (including the Dot file in question if it's a parsing problem) and I'll get right on it.

I have a feature request.

Is it in the TODO? If not, file an issue at the GitHub repository and I'll consider implementing it (depending on time and how well I think it will fit in the overall library).

I want to help out with developing graphviz.

Great! Whether you have a specific feature in mind or want to help clear the TODO list, please create a pull-request on the GitHub repository.

What is the purpose of the AttributeGenerator.hs file?

Graphviz has a large number of attributes. Rather than try to edit everything manually each time I want to change how I use the large Attribute datatype, the AttributeGenerator script generates the datatype, instances, etc. for me.