
Go through an intermediate CSV before doing graphs #26

Open
tbarbette opened this issue Feb 9, 2022 · 4 comments

Comments

@tbarbette
Owner

tbarbette commented Feb 9, 2022

  • People get confused about the cache, the graphs and the CSV.
  • When you create a figure, then move on to other experiments and add new variables, you can't re-do the first figure without "removing" the new variables, because the cache doesn't know which variables were added since. E.g. if you make a graph about TCP stuff, then decide to try with CONGESTION={vegas,cubic,bbr}, you'll have to re-run the old tests, because before that point the congestion control used was whatever the default on the system was.
    In a paper rush, having to re-do tests or deactivate new variables to rebuild the graph is stressful (and dangerous: imagine BBR was set system-wide, then you're actually no longer in the same conditions).

So the idea would be to keep the cache as hidden as possible, and to always export a CSV that is then used to create the graphs. The npf commands would continue to build graphs automatically, but a new npf-graph command would allow rebuilding the very same graph from the CSV.

The remaining question, therefore, is what the appropriate CSV format would be, knowing that we have multiple output variables, multiple runs per parameter combination, and multiple series when using npf-compare.

Imagine we compare netperf and iperf, have one variable ZEROCOPY that can take the values 0 and 1, have two output results THROUGHPUT and LATENCY, and do 2 runs:

series,run_number,ZEROCOPY,THROUGHPUT,LATENCY
iperf,1,0,...
iperf,1,1,...
iperf,2,0,...
iperf,2,1,...
netperf,1,0,...
netperf,1,1,...
netperf,2,0,...
netperf,2,1,...
The problem is still that some "outputs" (results) can have multiple values in the same run. We could use another (somewhat non-standard) separator to pack multiple results into a single column, e.g. the "+" sign (using a ";" might lead to misinterpretation by CSV parsers).
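A minimal sketch of what reading such a file could look like, using only the stdlib csv module. The sample values and the "+" separator are hypothetical, just illustrating the packed-cell idea:

```python
import csv
import io

# Hypothetical CSV content following the format sketched above, with
# multiple LATENCY samples per run packed into one cell using "+".
sample = """series,run_number,ZEROCOPY,THROUGHPUT,LATENCY
iperf,1,0,9.1,12.5+13.0+12.7
iperf,1,1,9.8,11.2+11.4
"""

def parse_cell(cell):
    """Split a "+"-packed cell into a list of floats (a single value becomes a list of one)."""
    return [float(v) for v in cell.split("+")]

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    rows.append({
        "series": row["series"],
        "run": int(row["run_number"]),
        "ZEROCOPY": int(row["ZEROCOPY"]),
        "THROUGHPUT": parse_cell(row["THROUGHPUT"]),
        "LATENCY": parse_cell(row["LATENCY"]),
    })

print(rows[0]["LATENCY"])  # → [12.5, 13.0, 12.7]
```

The nice property is that single-valued cells still parse with the same code path, so ordinary rows stay plain CSV.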

Any input on this?

@MassimoGirondi
Contributor

MassimoGirondi commented Feb 16, 2022

What about storing all the intermediate data in a binary format? Pickle is the first that comes to mind.

Not the most elegant solution, but it would abstract away having to save each individual combination of parameters for every run, and having to write a custom CSV syntax.
Then it's a matter of separating the testing and graphing parts, invoking each side of the (de)serializer depending on whether you want to do the graphing or only export the results.

You can see it as a sort of "snapshot" of the results in that particular run.
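A rough sketch of that snapshot idea, assuming a hypothetical in-memory results structure keyed by (series, run); the names and layout are illustrative, not npf's actual internals:

```python
import os
import pickle
import tempfile

# Hypothetical results structure: one entry per (series, run) combination,
# mapping variable values and result lists.
results = {
    ("iperf", 1): {"ZEROCOPY": 0, "THROUGHPUT": [9.1], "LATENCY": [12.5, 13.0]},
    ("netperf", 1): {"ZEROCOPY": 0, "THROUGHPUT": [8.7], "LATENCY": [14.1]},
}

# The testing part ends by dumping a snapshot of everything...
path = os.path.join(tempfile.mkdtemp(), "results.pkl")
with open(path, "wb") as f:
    pickle.dump(results, f)

# ...and the graphing part only needs to reload it, without re-running tests.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == results)  # → True
```

The tradeoff is that pickle files are opaque and tied to the Python-side data layout, which matters if the format evolves.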

@tbarbette
Owner Author

tbarbette commented Feb 16, 2022

In the first versions I actually used pickle. But as the format evolved I suffered from backward-incompatible files that could no longer be opened, and had to re-execute tests.
The advantage of a kind of CSV is that it's human-readable. But yes, it starts to get complex with multiple results.

And I did not mention the problem of time series... How do you store a dozen results over the duration of the experiment, at time intervals that differ between experiments?

@tbarbette
Owner Author

Maybe the CSV is still the best way to handle this, with just a weird format for the weird use cases (multiple results per run). And maybe one CSV file per time series (again, time series are not needed in all experiments).
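One way the per-time-series CSV could look is a "long" format with one row per sample, each carrying its own timestamp; that makes differing time grids per experiment a non-issue. A sketch with stdlib csv, using made-up sample data:

```python
import csv
import io

out = io.StringIO()
w = csv.writer(out)
w.writerow(["series", "run_number", "time", "THROUGHPUT"])

# Hypothetical samples: iperf sampled every 0.5 s, netperf every 1 s.
# Each row carries its own timestamp, so the grids need not match.
for t, v in [(0.0, 8.9), (0.5, 9.2), (1.0, 9.1)]:
    w.writerow(["iperf", 1, t, v])
for t, v in [(0.0, 8.1), (1.0, 8.4)]:
    w.writerow(["netperf", 1, t, v])

# Reading back, rows group naturally by (series, run_number):
rows = list(csv.DictReader(io.StringIO(out.getvalue())))
print(len(rows))  # → 5
```

Scalar results would stay in the main CSV; only results that vary over time would get such a companion file.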

@MassimoGirondi
Contributor

MassimoGirondi commented Feb 16, 2022

JSON? I'm not a huge fan of it for cases like this, but it could be a good tradeoff, keeping basic human readability while allowing nesting and object-like syntax...
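For illustration, one possible JSON layout (entirely hypothetical) where nesting absorbs both the multiple-results-per-run and the time-series cases: scalar results stay plain numbers, repeated results become lists, and time series become lists of [time, value] pairs:

```python
import json

# Hypothetical snapshot: one object per series, runs as a list,
# with scalar, multi-valued, and time-series results side by side.
snapshot = {
    "iperf": {
        "runs": [
            {
                "ZEROCOPY": 0,
                "THROUGHPUT": 9.1,                       # scalar result
                "LATENCY": [12.5, 13.0, 12.7],           # multiple values per run
                "THROUGHPUT_TS": [[0.0, 8.9], [0.5, 9.2]],  # time series
            },
        ]
    }
}

text = json.dumps(snapshot, indent=2)  # still human-readable on disk
restored = json.loads(text)
print(restored == snapshot)  # → True
```

Unlike pickle, the file stays readable and language-agnostic; unlike flat CSV, no special separator conventions are needed.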
