Go through an intermediate CSV before doing graphs #26
Comments
What about storing all the intermediate data in a binary format? Pickle is the first that comes to mind. Not the most elegant solution, but it would avoid having to save each individual combination of parameters for each particular run and to write your own custom CSV syntax. You can see it as a sort of "snapshot" of the results of that particular run.
In the first versions I actually used pickle. But as the format evolved, I suffered from backward-incompatible loading and had to re-execute tests. And that is without mentioning the problem of time series: how to store a dozen results over the duration of the experiment, at time intervals that differ for each experiment?
Maybe CSV is still the best way to handle this, with just a weird format for the weird use cases (multiple results per run). And maybe one CSV file per time series (again, time series are not needed in all experiments).
JSON? I'm not a huge fan for cases like this, but it could be a good tradeoff, keeping basic human readability while allowing nesting and object-like syntax...
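To make the tradeoff concrete, here is a minimal sketch of what one JSON "snapshot" per run could look like. The structure (keys, field names, and the sample values) is entirely hypothetical and just illustrates how JSON would accommodate multiple outputs and per-experiment time axes:

```python
import json

# Hypothetical layout for one run's snapshot: parameters, results
# (single- or multi-valued), and time series with their own time axis.
snapshot = {
    "series": "iperf",
    "run": 1,
    "params": {"ZEROCOPY": 0},
    "results": {
        "THROUGHPUT": 9.4,           # single value for this run
        "LATENCY": [0.8, 0.9, 0.7],  # multiple samples for this run
    },
    # each experiment can use different time intervals here
    "time_series": {"THROUGHPUT": [[0.0, 9.1], [1.0, 9.5]]},
}

# Round-trip through text: the point of JSON over pickle is that this
# stays readable and loadable even if the schema evolves.
text = json.dumps(snapshot, indent=2)
restored = json.loads(text)
assert restored["results"]["LATENCY"] == [0.8, 0.9, 0.7]
```

Unknown keys could simply be ignored by older readers, which is what made the pickle snapshots fragile.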
In a paper rush, it can be stressful to have to re-run tests or deactivate new variables just to rebuild a graph (and dangerous: imagine BBR was set system-wide, then you're actually no longer testing under the same conditions).
So the idea would be to keep the cache as hidden as possible, and always export a CSV that will be used to create the graphs. The npf commands would continue to build graphs automatically, but a new npf-graph command would allow rebuilding the very same graph from the CSV.
The remaining question is what the appropriate CSV format would be, knowing we have multiple output variables, multiple runs per parameter combination, and also multiple series when using npf-compare.
Imagine we compare netperf and iperf, with one variable ZEROCOPY that can take values 0 and 1, two output results THROUGHPUT and LATENCY, and 2 runs:

```
series,run_number,ZEROCOPY,THROUGHPUT,LATENCY
iperf,1,0,...
iperf,1,1,...
iperf,2,0,...
iperf,2,1,...
netperf,1,0,...
netperf,1,1,...
netperf,2,0,...
netperf,2,1,...
```
The problem remains that some outputs (results) can have multiple values in the same run. We could use another (slightly non-standard) separator to pack multiple results into a single column, e.g. the "+" sign (using ";" might lead to bad interpretation of the CSV by some parsers).
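The "+"-separated cell idea can be sketched in a few lines; the helper name and samples below are hypothetical:

```python
def split_multi(cell: str, sep: str = "+") -> list[float]:
    """Parse a possibly multi-valued result cell like "0.8+0.9+0.7"
    into a list of floats; single-valued cells parse the same way."""
    return [float(v) for v in cell.split(sep)]

assert split_multi("0.8+0.9+0.7") == [0.8, 0.9, 0.7]
assert split_multi("9.4") == [9.4]  # single-value cells still work
```

One caveat with "+": a literal plus also appears in scientific notation (e.g. 1e+09), so values would need to be written without exponents, or a separator that cannot occur inside a number would be safer.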
Any input on this?