CP-50475: parallelize device ops during VM lifecycle ops #6057

psafont · 2024-10-16T12:55:28Z

Operations on different devices should be independent and therefore can be
parallelized. This both means parallelizing operations on different device
types and on devices for the same type.

An atom to serialize action has been introduced because the operation regarding
a single device must be kept serialized.

Also removes some unneeded parallel atoms, which cause some overhead as well as polluting the traces

This makes VM_starts to run about ~1 second faster on some tests where VMs have 5 VIFs. I'm testing it further, but it doesn't look like it's breaking any tests. I want to push it through some bootstorms and measure improvements

Please do suggest some further parallelization if you can find it!

BVT+BST are all green

Parallel atoms do quite a bit of unnecessary actions even when they are empty. They are also not needed when running a single task. They also show as spans in the traces. Removing them makes the traces shorter and easier to read. Co-authored-by: Edwin Török <[email protected]> Signed-off-by: Pau Ruiz Safont <[email protected]>

Operations on different devices should be independent and therefore can be parallelized. This both means parallelizing operations on different device types and on devices for the same type. An atom to serialize action has been introduced because the operation regarding a single device must be kept serialized. Signed-off-by: Pau Ruiz Safont <[email protected]>

robhoes · 2024-10-16T13:22:51Z

I have to look at the details of the PR, but the first thing I wondered is why we need to introduce a new Serial atomic operation, as clearly we are already able to serialise. The list returned by atomics_of_operation is implicitly serial and executed as such by perform_atomics. So now we have two ways of expression serial (atomic) operations.

I can see that nesting is easier with an explicit Serial atomic. Should we then as a next step make all serialisation explicit by turning atomics_of_operation into atomic_of_operation and wrapping all lists with Serial? Then we can merge perform_atomics and perform_atomics as well.

psafont · 2024-10-16T13:40:19Z

I introduced the serial atomic because it was convenient. At some point I also added a nop one, which removes the need to concatenate lists quite a bit, but I'd rather do the latter than the former, as it could end up in the traces.

The list returned by atomics_of_operation is implicitly serial and executed as such by perform_atomics. So now we have two ways of expression serial (atomic) operations.

I can see that nesting is easier with an explicit Serial atomic.

An option to avoid the serial atom would be to make the parallel one to receive a list of lists of atomics ops, instead of a list of atomic ops, although I'm not fond of the implicitness of it.

Should we then as a next step make all serialisation explicit by turning atomics_of_operation into atomic_of_operation and wrapping all lists with Serial? Then we can merge perform_atomics and perform_atomics as well.

I can try :)

psafont · 2024-10-17T08:52:23Z

Trying is a good exercise of yak-shaving: there's code to calculate progress of tasks. It's not clear to me that it works well currently, especially with regard to the parallel atomic op; and merging the functions needs to track progress into the Serial atomic op. The code needs unit-testing to be able to make it's not made even worse and the results are predictable.

The benefits of the current patch can be seen in the traces

Now the critical path doesn't contain the VIFs, and now it's dominated even more by the SM and the need to plug RW VBDs before RO ones

minglumlu · 2024-10-23T11:44:56Z

I thought there might be some dependencies between these operations. It's great to see that they actually can be performed in parallel like this. If the returned lists can be encoded with Parallel and Serial as well, as Rob suggested, the data structure would be an exact representation on the dependencies. I.E. the qemu and devices depending on it could be handled in parallel with other devices not-depending on qemu as well.

lindig · 2024-10-24T15:14:17Z

Can we merge this and it was only held back because we wanted to get some fixes in first?

psafont · 2024-10-25T09:47:16Z

This was held back because of the fixes as well as some performance tests being run. It's taking really long to get them scheduled :(

psafont · 2024-10-28T16:40:43Z

This can be merged whenever we feel like it, the performance tests won't be able to be run in weeks, so we will need to run benchmarks when this is merged to master

psafont and others added 2 commits October 16, 2024 13:54

psafont requested a review from robhoes October 16, 2024 12:58

robhoes approved these changes Oct 21, 2024

View reviewed changes

minglumlu approved these changes Oct 23, 2024

View reviewed changes

robhoes added this pull request to the merge queue Oct 29, 2024

Merged via the queue into xapi-project:master with commit 2ed1166 Oct 29, 2024
15 checks passed

psafont deleted the private/paus/parops branch October 29, 2024 16:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CP-50475: parallelize device ops during VM lifecycle ops #6057

CP-50475: parallelize device ops during VM lifecycle ops #6057

psafont commented Oct 16, 2024 •

edited

Loading

robhoes commented Oct 16, 2024

psafont commented Oct 16, 2024

psafont commented Oct 17, 2024

minglumlu commented Oct 23, 2024

lindig commented Oct 24, 2024

psafont commented Oct 25, 2024

psafont commented Oct 28, 2024

CP-50475: parallelize device ops during VM lifecycle ops #6057

CP-50475: parallelize device ops during VM lifecycle ops #6057

Conversation

psafont commented Oct 16, 2024 • edited Loading

robhoes commented Oct 16, 2024

psafont commented Oct 16, 2024

psafont commented Oct 17, 2024

minglumlu commented Oct 23, 2024

lindig commented Oct 24, 2024

psafont commented Oct 25, 2024

psafont commented Oct 28, 2024

psafont commented Oct 16, 2024 •

edited

Loading