Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CP-50475: parallelize device ops during VM lifecycle ops #6057

Merged
merged 2 commits into from
Oct 29, 2024

Conversation

psafont
Copy link
Member

@psafont psafont commented Oct 16, 2024

Operations on different devices should be independent and therefore can be
parallelized. This both means parallelizing operations on different device
types and on devices for the same type.

An atom to serialize action has been introduced because the operation regarding
a single device must be kept serialized.

Also removes some unneeded parallel atoms, which cause some overhead as well as polluting the traces

This makes VM_starts to run about ~1 second faster on some tests where VMs have 5 VIFs. I'm testing it further, but it doesn't look like it's breaking any tests. I want to push it through some bootstorms and measure improvements

Please do suggest some further parallelization if you can find it!

BVT+BST are all green

psafont and others added 2 commits October 16, 2024 13:54
Parallel atoms do quite a bit of unnecessary actions even when they are empty.
They are also not needed when running a single task.

They also show as spans in the traces. Removing them makes the traces shorter
and easier to read.

Co-authored-by: Edwin Török <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Operations on different devices should be independent and therefore can be
parallelized. This both means parallelizing operations on different device
types and on devices for the same type.

An atom to serialize action has been introduced because the operation regarding
a single device must be kept serialized.

Signed-off-by: Pau Ruiz Safont <[email protected]>
@psafont psafont requested a review from robhoes October 16, 2024 12:58
@robhoes
Copy link
Member

robhoes commented Oct 16, 2024

I have to look at the details of the PR, but the first thing I wondered is why we need to introduce a new Serial atomic operation, as clearly we are already able to serialise. The list returned by atomics_of_operation is implicitly serial and executed as such by perform_atomics. So now we have two ways of expression serial (atomic) operations.

I can see that nesting is easier with an explicit Serial atomic. Should we then as a next step make all serialisation explicit by turning atomics_of_operation into atomic_of_operation and wrapping all lists with Serial? Then we can merge perform_atomics and perform_atomics as well.

@psafont
Copy link
Member Author

psafont commented Oct 16, 2024

I introduced the serial atomic because it was convenient. At some point I also added a nop one, which removes the need to concatenate lists quite a bit, but I'd rather do the latter than the former, as it could end up in the traces.

The list returned by atomics_of_operation is implicitly serial and executed as such by perform_atomics. So now we have two ways of expression serial (atomic) operations.

I can see that nesting is easier with an explicit Serial atomic.

An option to avoid the serial atom would be to make the parallel one to receive a list of lists of atomics ops, instead of a list of atomic ops, although I'm not fond of the implicitness of it.

Should we then as a next step make all serialisation explicit by turning atomics_of_operation into atomic_of_operation and wrapping all lists with Serial? Then we can merge perform_atomics and perform_atomics as well.

I can try :)

@psafont
Copy link
Member Author

psafont commented Oct 17, 2024

Trying is a good exercise of yak-shaving: there's code to calculate progress of tasks. It's not clear to me that it works well currently, especially with regard to the parallel atomic op; and merging the functions needs to track progress into the Serial atomic op. The code needs unit-testing to be able to make it's not made even worse and the results are predictable.

The benefits of the current patch can be seen in the traces
image
image

Now the critical path doesn't contain the VIFs, and now it's dominated even more by the SM and the need to plug RW VBDs before RO ones

@minglumlu
Copy link
Member

I thought there might be some dependencies between these operations. It's great to see that they actually can be performed in parallel like this. If the returned lists can be encoded with Parallel and Serial as well, as Rob suggested, the data structure would be an exact representation on the dependencies. I.E. the qemu and devices depending on it could be handled in parallel with other devices not-depending on qemu as well.

@lindig
Copy link
Contributor

lindig commented Oct 24, 2024

Can we merge this and it was only held back because we wanted to get some fixes in first?

@psafont
Copy link
Member Author

psafont commented Oct 25, 2024

This was held back because of the fixes as well as some performance tests being run. It's taking really long to get them scheduled :(

@psafont
Copy link
Member Author

psafont commented Oct 28, 2024

This can be merged whenever we feel like it, the performance tests won't be able to be run in weeks, so we will need to run benchmarks when this is merged to master

@robhoes robhoes added this pull request to the merge queue Oct 29, 2024
Merged via the queue into xapi-project:master with commit 2ed1166 Oct 29, 2024
15 checks passed
@psafont psafont deleted the private/paus/parops branch October 29, 2024 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants