-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relative replication order #415
Comments
We discussed this in the WG meeting. Thanks for all the feedback! Summarizing some issues that came up: Why is the ordering a per-replica property instead of a per-group property?The per-replica property is more general, and allow us to have multicast groups such as
where the first and last replica are FIXED, but the three middle replicas are SHUFFLED. Thus the switch will always start by making a copy for the first replica, then it will process the next three replicas in random order, and then it will always finish by making a copy for the last replica. Does shuffling happen only among contiguous SHUFFLED replicas, or among all SHUFFLED replicas.Consider
Should the first and the last replica be shuffled or not? My proposal is that they should be shuffled. This is not based on a particular use case, but merely because it seems easier/more canonical from a mathematical perspective, in the following way: we can apply any permutations that has the FIXED replicas as fixed points. Maybe there are reasons to prefer only contiguous shuffling though? What is the semantics on hardware that can replicate in parallel?Parallel replication should always be legal -- there is no difference between F and S in the parallel setting. Does replication order even matter, given that there could also be delays due to buffering?It does matter in the following cases:
ReproducibilityWe should be cautious to introduce randomness, since it makes testing/reproducability harder.
Backward compatibilityTargets that don't support the new property should return an |
@jafingerhut do you have a sense if this generalizes reasonably well to other platforms? Any other thoughts on this? Thanks! |
Sorry I missed the 2023-Feb-17 P4 API work group meeting. I could have asked some of these questions live instead of in a comment. Your description of shuffled replicas is:
I am curious about your intended behavior of "subject to uniform shuffling". Do you mean that you REQUIRE that an implementation must pseudo-randomly pick one of the N! permutations of the N shuffled replicas in a group, and pick among those N! with equal probability? If so, why would you want to require such behavior? If that isn't what you mean by the shuffled replicas, would it be legal in your intended meaning for an implementation to simply always create replicas in the precise order given, regardless of what their I have seen several switch implementations that implemented a fixed predictable order for creating replicas. I haven't seen one that randomly picked replication orders before. That sounds like extra implementation work for a benefit that I do not yet understand. |
Oh, and another detail question to follow up: are you thinking that shuffling means that for each packet to be replicated in the data plane, each one might create copies in a different order from the other packets to be replicated using that multicast group? Or are you thinking that at the time you modify the replication list of a group, at that time the shuffling is performed, and then all copies made using that multicast group will have the same relative order from that point onwards, until and unless the multicast group's replication list is modified? |
Thanks for the the comments, @jafingerhut.
Correct. (Note that this is efficiently implementable, no need to represent the distribution over N! permutations explicitly.)
In short, to achieve fairness. I would like to ensure that "in expectation" (on average, across many input packets that get replicated via the same multicast group), no replica is advantaged over another in the sense of being processed first. I can make this more precise and formal if that would be helpful.
No -- this would be legal for
+1. To mitigate this, I suggest we make supporting this optional.
Yes, exactly. We likely have to weaken this a bit, i.e. instead of this shuffling happening for every single packet, it may be sufficient for it to happen every ~ second or so. |
I have seen switch implementations where doing an occasional reordering of the replication list can lead to a transient effect where copies going to the same (output port, replication id) might get reordered briefly, e.g. packets P1 and P2 arriving at the switch might leave the switch to one or more (output port, replication id) destinations in the order P2 followed by P1, due to internal implementation details of the multicast replication. If that is acceptable to you, then the idea of having the shuffling happening every second or so sounds like something that could fairly straightforwardly be implemented in device driver software, rather than in the hardware itself. It could even be implemented in the controller software, albeit with a potentially high rate of P4Runtime API messages required to achieve it. |
Interesting point about reordering, I will double check about that one.
Yes, in fact that is what we have in mind.
That is also something we have thought about, but we don't love the amount of controller-visible churn this would cause. |
Just a heads up that this may no longer be required on our end, so unless someone else would like to pursue this at this time, we can put the discussion on hold for now. |
PSA/P4Runtime has a "packet replication engine" (PRE), e.g. for cloning and multicast.
I am interested in mutlicast cases where fairness matters. In hardware, packet replication for multicast often happens sequentially, meaning that multicast replicas get made one by one, and potentially egress the switch one-by-one. These relative delays don't matter in most uses cases, but if you have a very large multicast tree (i.e. many levels of switches replicating the packets to their childreen in the tree) and really care about fairness, they add up.
With that motivation in mind, I am proposing adding a new
relative_replication_order
field to theReplica
message, as follows:With this, the switch control plane would regularly permute the multicast group members.
Permutations would be chosen uniformly at random from the set of permutations that have all
FIXED
replicas as fixed points.The switch should preserve read-write symmetry, i.e. the reordering would be transparent to the controller and wouldn't be visible by reading back the multicast group.
I would like to discuss with the community if this seems like a reasonable/general enough feature, or if people feel it's too exotic to be included in the standard. I imagine this functionality would be optional, i.e. switches would not have to support anything but the
UNSPECIFIED
replication order.@jonathan-dilorenzo @kheradmandG for visibility.
The text was updated successfully, but these errors were encountered: