What I understand is that on NVIDIA GPUs, blocks are scheduled onto SMs (CUs) using a round-robin policy, so a kernel's blocks should be interleaved across the CUs rather than placed on just a few CUs, as the figure shows.
For the “dispatch delay” described in the first case in the figure, what I wonder is: why can't the blocks wait for idle CUs?
I would be grateful if you could reply!
Thanks for your interest in our work. The example in this figure is indeed a little bit confusing.
At a high level, Fig. 8 is meant to show that using multiple asynchronous streams causes a synchronization problem.
It is true that each SM/CU can execute multiple blocks concurrently. But the example makes an assumption: each CU can hold multiple blocks if it has enough resources (e.g., registers and shared memory), but it executes them sequentially. This assumption is only there to make the figure easier to draw.
Therefore, when the 3rd RT kernel (which has 2 blocks) is dispatched at the red line, its first block is assigned to an idle CU (i.e., CU1). When the second block is assigned, however, all 4 CUs are busy (CU1 with the 1st blue block, CU2 with the red block, CU3 and CU4 with the green blocks).
Now the example makes another assumption: the red and green blocks consume a lot of compute resources, so there is no space for another blue block on the last three CUs. The second blue block is therefore assigned to CU1, where, under the sequential-execution assumption above, it runs after the first blue block.
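To make the two assumptions concrete, here is a rough toy model in Python. It is only a sketch of the simplified picture in the figure, not of how real GPU hardware schedules blocks. The `CU` class, the `dispatch` helper, the resource costs, the durations, and the rule of picking the CU that frees up earliest are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CU:
    name: str
    capacity: int = 4  # hypothetical resource units (registers, shared memory)
    # Blocks currently held, as (label, cost, remaining duration).
    resident: list = field(default_factory=list)

    def free(self):
        # Resource units not yet claimed by held blocks.
        return self.capacity - sum(cost for _, cost, _ in self.resident)

    def busy_until(self):
        # Held blocks run back to back (the simplifying assumption of the figure).
        return sum(duration for _, _, duration in self.resident)


def dispatch(cus, label, cost, duration):
    """Place a block on a CU with enough free resources.

    Among such CUs, pick the one that becomes free earliest (an assumed
    policy). Returns (CU name, start time relative to the dispatch point).
    """
    candidates = [cu for cu in cus if cu.free() >= cost]
    if not candidates:
        raise RuntimeError("no CU has enough free resources")
    target = min(candidates, key=lambda cu: cu.busy_until())
    start = target.busy_until()
    target.resident.append((label, cost, duration))
    return target.name, start


# State at the red line: CU1 is idle, CU2 holds the red block,
# CU3 and CU4 hold the green blocks (costs and durations are hypothetical).
cus = [CU("CU1"), CU("CU2"), CU("CU3"), CU("CU4")]
cus[1].resident.append(("red", 3, 5))
cus[2].resident.append(("green", 3, 5))
cus[3].resident.append(("green", 3, 5))

# The 3rd RT kernel has two blue blocks, each needing 2 resource units.
first = dispatch(cus, "blue-1", cost=2, duration=2)
second = dispatch(cus, "blue-2", cost=2, duration=2)
print(first, second)  # ('CU1', 0) ('CU1', 2)
```

In this toy setup the second blue block fits only on CU1, so it starts at time 2 instead of time 0, which mirrors the situation described for the figure.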