Problems with the description of block scheduling in the paper #4

Open
five12 opened this issue Oct 12, 2022 · 1 comment

five12 commented Oct 12, 2022

  1. My understanding is that on NVIDIA GPUs, blocks are scheduled onto SMs (CUs) with a round-robin policy, so a kernel's blocks should be interleaved across the CUs rather than confined to a few CUs, as the figure shows them.

  2. For the “dispatch delay” described in the first case in the figure, why can't the blocks wait for an idle CU?

image
I would be grateful if you could reply!

@francis0407
Contributor

Thanks for your interest in our work. The example in this figure is indeed a little bit confusing.

At a high level, Fig. 8 is meant to argue that using multiple asynchronous streams causes a synchronization problem.

It is true that each SM/CU can execute multiple blocks concurrently. However, the example makes an assumption: each CU can hold multiple blocks if it has enough resources (e.g., registers and shared memory), but it executes them sequentially. This assumption is there only to make the figure easier to draw.

Therefore, when the 3rd RT kernel (which has 2 blocks) is dispatched at the red line, its first block is assigned to an idle CU (i.e., CU1). When the second block is assigned, however, all 4 CUs are busy (CU1 with the 1st blue block, CU2 with the red block, CU3 and CU4 with the green blocks).

Now, the example makes another assumption: the red and green blocks consume a lot of compute resources, so there is no space for another blue block on the last three CUs. The second blue block is therefore assigned to CU1, where, under the sequential-execution assumption above, it runs after the first blue block.
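
To make the two assumptions concrete, here is a minimal toy model of the example, written as a sketch rather than as a description of how a real GPU block scheduler works. Everything in it is hypothetical and chosen only for illustration: the `CU` class, the `dispatch` helper, the capacity of 4 resource units per CU, and the block costs and durations. Time 0 stands for the red line in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class CU:
    name: str
    capacity: int  # abstract resource units (stand-in for registers, shared memory)
    # Resident blocks as (label, cost, remaining_duration), in execution order.
    resident: list = field(default_factory=list)

    def free(self):
        return self.capacity - sum(cost for _, cost, _ in self.resident)

    def busy_until(self):
        # Assumption from the example: resident blocks run one after another.
        return sum(duration for _, _, duration in self.resident)


def dispatch(cus, label, cost, duration):
    """Place one block; return (cu_name, start_time)."""
    candidates = [cu for cu in cus if cu.free() >= cost]
    if not candidates:
        raise RuntimeError("no CU has room for this block")
    # Prefer an idle CU; otherwise fall back to the first CU with room.
    idle = [cu for cu in candidates if not cu.resident]
    target = (idle or candidates)[0]
    start = target.busy_until()
    target.resident.append((label, cost, duration))
    return target.name, start


def run_example():
    cus = [CU(f"CU{i}", capacity=4) for i in range(1, 5)]
    # State at the red line: CU1 is idle, CU2..CU4 hold large blocks
    # that use up all of their resources (made-up costs and durations).
    cus[1].resident.append(("red", 4, 3))
    cus[2].resident.append(("green-0", 4, 3))
    cus[3].resident.append(("green-1", 4, 3))
    # The blue kernel has two small blocks.
    return [dispatch(cus, f"blue-{i}", cost=1, duration=2) for i in range(2)]


if __name__ == "__main__":
    for cu_name, start in run_example():
        print(cu_name, start)
```

In this model both blue blocks land on CU1: the first starts at time 0, and the second starts only after the first finishes, which corresponds to the delay drawn in the figure.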
