From ae774a1e39f701c0b7876301009b47f9cfbf60c9 Mon Sep 17 00:00:00 2001
From: "John F. Carr"
Date: Thu, 11 Aug 2022 12:50:23 -0400
Subject: [PATCH 1/3] Scheduler and stealing work in progress

---
 src/posts/scheduler.md | 96 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 src/posts/scheduler.md

diff --git a/src/posts/scheduler.md b/src/posts/scheduler.md
new file mode 100644
index 00000000..60c6f890
--- /dev/null
+++ b/src/posts/scheduler.md
@@ -0,0 +1,96 @@
+# What does spawning mean?
+
+I want to explain how the OpenCilk compiler implements spawn, but
+first I need to explain what spawn means. The `cilk_spawn` keyword is
+fundamentally different from C's `pthread_create` and Java's
+`Thread.start`.
+
+The old approach to multithreading made programs that depended on
+threads to work. A program with a producer and consumer thread will
+hang if one of the threads doesn't run.
+
+A correct Cilk program behaves the same whether it runs on one thread
+or many. Usually you don't specify how many processors to use. By
+default Cilk uses as many _workers_ (threads running user code) as
+your system has processors. Then spawns enable parallelism. You can
+also ask it to run single-threaded. Then spawns don't do anything,
+but the program is still the same.
+
+## Spawning usually does nothing
+
+In most Cilk programs most spawns have no effect.
+
+The keyword `cilk_spawn` means the statement with the spawn is allowed
+to run in parallel with the statements that follow, up until the next
+`cilk_sync`. It doesn't have to run in parallel and usually it doesn't.
+
+A program we often use for testing and benchmarking computes a
+fibonacci number with an inefficient exponential time algorithm. The
+number of spawns is comparable to the number computed. For testing I
+use an argument in the low 40s that takes about a second to run. The
+program spawns about 100 million times.
+
+Probably less than 100 of those spawns result in added parallelism.
+The rest, 99.9999% of the total, are ignored. In a highly parallel
+Cilk program the number of scheduling events is related to the number
+of processors. A perfect scheduler would have the numbers equal.
+There is no perfect scheduler for general purpose programs.
+
+Because most spawns end up having no effect, we work hard to make them
+cheap. Moving work to a new processor is allowed to be slower because
+it is rare. We call this the _work-first principle_.
+
+(This is one reason we use our fibonacci number generator as a test.
+Its performance is determined by how fast `cilk_spawn` runs and we
+want that to be fast.)
+
+## The spawn deque
+
+When a function is spawned, i.e. a function call is prefixed with
+`cilk_spawn`, the spawned function is called the _child_ and function
+with the `cilk_spawn` keyword is called the _parent_.
+
+Each worker has a deque (double-ended queue) of functions that have
+spawned. We call one end the _head_ and the other the _tail_.
+Spawning pushes the *parent* onto the tail. Returning from a spawn pops
+the parent off the tail.
+
+Usually that's all that happens. Functions get pushed, functions get
+popped, and in the end push and pop cancel out.
+
+Once in a while a worker has nothing to do. It looks at other workers
+and _steals_ a function by popping it off the *head* of the other
+worker's deque. This _work stealing_ is how parts of the program move
+between processors.
+
+## Only monsters steal children
+
+The thing that was popped off the head of the other worker's deque is
+a function that is suspended at a function call. More specifically,
+it is a data structure with enough information to resume the parent
+function as if the spawned child had returned. The thief does this on
+a new processor.
+
+The worker from which work was stolen, sometimes called the _victim_,
+is so far oblivious. It continues doing whatever it was doing in the
+child function. Cilk is designed so a processor keeps running serial
+code as long as it can. Work first.
+
+Stealing parents and leaving children alone is what makes Cilk Cilk.
+This scheduling policy avoids deadlocks and unnecessary migration of
+work between processors.
+
+The parent function, running on a new processor, has a flag set
+indicating that it has been stolen and it is not _synced_. It might
+spawn again and be stolen again. In any case it will eventually
+execute a `cilk_sync`. This triggers a call to the Cilk runtime which
+suspends the function until all spawned children return.
+
+The spawned child will eventually return. When it returns the worker
+tries to pop the tail of the deque. This fails: the deque is empty.
+The parent is running on another processor. Execution can not
+continue. The worker is now idle and can start stealing.
+
+(We have an optimization for the common case where the parent reached
+a `cilk_sync` and is waiting for the spawn to complete.)
+

From 18c5037da83dffba3584b93016aa1ad4f942407f Mon Sep 17 00:00:00 2001
From: "John F. Carr"
Date: Mon, 15 Aug 2022 10:06:19 -0400
Subject: [PATCH 2/3] Scheduler and stealing work in progress

---
 src/posts/scheduler.md | 64 +++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/src/posts/scheduler.md b/src/posts/scheduler.md
index 60c6f890..44afc353 100644
--- a/src/posts/scheduler.md
+++ b/src/posts/scheduler.md
@@ -3,18 +3,22 @@
 I want to explain how the OpenCilk compiler implements spawn, but
 first I need to explain what spawn means. The `cilk_spawn` keyword is
 fundamentally different from C's `pthread_create` and Java's
-`Thread.start`.
+`Thread.start`. This distinction goes back to the origins of Cilk in
+the 1990s and my description applies to the whole Cilk family of
+languages.
 
-The old approach to multithreading made programs that depended on
-threads to work. A program with a producer and consumer thread will
-hang if one of the threads doesn't run.
+By making threads explicit in the programming model, the old approach
+to multithreading made programs that depended on threads to work. A
+program with a producer and consumer thread will hang if one of the
+threads doesn't run.
 
 A correct Cilk program behaves the same whether it runs on one thread
-or many. Usually you don't specify how many processors to use. By
-default Cilk uses as many _workers_ (threads running user code) as
-your system has processors. Then spawns enable parallelism. You can
-also ask it to run single-threaded. Then spawns don't do anything,
-but the program is still the same.
+or many. You do not have to create threads and it is poor style to
+examine thread state. By default Cilk uses as many _workers_ (threads
+running user code) as your system has processors. Spawns tell the
+system that part of the program can be moved to these workers. You
+can also ask it to run single-threaded. Then spawns don't do
+anything, but the program is still the same.
 
 ## Spawning usually does nothing
 
@@ -50,26 +54,27 @@ When a function is spawned, i.e. a function call is prefixed with
 `cilk_spawn`, the spawned function is called the _child_ and function
 with the `cilk_spawn` keyword is called the _parent_.
 
-Each worker has a deque (double-ended queue) of functions that have
-spawned. We call one end the _head_ and the other the _tail_.
-Spawning pushes the *parent* onto the tail. Returning from a spawn pops
-the parent off the tail.
+Each worker has a deque (double-ended queue) of parent functions,
+functions that have spawned. We call one end the _head_ and the other
+the _tail_. Spawning pushes the *parent* onto the tail. Returning
+from a spawn pops the parent off the tail.
 
 Usually that's all that happens. Functions get pushed, functions get
 popped, and in the end push and pop cancel out.
 
-Once in a while a worker has nothing to do. It looks at other workers
-and _steals_ a function by popping it off the *head* of the other
-worker's deque. This _work stealing_ is how parts of the program move
-between processors.
+Sometimes, especially at the start of a parallel region of code, a
+worker has nothing to do. An idle worker _steals_ a function from a
+busy worker by popping it off the *head* of the other worker's deque.
+This _work stealing_ is how parts of the program move between
+processors.
 
 ## Only monsters steal children
 
-The thing that was popped off the head of the other worker's deque is
-a function that is suspended at a function call. More specifically,
-it is a data structure with enough information to resume the parent
-function as if the spawned child had returned. The thief does this on
-a new processor.
+The thing that was popped off the head of the other worker's deque
+describes a function that is suspended at a function call. More
+specifically, it is a data structure with enough information to resume
+the parent function as if the spawned child had returned. The thief
+does this on a new processor.
 
 The worker from which work was stolen, sometimes called the _victim_,
 is so far oblivious. It continues doing whatever it was doing in the
@@ -81,10 +86,12 @@ This scheduling policy avoids deadlocks and unnecessary migration of
 work between processors.
 
 The parent function, running on a new processor, has a flag set
-indicating that it has been stolen and it is not _synced_. It might
-spawn again and be stolen again. In any case it will eventually
-execute a `cilk_sync`. This triggers a call to the Cilk runtime which
-suspends the function until all spawned children return.
+indicating that it has been stolen. It might spawn again and be
+stolen again. In any case it will eventually execute a `cilk_sync`.
+If the function has never been stolen (the usual case) `cilk_sync`
+does nothing. If the function has been stolen `cilk_sync` calls into
+the Cilk runtime. The runtime suspends the function until all spawned
+children return.
 
 The spawned child will eventually return. When it returns the worker
 tries to pop the tail of the deque. This fails: the deque is empty.
@@ -94,3 +101,8 @@ continue. The worker is now idle and can start stealing.
 (We have an optimization for the common case where the parent reached
 a `cilk_sync` and is waiting for the spawn to complete.)
 
+## Stay tuned
+
+Having described at a high level what `cilk_spawn` does, next time I
+will describe what the compiler does to your code when you spawn.
+

From f4da25742915e3ba52499a30a1be0f635e13d0a0 Mon Sep 17 00:00:00 2001
From: "John F. Carr"
Date: Mon, 15 Aug 2022 10:20:45 -0400
Subject: [PATCH 3/3] Add a header block

---
 src/posts/scheduler.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/posts/scheduler.md b/src/posts/scheduler.md
index 44afc353..3fe37e1c 100644
--- a/src/posts/scheduler.md
+++ b/src/posts/scheduler.md
@@ -1,3 +1,8 @@
+---
+title: What does spawning mean?
+author: John F. Carr
+---
+
 # What does spawning mean?
 
 I want to explain how the OpenCilk compiler implements spawn, but