Spawning at a high level #136

113 changes: 113 additions & 0 deletions src/posts/scheduler.md
@@ -0,0 +1,113 @@
---
title: What does spawning mean?
author: John F. Carr
---

# What does spawning mean?

I want to explain how the OpenCilk compiler implements spawn, but
first I need to explain what spawn means. The `cilk_spawn` keyword is
fundamentally different from C's `pthread_create` and Java's
`Thread.start`. This distinction goes back to the origins of Cilk in
the 1990s and my description applies to the whole Cilk family of
languages.

By making threads explicit in the programming model, the old approach
to multithreading produced programs that depend on threads to work. A
program with a producer thread and a consumer thread will hang if one
of the threads doesn't run.

A correct Cilk program behaves the same whether it runs on one thread
or many. You do not have to create threads and it is poor style to
examine thread state. By default Cilk uses as many _workers_ (threads
running user code) as your system has processors. Spawns tell the
system that part of the program can be moved to these workers. You
can also ask it to run single-threaded. Then spawns don't do
anything, but the program is still the same.

## Spawning usually does nothing

In most Cilk programs most spawns have no effect.

The keyword `cilk_spawn` means the statement with the spawn is allowed
to run in parallel with the statements that follow, up until the next
`cilk_sync`. It doesn't have to run in parallel and usually it doesn't.

A program we often use for testing and benchmarking computes a
Fibonacci number with an inefficient exponential-time algorithm. The
number of spawns is comparable to the number computed. For testing I
use an argument in the low 40s that takes about a second to run. The
program spawns about 100 million times.
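
As a rough illustration of such a benchmark, here is a minimal sketch in C. It is not necessarily the exact test program described above. `USE_OPENCILK` is a hypothetical build macro invented for this sketch; when it is not defined, the two Cilk keywords expand to nothing and the code is an ordinary serial C program. `spawn_count` is a helper added only to show how the number of spawns grows with the argument.

```c
#include <stdint.h>

/* USE_OPENCILK is a hypothetical build macro for this sketch, not an
   OpenCilk feature. Without it the Cilk keywords expand to nothing,
   leaving an ordinary serial C program. */
#ifdef USE_OPENCILK
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Exponential-time Fibonacci: one spawn per call with n >= 2. */
uint64_t fib(unsigned n) {
    if (n < 2)
        return n;
    /* The spawned call is allowed to run in parallel with the
       statements that follow it... */
    uint64_t x = cilk_spawn fib(n - 1);
    uint64_t y = fib(n - 2);
    /* ...up to this sync, after which x is safe to read. */
    cilk_sync;
    return x + y;
}

/* Serial helper (assumption, not part of the benchmark): the number
   of cilk_spawn statements executed by fib(n). */
uint64_t spawn_count(unsigned n) {
    if (n < 2)
        return 0;
    return 1 + spawn_count(n - 1) + spawn_count(n - 2);
}
```

In this sketch `fib(10)` returns 55 and executes 88 spawns, so the spawn count stays within a small constant factor of the result.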

Probably fewer than 100 of those spawns result in added parallelism.
The rest, 99.9999% of the total, are ignored. In a highly parallel
Cilk program the number of scheduling events is related to the number
of processors. A perfect scheduler would have the numbers equal.
There is no perfect scheduler for general purpose programs.

Because most spawns end up having no effect, we work hard to make them
cheap. Moving work to a new processor is allowed to be slower because
it is rare. We call this the _work-first principle_.

(This is one reason we use our Fibonacci number generator as a test.
Its performance is determined by how fast `cilk_spawn` runs and we
want that to be fast.)

## The spawn deque

When a function is spawned, i.e. a function call is prefixed with
`cilk_spawn`, the spawned function is called the _child_ and the
function containing the `cilk_spawn` keyword is called the _parent_.

Each worker has a deque (double-ended queue) of parent functions,
functions that have spawned. We call one end the _head_ and the other
the _tail_. Spawning pushes the *parent* onto the tail. Returning
from a spawn pops the parent off the tail.

Usually that's all that happens. Functions get pushed, functions get
popped, and in the end push and pop cancel out.

Sometimes, especially at the start of a parallel region of code, a
worker has nothing to do. An idle worker _steals_ a function from a
busy worker by popping it off the *head* of the other worker's deque.
This _work stealing_ is how parts of the program move between
processors.
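
The three deque operations can be sketched as a toy model in C. This is a single-threaded illustration only: it has no locks or atomics, and it is not the data structure the OpenCilk runtime uses. The names `frame`, `deque`, `push_tail`, `pop_tail`, `steal_head` and `DEQUE_CAPACITY` are all invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEQUE_CAPACITY 64

/* Hypothetical record standing in for a suspended parent function. */
struct frame {
    int id;
};

/* Entries occupy slots [head, tail). Single-threaded model: no locks
   or atomics, and slots are never reused once head has advanced. */
struct deque {
    struct frame *slot[DEQUE_CAPACITY];
    size_t head;
    size_t tail;
};

/* Spawn: the worker pushes the parent onto the tail.
   Returns false if the fixed-size model is full. */
bool push_tail(struct deque *d, struct frame *parent) {
    if (d->tail == DEQUE_CAPACITY)
        return false;
    d->slot[d->tail++] = parent;
    return true;
}

/* Return from a spawn: the worker pops the parent off the tail.
   Returns NULL if the deque is empty. */
struct frame *pop_tail(struct deque *d) {
    if (d->tail == d->head)
        return NULL;
    return d->slot[--d->tail];
}

/* Steal: an idle worker takes the oldest parent from the head of
   another worker's deque. Returns NULL if there is nothing to steal. */
struct frame *steal_head(struct deque *d) {
    if (d->tail == d->head)
        return NULL;
    return d->slot[d->head++];
}
```

In the usual case every `push_tail` is matched by a `pop_tail` and `steal_head` is never called. If a thief has taken the only entry, the owner's next `pop_tail` returns `NULL`.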

## Only monsters steal children

The thing that was popped off the head of the other worker's deque
describes a function that is suspended at a function call. More
specifically, it is a data structure with enough information to resume
the parent function as if the spawned child had returned. The thief
does this on a new processor.

The worker from which work was stolen, sometimes called the _victim_,
is so far oblivious. It continues doing whatever it was doing in the
child function. Cilk is designed so a processor keeps running serial
code as long as it can. Work first.

Stealing parents and leaving children alone is what makes Cilk Cilk.
This scheduling policy avoids deadlocks and unnecessary migration of
work between processors.

The parent function, running on a new processor, has a flag set
indicating that it has been stolen. It might spawn again and be
stolen again. In any case it will eventually execute a `cilk_sync`.
If the function has never been stolen (the usual case), `cilk_sync`
does nothing. If the function has been stolen, `cilk_sync` calls into
the Cilk runtime. The runtime suspends the function until all spawned
children return.
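
The decision made at `cilk_sync` can be modeled with a small C sketch. The `frame` fields, the `sync_action` values and `sync_frame` are hypothetical names for illustration, not the OpenCilk runtime's real structures. The case where a stolen function has no outstanding children is an assumption of this model; the text above only describes the suspend case.

```c
#include <stdbool.h>

/* Hypothetical per-function record; field names are invented here. */
struct frame {
    bool stolen;          /* set once a thief has resumed this function */
    int pending_children; /* spawned children that have not returned */
};

enum sync_action {
    SYNC_NOOP,     /* never stolen: no runtime call at all */
    SYNC_CONTINUE, /* stolen, but nothing left to wait for (assumption) */
    SYNC_SUSPEND   /* stolen: suspended until all children return */
};

/* What cilk_sync does for a given frame in this model. */
enum sync_action sync_frame(const struct frame *f) {
    if (!f->stolen)
        return SYNC_NOOP;
    if (f->pending_children > 0)
        return SYNC_SUSPEND;
    return SYNC_CONTINUE;
}
```

The point of the flag is that the common, never-stolen path costs one test and no call into the runtime.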

The spawned child will eventually return. When it returns, the worker
tries to pop the tail of the deque. This fails: the deque is empty.
The parent is running on another processor. Execution cannot
continue. The worker is now idle and can start stealing.

(We have an optimization for the common case where the parent reached
a `cilk_sync` and is waiting for the spawn to complete.)

## Stay tuned

Having described at a high level what `cilk_spawn` does, next time I
will describe what the compiler does to your code when you spawn.