From a81059323019fd6eb5acf14256e07adaf98b0708 Mon Sep 17 00:00:00 2001 From: behoppe Date: Mon, 3 Oct 2022 13:44:41 +0000 Subject: [PATCH 1/9] =?UTF-8?q?Create=20Reference=20=E2=80=9Copencilk-lang?= =?UTF-8?q?uage-reference=E2=80=9D?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../reference/opencilk-language-reference.md | 384 ++++++++++++++++++ 1 file changed, 384 insertions(+) create mode 100644 src/doc/reference/opencilk-language-reference.md diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md new file mode 100644 index 00000000..348931e6 --- /dev/null +++ b/src/doc/reference/opencilk-language-reference.md @@ -0,0 +1,384 @@ +--- +layout: layouts/page.njk +title: OpenCilk language reference +date: 2022-10-03T13:42:05.188Z +eleventyNavigation: + key: Language specification +author: John Carr +attribution: false +--- + +OpenCilk is an extension to the C and C++ programming language adding +support for {% defn "task-parallel" %} programming. It uses a +modified version of [clang](https://clang.llvm.org) (the C compiler +from the LLVM project) and a user-mode work-stealing scheduler. At +the source level, OpenCilk has five additional keywords compared to C: + +* `cilk_spawn` +* `cilk_sync` +* `cilk_scope` +* `cilk_for` +* `cilk_reducer` + +This document describes the syntax and semantics of OpenCilk +constructs. It is not meant to be an introduction or tutorial. + +Informally, `cilk_spawn` marks a point where the program can be forked +into two parts running on different processors and `cilk_sync` marks a +point where those forks must be joined. Forking is permissive and +joining is mandatory. This is a fundamentally different model +compared to older forms of parallelism using C's `pthread_create` and +Java's `Thread.start`. These functions encourage writing programs +that do not work without multithreading. 
+ +The statements executed in a task parallel program form a {% defn +"parallel trace", "directed acyclic graph" %} (DAG). A spawn node has +one incoming edge and two outgoing edges. A sync node has one +outgoing edge. Two statements are said to be logically parallel if +neither precedes the other in DAG order. Whether they actually run +in parallel (at the same time) depends on scheduling. + +```cilkc + int x = cilk_spawn f(); // the body of f()... + int y = g(); // ...runs in parallel with the body of g() + cilk_sync; // wait for f() to complete + return x + y; // the return is in series after what comes before +``` + +## Grammar + +This section describes how the syntax of an OpenCilk program +differs from a C or C++ program. + +All OpenCilk keywords and runtime functions require that the header +`` be included. The header `` can be +included to disable Cilk while still allowing the keywords. If +neither header is included the Cilk keywords are treated as ordinary +identifiers. + +### Spawn + +A statement using `cilk_spawn` is the start of a potentially parallel +region of code. + +The `cilk_spawn` keyword should appear at the start of a statement or +after the `=` sign of an assignment (or `+=`, `-=`, etc.). + +```cilkc +int x = cilk_spawn f(0); +cilk_spawn y = f(1); +cilk_spawn { z = f(2); } +``` + +Although the compiler accepts spawns inside of expressions, they are +unlikely to have the expected semantics. A future version of the +language may explicitly limit `cilk_spawn` to the contexts above, +at or near the top of the parse tree of a statement. + +### Sync + +A sync statement, `cilk_sync;`, ends a region of potentially parallel +execution. It takes no arguments. + +```cilkc +if (time_to_sync) + cilk_sync; +``` + +### Scope + +The keyword `cilk_scope` is followed by a statement, normally a +compound statement. Any spawns within the statement are synced before +exit from the statement.
Syncs within the statement, including the +implicit sync before exit, do not wait for spawns outside the +statement. + +```cilkc +int w, x, y, z; +w = cilk_spawn f(0); +cilk_scope { + x = cilk_spawn f(1); + y = cilk_spawn f(2); + z = f(3); // no need to spawn the last statement in a cilk_scope +} +// unsafe to access w because the spawn has not been synced +// x, y, and z are usable here because of the implicit sync +``` + +### For + +A loop written using `cilk_for` executes each iteration of its body in +parallel. + +```cilkc +cilk_for (int i = 0; i < n; ++i) + sum += array1[i]; // sum needs to be a reducer +cilk_for (int i = 0; i < n; ++i) + array2[i] = f(i); +``` + +The syntax of a `cilk_for` statement is very similar to a C `for` +statement. It is followed by three expressions, the first of which +may declare variables. Unlike in C all three expressions are +mandatory. C++ range `for` constructs are not allowed. + +For the loop to be parallelized, several conditions must be met: + +* The first expression must declare a variable (the "loop variable"). +* The second expression must compare the loop variable using one of the + relational operators `<=`, `<`, `!=`, `>`, and `>=`. +* The value compared to must be [...] +* The third expression must modify the loop variable using `++`, + `--`, `+=`, or `-=`. + +`Break` may not be used to exit the body of a `cilk_for` loop. + +[what about exceptions thrown from the loop body?] + +#### Grain size + +The compiler often recognizes that the overhead of allowing parallel +execution can exceed the benefit. If the body of a loop does little +work the compiler will arrange for groups of consecutive iterations to +run sequentially. This behavior can be manually overridden with a +pragma: + +```cilkc + #pragma cilk grainsize 128 + cilk_for (int i = 0; i < n; ++i) + array2[i] = f(i); +``` + +The pragma in the example tells the compiler that groups of 128 +consecutive iterations should be executed as a serial loop. 
If there +are 1024 loop iterations in total, there are only 8 parallel tasks. + +The argument to the grain size pragma must be an integer constant +in the range 1..2<sup>31</sup>-1. [Do we want to deprecate this +range in favor of a smaller range, or in the other direction +up to a `size_t`?] + + +### Reducers + +A type may be suffixed with `cilk_reducer`. Syntactically it appears +where `*` may be used to declare a pointer type. The type to the left +of `cilk_reducer` is the _view type_. + +Two values appear in parentheses after `cilk_reducer`. Both must be +function types returning `void`. The first, the _identity callback_, +takes one argument of type `void *`. The second, the _reduce callback_, +takes two arguments of type `void *`. + +Two reducer types are the same if their view types are the same and +their callbacks are the same function mentioned by name. Otherwise +two reducer types are different and not compatible. This rule arises +from the impossibility of proving that two different functions are +identical. + +``` + extern void identity(void *), reduce(void *, void *); + extern void (*idp)(void *); + int cilk_reducer(identity, reduce) type1; + int cilk_reducer(identity, reduce) type2; // same as type1 + int cilk_reducer(idp, reduce) type3; + int cilk_reducer(idp, reduce) type4; // not the same as type3 +``` + +In the current version of OpenCilk the callbacks may be omitted. +This behavior may be removed in a future version of OpenCilk. + +## Execution of an OpenCilk program + +This section describes how the keywords added above may affect +execution. A basic principle of Cilk is that the new keywords do not +necessarily change execution. If `cilk_for` is replaced by `for` and +the other keywords are removed, the result is a valid C or C++ program +_with the same meaning_ called the %{ defn "serial projection" %}. A +program can be developed and debugged serially and parallelism added +later.
+ +### Strand + +A _%{defn "strand" %}_ is a series of instructions between one spawn +or sync and the next spawn or sync. A strand executes on a single +thread. + +In some cases it is necessary to specify exactly where the spawn point +is in a spawn statement. + +Contrary to the syntax, the spawn itself should be considered as +having a `void` value. The compiler internally rewrites an expression +like + +```cilkc +x = cilk_spawn f(); +``` + +into + +```cilkc +cilk_spawn { x = f(); } +``` + +In the current implementation all side effects other than +assignment of a function return value happen before the spawn. +The outermost function call of the spawned statement can be +considered the point of the spawn. + +```cilkc +x[i++] = cilk_spawn f(a++, b++, c++); +// It is safe to access i, a, b, and c here (they have been incremented) +// but it is not safe to access the memory location being assigned. +``` + +While syntactically valid, code like + +```cilkc +f(cilk_spawn i++); +``` + +does not work as expected: + +``` +nonsense.c:4:18: warning: Failed to emit spawn + g(cilk_spawn i++); + ^ +1 warning generated. +``` + +The return value of spawn is like a promise or lazily evaluated value. +If it is consumed immediately the parallelism is lost. + +As noted above, this syntax may be removed in a future version of +OpenCilk. + +The code that follows the spawn point is called the _continuation_ of +the spawn. + +### Sync + +A sync operation waits for previous spawns to complete before +continuing. + +#### Explicit sync + +An explicit sync is a statement using `cilk_sync`. This form normally +has function scope, meaning it waits for all spawns in the same +function. A sync inside the body of a `cilk_for` or `cilk_scope` only +waits for spawns inside the same construct. + +Sync scopes are disjoint, so a sync outside a `cilk_for` or +`cilk_scope` never waits for a spawn belonging to one of these +inner scopes. This situation does not normally occur. 
+[This handles +```cilkc +cilk_spawn cilk_scope { cilk_spawn ... } +``` +] + +#### Implicit syncs + +In addition to the sync statements in the code, there is an implicit sync +before exit from some scopes: + +* Before returning from a function, after calculating the value to + be returned. This sync has function scope. +* On exit from the body of a `try` block. This sync has function scope. + [Is this true? Is it true only if the try block spawns?] +* On exit from a `cilk_scope` statement. This sync has the scope of the + `cilk_scope` statement. +* On exit from the body of a `cilk_for`, i.e. once per iteration of the loop. + This sync has scope equal to the loop body. + [Does grain size change this?] + +[What about on entry to a try block?] + +When exiting from a block scope, destructors for block scope variables +are run after the implicit sync. + +## Differences between C++ and OpenCilk + +In C++ code there are two exceptions to the rule that serial and +parallel programs are the same. + +### Exceptions + +If a spawned function throws an exception, the parent function may +have continued straight line execution past the spawn. The serial +program goes directly to an exception handler. The difference is +observable if the continuation has side effects. + +When the parent function executes an implicit or explicit `cilk_sync` +the runtime checks whether the spawned child threw an exception. If +it did, any exception thrown by the parent is discarded and the +exception thrown by the child is handled at the sync. The compiler +inserts an implicit sync at the end of a try block if the try contains +a spawn. [Make sure this is consistent with the wording in the implicit +syncs section.] + +### Left hand side side effects + +[TB can explain this] + +## Races and reducers + +Concurrency invites races. If the same object is accessed by two +statements running in parallel, and at least one of the accesses is a +write, there is said to be a _%{defn "data race" %}_. 
Data races have +undefined behavior. + +[Do we want to clarify that atomic accesses are unspecified rather than undefined?] + +### Reducers + +%{ defn "hyperobject", "Hyperobjects" %} are special variables that +can be accessed in parallel without data races. The OpenCilk runtime +gives each thread running in parallel a separate copy of the variable +and merges the values as necessary. The local copy of the variable is +called a _view_. + +Because each thread may operate on a separate copy, the address of a +view is not constant. This is also true of thread local variables, +but reducers do not follow the same rules as thread local variables. + +The specific kind of hyperobject implemented by OpenCilk 2.0 is a +_reducer_. + +A view of a reducer has a well-defined value in serial code and an +indeterminate value in parallel code. + +monoid + +value is unspecified; do read-modify-write operations and ignore the result + +If the reduce callback does nothing the reducer is called a _holder_ +and is essentially a form of thread-local storage. + +#### Types + +A declaration of a reducer requires a _%{defn "monoid" }_. Aside from +the view type, a reducer monoid includes two callback functions. + +When declaring a type the `cilk_reducer` keyword is used in the same +contexts as `*` or (in C++) `&`. It follows a type, which is referred +to as the _view type_ of the reducer. + +```cilkc +int cilk_reducer(zero, plus) sum = 0; +int cilk_reducer(zero, plus) *sum_pointer = 0; +``` + +#### Values + +At any point in execution the value in a reducer is based on a +contiguous subset of all prior modifications performed in the serial +order of the program. The subset may be empty. + +When all spawns since the initialization of the variable have been +synced, the variable has the serially correct value. + + +#### Handles + +`__builtin_addressof` From d1357c2893db4322f476906985dc8aa6e228f60d Mon Sep 17 00:00:00 2001 From: "John F. 
Carr" Date: Fri, 21 Oct 2022 12:11:01 -0400 Subject: [PATCH 2/9] Some comments from review in weekly meeting --- .../reference/opencilk-language-reference.md | 45 +++++++++++++------ 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index 348931e6..b642b36c 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -9,7 +9,7 @@ attribution: false --- OpenCilk is an extension to the C and C++ programming language adding -support for {% defn "task-parallel" %} programming. It uses a +support for {% defn "task-parallel programming" %}. It uses a modified version of [clang](https://clang.llvm.org) (the C compiler from the LLVM project) and a user-mode work-stealing scheduler. At the source level, OpenCilk has five additional keywords compared to C: @@ -66,10 +66,16 @@ after the `=` sign of an assignment (or `+=`, `-=`, etc.). ```cilkc int x = cilk_spawn f(0); -cilk_spawn y = f(1); +cilk_spawn y = f(1); // [TB says this is not the same as the previous] cilk_spawn { z = f(2); } ``` +[Only if cilk_spawn precedes the function call are the arguments +evaluated before the spawn.] + +[Test op= forms and add an example or remove the allegation that +they are allowed.] + Although the compiler accepts spawns inside of expressions, they are unlikely to have the expected semantics. A future version of the language may explicitly limit `cilk_spawn` to the contexts above, @@ -80,6 +86,10 @@ at or near the top of the parse tree of a statement. A sync statement, `cilk_sync;`, ends a region of potentially parallel execution. It takes no arguments. +[Find a real example with a conditional sync. Or have some spawns +to be synced. Matteo Frigo's all pairs shortest path code has +conditional sync, says TB.] 
+ ```cilkc if (time_to_sync) cilk_sync; @@ -131,9 +141,12 @@ For the loop to be parallelized, several conditions must be met: * The third expression must modify the loop variable using `++`, `--`, `+=`, or `-=`. -`Break` may not be used to exit the body of a `cilk_for` loop. +The `break` statement may not be used to exit the body of a `cilk_for` loop. -[what about exceptions thrown from the loop body?] +[In the section on behavior, to be written, +discuss exceptions thrown from the loop body. +An exception probably aborts an unpredictable amount of +later work.] #### Grain size @@ -145,8 +158,10 @@ pragma: ```cilkc #pragma cilk grainsize 128 - cilk_for (int i = 0; i < n; ++i) + cilk_for (int i = 0; i < n; ++i) { + array1[i] = f(i); array2[i] = f(i); + } ``` The pragma in the example tells the compiler that groups of 128 @@ -194,16 +209,18 @@ This section describes how the keywords added above may affect execution. A basic principle of Cilk is that the new keywords do not necessarily change execution. If `cilk_for` is replaced by `for` and the other keywords are removed, the result is a valid C or C++ program -_with the same meaning_ called the %{ defn "serial projection" %}. A +_with the same meaning_ called the {% defn "serial projection" %}. A program can be developed and debugged serially and parallelism added later. ### Strand -A _%{defn "strand" %}_ is a series of instructions between one spawn +A _{%defn "strand" %}_ is a series of instructions between one spawn or sync and the next spawn or sync. A strand executes on a single thread. +#### Spawn + In some cases it is necessary to specify exactly where the spawn point is in a spawn statement. @@ -225,6 +242,8 @@ In the current implementation all side effects other than assignment of a function return value happen before the spawn. The outermost function call of the spawned statement can be considered the point of the spawn. +[The previous sentence was difficult for TB.] 
+[Probably rewrite the whole previous part.] ```cilkc x[i++] = cilk_spawn f(a++, b++, c++); @@ -256,12 +275,12 @@ OpenCilk. The code that follows the spawn point is called the _continuation_ of the spawn. -### Sync +#### Sync A sync operation waits for previous spawns to complete before continuing. -#### Explicit sync +##### Explicit sync An explicit sync is a statement using `cilk_sync`. This form normally has function scope, meaning it waits for all spawns in the same @@ -277,7 +296,7 @@ cilk_spawn cilk_scope { cilk_spawn ... } ``` ] -#### Implicit syncs +##### Implicit syncs In addition to the sync statements in the code, there is an implicit sync before exit from some scopes: @@ -325,14 +344,14 @@ syncs section.] Concurrency invites races. If the same object is accessed by two statements running in parallel, and at least one of the accesses is a -write, there is said to be a _%{defn "data race" %}_. Data races have +write, there is said to be a _{%defn "data race" %}_. Data races have undefined behavior. [Do we want to clarify that atomic accesses are unspecified rather than undefined?] ### Reducers -%{ defn "hyperobject", "Hyperobjects" %} are special variables that +{% defn "hyperobject", "Hyperobjects" %} are special variables that can be accessed in parallel without data races. The OpenCilk runtime gives each thread running in parallel a separate copy of the variable and merges the values as necessary. The local copy of the variable is @@ -357,7 +376,7 @@ and is essentially a form of thread-local storage. #### Types -A declaration of a reducer requires a _%{defn "monoid" }_. Aside from +A declaration of a reducer requires a _{%defn "monoid" }_. Aside from the view type, a reducer monoid includes two callback functions. When declaring a type the `cilk_reducer` keyword is used in the same From b3244c067ac6a4e77462be0f49ea93d5083ea1a3 Mon Sep 17 00:00:00 2001 From: "John F. 
Carr" Date: Fri, 21 Oct 2022 12:15:21 -0400 Subject: [PATCH 3/9] Fix compile errors --- src/doc/reference/opencilk-language-reference.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index b642b36c..0853167e 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -31,12 +31,12 @@ compared to older forms of parallelism using C's `pthread_create` and Java's `Thread.start`. These functions encourage writing programs that do not work without multithreading. -The statements executed in a task parallel program form a {% defn -"parallel trace", "directed acyclic graph" %} (DAG). A spawn node has -one incoming edge and two outgoing edges. A sync node has one -outgoing edge. Two statements are said to be logically parallel if -neither precedes the other in DAG order. Whether they actually run -in parallel (at the same time) depends on scheduling. +The statements executed in a task parallel program form a directed +acyclic graph (DAG). A spawn node has one incoming edge and two +outgoing edges. A sync node has one outgoing edge. Two statements +are said to be logically parallel if neither precedes the other in DAG +order. Whether they actually run in parallel (at the same time) +depends on scheduling. ```cilkc int x = cilk_spawn f(); // the body of f()... @@ -376,7 +376,7 @@ and is essentially a form of thread-local storage. #### Types -A declaration of a reducer requires a _{%defn "monoid" }_. Aside from +A declaration of a reducer requires a _{% defn "monoid" %}_. Aside from the view type, a reducer monoid includes two callback functions. When declaring a type the `cilk_reducer` keyword is used in the same From 534a91c7f1dc9376ac183ec8a9aa36e0fe127507 Mon Sep 17 00:00:00 2001 From: "John F. 
Carr" Date: Fri, 21 Oct 2022 15:05:05 -0400 Subject: [PATCH 4/9] Various edits --- .../reference/opencilk-language-reference.md | 85 ++++++++++++------- 1 file changed, 56 insertions(+), 29 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index 0853167e..b9c6a7b6 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -65,14 +65,11 @@ The `cilk_spawn` keyword should appear at the start of a statement or after the `=` sign of an assignment (or `+=`, `-=`, etc.). ```cilkc -int x = cilk_spawn f(0); -cilk_spawn y = f(1); // [TB says this is not the same as the previous] -cilk_spawn { z = f(2); } +int x = cilk_spawn f(i++); +cilk_spawn y[j++] = f(i++); +cilk_spawn { z[j++] = f(i++); } ``` -[Only if cilk_spawn precedes the function call are the arguments -evaluated before the spawn.] - [Test op= forms and add an example or remove the allegation that they are allowed.] @@ -173,6 +170,11 @@ in the range 1..2<sup>31</sup>-1. [Do we want to deprecate this range in favor of a smaller range, or in the other direction up to a `size_t`?] +Whether the grain size is static or dynamic, an exception thrown from +the loop body will abort the remainder of the group of iterations. +The scope of `cilk_sync` will also include all iterations in the +group. + ### Reducers @@ -222,28 +224,36 @@ thread. #### Spawn In some cases it is necessary to specify exactly where the spawn point -is in a spawn statement. +is in a spawn statement. All code up to the point of spawning +executes in series. The code that follows the spawn point is called +the _continuation_ of the spawn. It potentially executes in parallel +with the following statements of the program. Contrary to the syntax, the spawn itself should be considered as -having a `void` value.
The return value of spawn is like a promise or +lazily evaluated value. If it is consumed immediately the parallelism +is lost. The compiler rewrites supported uses of `cilk_spawn` into a +form where the value of the spawned expression is consumed in the +continuation. -```cilkc -x = cilk_spawn f(); -``` -into +These three statements are equivalent: ```cilkc +x = cilk_spawn f(); +cilk_spawn x = f(); cilk_spawn { x = f(); } ``` -In the current implementation all side effects other than -assignment of a function return value happen before the spawn. -The outermost function call of the spawned statement can be -considered the point of the spawn. -[The previous sentence was difficult for TB.] -[Probably rewrite the whole previous part.] +When there are side effects the situation is more complex. + +When `cilk_spawn` appears after an assignment operator and before a +function call, the spawn is after all arguments are evaluated and +after the address of the left hand side is evaluated. Side effects do +not race with the continuation. The spawn is before the function call +and assignment of its return value. The body of the called function +and the assignment do race with the continuation. In C++, destructors +for function arguments also race with the continuation. ```cilkc x[i++] = cilk_spawn f(a++, b++, c++); @@ -251,6 +261,9 @@ x[i++] = cilk_spawn f(a++, b++, c++); // but it is not safe to access the memory location being assigned. ``` +When `cilk_spawn` appears at the start of a statement the entire +statement is spawned and everything in it races with the continuation. + While syntactically valid, code like ```cilkc @@ -266,14 +279,13 @@ nonsense.c:4:18: warning: Failed to emit spawn 1 warning generated. ``` -The return value of spawn is like a promise or lazily evaluated value. -If it is consumed immediately the parallelism is lost. - -As noted above, this syntax may be removed in a future version of -OpenCilk. 
+In this case the compiler is unable to move the use of `i++` into the +spawned expression. -The code that follows the spawn point is called the _continuation_ of -the spawn. +This syntax may be removed in a future version of OpenCilk and +`cilk_spawn` required to appear at the start of a statement or between +an assignment operator and an immediately following function call or +constructor call. #### Sync @@ -295,6 +307,8 @@ inner scopes. This situation does not normally occur. cilk_spawn cilk_scope { cilk_spawn ... } ``` ] +[But the Cilk scope doesn't exit until the inner spawn is synced so maybe +this does not need a rule after all.] ##### Implicit syncs @@ -309,7 +323,8 @@ before exit from some scopes: `cilk_scope` statement. * On exit from the body of a `cilk_for`, i.e. once per iteration of the loop. This sync has scope equal to the loop body. - [Does grain size change this?] + If a grain size is specified, the sync affects the entire group of + iterations in which it executes. [TODO: Test grain size.] [What about on entry to a try block?] @@ -338,7 +353,18 @@ syncs section.] ### Left hand side side effects -[TB can explain this] +When a function call is spawned in OpenCilk and the result is +assigned, the compiler evaluates the address of the left hand side of +the assignment before calling the function. This conflicts with +recent versions of C++, which require evaluation of the left hand +side to follow return from the function. + +```cilkcpp + extern int global; + a[global++] = cilk_spawn f(); // f() sees incremented value of global +``` + +[TB needs to confirm or deny this] ## Races and reducers @@ -347,7 +373,8 @@ statements running in parallel, and at least one of the accesses is a write, there is said to be a _{%defn "data race" %}_. Data races have undefined behavior. -[Do we want to clarify that atomic accesses are unspecified rather than undefined?]
+Parallel atomic accesses do not yield undefined behavior, but the +order of parallel atomic operations is generally unspecified. ### Reducers From 8fc68f22de76eb19f264ae280f7af6f941dfc12b Mon Sep 17 00:00:00 2001 From: "John F. Carr" Date: Tue, 25 Oct 2022 13:04:22 -0400 Subject: [PATCH 5/9] Update parts of language spec to match implementation --- .../reference/opencilk-language-reference.md | 53 +++++++++++-------- 1 file changed, 31 insertions(+), 22 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index b9c6a7b6..c68c6dd9 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -70,13 +70,11 @@ cilk_spawn y[j++] = f(i++); cilk_spawn { z[j++] = f(i++); } ``` -[Test op= forms and add an example or remove the allegation that -they are allowed.] - -Although the compiler accepts spawns inside of expressions, they are -unlikely to have the expected semantics. A future version of the -language may explicitly limit `cilk_spawn` to the contexts above, -at or near the top of the parse tree of a statement. +Although the compiler accepts `cilk_spawn` before almost any +expression, spawns inside of expressions are unlikely to have the +expected semantics. A future version of the language may explicitly +limit `cilk_spawn` to the contexts above, at or near the top of the +parse tree of a statement. ### Sync @@ -127,7 +125,8 @@ cilk_for (int i = 0; i < n; ++i) The syntax of a `cilk_for` statement is very similar to a C `for` statement. It is followed by three expressions, the first of which may declare variables. Unlike in C all three expressions are -mandatory. C++ range `for` constructs are not allowed. +mandatory. C++ range `for` constructs are not allowed. (Range for +is planned for a future version of OpenCilk.) For the loop to be parallelized, several conditions must be met: @@ -147,11 +146,11 @@ later work.] 
#### Grain size -The compiler often recognizes that the overhead of allowing parallel -execution can exceed the benefit. If the body of a loop does little -work the compiler will arrange for groups of consecutive iterations to -run sequentially. This behavior can be manually overridden with a -pragma: +If a single loop iteration does very little work, the overhead of +spawning it exceeds any benefit from parallelism. In many cases the +compiler will recognize this situation and merge several consecutive +iterations into a single task that runs sequentially. This behavior +can be manually overridden with a pragma: ```cilkc #pragma cilk grainsize 128 @@ -161,14 +160,18 @@ pragma: } ``` -The pragma in the example tells the compiler that groups of 128 +The pragma in the example tells the compiler that each group of 128 consecutive iterations should be executed as a serial loop. If there are 1024 loop iterations in total, there are only 8 parallel tasks. +There is guaranteed to be no spawn or sync between the iterations +for `i=0` and `i=1` (assuming `n` is at least 2, otherwise there +will be no second iteration). + +In OpenCilk 2.0 the argument to the grain size pragma must be an +integer constant in the range 1..2<sup>31</sup>-1. -The argument to the grain size pragma must be an integer constant -in the range 1..2<sup>31</sup>-1. [Do we want to deprecate this -range in favor of a smaller range, or in the other direction -up to a `size_t`?] +Without an explicit grainsize the runtime will choose a value from 1 +to 2048. Whether the grain size is static or dynamic, an exception thrown from the loop body will abort the remainder of the group of iterations. @@ -182,10 +185,10 @@ A type may be suffixed with `cilk_reducer`. Syntactically it appears where `*` may be used to declare a pointer type. The type to the left of `cilk_reducer` is the _view type_. -Two values appear in parentheses after `cilk_reducer`. Both must be -function types returning `void`.
The first, the _identity callback_, -takes one argument of type `void *`. The second, the _reduce callback_, -takes two arguments of type `void *`. +Two values appear in parentheses after `cilk_reducer`. Both must be +functions returning `void` or pointers to functions returning void. +The first, the _identity callback_, takes one argument of type `void*`. +The second, the _reduce callback_, takes two arguments of type `void *`. Two reducer types are the same if their view types are the same and their callbacks are the same function mentioned by name. Otherwise @@ -205,6 +208,12 @@ identical. In the current version of OpenCilk the callbacks may be omitted. This behavior may be removed in a future version of OpenCilk. +In the current version of OpenCilk the arguments to `cilk_reducer` +are evaluated each time a reducer is created. This behavior may +change in a future version of OpenCilk. For compatibility and +predictable behavior the arguments to `cilk_reducer` should not +have side effects. + ## Execution of an OpenCilk program This section describes how the keywords added above may affect From 3d3fea9edc7941f442b5adaaafbdca2bf9dc34cf Mon Sep 17 00:00:00 2001 From: "John F. Carr" Date: Tue, 25 Oct 2022 19:18:03 -0400 Subject: [PATCH 6/9] More edits, especially reducers --- .../reference/opencilk-language-reference.md | 122 ++++++++++++------ 1 file changed, 83 insertions(+), 39 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index c68c6dd9..9a26ca33 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -61,8 +61,9 @@ identifiers. A statement using `cilk_spawn` is the start of a potentially parallel region of code. -The `cilk_spawn` keyword should appear at the start of a statement or -after the `=` sign of an assignment (or `+=`, `-=`, etc.).
+The `cilk_spawn` keyword should appear at the start of an expression +statement or after the `=` sign of an assignment (or `+=`, `-=`, +etc.). ```cilkc int x = cilk_spawn f(i++); @@ -76,6 +77,8 @@ expected semantics. A future version of the language may explicitly limit `cilk_spawn` to the contexts above, at or near the top of the parse tree of a statement. +A declaration may not begin with `cilk_spawn`. + ### Sync A sync statement, `cilk_sync;`, ends a region of potentially parallel @@ -125,8 +128,8 @@ cilk_for (int i = 0; i < n; ++i) The syntax of a `cilk_for` statement is very similar to a C `for` statement. It is followed by three expressions, the first of which may declare variables. Unlike in C all three expressions are -mandatory. C++ range `for` constructs are not allowed. (Range for -is planned for a future version of OpenCilk.) +mandatory. Parallel C++ range `for` constructs are not supported +in OpenCilk 2.0. For the loop to be parallelized, several conditions must be met: @@ -137,12 +140,13 @@ For the loop to be parallelized, several conditions must be met: * The third expression must modify the loop variable using `++`, `--`, `+=`, or `-=`. -The `break` statement may not be used to exit the body of a `cilk_for` loop. - -[In the section on behavior, to be written, -discuss exceptions thrown from the loop body. -An exception probably aborts an unpredictable amount of -later work.] +Because loop iterations may execute out of order there is no way to +predictably stop the loop in the middle. The `break` statement may +not be used to exit the body of a `cilk_for` loop. An exception +thrown out of a loop body is only guaranteed to terminate the current +iteration. (Also, any later iteration of the same grain; see below.) +The effect on other iterations is unpredictable; they may run to +completion or not run at all. #### Grain size @@ -205,8 +209,9 @@ identical. 
int cilk_reducer(idp, reduce) type4; // not the same as type3 ``` -In the current version of OpenCilk the callbacks may be omitted. -This behavior may be removed in a future version of OpenCilk. +In the current version of OpenCilk the callbacks may be omitted in +contexts other than definition of a variable. This behavior may be +removed in a future version of OpenCilk. In the current version of OpenCilk the arguments to `cilk_reducer` are evaluated each time a reducer is created. This behavior may @@ -241,12 +246,10 @@ with the following statements of the program. Contrary to the syntax, the spawn itself should be considered as having a `void` value. The return value of spawn is like a promise or lazily evaluated value. If it is consumed immediately the parallelism -is lost. The compiler rewrites supported uses of `cilk_spawn` into a -form where the value of the spawned expression is consumed in the -continuation. +is lost. When `cilk_spawn` follows `=` the store to the left hand +side is made part of what is spawned. - -These three statements are equivalent: +These three statements assigning to an ordinary variable are equivalent: ```cilkc x = cilk_spawn f(); @@ -308,16 +311,14 @@ has function scope, meaning it waits for all spawns in the same function. A sync inside the body of a `cilk_for` or `cilk_scope` only waits for spawns inside the same construct. -Sync scopes are disjoint, so a sync outside a `cilk_for` or -`cilk_scope` never waits for a spawn belonging to one of these -inner scopes. This situation does not normally occur. -[This handles +If spawns are nested as in ```cilkc cilk_spawn cilk_scope { cilk_spawn ... } +... +cilk_sync; ``` -] -[But the Cilk scope doesn't exit until the inner spawn is synced so maybe -this does not need a rule after all.] +a `cilk_sync` at top level waits for the outer spawn to complete, and +the outer spawn waits for the inner spawn to complete. 
##### Implicit syncs @@ -360,6 +361,13 @@ inserts an implicit sync at the end of a try block if the try contains a spawn. [Make sure this is consistent with the wording in the implicit syncs section.] +If an exception is thrown from the body of a `cilk_for` statement the +current loop iteration is aborted, consistent with the semantics of +`throw`. If the `grainsize` pragma is used, later iterations in the +current grain do not execute. No guarantee is made about which other +loop iterations execute, except that a grain in progress is not +affected by an exception thrown from outside the grain. + ### Left hand side side effects When a function call is spawned in OpenCilk and the result is @@ -397,27 +405,53 @@ Because each thread may operate on a separate copy, the address of a view is not constant. This is also true of thread local variables, but reducers do not follow the same rules as thread local variables. -The specific kind of hyperobject implemented by OpenCilk 2.0 is a -_reducer_. - -A view of a reducer has a well-defined value in serial code and an -indeterminate value in parallel code. +Also because each thread operates on a separate copy, a view of a +reducer only has a well-defined value in serial code. Within any +strand the value does not change except due to that strand's actions. +At a strand boundary the value of a view may change even if its +address does not. -monoid +The specific kind of hyperobject implemented by OpenCilk 2.0 is a +_reducer_. Reducers are intended for operations like accumulation and +list concatenation. OpenCilk guarantees that if a reducer is operated +on as a _{% defn "monoid" %}_, the final value in otherwise race-free +parallel code will be the same as in serial code. + +A monoid is a combination of a data type (the view type), an +_identity_ value, and a binary associative operation. If either +operand of the associative operation is the identity value, the result +is the other operand. 
For example, `(double, 0.0, +)` is a monoid +(ignoring non-finite values). In a reducer type the identity value is +provided by a function; the function takes a pointer to a view (cast +to `void *`) and should store the identity value there. The reduction +operation takes two pointers to views (cast to `void *`) and should +combine the two views and deposit the result in the first view. The +first view is often called the _left_ view and program execution is +considered to go left to right. + +If the view type is a C++ object with a non-trivial constructor the +identity function should use placement new to construct a new view. +If the view type is a C++ object with a non-trivial destructor the +reduce function should explicitly call the destructor on the second +(right) argument. The storage will be freed by the runtime; use +the `->~T()` syntax instead of `delete`. -value is unspecified; do read-modify-write operations and ignore the result +``` +void identity(void *view) { + new(view) View(); +} +void reduce(void *left, void *right) { + static_cast<View *>(left)->merge(static_cast<View *>(right)); + static_cast<View *>(right)->~View(); +} +``` If the reduce callback does nothing the reducer is called a _holder_ and is essentially a form of thread-local storage. -#### Types - -A declaration of a reducer requires a _{% defn "monoid" %}_. Aside from -the view type, a reducer monoid includes two callback functions. - When declaring a type the `cilk_reducer` keyword is used in the same contexts as `*` or (in C++) `&`. It follows a type, which is referred -to as the _view type_ of the reducer. +to as the view type of the reducer. ```cilkc int cilk_reducer(zero, plus) sum = 0; @@ -427,7 +461,7 @@ int cilk_reducer(zero, plus) *sum_pointer = 0; #### Values At any point in execution the value in a reducer is based on a -contiguous subset of all prior modifications performed in the serial +contiguous subset of prior modifications performed in the serial order of the program.
The subset may be empty. When all spawns since the initialization of the variable have been @@ -436,4 +470,14 @@ synced, the variable has the serially correct value. #### Handles -`__builtin_addressof` +Taking the address of a reducer gives a pointer to the current view. +To pass a reducer by reference to reducer-aware code, use the +function `__builtin_addressof`. + +```cilkc + extern void f_reducer(double reducer(zero, add) *); + extern void f_view(double *); + double reducer(zero, add) x = 0.0; + f_reducer(__builtin_addressof(x)); + f_view(&x); +``` From 0eed8f5fe225468335767b3512d4f4ae9438bf24 Mon Sep 17 00:00:00 2001 From: "John F. Carr" Date: Wed, 2 Nov 2022 09:20:24 -0400 Subject: [PATCH 7/9] More updates to language reference --- .../reference/opencilk-language-reference.md | 208 +++++++++++------- 1 file changed, 126 insertions(+), 82 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index 9a26ca33..06894f66 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -61,36 +61,41 @@ identifiers. A statement using `cilk_spawn` is the start of a potentially parallel region of code. -The `cilk_spawn` keyword should appear at the start of an expression -statement or after the `=` sign of an assignment (or `+=`, `-=`, -etc.). +The `cilk_spawn` keyword should appear before an expression statement, +before a block statement, after the `=` sign of a variable +initialization, or after the `=` of an assignment that is the entire +body of an expression statement. ```cilkc int x = cilk_spawn f(i++); +x = cilk_spawn f(i++); cilk_spawn y[j++] = f(i++); cilk_spawn { z[j++] = f(i++); } ``` -Although the compiler accepts `cilk_spawn` before almost any +A future version of OpenCilk may limit use of `cilk_spawn` to +these four contexts. + +OpenCilk 2.0 allows other statements, except declarations, to be +spawned. 
Although the compiler accepts `cilk_spawn` before almost any expression, spawns inside of expressions are unlikely to have the -expected semantics. A future version of the language may explicitly -limit `cilk_spawn` to the contexts above, at or near the top of the -parse tree of a statement. +expected semantics. -A declaration may not begin with `cilk_spawn`. +OpenCilk 2.0 will also accept `cilk_spawn;` as a statement with no +effect. ### Sync A sync statement, `cilk_sync;`, ends a region of potentially parallel -execution. It takes no arguments. - -[Find a real example with a conditional sync. Or have some spawns -to be synced. Matteo Frigo's all pairs shortest path code has -conditional sync, says TB.] +execution. It takes no arguments. It may be conditional and has +no effect if not executed. ```cilkc -if (time_to_sync) - cilk_sync; +for (int i = 0; i < n; i++) { + cilk_spawn f(i); + if (i % 4 == 3) + cilk_sync; +} ``` ### Scope @@ -113,6 +118,8 @@ cilk_scope { // x, y, and z are usable here because of the implicit sync ``` +The compiler also accepts `cilk_scope;` as a statement with no effect. + ### For A loop written using `cilk_for` executes each iteration of its body in @@ -126,27 +133,36 @@ cilk_for (int i = 0; i < n; ++i) ``` The syntax of a `cilk_for` statement is very similar to a C `for` -statement. It is followed by three expressions, the first of which -may declare variables. Unlike in C all three expressions are -mandatory. Parallel C++ range `for` constructs are not supported -in OpenCilk 2.0. - -For the loop to be parallelized, several conditions must be met: - -* The first expression must declare a variable (the "loop variable"). -* The second expression must compare the loop variable using one of the - relational operators `<=`, `<`, `!=`, `>`, and `>=`. -* The value compared to must be [...] -* The third expression must modify the loop variable using `++`, +statement except that none of the three items in parentheses may be +omitted. 
C++ "range for" is not supported with `cilk_for` in +OpenCilk 2.0. + +The first statement inside parentheses must declare at least one +variable. + +While the following constraints not required by syntax, the compiler +may not be able to parallelize the loop if they are not satisfied. + +* The first expression must declare one variable, the _control variable_. +* In C the control variable must be an integer no larger than 64 bits or + a pointer to a complete type. In C++ it may be any random access iterator. + Among other things, this implies that the difference between starting and + ending values must be an integer computable by subtraction or `operator-`. +* The second expression must compare the control variable using one of the + relational operators `<=`, `<`, `!=`, `>`, and `>=`. The value to which + it is compared is the _loop bound_. (See below for the + interpretation of this value.) +* The third expression must modify the control variable using `++`, `--`, `+=`, or `-=`. +The compiler will emit a warning if the loop can not be unrolled, +eliminated, or parallelized. + Because loop iterations may execute out of order there is no way to predictably stop the loop in the middle. The `break` statement may not be used to exit the body of a `cilk_for` loop. An exception thrown out of a loop body is only guaranteed to terminate the current -iteration. (Also, any later iteration of the same grain; see below.) -The effect on other iterations is unpredictable; they may run to -completion or not run at all. +iteration. #### Grain size @@ -167,9 +183,6 @@ can be manually overridden with a pragma: The pragma in the example tells the compiler that each group of 128 consecutive iterations should be executed as a serial loop. If there are 1024 loop iterations in total, there are only 8 parallel tasks. -There is guaranteed to be no spawn or sync between the iterations -for `i=0` and `i=1` (assuming `n` is at least 2, otherwise there -will be no second iteration). 
In OpenCilk 2.0 the argument to the grain size pragma must be an integer constant in the range 1..2<sup>31</sup>-1. @@ -177,28 +190,24 @@ integer constant in the range 1..2<sup>31</sup>-1. Without an explicit grainsize the runtime will choose a value from 1 to 2048. -Whether the grain size is static or dynamic, an exception thrown from -the loop body will abort the remainder of the group of iterations. -The scope of `cilk_sync` will also include all iterations in the -group. - - ### Reducers -A type may be suffixed with `cilk_reducer`. Syntactically it appears -where `*` may be used to declare a pointer type. The type to the left -of `cilk_reducer` is the _view type_. +A type may be suffixed with `cilk_reducer`. Syntactically this +keyword appears where `*` may be used to declare a pointer type. The +type to the left of `cilk_reducer` is the _view type_. -Two values appear in parentheses after `cilk_reducer`. Both emust be -functions returning `void` or pointers to functions returning void. -The first, the _identity callback_, takes one argument of type `void*`. -The second, the _reduce callback_, takes two arguments of type `void *`. +Two values appear in parentheses after `cilk_reducer`, separated by a +comma. Both must be functions returning `void` or pointers to +functions returning void. The first, the _identity callback_, takes +one argument of type `void *`. The second, the _reduce callback_, +takes two arguments of type `void *`. Two reducer types are the same if their view types are the same and -their callbacks are the same function mentioned by name. Otherwise -two reducer types are different and not compatible. This rule arises -from the impossibility of proving that two different functions are -identical. +their corresponding callbacks are the same function mentioned by name. +Otherwise two reducer types are different and not compatible.
The +requirement that the corresponding arguments be manifestly the same +function is dictated by the impossibility of proving that two +different expressions are equivalent. ``` extern void identity(void *), reduce(void *, void *); @@ -209,15 +218,15 @@ identical. int cilk_reducer(idp, reduce) type4; // not the same as type3 ``` -In the current version of OpenCilk the callbacks may be omitted in -contexts other than definition of a variable. This behavior may be -removed in a future version of OpenCilk. +In the OpenCilk 2.0 the callbacks may be omitted in contexts other +than definition of a variable. This behavior may be removed in a +future version of OpenCilk. -In the current version of OpenCilk the arguments to `cilk_reducer` -are evaluated each time a reducer is created. This behavior may -change in a future version of OpenCilk. For compatibility and -predictable behavior the arguments to `cilk_reducer` should not -have side effects. +In the current version of OpenCilk the arguments to `cilk_reducer` are +evaluated each time a reducer is created but not when a reducer is +accessed. This behavior may change in a future version of OpenCilk. +For compatibility and predictable behavior the arguments to +`cilk_reducer` should not have side effects. ## Execution of an OpenCilk program @@ -240,8 +249,8 @@ thread. In some cases it is necessary to specify exactly where the spawn point is in a spawn statement. All code up to the point of spawning executes in series. The code that follows the spawn point is called -the _continuation_ of the spawn. It potentially executes in parallel -with the following statements of the program. +the _continuation_ of the spawn. The spawn potentially executes in +parallel with the continuation, up to the next sync. Contrary to the syntax, the spawn itself should be considered as having a `void` value. The return value of spawn is like a promise or @@ -306,7 +315,7 @@ continuing. 
##### Explicit sync -An explicit sync is a statement using `cilk_sync`. This form normally +An explicit sync is the statement `cilk_sync;`. This form normally has function scope, meaning it waits for all spawns in the same function. A sync inside the body of a `cilk_for` or `cilk_scope` only waits for spawns inside the same construct. @@ -317,8 +326,8 @@ cilk_spawn cilk_scope { cilk_spawn ... } ... cilk_sync; ``` -a `cilk_sync` at top level waits for the outer spawn to complete, and -the outer spawn waits for the inner spawn to complete. +a `cilk_sync` at top level waits for the top level spawn to complete, and +the top level spawn waits for everything spawned inside it to complete. ##### Implicit syncs @@ -327,24 +336,22 @@ before exit from some scopes: * Before returning from a function, after calculating the value to be returned. This sync has function scope. -* On exit from the body of a `try` block. This sync has function scope. - [Is this true? Is it true only if the try block spawns?] * On exit from a `cilk_scope` statement. This sync has the scope of the `cilk_scope` statement. * On exit from the body of a `cilk_for`, i.e. once per iteration of the loop. This sync has scope equal to the loop body. - If a grain size is specified, the sync affects the entire group of - iterations in which it is executes. [TODO: Test grain size.] - -[What about on entry to a try block?] +* Before entering a `catch` block. This sync has the same scope as + the `try .. catch` construct as a whole: the smallest enclosing + function, `cilk_scope`, or `cilk_for` body. [No, it applies to + the try block, which gets its own sync region. Test this.] When exiting from a block scope, destructors for block scope variables are run after the implicit sync. ## Differences between C++ and OpenCilk -In C++ code there are two exceptions to the rule that serial and -parallel programs are the same. +There are three exceptions to the rule that serial and parallel +programs are the same. 
### Exceptions @@ -356,32 +363,69 @@ observable if the continuation has side effects. When the parent function executes an implicit or explicit `cilk_sync` the runtime checks whether the spawned child threw an exception. If it did, any exception thrown by the parent is discarded and the -exception thrown by the child is handled at the sync. The compiler -inserts an implicit sync at the end of a try block if the try contains -a spawn. [Make sure this is consistent with the wording in the implicit -syncs section.] +exception thrown by the child is handled as if thrown at the sync. +The compiler inserts an implicit sync at the end of a try block if the +try contains a spawn. If an exception is thrown from the body of a `cilk_for` statement the current loop iteration is aborted, consistent with the semantics of -`throw`. If the `grainsize` pragma is used, later iterations in the -current grain do not execute. No guarantee is made about which other -loop iterations execute, except that a grain in progress is not -affected by an exception thrown from outside the grain. +`throw`. Other loop iterations may or may not execute, depending on +scheduling. An exception thrown by one iteration of the loop will not +prematurely terminate another iteration. + +If more than one exception reaches a sync the earliest in serial +order is thrown by the sync. The other exceptions are destructed. ### Left hand side side effects When a function call is spawned in OpenCilk and the result is assigned, the compiler evaluates the address of the left hand side of the assignment before calling the function. This conflicts with -recent versions of C++, which require evaluation of the left hand -side to follow return from the function. +C++17, which requires evaluation of the left hand side to follow +return from the function. 
```cilkcpp extern int global; a[global++] = cilk_spawn f(); // f() sees incremented value of global ``` -[TB needs to confirm or deny this] +Occurring before the spawn, the evaluation of the left hand side is in +series with the continuation of the spawn. + +### Loops + +Parallel for loops are implemented by looping over an integer range. +This transformation requires that the loop count be known before +the loop begins execution and that the control variable be calculable +by adding an integer to the starting value. + +The observable differences are + +* The loop bound expression may be executed fewer times, likely only +once. + +* In C++, `operator-` may be called to subtract the start value from +the loop bound (if the increment is positive) or the loop bound from +the initial value (if the increment is negative). + +* The loop increment expression may not be executed or may be executed +fewer times than in the serial program. + +* In C++, `operator+` may be called to add an integer to the starting +value of the control variable. + +The program is not guaranteed to call `operator+` or `operator-` and +these operators not have side effects. If the loop is not +parallelized it may be executed as written. For example, the compiler +may decide to unroll the loop instead. If a `cilk_for` loop is +compiled to a serial loop the compiler will emit a warning. + +The control variable must not wrap around, even if the control +variable is an unsigned integer with well defined semantics. As +consequence of this rule, if the loop condition uses `!=` the +difference between start and end must be an exact multiple of the +increment. This can also be expressed as a requirement that the +difference between start and end fit in a signed integer. ## Races and reducers From 103fb8c6ed8605813ad082d76730da01aa37b739 Mon Sep 17 00:00:00 2001 From: "John F. Carr" Date: Wed, 2 Nov 2022 14:47:55 -0400 Subject: [PATCH 8/9] Correct description of try .. 
catch behavior --- src/doc/reference/opencilk-language-reference.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index 06894f66..1b7cbb29 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -317,8 +317,10 @@ continuing. An explicit sync is the statement `cilk_sync;`. This form normally has function scope, meaning it waits for all spawns in the same -function. A sync inside the body of a `cilk_for` or `cilk_scope` only -waits for spawns inside the same construct. +function. A sync inside the body of a `try`, `cilk_for`, or +`cilk_scope` only waits for spawns inside the same construct. +A sync in the `catch` block of a `try ... catch` construct does +wait for spawns in the enclosing scope. If spawns are nested as in ```cilkc @@ -340,13 +342,11 @@ before exit from some scopes: `cilk_scope` statement. * On exit from the body of a `cilk_for`, i.e. once per iteration of the loop. This sync has scope equal to the loop body. -* Before entering a `catch` block. This sync has the same scope as - the `try .. catch` construct as a whole: the smallest enclosing - function, `cilk_scope`, or `cilk_for` body. [No, it applies to - the try block, which gets its own sync region. Test this.] +* On exit from a `try` block, whether or not an exception is thrown. + This sync has the scope of the try block. -When exiting from a block scope, destructors for block scope variables -are run after the implicit sync. +When exiting from a scope with an implicit sync, destructors for +variables defined in that scope are called after the implicit sync. ## Differences between C++ and OpenCilk From a9a2b9a82b012a537ae819fc8434fe9db87cbda3 Mon Sep 17 00:00:00 2001 From: "John F. 
Carr" Date: Thu, 3 Nov 2022 11:45:59 -0400 Subject: [PATCH 9/9] Clarification and copyediting --- .../reference/opencilk-language-reference.md | 136 ++++++++++-------- 1 file changed, 78 insertions(+), 58 deletions(-) diff --git a/src/doc/reference/opencilk-language-reference.md b/src/doc/reference/opencilk-language-reference.md index 1b7cbb29..8b17be35 100644 --- a/src/doc/reference/opencilk-language-reference.md +++ b/src/doc/reference/opencilk-language-reference.md @@ -32,11 +32,12 @@ Java's `Thread.start`. These functions encourage writing programs that do not work without multithreading. The statements executed in a task parallel program form a directed -acyclic graph (DAG). A spawn node has one incoming edge and two -outgoing edges. A sync node has one outgoing edge. Two statements -are said to be logically parallel if neither precedes the other in DAG -order. Whether they actually run in parallel (at the same time) -depends on scheduling. +acyclic graph (DAG). Most statements execute sequentially. A spawn +node has one incoming edge and two outgoing edges. A sync node has +one outgoing edge. Two statements are said to be logically parallel +if neither precedes the other in DAG order. Whether they actually run +in parallel (at the same time) depends on scheduling. Statements that +are not in parallel are said to be in series. ```cilkc int x = cilk_spawn f(); // the body of f()... @@ -102,9 +103,8 @@ for (int i = 0; i < n; i++) { The keyword `cilk_scope` is followed by a statement, normally a compound statement. Any spawns within the statement are synced before -exit from the statment. Syncs within the statement, including the -implicit sync before exit, do not wait for spawns outside the -statement. +exit from the statment. Syncs within the statement do not wait for +spawns outside the statement. ```cilkc int w, x, y, z; @@ -123,7 +123,7 @@ The compiler also accepts `cilk_scope;` as a statement with no effect. 
### For A loop written using `cilk_for` executes each iteration of its body in -parallel. +parallel with all the others. ```cilkc cilk_for (int i = 0; i < n; ++i) @@ -158,6 +158,11 @@ may not be able to parallelize the loop if they are not satisfied. The compiler will emit a warning if the loop can not be unrolled, eliminated, or parallelized. +The first evaluation of the loop condition precedes any iteration of +the loop. Otherwise it is unspecified whether the loop condition and +increment expressions execute in parallel with any instance of the +loop body. These expressions should not have side effects. + Because loop iterations may execute out of order there is no way to predictably stop the loop in the middle. The `break` statement may not be used to exit the body of a `cilk_for` loop. An exception @@ -170,7 +175,7 @@ If a single loop iteration does very little work, the overhead of spawning it exceeds any benefit from parallelism. In many cases the compiler will recognize this situation and merge several consecutive iterations into a single task that runs sequentially. This behavior -can be manually overridden with a pragma: +can be overridden with a pragma: ```cilkc #pragma cilk grainsize 128 @@ -248,9 +253,10 @@ thread. In some cases it is necessary to specify exactly where the spawn point is in a spawn statement. All code up to the point of spawning -executes in series. The code that follows the spawn point is called -the _continuation_ of the spawn. The spawn potentially executes in -parallel with the continuation, up to the next sync. +executes in series. The code that follows the spawning statement is +called the _continuation_ of the spawn. Code after the spawn point in +the spawning statement potentially executes in parallel with the +continuation, up to the next sync in the same scope. Contrary to the syntax, the spawn itself should be considered as having a `void` value. 
The return value of spawn is like a promise or @@ -270,11 +276,12 @@ When there are side effects the situation is more complex. When `cilk_spawn` appears after an assignment operator and before a function call, the spawn is after all arguments are evaluated and -after the address of the left hand side is evaluated. Side effects do -not race with the continuation. The spawn is before the function call -and assignment of its return value. The body of the called function -and the assignment do race with the continuation. In C++, destructors -for function arguments also race with the continuation. +after the address of the left hand side is evaluated. Side effects +are not in parallel with the continuation. The spawn is before the +function call and assignment of its return value. The body of the +called function and the assignment are in parallel with the +continuation. In C++, destructors for function arguments are +also in parallel with the continuation. ```cilkc x[i++] = cilk_spawn f(a++, b++, c++); @@ -283,7 +290,8 @@ x[i++] = cilk_spawn f(a++, b++, c++); ``` When `cilk_spawn` appears at the start of a statement the entire -statement is spawned and everything in it races with the continuation. +statement is spawned and everything in it is in parallel with the +continuation. While syntactically valid, code like @@ -301,12 +309,10 @@ nonsense.c:4:18: warning: Failed to emit spawn ``` In this case the compiler is unable to move the use of `i++` into the -spawned expression. +spawned expression because its value is needed immediately. -This syntax may be removed in a future version of OpenCilk and -`cilk_spawn` required to appear at the start of a statement or between -an assignment operator and an immediately following function call or -constructor call. +As noted above, this syntax (spawning a subexpression) may be removed +in a future version of OpenCilk. #### Sync @@ -355,17 +361,20 @@ programs are the same. 
### Exceptions -If a spawned function throws an exception, the parent function may -have continued straight line execution past the spawn. The serial -program goes directly to an exception handler. The difference is -observable if the continuation has side effects. +If a spawned function throws an exception, the spawning function may +have continued execution past the `cilk_spawn`. The serial program +goes directly to an exception handler. The difference is observable +if the continuation has side effects. + +At each implicit or explicit sync OpenCilk checks whether any spawned +child threw an exception. If it did, any exception thrown by the +parent is discarded and the exception thrown by the child is handled +as if thrown at the sync. If more than one child throws an exception +the earliest in serial order is kept and the rest discarded. +Discarded exceptions are destructed. -When the parent function executes an implicit or explicit `cilk_sync` -the runtime checks whether the spawned child threw an exception. If -it did, any exception thrown by the parent is discarded and the -exception thrown by the child is handled as if thrown at the sync. -The compiler inserts an implicit sync at the end of a try block if the -try contains a spawn. +The implicit sync at the end of a try block ensures that an exception +thrown by a spawned function will be handled by the correct `catch`. If an exception is thrown from the body of a `cilk_for` statement the current loop iteration is aborted, consistent with the semantics of @@ -373,9 +382,6 @@ current loop iteration is aborted, consistent with the semantics of scheduling. An exception thrown by one iteration of the loop will not prematurely terminate another iteration. -If more than one exception reaches a sync the earliest in serial -order is thrown by the sync. The other exceptions are destructed. 
- ### Left hand side side effects When a function call is spawned in OpenCilk and the result is @@ -404,6 +410,11 @@ The observable differences are * The loop bound expression may be executed fewer times, likely only once. +* In C++ the comparison operator of the loop condition, +e.g. `operator<`, may not be called. The loop range may be calculated +assuming that the comparison operator is suitable for a random access +iterator. + * In C++, `operator-` may be called to subtract the start value from the loop bound (if the increment is positive) or the loop bound from the initial value (if the increment is negative). @@ -412,13 +423,15 @@ the initial value (if the increment is negative). fewer times than in the serial program. * In C++, `operator+` may be called to add an integer to the starting -value of the control variable. +value of the control variable. The result becomes the value of the +control variable for a loop iteration. -The program is not guaranteed to call `operator+` or `operator-` and -these operators not have side effects. If the loop is not -parallelized it may be executed as written. For example, the compiler -may decide to unroll the loop instead. If a `cilk_for` loop is -compiled to a serial loop the compiler will emit a warning. +If the loop is not parallelized it may be executed as written. The +C++ operators mentioned above should not have side effects. + +If a `cilk_for` loop is compiled to a serial loop the compiler will +emit a warning. If the loop is unrolled it will silently use the +serial behavior. The control variable must not wrap around, even if the control variable is an unsigned integer with well defined semantics. As @@ -445,12 +458,19 @@ gives each thread running in parallel a separate copy of the variable and merges the values as necessary. The local copy of the variable is called a _view_. +A hyperobject type is distinct from a view type. 
When hyperobjects +are accessed using the hyperobject type, those accesses are not +data races because statements that actually execute in parallel +always have separate views. + Because each thread may operate on a separate copy, the address of a -view is not constant. This is also true of thread local variables, -but reducers do not follow the same rules as thread local variables. +view is not constant. (This is also true of thread local variables, +but reducers do not follow the same rules as thread local variables.) +If a pointer or reference to a view is used outside the strand in +which it was created, the behavior is undefined. Also because each thread operates on a separate copy, a view of a -reducer only has a well-defined value in serial code. Within any +reducer only has a predictable value in serial code. Within any strand the value does not change except due to that strand's actions. At a strand boundary the value of a view may change even if its address does not. @@ -465,13 +485,13 @@ A monoid is a combination of a data type (the view type), an _identity_ value, and a binary associative operation. If either operand of the associative operation is the identity value, the result is the other operand. For example, `(double, 0.0, +)` is a monoid -(ignoring non-finite values). In a reducer type the identity value is -provided by a function; the function takes a pointer to a view (cast -to `void *`) and should store the identity value there. The reduction -operation takes two pointers to views (cast to `void *`) and should -combine the two views and deposit the result in the first view. The -first view is often called the _left_ view and program execution is -considered to go left to right. +(provided that values remain finite). In a reducer type the identity +value is provided by a function; the function takes a pointer to a +view (cast to `void *`) and should store the identity value there. 
+The reduction operation takes two pointers to views (cast to `void *`) +and should combine the two views and deposit the result in the first +view. The first view is often called the _left_ view and program +execution is considered to go left to right. If the view type is a C++ object with a non-trivial constructor the identity function should use placement new to construct a new view. @@ -509,14 +529,14 @@ contiguous subset of prior modifications performed in the serial order of the program. The subset may be empty. When all spawns since the initialization of the variable have been -synced, the variable has the serially correct value. - +synced, the variable has the same value as in serial code. #### Handles -Taking the address of a reducer gives a pointer to the current view. -To pass a reducer by reference to reducer-aware code, use the -function `__builtin_addressof`. +Taking the address of a reducer gives a pointer to the current view, +which is not valid outside of the current strand. To pass a reducer +by reference to reducer-aware code, use the function +`__builtin_addressof`. ```cilkc extern void f_reducer(double reducer(zero, add) *);