Skip to content

Latest commit

 

History

History
3495 lines (2929 loc) · 117 KB

README.org

File metadata and controls

3495 lines (2929 loc) · 117 KB

Extending a Language — Writing Powerful Macros in Scheme

The canonical HTML-formatted version of this document can be found at https://mnieper.github.io/scheme-macros/README.html. The interactive Org document is hosted at https://github.com/mnieper/scheme-macros.

Preface

This document is an introduction to writing powerful macros in Scheme. It was initially written on the occasion of a tutorial I give at the BOB2023 Konferenz in Berlin on 17 March 2023.

The macro facility, especially its built-in hygiene, is one of the fundamental pillars of the Scheme programming language. While more complicated than the simple token-replacing macros of other languages like C, Scheme macros can be written in a way that make them robust and so that the abstractions they offer seamless blend into the language and cannot be distinguished from syntactic forms built into the language. It is often felt that this expressiveness makes writing Scheme macros more complicated (even something of a black art) than writing C or Common Lisp macros, for example. One goal of this tutorial is to convince the audience otherwise.

While the Scheme macro facility has always been avant-garde (and this is one of the reasons why Scheme was chosen as the implementation language for this tutorial), a lot of what is said here also applies to languages that provide corresponding features. It is also a appeal to language designers that languages should include a macro facility as Scheme does, as this allows for small language cores and enables the user to provide their own syntactic abstractions.

Another reason why the Scheme language is used in this tutorial is that it has an exceptionally clear semantics, is a compact language, and is easy to learn.

The document can both be read as is or used interactively in an Emacs session. In the following section, a possible setup is described.

Prerequisites

Chez Scheme

We need a Scheme implementation. This tutorial assumes Chez Scheme, which is one of the most mature, standard-compliant Scheme implementations. You can get Chez Scheme from its homepage. On Debian-based GNU/Linux system like Ubuntu, it is prepackaged.

Emacs

We will use GNU Emacs as our development environment, which has great tooling for Scheme. The typical GNU/Linux system ships with it.

For GNU Emacs < 28, enable the NonGNU Emacs Lisp Package Archive in your init file or customize the variable package-archives.

In any case, enable the NonGNU-devel ELPA Packages.

Org

Org Mode is a GNU Emacs major mode that allows to document, edit and execute source code.

The current versions of Org packaged with Emacs hide Scheme evaluation errors. This is fixed in the version in Org’s git repository, for which Org provides installation instructions.

Familiarize yourself with how one works with source code in an Org document, especially how to edit and execute code.

Geiser

Geiser is a GNU Emacs package that allows to runs Scheme processes in GNU Emacs, and which is used by Org’s Babel. To install it, it is enough to install the package geiser-chez using GNU Emacs’ package menu. We need the most recent development version.

Paredit

Paredit, a tool for parenthetical editing in Emacs makes working with Scheme code a lot more pleasant. Like Geiser, it can be installed through GNU Emacs’ package manager.

Initialization

After the GNU Emacs packages have been installed, we want to customize them for our needs. The following should go into your init file unless you want to execute the following code every time you start GNU Emacs.

(require 'compile)
(autoload 'enable-paredit-mode "paredit" "Turn on pseudo-structural editing of Lisp code" t)
(add-hook 'scheme-mode-hook #'enable-paredit-mode)
(show-paren-mode t)
(setq auto-mode-alist (cons '("\\.ss" . scheme-mode) auto-mode-alist))
(setq auto-mode-alist (cons '("\\.sls" . scheme-mode) auto-mode-alist))
(setq auto-mode-alist (cons '("\\.sps" . scheme-mode) auto-mode-alist))
(add-to-list 'compilation-error-regexp-alist
             '("^\\(Exception\\|Warning\\).*: .* \\(line \\([0-9]+\\), char \\([0-9]+\\) of \\(.*\\)\\)" 5 3 4 nil 2))
(setq geiser-default-implementation 'chez)
(org-babel-do-load-languages
 'org-babel-load-languages
 '((scheme . t)))
(setq org-confirm-babel-evaluate nil)

The Scheme programming language

Scheme is programming language of the Lisp family. Its defining properties are its uniform parenthesized syntax (inherited from Lisp), first-class procedures and continuations, lexical scoping, dynamic typing, proper tail calls and hygienic macros. It is primarly a functional programming language but allows many other programming paradigms.

The Scheme programming language was developed in the 1970s by Guy L. Steele and Gerald Jay Sussman. Since then it has been refined and further developed through a series of de facto standards called the Revisedn Report(s) on the Algorithmic Language Scheme (R/n/RS). The two current standards are R6RS (2007) and R7RS-small (2013). Despite the versioning and the timeline, R6RS is the more detailed, more advanced and more modern standard[fn:1].

In this tutorial, we work with the macro facility of R6RS, which is far more powerful than the one of R7RS-small, and also discuss some proposed or implemented extensions. Such extensions to the Scheme programming language are often proposed, discussed and implemented using the Scheme Requests for Implementation process, where everyone can submit a SRFI extending the Scheme programming language. Whenever we speak of the Scheme language in this text, we default to the R6RS dialect.

For practical programming, one needs, of course, an implementation. Scheme is possibly the programming language with the highest number of implementations. The R6RS language has some very high-quality implementations, including Chez Scheme, GNU Guile, Loko Scheme, and Racket, so for any application area, there will be a suitable Scheme system.

Some simple macros

Let us call a combination an expression in Scheme of the form

(operator operand ...)

An example is given by the following expression evaluating to the answer of life:

(* 21 2)
42

Such a combination is usually evaluated by evaluating the operator and the operands in some unspecific order and by then calling the procedure resulting from the operator evaluation with arguments resulting from the operand evaluations.

Scheme, however, also possesses special forms, which do not follow this evaluation strategy. An example is given by the conditional if.

(if (number? 2)
    'ok
    (/ 1 0))
ok

If the conditional were a normal combination, the operands, and (/ 1 0) in particular, would have been evaluated first (and unconditionally). Scheme recognizes special forms through the operator in first position, namely if it is a keyword (a special type of identifier). The Scheme macro facility allows the programmer to define their own keywords.

Incrementing a variable

Let us ignore for a moment that mutation is frowned upon in functional programming and let us assume that we have to frequently increase the value of variables in our program. Given a variable x, this is done in Scheme through the following expression:

(set! x (+ x 1))

That the variable x is repeated in this expression is unpleasant (and may be considered a violation of the DRY principle), so we want an operator akin to C’s pre/post-increment operator. Unfortunately, Scheme does not provide such an operator, but, fortunately, it doesn’t have to because we can build one ourself.

Our first attempt could be to write a procedure (the primary means of abstraction in functional programming languages)[fn:2]:

(define incr!
  (lambda (x)
    (set! x (+ x 1))))

This attempt, however, is failed:

(define x 1)
(incr! x)
x
1

The reason that it doesn’t work — the variable’s value is still 1 and not 2 — is that (incr! x) is a normal combination as introduced earlier. As the arguments are evaluated first and the procedure is called with their values, in this example, incr! is called with the argument 1. This is then bound to a new variable x locally to incr!. It is this variable, which is increased by 1 and not the top-level variable.

The solution is, of course, to define incr! not as a procedure[fn:3] but as a keyword. In the Scheme programming language, the define-syntax keyword can be used for it:

(define-syntax incr!
  (syntax-rules ()
    ((incr! x)
     (set! x (+ x 1)))))

This definition says that incr! is defined to be a new keyword, implemented as a macro. The syntax-rules line shall be viewed as boilerplate for the moment (and we will come back to it later). Important are the next two lines. The form (incr! x) is a pattern saying that the macro matches against a use of the form (keyword form) (where keyword is necessarily incr!). When the macro is used, the pattern variable x is bound to the form. The form (set! x (+ x 1)) is a template. When the macro is used, the pattern variables in the template are replaced with the forms they are bound to and the substituted template is then used in place of the macro.

In the following example, (incr! y) is effectively substituted by (set! y (+ y 1)), so we have achieved what we wanted[fn:4]:

(define y 10)
(incr! y)
y
11

As a side note, we see from the discussion that set! is another keyword (like if, it cannot be a procedure for the same reasons why our attempt to write incr! as a procedure doesn’t work).

As any other identifier in Scheme, the identifier set! can also be rebound as in the following example:

(let ([set! (lambda (x y) (+ x y))])
  (define x 1)
  (set! x 2))
3

In the body of the let form, set! has lost its usual meaning and is bound to a procedure adding its two arguments. It is most interesting to see what happens when we use our incr! macro, which refers to set!, in the body of the let form:

(let ([set! (lambda (x y) (/ 1 0))])
  (define x 1)
  (incr! x)
  x)
2

This example yields the correct result 2, although calling set! within the let body would raise an exception. The reason for this is the already mentioned hygiene of Scheme macros. The identifier set! in the output of the incr! macro didn’t occur in its input but came from the macro definition. Scheme macro hygiene now ensures that it still refers to the lexical binding it had where it occured in the program source. Note that the C preprocessor — as an example for a very simple, if not primitive macro facility — wouldn’t have ensured it. Whether a C macro works correctly or not often depends on the lexical environment of the macro use site.

We say that hygienic Scheme macros are referentially transparent. This is already known from procedures in functional programming languages and lexical scoping:

(define f
  (let ([x 1])
    (lambda () x)))

(list (f)
      (let ([x 2])
        (f)))
(1 1)

Wherever the procedure f is called, it always evaluates to 1.

We finish this subsection with another example of hygiene:

(let ([set! 2])
  (incr! set!)
  set!)
3

The result, which is the increment of the original value of the variable set! by one, can again be explained by hygiene and by distinguishing the identifier set! that appears in the macro use and the same-named identifier set! appearing in the macro source. Without distinguishing both, the macro use (incr! set!) is transcribed to (set! set! (+ set! 1)). In this transcription, the first set! originates from the macro transformer and thus still refers to the lexical binding it had at that place. The other two occurrences of set! are copies from the macro input and thus refer to the lexical binding of set! as a let-bound variable.

A tracing let

Simple loops are often written using the named let form as in the following example:

(define fact
  (lambda (n)
    (let f ([n n] [a 1])
      (if (zero? n)
          a
          (f (- n 1) (* a n))))))

In order to facilitate debugging, let us define a version of the named let form that prints the arguments with which the loop recursion is entered and with which it is exited[fn:5]. As let is a special form, this has to be a special form as well, so let us write our second macro:

;; A form of a named let that prints information about each recursive
;; call.
(define-syntax trace-let
  (syntax-rules ()
    [(trace-let name ([var expr] ...) body1 ... body2)
     (let f ([depth 0] [var expr] ...)
       (define name
         (lambda (var ...)
           (f (+ depth 1) var ...)))
       (indent depth)
       (display "(")
       (display 'name)
       (begin
         (display " ")
         (display var))
       ...
       (display ")")
       (newline)
       (call-with-values
           (lambda ()
             body1 ... body2)
         (lambda val*
           (indent depth)
           (fold-left
            (lambda (sep val)
              (display sep)
              (display val)
              " ")
            "" val*)
           (newline)
           (apply values val*))))]))

;; Helper procedure referenced by the macro output of the macro above.
(define indent
  (let ([pattern "| "])
    (lambda (depth)
      (do ([i 0 (+ i 1)])
          ((> i depth))
        (display (string-ref pattern (mod i 2)))))))

In this macro, the pattern is given by (trace-let name ([var expr] ...) body1 ... body2), while the template makes up the bulk of the macro. Already in the pattern, we see a new syntax, the ellipsis .... It means that the subpattern preceding it may appear repeated zero or more times in the input. When such a subpattern is matched, the contained pattern variables represent lists of forms.

In the template, the ellipsis means to repeat the preceding subtemplate as many times as the pattern variables contained in it represent forms. For this to work, every such subtemplate has to contain at least one pattern variable, obviously, and all pattern variables contained in it have to represent lists of forms of the same length.

Note the occurrence of begin in the macro. Normally, in a procedure body, (begin expression ...) is equivalent to the list of expressions, here, however, we have to use it. The reason is that following ellipsis refers the immediately preceding subtemplate, so it is crucial that the two display commands (which we both want to repeated once per variable) appear in a single form.

When we run the following test, we see the given result printed.

(define fact
  (lambda (n)
    (trace-let f ([n n])
      (if (zero? n)
          1
          (* (f (- n 1)) n)))))
(fact 3)
|(f 3)
| (f 2)
| |(f 1)
| | (f 0)
| | 1
| |1
| 2
|6

We can demonstrate another facet of hygiene with this particular macro. In the macro template, which is part of the macro’s source, the identifier f is introduced and is bound by let appearing next to in the source. In the particular use of the macro above, the pattern variable name represents another identifier name f, namely the identifier with that name that appears in the macro use. Although f coming from the macro use is bound in the macro output within the scope of the binding of f coming from the macro text, it does not shadow the other f as this would be a violation of hygiene. Instead, the identifier f coming from the macro text is renamed by the Scheme macro expander, at least conceptually (as it isn’t inserted as a free identifier, the precise name obviously doesn’t matter).

The ellipsis can also be used to turn our incr! macro into one that accepts more than one variable to increment:

(define-syntax incr!
  (syntax-rules ()
    ((incr! x ...)
     (begin
       (set! x (+ x 1))
       ...))))

Let us briefly test our new extended macro:

(define x 10)
(define y 20)
(incr! x y)
(list x y)
(11 21)

The role of begin in the macro definition of the extended incr! differs from the role in our previous use of begin. Here it is used to solve the problem that the template that prescribes the macro output has to be a single form.

One can also write the multi-variable incr! macro without the ellipsis by letting the macro expand into itself. This is not necessarily how one would do it, but here it serves as a demonstration for further macro techniques:

(define-syntax incr!
  (syntax-rules ()
    ((incr!)
     (values))
    ((incr! x . x*)
     (begin
       (set! x (+ x 1))
       (incr! . x*)))))

First of all, this is our first macro with two transcription rules, where each rule consists of a pattern and of a template. The pattern of the first rule is (incr!), the pattern of the second rule is (incr! x . x*). Scheme’s macro expander tries to match the macro input against the patterns in the order in which the patterns appear in the syntax-rules form.

The second new thing is a a pattern of the form (incr! x . x*), which matches an (improper) list of at least two elements, the first being the macro keyword and the second one being bound to the pattern variable x. The rest arguments are bound as an (improper) list to the pattern variable x*.

Finally, this example demonstrates a recursive macro, that is a macro that transforms the input into an instance of itself. As long as the output of a macro use involves a new macro use (possibly with the same keyword), the Scheme expander continues with transcribing the macro.

Let us not forget to test the new version of the macro:

(define x 100)
(define y 200)
(incr! x y)
(list x y)
(101 201)

Accessing vector locations through variables

A vector in Scheme is a collection of locations in the store that can be linearly addressed. A new vector can be allocated with the vector procedure:

(define v (vector 1 2 3))
v
#(1 2 3)

Vector elements can be retrieved using vector-ref and mutated using vector-set!:

(vector-ref v 2)
3
(vector-set! v 1 4)
v
#(1 4 3)

Assume that we want to use our incr! macro to increase the value of one vector element. As incr! expects a variable as its argument, we have make the locations associated to a vector accessible as if they were backed up by a variable. Another feature of the (R6RS) macro system comes to our rescue:

(define-syntax v1
  (identifier-syntax
   [v1 (vector-ref v 1)]
   [(set! v1 expr) (vector-set! v 1 expr)]))
(incr! v1)
v
#(1 6 3)

This macro isn’t written with syntax-rules but uses identifier-syntax. This is used to declare a keyword, v1 in our case, that is transcribed differently, depending on whether it appears in the form v1 or in the form (set! v1 expr) in the source code.

To access the zeroth or the second element of the vector v, we could define identifier macros v0 and v2 similar to v1 but this would mean mostly duplicating code and violating the DRY principle. A better approach is to use the Scheme macro system once more. We define a macro that, when used, defines a customized macro[fn:6]:

(define-syntax define-vector-reference
  (syntax-rules ()
    [(define-vector-reference var vec-expr idx-expr)
     (begin
       (define vec vec-expr)
       (define idx idx-expr)
       (define-syntax var
         (identifier-syntax
          [var (vector-ref vec idx)]
          [(set! var expr) (vector-set! vec idx expr)])))]))

We can now use this macro as follows:

(define-vector-reference initial-element v 0)
(incr! initial-element)
v
#(2 6 3)

Note that the arguments vec-expr and idx-expr can stand for arbitrary expressions. We evaluate these expressions once and store their values in the variables vec and idx (which will be suitably renamed by the macro expander so that they won’t clash with user defined identifiers with the same name). If we didn’t do this but used vec-expr and idx-expr everywhere in place where vec and idx appeared in the defined macro, the vector and the index expressions would be evaluated every time, the vector reference variable would be accessed.

Syntax objects

The Scheme reports define hygiene and referential transparency for macros as follows:

  • If a macro transformer inserts a binding for an identifier (variable or keyword) not appearing in the macro use, the identifier is in effect rename throughout its scope to avoid conflicts with other identifiers.
  • If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that may surround the use of the macro.

The examples of the previous section make it hopefully a bit clear what is meant by these two points. Nevertheless, one may think that there still must be some magic at work and that it will be impossible to prove anything about these macros. The purpose of this section is to disassemble everything and to explain what is going on under the hood.

Identifiers

The Lisp languages, and thus Scheme as well, are homoiconic programming languages, which means that the program’s internal representation is a datum of the language. In first approximation, the internal representation of a Scheme expression (as of a Scheme program) is a Scheme datum value. For example, the program (expression)

(let ([x 1])
  (+ x 2))

is represented by a list whose first element is the symbol let, whose second element is a list of a list with two elements and whose third element is a list of the three data +, x, and 2.

Due to existence of hygienic macros we have to amend this traditional picture. Consider the following example.

(let ([set! 10])
  (incr! set!)
  set!)

To evaluate the let expression, the macro use of incr! has to be expanded first. After the expansion, the expression would look like

(let ([set! 10])
  (set! set! (+ set! 1))
  set!)

if Scheme expressions were represented by Scheme datum values and within, identifiers were represented by symbols. It is obvious that this cannot be how the Scheme expander works because there would be no way to tell which copy of the symbol set! refers to which binding. The point is that identifiers cannot be represented by symbols, which only have a symbolic name. Instead, to an identifier both a symbolic name and a lexical context are associated. When the binding of an identifier is looked up, it is looked up in the lexical context associated with it.

In Scheme, symbols are first-class values. The can be created using the syntax (quote name), which can be abbreviated to 'name:

'red
red

The same is true for identifiers. They are created just like symbols but use the syntax (syntax identifier), which can be abbreviated to #'identifier, instead:

#'x
#<syntax x>

(The format of the output, #<syntax x>, is implementation-specific, because identifiers are not Scheme datum values and thus have no standardized or faithful written representation.)

Evaluating of the form (syntax x) (or #'x) means the following for the Scheme system: construct and return an identifier with the symbolic name x and with the lexical context at the place of the x appearing in the syntax form. We have to be aware of that the term identifier can be used in two (slightly) different contexts: When we refer to set! as an identifier in the example above, we speak about a token being part of the code. When we refer to the expression #'x evaluating to an identifier, we speak about a value of the language. The expression #'x contains an identifier in the first sense (speaking about the language) and evaluates to an identifier (as a value of the language).

The procedure syntax->datum can be used to convert an identifier to a symbol, namely its underlying symbolic name:

(syntax->datum #'x)
x

There are no standard procedures that allow us to look up the binding of an identifier, but we can compare identifiers. Scheme defines two equivalence relations, realized by the predicates bound-identifier=? and free-identifier=?. Two identifiers are ”bound-identifier=?” if they are interchangeable when they appear bound in the output of a macro. Two identifiers are ”free-identifier=?” if they are interchangeable when they appear free in the output of a macro. Neither equivalence implies the other. It will become clearer in the course of this tutorial what this means, but some experiments will already give some understanding:

(list (bound-identifier=? #'x #'x) (bound-identifier=? #'x #'y))
(#t #f)

The two identifiers to which the two evaluations of #'x in the first argument to list evaluate are therefore ”bound-identifier=?” while the differently named identifiers #'x and #'y (more precisely: the identifiers returned by these expressions) are not. It is tempting to say that the two (or three) instances of #'x evaluate to the same identifier, but for this to make sense, some equivalence relation would have had to be fixed earlier.

Let us now consider two simple examples for free-identifier=?:

(let ([x 1])
  (free-identifier=? #'x #'x))
#t

If the identifiers to both instances of #'x evaluate were inserted in the code as free identifiers they both would refer to the variable binding of the identifier x introduced by let.

The second example is a bit more interesting:

(let ([x 1]
      [y 1])
  (free-identifier=? #'x #'y))
#f

The answer is #f (for false) because although the values of the two variables x and y are both initialized to 1 they are bound to different locations in the store (which can be exhibited by mutating one of the two variables.

So far, in all examples bound-identifier=? seems to give the same result as free-identifier=?. That this is not true is shown in the next example.

(let ([x 1])
  (define outer-x #'x)
  (let ([x 2])
    (define inner-x #'x)
    (list (bound-identifier=? outer-x inner-x)
          (free-identifier=? outer-x inner-x))))
(#t #f)

Inserting inner-x as a free identifier would not be equivalent to inserting outer-x because the former would refer to the binding of the variable with value 2 and the latter to the binding of the variable with value 1. Thus identifiers that are ”bound-identifier=?” are not necessarily ”free-identifier”. We hope that the connection of free-identifier=? to the second hygiene condition, the one about inserting free references to an identifier, is apparent.

Again so far, it seems that identifiers are ”bound-identifier=?” if and only if they have the same symbolic name. One implication is correct, namely that identifiers that are interchangeable as bound identifiers must have the same symbolic name, but the other implication is not. To show this, we have to employ a macro:

(let ([x 1])
  (let-syntax
      ([outer-x (identifier-syntax #'x)])
    (define inner-x #'x)
    (list (bound-identifier=? outer-x inner-x)
          (free-identifier=? outer-x inner-x))))
(#f #t)

Two remarks about the example code are in order before we discuss the result. The binding form let-syntax is to let as define-syntax is to define; in other words, it allows us to locally bind keywords to macro (transformers). Furthermore, we employ a short form of identifier-syntax here, which defines no set! semantics but just replaces an occurrence of the keyword outer-x with #'x.

Both the identifier x in the definition of the macro outer-x and the identifier x in the definition of the variable inner-x refer to the binding of x introduced by the outer let, which explains that the values of outer-x and inner-x are ”free-identifier=?”. But they are not ”bound-identifier=?”, so this example shows that identifiers that ”free-identifier=?” need not necessarily be ”bound-identifier=?”.

The reason why they cannot be ”bound-identifier=?” is that the first hygiene condition about inserting bindings for an identifier would be violated otherwise. Consider the following example:

(let-syntax
    ([add1
      (syntax-rules ()
        [(add1 y)
         (let ([x 1])
           (+ x y))])])
  (let ([x 2])
    (add1 x)))
3

The identifier x appearing in the macro template is inserted as a bound identifier in the macro output and thus is in effect renamed to avoid conflict with the identifier x appearing in the macro use. Renaming means that the two identifiers named x cannot be ”bound-identifier=?” because they would otherwise be interchangeable as bound identifiers.

Scheme implements this hygiene condition by assigning to identifiers besides their symbolic name and their lexical context another property, namely their historic context (or just history)[fn:7]. The history of an identifier is the information when the identifier was first introduced in the program. All identifiers in the program source have the same history — they were already there when the program was started. An identifier introduced by a macro transformation (as part of its output) has a different history than identifiers that were already present in the program source. Identifiers introduced by different macro transformations have different histories and all identifiers introduced by the same macro transformation have the same history.

Let us take another view at this example:

(let ([x 1])
  (let-syntax
      ([outer-x (identifier-syntax #'x)])
    (define inner-x #'x)
    (list (bound-identifier=? outer-x inner-x)
          (free-identifier=? outer-x inner-x))))

The identifier x appears three times in the source. All three identifiers have the same history. When the macro outer-x is expanded, the identifier x is introduced in the macro output (as part of the expression #'x) and this particular identifier was not part of the macro input, so the introduced identifier x has a different history than the identifier to which inner-x is bound.

We are now in a situation to give alternative definitions for bound-identifier=? and free-identifier=?: Two identifiers are ”bound-identifier=?” if they have the same symbolic name and the same history. Two identifiers are ”free-identifier=?” if they refer to the same binding in their respective lexical contexts. (An unbound identifier is, by definition, ”free-identifier=?” to another identifier if the other identifier is also unbound and has the same symbolic name.)

Scheme also allows to fudge identifiers. The procedure datum->syntax can turn a symbol into an identifier with that symbolic name. For that, the user has to provide a lexical context and a history. This is done by giving a “template” identifier from which the context is taken.

(let ([x 1])
  (define outer-x #'x)
  (let ([x 2])
    (define outer (datum->syntax outer-x 'x))
    (list (bound-identifier=? outer-x outer)
          (free-identifier=? outer-x outer))))
(#t #t)

In this example, the identifier outer is an identifier with the symbolic name x and with the context as if it was introduced where x appears in the definition of outer-x.

In the following example, the fudged identifier with the symbolic name y has the same history as the identifier x appearing the macro use of as-y, and thus the same history as the identifier y appearing in the call to bound-identifier=?.

(let-syntax
    ([as-y
      (syntax-rules ()
        [(as-y x) (datum->syntax #'x 'y)])])
  (bound-identifier=? #'y (as-y x)))
#t

Constructing syntax objects

In the previous section we learned that Scheme code cannot be represented by a Scheme datum value (a Scheme value that has a written representation like a list, a number, or a symbol), at least not during the expansion process, as identifiers cannot be represented by symbols.

The objects that do represent Scheme forms are called syntax objects. The basic idea is that a syntax object is like a datum value but with identifiers instead of symbols. So a list of identifiers or a vector of a number and an identifier, or a single string or identifier are all syntax objects. Moreover, there can be a wrap around a nonidentifier syntax object.

Formally, syntax objects can inductively be defined as follows:

  • A nonpair, nonvector, or nonsymbol value is a syntax object.
  • A pair of syntax objects is a syntax object.
  • A vector of syntax objects is a syntax object.
  • An identifier is a syntax object.
  • A wrapped nonpair, nonvector, or nonsymbol value is a syntax object.
  • A wrapped pair or vector of syntax objects is a syntax object.

To each syntax object corresponds a (datum) value by stripping all wraps and converting all identifiers to their symbolic names. The Scheme procedure that does this conversion is syntax->datum. We have already seen it converting identifiers to symbols. It is also used in effect by the quote special form: When Scheme evaluates an expression like (quote (1 2 foo)), the (internal) procedure responsible for expanding or evaluating this expression will receive a syntax object whose underlying datum value is (1 2 foo) and will evaluate to this underlying value.

We can construct the syntax object in the above example as a Scheme value:

(list 1 2 #'foo)
(1 2 #<syntax foo>)

It is a syntax object because it is a list of syntax objects (and Scheme lists are built from pairs and the empty list) and it has the expected corresponding (datum) value:

(syntax->datum (list 1 2 #'foo))
(1 2 foo)

The predicate identifier? is a Scheme procedure that can be used to test whether a syntax object is an identifier or not:

(list (identifier? 1) (identifier? (list #'x)) (identifier? #'x))
(#f #f #t)

In the previous section, we saw how to use syntax keyword (abbreviated by #') can be used to create identifiers. In fact, the argument to the syntax keyword does not have to be symbol but can be any datum, so a syntax expression can be used to build more complicated syntax objects:

(syntax (1 2 foo))
#<syntax (1 2 foo)>

As the result shows, this is a wrapped syntax object, namely a wrapped list (of syntax objects). The Scheme system uses the wrap to attach source location information to the syntax object (facilitating debugging), and the expander makes use of the fact that syntax objects can be opaque (wrapped) to provide optimal algorithmic complexity for the expansion process.

Whether wrapped or not, we can apply syntax->datum on this syntax object:

(syntax->datum #'(1 2 foo))
(1 2 foo)

Here, we used again the abbreviation #' for syntax.

Destructing syntax objects

The syntax object returned by #'(1 2 foo) cannot be destructed using list procedures like car and cdr although it represents a list as it is wrapped. Scheme offers a special form, syntax-case to destruct syntax objects. A syntax-case form contains clauses, each consisting of a pattern of the form we already saw in connection with syntax-rules and an expression. An input syntax object is matched against the patterns in order and the expression corresponding to the first pattern that matches is evaluated:

(syntax-case #'(1 2 foo) ()
  [(a b) 'case-1]
  [(a b (c d)) 'case-2]
  [(2 b c) 'case-3]
  [(a b c d e ...) 'case-4]
  [(a b c) 'case-5]
  [x 'case-6])
case-5

(The empty list () appearing in the second argument of syntax-case will be explained soon and plays the same role as the empty list we saw in our syntax-rules examples.)

The pattern of the last clause would have also matched but the matching ends as soon as a matching clause (the fifth in this example) is found. (The system will raise an exception if no match can be found.)

Let us try to distinguish the syntax objects returned by #'(1 2 foo) and #'(1 2 bar).

(syntax-case #'(1 2 foo) ()
  [(a b bar) 'bar]
  [(a b foo) 'foo])
bar

That we don’t get the expected (or hoped for) result is because syntax-case (as syntax-rules) does treat every identifier appearing in a pattern as a pattern variable by default. Thus, in the first pattern, bar is not matched against foo but bar is bound to foo. We can change this behavior by adding the identifiers that we want to match literally to the list that appeared as the empty list so far:

(syntax-case #'(1 2 foo) (bar foo)
  [(a b bar) 'bar]
  [(a b foo) 'foo])
foo

The equivalence predicate that syntax-case uses to compare an input identifier against a literal identifier is free-identifier=?. In the case of the example, both bar and foo are unbound and we recall that unbound identifiers are ”free-identifier=?” if and only if they have the same symbolic name. The next example demonstrates how the binding comes into play:

(let ([foo 1])
  (define input
    (let ([foo 2])
      #'(1 2 foo)))
  (syntax-case input (foo)
    [(a b foo) 'match]
    [(a b c) 'no-match]))
no-match

We have now the tool to dispatch on the structure of a syntax object, but what we also need is a way to get hold of the individual components of a syntax object. This is done with pattern variables (a, b, and c in the example above). We said above that a pattern variable is bound to the syntax object it is matched against. This scope of this binding is the expression following the pattern in the syntax-case clause. Just a keywords are not ordinary variables, pattern variables are neither. They may only be referenced inside the syntax form as in the following example:

(syntax-case #'(1 x) ()
  [(1 y) #'y])
#<syntax x>

Here, #'y does not resolve to the identifier y (because y is bound to a pattern variable) but to the syntax object to which y is bound, which is the value of #'(1 x).

Mixing of pattern variables and non-pattern variable identifiers in the same syntax expression also works:

(syntax-case #'(1 x) ()
  [(1 a) #'(b a)])
(#<syntax b> #<syntax x>)

As one can see, the result is not a wrapped syntax object but a list of two syntax objects. This is no coincidence. When a pattern variable appears in a syntax template, all the substructure in which the pattern variable is replaced by what it was matched against, is unwrapped, so ordinary list and vector accessor procedures can be used. The following is another example:

(syntax-case #'(1 2 3) ()
  [(1 x ...) #'(a x ... b c)])
(#<syntax a> #<syntax 2> #<syntax 3> . #<syntax (b c)>)

As can be seen, the pattern variable x is matched against the list of syntax objects consisting of 2 and 3. Up to the part (and including it) where x is substituted, the syntax object is unwrapped. The ellipsis in the syntax template works as the ellipsis in the syntax-rules templates (we will see below why this is no coincidence).

In particular, we can use list procedures to reference individual elements or to calculate lengthes:

(define syntax-length
  (lambda (stx)
    (syntax-case stx ()
      [(x ...)
       (length #'(x ...))])))

(syntax-length #'(a b c d))
4

We have already seen how literals in syntax-case can be used for literal matching of identifiers (using free-identifier=?). Otherwise, syntax-case only matches per structure. If we want to match structural element using special rules, fenders can be used as in the following example:

(syntax-case #'(define 3 (+ 1 2)) ()
  [(define id expr)
   (identifier? #'id)
   'ok]
  [_ 'error])
error

The fender is the expression between the pattern and the final expression in the first clause of syntax-rules. If present, it is evaluated when the pattern matches. If the evaluation yields #f, this clause is skipped and matching is continued with the next clause. The scope of the pattern variables of a pattern includes a fender if present.

The (sub)pattern _ matches anything (like a pattern variable) but does not bind a pattern variable.

Syntax-case macros

Macro transformers

We started this tutorial with writing macros and discussing a number of some example of such macros. Somehow, we seemed to have deviated by talking about identifiers, syntax objects, and their construction and destruction. In this section we will see how syntax-case and syntax can be employed to write powerful macros. In fact, they are the building blocks of macro transformers.

To make use of the forms syntax-case and syntax, we have to understand what actually goes into a define-syntax definition. The general form of a syntax definition is (define-syntax identifier transformer-expression) (the analogous holds for bindings in a let-syntax expression). When the Scheme expanders encounters a define-syntax definition, it evaluates the transformer-expression, which is an ordinary Scheme expression. It’s value must be a macro transformer, which is then bound to the keyword given by identifier.

Now, a macro transformer is just an ordinary Scheme procedure taking one argument, a syntax object, and returning one value, another syntax object. The input syntax object represents the macro use form, the output syntax object represents the transcribed macro use. Let us check this:

(let ([x 41])
  (define-syntax always-42
    (lambda (stx)
      (syntax (+ 1 x))))

  (+ always-42
     (always-42 400)))
84

Independently of how the macro is used — that is, independently of what stx is —, the macro transformer of this example always returns the expression (+ 1 x) (evaluating to 42). Note that we could have equivalently written #'(+ 1 x) instead of syntax.

If we want to make the macro output dependent on the macro input, we have to employ syntax-case to destruct the input syntax object. Let us first define a macro transformer that uses syntax-case:

(define f
  (lambda (stx)
    (syntax-case stx ()
      [(_ x ...)
       (list #'quote (append (list (length #'(x ...))) #'(x ...)))])))

We can test this procedure as any other procedure:

(f #'(q a b c))
(#<syntax quote> (3 #<syntax a> #<syntax b> #<syntax c>))

The output is thus a syntax object of the form (quote (n x ...)) where the x denote the arguments following the head element of the syntax object argument to f and n is the number of these arguments. The expression that yields the syntax object in the procedure f above is not very readable. Because of that, Scheme also offer a quasisyntax form (abbreviated with #`), which is to syntax as quasiquote is to quote:

(define f
  (lambda (stx)
    (syntax-case stx ()
      [(_ x ...)
       #`(quote (#,(length #'(x ...)) x ...))])))

Even more readable becomes the expression if pattern variables are used, which can not only be bound by syntax-case but also by with-syntax, which is for pattern variables what let is for ordinary variables:

(define f
  (lambda (stx)
    (syntax-case stx ()
      [(_ x ...)
       (with-syntax ([n (length #'(x ...))])
         #'(quote (n x ...)))])))

In fact, with-syntax is not a primitive form but can be expressed in terms of syntax-case:

(define-syntax with-syntax
  (syntax-rules ()
    [(with-syntax ([p e0] ...) e1 ... e2)
     (syntax-case (list e0 ...) ()
       [(p ...)
        (let ()
          e1 ... e2)])]))

In whatever way we write the procedure f, we can then use it to define an actual macro:

(define-syntax quote/length f)

Of course, instead of naming the macro transformer and just referencing to it in the right hand side of define-syntax, we could have equally well written the transformer procedure expression inline. The advantage of the former is that the transformer procedure can then be easily tested using the usual tools, the advantage of the latter is that it is more compact[fn:8].

Let’s test our macro:

(quote/length a b c)
(3 a b c)

It should be noted that the calculation of the length, 3 in this case, happens at expand-time (so in the compiler if we use one). In fact, a macro can be understood as a compiler for a sublanguage and that is be plugged into the Scheme system to extend the language.

We now have amassed enough knowledge to give the definition of syntax-rules. As the right hand side of define-syntax expects a procedure expression, a syntax-rules form must evaluate to a procedure. And, in fact, syntax-rules can be defined as follows:

(define-syntax syntax-rules
  (lambda (stx)
    (syntax-case stx ()
      [(_ (lit ...) [(k . p) t] ...)
       (for-all identifier? #'(lit ... k ...))
       #'(lambda (x)
           (syntax-case x (lit ...)
             [(_ . p) #'t] ...))])))

The syntax expression following the fender of the syntax-case clause shows that a syntax-rules expression evaluates to a procedure. There is another instance of syntax (#') within the template of the outer syntax expression. This is because procedure to which a syntax-rules expression evaluates outputs itself a syntax object.

One more thing is remarkable: Each syntax-rules pattern is of the form (k . p); more precisely, it can only match (syntax) pairs whose head element is an identifier, that is macro uses of exactly this form. Notably, the pattern variable k isn’t referenced in the output. This is because a syntax-rules pattern ignores the pattern variable that corresponds to the keyword position. In particular, the following two syntax definitions are equivalent:

(define-syntax incr!
  (syntax-rules ()
    [(incr! x) (set! x (+ x 1))]))
(define-syntax incr!
  (syntax-rules ()
    [(_ x) (set! x (+ x 1))]))

This is in contrast to a syntax-case expression, which doesn’t tread the keyword position in a special way. This is the reason why we often use _ at the keyword position in syntax-case expressions for macro transformers.

It is a good time to finally give the definition of our initial incr! macro in terms of syntax-case:

(define-syntax incr!
  (lambda (stx)
    (syntax-case stx ()
      [(_ x)
       (identifier? #'x)
       #'(set! x (+ x 1))])))

It is instructive to go through the above definition of the syntax-rules keyword and see how the earlier definition using syntax-rules expands into the later definition using syntax-case. The only line that is not present with the syntax-rules definition is the fender (identifier #'x), which has no equivalent for syntax-rules. This fender ensures that a syntax error is reported early if the user tries to use this macro in a non-sensible form like in (incr! 2).

A fluid let

We should finally move past the incr! macro. We already remarked that mutation (which incr! does) is frowned upon. To be more precise, what makes problems is mutation with unlimited extent. Mutation with dynamic extent, on the other hand, can be used to implement dynamically scoped variables, which are also called fluids and do not have all the problems associated with unbound mutation.

It is probably best to explain it with an example. For this, we define a new binding-like construct, named ~fluid-let~[fn:9]:

(define-syntax fluid-let
  (lambda (stx)
    (syntax-case stx ()
      [(_ [(x e)] b1 ... b2)
       (identifier? #'x)
       #'(let ([y e])
           (define swap!
             (lambda ()
               (let ([t x])
                 (set! x y)
                 (set! y t))))
           (dynamic-wind
             swap!
             (lambda ()
               b1 ... b2)
             swap!))])))

Let us briefly check the output of the following expression:

(let ([x 1])
  (define show
    (lambda ()
      (display x)
      (newline)))
  (show)
  (fluid-let ([x 2])
    (show))
  (show))

The dynamic-wind procedure takes three thunks (procedures that take no arguments) as arguments. When dynamic-wind is called, it calls the three thunks in that order and finally returns the results of the call to the second, the middle, thunk. The reason why we didn’t write (begin (swap!) ((lambda () b1 ... b2)) (swap!)) is that dynamic-wind arranges for calling the enter and exit thunk even in the presence of non-local control flow[fn:10].

The variable y is used by the macro to store the old value of var in it before the latter is mutated. As y does not come from the macro input, it won’t conflict with the definition of an identifier named y surrounding the use of fluid-let. Likewise, the temporary variable t won’t conflict regardless of what variable the pattern variable x stands for.

Our fluid-let can “bind” exactly one variable. If we want to change more than value, say two, we have to rewrite our macro:

(define-syntax fluid-let
  (lambda (stx)
    (syntax-case stx ()
      [(_ [(x1 e1) (x2 e2)] b1 ... b2)
       (for-all identifier? #'(x1 x2))
       #'(let ([y1 e1] [y2 e2])
           (define swap!
             (lambda ()
               (let ([t x1])
                 (set! x1 y1)
                 (set! y1 t))
               (let ([t x2])
                 (set! x2 y2)
                 (set! y2 t))))
           (dynamic-wind
             swap!
             (lambda ()
               b1 ... b2)
             swap!))])))
(let ([a 1] [b 2])
  (define show
    (lambda ()
      (display (list a b))
      (newline)))
  (show)
  (fluid-let ([a 3] [b 4])
    (show))
  (show))

This is, of course, a non-solution because we still can’t pass three variables and have also lost the ability of just passing one variable. Possibly, the ellipsis can help as in the following attempt:

(define-syntax fluid-let
  (lambda (stx)
    (syntax-case stx ()
      [(_ [(x e) ...] b1 ... b2)
       (for-all identifier? #'(x ...))
       #'(let ([y e] ...)
           (define swap!
             (lambda ()
               (let ([t x])
                 (set! x y)
                 (set! y t))
              ...))
           (dynamic-wind
             swap!
             (lambda ()
               b1 ... b2)
             swap!))])))

However, this won’t quite work. The problem is that there is only one identifier y introduced and not one identifier per each fluid variable. The canonical solution Scheme offers here is the generate-temporaries procedure, which takes a list or a syntax object representing a list and returns a list of as many identifiers, each with its unique history so that they won’t be pairwise ”bound-identifier=?” or to any other identifier:

(with-syntax ([(x y) (generate-temporaries '(a b))])
  (list (identifier? #'x)
        (identifier? #'y)
        (bound-identifier=? #'x #'y)))

Here, the list (a b) has two elements, so generate-temporaries creates two identifiers, which we bound using with-syntax to the pattern variables x and y.

With this tool at our disposal, we can finally write a version of fluid-let that works with an arbitrary number of variables:

(define-syntax fluid-let
  (lambda (stx)
    (syntax-case stx ()
      [(_ [(x e) ...] b1 ... b2)
       (for-all identifier? #'(x ...))
       (with-syntax
           ([(y ...) (generate-temporaries #'(x ...))])
         #'(let ([y e] ...)
             (define swap!
               (lambda ()
                 (let ([t x])
                   (set! x y)
                   (set! y t))
                 ...))
             (dynamic-wind
               swap!
               (lambda ()
                 b1 ... b2)
               swap!)))])))
(let ([a 1] [b 2])
  (define show
    (lambda ()
      (display (list a b))
      (newline)))
  (show)
  (fluid-let ([a 3] [b 4])
    (show))
  (show))

Implementing a variant type in Scheme

It is time to demonstrate more involved macros to highlight some features of the Scheme macro system and how it leads to extensibility of the language.

To have some use case at hand, let us assume that we deal with binary trees that carry a value at each (internal) node and at each leaf. We can use the Scheme record facility to provide the necessary data types, implementing an abstract tree interface:

(define-record-type node (fields left value right))
(define-record-type leaf (fields value))

We can build a tree using the constructors defined by the above record-type definitions:

(define t
  (make-node
   (make-node (make-leaf 4) 2 (make-leaf 1))
   8
   (make-leaf -1)))

While creating a tree by hand in this way is doable, it is not very neat. It would be nice if we could give the tree above in simple, parenthesized syntax as follows:

(((4)
  2
  (1))
 8
 (-1))

In other words, (internal) nodes are given by lists of three elements, and leafs by lists of one element. To achieve this, one might want to write a procedure as the following one:

(define make-tree
  (lambda (e)
    (define n (length e))
    (cond
     [(= n 3)
      (make-node (make-tree (car e))
                 (cadr e)
                 (make-tree (caddr e)))]
     [(= n 1) (make-leaf (car e))]
     [else
      (assert #f)])))

We can then build our tree t as follows:

(define t
  (make-tree
   '(((4)
      2
      (1))
     8
     (-1))))

The quote (necessary so that Scheme does not try to evaluate our tree description as an expression) is not optimal, but we can write a macro that inserts the quote for us:

(define-syntax tree
  (syntax-rules ()
    [(tree datum)
     (make-tree 'datum)]))

With this macro, we can now build our tree with the following syntax:

(define t
  (tree
   (((4)
      2
      (1))
     8
     (-1))))

While this is optimal as far as the flexibility in syntax is concerned, the solution is inferior to our original approach of building the tree by calling the constructors make-node and make-leaf by hand. The point is that the procedure make-tree, which is called in the output of the macro tree, walks the tree expression at run time and so is not as efficient as the original approach. What we want is that the tree expression is analyzed during compile time. As macros are nothing but small compilers, it is no surprise that a macro will help. All we have to do is to rewrite the tree macro that it doesn’t output a call to make-tree but that it directly outputs calls to make-node and make-leaf:

(define-syntax tree
  (syntax-rules ()
    [(tree (left value right))
     (make-node (tree left) value (tree right))]
    [(tree (value))
     (make-leaf value)]))

The tree can be built as before:

(define t
  (tree
   (((4)
      2
      (1))
     8
     (-1))))

Now let us do something with the tree. For example, we can ask for the sum of all values in the tree nodes, internal and leaf nodes:

(define tree-accumulate
   (lambda (t)
     (cond
      [(node? t)
       (+ (tree-accumulate (node-left t))
          (node-value t)
          (tree-accumulate (node-right t)))]
      [(leaf? t)
       (leaf-value t)]
      [else (assert #f)])))

We can test the procedure with our example tree:

(tree-accumulate t)

We have used Scheme’s general cond expression to dispatch on the two possible types of trees. Compared to pattern matchers of other languages, this also does not deserve the attribute neat. What we would like is to have a syntax so that we can write tree-accumulate as follows:

(define tree-accumulate
  (lambda (t)
    (tree-case t
     [(node left value right)
      (+ (tree-accumulate left)
         value
         (tree-accumulate right))]
     [(leaf value)
      value])))

Obviously, this calls for another macro!

(define-syntax tree-case
  (lambda (stx)
    (define parse-clause
      (lambda (cl)
        (syntax-case cl (node leaf)
          [[(node left value right) e1 ... e2]
           #'[(node? tmp)
              (let ([left (node-left tmp)]
                    [value (node-value tmp)]
                    [right (node-right tmp)])
                e1 ... e2)]]
          [[(leaf value) e1 ... e2]
           #'[(leaf? tmp)
              (let ([value (leaf-value tmp)])
                e1 ... e2)]]
          [_
           (syntax-violation 'tree-case "invalid clause syntax" stx cl)])))
    (syntax-case stx ()
      [(_ tree-expr clause ...)
       (with-syntax ([(clause ...)
                      (map parse-clause #'(clause ...))])
         #'(let ([tmp tree-expr])
             (unless (tree? tmp)
               (assertion-violation 'tree-case "invalid tree argument" tmp))
             (cond
              clause ...
              [else
               (assertion-violation 'tree-case "unhandled tree argument" tmp)])))]
      [_
       (syntax-violation 'tree-case "invalid syntax" stx)])))

(define tree?
  (lambda (obj)
    (or (node? obj)
        (leaf? obj))))

In the above macro, we used the procedure syntax-violation defined by Scheme to report syntax errors when the macro is misused. It is always a good idea to report syntax violations as early and as precise as possible.

The two identifiers, the Scheme reports speak of auxiliary syntax, node and leaf are matched using free-identifier=?. Both of these identifiers are bound (they were bound by our record-type definitions of the node and the leaf type). Thus, when the macro is used in the form

(tree-case t
 [(n l v r) ---]
 ---)

the binding of the identifier n is compared to the binding of the identifier node (in the lexical context of the macro transformer).

In general, it is a good idea to use bound identifiers as literals in syntax-case (or syntax-rules). Even if the code surrounding a macro use of, say, tree-case binds node to something else, the library system of Scheme allows to import another identifier that is bound to the original binding of node so the tree-case macro can still be used with the other identifier in place. This does not work when free-identifier=? compares unbound identifiers by name.

With our tree-case macro, we can finally define and test our newly written tree-accumulate:

(define tree-accumulate
  (lambda (t)
    (tree-case t
     [(node left value right)
      (+ (tree-accumulate left)
         value
         (tree-accumulate right))]
     [(leaf value)
      value])))

(tree-accumulate t)
14

We have solved our binary-tree-use case but we can still do better. Assume that the problem we have to solve the next day does not involve binary trees but abstract syntax trees of a programming language, for which we have to write an interpreter or compiler. Instead of (internal) nodes and leaves, we would have, say, expressions, statements, definitions, programs. When walking an abstract syntax tree, one has to dispatch again on the possible types of an abstract syntax tree. So, instead of tree-case we want ast-case. We could copy and suitably modify the tree-case macro but this would violate DRY.

The answer is, instead, to write a macro, once and for all, that generates macros like tree-case. Here it is:

(define-syntax define-destructor
  (lambda (stx)
    (syntax-case stx ()
      [(_ name [keyword predicate-expr accessor-expr ...] ...)
       (for-all identifier? #'(keyword ...))
       (with-syntax
           ([(pred-id ...)
             (generate-temporaries #'(predicate-expr ...))]
            [((acc-id ...) ...)
             (map generate-temporaries #'((accessor-expr ...) ...))]
            [((var ...) ...)
             (map generate-temporaries #'((accessor-expr ...) ...))])
         #'(begin
             (define pred-id predicate-expr) ...
             (define acc-id accessor-expr) ... ...
             (define-syntax name
               (lambda (stx)
                 (define parse-clause
                   (lambda (cl)
                     (syntax-case cl (keyword ...)
                       [[(keyword var ...) e1 (... ...) e2]
                        #'[(pred-id tmp)
                           (let ([var (acc-id tmp)] ...)
                             e1 (... ...) e2)]]
                       ...
                       [_
                        (syntax-violation 'name "invalid clause syntax" stx cl)])))
                 (syntax-case stx ()
                   [(_ expr clause (... ...))
                    (with-syntax ([(clause (... ...))
                                   (map parse-clause #'(clause (... ...)))])
                      #'(let ([tmp expr])
                          (cond
                           clause (... ...)
                           [else
                            (assertion-violation 'name "unhandled argument" tmp)])))]
                   [_
                    (syntax-violation 'name "invalid syntax" stx)])))))]
      [_
       (syntax-violation 'define-destructor "invalid syntax" stx)])))

A few explanations are in order. First of all, we see nested ellipses in the code above. Using syntax-case we can match a syntax object of the form ((a b c) (1 2)) against a pattern of the form ((x ...) ...). The pattern variable x will then represent a list of two lists; the first list will contain the elements a, b, and c, the second list will contain the elements 1 and 2. In syntax templates, the pattern variable can be used as long as least two ellipses follow. For example, the template ((x ...) ...) gives back ((a b c) (1 2)), while (x ... ...) gives (a b c 1 2).

We also have to explain the occurrences of (... ...). In the definition of define-destructor, the outer syntax form has to evaluate into a syntax object that contains ellipses, so we have to keep the outer syntax form from interpreting these ellipses that should be in the output syntax object, in other words, we have to escape them. The Scheme way of doing this, is to write (... x). If syntax sees a sub-template like this one, it processes x and returns the result but gives the ellipsis in x the status of an ordinary identifier.

Coming back to our tree example, the define-destructor syntax can be used as follows:

(define-destructor tree-case
  [node node? node-left node-value node-right]
  [leaf leaf? leaf-value])

Now we can redefine tree-accumulate and test it:

(define tree-accumulate
  (lambda (t)
    (tree-case t
     [(node left value right)
      (+ (tree-accumulate left)
         value
         (tree-accumulate right))]
     [(leaf value)
      value])))

(tree-accumulate t)
14

Breaking hygiene

Scheme macros written with syntax-rules are hygienic. This is also true by default for macros written with the more general syntax-case~/~syntax combination. Hygiene — although it may take some time to understand — is one of the selling points of Scheme macros and one (of many) reasons why Scheme macros are so more powerful than, say, macros in C or even in Common Lisp.

Sometimes, however, we want to break hygiene explicitely. We give a number of concrete examples:

A classical loop macro

A classical example for this is a loop macro that provides a loop that evaluates the code enclosed in it repeatedly until a corresponding break command is evaluated. A typical use looks like the following (again, not necessarily a good example for functional programming!):

(let ([i 0])
  (loop
    (when (= i 5)
      (break))
    (display i)
    (newline)
    (incr! i)))

Our first attempt to implement the loop construct with a macro is the following syntax definition:

(define-syntax loop
  (lambda (stx)
    (syntax-case stx ()
      [(_ e ...)
       #'(call-with-current-continuation
          (lambda (break)
            (let f ()
              e ...
              (f))))])))

Here we make use of the fact that Scheme has first-class continuations. The call to call-with-current-continuation captures the continuation of the named let expression.

Nevertheless, our example code that is supposed to print the numbers zero to four won’t work with this version of the loop keyword. Our Scheme system will tell us that break is an undefined identifier (or refer to a predefined top-level identifier with this name). Although, it won’t say that, but hygiene is to be blamed for it.

As the identifier break bound in the output of loop does not come from the input of the loop form, it has a different history than the identifier break appearing in the body of the loop form. Identifiers with different histories do not shadow each other, so the break in the loop body cannot reference the binding of break coming from the template in the loop macro.

One way to solve it is to provide break as an explicit argument to the loop macro (we put a star to the name to mark the new syntax):

(define-syntax loop*
  (lambda (stx)
    (syntax-case stx ()
      [(_ break e ...)
       #'(call-with-current-continuation
          (lambda (break)
            (let f ()
              e ...
              (f))))])))

With this modification, everything works:

(let ([i 0])
  (loop* break
    (when (= i 5)
      (break))
    (display i)
    (newline)
    (incr! i)))

This solution has one more advantage besides that it actually works — it allows us to specify the name we want to use for the expression breaking out of the loop. For example, it allows us to easily nest two of the loops:

(let ([i 0])
  (loop* break-outer
    (loop* break-inner
      (when (= i 5)
        (break-outer))
      (when (= i 2)
        (break-inner))
      (display i)
      (newline)
      (incr! i))
    (display "-\n")
    (incr! i)))

(Note how hygiene again helps to make this possible. Both macro instances bind the identifier f, but the occurrences of f correspond to different histories so they don’t shadow each other.)

While the version with an explicit break argument to the loop* macro has its advantages, sometimes we still may want the more terse syntax with an implicit break parameter. To make our original version of loop work, we must not introduce a break identifier with a different history. Instead, we must output break as if it appeared as an argument to the macro use. In other words, we have to forge an identifier and datum->syntax was the tool to do this:

(define-syntax loop
  (lambda (stx)
    (syntax-case stx ()
      [(k e ...)
       (with-syntax ([break (datum->syntax #'k 'break)])
         #'(call-with-current-continuation
            (lambda (break)
              (let f ()
                e ...
                (f)))))])))

Here, for the first time, we make use of the keyword identifier of the macro use, which we bound to the pattern variable k. The call to datum->syntax then returns an identifier named break as if it appears where the macro use keyword appears, that is with the same history and the same lexical context.

Let us test our example with this new version!

(let ([i 0])
  (loop
    (when (= i 5)
      (break))
    (display i)
    (newline)
    (incr! i)))

Convenience syntax to bind implicit identifiers

Above, we used the datum->syntax procedure together with with-syntax explicitly to inject an identifier as if it appeared at the macro use site into the template. Chez Scheme provides a syntactic form with-implicit that abstracts from this low-level approach. While the with-implicit form is non-standard, thanks to Scheme’s macro system we can define it in any standard system:

(define-syntax with-implicit
  (lambda (x)
    (syntax-case x ()
      [(_ (k x ...) e1 ... e2)
       (for-all identifier? #'(k x ...))
       #'(with-syntax ([x (datum->syntax #'k 'x)] ...)
           e1 ... e2)]
      [_
       (syntax-violation 'with-implicit "invalid syntax" x)])))

With it, we can rewrite our loop macro as follows:

(define-syntax loop
  (lambda (stx)
    (syntax-case stx ()
      [(k e ...)
       (with-implicit (k break)
         #'(call-with-current-continuation
            (lambda (break)
              (let f ()
                e ...
                (f)))))])))

Definitions that make the bound name accessible

For those who didn’t like the loop example because it is mostly useful in imperative programming, we have provided another example, that we will describe in this subsection.

User-friendly procedures check their arguments so that errors are reported early:

(define reverse-append
  (lambda (head tail)
    (unless (list? head)
      (assertion-violation 'reverse-append "invalid list argument" head))
    (let f ([head head] [tail tail])
      (cond
       [(null? head) tail]
       [(pair? head)
        (f (cdr head) (cons (car head) tail))]
       [else
        (assertion-violation 'reverse-append "concurrent modification detected")]))))

Just a brief test:

(reverse-append '(1 2 3) '(4 5 6))
(3 2 1 4 5 6)

The Scheme procedure that is used here to report an error is assertion-violation. Its first formal argument is called who and (if not #f) should be a string or symbol naming the procedure where the error occurs.

One can make the point that the code above again violates some instance of the DRY principle because we had to type the name of the procedure, reverse-append in this case, three times. The following, non-hygienic, macro, which can also be found in the source code of Chez Scheme and in one of Racket’s libraries, helps:

(define-syntax define/who
  (lambda (x)
    (define out
      (lambda (k f e)
        (with-syntax ([k k] [f f] [e e])
          (with-implicit (k who)
            #'(define f
                (let ((who 'f)) e))))))
    (syntax-case x ()
      [(k (f . u) e1 ... e2)
       (identifier? #'f)
       (out #'k #'f #'(lambda u e1 ... e2))]
      [(k f e)
       (identifier? #'f)
       (out #'k #'f #'e)]
      [_
       (syntax-violation 'define/who "invalid syntax" x)])))

With define/who we can define a variable (or procedure) as with define. Moreover, the identifier who (with a lexical and historic context as the keyword define/who in the macro use) is bound to the name of the variable (or procedure) being defined.

With define/who, the definition of reverse-append looks like:

(define/who reverse-append
  (lambda (head tail)
    (unless (list? head)
      (assertion-violation 'who "invalid list argument" head))
    (let f ([head head] [tail tail])
      (cond
       [(null? head) tail]
       [(pair? head)
        (f (cdr head) (cons (car head) tail))]
       [else
        (assertion-violation 'who "concurrent modification detected")]))))

We can compare who with the predefined identifier __func__ that can be found in the C99 standard. With Scheme and its macro system, however, this becomes a library feature and need not be a language feature.

Definitions of constants

In Scheme, we can use define to, well, define a variable. This variable can be set! by other parts of the code, possibly accidentally. So we may want to define a variable-like object that behaves more like a constant. One option is to use the identifier-syntax form, we already saw at the beginning of the tutorial:

(define-syntax pi (identifier-syntax 3.14159))
pi
3.14159

If we tried to mutate the “variable” pi now, the Scheme system would raise an exception.

This is a good point to give the actual definition of identifier-syntax. Like syntax-rules, it can be defined by the more primitive forms syntax-case and syntax:

(define-syntax identifier-syntax
  (syntax-rules (set!)
    [(_ e)
     (lambda (x)
       (syntax-case x ()
         [id (identifier? #’id) #’e]
         [(_ x (... ...)) #’(e x (... ...))]))]
    [(_ (id exp1) ((set! var val) exp2))
     (and (identifier? #’id) (identifier? #’var))
     (make-variable-transformer
      (lambda (x)
        (syntax-case x (set!)
          [(set! var val) #’exp2]
          [(id x (... ...)) #’(exp1 x (... ...))]
          [id (identifier? #’id) #’exp1])))]))

This definition can be found exactly in this form in the R6RS, describing the Scheme language and its standard libraries. Again, we see the occurrence of the quoted ellipses (... ...), which is necessary because of the nesting of templates (remember that the right hand side of syntax-rules rules are syntax templates).

We also note the two patterns within the first syntax-case. The first pattern is of the form id where id is an identifier, the second pattern is of the form (_ x ...) where x is an arbitrary form. The first pattern will match if the identifier, pi in our example, is not used in head-position of a combination; the second pattern will match if pi is used in the form (pi x ...). The latter does not make sense for pi, but we see that identifier-syntax allows us to define procedure-like identifiers that behave differently when the directly applied or when referenced.

In the second part of the definition of identifier-syntax, the procedure make-variable-transformer is used. This turns a macro transformer given by a procedure (mapping syntax objects to syntax objects) into a variable-transformer. A variable-transformer does the same mapping between syntax objects but will also be called by the expander when it processes an expression of the form (set! id form) where id is the keyword bound to a variable-transformer.

Now, 3.14159 is not the most precise value of pi. We get a value whose precision is adapted to the precision of Schemes inexact real numbers by using the formula (* 2 (atan 1 0)). This directly leads to the following attempt of redefining pi:

(define-syntax pi (identifier-syntax (* 2 (atan 1 0))))
pi
3.141592653589793

This is not a good solution, though. Every time, we reference pi, Scheme replaces it by the expression (* 2 (atan 1 0)), so unless we can rely on a sufficiently optimizing compiler, we will have pi recalculated every time we use it. A better approach is to calculate the value once, store this is a variable and expand into a reference to it:

(define-syntax define-constant
  (lambda (stx)
    (syntax-case stx ()
      [(_ id expr)
       (identifier? #'id)
       #'(begin
           (define val expr)
           (define-syntax id (identifier-syntax val)))])))

Again, we have macro-defining macro. Thanks to hygiene, the variable val cannot be accessed outside the macro. Let’s test our new macro:

(define-constant pi (* 2 (atan 1 0)))
pi
3.141592653589793

The number pi is just a single constant, so let us now turn to a use case where not only more than one constant but many constants are needed. For concreteness, let us assume that want to develop a library handling ELF files. We could start with defining all the magic constants:

(define-constant et-none #x00)
(define-constant et-rel  #x01)
(define-constant et-exec #x02)
...
(define-constant pt-null #x00000000)
(define-constant pt-load #x00000001)
...

This is not much different from what we would do in C. However, it has the same problem. It pollutes our top-level namespace. In Scheme, this is mitigated a bit due to the library system (which allows one to confine these constants in a module); nevertheless a library that exports myriads of identifiers (and where the exact set of identifiers may depend on the version of the ELF format) is not a good idea.

A way out is — you will already have guessed it — the Scheme macro system. We will implement a macro define-constants that can be used as follows:

(define-constants elf-constant
  (et-none #x00)
  (et-rel  #x01)
  ...)

(elf-constant et-none) ; => 0x01

Moreover, we want this definition to bind the identifier elf-constants (note the “s”) to a procedure returning an association list of the form

((et-none . #x00)
 (et-rel  . #x01)
 ...)

For this, we first need a procedure that takes an identifier like elf-constant and constructs a new identifier, elf-constants in this case, from it:

(define/who construct-name
  (lambda (k . arg*)
    (unless (identifier? k)
      (assertion-violation who "invalid template identifier argument" k))
    (datum->syntax
     k
     (string->symbol
      (apply string-append
             (map (lambda (x)
                    (cond
                     [(string? x)
                      x]
                     [(identifier? x)
                      (symbol->string (syntax->datum x))]
                     [else
                      (assertion-violation who "invalid string or identifier argument" x)]))
                  arg*))))))

This procedure takes a template identifier k and a sequence of strings and identifiers to forge and return an identifier with the same lexical and historic context as k and whose name is given by the concatenation of the sequence of strings and identifier (names).

With it, we can define our define-constants easily:

(define-syntax define-constants
  (lambda (x)
    (syntax-case x ()
      [(_ t (n c) ...)
       (and (identifier? #'t)
            (for-all identifier? #'(n ...)))
       (with-syntax ([ts (construct-name #'t #'t "s")]
                     [(e ...) (generate-temporaries #'(c ...))])
         #'(begin
             (define e c)
             ...
             (define-syntax t
               (lambda (x)
                 (syntax-case x ()
                   [(_ y)
                    (identifier? #'y)
                    (cond
                     [(assq (syntax->datum #'y) (list (cons 'n #'e) ...)) => cdr]
                     [else
                      (syntax-violation 't "unknown constant" x #'y)])]
                   [_
                    (syntax-violation 't "invalid syntax" x)])))
             (define ts
               (lambda ()
                 '([n . c] ...)))))])))

The macro is programmed so that the lookup of the constant happens at expand-time and not at run-time. Let us test it:

(define-constants color
  (salmon          #xFA8072)
  (light-green     #x90EE90)
  (cornflower-blue #x6495ED))

(colors)
((salmon . 16416882) (light-green . 9498256) (cornflower-blue . 6591981))
(color cornflower-blue)
6591981

A pitfall

Although the macros loop and define/who defined above are non-hygienic, they are only so in a controlled sense. They behave as if the user has provided an explicit break or who identifier, so do not really differ from a hygienic macro, which makes reasoning about them still easy.

Nevertheless, there is still a potential pitfall, we are going to explain now. Consider the following definition of the macro define-logging/who:

(define-syntax define-logging/who
  (syntax-rules ()
    [(define-logging/who (name . formals) body1 ... body2)
     (define/who (name . formals)
       (display "log: entering procedure ")
       (display who)
       (newline)
       body1 ... body2)]))

The macro defines a “logging procedure”, a procedure that prints a log message when it is called:

(define-logging/who (hello)
  (display "Hello!\n"))
(hello)
log: entering procedure hello
Hello!

The define-logging/who macro’s output is an instance of the define/who macro from earlier. The define-logging/who macro makes use of the implicitly defined who identifier.

We named the macro define-logging/who with the suffix /who because the idea is that the user of the define-logging/who macro can also refer to the procedure’s name through the implicitly bound identifier who. This, however, is not the case as the following test shows:

(let ([who 'outer])
  (define-logging/who (return-who)
    who)
  (return-who))
outer

The reason is that historic context of the identifier who is the same as the historic context of the identifier define/who that occurs in the syntax template in the definition of define-logging/who. The historic context of the identifier define/who, which does not come from the macro input in the use of define-logging/who, is therefore not the same as those of the identifiers who appearing in the source of our test.

This problem cannot be easily mitigated bar explicitly define who a second time with the historic context of the final macro use. One could think a possible solution would be the following rewrite:

(define-syntax define-logging/who
  (lambda (stx)
    (syntax-case stx ()
      [(k (name . formals) body1 ... body2)
       (with-implicit (k define/who who)
         #'(define/who (name . formals)
             (display "log: entering procedure ")
             (display who)
             (newline)
             body1 ... body2))])))

Here, the identifier define/who is put in the macro output of define-logging/who with the same history as the keyword in the macro use of define-logging/who. This will create who with the historic context of the macro use of define-logging/who (and that’s why we had to add who to the list of implicit identifiers as well), but it will only work if define/who is also bound (to the correct macro) at the use site of the macro define-logging/who. Such an assumption, however, should not be made. (In the section after next we are going to finally give a solution that works, but it needs an extension which is not in R6RS.)

Our other example for a macro with implicit (unhygienic) identifiers was define-constant where the plural form of the name of the defined macro was a forged identifier. That identifier’s history, however, was not derived from the macro keyword in the macro use but from the (singular) name of the defined macro. This also helps mitigating the problem of nested macro invocations described above.

Phasing

Phasing is an issue that is not specific to the particular macro system of Scheme or hygienic macros but occurs with procedural macros when the system distinguishes between run-time and expand-time. The latter distinction is important, for example, for the possibility of ahead-of-time compilation. As Scheme allows to evaluate code at run time, which then has to be expanded first, run-time and expand-time can be interleaved. The latter can also be due to the library system of Scheme; a library may need to be run first before another library can be expanded because the macro transformers may reference the code of the first library.

(Relative) phases

When the expressions in a program or library are evaluated, the evaluation happens at a specific relative phase. These relative phases are non-negative integers. The top-level expressions are evaluated at relative phase 0. The right-hand sides of top-level variable definitions are also evaluated at relative phase 0. The right-hand sides of top-level syntax definitions are evaluated at relative phase 1 (which means: “earlier”). The right-hand side of a variable definition appearing within an expression evaluated at relative phase n is also evaluated at relative phase n. The right-hand side of a syntax definition appearing within an expression evaluated at relative phase n is evaluated at relative n + 1.

In other words, define-syntax shifts the phase by one for its right-hand side.

The following code should make this clearer:

(begin
  (define x 4)
  (define-syntax foo
    (lambda (stx)
      (define-syntax bar
        (lambda (stx)
          #'3))
      (+ 1 bar)))
  (+ x foo))

Assume that this expression is evaluated at phase n (0 if appearing at the program top-level). The right-hand side of the definition of x, the reference to x and the use of the macro foo are evaluated at relative phase n. The transformer expression of foo is evaluated at relative phases n + 1, and the transformer expression of bar is evaluated at relative phase n + 2.

Identifier references at different phases

A variable (an identifier bound to a location holding a value) can be referenced by an expression if the expression is evaluated at the same relative phase as the initializing expression of the variable. A keyword (an identifier bound to a macro transformer) can be referenced by an expression if the expression is evaluated at the same or higher relative phase than the phase of the transformer expression of the keyword minus one.

The following code is erroneous, for example:

(let ([counter 0])
  (define-syntax count!
    (lambda (stx)
      (syntax-case stx ()
        [(_)
         (begin
           (set! counter (+ 1 counter))
           #'(values))])))
  (count!)
  counter)

The variable counter can only be referenced at relative phase 0, thus not in the right-hand side of the syntax definition of count!, which is evaluated at relative phase 1. One can understand this restriction as follows: The variable counter only exists at run-time, not at compile-time of the program, but the transformer associated to count! us used at compile-time.

On the other hand, the following example is correct code:

(let-syntax ([foo (lambda (stx) #'1)])
  (define-syntax bar
    (lambda (stx)
      (define-syntax quux
        (lambda (stx)
          foo))
      quux))
  bar)

If a procedure needs to be used in different phases, in Scheme the library system can be used. If a library exports a variable (bound to the procedure) or any other identifier, the identifier can be imported at any relative phase.

Extensions

In this section, we will describe three extensions to Scheme macro’s system that are not yet standardized in one the reports. All three extensions are supported by Chez Scheme, so we can experiment with them.

Aliases

Given an identifier x, we may want to use it under a different name as well. The first attempt of defining an alias for x may look like:

(let ([x 'old])
  (define-syntax y
    (identifier-syntax
     [_ x]
     [(set! _ e) (set! x e)]))
  (set! y 'new)
  x)
new

This solution has no run-time overhead because y is a keyword and not a variable. Whenever we access y, Scheme’s expander rewrites it into an access of x. In a lot of cases, this is all that we need. Still, y is not a true alias to x. This is demonstrated by the following test:

(let ([x 'old])
  (define-syntax y
    (identifier-syntax
     [_ x]
     [(set! _ e) (set! x e)]))
  (free-identifier=? #'x #'y))
#f

The reason that the result of the free-identifier=? test is #f is that x and y are bound differently. The identifier x is bound to a location holding a value (x is a variable); the identifier y is bound to a macro transformer (y is a keyword).

The alias form described in SRFI 212 allows to define true aliases. The syntax is (alias y x), which can be used wherever definitions are allowed. It arranges that y has the same binding as x:

(let ([x 'old])
  (alias y x)
  (set! y 'new)
  (list (free-identifier=? #'x #'y) x))
(#t new)

(The reason why alias is not called define-alias is that it does not define a new binding; it just gives an existing binding (the one of x) a new name (y).)

With the alias form, there is also a general solution to the general problem of nested unhygienic macros that we exhibited with define-logging/who:

(define-syntax define-logging/who
  (lambda (stx)
    (syntax-case stx ()
      [(k (name . formals) body1 ... body2)
       (with-syntax ([(tmp-id) (generate-temporaries #'(tmp))])
         (with-syntax ([local-define/who
                        (construct-name #'k #'tmp-id)])
           (with-implicit (k who)
             #'(begin
                 (alias local-define/who define/who)
                 (local-define/who (name . formals)
                   (display "log: entering procedure ")
                   (display who)
                   (newline)
                   body1 ... body2)))))])))

Here, we generate a fresh identifier and construct from its name an identifier with the context of the keyword of the use of define-logging/who. This identifier is then aliased to define/who but is used instead so that the transformer bound to define/who (and now also bound to define-logging/who forges the who identifier with the right context.

Let us test our new version:

(let ([who 'outer])
  (define-logging/who (return-who)
    who)
  (return-who))
return-who

Syntax parameters

Syntax parameters, which are like aliases also described in a SRFI, namely SRFI 139, provide an alternative to unhygienic macros when implicit macro parameters are needed.[fn:11]

In what follows, we use Chez Scheme’s implementation so that we can readily test our examples.

Chez Scheme defines the fluid-let-syntax form, whose syntax is equivalent to let-syntax, but which is to let-syntax as fluid-let is to let. In other words, it rebinds a keyword for the dynamic extent of the expansion of the body of fluid-let-syntax:

(let-syntax ([x (identifier-syntax 'outer)])
  (define-syntax y (identifier-syntax x))
  (list
   (fluid-let-syntax ([x (identifier-syntax 'inner)])
     y)
   y))
(inner outer)

Compare this to let-syntax:

(let-syntax ([x (identifier-syntax 'outer)])
  (define-syntax y (identifier-syntax x))
  (list
   (let-syntax ([x (identifier-syntax 'inner)])
     y)
   y))
(outer outer)

With the use of syntax parameters (keywords that are rebound by fluid-let-syntax), we can give a definition of our loop macro as a hygienic macro:

(define-syntax break
  (lambda (stx)
    (syntax-violation 'break "invalid use outside of loop form" stx)))
(define-syntax loop
  (syntax-rules ()
    [(loop e ...)
     (call/cc
      (lambda (k)
        (fluid-let-syntax
            ([break (syntax-rules () [(break) (k)])])
          (let f ()
            e ... (f)))))]))

Let us test this new version:

(let ([i 0])
  (loop
    (when (= i 5)
      (break))
    (display i)
    (newline)
    (incr! i)))
0
1
2
3
4

In this example, datum->syntax does not appear so the macro defined above is indeed hygienic. The identifier break was not newly introduced by the macro use but already existed in the lexical context of the macro use.

It should be noted that the new loop macro has a different semantics than the unhygienic loop macro from earlier. In our original loop macro, the expression (break) ends the loop when break is ”bound-identifier=?” to the implicit identifier forged by the first version of the loop macro. In the version with syntax parameters, the expression (break) ends the loop when break is ”free-identifier=?” to the global identifier named break. Which semantics is the better one depends on the use case. On the one hand side, hygienic macros are preferable to unhygienic ones. On the other hand side, syntax parameters have the same problem associated to variables with dynamic scope: Their change of the behavior of code is not lexically confined.

The Chez Scheme User’s Guide contains a very interesting use of syntax parameters, which we want to reproduce here:

(define-syntax define-integrable
  (syntax-rules (lambda)
    [(_ name (lambda formals form1 form2 ...))
     (begin
       (define xname
         (fluid-let-syntax ([name (identifier-syntax xname)])
           (lambda formals form1 form2 ...)))
       (define-syntax name
         (lambda (x)
           (syntax-case x ()
             [_ (identifier? x) #'xname]
             [(_ arg (... ...))
              #'((fluid-let-syntax ([name (identifier-syntax xname)])
                   (lambda formals form1 form2 ...))
                  arg
                  (... ...))]))))]))

The define-integrable keyword can be used as follows:

(define-integrable count
  (lambda (ls)
    (if (null? ls)
        0
        (+ (car ls) (count (cdr ls))))))

(list (count '(1 2 3))
      (procedure? count))
(6 #t)

This definition binds count to a keyword that behaves like an immutable variable bound to a procedure. However, when count is used in the form (count '(1 2 3)) the procedures body is inlined at the macro use site, allowing for more local optimizations. We chose the example of a recursive procedure because it demonstrates a difficulty: If count in the procedure’s body were also inlined and so on, we would get an infinite macro expansion. Instead, the implementation of define-integrable uses a syntax parameter to rebound count in the procedure body so that it is not further inlined.

Identifier properties

It is often the case that two different macros have to communicate. In its simplest form we already saw it, namely in the case of macros whose behavior depends on the presence of auxiliary syntax (another macro) in their inputs. A typical example is Scheme’s cond keyword (implementable as a macro in terms of if) that uses the else auxiliary syntax.

Such an auxiliary keyword can function as a yes/no flag. Sometimes, however, we may be interested in more than a boolean value. This can be done with identifier properties, which are implemented in Chez Scheme and also described in SRFI 213.

An identifier property are superficially similar to symbol properties of Lisp, but there are important differences making them work with Scheme’s macro system. Identifier properties are associated with an existing binding of an identifier and thus automatically lexically scoped. Each property is keyed by the binding of another identifier, so also property keys are lexically scoped.

The define-property form, which can be used where ever a definition can be used, associates properties with identifiers:

(define add1 (lambda (x) (+ 1 x)))
(define key)
(define-property add1 key #'"value")

If the define-property appears in a context evaluated at relative phase n, the very right-hand side of define-property is evaluated at relative phase n + 1, much like the right-hand side of define-syntax. This means that the property value, "(syntax "value")" in this example, is accessible at expand-time, but not a run-time.

Regardless of the identifier property attached to add1, the identifier is still a variable resolving to a procedure adding one to its argument:

(add1 2)
3

Macro transformers can retrieve the values of identifier properties. If the need to do so, they have to return a procedure instead of a syntax object. The returned procedure must accept one argument, lookup and should return the syntax object which is the result of the transformation. The lookup procedure takes two arguments id and key, which must be bound identifiers, and returns the value of the identifier property associated to id and keyed by key, or #f if there is none:

(let-syntax
    ([x
      (lambda (stx)
        (lambda (lookup)
          (syntax-case stx ()
            [(_ key)
             (or (lookup #'add1 #'key)
                 #'"no-value")])))])
  (define other-key)
  (list (x key) (x other-key)))
("value" "no-value")

While this example theoretically describes how identifier properties work, it doesn’t show their usefulness. A more practical example is given by another loop macro, modeling general for, we are going to present:

(define key)

(define-syntax define-iterator
  (lambda (stx)
    (syntax-case stx ()
      [(define-iterator name parser-expr)
       (identifier? #'name)
       #'(begin
           (define-syntax name
             (lambda (stx)
               (syntax-violation 'name "invalid use of for keyword" stx)))
           (define-property name key
             (let ([parser parser-expr])
               (unless (procedure? parser)
                 (assertion-violation 'define-iterator "invalid parser" parser))
               parser)))])))

(define-syntax for
  (lambda (stx)
    (lambda (lookup)
      (define parse-clause
        (lambda (cl)
          (syntax-case cl ()
            [(formals keyword . arg)
             (identifier? #'keyword)
             (let ([keyword #'keyword])
               (define parser (lookup keyword #'key))
               (unless (procedure? parser)
                 (syntax-violation 'for "invalid for iterator" stx keyword))
               (let-values ([(outer-var* var* loop-var*
                              outer-expr init-expr test-expr loop-expr step-expr)
                             (parser stx cl)])
                 (list outer-var* var* loop-var*
                       outer-expr init-expr test-expr loop-expr step-expr)))])))
      (syntax-case stx ()
        [(_ (clause ...) command ...)
         (with-syntax ([(((outer-var ...) (var ...) (loop-var ...)
                          outer-expr init-expr test-expr loop-expr step-expr) ...)
                        (map parse-clause #'(clause ...))])
           #'(let-values ([(outer-var ...) outer-expr] ...)
               (let-values ([(var ...) init-expr] ...)
                 (let f ([var var] ... ...)
                   (unless (or test-expr ...)
                     (let-values ([(loop-var ...) loop-expr] ...)
                       command ...
                       (let-values ([(var ...) step-expr] ...)
                         (f var ... ...))))))))]))))

(define-iterator in-list
  (lambda (stx cl)
    (syntax-case cl ()
      [(var _ list-expr)
       (identifier? #'var)
       (values #'()
               #'(tmp)
               #'(var)
               #'(values)
               #'list-expr
               #'(null? tmp)
               #'(car tmp)
               #'(cdr tmp))])))

(define-iterator in-range
  (lambda (stx cl)
    (syntax-case cl ()
      [(var _ start-expr end-expr)
       (identifier? #'var)
       (values #'(end)
               #'(i)
               #'(var)
               #'end-expr
               #'start-expr
               #'(>= i end)
               #'i
               #'(+ i 1))])))

The public API for our new loop facility consists of the for keyword and the define-iterator defining keyword. Moreover, we defined two iterator forms, one for going through a list and the other one for going through a numeric range. The loop facility is extensible because the user can define more iterator forms using define-iterator.

A typical use of a for loop can look like the following:

(for ([x in-list '(a b c)]
      [i in-range 0 10])
  (display (list x i))
  (newline))
(a 0)
(b 1)
(c 2)

In a language without a powerful macro system as Scheme possesses it, if it doesn’t ship a suitable looping construct for our needs, we can’t do anything except hoping for a future version of the language that includes more looping constructs. In Scheme, on the other hand, a small and simple core suffices as we can build syntactic abstractions as much as we can build procedural abstractions ourselves. Many advertised “brand new” features of en-vogue or not-so-en-vogue languages may sound like a big yawn to a Schemer.

In our implementation of the for macro above, we used identifier properties on the iterator keywords to communicate with the main for macro. This hints at how we can build powerful, extensible sub-languages into Scheme.

Complex examples

An LR(1) parser generator implemented as a Scheme macro

The power of Scheme’s procedural macros enables us to stay within Scheme and within one Scheme process all the time. Let us consider a parser generator like GNU Bison as a case study. A classical parser generator is a separate executable that takes a grammar (for some formal language) interspersed with semantic actions and outputs source code implementing a parser for that language.

The process is rather fragile as it depends a lot on text substitutions and the parser generator is ignorant about the embedded semantic code in the target language. Moreover, an external tool is needed.

In Scheme, on the other hand, we can write a macro that takes a grammar (described using Scheme’s ordinary lexical syntax) and semantic expressions and replaces it at compile-time with the definition of a parser of this languages, obeying, among other things. the lexical scoping of the embedded semantic expressions.

We have written such a macro that implements an LR(1)-parser generator in roughly 1000 lines to demonstrate a highly non-trivial macro (or sub-language). The source code can be found in the library directory of this tutorial’s GitHub repository.

It is written as an R6RS library, which we can readily import:

(import (languages))

The following gives an example of how a grammar together with semantic actions is defined:

(define-language (make-parser token token?)
  (nonterminals exp term factor)
  (terminals NUM "+" "-" "*" "/" "(" ")")
  (rules
   [(exp -> exp "+" term)
    (+ exp term)]
   [(exp -> exp "-" term)
    (- exp term)]
   [(exp -> term)]
   [(term -> term "*" factor)
    (* term factor)]
   [(term -> term "/" factor)
    (/ term factor)]
   [(term -> factor)]
   [(factor -> NUM)]
   [(factor -> "(" exp ")")
    exp])
  (start exp))

This definition binds make-parser to a thunk (a procedure taking no arguments) that when called returns a parser for the defined language. A parser is a procedure. Calling it with one value, a token, pushes this token in the parser. Calling it with zero values, signals an end of the input and the procedure returns the semantic value of the sentences consisting of the token pushed so far.

A convenience procedure parse is defined that pushes a fixed number of tokens and is used in the following example:

(parse (make-parser)
       (token NUM 3)
       (token "+")
       (token NUM 2)
       (token "*")
       (token NUM 7)
       (token "-")
       (token NUM 1))
16

Exercises

  1. Write a macro push! such that (push! list-variable expression) prepends the value of expression to the list bound to the list-variable.
  2. Write a macro when-all such that (when-all test-expression ... expression) evaluates expression only if all test-expressions evaluate to #f. The macro should short-circuit the evaluation the test-expressions as soon as one evaluates to #f.
  3. Write a macro alist such that (alist key1 value key2 value2 ... ) expands into a literal expression of the form '((key1 value1) (key2 value2) ...).
  4. Let x be a pattern variable that represents a list of syntax objects and y a pattern variable that represents a list of lists of syntax objects. Find out the result of #'(((x y) ...) ...).
  5. Write a macro timestamp such that timestamp expands into a number literal counting the number of uses of timestamp.
  6. Rewrite fluid-let as a recursive macro that does not use generate-temporaries.
  7. Write a procedure symbolic-identifier=? so that (symbolic-identifier=? id1 id2) returns #t if and only if the two identifiers id1 and id2 have the same symbolic name.
  8. Extend the loop macro so that evaluating (continue) skips the rest of the current loop iteration and the loop continues with the next iteration.
  9. Modify the for macro into a functional form supporting user-definable accumulators whose final result are returned by the for expression.
  10. Write a pattern matching Scheme macro.
  11. Make the parser generator more user friendly. It should provide more information at compile-time when shift/reduce and reduce/reduce conflicts are reported and at run-time when syntax errors are reported. Implement operator precedence and associativity rules to handle some shift/reduce and reduce/reduce conflicts automatically.
  12. Write a version of the lex scanner generator as a Scheme macro.

Footnotes

[fn:1]This may change when the R7RS-large standardization effort is finished. Both, R6RS and R7RS-small, are successors (and extensions) to R5RS (1998), but R7RS-small was never meant to be seen in isolation as a successor to R6RS. Time will tell whether the R7RS large language will be able to replace R6RS when it is finally done. It is planned to include the R6RS macro facility and the extensions discussed here in R7RS-large.

[fn:2]While Scheme does not forbid mutation (like ML but unlike Haskell), the pitfalls of impure code are well-understood. Therefore, the names of Scheme procedures and syntax that modifies locations in the store (the Scheme model of the computer’s memory) end with a ! by convention.

[fn:3]More precicely: as a variable holding a procedure value.

[fn:4]When interactively testing our procedures and syntax (keywords), one has to be careful that we have given both the procedure and the keyword the same name. Evaluating one of the definitions will overwrite the meaning of the other one.

[fn:5]Such a form is predefined in Chez Scheme, but is not part of the Scheme standard. In fact, Chez Scheme’s version properly handles tail calls, which our simple version doesn’t.

[fn:6]Note that this cannot be done with C preprocessor macros.

[fn:7]The term “history” of an identifier is not an established one but was invented for this presentation.

[fn:8]When used in programs or libraries, there is one problem in the first approach because of so-called phasing issues. We will come back to this.

[fn:9]In fact, the standard binding constructs in Scheme let, let*, or letrec can also be defined as macro keywords in terms of the more primitive lambda using the macro facility described in this report.

[fn:10]Remember that Scheme has first-class continuation. But this is a topic for a different tutorial.

[fn:11]In-depth advanced information can be found in the paper Keeping it Clean with Syntax Parameters.