Lisp is famous for, and in some sense defined by (see this), syntactic extension. Syntactic extension in Lisp is a great example of a design trade-off: Lisp picks a relatively high-level representation of its base syntax (lists of lists and atoms), which greatly simplifies writing syntactic extensions, because they simply transform that high-level representation. Syntax extensions don't have to worry about the nitty-gritty details of turning character streams into some kind of program representation (that is done for you, by the reader), and the code representation is simple enough to modify with the language itself, so extensions are easy to write. The trade-off is that you have to stick to Lisp's baseline syntax (unless you get really tricky). What this means, in practice, is that even extended Lisp syntax will still be a list of lists and atoms (and chock full of parentheses, for which the language family is famous).

Despite the simplicity of this scheme, wherein the reader transforms text into a code-representation, which is then transformed by macros, and then finally converted to machine code or bytecode or interpreted, various Lisp dialects have approached the question of syntactic extension differently. Since I've recently gotten interested in macro hygiene, I thought it would be nice to write a series of posts about the different kinds of macro transformations available in various Lisp dialects.
picoLisp
picoLisp is a pretty bizarre Lisp variant which doesn't enjoy wide use, and has a pretty amazing set of philosophical underpinnings. PicoLisp is sufficiently different from other Lisps that I hesitated to cover it here, but it turns out to be extremely useful from a pedagogical point of view. How come? Because the code/data relationship, which is important in all Lisps, plays an even more central role in picoLisp. We are going to end this post talking about hygienic macros in Scheme, and picoLisp, in a sense, represents the polar opposite approach, with other Lisps occupying an arguably uncomfortable or impure middle ground. Code and data are so tightly coupled in picoLisp that the above picture of how syntactic extension works doesn't even really apply.
Code is Data
So what do we really mean when we say code is data? This is a bit fatuous, in a way - code is text, after all. All Lisp variants I know still require the user to write text out into a file, which is then read by the Lisp environment and acted upon. As suggested above, "code is data" really means that code is transformed from text into an intermediate form which is represented in terms of Lisp data types. We can then use the Lisp itself to transform that data or to execute it or interpret it.
What is the simplest thing we can do with Lisp data that may represent
some code? This one is easy: the simplest thing we can do is
nothing. Doing nothing to a piece of code/data is called "quotation"
in Lisp. It is represented by preceding an expression with the '
character. Quoted expressions aren't evaluated - when the interpreter
hits one, it just returns the data stored in the quotation. A quoted
expression is read by the reader, but not evaluated. It is just
data.
'a ;-> a
'(this is some data) ;-> (this is some data)
'(if T "True" "False") ;-> (if T "True" "False")
(if T "True" "False") ;-> "True"
(N.B. You can download picoLisp and try these examples.)
Every line above except the last is quoted, and evaluates to just the
quoted data. The last expression isn't quoted, and so it is actually
evaluated by the interpreter. T is the true value, so the
expression evaluates to the first clause, which is the "string"
"True" (technically "True" is a symbol, since picoLisp only has one
type which encompasses both strings and symbols). The opposite of
quotation is evaluation. This is represented as the function eval
in picoLisp:
(eval '(if T "True" "False")) ;-> "True"
A slightly less trivial example:
(setq x 10)
(setq q '(+ x 1))
(eval q) ;-> 11
When eval encounters a symbol, like x, above, it just replaces it
with the current value of that symbol wherever eval is called. So,
if we run the above code and then say:
(let x 0 (eval q)) ;-> 1
We get 1.
This is one of the ways picoLisp differs from most other Lisps.
This behavior is called "dynamic scope," and while most Lisps (even
those that have this behavior) look upon it as a historical relic
better thrown off or worked around, picoLisp embraces this behavior.
This is because picoLisp is aiming for simplicity in its interpreter,
and nothing is simpler than a symbol just meaning "get what the value
of this symbol is right here, right now." As we'll see, eval in
newer Lisp dialects won't work this way.
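For readers who find it easier to see this outside of Lisp, here is a minimal sketch of the same behavior in Python. It is a toy model, not picoLisp's actual interpreter: the names `env_stack`, `lookup`, `evaluate` and `let` are invented for illustration, and only `+` is supported. The point is that the quoted expression is plain data, and a symbol is resolved against whatever bindings are live at the moment of evaluation.

```python
# A toy evaluator with dynamic scope (hypothetical names throughout).

env_stack = [{}]  # binding frames; the innermost frame is last


def lookup(name):
    # Search the frames that are live *now*, newest first.
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def evaluate(expr):
    if isinstance(expr, str):      # a symbol: fetch its current value
        return lookup(expr)
    if isinstance(expr, int):      # a number evaluates to itself
        return expr
    op, *args = expr               # a list: only "+" is supported here
    if op == "+":
        return sum(evaluate(a) for a in args)
    raise ValueError(f"unknown operator: {op}")


def let(name, value, body):
    # Bind name while body is evaluated, then restore the old state.
    env_stack.append({name: value})
    try:
        return evaluate(body)
    finally:
        env_stack.pop()


env_stack[0]["x"] = 10      # (setq x 10)
q = ["+", "x", 1]           # (setq q '(+ x 1)), held as plain data

print(evaluate(q))          # 11
print(let("x", 0, q))       # 1
```

Nothing about `q` changes between the two calls; only the stack of bindings that happens to be live when `evaluate` runs is different.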
Ok, ok, we are sort of drifting. We were talking about macros. Rather surprisingly, picoLisp's documentation says:
Yes, there is a macro mechanism in picoLisp, to build and immediately execute a list of expressions. But it is seldom used. Macros are a kludge. Most things where you need macros in other Lisps are directly expressible as functions in picoLisp, which (as opposed to macros) can be applied, passed around, and debugged.
How can this be? Well, functions in picoLisp have the option of not evaluating their arguments. In most languages, function calls proceed by first evaluating all of their arguments, then binding those values to the variable names and then executing the body of the function. This is how it works in picoLisp too, if you say:
(de fun (A)
   (print "Inside")
   (print A))
(fun (prog (print "Argument evaluated.") 10))
You get:
"Argument evaluated.""Inside" 10-> 10
For non-Lispers, prog is just a form which evaluates each of its
sub-forms in order and returns the last value. We are using it here
to demonstrate when the argument expression is evaluated.
So when a function is called, its arguments are evaluated first. This is true in every Lisp I know (except for exotica like Lazy Racket). This property is usually called, by the way, eagerness (or strict evaluation): the function is eager to know its arguments' values before evaluating the body.
Unlike other Lisps, picoLisp lets you define functions which don't evaluate their arguments immediately. This is done by providing a single symbol instead of an argument list:
(de fun2 A
   (print "Inside")
   (print (eval (car A))))
(fun2 (prog (print "Argument evaluated.") 10))
(car returns the first element of a list).
Results in:
"Inside""Argument evaluated."10-> 10
The argument expression is not evaluated until we (unpack it and then)
call eval on it. Using this feature, we can do something which is
usually impossible in other languages: write if as a function. In
most languages you can't write if in any nice way. In most Lisps,
you've got to write a macro. In picoLisp you can just write if as
an "ordinary" function:
(de my-if A
   (if (eval (car A))
      (eval (cadr A))
      (eval (caddr A))))
(my-if (< 1 0)
   (prog (print "True Branch Evaled") T)
   (prog (print "False Branch Evaled") NIL))
"False Branch Evaled"-> NIL
Note that "True Branch Evaled" is never printed. The true branch is never evaluated.
(Obviously we have to use the primitive if - the example is meant to
highlight optional evaluation of arguments, not how a primitive like
if makes it into a language).
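In a language with eager functions, the nearest equivalent is to wrap each argument in a zero-argument function (a thunk) by hand. The Python sketch below is only an analogy for what picoLisp does, under the assumption that explicit lambdas stand in for unevaluated argument expressions; `my_if`, `note` and `log` are hypothetical names. It shows that only the selected branch is ever run.

```python
# Delaying evaluation with zero-argument lambdas (an analogy, not picoLisp).

log = []


def note(message, value):
    # Record that an expression was evaluated, then return its value.
    log.append(message)
    return value


def my_if(condition, then_branch, else_branch):
    # Every argument is a thunk; only the chosen branch is ever called.
    if condition():
        return then_branch()
    return else_branch()


result = my_if(
    lambda: 1 < 0,
    lambda: note("True Branch Evaled", True),
    lambda: note("False Branch Evaled", None),
)

print(log)     # ['False Branch Evaled']
print(result)  # None
```

The difference is who pays for the delay: here the caller has to write `lambda:` around every argument, whereas in picoLisp the function itself decides not to evaluate what it was handed.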
Because functions can do this kind of thing, you can do some
surprising things with picoLisp, like map if over lists of
branches. I find it astonishing that the picoLisp docs provide this
example and then say picoLisp goes for "The principle of least
astonishment."
Variables, Scope, Eval, Quote and Lambda
If you are used to other programming languages, picoLisp probably seems pretty bizarre. But, if you're a little more open minded, you might think: wait a minute, this optional evaluation of input expressions is actually pretty handy! I can write all my special forms as regular functions, and then they too can be first class values! Why isn't every Lisp like this?
Well, I can't claim to understand the entire historical development of
the Lisp family of languages, but I'm pretty sure that the main reason
other Lisps moved away from this model is that they wanted to support
lexical scope (to be discussed below). The watchword of picoLisp is
simplicity: simplicity is why it is an interpreted language, rather
than a compiled one, and simplicity shaped the design of the
interpreter. Dynamic variable scope is the simplest thing, as far
as an interpreter is concerned, and so picoLisp runs with it. Because
code is always evaluated with respect to a dynamic environment, the
code itself doesn't need to remember anything like "Which x did the
programmer mean in the piece of code (+ x 1)." It is simply assumed
that, by default, x refers to the x in the current dynamic
environment. Under these semantic assumptions, a quotation is just a
lambda expression without any arguments. Eval is just funcall for
functions that take no arguments.
This is, in fact, literally true in picoLisp, even for functions with
arguments. There is no lambda. Anonymous functions are created with
quotation:
(mapcar '((X) (+ X 1)) (list 1 2 3 4))
; -> (2 3 4 5)
Because variable binding is assumed to be handled by eval and is
assumed to be related only to where eval is called, not to anything
about where the lambda was defined, code in picoLisp need have no
additional information associated with it. It literally is just data.
In short, in buzzword-packed language: by restricting the interpretation
of variable binding to dynamic scope, picoLisp unifies function
application with regular evaluation, and quote with lambda. With
minimal convention, the distinction between functions and special
forms nearly vanishes!
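To make the quote/lambda unification concrete, here is a rough Python sketch in which a "function" is nothing more than a list of the form `[params, body]`. This is a toy model under the dynamic-scope assumption described above, not picoLisp's implementation; `apply_quoted`, `evaluate`, `lookup` and `env_stack` are hypothetical names, and only `+` is supported.

```python
# Under dynamic scope, a function can be plain data: [params, body].

env_stack = [{}]  # binding frames; the innermost frame is last


def lookup(name):
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def evaluate(expr):
    if isinstance(expr, str):      # a symbol
        return lookup(expr)
    if isinstance(expr, int):      # a number
        return expr
    op, *args = expr               # a list: only "+" is supported here
    if op == "+":
        return sum(evaluate(a) for a in args)
    raise ValueError(f"unknown operator: {op}")


def apply_quoted(fn, *values):
    # fn is plain data: bind its parameters dynamically, then evaluate body.
    params, body = fn
    env_stack.append(dict(zip(params, values)))
    try:
        return evaluate(body)
    finally:
        env_stack.pop()


increment = [["X"], ["+", "X", 1]]     # '((X) (+ X 1))
print([apply_quoted(increment, n) for n in [1, 2, 3, 4]])   # [2, 3, 4, 5]

thunk = [[], ["+", "X", 1]]            # no parameters: a bare quotation
env_stack[0]["X"] = 41
print(apply_quoted(thunk))             # 42, same as evaluate(["+", "X", 1])
```

With an empty parameter list, `apply_quoted` does nothing that `evaluate` would not do on the body alone, which is the sense in which eval is funcall for functions of no arguments.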
Lexical Scope?
I keep going on about dynamic scope, and I even mentioned lexical scope, but what is lexical scope, and how does it affect the relationship between function application and evaluation, and between quotation and lambda?
We are going to switch to Emacs Lisp for some examples now, because Emacs Lisp supports both lexical and dynamic scope and forces us to say explicitly which scoping rules we want to use. This helps make the discussion of scope clearer.
(If you want to follow along, download the latest Emacs, fire it up, press "Alt-x", type "ielm", and hit ENTER. This will start up an Emacs Lisp read-eval-print loop. There are better ways to interact with the Emacs Lisp interpreter, but this is the most familiar for non-Emacs users.)
Emacs Lisp is a relatively old Lisp dialect, and by default it is dynamically scoped. There is a library that comes with Emacs that adds a lot of features from Common Lisp, including simulation of lexical-scope rules, so the first thing you need to do is require this package:
*** Welcome to IELM *** Type (describe-mode) for help.
ELISP> (require 'cl)
A quick dynamic scope example:
(let ((x 10))
  (defun get-value-of-x () x)
  (get-value-of-x))
Obviously evaluates to 10. get-value-of-x just returns the value of
x, whatever it is, when it is called. Hence:
(let ((x 11))
  (get-value-of-x))
Is 11. If we call get-value-of-x outside of a let which binds
x, we'll get an error. x, at that point, has no value at all.
What if we use lexical-let instead?
(lexical-let ((x 10))
  (defun get-value-of-x () x)
  (get-value-of-x))
This is still 10.
But, perhaps surprisingly, we can now say:
(get-value-of-x)
Outside of any let expression which binds x and get 10 as the
answer. Perhaps even more surprisingly:
(let ((x 11))
  (get-value-of-x))
Is still 10. Even:
(lexical-let ((x 11))
  (get-value-of-x))
Is still 10. In the context of a lexical-let, symbols bound by
that lexical-let don't refer to whatever value is bound to that
symbol "right now" in the dynamic environment, they refer to the
particular lexical environment of the lexical-let expression.
Conceptually, any lambda expression appearing in a lexical-let has
to store more than just the list of symbols and lists that makes up
its body; it also has to store the lexical environment to which those
symbols refer.
(Lexical scope is called lexical scope because variables are interpreted by virtue of their lexical context, literally where they appear in the source code, rather than where they are evaluated when the program is executed.)
Code might still be represented as a list of symbols and lists and atoms, but that representation leaves out information about lexical context. When we finally reach hygienic macros, when we talk about Scheme, we'll see that decorating the list-of-things representation of source code with lexical information will be a key feature.
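The extra bookkeeping that lexical scope demands can be sketched in Python. This is a toy model, not how Emacs Lisp implements lexical-let; `Closure`, `dynamic_stack`, `dynamic_lookup`, `get_x_lexical` and `get_x_dynamic` are hypothetical names, and the "body" is reduced to a single symbol to keep things short. The same body gives different answers depending on whether it carries its defining environment along with it.

```python
# The same body under dynamic and lexical scope (hypothetical names).

dynamic_stack = [{}]  # bindings that are live at call time


def dynamic_lookup(name):
    for frame in reversed(dynamic_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


class Closure:
    """A body plus the environment captured where it was defined."""

    def __init__(self, body, captured_env):
        self.body = body
        self.captured_env = dict(captured_env)

    def call(self):
        # Captured bindings are used; the caller's bindings are ignored.
        return self.captured_env[self.body]


body = "x"  # the body of get-value-of-x is just the symbol x

# Lexical: defined while x is 10, so the closure remembers that binding.
get_x_lexical = Closure(body, {"x": 10})


# Dynamic: nothing is remembered; x is looked up at call time.
def get_x_dynamic():
    return dynamic_lookup(body)


dynamic_stack.append({"x": 11})      # (let ((x 11)) ...)
print(get_x_dynamic())               # 11
print(get_x_lexical.call())          # 10
dynamic_stack.pop()
```

Note that `body` alone is not enough to reproduce the lexical answer; the `captured_env` field is exactly the information the plain list-of-symbols representation leaves out.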
Let's drive this home before we adjourn part 1. Consider:
(let ((x 10))
  (setq code 'x)
  (eval code))
That is definitely 10.
What about if we say:
(eval code)
We get an error: "void variable x". Also expected.
Now:
(lexical-let ((x 10))
  (setq code 'x)
  (eval code))
This, maybe surprisingly, gives an error too. The real reason is that
Emacs Lisp does some tricky things to simulate lexical scope, but it
isn't totally obvious that eval should use the current lexical scope
to evaluate terms. The meaning of symbols passed to eval in the
context of lexical scope is suddenly non-trivial. We can't execute
the above, but let's try:
(lexical-let ((x 10))
  (setq code 'x))
(let ((x 11))
  (eval code))
This evaluates to 11. The variable code really just contains a
piece of symbolic data. That data doesn't remember anything about
where it was defined - it forgets its lexical context - and so it
can't evaluate correctly. The addition of lexical scope breaks the
symmetry between quotation and lambda expressions! We didn't
need lambda in picoLisp because there was no lexical context to keep
track of. In other Lisps we need defun and lambda to function as
"smart quotations," which remember their lexical environment so that,
upon invocation, they can behave correctly!
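Python happens to show the same split between "dumb" quoted code and "smart" closures, so it makes a convenient cross-check. In the sketch below a string stands in for the quoted expression (an assumption of the analogy, since Python has no quote), and `make_code_and_closure` is a hypothetical helper. The string forgets where it came from, while the lambda does not.

```python
# Quoted code forgets its lexical context; a closure remembers it.


def make_code_and_closure():
    x = 10
    code = "x"               # "quoted" code: just the text of an expression
    closure = lambda: x      # a closure: captures this function's x
    return code, closure


code, closure = make_code_and_closure()

# eval has to be told which environment to use; the data carries none.
print(eval(code, {}, {"x": 11}))   # 11
print(closure())                   # 10

try:
    eval(code, {}, {})             # no x supplied: the data cannot help
except NameError as error:
    print("NameError:", error)
```

The string evaluates to whatever `x` the caller supplies, just as the quoted symbol did above, while the lambda keeps returning the value from the scope in which it was written.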
Because of the breakdown of this symmetry, and because Emacs Lisp doesn't provide picoLisp-style functions, we need a whole new special form in Emacs Lisp to declare syntactic extensions.
Conclusions of Part 1
Lexical scope is widely agreed to be the superior semantic mode for programming languages, principally because it makes reasoning about a piece of code mostly about looking at the text that defines the code, not at when the code is executed. But now we can see that lexical scope comes at a price: it breaks the symmetry between code and data. We get the simplicity, as programmers, of being able to understand our code based on its lexical environment "on the page". But the compiler/interpreter now has to do extra work to execute code correctly. Furthermore, naive modes of representing code as data are no longer complete: lists of lists and atoms don't contain lexical information.
Next time we'll talk about macros in Emacs/Common Lisp and Clojure (and maybe Arc (why the hell not?)). With today's lessons in mind, we'll see most of the conceptual problems with macros in these systems are associated with the fact that macros manipulate an impoverished code representation.
In subsequent posts, we'll look at macros in Scheme, and we'll see that the representation of code in that language is decorated with lexical information. This makes writing "well behaved" macros easier, but makes writing macros a little more abstract and, arguably, difficult to understand.
Thanks for reading!

