To be incorporated into http://www.megabaud.fi/~karttu/matikka/Nekomorphisms/gatomorf.htm either as a part or as a separate page...)

The purpose of this paper is to propose a new kind of science, gatomorphologia (a branch of combinatorics examining gatomorphisms), and to strongly advocate the programming language Scheme as means of doing research and publishing one's results in that or any related fields.

A brief introduction to the Lisp/Scheme list structures for aspiring gatomorphists.

The programming language Lisp, which has its roots in Church's Lambda calculus, excels for many reasons, but from the viewpoint of anybody doing explorative programming its chief advantages are:

  1. the dynamic run-type typing,
  2. the automatic memory reclamation (garbage collection)
    and its
  3. flexible and universal list data type.

Note: From now on, when we refer to "Lisp", we will also mean its statically scoped dialect "Scheme", unless otherwise noted.

In contrast to some other programming languages (e.g. Perl, Maple) which hide their implementation details, the Lisp lists are explicitly implemented as simply linked lists, with an underlying (plane) binary tree structure. That is, a Lisp list is nothing else than a double pointer, called "cons cell" (in Lisp parlance) or "pair" (in Scheme culture), whose other component pointer points to the first element of the list, and the other pointer points to the rest of the list. For obscure historical reasons these pointers are called "car" and "cdr", and here we will also stick to these terms [1]. Because of its dynamic typing, Lisp (usually) does not know the types of variables and function arguments beforehand (e.g. at the compile time), but instead has to "tag" [2] each data object (at the runtime) with its type indicating whether the raw binary data contained in the object (which could fit into a machine word or two) is to be interpreted for example as an integer, or say, as a cons-cell. This means also that there is no restriction as regards to what kind of data the "car" and "cdr" elements of the cons cell can point to, allowing the programmer to build any kind of polymorphic data structures with them. Now, it is just the standard convention that a "list", with which we mean an ordered sequence of any data objects, distinct or not, ...

(a b c)
is implemented in Lisp as a collection of cons cells, in such a way, that the one of them has its "car" pointer pointing to the element a, and its "cdr" pointer pointing to another cons cell, whose "car" points to the element b, and "cdr" in turn points to yet another cons cell, whose "car" points to the element c, and whose "cdr" points to [3] a special object called NIL (from Latin nihil, nothing), which has a very special meaning: it indicates the end of the list, and also stands by itself for an empty list ( ), which cannot be represented with cons cells.

Being such a special creature, many Lisp systems have an ambivalent attitude towards NIL: it is considered a "list", although it's not a "cons cell". (All other lists are cons cells.) In traditional Lisp-implementations it is also a "symbol", and it stands for the boolean value "false", in contrast to all other Lisp objects that stand for "true". Also, in Lisp it can be specified in source code either way, as a "symbol" NIL or as an empty list ( ), although at least the Common Lisp always prefers to output it as NIL, not as ( ).
However, in (standard implementations of) Scheme this is markedly different: There is a separate object #f meaning "false" and there is actually no symbol called NIL, and the only way to specify it literally is to use the empty pair of parentheses ( ), and Scheme will also output it always in that way, which is better for us, as we shall see. However, in this article we stick to the term "nil", even when talking about Scheme-functions, instead of the more exact, but longer "empty list".

So what are the elements a, b and c, in the above example then? As stated above, also the "car"-sides of cons-cells can contain/point to any kind of data, thus a, b and c can be any valid Lisp objects of whatsoever. That includes numeric objects like integers and real numbers, character strings, vectors, special Lisp objects called "symbols", and most importantly, also other lists (i.e. cons cells) including nil, the empty list (which, we repeat, is not a cons cell). Fortunately, we can forget all the myriad data types of Lisp, and think about just two: the cons cell's and the NIL (which we can think of belonging into type of its own).

Now, what kind of combinatorial structures one gets when one builds such list structures, where only allowed elements are other cons-cells (non-empty list structures), and NIL?

Clearly, if we look at the outward manifestation of these structures, they are general parenthesizations (or bracketings) (and this is even easier to see when the NIL is output as ( ), as Scheme does) and if we look at the implementation level, they are rooted planar binary trees, where each cons cell corresponds to a branching (non-terminal) node of the tree, and NIL's are its leaves.

Both of these structures belong to an immense family [1] of combinatorial structures enumerated by the Catalan numbers:

C(0)
= 1. ( )
C(1)
= 1. (( ))
C(2)
= 2. (( )( )), ((( )))
C(3)
= 5. (( )( )( )), (( )(( ))), ((( ))( )), ((( )( ))), (((( ))))
etc.
The mapping from above general parenthesizations to general (plane) trees is simple, with ( ) corresponding to an empty tree or tip-vertex . and each element of the list of length n corresponding to a branch from the node of (upward) degree n.

It should be noted that the underlying implicit mapping between the implementation level binary trees and the outwardly manifested general parenthesizations present in such Lisp lists hides precisely the same one-to-one correspondence between planted plane trees and trivalent planted plane trees as shown by N.G. De Bruijn and B.J.M. Morselt in the first description (figures 1 & 2) of their paper "A Note on Plane Trees", published 1967, a long after the invention of Lisp by John McCarthy.

Also, the map from the outwardly manifested parenthesizations to the underlying cons cell binary trees is just the map R given by Donaghey as the first constituent of his map MG presented in [DON], and thus the implementation of the map R in our case is trivial: it is no-op.

Programming in Lisp/Scheme

Here we give a very restricted subset of Lisp/Scheme, but just enough for understanding how to write gatomorphisms that manipulate the Catalan-enumerated list-structures specified above.

The large part of the elegance of Lisp stems from the fact that there is a minimum of syntax to learn about and remember. Apart from the constant literals (like integers), variable and function argument names, almost everything else is a function invocation, which is indicated the same way as one writes an arbitrary list:

(foo arg1 arg2 ... argn)
where the first element of the list, here foo stands for the function to be called, and arg1 - argn are its arguments. Thus call of n-ary function is presented with an n+1 element list. There are also function invocations with no arguments, for example (list) will call the function called "list" with no arguments, and that will return nil, i.e. an empty list. Note that the arguments given to the function can themselves be other function invocations, and when evaluating the result of the whole expression, the deeper function call's are invoked first, and then their values used as ...

We need define, cond, pair?, not, car, cdr, cons, set-car!, set-cdr! and almost nothing else... list, append, what else?

All the examples below are given in Scheme, but we tell in the side note what kind of changes need to be done that they worked in Lisp.

Footnotes

(1) Why don't we use perhaps more intuitive "first" and "rest" or "head" and "tail"? It's because although they indeed depict the function of "car" and "cdr" very well when one thinks in terms of lists (general parenthesizations), they will fail to illustrate their function in the binary tree context, where the "left branch" and "right branch" would be proper indications. Because in this article we are constantly changing our viewpoint between the binary and general bracketings/trees, we avoid those more descriptive names, as not to stir any unwanted connotations in a wrong context.
(2) ... or otherwise keep track of each data object's type, e.g. by reserving certain memory areas only for certain data types, or by other subtle tricks.
(3) ... or actually: contains. The NIL is in many Lisp/Scheme systems implemented as a machine word containing the pure zero.
References: