On S-expressions and the utility of the programming language Scheme in their manipulation

In many cases it would be nice to have a concise and unambiguous notation for symmetry and other operations involving the structures presented in the previous chapter. Also, when examining their orbit-profiles it's useful to have some empirical data before trying to derive mathematical conclusions. Naturally, such data can be obtained with computer, by programming the rotations, reflections and other operations as procedures that act on data structures implementing various manifestations presented in Chapter I. E.g., a simple-minded aficionado of the object-oriented programming (OOP) paradigm might devise a separate class for each such manifestation, with certain inheritance between the classes, e.g. a class of binary trees might be a subclass of general trees, and non-crossing handshakes and set-partitions might diverge from a common class of chord-diagrams. The objects instantiating the said classes would then hide each its own set of private elements, for whose manipulation specific methods would be needed.

Clearly this approach is far too baroque, when in the previous chapter I have already shown that these various interpretations for Catalan numbers are just different manifestations of the one and same combinatorial structure. This means that to effect, say a rotation of Euler's polygon divisions, it's enough to define a certain operation on binary trees, as long as those two manifestations are unambiguously bound together in some way, the most natural of which was explained in the chapter I. So, instead of tailoring multiple custom data types, a single universal one should suffice. It turns out that S-expressions, introduced by John McCarthy in his programming language Lisp [JMC, 1960] naturally fit the bill. Being originally just designed for the manipulation of such data structures, it contains a powerful set of primitives for their manipulation, which have in turn been inherited by its dialect Scheme (invented by Guy Steele and Gerald Sussman), and to a lesser extent also by Prolog, and certain more recent functional programming languages derived from these.

S-expressions

Definition. There exists various definitions for S-expressions (see e.g. [Holm], [Sottile]), however in this paper we adopt a convention that an internal, "dotted" representation of an S-expression can be any of the following:

  1. a symbol. For example, define, pair?, else and + are symbols.
  2. an integer. For example, 144 and -1 are integers.
  3. an empty list ( )
    or
  4. an ordered pair of the form (X . Y) where the element X (traditionally called car) can be any of the cases (a-d), while the element Y (traditionally called cdr) can be either the case (c) (an empty list) or the case (d), other such pair.
This definition is isomorphic with the external, "dotless" representation, in which an S-expression can be any of the following:
  1. a symbol.
  2. an integer.
    or
  3. (<dotfree-sexpr>*)
The last case says that a parenthesis-delimited list of zero or more other such external representations is also an external, dot-free representation. From now on, when it's not the question of a single symbol or integer, we shall refer to these dotless representations of S-expressions as lists. The isomorphism between these two representations is realized by the function f which maps from the "internal" to "external" representation as follows:
symbol-->symbol
integer-->integer
( )-->( )
(X . ( )) --> ( f(X) )
(X . Y) --> a list obtained by inserting f(X) as the first element to a front of a list obtained with f(Y)

For example, the function f would map the internal, dotted S-expression (a . ((( ) . (2 . ( ))) . (b . ( )))) to external, dotless list (a (( ) 2) b)

This definition is sufficient for explaining how to define functions in our minimal subset of Scheme.

Furthermore, we define a subset of symbolless S-expressions. Their internal, dotted representation is defined that they are either:

  1. an empty list ( )
    or
  2. an ordered pair of the form (X . Y) where either element, X or Y can be either an empty list or other such pair.
When we apply the above map f to this subset, we realize we have a missing link connecting our two sub-families of the Catalan clan, represented in the previous section. Namely, the former definition is equivalent (*) to the definition of planar binary trees and binary bracketings, and the corresponding external, dot-free representations are lists composed of parentheses only, i.e. parenthesizations surrounded by an extra pair of parentheses.


A brief introduction to the programming language Scheme

Note: Most of what is said below applies also to Lisp, with just minor changes in the names of some primitives and functions. We might gloss over some details, as our purpose is just to describe a subset of the language sufficient for implementing gatomorphisms.

The syntax of the language is itself based on S-expressions: A program written in Scheme is a set of functions invoking each other and/or built-in functions, where each function defined by the programmer is represented as a single S-expression of the special form, beginning with the symbol define. Almost every computation in Scheme is effected by evaluating a function call, and each such function call is represented as a list of length n+1, where n is the arity of the function invoked. For example, a function call with three arguments, like f(a, b, c) is represented as a list of four elements: (f a b c), and a nested call like f(g(a, b), c) in turn represented by a nested list structure (S-expression): (f (g a b) c).

Here is a list functions and primitives we need to implement gatomorphisms in Scheme.

(cons Sexpr1 Sexpr2)
Returns a new cons cell whose car-side contains Sexpr1 and cdr-side Sexpr2.
Examples:
(cons 'a 'b)          = (a . b)


(cons '() '(() . ()))
                      = (() . (() . ())) [Result in the internal, dotted-pair format.]
                      = (() ())          [Result in the external format.]

(car Pair)
Returns the car-side of the cons-cell Pair. An error results if car is applied to an empty list ( ).
Examples:
(car '(a . b))        = a

(car '((a b) c d))    = (a b)

(cdr Pair)
Returns the cdr-side of the cons-cell Pair. An error results if cdr is applied to an empty list ( ).
Examples:
(cdr '(a . b))        = b

(cdr '((a b) c d))    = (c d)

(list Sexpr1 Sexpr2 ... Sexprn)
Returns a list constructed from its arguments.
Examples:
(list)                = ()      [With no arguments gives an empty list.]

(list '(a b))         = ((a b)) [Unary form always surrounds with extra parentheses.]

(list 'a 'b)          = (a b)

(null? Sexpr)
Returns #t (true) if its argument is an empty list ( ), otherwise #f (false).
Examples:
(null? (cons 'a 'b))  = #f

(null? (list))        = #t

(pair? Sexpr)
Returns #t (true) if its argument is a cons cell, otherwise #f (false).
Examples:
(pair? (cons 'a 'b))  = #t

(pair? (list))        = #f

(not Sexpr)
Returns #t (true) if its argument is #f (false), and for all other values returns #f (false).
Examples:
(not (pair? (cons 'a 'b)))  = #f

(not (pair? (list)))        = #t


Note: As long as we restrict our attention to symbolless S-expressions (as in the inplementations of gatomorphisms below),
(null? Sexpr) is equivalent to (not (pair? Sexpr))
and vice versa,
(pair? Sexpr) is equivalent to (not (null? Sexpr))


(if Expr1 Expr2 Expr3)
Returns the value of Expr2 if Expr1 evaluates to a non-false value, otherwise evaluates Expr3 and returns its value. Note that if always evaluates either Expr2 or Expr3, never both.
Examples:
(if (null? (list)) (list 'it 'is 'empty) (list 'it 'is 'not 'empty))
                            = (it is empty)

(if (pair? (list)) (list 'it 'is 'a 'pair) (list 'it 'is 'not 'pair))
                            = (it is not pair)

(cond (TestExprA ExprA1 ... ExprAn) (TestExprB ExprB1 ... ExprBn) ... (else Expr_else1 ... Expr_elsen) )
If TestExprA evaluates to a non-false value, executes expressions ExprA1 ... ExprAn listed after it, returning the value of the last one as the value of whole cond-expression, without evaluating any expressions. Otherwise, checks whether TestExprB evaluates to a non-false value, and if so, executes expressions ExprB1 ... ExprBn listed after it, again returning the value of the last one as the value of whole cond-expression. Et cetera. If none of the test-expressions evaluate to a non-false value before else, then the expressions Expr_else1 ... Expr_elsen are evaluated, with the value of the last one returned as the value of the whole cond-expression.
Examples:
(cond ((null? (list)) (list 'it 'is 'empty))
      (else (list 'it 'is 'not 'empty))
)
                            -> (it is empty)

(cond ((pair? (list)) (list 'it 'is 'a 'pair))
      (else (list 'it 'is 'not 'pair))
)
                            -> (it is not pair)


Note: As can seen from above examples
(if Expr1 Expr2 Expr3) is equivalent to (cond (Expr1 Expr2) (else Expr3))

(define (funcname arg1 arg2 ... argn) expr1 ... exprn )
Defines a new function with name funcname with formal argument names arg1 arg2 ... argn, which when invoked, will bind the values of the actual arguments to the formal arguments, and then execute each of the expressions expr1 .. exprn in turn, returning the value of the last one as the result of the function invocation.
Examples:
(define (xcons x y)
   (cons y x)
)


(xcons 'a 'b)                -> (b . a)     [We defined a "crossed" cons.]

(let ((var1 init1) ... (varn initn) ) expr1 ... exprn )
Evaluates the expressions init1 ... initn and bounds variables var1 ... varn temporarily to their values, then executes each of the expressions expr1 .. exprn in turn, returning the value of the last one as the result of the whole let-expression. Note that this is the standard way to define temporary variables inside function definitions.

(append Sexpr1 Sexpr2 ... Sexprn)
Returns a list concatenated together from its arguments.
Examples:
(append)                            = ()        [With no arguments gives an empty list.]

(append '(a b) '(c))                = (a b c)

(append '() '(a b) '() '(c d) '())  = (a b c d) [Empty lists are silently ignored.]

(set-car! Pair Sexpr)
Modifies physically the contents of cons cell Pair, such that its car-side is overwritten with Sexpr. An error results if the first argument is not a cons cell.

(set-cdr! Pair Sexpr)
Modifies physically the contents of cons cell Pair, such that its cdr-side is overwritten with Sexpr. An error results if the first argument is not a cons cell.



Difference between constructive and destructive implemetations of gatomorphisms

If we exclude set-car! and set-cdr! from the set of primitives listed above, we can only define gatomorphisms and other functions that leave intact the physical structure of the S-expression(s) given as their argument(s). In case of gatomorphisms, if it is not a question about a mere identity function, then at least on some values of its argument it must return an S-expression with different structure than the argument has. Because by its definition is a bijection on the set of structures of the same size, merely invoking the accessor-functions car and/or cdr is impossible, as then the result's size (in cons cell nodes) would be less than that of the argument. So to effect changes which return an S-expression of the same size but of different structure, it must invoke the function cons, or some other functions like append or list which calls cons implicitly. For this reason these are called constructive implementations of gatomorphisms.

On the other hand, if an implementation of a gatomorphism invokes set-car! and/or set-cdr! or any other function invoking them, it is called destructive. This is because such functions make direct modifications to their argument's physical structure, and thus destroy information about their original structure. Traditionally, use of such destructive functions has been considered extremely dangerous, and thus Scheme has adopted the convention of suffixing the names of such functions with an exclamation mark (!). For example, there is a function called append! which is a destructive version of the function append described above. However, it turns out that in our application the use of strictly destructive implementations of gatomorphisms (or functions of suspected of being ones) have certain merits, when they are subjected to mathematical analysis. Namely, the bijectivity of such a function is much easier to proive, if we know that it does neither "lose" any existing cons cells, nor allocate any new ones with constructive operations. More about this in the last section of this paper. For most of the gatomorphisms represented below we thus give both constructive and destructive definition. We follow the Scheme practice of suffixing the names of the latter with '!'.

For example, here is a destructive version of gatomorphisms A069770 whose constructive version was defined above:

(define (gma069770! s)
  (if (null? s)
      s
      (let ((org-car (car s)))
        (set-car! s (cdr s))
        (set-cdr! s org-car)
        s
      )
  )
)
Note that we first check whether the argument is an empty S-expression ( ), which is returned back intact, while for real S-expressions consisting of cons cells we first have to store the original contents of the car-side of the argument to a temporary variable org-car, as otherwise its contents would be lost immediately after set-car! has overwritten it with the cdr-side of the argument. The destructive version of A057163 can then be written simply as:
(define (gma057163! s)
  (cond ((pair? s)
            (gma069770! s)
            (gma057163! (car s))
            (gma057163! (cdr s))
        )
  )
  s
)
i.e. we apply A069770 to every cons cell we encounter (i.e. swap their sides), and recurse down to both car- and cdr-side, each branch as long as we encounter ( ).