ETEP WWWGATE

Note: This was all written in Winter 1996, so take with a grain of salt, please.
Author:
Antti Karttunen

Recommended reading:
Overview of the ETEP server software
I have also assumed here that the reader has previous knowledge about the HTTP and HTML standards and about the way the Web works in general. You might want to check the following site if you want more information about these matters:
www.w3.org


Overview of The ETEP WWWGATE Interface

The WWWGATE interface is the subsystem of the ETEP server which presents itself to the outside world (here: the rest of the Internet) as a special-purpose Web server, interacting with client browsers like any other Web server using the standard HTTP protocol. That is, it listens for incoming connections and reads, parses and handles the received HTTP requests, doing all that by itself, and then responds to each (valid) request with a dynamically generated HTML page [0].

WWWGATE allows building applications based on object-oriented principles as well as on other designs, for example on value-based inquiries to relational databases. The way WWWGATE is used in the ETEP server is a good example of a mixture of both models.

The main principles in the design of the ETEP WWWGATE have been:





WWWGATE Functionality

The main functionality the ETEP server offers for its users via WWWGATE interface is:
  1. to implement the GDD ("Global Data Directory") matrix as a table of counts and hyperlinks to more detailed submatrices. (The submethod www-matrix)

  2. to offer users a more traditional way of filtering and listing the desired data, by letting them fill in and select appropriate fields on a search form. (The submethod www-inq)

  3. to extract individual objects of information (i.e. "records") from the backend database and show them to the user. (The submethod www-obj)

  4. to let users enter new objects of information into the system, i.e. into the backend database(s). (The submethod www-feed) To that end, it provides a special tool for searching and browsing the hierarchical classification categories. (The submethod www-hier)
We see that points 1-3 work as steps from a more general but shallow view of the data present in the system to the most detailed but narrow view of a single, individual object.




WWWGATE Implementation Notes

Platform

WWWGATE, like the whole ETEP server software (*) of which it is an integral part, has been programmed mostly in Allegro Common Lisp, with only a few modules written in the C language.

Allegro Common Lisp is a Common Lisp interpreter and compiler from Franz Inc., with implementations for various Unix systems (AIX, Solaris, etc.). It provides I/O-safe multithreading, a feature which is used extensively by the ETEP server.

(*) Not counting the backend databases and associated replication servers, which are written wholly in C.





Submethods, URLs seen as function calls

The primary mode of operation of any Web server is to listen for requests coming from its clients, handle them in some way or another, and then send a response to the requesting client browser, a response which is usually a page written in HTML. In the HTTP standard the most important part of the client request is a string called the URL (Uniform Resource Locator), which specifies what the client wants.

In contrast to traditional Web servers, which view most URLs as (almost) opaque ids for static objects of information, that is, as filenames of static HTML documents, WWWGATE sees each URL it receives as an invocation of some procedure to be executed, with the URL possibly also specifying the parameters given to that procedure.

Here the Lisp procedure (function) to be invoked is called a submethod, and the parameters passed to it from the rest of the URL are called its arguments.

The resulting output is then a function of (in the mathematical sense, i.e. it depends on) the submethod called, the arguments given to it, and the current state of the backend database(s).
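The idea of a URL as a function call can be sketched in a few lines. The following is a minimal illustration in Python, not the actual Lisp implementation: the handler bodies, the `SUBMETHODS` table and the `dispatch` function are all hypothetical, and only the submethod names and the path layout (optional options such as -T, then the submethod name, then its arguments) are taken from this document.

```python
from html import escape
from urllib.parse import unquote

def www_obj(args):
    # Hypothetical handler: expects exactly one argument, the object id.
    if len(args) != 1:
        raise ValueError("www-obj expects exactly one object id")
    return f"<H1>Object {escape(args[0])}</H1>"

def www_matrix(args):
    # Hypothetical handler: any number of matrix coordinates.
    return "<H1>Matrix " + escape("/".join(args)) + "</H1>"

# Submethod name (as it appears in the URL) -> function to invoke.
SUBMETHODS = {"www-obj": www_obj, "www-matrix": www_matrix}

def dispatch(url_path):
    """Treat a URL path as a call: /[options/]submethod/arg1/arg2/..."""
    segments = [unquote(s) for s in url_path.split("/") if s]
    options = []
    while segments and segments[0].startswith("-"):
        options.append(segments.pop(0))       # e.g. "-T"; unused in this sketch
    if not segments or segments[0] not in SUBMETHODS:
        return 404, "<H1>Not found</H1>"
    name, args = segments[0], segments[1:]
    try:
        return 200, SUBMETHODS[name](args)
    except ValueError:
        return 400, "<H1>Bad request</H1>"
```

For example, `dispatch("/www-obj/fi1.Ceg.1")` calls `www_obj(["fi1.Ceg.1"])` and returns the status 200 together with the generated page.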

Naturally the HTML page created may also contain new hyperlinks (URLs), and some of them may well lead back to the same or some other ETEP server. In that case it is useful to think of those URLs as if they were potential or interactive remote function calls (+). That is, they are executed only if the user decides to click/follow the corresponding anchor. When executed, each one of them may generate one more dynamic HTML page, with a new set of potential remote function calls, and so on ad infinitum.

(+) At least in time, if not in distance...




Templates

Almost all information shown by WWWGATE is formatted as HTML pages, which are always generated on-the-fly (dynamically), but never stored or cached in that format anywhere, at least not by WWWGATE itself.

To give maximum control over the layout of the generated HTML pages, and also over the contents of the data shown on those pages (i.e. what will be shown and how), I have developed a system of templates for WWWGATE.

The templates are a set of files whose syntax at the outer level is almost like HTML, but with the addition of a new tag that allows Lisp code to be embedded at arbitrary locations.

The input for the template system is always one object of information: either a persistent object fetched from the backend database, or a non-persistent object instantiated just as a container for a group of data fetched from the database(s) by some other means (e.g. with the relational select operation of SQL).

The template file to be used is determined as a function of the class of the given object; e.g. persistent objects of different types (offers of various kinds, company and user records, etc.) use different template files.

The template file is processed so that all the outer-level text and tags are transmitted directly to the client's browser, on the assumption that they are plain human-readable text and standard HTML markup tags for emphasis, images, hyperlinks and so on.

This continues until a special tag containing an embedded Lisp expression is encountered. The expression is evaluated in a context which allows it to refer to the fields (slots) of the given object, and the result of that evaluation (which may contain standard HTML tags in addition to human-readable text) is then output to the client, just as if it had been there in the first place instead of the Lisp expression. However, if the embedded Lisp expression returns the special empty value (NIL), the whole expression is skipped, and no output at all is produced for that tag. This feature allows selective output of information, for example showing only those fields which contain non-empty or specific values, thus helping users to concentrate on the essential information. It can also be used to show certain fields only when the user's browser sends appropriate authorization information, thus showing the full contents of a data object to authorized users and just a limited sample to others.
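The processing rule described above (copy outer-level text through, evaluate embedded expressions against the object's fields, emit nothing for an empty result) can be sketched as follows. This is only an analogy in Python: the real templates embed Lisp, and the `<?expr ... ?>` tag syntax, the `render` function and the `Offer` class are assumptions made for the example, with Python's `None` standing in for NIL.

```python
import re

# Hypothetical tag syntax: <?expr ... ?> marks an embedded expression.
EMBEDDED = re.compile(r"<\?expr(.*?)\?>", re.DOTALL)

def render(template, obj):
    """Copy outer-level text through unchanged and replace each embedded
    expression with its value; a None result produces no output at all."""
    fields = dict(vars(obj))  # the object's slots, visible by name

    def substitute(match):
        # eval is tolerable here only because template files are trusted,
        # server-side files, never user input.
        value = eval(match.group(1).strip(), {"__builtins__": {}}, fields)
        return "" if value is None else str(value)

    return EMBEDDED.sub(substitute, template)


class Offer:
    """Stand-in for an object fetched from the backend database."""
    def __init__(self, title, fax):
        self.title = title
        self.fax = fax

# The second expression yields None when the fax field is empty,
# so the whole fax line disappears from the page.
TEMPLATE = '<H1><?expr title ?></H1>\n<?expr "<P>Fax: " + fax if fax else None ?>'

with_fax = render(TEMPLATE, Offer("Birch plywood", "+358 1 234"))
without_fax = render(TEMPLATE, Offer("Birch plywood", None))
```

The same mechanism would cover the authorization case: an expression that returns the field value for an authorized user and `None` otherwise.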

Utilizing Predefined Field Types in Templates

Not only can embedded Lisp code refer to the fields of the object in the "raw", internal format used in the database, but also in a more "civilized", HTML-readable format, just by prefixing the field names with another prefix character. Each field of the object has been given a characteristic field type, which determines how its internal representation is converted to the corresponding HTML/human-readable format, what kind of input tag is generated for it (in case it is also possible to insert objects of that type via WWWGATE), and what kind of validation and conversion is done on the data entered/selected by the user to obtain the internal representation.

For example, the fields used for storing dates have a specific field type, datefield, and when they have to be output, the output method for that field type is called, which converts the internal integer representation of the date to a human-readable format, e.g. 23-DEC-1995.

Similarly, when a field has the field type objfield, its output method generates an HTML hyperlink (<A HREF> tag) to the submethod www-obj, which will eventually fetch and show the said object, provided that the user decides to follow that hyperlink. Hyperlink tags are also automatically generated for fields of the types emailfield and urlfield, in case they have been filled in by the user who left the record.
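A per-type output method can be pictured as a small dispatch table. The sketch below is in Python rather than Lisp, and several details are assumptions: the document does not say which integer encoding dates use (a day ordinal is assumed here), and the function names and the exact link markup are made up for the illustration. Only the field type names and the 23-DEC-1995 format come from the text.

```python
from datetime import date
from html import escape
from urllib.parse import quote

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

def out_datefield(value):
    # Assumption: the internal integer is a day ordinal.
    d = date.fromordinal(value)
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"

def out_objfield(value):
    # Link to the www-obj submethod, which would fetch and show the object.
    return f'<A HREF="/www-obj/{quote(value)}">{escape(value)}</A>'

def out_emailfield(value):
    return f'<A HREF="mailto:{escape(value)}">{escape(value)}</A>'

def out_urlfield(value):
    return f'<A HREF="{escape(value)}">{escape(value)}</A>'

OUTPUT_METHODS = {
    "datefield": out_datefield,
    "objfield": out_objfield,
    "emailfield": out_emailfield,
    "urlfield": out_urlfield,
}

def field_to_html(field_type, value):
    """Convert a raw field value to HTML; empty fields yield None so that
    a template can skip them entirely."""
    if value is None or value == "":
        return None
    return OUTPUT_METHODS[field_type](value)
```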

On a more abstract note...

It is easier to grasp the principle of WWWGATE if we look at how each object of information is moved and merged through distinct levels in the ETEP system before finally being shown as a page on the screen of the client software.

  1. "Raw" data in the backend database(s).
  2. Raw data transmitted over TCP/IP to the ETEP server, which will be gathered into one object at the ETEP server side, unless it already was stored as a persistent object in the database. Result: an instantiated object in RAM.
  3. Template file.
  4. Produced HTML page (<= template file + raw data).
  5. Page rendered on the screen (or printer) of a Web-browser, the final manifestation of the information.

That is, the primary function of the template files is to supply "decoration" (e.g. layout) for the actual raw information, as well as things like human-readable names for the fields, which may moreover be shown in different languages depending on the preferences of the user. In addition to that, the Lisp code in a template may supply some control over which fields are shown and which are not.

But dividing the areas of responsibility between the template file and the submethod which uses it is ultimately a matter of taste. Regarding efficiency, we may note that submethods, being just Lisp functions defined with a special form, are compiled with the rest of the ETEP server sources, in contrast to the embedded Lisp code in templates, which is currently always interpreted. (It should not be too hard to implement a template compiler in case the server load grows substantially...) On the other hand, it is quite easy and fast to modify the template files, and the effects are seen immediately without any compiling. That provides an easy way of making local customizations to layouts, changing languages, etc. at other installations of the ETEP servers.




Statelessness

In the design of WWWGATE I decided to follow the statelessness principle which is also used by most other Web (HTTP) servers.

That is: the Web server itself should not try to remember anything from previous connections. Bear in mind that the HTTP protocol forces current browsers to fetch every HTML page and image needed over a separate TCP/IP connection, and so the server sees each request it receives as entirely separate from all the other requests.

If one can notice any state in the system, it should be the state of the external information source. [1] In the case of WWWGATE the state is kept entirely in the backend database(s), and it reflects the persistent changes made to the information system, e.g. the insertion, modification or deletion of persistent data objects. [2] If there are any hash tables cached by WWWGATE, they only reflect the state of the corresponding persistent data structures in the backend database(s), as is the case with the GDD count information, of which the most frequently needed CxA points (i.e. the top levels of the matrix) are kept in the internal counter cache of the ETEP server.

In many cases, however, non-persistent state is also needed. This is the case, for example, with long inquiries made to the backend database(s), where we cannot show all the matching results on one HTML page but must divide them over many pages. To implement this efficiently we need a cursor, one which is a little more sophisticated than just a count of how many items have been shown so far. That means a cursor that is understood by the backend database, so that it can quickly position itself in the table from which the results are fetched.

Because WWWGATE doesn't want to remember the state of this cursor from one connection to the next, the responsibility for remembering it has instead been given to the client browser. [3] This is done by encoding the state of the cursor into the URL of a hyperlink which is generated at the end of the page listing the current batch of items/records (saying, for example, Next 50 items).
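A minimal sketch of such a client-held cursor, in Python and under stated assumptions: the in-memory `ROWS` list stands in for a database table ordered by key, the cursor is simply the last key shown, and the URL layout `/www-inq/<last key>/<batch size>` is invented for the example. The point is that positioning is done by key ("everything after key K") rather than by counting rows, and that the server keeps nothing between requests.

```python
# Stand-in for a backend table sorted by its key: (key, text) pairs.
ROWS = [(i, f"item {i}") for i in range(1, 121)]

def fetch_batch(after_key, limit):
    """Return up to `limit` rows whose key is greater than `after_key`.
    A real database would answer this with an index seek on the key."""
    return [row for row in ROWS if row[0] > after_key][:limit]

def list_page(after_key=0, limit=50):
    """Render one batch; the cursor for the next batch travels in the link."""
    rows = fetch_batch(after_key, limit + 1)   # one extra row tells us if more exist
    shown, more = rows[:limit], len(rows) > limit
    html = "<UL>\n" + "".join(f"<LI>{text}\n" for _, text in shown) + "</UL>\n"
    if more:
        last_key = shown[-1][0]
        # Hypothetical URL layout: /www-inq/<last key shown>/<batch size>
        html += f'<A HREF="/www-inq/{last_key}/{limit}">Next {limit} items</A>\n'
    return html

first_page = list_page()
last_page = list_page(after_key=100, limit=50)
```

Because the cursor names a key and not a row count, a continuation issued long after the first page still starts from a well-defined position, even if rows have been added or removed in the meantime.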

The immediate consequence of this approach is that the time span of a query can be much longer. E.g. we can imagine an extreme case where the user lists the first 50 items from the database and goes on a three-month vacation, leaving his computer on and the browser open. When he eventually returns to work and clicks the Next 50 items button, the continuation of the query should produce meaningful results, provided that the state of the external information source has not changed too much in the meantime.

The other consequence is that, because the URLs carry data which may have a lot of "internal significance" to the system, we should validate that data carefully before passing it on to any other agents (e.g. the backend database(s)) that might behave in unexpected or undesired ways if they received arbitrarily constructed sequences of operation data/codes. This might happen if users start experimenting with the system by manually modifying the URLs produced by WWWGATE. See also note [4].
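What "validate it carefully" could look like is sketched below in Python. The cursor format is an assumption made for the example (two URL segments, a last key and a batch size, both decimal integers), as are the names `parse_cursor` and `MAX_LIMIT`; the actual WWWGATE cursor encoding is not described here. The principle is to accept only a narrow, fully specified shape and to reject everything else before it can reach the database.

```python
import re

MAX_LIMIT = 200                       # hypothetical upper bound on batch size
_DIGITS = re.compile(r"[0-9]{1,9}")   # short, unsigned decimal numbers only

def parse_cursor(segments):
    """Validate cursor data taken from a URL before using it in a query.
    Raises ValueError for anything that is not exactly two small integers."""
    if len(segments) != 2:
        raise ValueError("cursor must have exactly two fields")
    after, limit = segments
    if not (_DIGITS.fullmatch(after) and _DIGITS.fullmatch(limit)):
        raise ValueError("cursor fields must be short decimal numbers")
    limit_n = int(limit)
    if not 1 <= limit_n <= MAX_LIMIT:
        raise ValueError("batch size out of range")
    return int(after), limit_n
```

A request whose cursor fails this check would get an error page instead of a database call.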


Special WWWGATE Tools: www-sql & www-src

I have also implemented two important tools for the benefit of the system administrators and developers of the ETEP-system. They are both implemented as special submethods on WWWGATE platform, utilizing many features offered by its common modules. Both of them check that their user has the required privileges for the operation, by using the basic authorization dialog of the HTTP standard.

The first of them, called www-sql, is a tool for doing arbitrary, "raw" SQL queries and other operations (updates, deletions, etc.) interactively via WWWGATE, which also formats any output of queries nicely on the screen of the Web-browser. (The backend database returns the exact list of all the column names used, even if the query had contained a wildcard expression like select * FROM TABLE1)

Since Kübl WI M2, the backend database, also supports the persistent object model in addition to ordinary SQL (i.e. relational operations), www-sql also provides an easy way to follow references between objects, wherever those references might be encountered.
This is quite easy to do because Kübl uses tagged types à la Lisp and the discreteness of these types is preserved when data is transmitted to the ETEP server (implemented in Lisp), hence it's always possible to distinguish object ids from ordinary strings because they are separate data types at both ends.

The other of these tools, called www-src, is a submethod for converting Lisp and other source files on-the-fly to cross-referenced HTML, showing them on the browser's screen just as they are, except that all the calls to non-built-in functions, instantiations of objects, usages of parameters, etc. are shown as hyperlinks to their respective definitions, whether they are in the same or in a separate Lisp module.

www-src also appends to the end of the produced HTML page a hyperlink index of all the definitions of the module.
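The cross-referencing idea can be illustrated with a much simplified sketch in Python. It is not how www-src itself works: it handles a single module only, recognizes just a handful of defining forms, matches names case-sensitively, and does not treat comments or strings specially. All names in it (`cross_reference`, `DEFINITION`, `SYMBOL`) are invented for the example.

```python
import re
from html import escape

# Simplification: only these defining forms are recognized.
DEFINITION = re.compile(r"\(def(?:un|macro|var|parameter)\s+([^\s()]+)")
# A "symbol" is any run of characters that is not whitespace or Lisp punctuation.
SYMBOL = re.compile(r"[^\s()'\"`,;]+")

def cross_reference(source):
    """Return the source as HTML in which every use of a name defined in
    the module links to its definition, followed by an index of definitions."""
    definitions = {m.start(1): m.group(1) for m in DEFINITION.finditer(source)}
    defined = set(definitions.values())

    def link(match):
        name = match.group(0)
        text = escape(name)
        if match.start() in definitions:       # the defining occurrence
            return f'<A NAME="{text}">{text}</A>'
        if name in defined:                    # a use of a defined name
            return f'<A HREF="#{text}">{text}</A>'
        return text

    body = SYMBOL.sub(link, source)
    index = "".join(f'<LI><A HREF="#{escape(n)}">{escape(n)}</A>\n'
                    for n in sorted(defined))
    return f"<PRE>\n{body}</PRE>\n<UL>\n{index}</UL>\n"

page = cross_reference("(defun square (x) (* x x))\n"
                       "(defun cube (x) (* x (square x)))\n")
```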

The latter tool has been very useful when I have written system documentation with ordinary, "static" HTML, interspersing it at times heavily with hyperlink references to the actual function definitions at the source files, thus keeping the documentation and that which is documented in close contact with each other.

So although the documentation itself may lag a little behind the sources, the reader at least has the possibility of quickly checking, with just a few clicks of the mouse, the actual state of affairs in the current source files, provided that he can read some Lisp and the assorted comments.




Miscellaneous

Redirections

On many occasions the redirections provided by the HTTP protocol (response code 302) are used. E.g. in a distributed environment one WWWGATE might issue a redirection to another one on the other side of the globe, in case that ETEP server has a better chance of fulfilling the request. (One could even imagine a kind of global URL ping-pong being played by ETEP servers, where the client is the ball...)

Redirections are also used when the "state" (from the browser's point of view) of an object of information being entered or modified by the user changes from "change" to "solid", that is, from the HTML page containing an input form, generated from an input template, to the corresponding "output" HTML page generated from the corresponding output template. Specifically, when the user has filled in an HTML input form (produced in this case by the submethod www-feed) and saved it into the database with the appropriate submit button, WWWGATE immediately issues a redirection to the submethod www-obj with the id of the object thus created or modified. This forces the user's browser to show the object in the same way that all the other users of the system would see it, so he can proofread it and get back to the "change" mode in case there is anything to be corrected.

Because a redirection is used here (from www-feed to www-obj), the URL shown on the location line of the browser really reflects the contents of the page and the way it is shown, and thus the user can safely add the URL to his bookmarks.
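The save-then-redirect step can be sketched as follows, in Python and with invented details: the in-memory `OBJECTS` store, the generated object ids, the `base_url` value and the function names are all hypothetical. What the sketch shows is only the shape of the reply: after a successful save, the server sends no page of its own but a 302 response whose Location header points at the www-obj URL of the saved object.

```python
OBJECTS = {}   # stand-in for the backend database

def store(form):
    """Save the submitted fields and return the id of the new object."""
    object_id = f"obj{len(OBJECTS) + 1}"    # hypothetical id scheme
    OBJECTS[object_id] = dict(form)
    return object_id

def handle_feed_submit(form, base_url="http://etep.example"):
    """Handle a submitted www-feed form: save it, then redirect the browser
    to the ordinary www-obj view of the object just saved."""
    object_id = store(form)
    return ("HTTP/1.0 302 Moved Temporarily\r\n"
            f"Location: {base_url}/www-obj/{object_id}\r\n"
            "\r\n")

response = handle_feed_submit({"title": "Birch plywood"})
```

The browser then fetches the www-obj URL on its own, which is why the location line ends up showing an address that can be bookmarked and revisited.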


Dynamic Tags

There exists a feature that allows replacing all TABLE and assorted tags with the nearest "corresponding" tag (currently defined ad hoc as DL, DT and DD which produce reasonable results with tables of max. two or three columns), if the user's browser can't handle table tags. This substitution is universally done by all submethods using the template files, in case the submethod name has been prefixed with the option -T, e.g. /-T/www-obj/fi1.Ceg.1
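A sketch of such a substitution in Python follows. The exact mapping used by WWWGATE is not given above, so the one here is an assumption: TABLE becomes DL, TH becomes DT, TD becomes DD, TR is dropped, and any attributes of the replaced tags are discarded. The function names are likewise invented.

```python
import re

# Assumed mapping; None means the tag is dropped altogether.
SUBSTITUTES = {"TABLE": "DL", "TH": "DT", "TD": "DD", "TR": None}
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")

def degrade_tables(html):
    """Replace table markup with definition-list markup, leaving all
    other tags and all text untouched."""
    def swap(match):
        closing, name = match.group(1), match.group(2).upper()
        if name not in SUBSTITUTES:
            return match.group(0)
        new = SUBSTITUTES[name]
        return "" if new is None else f"<{closing}{new}>"
    return TAG.sub(swap, html)

def finish_page(html, options):
    """Apply the substitution only when the request carried the -T option."""
    return degrade_tables(html) if "-T" in options else html
```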

There is also a syntax for a generic closing tag that can be used when writing template files. It works by popping the topmost HTML tag that was pushed onto the tag stack when the template file was read and parsed. (For this we have to know which HTML tags are used in pairs and for which a single opening tag suffices. Not recommended...)
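The tag-stack mechanism can be sketched in Python as below. The spelling of the generic closing tag is not given above, so `</>` is used purely as a placeholder, and the list of tags that need no closing tag is an assumption as well.

```python
import re

# Assumed set of tags for which a single opening tag suffices.
UNPAIRED = {"BR", "HR", "IMG", "INPUT", "LI", "P", "DT", "DD"}
TOKEN = re.compile(r"</>|<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")

def expand_generic_closers(template):
    """Replace each generic closer (written </> here) with the closing tag
    of the most recently opened, still unclosed paired tag."""
    stack = []

    def expand(match):
        if match.group(0) == "</>":
            if not stack:
                raise ValueError("generic closing tag with nothing open")
            return f"</{stack.pop()}>"
        closing, name = match.group(1), match.group(2).upper()
        if closing:
            if name in stack:
                # An explicit closer also closes anything opened after it.
                while stack.pop() != name:
                    pass
        elif name not in UNPAIRED:
            stack.append(name)
        return match.group(0)

    return TOKEN.sub(expand, template)

expanded = expand_generic_closers("<TABLE><TR><TD><B>Fax</><BR></></></>")
```

In this example the four generic closers become the closing tags for B, TD, TR and TABLE in that order, while BR, being unpaired, is never pushed onto the stack.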



The History and Future of This Project

When I came to work at Infosto R&D (located at Tampere, Finland) in the beginning of February 1995, there already existed a simple "Webgate" to the data, which at that time was planned to be primarily operated with the proprietary MS Windows based ETEP-client, the Webgate serving only as a "showcase" of the system.

However, that Webgate allowed only a limited set of output operations, and it needed the help of an ordinary Web server to run a small CGI script, through which both the URL and the generated HTML page had to be transmitted with a special RPC protocol, which was quite cumbersome, error-prone and slow. Also, the mapping from the internal representation of the data to the corresponding HTML page was defined using relatively fixed definitions inside Lisp, and thus it was not really configurable.

Even before I came to Tampere, I had already played with the idea of HTML templates to be used for both the input and output and had especially thought about embedding pieces of Lisp code inside HTML tags. I had also done some investigation on the Web about various WWW-RDBMS gateways and interfaces already implemented or developed at that time. Having just a while ago ported my old Kanji Dictionary software to the Web at Cute Communications Ltd, in Helsinki, implementing it with Franz Lisp as a stand-alone Web-server, I already knew the relatively simple techniques for listening to, reading and parsing HTTP-requests directly from the client browsers.

So, having these basic elements in mind, I built a wholly new WWWGATE version from scratch, also adding a new submethod www-matrix for showing GDD ("Global Data Dictionary") with HREF hyperlinks inside the TABLE tags. Here I took the idea of how GDD is presented as a matrix almost directly from the MS Windows based ETEP-client.

After that, it took some time before I had also designed the input template system with adequate validation and authorization procedures, and solved the quirky problem of how to implement the browsing tool for hierarchical classification categories so that it is also smoothly integrated with the rest of the input feed form.

With all this done, WWWGATE was already much as it is nowadays. Since then, I have added a free-text searching capability to the www-hier submethod, dynamic substitution of certain HTML tags, and various tool submethods for administration and development, like www-sql and www-src mentioned earlier. Of course, the integration of WWWGATE with the new distributed database has required some extra tricks to be added. Not to mention the fact that not until late 1995 did I realize how pissy certain proxy gateways are (they corrupt wrongly formatted URLs), which forced me to adopt a new URL syntax for WWWGATE and to rewrite the whole URL parsing system from scratch.

The Future

Now in February 1996 I am heading for new opportunities. When the distributed database system is finally ready and well tested, we will see the sprouting of many new ETEP Trade Point Servers all around the globe, all around the Web. GDD shown by any one server will reflect the situation of all of them.

The same ETEP Server - WWWGATE software is now also being customized for other projects. For example, Infosto Oy will soon open the Web version of its Finnish free ad magazine Keltainen Pörssi, which will offer free submission of offers but only restricted public browsing, with full read rights requiring a paid subscription. That is almost the opposite of how the ETEP server itself works: it allows unlimited browsing and reading for any Internet user, but the leaving of new offers is strictly limited to paying customers.

Martti Ylikoski has done the customization for the Keltainen Pörssi server and he also continues the further development of WWWGATE at Infosto R&D.




References

Note: We are just software engineers and don't decide the actual layouts or graphics used.



Author's Address

Antti Karttunen



Footnotes

[0]
I.e. WWWGATE doesn't need or use any CGI scripts. The only thing for which WWWGATE currently needs the help of "standard" WWW servers is to show a few button icons (small images) and static help pages. Even for those tasks it would be very easy to write a new submethod.

[1]
It might be noted that similarly the ordinary Web (http) servers just reflect the state of their "external" information source, which is usually the underlying file system.

[2]
On further contemplation it occurred to me that HTTP servers also have and may use a state, the universal state that applies to all objects which exist, namely time, constantly ticking away.

[3]
There are some things Web browsers implicitly "remember" for us, without even needing to be told to do so. One example is the Authorization field which, after being asked for the first time, is then always transmitted when the URL begins with the same node and the same port. (Well, how is it really defined in the most recent specs of HTTP?)

That information can then be used by the server as if the user were in a "logged in" state, and if there is a unique object in the database for each user, then that object can be used for storing some facts about the said user (e.g. the preferred language).



