The Evolution of User Interface Markup Languages

Rohit Khare
University of California at Irvine
School of Information and Computer Science

March 17, 2000


Abstract

One of the central, if accidental, innovations of the World Wide Web was the advent of a platform-independent graphical user interface markup language. With HTML's <FORM> tagset in 1993, the Web transcended the reflective information-delivery posture of previous hypermedia research to incorporate the kind of transactional behavior only associated with 'applications' previously. Today, the W3C's XHTML Extended Forms (XForms) effort is considering several designs for a more comprehensive solution to the triple problems of representing: the layout and sequence of interactors; the schema and format of data elements; and the logical topology of user interface decisions. Such a quest recapitulates a broader struggle in the field of software engineering to automatically synthesize user interfaces. This paper, then, lays out the history of user interface innovations on the Web, evaluates the current XForm submissions, and provides a broader theoretical context for further debate.

Keywords: Form markup languages, Digital signatures, XML Schemas

Introduction

In the beginning, Tim spake <ISINDEX>, and it was good. He brought forth unto the world (or at least that patch bounded by the 27-kilometer circumference of the CERN particle accelerator) a new and wonderful scheme for storing all the words and deeds of the multitudes, forever stealing man's innocent pride in claiming idly "I'm sure there's nobody out there who cares about PEZ dispensers as much as I do." Yea, in that moment we saw the flash from the future of all the possible documents great and small that would one day roam the World Wide Web.

But in the meantime, it made a darned convenient phonebook...

Indeed, the first killer app on the Web was an intranet telephone directory for CERN --- not quite as profound as, say, physics paper preprint distribution. Tim Berners-Lee and Bernd Pollermann had just hooked up a web server to a database query engine to extract employee entries in 1991 as the first "intelligent hypertext" on the Web. By this point, the Web already had its URL syntax, while Gopher had already pioneered the searchable index file. Johnny-come-lately Web, too, needed a prompt to say "This page is a searchable index. Enter a search term: __." And thus it was that an unused question mark rattling around in the URL syntax toolbox was patched on the end to demarcate a search specifier: hence http://host/phonebook?Cailliau
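The whole mechanism fit in a single empty element. A minimal sketch (the page content here is invented for illustration):

```html
<!-- A circa-1992 searchable page: the mere presence of ISINDEX tells
     the browser to display its own built-in search prompt -->
<TITLE>CERN Phonebook</TITLE>
<ISINDEX>
<H1>Telephone directory</H1>
Enter a surname to search.
<!-- Submitting "Cailliau" simply re-fetches the same URL with the
     term appended after a question mark:
     GET /phonebook?Cailliau HTTP/1.0 -->
```

Note that the document has no say in how the prompt looks or where the input goes: the element marks the page as searchable, and everything else is the browser's business.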

The lone empty element <ISINDEX>, then, was the primordial step towards today's interactive Web. But the browser we see today -- "a mere multimedia 3270 terminal" -- was not dealt out of the original deck of Web technology. The modern <FORM> element, bristling with drop-downs and radio buttons and file uploads and text boxes only dates back to Dave Raggett's HTML+ proposals from July-November 1993. But, of all the battle fronts in the Browser Wars, this remained the quietest: there's been very little innovation here since the initial Mosaic release. Instead, the long Interregnum we're passing through has been marked by a slew of media- and device-specific interaction markup for handhelds, voice, paper forms, and so on. This column, then, traces the haphazard evolution of User Interface Description Languages (UIDLs) for the Web to this point; in the next issue we'll detail the challenges facing W3C's XForms effort to reengineer it.

A Turing-Complete Novel?

Putting aside the torrent of names and historical claims, let's consider the philosophical limits of a Turing-complete hypertext. Suppose you begin at a page representing a chessboard in its opening state. Clicking on any chess piece traverses a hyperlink to another document, namely another chessboard depicting the result of that move. Is it possible, then, to claim that "You Are Here" at a single point within the near-infinite hypertext of all possible chess games?

What if the computer takes the next move, with, say, a CGI script choosing the next link to be traversed? If you lose every time you start browsing this Borgesian "Never-Ending Library" of chess games, are you interacting with a book or a grand master? Is the question easier to answer for the case of Amazon.com?

While the hypertext research community had long recognized the potential for computation over a hyperweb ("What Problem documents are not linked to Solution nodes?") and within a hyperweb ("Select the link target closest to your hometown"), there was less appreciation for computation as a hyperweb in the '80s. Interactive components within compound document architectures were barely becoming practical on desktop PCs at the genesis of the Web. Remember Dynamic Data Exchange (DDE) and, later, Object Linking and Embedding (OLE) for Windows? Or Apple's publish-and-subscribe? Or even more exotic promises from Taligent and NeXTstep?

Instead of interactivity at the client -- a dream that Java applets attempted to resuscitate in 1995 -- Web tools merely submitted input strings back to server-side applications to "compute the next document" in the hypertext. Soon, the portion of the URL after the ? adopted an internal syntax for lists of argument name/value pairs, as in http://host/phonebook?LastName=Cailliau&FirstName=R. Clickable imagemaps even went so far as to adopt a convention for naming the x and y coordinates of the click (though client-side imagemaps are the only significant case of server-side processing migrating back to the client in Web history). Over time, the argument vectors grew so long they were split off as the HTTP payload of a new request method, namely POST. Though POST is nominally capable of submitting any MIME object (as for file-upload forms), the most common format by far has remained x-www-form-urlencoded -- forcing even the simplest Web server application to understand multiple native character sets, SGML entities and quoting, and URL character escapes to extract a single form value.
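To make the two submission paths concrete, here is a hypothetical phonebook form and the requests it generates (field names and values are illustrative):

```html
<!-- With METHOD=GET, the name/value pairs are URL-encoded into the
     query string:
     GET /phonebook?LastName=Cailliau&FirstName=R HTTP/1.0 -->
<FORM ACTION="http://host/phonebook" METHOD="GET">
  Last name:  <INPUT TYPE="text" NAME="LastName">
  First name: <INPUT TYPE="text" NAME="FirstName">
  <INPUT TYPE="submit" VALUE="Search">
</FORM>
<!-- With METHOD=POST, the very same pairs travel as the request body:

     POST /phonebook HTTP/1.0
     Content-Type: application/x-www-form-urlencoded

     LastName=Cailliau&FirstName=R

     Note the layered escaping a server must undo: a value like
     "O'Brien & Co." arrives as LastName=O%27Brien+%26+Co. -->
```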

Fields and Scripts and Bells and Whistles, Oh My!

However, the power and elegance of the Web's remote invocation model (ahem) wasn't the reason it took off. HTML's new <FORM> tag became the first successful cross-platform UIDL, allowing Web developers to code once and see it rendered automatically under a litany of GUI windowing systems and widget sets -- even text-only line-mode browsers. Of course, the more common the denominator, the lower it can be. To this day, we have pop-up menus, but not sliders or drag-and-drop, to say nothing of a dynamic purchase-order template that could 'grow' new line-items on the fly.

Before we go off designing übersolutions, though, let's trace how we got here. As the emerging HTML community coalesced on the www-talk mailing list and hammered out HTML+ (which became HTML 2.0), NCSA Mosaic ran away with the show. After all, running code is at least as important as rough consensus, and the NCSA team built off a rich-text view object for X-Windows that made it possible to roll out new <INPUT> types easily. By its January 1994 2.0 release, Mosaic had defined the modern web "widget set": text areas, fill-in fields, checkboxes, radio buttons, pick-lists, and pop-up lists, not to mention a now-lost input type called 'scribble' for pen-drawn images in the Jot media type (remember Go, Eo, and Windows for Pen Computing?).
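That entire Mosaic-era widget set fits comfortably in one form. A sketch (the order form itself is invented for illustration):

```html
<FORM ACTION="/order" METHOD="POST">
  Name: <INPUT TYPE="text" NAME="name" SIZE="30">      <!-- fill-in field -->
  <TEXTAREA NAME="comments" ROWS="4" COLS="40"></TEXTAREA> <!-- text area -->
  <INPUT TYPE="checkbox" NAME="gift" VALUE="yes"> Gift wrap
  <INPUT TYPE="radio" NAME="ship" VALUE="air"> Air
  <INPUT TYPE="radio" NAME="ship" VALUE="ground"> Ground
  <SELECT NAME="state">                                <!-- pop-up list -->
    <OPTION>CA <OPTION>MN <OPTION>NY
  </SELECT>
  <SELECT NAME="extras" MULTIPLE SIZE="3">             <!-- pick-list -->
    <OPTION>Insurance <OPTION>Tracking <OPTION>Saturday delivery
  </SELECT>
  <INPUT TYPE="submit" VALUE="Order">
</FORM>
```

Six-odd years later, this is still essentially the complete vocabulary at a Web author's disposal.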

Instead of new INPUT TYPEs, the baton of innovation passed on to scripting languages and cookies. The demand for even minimal client-side input verification and assistance inspired what became JavaScript (however minimal its connection to Java proper). Rather than including spreadsheet-like declarative formulae, the trend was toward Turing-complete programming languages running within the context of the web page and browser. That, in turn, highlighted the need for state management mechanisms to refer back to earlier transactions. In the earliest days, HIDDEN input values were used to convey the entire state back and forth within a continuous exchange of forms; cookies, by contrast, identified browsers over very long periods, since they are sent with each access to a given domain. This was especially important as dialup consumers with dynamically allocated IP addresses obsoleted customer tracking by IP address.
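The two state-management styles look like this in practice (field names and values here are illustrative):

```html
<!-- State by value: each page of a multi-step exchange echoes back
     everything gathered so far as HIDDEN fields -->
<FORM ACTION="/checkout/step3" METHOD="POST">
  <INPUT TYPE="hidden" NAME="cart"   VALUE="sku1234:2,sku5678:1">
  <INPUT TYPE="hidden" NAME="shipto" VALUE="R.+Cailliau,+Geneva">
  Card number: <INPUT TYPE="text" NAME="cardno">
  <INPUT TYPE="submit" VALUE="Continue">
</FORM>
<!-- State by reference: the server instead replies once with
     Set-Cookie: session=a3f9c2; domain=.example.com
     and the browser volunteers that token with every later request
     to the domain, long after this form is gone -->
```

Hidden fields keep the server stateless but grow with the transaction; cookies stay small but turn the browser into a long-lived pseudonym.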

The net effect was that the Web reshuffled the long-standing Model-View-Controller paradigm for UI development. Consider W3Kit, developed at the late Geometry Center of the University of Minnesota back in 1994. Paul Burchard's system migrated a desktop GUI application that tiled users' sketches in 2D space (Kali-Jot) onto the Web by using an inline GIF for the View, the browser's scribble input as Control, and hidden fields to pickle the Model. As users changed the algorithm's parameters, each action submitted the form back to the server, which repopulated a fresh "real" Model object -- the one with the mathematical algorithms -- and re-rendered the new application view to ship back to the user.

Interregnum

W3Kit defined the genre of split-interface Web applications that persists to this day. Web browsers became a very basic virtual machine for rendering Views (as HTML documents), with only a few Controls. The Web server maintained the Model, working around HTTP's stateless submission of form input elements by passing along the model's context by value (in hidden fields) or by reference (via cookies or HTTP login).

Since then, innovation has been limited to layout, usability, and a very few new INPUT TYPEs. Visually, forms have evolved alongside the rest of HTML with new fonts, colors, spacing, and alignment capabilities from Cascading Style Sheets (CSS). Structurally, HTML 4.0 introduced a few additional decorations to aid usability and accessibility for alternate platforms. Tags like LABEL and LEGEND help screen-readers associate controls with explanations, while OPTGROUP dividers distinguish segments of a menu. Furthermore, TABINDEX lets designers specify a default navigation order for filling in fields, and Focus and Blur event handlers fire when the user enters or exits the context of an INPUT element.
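The HTML 4.0 decorations, gathered into one sketch (the form and the highlight/validate handler names are invented for illustration):

```html
<FORM ACTION="/reserve" METHOD="POST">
  <FIELDSET>
    <LEGEND>Passenger</LEGEND>                  <!-- names the group -->
    <LABEL FOR="fname">First name</LABEL>       <!-- ties text to control -->
    <INPUT TYPE="text" ID="fname" NAME="fname" TABINDEX="1"
           onfocus="highlight(this)" onblur="validate(this)">
    <SELECT NAME="meal" TABINDEX="2">
      <OPTGROUP LABEL="Standard">               <!-- menu segment -->
        <OPTION>Chicken <OPTION>Pasta
      </OPTGROUP>
      <OPTGROUP LABEL="Special">
        <OPTION>Vegan <OPTION>Kosher
      </OPTGROUP>
    </SELECT>
  </FIELDSET>
</FORM>
```

None of this adds a new interactor; it only annotates the 1994 set so that screen-readers, keyboard users, and scripts can navigate it more gracefully.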

But there has been only one successful new input widget, the file-upload extension. RFC 1867 defines a new MIME type for POSTing forms, multipart/form-data. Each file uploaded is included as a separate MIME body part after the existing (still URL-encoded) data fields. Of course, this runs into the same problems that made file-PUT problematic enough to warrant the Web Distributed Authoring and Versioning (WebDAV) extensions to HTTP, including forcing the server to silently shut off clients posting larger files than it is prepared to receive. More typical is the case of <KEYGEN>, a deservedly-obscure Netscape-specific input element instructing the browser to generate keying material and public-key pairs for certificate submission.
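An RFC 1867 upload form, and a sketch of what arrives at the server (boundary string and file contents illustrative):

```html
<FORM ACTION="/upload" METHOD="POST" ENCTYPE="multipart/form-data">
  Title: <INPUT TYPE="text" NAME="title">
  File:  <INPUT TYPE="file" NAME="doc">
  <INPUT TYPE="submit" VALUE="Send">
</FORM>
<!-- The submission becomes one MIME body part per field:

     Content-Type: multipart/form-data; boundary=XYZ

     --XYZ
     Content-Disposition: form-data; name="title"

     Quarterly report
     --XYZ
     Content-Disposition: form-data; name="doc"; filename="q3.doc"
     Content-Type: application/msword

     ...raw file contents...
     --XYZ--                                                        -->
```

Note that the ordinary text field rides along as its own body part; only the enclosing envelope changed, not the widget set.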

The most confounding case, though, is the inability to accommodate forward-incompatible input types. Consider moving the HTTP username & password dialog box from being a browser window to an HTML form (and thus allowing the Webmaster to control its presentation and explanation). This required five new parameters: "AUTHUSER, AUTHSECRET, AUTHLOAD, and AUTHUNLOAD, and a SELECT element with the special type AUTHREALM." But:

[Using FORM] left open the possibility that such a form might be sent to the server as a GET or POST, exposing the credentials. Since AUTHFORM will not be understood by existing software, the various INPUT elements should not be rendered as a form, and this problem does not occur. A similar case might be made for using new elements where this proposal uses types on INPUT elements.

In fact, "innovation" in web interactivity (if it can be called that) has proceeded outside of HTML Forms entirely: VoxML for voice browsing, Wireless Markup Language for cellphones, Java applets, and even custom UI markup languages based on traditional paper forms automation formats.

XForms: A New Hope

In other words, FORMs recapitulate the central dilemma XML began with: How can we introduce new ontologies (tagnames, input types) and know their grammar rules (document type definitions; form input validation) while easing interoperability and reuse of domain-specific ones, which are tailored to particular applications, media, or devices?

So in due course, the World Wide Web Consortium has applied the same solution pattern. Now that basic HTML 4.0 capabilities have been translated into an XML-validatable series of modules as XHTML, the MarkUp Working Group has established an XForms subgroup to propose further innovations.

Naturally, there are already several contending entrants for the title of 'next-generation forms markup language', so the group's first product was a requirements statement (see Table 1). As their public overview page at http://www.w3.org/MarkUp/Forms/ suggests, these requirements separate into three layers of concerns (although what follows is my own opinion and analysis, not the group's). First, the Presentation layer addresses rendition of interactors, whether as GUI widgets, voice prompts, or fax-back paper forms. Second, there is a Logical layer governing the order of form field fill-in, multipage and sequenced forms, and scripting for input validation. Finally, the Data layer adds more structure and coherency to existing text-string-only values by reusing other schemas.

Interoperability and Accessibility

Separation between purpose and presentation
Definition of form functionality in XML
Device and application independent navigation
Device and application independent event syntax

Presentation

Alignment with existing and emerging presentation mechanisms
Enhanced visual possibilities for form controls
Custom form controls

Forms Logic

Field calculations
Integration with the XML DOM

Interaction

Richer client/server interaction mechanisms
Security and authentication
Broader range of input devices
Preserving the current state of a form

Internationalization

Support for various languages and character sets
Region-specific data formats
Region-specific common field groups

Data types

Input validation
Field and data dependencies
Defining fields or fieldsets for arbitrary instances
Splitting a form across multiple pages
Forms oriented addressing scheme

Table 1. Excerpts from the W3C XForm group's XHTML™ Extended Forms Requirements Working Draft

It's hard to believe it's been over six years since HTML Forms were invented. Six calendar years, not Web years! But the dream hasn't died: the effort to smarten up forms is just the latest incarnation of the Software Engineering vision of automated interface design. The kinds of issues XForms faces are grounded in past SE research at each layer -- and the same kinds of traps and ratholes. Furthermore, the challenge is not just to automate human interface construction for the universe of Web-accessible devices, but to create automatable interfaces suitable for programmatic reuse on the "Semantic Web", as W3C terms it.

In the next section, we'll recap the theoretical roots of the problem and present a detailed analysis of several entrants' divergent strategies: Formsheets, as declarative extensions that can make any tag interactive; Forms Markup Language, to generate procedural forms; and XML Forms Description Language, to replicate the role of paper precisely. And of course, we'll also have to ask what makes us think radical innovation will actually be adopted by the Web community -- or Is Worse Really Better?

XForms: New Challenges

There's a new space race of late: a quest to build the world's smallest Web server. The current record holder is the size of a match-stick head. iPic is a mere quarter cubic centimeter, yet includes a full TCP/IP stack and HTTP server!

But what about the world's thinnest Web client? Would you believe less than .01 millimeter thick? Xerox PARC has turned an ordinary sheet of paper into a functional Web browser. They recently demonstrated Web access through a fax machine. In their demo, the system takes a regular HTML form, prints it out with gridlines and checkboxes for its input fields, faxes it to a field worker, applies Optical Character Recognition (OCR) to the filled-in form, submits the resulting HTTP transaction to the original website, and faxes back the printed results.

This little hack has its limits, of course. Consider using it at a travel website that 'helpfully' offers hundreds of airport locations in a pop-up list for the origin. And another copy of the same list for choosing the destination. Putting aside the wasted bandwidth of transmitting the world airport database (twice!), there's no way for this poor fax-back translator to recognize it should just give up and replace this pages-long pick list with a three-letter airport code input.

Today's Web FORMs are hopelessly tied to the original GUI of NCSA Mosaic for X Windows, circa 1994. That Xerox's "thinnest client" works at all is due only to the 2D graphic abstraction it shares with current GUI browsers. Stray much further from the Windows, Icons, Menus, and Pointer (WIMP) paradigm, and HTML FORMs fall over and can't get up.

For example, one of the standard canards of our Wonderful Twenty-First Century™ is that more people will soon access the Web from a cellphone than from a PC. That certainly could be true -- but not by dint of compressing a WIMP interface into a four-line display!

Even more people could access it from an ordinary phone by Interactive Voice Response (IVR). But how would our robot concierge know what order to inquire for the origin and destination airport? Even more pointedly, how will it realize they cannot be the same airport?

Send in the… Experts?

Designing completely abstract user interfaces for the Web requires addressing three separable aspects: Presentation, Logic, and Data. Our virtual assistant needs to know how to 1) prompt the user, 2) do so in a specific order, and 3) recognize spoken or typed entries as valid airports. The first layer, Presentation, addresses rendition of interactors, whether as GUI widgets, voice prompts, or paper blanks. Second, the Logical layer governs the order of form field fill-in, multipage and sequenced forms, and scripting for input validation. Finally, the Data layer adds more structure and coherency to existing text-string-only values by applying richer schemas (types).

This kind of coordinated evolution is precisely the mission of the World Wide Web Consortium, whose XForms Working Group (WG) is tackling these interdependent issues. While XHTML™ brought existing HTML 4.0 usage into XML compliance, XForms was specifically chartered to innovate solutions to support handheld, television, and desktop browsers; deploy richer user interfaces to meet the needs of business, consumer and device control applications; improve internationalization; and decouple presentation, logic, and data. It also has more concrete engineering goals: supporting more structured data formats and multi-page forms; integrating well with other XML tag sets; and supporting suspend-and-resume of partially-filled-in forms.

Broadly construed, the XForms subgroup is tackling a long-cherished dream of Software Engineering: automatic user interface construction. Compiling an abstract functional interface into a working UI has been tackled in many ways; stepping back to understand that context will help us better evaluate specific XForm contenders. Specifically, we'll look at proposals for Formsheets, which add interactivity to any existing tag just as stylesheets add presentation hints; Forms Markup Language (FML), which generates procedural forms; and XML Forms Description Language (XFDL), which replicates the role of paper forms precisely. Whether the whole Web will be upgraded to any of these approaches is another question entirely…

Presentation: Device-Independent Widget Sets

The same write-once-run-anywhere rhetoric championed for the Java Virtual Machine (VM) applies to entire Web browsers as well. While the domain of discourse is pixels in the former and HTML INPUT elements in the latter, both are late entries in a long timeline of user interface VMs. X widgets, the Motif toolkit, the NeXTstep AppKit, the Macintosh Toolbox… these are only a few examples of User Interface Management Systems (UIMS) offering an abstract interactor set to software developers. By the early 90's, UIMS research abstracted one more step above them to offer multi-toolkit interoperability. Tools like OPENSTEP or UC Irvine's Chiron-2 system bound virtual interactors to toolkit-specific peer objects on the fly. Allocate a scrolling text pane, and such meta-toolkits would bind to whatever the local window system's conventions were (scrollbar on the left or right? proportional or fixed? pixel-at-a-time or line-at-a-time?).

Accessibility concerns drove complementary research that inferred presentation rules from actual renderings. William Gaver's SonicFinder (1989) added auditory feedback to mouse gestures in the Macintosh Finder interface. Even more ambitious, Georgia Tech's Mercator (1991-4) system automatically transformed X event streams into interactive auditory interfaces for the blind. Today, Everypath.com could also be cast in the same light, by applying an intelligent external model to interpret a stream of web pages for phones, pagers, palmtops, and television. As theoretical grounding for such inferences, CMU professor Brad Myers famously proposed seven fundamental affordances of mouse-and-keyboard direct manipulation GUIs in his Interactor Model (1990):

  1. menu-interactor
  2. move-grow-interactor
  3. new-point-interactor
  4. angle-interactor
  5. text-interactor
  6. trace-interactor
  7. gesture-interactor.

Web browser FORMs today provide only two: text and menus. To this day, HTML doesn't offer sliders or other continuous range pickers. At the other extreme, it already hard-codes distinctions between pop-up and pick lists -- distinctions that can't even be expressed in voice or paper renderings! As we discussed last issue, HTML 4.0 and the latest User Interface extensions to Cascading Style Sheets (CSS3-UI) do patch up around the edges of this model. For example, form authors can now explicitly articulate the order to tab between fields; indicate LABEL text associated with a particular input control; and can change appearance on gaining or losing user focus as the "active field".
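The hard-coded pop-up/pick-list split is baked into a single attribute. Consider (option values illustrative):

```html
<!-- Without SIZE, SELECT renders as a pop-up menu -->
<SELECT NAME="origin">
  <OPTION>LAX <OPTION>SFO <OPTION>JFK
</SELECT>

<!-- With SIZE (and MULTIPLE), the same element becomes a scrolling
     pick-list -- a distinction that means nothing read aloud over
     the phone or printed on a fax-back form -->
<SELECT NAME="extras" SIZE="3" MULTIPLE>
  <OPTION>Aisle seat <OPTION>Vegetarian meal <OPTION>Extra bag
</SELECT>
```

The markup records a rendering decision, not the author's intent ("choose one airport" versus "choose any number of extras"), which is exactly what a non-GUI renderer needs.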

XForms has an opportunity to raise the level of discourse for Web UIs to reason not only about the affordances of GUI interactors, but also, in conjunction with the Web Accessibility Initiative (WAI), to accommodate many other limited-interface situations. That means designing XForms for a UI virtual machine running on everything from cellphones to TV screens -- and, crucially, invisible systems, without humans in the loop at all. Forms, after all, are becoming the default Application Programmer's Interface (API) to Internet information. Tools like webMethods' Web Interface Definition Language (WIDL) allow new applications to reuse, say, FedEx's package tracking form. Providing richer interactor specifications is like annotating a header file to aid program reuse. Such specifications can help infer the range of legal inputs and expected outputs or exceptions that could be raised.

Logical: Process Sheets and Scripting

Abstracting up one more layer brings us to a discussion of input sequence, validation, and state management. Most Web forms are embedded in a larger process: selecting the city pair is only the first step in a series in order to buy an airline ticket. Furthermore, the Web model splits some of the processing for input validation with the client, using scripting languages and the Document Object Model (DOM) APIs. That at least allows some fields -- for example, sales tax -- to be calculated on the fly.

Validating that a three letter combination is indeed an airport code, on the other hand, can only be done by constraining the choice through a massive popup list, or by sending it back to the server for verification in a multi-step Web transaction. To date, the only way to manage the state of such a partially-complete form (if we cast the entire multipage airline reservation as a single XForm) is to send the entire state of every input field back to the server every time.

Explicitly articulating the logic behind these processes can make Web forms more powerful and portable. The XForms WG began with an aim to replace simple calculations done in JavaScript today with declarative, spreadsheet-like formulae. Knowing the role of various fields such as Item, Quantity, and Price can also let the browser dynamically add additional "rows" to a purchase order form.
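As a purely hypothetical sketch of what such declarative, spreadsheet-like markup might look like (the WG had not settled any syntax at this writing; every element and attribute name below is invented for illustration):

```html
<!-- A purchase-order fragment: subtotal and total are derived
     fields, declared rather than scripted, and the repeatable
     group tells the browser it may 'grow' new line-item rows -->
<group name="lineitem" repeatable="true">
  <input name="item"  type="string"/>
  <input name="qty"   type="integer"/>
  <input name="price" type="currency"/>
  <output name="subtotal" calculate="qty * price"/>
</group>
<output name="total" calculate="sum(lineitem/subtotal)"/>
```

The point is not the syntax but the division of labor: the browser recomputes derived fields locally, and only genuinely new information ever travels back to the server.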

Beyond tracking the logical dependence between individual data elements, though, the XForms WG aims to mark up the presentation dependence of groups of data elements. This will allow browsers to present multipane, tabbed dialog boxes, or multipage forms from a single XHTML transfer. Voice browsers could use this information to disambiguate "barge-in" speech recognition when the user starts "filling in" a field before the voice prompt or menu is completed. Knowing about field subgroups could also allow interactive validation, such as sending a completed Zip Code field back to the Web server to fill out the companion City name field.

Data: Client-awareness of Data Types

Suppose I'm ready to submit my airline reservation. I've used the XForm to construct an XML document containing groups of fields I've filled in; perhaps even a few inputs in the airline's own specific XML namespace. Can I expect to send the subpart representing my itinerary to my friend without also including the credit card portion? At the same time, the airline may expect the entire form submission to be digitally signed, to ensure that we agree on the exact specifics of the ticket I'm about to buy.

These are questions that require inference of the actual data types in use. Today's HTML FORMs reduce every kind of input type to a text string. Dates, prices, addresses, names… all illusions created by the page's author with natural (human) language. XForms will need to interoperate with other mechanisms to teach computers what various piles of XML might actually "mean." The XML Schemas effort is pinning down some concrete forms for encoding basic data types (integer, float, time, etc) and basic grammatical rules ("Every <ADDRESS> must contain a <POSTALCODE>"). Completing abstractions such as "Reservation," though, calls upon even more sophisticated metadata management. Resource Description Framework (RDF) is the technology W3C looks to for encoding semantics such as "origin and destination airport cannot be the same."
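A sketch of the grammatical-rule half, in the style of the circa-2000 XML Schema working drafts (the syntax shifted between drafts, so treat this as illustrative):

```html
<!-- "Every ADDRESS must contain a POSTALCODE" as a schema fragment;
     the datatype vocabulary (string, etc.) comes from the Schemas
     datatypes draft -->
<element name="ADDRESS">
  <complexType>
    <sequence>
      <element name="STREET"     type="string" minOccurs="1"/>
      <element name="CITY"       type="string" minOccurs="1"/>
      <element name="POSTALCODE" type="string" minOccurs="1"/>
    </sequence>
  </complexType>
</element>
```

Constraints that cross elements ("origin and destination airport cannot be the same") fall outside such grammars, which is precisely why W3C looks past Schemas to RDF for them.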

When a form designer can use this data layer to clearly indicate the type of input required (beyond just naming the field something heuristic like "expiryDate"), then it's also clearer where to annotate various inputs as 'secure.' Just as we classify cookies into two security classes today, we can then ensure parts of forms only flow over secure or public network connections.

Who's the Mastermind here?

This three-layered vision fulfills the Software Engineering dream of automatic user interface management. The literature relating to this dream dates from the days of automating screen layout for text-terminal access to mainframe databases, up through gesture recognition by demonstration for virtual-reality environments. However clearly programmers can "see" the logical structure of the application and the role of user-supplied inputs at each stage, reducing that lattice to a clear sequence of commands and a considerably simpler end-user model of the process remains a painstaking trial-and-error proposition.

Not for lack of trying, though. The rise of WIMP GUIs in the 1980s arguably drove the commercial adoption of event-based, object-oriented programming as well as frameworks embodying both declarative and model-based UI development methodologies. First, the Mac popularized the event loop, putting the user truly in control of the program. Once rewritten as a series of event handlers onMouseDown, onKeyDown, and so on, it was a short hop to the object-oriented lessons of Smalltalk-80 and thence to C++, Objective-C, Common Lisp, and the rest.

Developers using the Model-View-Controller (MVC) pattern leveraged platform-specific Control and View widgets, as best embodied by NeXT's AppKit. Using its InterfaceBuilder, developers could visually wire a program's Model methods to controls such as sliders and buttons. The act of drawing a link to a target object and the action to be performed upon it declared a relationship that was stored along with layout geometry in UI layout files. Separating the "program" from its UI this way, even end-users could go back and edit the GUI of published applications (to localize it, for instance, or add keyboard shortcuts). Advanced research tools of this ilk could even apply externalized UI style guidelines and constraint-based layout engines to automatically synthesize, evaluate, and select dialog designs.

The Common Object Request Broker Architecture (CORBA) was supposed to be the revolution after OOP languages. Its Interface Definition Language (IDL) abstracted away the details specific to particular OO languages, operating systems, processors, and network topologies. The new dream was to cleave the programmers' and UI designers' lives at that interface. Suitably annotated IDLs would not only indicate how to setOriginAirport(), but also that it was to be invoked before setDestinationAirport() and the parameter itself was a typed IATACode string three characters long.

In the early 90s, Pedro Szekely's group at USC's Information Sciences Institute built MASTERMIND along these lines. It combined the utility of prior dialog design tools with annotated interface definitions to automatically synthesize graphical input and presentation for a given application. As they described it:

In the model-based paradigm, developers create a declarative model that describes the tasks that users are expected to accomplish with a system, the functional capabilities of a system, the style and requirements of the interface, the characteristics and preferences of the users, and the I/O techniques supported by the delivery platform. Based on the model, a much smaller procedural program then determines the behavior of the system.
There are several advantages to this approach. The declarative model is a common representation that tools can reason about, enabling the construction of tools that automate various aspects of interface design, that assist system builders in the creation of the model, that automatically provide context sensitive help and other run-time assistance to users.

If the NeXTstep AppKit used by Tim Berners-Lee to develop the first Web browser could be said to underlie today's HTML FORM tag, Mastermind's complaints also ought to ring true to today's Web authors:

Most applications have interface requirements that go far beyond the menus and dialogue boxes that can be constructed using interface builders:
      1. Data with complex structure
      2. Heterogeneous data
      3. Variable amounts of data
      4. Time varying data

A musical notation editor is a fine example of all four objections: the complex visual form of a staff and its unique fonts; the different kinds of notes and their interrelationships (e.g., chords) across several data formats; the need to incrementally view a few bars out of a whole database; and the synchronization of the melody as symbols, commands to the synthesizer, and the output waveform. It's all quite beyond the range of even a fifth-generation Web browser, to say nothing of the additional assistance model-based UI tools offer in automating Undo, Help, and Internationalization facilities.

Forming a consensus

Not to say that XForms are intended to compose symphonies inside a Web browser! There are several candidate technologies for the WG to choose amongst, none of which have the expressive power to tackle that musical UI problem. We can still use it as a guide to understanding the various approaches on offer.

FormSheets. With a custom XML tagset for musical scores, a separate XSL Transformations (XSLT) stylesheet could render a graphical interface, while Formsheets would indicate which elements of the score were editable and would submit collected score changes back to the server.

To understand Formsheets, consider that to migrate from the specifics of HTML to arbitrary new XML tagsets, the XLink working group had to devise a mechanism for expressing the hypertext linking semantics of the <A> tag. Instead of dedicating a specific tag to that task, they allow designers to add xml-link attributes to any new tag to indicate that, say, <FLIGHT> could be a link source or anchor.
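Under the early XLink working drafts, such an attribute-based link looked roughly like the fragment below (the URL and element content are invented for illustration):

```xml
<!-- A domain-specific tag made into a hyperlink source via
     attributes, in the style of the early XLink drafts;
     the URL here is invented -->
<FLIGHT xml:link="simple"
        href="http://example.com/flights/UA863">
  United 863 to Sydney
</FLIGHT>
```

The point is that `FLIGHT` remains a domain tag with its own meaning; the linking behavior is layered on without reserving a dedicated anchor element.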

Anders Kristensen's WWW8 paper applied the same technique to turn any XML tag into a form input. Upon detecting a tag with an xf:form attribute, an external stylesheet could render the appropriate GUI or voice or printed input controls for that tag. Later, a Formsheet would be run over the current state of the XML document to produce a subset of name-value pairs, or more intriguingly, a structured XML tree to submit back to the server.
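A sketch of the same move for forms, returning to the musical-score scenario: only the xf:form attribute comes from Kristensen's proposal, while the score tagset and attribute values below are invented for illustration:

```xml
<!-- A music-score tagset doubling as form input: the xf:form
     attribute is from Kristensen's Formsheets proposal; the
     score elements and values are invented for illustration -->
<score xf:form="editable">
  <bar number="12">
    <note pitch="C4" duration="quarter" xf:form="editable"/>
    <note pitch="E4" duration="quarter" xf:form="editable"/>
  </bar>
</score>
```

A stylesheet decides how each editable tag is rendered, and a Formsheet later walks the edited document to extract name-value pairs -- or a structured subtree -- for submission.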

"We believe that specifying the core properties of forms independently of specific data and layout elements is a big advantage as it means that the same basic mechanism can be used regardless of the exact nature of the XML language at hand. This is analogous to how linking, stylesheets, and scripting are defined independently of the languages that use them, and this approach generally leads to better and more modular standards."

Paper Forms. If that seems too abstract, both XML Forms Description Language (XFDL, by PureEdge.com) and XML Forms Architecture (XFA, by JetForm) start with a detailed visual representation mirroring paper forms and add sophisticated formulas, logic, and digital signature security. As Kristensen described it:

"The approach taken in XFDL is very different from that of XForms. XFDL is an XML application and defines a fixed set of form elements, structural markup, GUI display elements, and scripting capabilities all within the same language. The emphasis seems to be on defining a markup language (form controls and other markup) which allows for the construction of visually pleasing on-line forms and which is powerful enough to faithfully reproduce their paper-based equivalents. Additionally XFDL adds scripting functionality (for checking form values on clients) through its own scripting language. It doesn't seem to address form value construction or typing."
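The flavor of the paper-forms camp can be suggested with a fragment like the following. This is only in the spirit of XFDL: the element names approximate the published drafts, and the exact syntax (particularly for computes) differs in detail:

```xml
<!-- In the spirit of XFDL: a visually anchored form with a
     computed field. Element names approximate the drafts;
     actual XFDL syntax differs in detail. -->
<XFDL>
  <page sid="PAGE1">
    <field sid="QUANTITY"><label>Qty</label><value>2</value></field>
    <field sid="PRICE"><label>Unit price</label><value>9.95</value></field>
    <field sid="TOTAL" compute="QUANTITY.value * PRICE.value">
      <label>Total</label>
    </field>
  </page>
</XFDL>
```

Note how presentation, data, and logic all live in one fixed vocabulary -- the exact opposite of the Formsheets bet on separable layers.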

Interactive Forms. Form Markup Language (FML, by Stack Overflow AG) follows its own Third Way, adding a few new modules to existing HTML forms like multiple panes, reusable templates (remember that wasted duplication of the airport list?), popup alert panels, and calculated fields. The nifty trick to FML is that their Mozquito Factory authoring tool can 'compile' it back into Dynamic HTML and JavaScript so it's immediately usable on current browsers.

As Sebastian Schnitzenbaumer, its inventor, put it, "Key problems for FML to solve are the definition of dynamic forms, online wizards and web applications that cover multiple screen pages but originate from a single FML document, including input validation, navigation, event handling, template management and run-time calculations."
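The compile-to-DHTML trick means a calculated field an FML author writes declaratively can land in the browser as garden-variety HTML 4 plus JavaScript. The compiled output might look something like the following (this output format is invented for illustration, not Mozquito's actual emission):

```html
<!-- What a compiled calculated field might look like as plain
     DHTML: ordinary HTML 4 form controls plus script, nothing
     a 1999-era browser couldn't handle -->
<form name="order">
  <input type="text" name="qty"   onchange="recalc()">
  <input type="text" name="price" onchange="recalc()">
  <input type="text" name="total" readonly>
</form>
<script type="text/javascript">
function recalc() {
  var f = document.order;
  f.total.value = Number(f.qty.value) * Number(f.price.value);
}
</script>
```

The design choice is pragmatic: authors get richer declarative markup today, without waiting for a single browser to ship native support.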

Can XForm transform the Web?

Isn't it convenient that the World Wide Web Consortium is doing all this heavy thinking for us? Perhaps -- if the XForms WG has goals clear enough to ever converge on a solution. True, they're not going down the rathole of "representing GUIs in XML," as the XML User Interface Language (XUL) does for Mozilla's own look & feel. But pursuing the dream of cleanly separating Presentation, Logic, and Data across the wide, barren plateau Software Engineering research has already mapped out could be equally futile.

One of the only lessons a degree in Economics is good for is that there are no $20 bills lying on the sidewalk. If model-based user interfaces were such a great idea, we'd already be using them. The XForms WG is struggling for clarity because it is trying to standardize and innovate simultaneously, a difficult balance indeed for an organization chartered to "Lead the Evolution of the Web."

And evolution proceeds by fits and starts -- consider the sheer list of other W3C technologies XForms must account for: XHTML Modularization, XML Schemas, Web Accessibility Initiative, Internationalization, Style Sheets, Synchronized Multimedia, Scalable Vector Graphics, Document Object Model, common scripting languages (ECMAScript). It's hard enough to keep score on the home game… even before the committee tackles newer mandates, such as synchronizing form data among multiple devices or digital signature requirements.

Ultimately, the power to migrate to a new forms language is in Web authors' hands -- and if hand-coding a new-fangled XForm requires learning even a fraction of all these technologies simultaneously, it can't get anywhere. All the browser support in the world isn't going to make some of these approaches any more legible to an HTML hacker.

It's hard to believe that technology so central to the Web's success could be so static. Jim Whitehead recently presented an analysis of how the Web outstripped other hypertext tools in the early '90s. Its success was governed by the Network Effect: the increase in utility of the whole system with every new reader and publisher who chose to use HTTP, HTML, and URLs. Open publishing, decentralized control, anonymous surfing; it would appear that freedom (bordering on anarchy) was the Web's fundamental difference compared to HyperCard or Xanadu. Instead, Jim argued "Once Gopher and the Web came into direct contact, the richer content of the Web was far more capable of generating network effects than the more strictly controlled, yet more simple Gopher user interface."

That is to say, the Web won because the dominant GUI browsing idiom controlled the user experience so thoroughly that authors could expect to use the same fonts, layout, color, and input widgets across every platform from workstation to wristwatch. Mosaic was surely richer than Gopher, but it has proven just as tight a straitjacket around user conceptions of how to interact with this medium.

Sounds to me like an opening for the Next Big Thing…

References

  1. [BPS+98] T. Bray, J. Paoli, and C. M. Sperberg-McQueen, eds. "Extensible Markup Language (XML) 1.0", World Wide Web Consortium Recommendation, 1998.
  2. SonicFinder by William Gaver
    http://www-crd.rca.ac.uk/~bill/refs/sonicfinder.rtf
  3. iPic Web Server
http://www-ccs.cs.umass.edu/~shri/iPic.html
  4. [KR98] R. Khare and A. Rifkin. "The Origin of (Document) Species," in Proceedings of the 8th International WWW Conference, published as Computer Networks and ISDN Systems, Volume 30 (1998), issues 1-7.
  5. Formsheets by Anders Kristensen
    http://www8.org/w8-papers/1c-xml/formsheets/formsheets.html
  6. XML Forms Architecture (XFA), by Gavin McKenzie / Jetform, Inc.
    http://xfa.org/
  7. Mercator, GUIs for the Blind, by Beth Mynatt, et al.
    http://www.cc.gatech.edu/gvu/multimedia/mercator/mercator.html
  8. [Mye90] B.A. Myers. "A New Model for Handling Input", ACM Transactions on Information Systems, 8(2), July 1990, pp. 289-320
  9. Declarative and Model-based UI, Lectures 15 and 16 by Brad A. Myers
    http://www.cs.cmu.edu/~bam/uicourse/1997spring/schedule.html
  10. Future of Forms, by Dave Raggett
    http://www.w3.org/MarkUp/Group/WD-forms-ng.html (W3C Members Only)
  11. XHTML Extended Forms Requirements, by Sebastian Schnitzenbaumer, Malte Wedel, Dave Raggett (W3C Working Draft)
    http://www.w3.org/TR/xhtml-forms-req
  12. Forms Markup Language, by Sebastian Schnitzenbaumer
    http://www.mozquito.org/
    http://www.mozquito.com/documentation/spec_xhtml-fml.html (alternate form)
  13. [Sze96] P. Szekely. "Retrospective and Challenges for Model-Based Interface Development", 1996 International Workshop of Computer-Aided Design of User Interfaces (CADUI'96), Facultés Universitaires Notre-Dame de la Paix (FUNDP), Namur, 5-7 June 1996. J. Vanderdonckt (ed.)
    http://www.isi.edu/isd/Mastermind/Papers/DSVIS96.doc.zip
  14. Mastermind, by Pedro Szekely, et al.
    http://www.isi.edu/isd/Mastermind/mastermind.html.old
    http://www.isi.edu/isd/Interchi-beyond.ps (use both)
  15. Control Choices and Network Effects in Hypertext Systems by Jim Whitehead
    http://www.ics.uci.edu/~ejw/papers/whitehead_ht99.html
  16. XForms Working Group
    http://www.w3.org/MarkUp/Forms/
  17. Extensible Forms Description Language (XFDL)
    http://www.PureEdge.com/xfdl
    http://www8.org/w8-papers/4d-electronic/xfdl/xfdl.html (alternate form)