WEBDAV Working Group                     J.A. Slein
INTERNET-DRAFT                           Xerox Corporation
< >                                      E.J. Whitehead, Jr.
                                         U.C. Irvine
                                         D.G. Durand
                                         Boston University
                                         F. Vitali
                                         University of Bologna
Expires August 1997                      February 1997
Internet Drafts are draft documents valid for a maximum of six months and can be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet drafts as reference material or to cite them as other than as "work in progress".
To learn the current status of any Internet draft please check the "lid-abstracts.txt" listing contained in the Internet drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West coast). Further information about the IETF can be found at URL: http://www.ietf.org/
Distribution of this document is unlimited. Please send comments to the WWW Distributed Authoring and Versioning mailing list, <w3c-dist-auth@w3.org>, which may be joined by sending a message with subject "subscribe" to <w3c-dist-auth-request@w3.org>. Discussions are archived at URL: http://www.w3.org/pub/WWW/Archives/Public/w3c-dist-auth/. The HTTP working group at <http-wg@cuckoo.hpl.hp.com> also discusses the HTTP protocol. Discussions of the HTTP working group are archived at URL: http://www.ics.uci.edu/pub/ietf/http/. General discussions about HTTP and the applications which use HTTP should take place on the <www-talk@w3.org> mailing list.
***Fabio - Many of the versioning requirements call for extensions to URLs, not to HTTP.
***Judy - There is controversy in the group about whether we should be extending HTTP or defining a separate protocol.
Other authoring applications have wanted to access document repositories or version control systems through Web gateways, and have been similarly frustrated. Where this access is available at all, it is through nonstandard extensions to HTTP that force clients to use a different interface for each vendor's service.
This document describes requirements for a set of standard extensions to HTTP that would allow distributed Web authoring tools to provide the functionality their users need by means of the same standard syntax across all compliant servers. The broad categories of functionality that need to be standardized are:
(Get rid of references to current tools altogether, or do more thorough research.)
***Judy - Query is not supported in the specification.
Attributes can be used to define fields such as author, title, subject, and organization, on resources of any media type. These attributes have many uses, such as supporting searches on attribute contents, and the creation of catalog entries as a placeholder for an object which is not available in electronic form, or which will be available later.
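The following is a minimal sketch, not a proposed interface. It assumes that attributes are simple name/value strings kept in a hypothetical in-memory table keyed by URL, and it shows how such attributes could support a search on attribute contents, including a catalog entry that stands in for an object with no electronic form.

```python
from typing import Dict, List

# Hypothetical attribute table: resource URL -> {attribute name -> value}.
# The URL may name a resource of any media type, or a placeholder catalog entry.
attributes: Dict[str, Dict[str, str]] = {}


def set_attribute(url: str, name: str, value: str) -> None:
    """Attach (or replace) a named attribute on a resource."""
    attributes.setdefault(url, {})[name] = value


def search(name: str, substring: str) -> List[str]:
    """Return, sorted, the URLs whose attribute `name` contains `substring`
    (case-insensitive)."""
    needle = substring.lower()
    return sorted(
        url for url, attrs in attributes.items()
        if needle in attrs.get(name, "").lower()
    )


set_attribute("/reports/q1.pdf", "author", "J. Slein")
set_attribute("/images/logo.gif", "author", "D. Durand")
# A catalog entry standing in for an object not available in electronic form.
set_attribute("/catalog/item-17", "title", "Design notes (paper only)")
```

The same search works for the bitmap image and the PDF alike, since the attributes are stored apart from the resource contents.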
A hypertext link is a relationship between resources which is browsable using a hypertext style point-and-click user interface. Relationships, whether browsable hypertext links or simply a means of capturing an interrelation between resources, have many purposes.
Relationships can support pushbutton printing of a multi-resource document in a prescribed order, jumping to the access control page for a resource, and quick browsing of related information, such as a table of contents, an index, a glossary, and help pages. While the HTML "LINK" element provides relationship support, it is limited to HTML resources and does not support bitmap images or other non-HTML media types.
AOLpress from America Online [1] currently "allows pages to add toolbar buttons on the fly using the HTML 3.2 <LINK REL....> tag. For example, your page can add toolbar buttons that link to a home page, table of contents, index, glossary, copyright page, next page, previous page, help page, higher level page, or a bookmark in the document."
***Fabio - The definition of locking here conflicts with the one that was used in the versioning requirements paper. More in a separate mail note.
4.3.1.2. Multi-Resource Locking. It should be possible to take out a lock on multiple resources in the same action, and this locking operation must be atomic across these resources.
***Judy - Multi-resource locking is not in the specification.
4.3.1.3. Partial-Resource Locking. It should be possible to take out a lock on subsections of a resource.
***Judy - Controversy on this issue at Irvine.
4.3.1.4. Multi-Person Locking. It should be possible to assign a lock to a single person or to multiple persons with a single action.
***Judy - Multi-person locking is not in the specification.
***Fabio - Add a statement that support for locking is optional. Also say that systems that do not support locking should provide some other type of consistency management.
***Fabio - The definition of a write lock should be this: A write lock states that no consistency problem will ever occur by changing the resource, not that no one else is allowed access to that resource. On the other hand, it can be said that access rights to successfully Unlocked resources should be allowed to all authorized users.
4.3.2.2. Read Locks. It should be possible, via HTTP, to indicate to the HTTP server that the contents of a resource should not be modified until the read lock is released.
***Judy - Read locks are not in the specification.
4.3.2.3. Lock Query. It should be possible to query for whether a given URL has any active modification restrictions, and if so, who currently has modification permission.
***Judy - Should add Unlock.
An author may wish to lock an entire web of resources even though he is editing just a single resource, to keep the other resources from changing. In this way, an author can ensure that if a local hypertext web is consistent in his distributed authoring tool, it will still be consistent when he writes it to the server. It should therefore be possible to take out a lock without also transmitting the contents of a resource. A locked resource will not necessarily be modified, and many people may want simultaneous guarantees that a resource will not be modified without wanting to modify it themselves; for these reasons a "read" lock capability is desirable. Because it is less restrictive, a read lock is better suited than a write lock to guaranteeing that a resource will not be modified. Put differently, a read lock states that the resource is guaranteed not to change for the duration of the lock. A write lock states that a resource is guaranteed not to change only if the owner of the lock does not change it, and only the owner of the lock may change it.
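A minimal sketch of the read/write distinction follows. The compatibility rules (any number of read locks may coexist, while a write lock excludes every other lock) and all names are assumptions made for illustration; the requirements above do not fix them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lock:
    kind: str   # "read" or "write"
    owner: str


@dataclass
class Resource:
    locks: List[Lock] = field(default_factory=list)


def acquire(res: Resource, kind: str, owner: str) -> bool:
    """Grant the lock only if it is compatible with those already held.

    Assumed rules: read locks are shared; a write lock is exclusive.
    """
    if kind == "read":
        granted = all(lock.kind == "read" for lock in res.locks)
    elif kind == "write":
        granted = not res.locks
    else:
        raise ValueError(f"unknown lock kind: {kind}")
    if granted:
        res.locks.append(Lock(kind, owner))
    return granted


def may_modify(res: Resource, who: str) -> bool:
    """Check whether `who` may change the resource right now."""
    for lock in res.locks:
        if lock.kind == "read":
            return False            # a read lock guarantees no change at all
        if lock.owner != who:
            return False            # only the write-lock owner may change it
    return True                     # unlocked, or write-locked by `who`
```

In this sketch even a read-lock holder cannot modify the resource, which matches the statement that a read lock guarantees the resource will not change for the duration of the lock.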
It is often necessary to guarantee that a lock or unlock operation takes effect atomically across multiple resources, a feature supported by the multi-resource locking requirement. This prevents a collision between two people trying to establish locks on the same set of resources: with multi-resource locking, exactly one of the two will obtain the locks. If the same scenario were instead carried out by iterating single-resource lock operations across the resources, the locks could end up split between the two people, depending on resource ordering and race conditions.
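The difference between atomic multi-resource locking and iterated single-resource locking can be sketched with a hypothetical server-side lock table. A single mutex stands in for whatever mechanism a real server would use to make each request atomic; the class and method names are assumptions.

```python
import threading
from typing import Dict, Iterable, Optional


class LockManager:
    """Hypothetical lock table mapping each locked URL to its owner."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()   # serializes requests so each is atomic

    def lock_all(self, urls: Iterable[str], owner: str) -> bool:
        """Multi-resource lock: lock every URL for `owner`, or none of them."""
        wanted = list(urls)
        with self._mutex:
            if any(url in self._owners for url in wanted):
                return False             # any conflict means nothing is granted
            for url in wanted:
                self._owners[url] = owner
            return True

    def lock_one(self, url: str, owner: str) -> bool:
        """Single-resource lock, for comparison with iterated locking."""
        with self._mutex:
            if url in self._owners:
                return False
            self._owners[url] = owner
            return True

    def owner(self, url: str) -> Optional[str]:
        with self._mutex:
            return self._owners.get(url)
```

If two people instead iterate `lock_one` over the same two URLs in opposite orders, each can end up holding one lock and failing on the other, which is the split described above; with `lock_all`, one of them gets every lock and the other gets none.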
Partial resource locking provides support for collaborative editing applications, where multiple users may be editing the same resource simultaneously. Partial resource locking also allows multiple people to work simultaneously on a database-like resource.
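The requirement does not say how a subsection of a resource is identified. Assuming, purely for illustration, that subsections are half-open byte ranges, a partial-resource lock check reduces to an interval-overlap test. The names below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, owner) locks the byte range [start, end).
RangeLock = Tuple[int, int, str]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def lock_range(locks: List[RangeLock], start: int, end: int, owner: str) -> bool:
    """Lock [start, end) for `owner` unless another owner holds an overlapping range."""
    if not 0 <= start < end:
        raise ValueError("range must be non-empty and non-negative")
    for s, e, holder in locks:
        if holder != owner and overlaps(start, end, s, e):
            return False
    locks.append((start, end, owner))
    return True
```

Two authors editing different parts of the same resource can then hold locks at the same time, which whole-resource locking would forbid.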
***Judy - It should be possible to notify the server that one no longer intends to edit the resource.
***Judy - Support for notification of intent to edit is found in the specification only in the context of version management. The specification does not allow such notification for non-versioned resources.
Experience from configuration management systems has shown that people need to know when they are about to enter a parallel editing situation. Once notified, they either decide not to edit in parallel with the other authors, or they use out-of-band communication (face-to-face, telephone, etc.) to coordinate their editing so as to minimize the difficulty of merging their results. Notification is separate from locking, since a write lock does not necessarily imply that a resource will be edited, and a notification of intention to edit does not carry with it any access restrictions. This capability supports versioning, since a check-out typically involves taking out a write lock, making a notification of intention to edit, and getting the resource to be edited.
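A minimal sketch of notification of intent to edit, assuming a hypothetical server-side registry. Declaring intent restricts no one's access, but it tells a new author who else has already declared intent, so the two can coordinate out of band. The `withdraw` operation corresponds to notifying the server that one no longer intends to edit.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class IntentRegistry:
    """Hypothetical record of declared intentions to edit, keyed by URL.

    Nothing here blocks a read or a write; it only reports parallel editing.
    """

    def __init__(self) -> None:
        self._editors: DefaultDict[str, Set[str]] = defaultdict(set)

    def declare(self, url: str, user: str) -> Set[str]:
        """Record `user`'s intent and return the other users already intending to edit."""
        others = self._editors[url] - {user}
        self._editors[url].add(user)
        return others

    def withdraw(self, url: str, user: str) -> None:
        """Record that `user` no longer intends to edit `url`."""
        self._editors[url].discard(user)
```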
***Judy - Not in the specification.
There are many cases where the source stored on a server does not correspond to the actual entity transmitted in response to an HTTP GET. Known cases include server-side include directives, and Standard Generalized Markup Language (SGML) source that is converted on the fly to HyperText Markup Language (HTML) [2] output entities. Many other cases are possible, such as automatic conversion of bitmap images into several variant bitmap media types (e.g. GIF, JPEG), and automatic conversion of an application's native media type into HTML. As an example of this last case, a word processor could store its native media type on a server which automatically converts it to HTML. A GET of this resource would retrieve the HTML. Retrieving the source would retrieve the word processor native format.
This requirement should be met by a general mechanism that can handle both "single-step" source processing, described above, where the source is converted into the transmission entity in a single conversion step, and "multi-step" source processing, where there are one or more intermediate processing steps and outputs. An example of multi-step source processing is the relationship between an executable binary image, its object files, and its source language files. The relationship between source and transmission entity could be expressed using the relationship functionality described above in "4.2. Relationships."
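Multi-step source processing can be sketched by treating each "produced from" relationship as an edge and walking the edges back to the original sources. The relationship table below is hypothetical and uses the executable, object file, and source file example from the text.

```python
from typing import Dict, List, Set

# Hypothetical "source" relationships: entity URL -> URLs it was produced from.
sources: Dict[str, List[str]] = {
    "/bin/app": ["/obj/main.o", "/obj/util.o"],
    "/obj/main.o": ["/src/main.c"],
    "/obj/util.o": ["/src/util.c", "/src/util.h"],
}


def original_sources(url: str) -> List[str]:
    """Follow source relationships from `url` and return, sorted, the entities
    that have no source of their own (the original inputs)."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [url]
    while stack:
        current = stack.pop()
        if current in seen:
            continue                      # shared inputs are visited once
        seen.add(current)
        parents = sources.get(current, [])
        if not parents and current != url:
            found.add(current)
        stack.extend(parents)
    return sorted(found)
```

A single-step case, such as one SGML file converted to one HTML output entity, is simply a chain of length one in the same table.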
***Judy - Not in the specification.
During distributed editing which occurs over wide geographic separations and/or over low bandwidth connections, it would be extremely inefficient (and frustrating) to rewrite a large resource after minor changes, such as a one-character spelling correction. Ideally, support will be provided for transmitting "insert" (e.g., add this sentence in the middle of a document) and "delete" (e.g. remove this paragraph from the middle of a document) style updates. Support for partial resource updates will make small edits more efficient, and allow distributed authoring tools to scale up for editing of large documents.
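A minimal sketch of applying "insert" and "delete" style updates on the server follows. The operation types and the offset convention (character offsets into the document as it stands after the preceding operations) are assumptions; the requirement does not define a format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Insert:
    offset: int   # position at which `text` is inserted
    text: str


@dataclass
class Delete:
    offset: int   # position of the first character removed
    length: int   # number of characters removed


def apply_ops(document: str, ops: List[Union[Insert, Delete]]) -> str:
    """Apply operations in order and return the updated document."""
    for op in ops:
        if not 0 <= op.offset <= len(document):
            raise ValueError(f"offset {op.offset} out of range")
        if isinstance(op, Insert):
            document = document[:op.offset] + op.text + document[op.offset:]
        else:
            if op.length < 0 or op.offset + op.length > len(document):
                raise ValueError("delete runs past the end of the document")
            document = document[:op.offset] + document[op.offset + op.length:]
    return document
```

For a small spelling fix such as "Teh" to "The", the client would send two small operations, `Delete(0, 3)` and `Insert(0, "The")`, instead of the whole resource.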
***Judy - Need more details of the semantics of copy and move, especially for collections, versioned resources, and resources with attributes.
***Judy - In the specification, but not mentioned here: Destroy, Undelete, CopyHead, MoveHead.
There are many reasons why a resource might need to be duplicated, such as a change of ownership, as a precursor to major modifications, or to make a backup. In combination with delete functionality, copy can be used to implement rename and move capabilities, by performing a copy to a new name and a delete of the old name. Because of the network costs of loading and saving a resource, it is far preferable for the server, rather than the client, to perform a copy. If a copied resource records which resource it is a copy of, a cache that already stores the original locally could avoid loading the copy.
It is often necessary to change the name of a resource, for example due to adoption of a new naming convention, or if a typing error was made entering the name originally. Due to network costs, it is undesirable to perform this operation by loading, then resaving the resource, followed by a delete of the old resource. Similarly, a single rename operation is more efficient than a copy followed by a delete operation. Ideally an HTTP server should record the move operation, and issue a "301 Moved Permanently" status code for requests on the old URL. A move operation, if implemented with attribute support, should also preserve most attributes across a move. Note that moving a resource is considered the same function as renaming a resource.
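The server-side move described above can be sketched as follows, assuming a hypothetical in-memory namespace. The server renames in one step, carries the attributes along, and answers later requests for the old URL with "301 Moved Permanently" plus the new location.

```python
from typing import Dict, Optional, Tuple


class Namespace:
    """Hypothetical server namespace that remembers where resources moved."""

    def __init__(self) -> None:
        self.bodies: Dict[str, bytes] = {}
        self.attributes: Dict[str, Dict[str, str]] = {}
        self.moved_to: Dict[str, str] = {}          # old URL -> new URL

    def move(self, old: str, new: str) -> None:
        """Rename on the server: no download, re-upload, or separate delete."""
        if old not in self.bodies:
            raise KeyError(old)
        if new in self.bodies:
            raise FileExistsError(new)
        self.bodies[new] = self.bodies.pop(old)
        # This sketch keeps every attribute; the text only asks for "most".
        self.attributes[new] = self.attributes.pop(old, {})
        self.moved_to[old] = new

    def get(self, url: str) -> Tuple[int, Optional[str], Optional[bytes]]:
        """Return (status code, Location value, body) for a GET on `url`."""
        if url in self.bodies:
            return 200, None, self.bodies[url]
        if url in self.moved_to:
            return 301, self.moved_to[url], None
        return 404, None, None
```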
***Judy - Not in the specification.
The URL specification [3] states that "some URL schemes (such as the ftp, http, and file schemes) contain names that can be considered hierarchical." Especially for HTTP servers that map all or part of their URL name space directly into a filesystem, it is very useful to get a listing of all resources located at a particular hierarchy level. This functionality supports "Save As..." dialog boxes, which provide a listing of the entities at the current hierarchy level and allow navigation through the hierarchy. It also supports the creation of graphical visualizations (typically as a network) of the hypertext structure among the entities at a hierarchy level, or set of levels, as well as tree visualizations of the entities and their hierarchy levels.
In addition, document management systems may want to make their documents accessible through HTTP. They typically allow the organization of documents into collections, and so also want their users to be able to view the collection hierarchy through HTTP.
There are many instances where there is not a strong correlation between a URL hierarchy level and the notion of a collection. One example is a server in which the URL hierarchy level maps to a computational process that performs some resolution on the name. In this case, the contents of the URL hierarchy level can vary depending on the input to the computation, and the number of resources accessible via the computation can be very large. It does not make sense to implement a directory feature for such a namespace. However, the utility of listing the contents of those URL hierarchy levels that do correspond to collections, such as those on the many HTTP servers that map their namespace to a filesystem, argues for including this capability even though it is not meaningful in all cases. If listing the contents of a URL hierarchy level does not make sense for a particular URL, a "405 Method Not Allowed" status code could be issued.
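A minimal sketch of listing one hierarchy level, assuming the server can enumerate its resources as URL paths (as a filesystem-backed server can) and knows which prefixes are computed namespaces, for which it answers "405 Method Not Allowed". The function name and the `computed_prefixes` parameter are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def list_level(all_urls: Iterable[str], level: str,
               computed_prefixes: Iterable[str] = ()) -> Tuple[int, Optional[List[str]]]:
    """List the entries directly beneath hierarchy level `level`.

    Returns (200, entries) for collection-like levels, or (405, None) where
    the namespace is computed and a listing is not meaningful.
    """
    if not level.endswith("/"):
        level += "/"
    if any(level.startswith(prefix) for prefix in computed_prefixes):
        return 405, None
    entries = set()
    for url in all_urls:
        if url.startswith(level) and url != level:
            head, sep, _ = url[len(level):].partition("/")
            entries.add(head + sep)   # a trailing "/" marks a lower level
    return 200, sorted(entries)
```

A "Save As..." dialog could call this once per level as the user navigates, descending into any entry that ends in "/".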
AOLpress from America Online currently supports "Save As..." dialog boxes, and graphical network visualization of a portion of a site's hypertext structure, which they term a "mini-web." FrontPage from Microsoft [6] also currently supports a graphical network visualization and additionally supports a tree visualization of a portion of a site's structure.
4.8.2. Make Collection. Via HTTP, it should be possible to create a new collection.
The ability to create collections to hold related resources supports management of a name space by packaging its members into small, related clusters. The utility of this capability is demonstrated by the broad implementation of directories in recent operating systems. The ability to create a collection also supports the creation of "Save As..." dialog boxes with "New Level/Folder/Directory" capability, common in many applications.
AOLpress from America Online currently supports this capability through their "Save As..." dialog box, and their custom MKDIR method.
***Judy - new definitions
***Judy - Does the specification support this?
4.9.1.2. Policy-free Versioning. Haake and Hicks [5] have identified the notion of versioning styles (referred to here as versioning policies, to reflect the nature of client/server interaction) as one way to think about the different policies that versioning systems implement. Versioning policies include decisions on the shape of version histories (linear or branched), the granularity of change tracking, locking requirements made by a server, etc. The protocol should not unnecessarily restrict version management policies to any one paradigm. For instance, locking and version number assignment should be interoperable across servers and clients, even if there are some differences in their preferred models.
4.9.1.3. Separation of resource retrieval and concurrency control. The protocol must separate the reservation and release of versioned resources from their access methods. Provided that consistency constraints are met before, during and after the modification of a versioned resource, no single policy for accessing a resource should be enforced by the protocol. For instance, a user may declare an intention to write before or after retrieving a resource via GET, may PUT a resource without releasing the lock, and might even request a lock via HTTP, but then retrieve the document using another communication channel such as FTP.
***Judy - The specification assumes that it's the server, not the user, that determines the policy -- order of operations and what operations are required.
***Judy - "Separation of resource retrieval and concurrency control" is supported by the Request-Lock, Request-Intent, and Request-Working-Loc parameters to the CheckOut method and the discovery mechanism. This is all embroiled in the controversy over how much latitude we want to give servers, how simple we want to make things for clients, whether we want to rely on the discovery mechanism, etc.
4.9.2.1. Access to specific versions via a URL. For each version of a resource on a server, there should be a URL that refers to that version. That is, a version is itself a resource.
This is required for version-specific linking, and for non-versioning client support.
4.9.2.2. A URL to denote a versioned resource itself, rather than specific versions of it.
This identifier is needed for queries about the versioning status of a resource that do not apply to just one version of that resource. It is also used to perform operations (such as adjusting attributes, changing locks, or reassigning URLs) that affect all versions of a resource, rather than any specific version.
4.9.2.3. Direct access to a server-defined "default", "current" or "tip" version of a resource.
This is one of the simplest ways to guarantee non-versioning client compatibility. If no special version information is provided, the server will provide a default. This does not rule out the possibility of a server returning an error when no sensible default exists, but it does provide a standard way to support non-versioning clients, and one of the most common version access disciplines.
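Default-version access might be sketched as below. The version tables, the optional `version` argument, and the choice of 409 as the error for "no sensible default" are all assumptions for illustration; a non-versioning client simply never supplies a version.

```python
from typing import Dict, Optional, Tuple

# Hypothetical server state: versioned resource URL -> {version id -> body}.
versions: Dict[str, Dict[str, str]] = {
    "/spec.html": {"1.1": "first draft", "1.2": "second draft"},
    "/notes.txt": {"1.1": "a", "1.2": "b"},
}
# Server-chosen default ("current" or "tip") version; notes.txt has none.
defaults: Dict[str, str] = {"/spec.html": "1.2"}


def fetch(url: str, version: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Return (status, body). With no version given, serve the default."""
    history = versions.get(url)
    if history is None:
        return 404, None
    if version is None:
        version = defaults.get(url)
        if version is None:
            return 409, None      # no sensible default (status code is an assumption)
    if version not in history:
        return 404, None
    return 200, history[version]
```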
4.9.2.4. A way to access common related URLs from the URL of a particular version or of a versioned resource:
It must be possible in some way for a versioning client to access versions related to a resource whose URL it has. In particular, access to the "default" version of a resource is an extremely important operation, which a client should be able to perform whenever it sees a URL for a particular version or for a versioned resource.
***Judy - Specification provides some, but not all, of these navigation paths.
4.9.2.5. A way to retrieve the complete version topology for a resource. There should be a way to retrieve information about all versions of a resource. The format for this information must be standardized so that the basic information can be used by all clients. Other specialized formats should be accommodated, for servers and clients that require information that cannot be included in the standard topology.
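One possible shape for a standard topology document is sketched below under stated assumptions: versions are identified by strings, each version lists the versions it was derived from, and the result is serialized as JSON purely for brevity. A version with two predecessors records a merge; two versions sharing a predecessor record a branch.

```python
import json
from typing import Dict, List

# Hypothetical topology: version id -> ids of the versions it was derived from.
predecessors: Dict[str, List[str]] = {
    "1.1": [],
    "1.2": ["1.1"],
    "1.2.1.1": ["1.2"],          # branch from 1.2
    "1.3": ["1.2"],
    "1.4": ["1.3", "1.2.1.1"],   # merge of the two lines
}


def topology_document(resource_url: str, preds: Dict[str, List[str]]) -> str:
    """Serialize the whole version graph, with successors derived from predecessors."""
    successors: Dict[str, List[str]] = {v: [] for v in preds}
    for version, parents in preds.items():
        for parent in parents:
            successors[parent].append(version)
    return json.dumps({
        "resource": resource_url,
        "versions": [
            {"id": v, "predecessors": preds[v], "successors": sorted(successors[v])}
            for v in sorted(preds)
        ],
    }, indent=2)
```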
4.9.2.6. A way to determine whether a given URL points to a version of a versioned resource.
***Judy - Are we requiring that you be able to tell this just by examining the URL?
4.9.2.7. A way to distinguish, given a URL of a version, the part of the URL that identifies the version from the part that identifies the versioned resource.
***Judy - Do we really have to (want to) require that you be able to find out the URL of the versioned resource by examining the URL of the version? Is the requirement really just that there be some way to find out, for any version, the URL of its versioned resource?
***Judy - Specification does not provide a way to find out the URL of the versioned resource(s) to which a version belongs.
Being able to determine the URL of the versioned resource makes it possible to implement browsing the version tree.
It also supports some comparison operations: it makes it possible to determine whether two URLs designate versions of the same versioned resource. However, because of URL aliasing, it cannot establish that two URLs are not versions of the same resource.
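If the requirement in 4.9.2.7 is read as a syntactic one, a URL convention makes the split, the "is this a version?" test, and the comparison above straightforward. The ";version=" suffix below is a hypothetical convention, not something HTTP or the WEBDAV specification defines. A false result from the comparison proves nothing, because of URL aliasing.

```python
from typing import Optional, Tuple

# Hypothetical convention: version URL = versioned resource URL + ";version=<id>".
MARKER = ";version="


def split_version_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (versioned resource URL, version id), or (url, None) if no version part."""
    base, sep, version = url.rpartition(MARKER)
    if not sep:
        return url, None
    return base, version


def is_version_url(url: str) -> bool:
    return split_version_url(url)[1] is not None


def same_versioned_resource(a: str, b: str) -> bool:
    """True means the same versioned resource. False is NOT proof of difference:
    two different URLs may be aliases for one resource."""
    return split_version_url(a)[0] == split_version_url(b)[0]
```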
***Judy - If 4.9.2.8 - 14 are intended to require separate operations for each of these functions, they conflict with the approach taken in the WEBDAV specification.
4.9.2.8. A way to request exclusive access to a version of a resource (Lock). (See Section 4.3 "Locking" above.)
Since not all systems implement lock-based access, the protocol should not require clients to take out a lock before editing, nor should it require servers to support locking.
4.9.2.9. A way to release exclusive access to a resource (Unlock). This is the inverse of Lock.
4.9.2.10. A way for a client to declare an intention to modify a resource (Reserve). (See Section 4.4 "Notification of Intent to Edit" above.)
This operation is required before any versioned update. Its effects may vary depending on server policy, from locking a resource, to forking a new variant, to a NOOP on servers that do not track sessions or restrict updates. If this operation returns a version number, the client must make sure that any update operations it carries out use a copy of the data associated with that version number of the resource. Servers that wish to enforce a mandatory GET operation before update should simply return a fresh version identifier from this operation.
4.9.2.11. A way to declare the end of an intention to write a resource (Release). This is the inverse of Reserve. Typically, servers will commit updates at this time, and return a final version identifier if possible and if it was not already returned.
4.9.2.12. A way to submit a new version of a resource (PUT). The server should be able to attach it to the correct part of the version tree, based on the version number associated with the resource before its modification.
4.9.2.13. A way for a client to request a version identifier for a checked out version. Such an identifier will not be used by any other client in the meantime. The server may refuse the request.
4.9.2.14. A way for a client to propose a version identifier upon submitting a version of a resource. The server may refuse to use the client's suggested version identifier.
4.9.2.15. A way for a client to supply metadata to be associated with a version. (See Section 4.1 "Attributes" above.)
The kinds of data supplied here might be simple textual comments or more structured data. An ability to attach arbitrary fields and content is probably required, but a standard set of attributes that would enable interoperation would be useful. At a minimum, it must be possible, when a version is checked in, to associate with it comments explaining what changes were made.
4.9.2.16. A way for a server to provide a version identifier to be used for a resource in further operations.
This identifier must accompany client requests to manipulate the resource. In particular, if a resource is being modified, the identifier must be used when submitting an update. This allows servers to track active sessions by assigning version identifiers when documents are retrieved, locked, or reserved.
4.9.2.17. A way to track resources that have been Reserved (Session Tracking). This allows the server to ensure that the user operating on a resource is the same one who Reserved it.
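Reserve, PUT, Release, server-provided identifiers, client-proposed identifiers, and session tracking (4.9.2.10 through 4.9.2.17) can be tied together in one minimal sketch. Everything here is hypothetical: the class, the "ticket" form of the server-provided identifier, and the dotted numbering the server falls back on when it refuses a client's proposed identifier.

```python
import itertools
from typing import Dict, Optional, Tuple


class VersionedResource:
    """Hypothetical server-side state for one versioned resource."""

    def __init__(self, initial: str) -> None:
        self._tickets = itertools.count(1)
        self.content: Dict[str, str] = {"1": initial}
        self.parent: Dict[str, Optional[str]] = {"1": None}
        # Session tracking: ticket -> (user who reserved, version being updated).
        self.sessions: Dict[str, Tuple[str, str]] = {}

    def reserve(self, user: str, base: str) -> str:
        """Declare intent to modify version `base`; return the identifier
        that must accompany every update in this session."""
        if base not in self.content:
            raise KeyError(base)
        ticket = f"t{next(self._tickets)}"
        self.sessions[ticket] = (user, base)
        return ticket

    def put(self, user: str, ticket: str, text: str,
            proposed_id: Optional[str] = None) -> str:
        """Store a new version as a child of the version the session is based on."""
        owner, base = self.sessions.get(ticket, (None, None))
        if owner != user:
            raise PermissionError("update not from the user who reserved")
        vid = proposed_id
        if vid is None or vid in self.content:
            vid = self._fresh_id(base)       # no proposal, or a clashing one
        self.content[vid] = text
        self.parent[vid] = base              # a second child of `base` forms a branch
        self.sessions[ticket] = (user, vid)  # later PUTs build on this version
        return vid

    def release(self, user: str, ticket: str) -> None:
        """End the intention to write; the ticket is no longer accepted."""
        session = self.sessions.get(ticket)
        if session is not None and session[0] == user:
            del self.sessions[ticket]

    def _fresh_id(self, base: str) -> str:
        n = 1
        while f"{base}.{n}" in self.content:
            n += 1
        return f"{base}.{n}"
```

If two users reserve version "1" and the second proposes "1.1" after the first has already taken it, the proposal is refused, and the second update lands beside the first as a branch rather than overwriting it.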
***Judy -- Not in the specification.
***Judy - Uncheckout is neither in the requirements nor in the specification. Do we need it?
It provides infrastructure for efficient and controlled management of large evolving web sites. Modern configuration management systems are built on some form of repository that can track the revision history of individual resources, and provide the higher-level tools to manage those saved versions. Basic versioning capabilities are required to support such systems.
It allows parallel development and update of single resources. Since versioning systems register change by creating new objects, they enable simultaneous write access by allowing the creation of variant versions. Many also provide merge support to ease the reverse operation.
It provides a framework for access control over resources. While specifics vary, most systems provide some method of controlling or tracking access to enable collaborative resource development.
It allows browsing through past and alternative versions of a resource. Frequently the modification and authorship history of a resource is critical information in itself.
It provides stable names that can support externally stored links for annotation and link-server support. Both annotation and link servers frequently need to store stable references to portions of resources that are not under their direct control. By providing stable states of resources, version control systems allow not only stable pointers into those resources, but also well-defined methods to determine the relationships of those states of a resource.
It allows explicit semantic representation of single resources with multiple states. A versioning system directly represents the fact that a resource has an explicit history, and a persistent identity across the various states it has had during the course of that history.
Martin Cagan, Continuus Software, Marty_Cagan@continuus.com
Dan Connolly, World Wide Web Consortium, connolly@w3.org
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Phill Hallam-Baker, MIT, hallam@ai.mit.edu
Dennis Hamilton, Xerox PARC, hamilton@parc.xerox.com
Andre van der Hoek, University of Colorado, Boulder, andre@bigtime.cs.colorado.edu
Gail Kaiser, Columbia University, kaiser@cs.columbia.edu
Rohit Khare, World Wide Web Consortium, khare@w3.org
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Ora Lassila, Nokia Research Center, ora.lassila@research.nokia.com
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Jim Miller, World Wide Web Consortium, jmiller@w3.org
Andrew Schulert, Microsoft, andyschu@microsoft.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Richard Taylor, U.C. Irvine, taylor@ics.uci.edu
Robert Thau, MIT, rst@ai.mit.edu
[2] T. Berners-Lee, D. Connolly. "HyperText Markup Language Specification - 2.0." RFC 1866, MIT/LCS, November 1995.
[3] T. Berners-Lee, L. Masinter, M. McCahill. "Uniform Resource Locators (URL)." RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994.
[4] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1." RFC 2068, U.C. Irvine, DEC, MIT/LCS, January 1997.
[5] A. Haake, D. Hicks. "VerSE: Towards Hypertext Versioning Styles", Proc. Hypertext'96, the Seventh ACM Conference on Hypertext, 1996, pages 224-234.
[6] Microsoft. "Microsoft FrontPage for Windows Data Sheet." WWW page. http://www.microsoft.com/msoffice/frontpage/productinfo/brochure/default.htm.
[7] K. Osterbye. "Structural and Cognitive Problems in Providing Version Control for Hypertext", Proceedings of the ACM Conference on Hypertext, Milano, Italy, 1992, pp 33-42.
[8] "Version Control in Hypermedia Databases" Technical report TAMU-HRL-91-004, Hypertext Research Lab, Texas A&M University. 1991.
Judith Slein
Xerox Corporation
800 Phillips Road 128-29E
Webster, NY 14580
EMail: slein@wrc.xerox.com

E. James Whitehead, Jr.
Department of Information and Computer Science
University of California
Irvine, CA 92697-3425
Fax: 714-824-4056
EMail: ejw@ics.uci.edu

David G. Durand
Department of Computer Science
Boston University
Boston, MA
EMail: dgd@cs.bu.edu

Fabio Vitali
Department of Computer Science
University of Bologna
ITALY
EMail: fabio@cs.unibo.it