VRML is a markup language. It (basically) describes a static scene as a series of 3D objects. It supports inclusions (allowing reuse of objects) and uses a scoping mechanism to restrict visibility.
But it doesn't do collaborative, interactive virtual reality.
Why do I want collaborative, interactive virtual reality? Aside from the general fun of the idea, I believe that it will promote new ways of interacting (look at MUDs) and new ways of programming, and provide a better environment for information retrieval and management.
Did I mention fun? Fun!
However, the proposals do not address the fundamental issue: VRML is currently oriented towards a browser environment. It provides static declarations of object behaviour and does not allow one user's interactions with an object to influence another's. Simplistic approaches use server-side scripts to alter the description downloaded from the server as clients fetch different objects, but the end result is basically still static.
To achieve true VR, it is necessary to extend the definition of an object beyond static description to support interaction which alters its behaviour and appearance. Such an object could be conceptually similar to a merged VRML object description and a CORBA object interface.
Hmmm.
In a virtual world, such mechanisms alone will not be sufficient for the available machinery to meet the demands of hosting that world. Some other constraint on the interaction domain will be required.
Fundamental to the notion of ad hoc direct interaction is that the initiator cannot be required to have previously prepared stubs for using the target object.
Container objects (including rooms) have an ability to distribute notification of events to all enclosed objects. Notice that the container object naturally scopes the distribution of notifications.
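The scoping behaviour of containers can be sketched in a few lines. This is only an illustration under assumed names: `WorldObject`, `Container`, `notify` and `broadcast` are hypothetical and are not part of VRML or of any existing system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorldObject:
    """Any object able to receive event notifications (hypothetical)."""
    name: str
    received: list = field(default_factory=list)

    def notify(self, event: str) -> None:
        # Record the event; a real object would react to it.
        self.received.append(event)


@dataclass
class Container(WorldObject):
    """A container (for example a room) that forwards events to its contents."""
    contents: list = field(default_factory=list)

    def add(self, obj: WorldObject) -> None:
        self.contents.append(obj)

    def broadcast(self, event: str, source: Optional[WorldObject] = None) -> None:
        # Only enclosed objects are notified, so the container itself
        # defines the scope of the event. The source is skipped.
        for obj in self.contents:
            if obj is not source:
                obj.notify(event)


room, hall = Container("room"), Container("hall")
alice, bob, carol = WorldObject("alice"), WorldObject("bob"), WorldObject("carol")
room.add(alice)
room.add(bob)
hall.add(carol)

# Bob hears the event; Carol, in another container, does not.
room.broadcast("alice opens the door", source=alice)
```

Because distribution is driven by the container, no object needs to know who else is present in order to be heard by them.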
In a collaborative virtual reality, indirect interaction is essential. However, supporting wide-scale notification of events is extremely expensive, so it is necessary to restrict the visibility (or propagation distance) of events.
An important attribute of these systems is that the client is dumb: the server does the work of rendering (in text) the scene for the client.
Extending a MUD engine to support 3D object presentation requires an architectural change to move rendering to the client. It is not feasible to perform rendering at the server because of the bandwidth and processing required to render a video stream for each of many clients.
Instead, each client must maintain a model of the current environment and render it itself. This has the additional bonus of allowing the client to tune the level of detail, speed and other characteristics of the rendering.
The role of the server then is twofold: to maintain the state of the environment and to distribute updates to the attached clients. Obviously, a single server should not have to service all clients; the burden should be sharable across many machines for better performance.
Additionally, a client may not require notification of all events, but only those in which it is interested.
As a means of scoping, we introduce the notion of domains: a special class of object, like a container, with its own appearance and other behaviour. Domain objects exist within a three-dimensional space, and have a finite, rectilinear 3D volume.
All other objects must exist inside a domain. At any point in time, an object is contained within exactly one domain. When moving between adjoining domains, a large object may appear to protrude beyond its hosting domain.
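One way to picture the "exactly one domain" rule is to model each domain as an axis-aligned box and assign an object to a domain by a single reference point. The sketch below assumes this representation; `Domain`, `contains` and `hosting_domain` are hypothetical names, and the half-open intervals are my own choice for resolving points that lie on a face shared by two adjoining domains.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass(frozen=True)
class Domain:
    """A finite, rectilinear volume described by two opposite corners."""
    name: str
    lo: Point  # minimum corner (x, y, z)
    hi: Point  # maximum corner (x, y, z)

    def contains(self, p: Point) -> bool:
        # Half-open on each axis: a point on a shared face belongs to
        # only one of the two adjoining domains.
        return all(l <= c < h for l, c, h in zip(self.lo, p, self.hi))


def hosting_domain(domains: list[Domain], p: Point) -> Domain:
    """Return the single domain containing the reference point p."""
    matches = [d for d in domains if d.contains(p)]
    if len(matches) != 1:
        raise LookupError(f"{p} lies in {len(matches)} domains, expected 1")
    return matches[0]


west = Domain("west", (0.0, 0.0, 0.0), (10.0, 10.0, 10.0))
east = Domain("east", (10.0, 0.0, 0.0), (20.0, 10.0, 10.0))

# A point exactly on the shared face is assigned to "east" only.
owner = hosting_domain([west, east], (10.0, 5.0, 5.0))
```

Since only the reference point decides membership, the geometry of a large object can extend past the boundary while it still belongs to a single domain.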
A client program, rendering the view of an object, uses the description of the domain object for artifacts such as floors, sky, etc. Such descriptions should include a "translucence" factor for the general content of the domain ("air", "fog", "water"?) which will affect the rendered view within the domain.
Of course, it is possible to construct a domain object without constraining the view out of the domain. At this point, the effects of crossing domain boundaries intrude: to allow scalability of implementation, it is impossible to support views reaching an arbitrary distance.
Within a domain, the domain object is responsible for ensuring that all objects are able to receive notification of any event occurring within the domain. The client is responsible for filtering these events to those within the client object's view and retrieving the appropriate presentation data from its sources.
Adjoining domains must also exchange notifications. Given the structure of the virtual space, a domain is able to determine its spatial neighbours, and can request from them notification of events for subsequent redistribution to its own constituents.
However, some limit must be placed on the propagation of these events. Simplistic approaches based on the number of domain hops are attractive, but do not cope well with "thin" domains. It is desirable that the virtual distance between the domains be used as the basic criterion.
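The difference between hop counting and virtual distance can be made concrete. The sketch below assumes domains are axis-aligned boxes and measures the gap between two boxes; `Box`, `gap` and `should_propagate` are hypothetical names, and the Euclidean gap is just one plausible distance measure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    lo: tuple  # minimum corner (x, y, z)
    hi: tuple  # maximum corner (x, y, z)


def gap(a: Box, b: Box) -> float:
    """Shortest distance between two boxes (0.0 if they touch or overlap)."""
    per_axis = (
        max(0.0, bl - ah, al - bh)
        for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi)
    )
    return math.sqrt(sum(d * d for d in per_axis))


def should_propagate(source: Box, target: Box, max_distance: float) -> bool:
    """Forward events only to domains within max_distance of the source."""
    return gap(source, target) <= max_distance


# Five "thin" domains away, yet only 4 units distant: still propagated.
thin_source = Box((0, 0, 0), (1, 1, 1))
thin_target = Box((5, 0, 0), (6, 1, 1))

# Two hops away, but 100 units distant: not propagated.
wide_source = Box((0, 0, 0), (100, 100, 100))
wide_target = Box((200, 0, 0), (300, 100, 100))

near = should_propagate(thin_source, thin_target, max_distance=10.0)
far = should_propagate(wide_source, wide_target, max_distance=10.0)
```

A hop limit of, say, three would have given the opposite answer in both cases, which is the weakness with thin domains noted above.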
In addition, there are cases where it is necessary to receive notification of events beyond your normal "seeing" distance. A modelled telescope (for example) will require the ability to manually subscribe to distant domains.
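Neighbour exchange and manual subscription could share one mechanism. In this minimal sketch a domain delivers its events to whichever domains have subscribed to it, whether they are spatial neighbours or the distant host of a telescope. `DomainNode`, `subscribe_to` and `publish` are hypothetical names, and redistribution is limited to a single hop to keep the example short.

```python
class DomainNode:
    """A domain that delivers its events to subscribing domains."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers = []  # domains that asked for our events
        self.inbox = []        # (origin, event) pairs for our own constituents

    def subscribe_to(self, other: "DomainNode") -> None:
        # Used both for automatic neighbour subscriptions and for
        # manual ones, such as a telescope pointed at a far domain.
        if self not in other.subscribers:
            other.subscribers.append(self)

    def publish(self, event: str) -> None:
        self.inbox.append((self.name, event))
        for sub in self.subscribers:
            sub.inbox.append((self.name, event))


observatory = DomainNode("observatory")
garden = DomainNode("garden")   # spatial neighbour of the observatory
harbour = DomainNode("harbour")  # far beyond normal seeing distance

garden.subscribe_to(observatory)   # ordinary neighbour exchange
observatory.subscribe_to(harbour)  # the telescope's manual subscription

harbour.publish("a ship arrives")
```

Only the observatory hears about the ship; the garden, which never subscribed to the harbour, does not.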
Beyond restricting the general propagation of notifications, it will also be possible to scope the visual "importance" of an event. As an example, a distant tree on a plain (several domains away) should be visible to you, but the falling of a leaf should normally not.
This will be implemented by supporting a range of "levels of detail" in the events, which domains can use to filter those they propagate.
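A sketch of such filtering follows, assuming each event carries an integer detail level where 0 is the coarsest and larger values are finer. The names `Event`, `max_detail_for` and `filter_events`, and the `falloff` and `finest` parameters, are illustrative assumptions rather than a defined scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    description: str
    detail: int  # 0 = coarse, always worth seeing; larger = finer


def max_detail_for(distance: float, falloff: float = 50.0, finest: int = 3) -> int:
    """Finest detail level still forwarded at the given distance.

    One level is dropped for every `falloff` units of distance.
    """
    return max(0, finest - int(distance // falloff))


def filter_events(events: list[Event], distance: float) -> list[Event]:
    """Keep only the events coarse enough to matter at this distance."""
    limit = max_detail_for(distance)
    return [e for e in events if e.detail <= limit]


events = [Event("a tree stands on the plain", 0), Event("a leaf falls", 3)]

nearby = filter_events(events, distance=0.0)     # both events
distant = filter_events(events, distance=200.0)  # only the tree
```

A domain would apply a filter of this kind before passing events on to its neighbours, so that fine detail never travels further than it can be seen.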
For simplicity, domains are hosted by a single server. That server is responsible for the distribution of notifications for all events occurring within the domain. While it will be possible to replicate the domain in future, we do not initially propose to deal with this complexity.
Note that the object descriptions must be fetched separately: they are not included in the notification. It will be a matter for the user's policy whether the object's own description is used or a local one substituted on the basis of the object's type. Object descriptions can refer to VRML or other files served by different servers to that hosting the domain (and for performance, this is indeed likely).
Given the structure of the virtual space, caching the object's presentation information might best be linked to the domain servers. This would reduce the distribution burden on an interesting object's host.
The client program must provide a means of either replacing or augmenting the presentation provided by an object with one of the user's own.
Uses of this could include tagging object types for the user's quick recognition, replacing the default presentation for unpresented objects with a type-based presentation devised by the user, etc.
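The user's policy might be expressed as a small selection function. This sketch assumes a presentation is identified by a URL or file name and that the user keeps a table of overrides keyed by object type. `ObjectInfo`, `choose_presentation` and all file names and URLs shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectInfo:
    object_type: str
    description_url: Optional[str]  # None for an "unpresented" object


def choose_presentation(
    obj: ObjectInfo,
    overrides: dict[str, str],
    prefer_local: bool = False,
    fallback: str = "generic-box.wrl",
) -> str:
    """Pick the description the client should fetch and render."""
    local = overrides.get(obj.object_type)
    # Use the user's type-based presentation when the policy prefers it,
    # or when the object supplies no presentation of its own.
    if local is not None and (prefer_local or obj.description_url is None):
        return local
    return obj.description_url or fallback


overrides = {"tree": "my-tree.wrl"}
tree = ObjectInfo("tree", "http://example.org/tree.wrl")
bare_tree = ObjectInfo("tree", None)
rock = ObjectInfo("rock", None)

own = choose_presentation(tree, overrides)
replaced = choose_presentation(tree, overrides, prefer_local=True)
filled_in = choose_presentation(bare_tree, overrides)
default = choose_presentation(rock, overrides)
```

Augmenting rather than replacing a presentation (for instance adding a tag above an object) would need the function to return both descriptions, which is left out here.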