A computer game consists of entities -- objects or creatures -- and the spaces they inhabit. The spaces are the medium upon which the action of the game occurs. A chessboard defines chess-space, the squarish path around the board defines Monopoly-space, and an ice-rink defines hockey-space. The spaces of computer games are portrayed upon the glass of computer display screens.
The action of the first video games was confined within a single display screen, but designers soon realized that the small screen could be a window into a much larger space. A shifting viewpoint allowed a user to inspect parts of a complex world in detail, and control of this shifting allowed him to move through world-space. The space viewed through the screen was sometimes two-dimensional, allowing a convenient mapping from the 2-D screen to a 2-D fragment of the space. With more difficulty, a three-dimensional space could be viewed through the screen, using the familiar convention of perspective that makes a flat landscape painting have "depth."
The manner in which the viewpoint moved through a space depended on the nature of the space. Scrolling was one of the first view-changes tried in video games, perhaps because scrolling text had long been a standard convention in editor programs. An alternative to scrolling was to connect screens edge-to-edge, as in Atari 2600 Adventure and Rocky's Boots, so that the view shifted from screen to screen in discrete hops. A third method of viewing a large 2-D space was to divide the screen into independent windows which showed different parts of it. For a perspective view into a 3-D space, the obvious view-transformations were moving the viewpoint through the space, and changing the direction and magnification of the view (panning and zooming).
Scrolling worked well for moving around in a plane, whereas room-to-room hops of viewpoint allowed wrap-around paths, one-way connections, and non-unique diagonal rooms. A network of edge-connected rooms could have a very strange interconnection pattern, as the mazes of Atari 2600 Adventure illustrated.
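The freedom of such a room network can be captured in a small table of exits. A minimal sketch in Python, with hypothetical room names and connections (none taken from an actual game):

```python
# A network of edge-connected rooms, given as a table of exits.
# Unlike scrolling over a continuous plane, nothing forces the
# connections to be geometrically consistent: wrap-around paths
# and one-way passages are just entries in the table.
rooms = {
    "yellow castle": {"south": "courtyard"},
    "courtyard": {"north": "yellow castle",
                  "east": "maze", "west": "maze"},  # wraps around to the same maze
    "maze": {"north": "courtyard"},  # one-way: east and west do not lead back
}

def move(room, direction):
    """Hop the view to the adjacent room, or stay put if the exit is blocked."""
    return rooms[room].get(direction, room)
```

Because each exit is an independent table entry, a room's east neighbor need not have that room as its west neighbor, which is exactly what permits the strange interconnection patterns.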
Dividing the screen into independent, overlapping windows was an idea originally used in the user interface to the programming language Smalltalk. These windows were not really thought of as views into parts of a connected space. Rather, they were used as a means of simultaneously showing several related batches of information. The screen itself was the global space through which the user moved (with his cursor), with the corners of partially covered windows being the links to the contents of those windows. The metaphor was that the windows were sheets of paper on a desk, and a corner of a sheet could be grasped and the sheet pulled to the top of the pile. Windows were the sub-units of a space filling the screen, whereas in a network of rooms, the screen was a sub-unit of a larger, surrounding space.
Individual windows also had an internal spatial structure. Some windows scrolled through larger spaces of text. Windows full of graphics could shrink or grow. Parts of some windows could be activated, producing "pop-up menus," which might, in turn, contain their own subparts that would expand upon activation. A pop-up menu behaved like a Jack-in-the-Box -- a menu would appear, overlaying part of the window that spawned it, and then disappear after it had been used. Scrolling and pop-up menus made a window an entrypoint into a complicated information-space through which the user could move.
The ability to zoom the viewpoint in to examine details in a small region of a space makes it practical for the space to contain details at different scales. Maps on paper do a good job of showing several levels of detail, considering the unzoomable medium upon which they are printed. A computer display offers the possibility of making interactive maps, which can pan and zoom. A zooming map could handle a much larger range of scales than a printed map. For example, a zooming map of the solar system could handle the million-fold scale difference between the orbit of Pluto and the orbit of Phobos, Mars's inner moon.
A dynamic map also offers the possibility of mapping things that move with respect to one another. Planets orbit the sun. Continents drift. Species spread and then die out. Birds migrate. Explorers wander. Embryos grow and develop. Proteins curl up as they are constructed. These processes can be mapped in time, as well as space. An interactive simulation of a process is probably preferable to a fixed-sequence "movie" of the process in action, but making a movie may be much simpler than making a simulation. A movie of the archetypal events in a process can be quite informative, particularly when the movie can be stopped, and run at different speeds forward and backward, and when the user can pan around to locate and zoom in on events of interest. Anyway, the sequence of events in some processes admits no variation. The script has already been written for continental drift on Earth.
For processes with events on widely varying time scales, time-zoom is necessary. This allows the process to be viewed at different rates. A movie of the first few minutes of the universe, starting with the Big Bang, needs time-zoom because events happen in trillionths of a second (and faster) at first, but rapidly slow down to a scale of seconds (and, later on, millennia).
A "conceptual zoom" is an idea proposed by artist Aaron Marcus. The name "zoom" was given to the magnification of real scenes with lenses. Its effect is the expansion of a region in an image, revealing details therein.
A conceptual zoom preserves the idea of expansion and emerging detail, but discards the requirement of magnifying real objects. A conceptual zoom can fade from one superimposed representation to another as scale changes. For example, a satellite photo of New York City could expand, giving way to a street map. The street map might, in turn, give way on expansion to architectural blueprints of individual buildings.
An enormous amount of data would be necessary to record the floor plans of all the buildings in New York City. A similar problem exists in every system that allows zooming and panning with computer-generated images. It is impractical to store complex images for every point that might be zoomed in on. (Although, if every architecture firm in the world stored its plans in data banks accessible through an Archi-Net, maybe . . .) One good solution is to generate images based on parameters stored for the area of the image being examined. Some phenomena like galaxies or atoms are sparse, with vast empty areas surrounding relatively few detailed objects. In other cases, many instances of a once-defined object can be placed, perhaps randomly, as details in a region of an image. For example, one grass-plant subroutine could provide all the detail for a vast prairie. The grass subroutine might have parameters allowing plants of different ages, so that rather than being an orchard of identical plants, the prairie could have both random and systematic variation in its vegetation. A specific grass-plant, in turn, might expand to reveal details dependent on the parameters from which that plant was generated.
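One way to get such repeatable variation is to derive each plant's parameters from its position, so that no plant needs to be stored at all. A sketch in Python; the parameter names and ranges are invented for illustration:

```python
import hashlib

def plant_at(x, y):
    """Derive a grass plant's parameters deterministically from its
    position, so the same plant reappears on every visit without any
    per-plant storage. (Hypothetical parameters, for illustration.)"""
    digest = hashlib.sha256(f"{x},{y}".encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return {
        "age": seed % 10,                  # 0-9 seasons old
        "height": 20 + (seed >> 8) % 80,   # 20-99 units tall
        "lean": (seed >> 16) % 360,        # direction the stalk leans, in degrees
    }
```

Nearby positions hash to unrelated seeds, giving the random variation; systematic variation could be layered on by making the ranges depend on position as well.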
A more abstract variety of conceptual zoom is the ability to "go inside" objects which appear on the screen. Going inside an object can be an abrupt change of view (to its interior) rather than a continuous magnification. Jumping inside an object is an abbreviated zoom, just as room-to-room motion is an abbreviated pan. Like panning or scrolling, continuous zooming requires a continuous space, like a plane. The view-jump of going inside allows movement through abstract, discontinuous spaces with many levels of detail. For example, a computer is organized hierarchically -- logic gates and memory cells are combined into registers, adders, counters, memories, and processors. A user could move through this hierarchy, going inside the symbol for a complex assembly to examine its internal construction.
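Such a hierarchy can be represented as a table mapping each assembly to its parts; "going inside" is then just a lookup. A minimal sketch, with a hypothetical parts list based on the computer example above:

```python
# Hypothetical hierarchy for the computer example: each named assembly
# lists the sub-assemblies revealed by going inside its symbol.
hierarchy = {
    "computer": ["processor", "memory"],
    "processor": ["register", "adder", "counter"],
    "memory": ["memory cell"],
    "register": ["logic gate", "memory cell"],
    "adder": ["logic gate"],
    "counter": ["logic gate", "memory cell"],
}

def go_inside(symbol):
    """View-jump into a symbol's interior; primitives have no inside."""
    return hierarchy.get(symbol, [])
```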
We live in a three-dimensional space. Our stereoscopic vision perceives objects as having three-dimensional positions. It is therefore a natural goal to simulate 3-D space through the window of the computer screen. However, adapting a 2-D screen to display a 3-D space offers some challenging problems. Furthermore, the commonly available pointing devices -- joystick, mouse, and tablet -- are two-dimensional, and so not really adequate for manipulating three-dimensional objects.
A perspective transformation can be used to project points in a 3-D space onto a 2-D one. This is like using a flashlight to make a flat shadow on the wall from a 3-D birdcage. The equations for implementing the perspective transformation are simple and well-understood. However, in general, current personal computers are not fast enough to generate complex 3-D objects for use in interactive programs. Several multiplication and division operations (which are slow on personal computers) are needed for every point in a 3-D object, whereas the display routines for 2-D bitmap objects can get by with the fast operations of addition and table lookup.
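The transformation itself is only a couple of lines. A sketch assuming the viewpoint at the origin looking along +z, with an assumed eye-to-screen distance d:

```python
def project(x, y, z, d=256.0):
    """Perspective projection of a 3-D point onto the 2-D screen.
    d is the assumed distance from the eye to the screen plane.
    Each point costs two divisions -- the slow operations on the
    personal computers of the day -- whereas 2-D bitmap drawing
    gets by with additions and table lookups."""
    if z <= 0:
        return None                  # at or behind the viewer; not drawable
    return (d * x / z, d * y / z)    # nearer points spread farther from center
```

Dividing by z is what makes distant objects shrink toward the center of the screen, the same convention that gives a landscape painting its depth.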
The surface of a 3-D object can be represented with many triangular facets, as on a jewel. The display routines for such an object must separate the visible facets from the hidden ones and calculate the color of each facet. In addition, it is desirable to give a texture to each facet, to simulate the highlights, reflections and shadows produced by light sources, and to smooth the object's faceted surface. All of these operations are very slow, taking hours on large computers to construct the images for a few seconds of animation.
The image on the screen of a computer game must be produced in a fraction of a second. Some games have achieved 3-D effects by restricting the position and rotation of objects in the game and of the viewpoint so that display routines can be simplified and thus be faster. For example, Zaxxon, an arcade video game, showed a climbing, banking, diving airplane. The point of view was behind, above, and to the side of the plane. The player could change the plane's altitude and lateral position as the terrain moved under it. On the screen, the terrain scrolled diagonally under the plane, and the plane moved up and down, or sideways at right angles to the scrolling terrain. Both the scrolling terrain and the airplane could be shown with (2-D) bitmap images, so that Zaxxon was able to give a very powerful 3-D effect using fast 2-D display techniques. It is important that the plane was confined to flying in a narrow corridor over the terrain -- as long as the plane stayed in this narrow range of positions relative to the viewpoint, the same image for the plane could be used at all of its positions. Without constraints on the plane's position, the player might fly up near the viewpoint, which would require a display algorithm for making airplane images of different sizes.
Flight simulation programs have to display the terrain under the aircraft from different heights and angles. Programs have appeared on personal computers which can handle this display task in real-time. Since, in a flight simulator, the point of view is out the window of the airplane, an image of the plane itself does not need to be displayed. However, if there are other objects in the simulation, like another plane to have a dogfight with, then it must be possible to view these objects from different angles and at different distances. It is fitting that flight simulation programs use joysticks for input, since the computer joystick is descended from the control stick of early airplanes, which was mechanically linked to the plane's ailerons and elevator.
A human has two eyes, and uses slight differences in the images falling upon his two retinas to judge the distance of objects in his field of view. Vision based on two side-by-side points of view is called stereoscopic. Using a single computer screen as a window into a three-dimensional world throws away the benefits of stereoscopic vision, because both eyes see the same (flat) image on the screen. Systems which present separate images to each eye can give a very convincing impression of three-dimensional depth. Stereoscopy is used for viewing 3-D models of complex molecules, like proteins. A few arcade video games have tried it, too.
At the University of Utah in the early days of computer graphics, a wonderful idea incorporating stereoscopy was explored -- the head-mounted display. A lightweight stereoscopic display was built into a helmet for the user to wear, along with a device for sensing the orientation and position in the room of the helmet. Two images were generated, based on the positions and orientations of the two eyes, which were known from that of the helmet. This allowed an image of a three-dimensional object to hover in the middle of the room while the user moved his head (and the rest of himself) to examine the object from different angles. This 3-D object existed in the same space as the user. The head-mounted display offered interesting possibilities. An architect could look at a proposed building in its context of the city skyline, with him floating high above the city, and then he could descend and enlarge the building so that he could walk through it. With a 3-D pointing device, like the one that sensed the helmet's position, a user could grasp and deform the ephemeral objects that surrounded him. Thus, on the practical side, a user could sculpt models of parts to be manufactured.
On the artistic side, he could sculpt moving three-dimensional forms unfettered by gravity or structural materials. Two wearers of head-mounted displays could enter into the same universe of objects, working cooperatively on a project, or throwing fireballs at one another in a fantasy game. The promise of the head-mounted display has not been explored over the last fifteen years. Only a few have been built. Perhaps today's cheaper technology makes the head-mounted display a good solution in some area of industry or entertainment.
A computer network can allow several players to play in a common game even though they are physically separated from one another. The players communicate over the telephone lines through a central host computer. The game being played can be, among others, an adventure game full of rooms and objects. Several players can simultaneously inhabit the same adventure game world. This can work with any kind of adventure game, text or graphic, but the multi-player graphical adventure game is an interesting case to consider. A player of a graphical adventure game sees one room at a time on the screen. Two players in the same game world would see one another only if they were in the same room. Having two or more human-controlled actors in an adventure game allows for both cooperation and competition among the players. A band of cooperating players could explore an unknown labyrinth, in the style of Dungeons and Dragons. Two players (or two tribes of players) could fight, stealing objects from one another, having tugs-of-war over important weapons or treasure, and throwing things at each other. Algorithmically-controlled creatures could be indistinguishable from human-controlled actors. Since all the players and creatures in the game would be represented on the screen by arbitrary small colored shapes, the game would be like a masquerade ball, with everyone anonymous.
A computer network can have dozens, or hundreds, or thousands of users connected to it simultaneously. Several of these people could agree to play together for a specified time, after which the game would be over. However, another possibility is to have an on-going adventure game, an evolving scenario into which people could enter when they logged on to the network. In an on-going game, events would be occurring continuously as different players entered and left the game, and as the creatures in the game responded to the situations they encountered. It would seem that after a player had gotten a valuable treasure, he would need to stay connected to the network 24 hours a day in order to protect his new possession from the predatory hordes of players and creatures in the game. Clearly, a habitual player of an on-going adventure game would need a secure place to store his gear -- his weapons, armor, treasure, maps, tools -- while he was gone from the game-world.
Imagine that the adventure game world was a giant checkerboard of rooms -- a thousand screens by a thousand screens. Each player could have a particular room as a stronghold in which to store his stuff and know that it would be safe while he was logged off the network. The player would, in effect, own that room. This room would be the player's bank vault, his castle, his inner sanctum. Ownership of such a vault-room would be important to a serious player in this game world. In addition to the usual network charges, it is conceivable that a player might pay real money, say 50 cents, for ownership of a vault-room. For the entrepreneurially inclined, this presents an interesting opportunity: create an imaginary world and then sell real estate in it.
An adventure world full of empty rooms might not be worth much, but as players began to build their castles here and there, to connect secret magical passages between distant castles, to build pretty-looking facades at castle entrances, the real estate might begin to appreciate. Certain areas might develop a concentration of dwellings -- that is, become a town. The few remaining empty screens within the area of a town might acquire some value, based on their location. Owning some space in an adventure world might be worthwhile because of the things that had come to go on there, the players or creatures that hung out there, or the traffic that passed through.
Real estate derives its value not so much from its intrinsic ability to support a building as from its context of surrounding property. A lot's nearness to desirable places like parks, offices and schools strongly affects its value. The marshy land along Buffalo Bayou is valuable not because of its soil or vegetation, but because it is surrounded by the rest of Houston. An imaginary world could acquire some value from the structures built within it, and the activities and commerce going on there. In fact, since more independent worlds could be created fairly easily, the inhabitants of a world and the things they had built would be the entire wealth of such a world.
Philosophers ask the question "Is the physical world to any degree dependent on a perceiver for its existence?" Or, to cast the problem in a specific situation: "Would a tree falling in the forest make a sound if there was no one there to hear it?" These questions have always seemed rather ridiculous to me. However, in a simulated reality in which every detail seen by the viewer is explicitly generated, the question's significance is easier to understand. Can events occur, unseen, in parts of a simulated world beyond the viewer's direct observation? This really is a question of consistency. That an unobserved event occurred must be inferred from its observable consequences. The bat in Atari 2600 Adventure moved objects around regardless of whether the player was there to observe it. Evidence of the bat's actions could be seen later in the rooms where these actions occurred.
In a sense, a computer game for one player is a universe created for one perceiver. Without the player-perceiver, the game-universe would not exist. In a galaxy simulation program, stars could be randomly generated to surround the player as he wandered through the galaxy. Consistency would demand that on returning to the same part of the galaxy, the same configuration of stars should be there. To model a galaxy like our Milky Way, which contains 100 billion stars, it is clearly impractical to record data on position and brightness for each individual star. Stars must be somehow described in the aggregate. A scheme of random star-generation based on densities in various regions of space could repeatably generate the same stars for a given region of the galaxy, thus solving the return-visit-consistency problem. All 100 billion of these stars would exist, implicitly, in the star-generation algorithm, but only a small fraction of those could ever be visited, explicitly, by a player of the game.
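The return-visit consistency can be had by seeding a pseudorandom generator with the sector's coordinates, making the stars a pure function of position. A sketch; the sector scheme and star properties are assumptions:

```python
import random

def stars_in_sector(sx, sy, sz, density=3):
    """Regenerate the identical stars for a galactic sector on every
    visit: the random generator is seeded from the sector coordinates,
    so the stars exist implicitly in the algorithm, not in memory.
    'density' stands in for a model of the galaxy's star distribution."""
    rng = random.Random(f"{sx},{sy},{sz}")       # position-derived seed
    count = rng.randrange(density, 3 * density)
    return [
        {
            "pos": (sx + rng.random(), sy + rng.random(), sz + rng.random()),
            "brightness": rng.uniform(0.1, 1.0),
        }
        for _ in range(count)
    ]
```

No star is stored anywhere; each one is recomputed, identically, whenever its sector comes into view.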
Most computer games, up to the present time, have populated their spaces with a limited number of objects. Each object required a few bytes of memory to describe it, so the number of objects was limited by the amount of memory available. Adventure games, for example, used tables to describe the position and properties of the various treasures, weapons, and tools in the game. An adventure game could contain hundreds of table-defined objects, but not millions. In contrast, an algorithm which used a position in space to generate the characteristics of the objects in that region could define millions or billions of objects.
In the case of both table-defined and algorithm-defined objects, an input number -- an object number or a position -- was accepted and some output data was returned describing the properties of that object or of the object at that position. For example, using the number "5" to access the object tables of an adventure game produced the properties of object number 5. In the galaxy simulation, the coordinates of a sector of space could be used to calculate the characteristics of the stars in that sector. The difference between the two methods was that the table required some memory for each possible input value, whereas the algorithm didn't. Thus, the table method was restricted to a limited range of input values -- a few hundred or a few thousand -- but the algorithm method wasn't.
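The two access methods have the same shape -- number in, properties out -- and differ only in where the properties come from. A sketch with invented object properties:

```python
# Table method: a few bytes of memory per possible object, so the
# range of object numbers is limited by available memory.
object_table = {
    5: {"name": "gold key", "weight": 1},       # hypothetical entries
    6: {"name": "iron gate", "weight": 99},
}

def table_lookup(n):
    return object_table[n]

# Algorithm method: no memory per object; the properties are computed
# from the number itself, so billions of objects exist implicitly.
def algorithmic_lookup(n):
    h = (n * 2654435761) % 2**32                # a simple integer hash
    return {"name": f"star #{n}", "brightness": h / 2**32}
```

The table answers only for the few numbers it holds; the function answers for any number at all, at the cost of a little computation per query.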
Coordinates can be thought of as a way of numbering all the sectors in a galaxy. The star-algorithm takes a sector number and produces a description of the stars in that sector. A sector can be treated as an object, and the stars within it as characteristics of that sector. Thus, this descriptive method assumes that each possible object has a number, and that the object's characteristics can be described as a function of that number.
This idea of connecting the coordinates in space with an enumeration of objects allows a space to be populated with an immense number of objects. An enumeration of the 100 billion stars of the Milky Way would require about 37 bits, which is slightly less than 5 bytes of memory. An enumeration of all the cubic inches in a cube 100,000 light-years across (just the right size to enclose our galaxy) would require about 28 bytes. Neither 5 nor 28 bytes is too large to serve as input to a routine which generates characteristics for the objects at each position.
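The byte counts fall out of a one-line calculation: the number of bits needed to give each of n objects a distinct number is the logarithm of n, base 2, rounded up.

```python
import math

def bits_to_enumerate(n):
    """Bits needed to give each of n objects a distinct number."""
    return math.ceil(math.log2(n))

bits_to_enumerate(100_000_000_000)   # 100 billion stars: 37 bits
```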
An algorithm-populated space seems to solve the problem of providing detail for whatever region of a space a viewer chose to zoom in on. However, some problems present themselves. Zooming demands consistency between close-up and distant views of the same region. Considering two views of the same region which differ a millionfold in scale, it is impractical to search through all the details of the small view and its neighbors in order to construct the large view. There must be a way to extract the prominent details of a large-scale view without scanning all the details of its sub-views.
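One possible answer (an assumption, not a worked-out system) is to generate the detail hierarchically: the coarse levels carry the prominent features, and each finer zoom level adds only smaller ones, so a distant view never has to scan its sub-views.

```python
import random

def stars_at_level(sector, level):
    """Each level contributes only stars dimmer than the level above it,
    and each level is generated independently from a position-and-level
    seed, so coarse views stay cheap no matter how deep the detail goes."""
    rng = random.Random(f"{sector}:{level}")
    cap = 1.0 / 2 ** level                       # level 0 holds the brightest stars
    return [
        {"pos": (rng.random(), rng.random()),
         "brightness": rng.uniform(cap / 2, cap)}
        for _ in range(rng.randrange(1, 4))
    ]

def view(sector, zoom):
    """A view at a given zoom gathers levels 0 through zoom of one sector."""
    stars = []
    for level in range(zoom + 1):
        stars += stars_at_level(sector, level)
    return stars
```

A million-fold distant view draws only the low-numbered levels, yet it agrees perfectly with the close-up, since both are generated from the same seeds.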
Although there are problems to be solved, it seems that it may be possible to construct a simulation of a galaxy in which the viewer could zoom in on any part of it, zooming down from the spiral-shaped entirety, through clouds and clusters of stars, to finally center upon one growing star that fills the screen. Such a galaxy model might have many identical or nearly identical stars scattered through it, concentrating resources upon the description of the spatial distribution of stars. Or, alternatively, it might generate many different types of stars, with various sizes, temperatures, ages, and compositions. Having zoomed in on one particular star, it would be possible to use the characteristics generated for that star to drive a planetary system model, which assigned likely distances, masses, and compositions to planets based on the properties of the parent star. As with the galaxy model, it should be possible to zoom down from the entire planetary system to view any individual planet. The star-system model would have assigned certain characteristics to the planet. In turn, these characteristics could drive a planet model, which generated surface features like mountains, craters, atmospheric flow patterns, and storms.
Using details generated in one model to drive a second model would be called cascading the models. This cascading can run through several levels. There is no reason that the galaxy model could not be driven by a simulation of the universe, populated by clusters and clouds of galaxies. Thus, by cascading models of the universe, galaxies, stars, and planets, a simulated universe could be constructed. This one little program would contain too many planets to ever be fully explored in all the future history of humanity.
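A sketch of the cascade with two levels and invented parameters: the galaxy model hands each star a seed, and the star-system model derives its planets from that seed, so every level is reproducible on a return visit.

```python
import random

def galaxy(seed):
    """Top model: generate stars, each carrying a seed for the next level."""
    rng = random.Random(seed)
    return [
        {"seed": rng.getrandbits(64), "mass": rng.uniform(0.1, 50.0)}
        for _ in range(rng.randrange(3, 8))
    ]

def star_system(star):
    """Second model: derive planets from the parent star's seed; a fuller
    model would use the star's mass and composition to pick likely orbits."""
    rng = random.Random(star["seed"])
    return [
        {"orbit_au": rng.uniform(0.3, 40.0), "mass": rng.uniform(0.001, 10.0)}
        for _ in range(rng.randrange(0, 10))
    ]
```

A planet model would continue the cascade, taking a planet's mass and orbit as its inputs, and a universe model would sit above the galaxy model in the same way.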
Furthermore, fanciful universes could be constructed that contained more details than there are particles in the real universe. (The universe contains roughly 10^88 electrons, which can be enumerated with a mere 37 bytes.) However, there is a difference between detail and complexity. A space can be designed with an infinite number of details -- for example, a fractal into which one could zoom perpetually. But in the fractal new details repeat the old. Complexity means diversity, variety, new stuff. The complexity of a simulated universe is limited by the number of clauses and provisions in the simulation program. A simulated universe could never approach the complexity and diversity of the real universe, which has many surprises waiting for us. However, the universe designer should not be too disheartened to learn that he can create universes that, though more repetitive than the real universe, can have more details in them.
Copyright © 1983 Warren Robinett. All Rights Reserved.