Interaction between Real and Virtual Humans in Augmented Reality
Selim Balcisoy and Daniel Thalmann
Computer Graphics Laboratory, Swiss Federal Institute of Technology EPFL, DI-LIG, CH-1015 Lausanne, Switzerland
{ssbalcis, thalmann}@lig.di.epfl.ch
we propose a system to overcome these limitations by using augmented reality (AR) technology. We can summarize AR as a combination of distinct technologies spanning from virtual reality to computer vision. By definition, AR enhances the user’s view of the real world with visual information from a computer. In our case we also need to enhance the virtual humans’ synthetic vision or other sensors with information obtained from the real world. One possible solution is to acquire data from the real world through a high level interface from a computer vision system with one or more cameras. This vision system should obtain essential data from the real world and transform it into a machine understandable form. Simply put, the vision system should sample the real world. There are several examples of such interfaces, mainly 2D or 2 1/2D vision systems to track human body or head motions and gestures [5][11]. Our intention is to implement an open software system where the high level interfaces can be used by different input sources such as a human operator or a fully integrated 3D vision system. As implementing a fully operating 3D vision system is an advanced research topic in itself, and as we are not interested in implementing a 2D vision system with major limitations, we decided to use a human operator to obtain information from the real world. We use a single static camera to obtain a color image of the real scene with a human actor. A human operator feeds the interaction system with the necessary input data by using 3D input devices, and a high end graphics workstation renders the synthetic world with virtual actors in correct correspondence to the real world.
This synthetic image is merged with the image from the real scene. The resulting output images are displayed on a large video monitor facing the human actor in the real scene for visual feedback. We developed two different software tools to design a virtual scene and to calibrate a single video camera. Regarding interaction, our central issue is to make virtual humans suitable for acting just like a human actor on a theater stage. Human actors follow a text based script and portray a character on the stage. They have the ability to understand this script and perform the acting with some improvisation. Their virtual colleagues have no real cognition and therefore are unable to read and understand a written text. In our case we let our virtual humans follow a strict
Abstract
Interaction between real and virtual humans covers a wide range of topics from creation and animation of virtual actors to computer vision techniques for data acquisition from the real world. In this paper we discuss the design and implementation of an augmented reality system which allows investigation of different real virtual interaction aspects. As an example we present an application to create real-time interactive drama with real and virtual actors.
1 Introduction
Though virtual human models have been in existence for the last few years, mainly for research purposes to simulate human movements and behaviors, only recently has there been very encouraging interest from outside the academic world. Virtual humans have potential applications in entertainment and business products such as films, computer games, and distributed virtual worlds; in populating empty 3D worlds, representing a remote participant in a distributed virtual world, or as a TV talk-show host. New applications are demanding new ways of interaction between real and virtual humans, as we will investigate in this paper. Until now virtual humans have been ‘living’ in homogeneous virtual environments.
State of the art virtual environments are human designed worlds with a low level of detail compared to our real world. To achieve more immersive experiences we need to reduce some basic limitations of the current virtual reality technology:
Rendering of photo realistic, detailed and interactive environments in real-time. Although the current computer graphics technology can model, animate and render human figures with near photo-realism in real-time [6], we cannot say the same about rendering of a complex virtual world.
Usage of restrictive human machine interfaces like gloves, magnetic trackers, head mounted displays. Current human-machine interfaces with excessive connections are hampering the interactivity. Another fact is that they limit the usage of the virtual reality technology for a wide range of applications.
In this paper
1087-4844/97 $10.00 © 1997 IEEE
organic body to be modeled with SkeletonEditor. Metaballs are used to approximate the shape of internal structures which have observable effects on the surface shape. Each metaball is attached to its proximal joint, defined in the joint local coordinate system of the underlying skeleton, which offers a convenient reference frame for positioning and editing metaball primitives, since the relative proportion, orientation, and size of different body parts is already well-defined. The designed body shape and face models are integrated using a real-time body deformations library, DODYLIB [12], and the IRIS Performer graphics toolkit [8]. DODYLIB is built on the HUMANOID software for animating and rendering virtual humans in real-time on SGI workstations. IRIS Performer allows textures on 3D surfaces. By using our model configuration, we can apply different textures on each body part to simulate simple clothing. Using hardware texture mapping, performance is the same as without texture, but applying texture is a good way of enhancing surface realism.
machine understandable script in order to behave like human actors. We designed and implemented a novel software architecture to verify all our concepts. This architecture integrates several existing input devices, such as the Spaceball, and has interfaces for possible future extensions like 3D vision systems. The virtual human creation and animation is based on the existing HUMANOID 2 ESPRIT European project software [2]. A software layer to create task oriented scripts has been developed over the HUMANOID Agent structure [3] to produce virtual actors. Finally, in an example sequence virtual humans are integrated into a real theater stage as virtual actors using the augmented reality technology.
2 Creating and animating the virtual humans
Currently there are several virtual human creation and animation software sets, such as Marilyn from the Swiss Federal Institute of Technology in Lausanne (EPFL) and the University of Geneva (UG), or Jack from the University of Pennsylvania (UPENN). Our virtual human creation and animation software, Marilyn, was partly developed in the framework of the HUMANOID Esprit project. One of the objectives of this European project is to create virtual humans with deformable body, face and hands. Another objective is to achieve agent controlled human figure animation. In this chapter we briefly present the basic procedure for creating a virtual human using our software tools, and then present some human figure animation modules which are currently used in our system.
2.2 Body animation for the virtual humans
The HUMANOID environment supports many facilities for body animation. In this paper we used a subset of these, which were adequate for performing basic actions. We can divide virtual human motions into two distinct groups: robotic movements to perform low level tasks like locomotion of the body from one point to another, and gestures to express the current state of mind of an actor. We used motion motors to perform low level tasks.
For gestures we used a set of keyframed animation sequences.
Motion Capturing and Predefined postures
A traditional way of animating virtual humans is playing keyframe sequences. We can record specific human body postures or gestures with a magnetic motion capturing system and an anatomical converter [7], or we can design human postures or gestures using the TRACK system [1]. Motion capturing can be best achieved by using a large number of sensors to register every degree of freedom in the real body. Molet et al. [7] state that a minimum of 14 sensors is required to manage a biomechanically correct posture. The raw data coming from the trackers has to be filtered and processed to obtain a usable structure. The software developed at the Swiss Federal Institute of Technology permits converting the raw tracker data into joint angle data for all the 75 joints in the standard HUMANOID skeleton. TRACK is an interactive tool for the visualization, editing and manipulation of multiple track sequences. To create an animation sequence we create key positions of the scene and store the 3D parameters as 2D tracks of the skeleton joints. The stored keyframes, from the TRACK system or the magnetic tracker, can be used to animate the virtual human in real-time. We used predefined postures
2.1 Design and rendering of virtual humans
The creation of a virtual human is performed in two separate parts: the design of the face and of the body shape. Later the face and the body shape are integrated using a real-time deformation software library. For the face, the operations conducted in traditional sculpture can be performed by computer for computer generated objects. Our sculpting software is based on the Spaceball, a 6D interactive input device. This allows the user to create a polygon mesh surface. Local deformations based on an extension of FFD [9] are applied while the Spaceball device is used to move the object and examine the progression of the deformation from different angles.
Mouse movements on the screen are used to produce vertex movements in 3D space from the current viewpoint. Local deformations make it possible to produce local elevations or depressions on the surface and to even out unwanted bumps once the work is near completion. For the body shape, we use an interactive metaball editor, BodyBuilder [10], for shape designers. We start the shape design by first creating a skeleton for the
and gestures to perform realistic hand and upper body gestures for interpersonal communications.
Motion Motors
We used two motion motors, one for the body locomotion and the other for the placement of the end effectors, which are hands and feet in our case. The first one is a walking motor developed by Boulic [2], and the second one is an inverse kinematics motor developed by Emering [3]. The current walking motor enables virtual humans to travel in the environment using the instantaneous velocity of motion. One can compute the walking cycle length and time, from which the necessary skeleton joint angles for animation can be calculated. The inverse kinematics motor defines several chains in the body skeleton (left arm, left leg, etc.) for end effectors, and performs inverse kinematics on these chains. This motion motor also outputs skeleton joint angles.
We can investigate our augmented reality system in three subtopics. Figure 2 presents the data flow diagram with those three subtopics: data acquisition from the real world, real-time processing and rendering, and compositing.
[Diagram labels: Camera, device, Virtual Scene, Actor Guidance, Rendering]
3 Augmented reality system overview
3.1 System configuration
We designed our system using off the shelf, commercial equipment. The principal hardware platform is a Silicon Graphics Onyx™ Reality Engine2™ (Onyx) graphics workstation with four MIPS R4400 processors, a single graphics pipeline, and a Sirius Video™ (Sirius) real-time video capture device.
The Sirius acquires image sequences from a Sony HyperHad camera on its analog input. The graphics engine of the Onyx generates the virtual scene, and both image sequences are chroma-keyed by the Sirius. The output of the Sirius is distributed to a Digital Betacam, a stage monitor for visual feedback for the real actor, and a user monitor close to the Onyx for the operator. The motion capturing for keyframed sequences is performed with an Ascension Technology Flock of Birds™ magnetic tracker with ten sensors and an extended emitter. Figure 1 shows our hardware configuration.
[Diagram labels: Video Monitor, Legend]
Figure 2. Data flow diagram
Data acquisition from the real world
A flawless combination of real and virtual worlds is only possible if the virtual world obtains information about the current state of the real world in terms of the positions of the real objects and humans, and the lighting. This information needs to be acquired as and when it changes in time. For static objects, and most of the time for lighting, we need to obtain the necessary data only once. For moving objects and for human actors we need real-time data acquisition from the real world to update the virtual world. In our system, information about real objects and lighting is measured on the real scene and entered into the application in the setup phase. Our AR system currently has one real-time input source from the real scene, namely a video camera to sample the real world.
Camera Calibration
A correct correspondence between the real and virtual worlds can be achieved if the virtual camera characteristics exactly match their real camera counterparts. We choose a pinhole camera model with 6 extrinsic and 9 intrinsic parameters. The extrinsic
Figure 1. Hardware configuration
3.2 System operation
input from the operator and other users is processed simultaneously to determine the current state of the virtual actors and objects. Operator and other users have access to several different input devices.
The data from such input devices are transformed to joint angles to animate the virtual actors or to 3D coordinates to transform virtual objects. The rendering is implemented on the IRIS Performer toolkit to achieve the highest possible frame rates on Silicon Graphics workstations. In our application rendering accounts for 70% of the delay between two frames. To achieve higher frame rates and a higher degree of realism we applied several well known tricks:
Low polygon count for background objects, and mapping rich textures on them.
Adjustable level of detail for different kinds of applications. In DODYLIB we can select three different levels of detail for virtual humans. We choose a low level of detail for computationally demanding applications to keep the frame rate high enough.
Implementing a synthetic “fake” shadow for the virtual actor by projecting the shape of the body on the ground.
Usually a single virtual actor with the highest level of detail and a synthetic shadow can be rendered faster than 10 frames per second.
Compositing
To mix video sequences from our graphics engine and camera, we use the conventional chroma-keying technique with some modifications. In the chroma-keying technique an intermediate color acts as a “keyhole” and determines where one video layer shows through another. Compositing is performed by the Sirius, which has a blender and a chroma-key generator. Currently commercial video production studios use the blue room technique for chroma-keying, which works well for placing human actors in virtual sets. This approach has some major limitations:
Setting up a dedicated studio.
Usage of real background or real objects is limited by the keying color.
We decided to design the virtual scene as the blue room. In our case black is the keying color, which allows us to use any room as a studio. On the other hand, the black color is rarely used to render 3D objects and actors, which frees us from any design limitations.
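In the system described here the keying is done in hardware by the Sirius. The following Python sketch only illustrates the per-pixel rule behind black-keyed compositing, under stated assumptions: the function name `composite_black_key`, the nested-list image layout and the `threshold` tolerance are hypothetical choices made for this illustration and are not taken from the system itself.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def composite_black_key(virtual: Image, real: Image, threshold: int = 16) -> Image:
    """Merge a rendered frame with a camera frame using black as the key color.

    Wherever the rendered (virtual) pixel is black or nearly black, the camera
    (real) pixel shows through; elsewhere the rendered pixel is kept.
    The tolerance `threshold` is an assumption of this sketch.
    """
    if len(virtual) != len(real) or any(
        len(v_row) != len(r_row) for v_row, r_row in zip(virtual, real)
    ):
        raise ValueError("images must have the same dimensions")

    merged: Image = []
    for v_row, r_row in zip(virtual, real):
        out_row = []
        for v_px, r_px in zip(v_row, r_row):
            is_key = max(v_px) <= threshold  # all channels close to zero
            out_row.append(r_px if is_key else v_px)
        merged.append(out_row)
    return merged


# Tiny 2x2 example: one colored virtual pixel, the rest is key color (black).
virtual_frame = [[(0, 0, 0), (200, 50, 50)],
                 [(0, 0, 0), (0, 0, 0)]]
camera_frame = [[(10, 120, 30), (10, 120, 30)],
                [(90, 90, 90), (90, 90, 90)]]
merged_frame = composite_black_key(virtual_frame, camera_frame)
```

Because black models of real objects are also rendered in the key color, the same rule lets the camera image of a real table or chair show through in front of a virtual actor.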
Figure 3 shows the virtual scene with a virtual human. Real objects, a table and a chair, are represented in the virtual scene as 3D models in black color. Virtual objects which augment the real world are rendered in full color. Figure 4 presents the result of merging the real and virtual scenes. The real objects correctly occlude the virtual objects and the virtual human.
parameters are three for the position and three for the orientation of the camera. The intrinsic parameters are:
Lens distortion: three for radial and two for tangential distortion.
Camera interior orientation parameters: two for the principal point and one for the principal distance (focal length).
Electronic influence: one for the x and one for the y distortion of the CCD array.
To determine these parameters we used a standard resection algorithm [4]. The calibration process demands well known points in the image. A common approach is to have landmarks or a grid in the scene. Instead of landmarks or grids we use furniture, a chair and a table, to obtain some well-known points in the real scene. We can summarize our reasons for such an approach under two topics:
As our operating area is large, we cannot use the small grids which are commonly used for several applications [11].
Landmarks are inflexible and will distort the realism on the stage.
As we have a static camera position, we do not need any further complex camera tracking algorithms. The system passes the camera parameters to the rendering engine as virtual camera parameters to apply the perspective transformation to the computer generated images. Until now we discussed off-line operations like camera calibration. During an interactive session we have human actors and some real objects in motion. The changes of position and orientation should also be registered by the system in real-time. As mentioned before, there are some systems with limited performance using computer vision.
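Returning briefly to the calibrated camera model: as a rough illustration of how the listed parameter groups could drive a virtual camera, the following Python sketch projects a 3D point to image coordinates. The text names the parameter groups but not the formulas, so the Brown-style radial and tangential distortion terms used here are an assumption, as are the +z viewing direction, the use of a rotation matrix for the three orientation parameters, and all names such as `Camera` and `project`.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    # Extrinsic: camera position and orientation (world-to-camera rotation,
    # stored as a 3x3 row-major matrix for simplicity).
    position: Vec3
    rotation: Tuple[Vec3, Vec3, Vec3]
    # Interior orientation: principal distance and principal point.
    principal_distance: float
    x0: float
    y0: float
    # Lens distortion: three radial and two tangential coefficients.
    radial: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    tangential: Tuple[float, float] = (0.0, 0.0)
    # Electronic influence: scale factors for the x and y axes of the CCD.
    sx: float = 1.0
    sy: float = 1.0


def project(cam: Camera, point: Vec3) -> Tuple[float, float]:
    """Project a world point to image coordinates (assumed model, see lead-in)."""
    d = tuple(p - c for p, c in zip(point, cam.position))
    xc, yc, zc = (sum(r * v for r, v in zip(row, d)) for row in cam.rotation)
    if zc <= 0.0:
        raise ValueError("point is behind the camera")
    x, y = xc / zc, yc / zc                      # ideal pinhole coordinates
    r2 = x * x + y * y
    k1, k2, k3 = cam.radial
    p1, p2 = cam.tangential
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    u = cam.x0 + cam.principal_distance * cam.sx * xd
    v = cam.y0 + cam.principal_distance * cam.sy * yd
    return u, v


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
demo_camera = Camera(position=(0.0, 0.0, 0.0), rotation=IDENTITY,
                     principal_distance=800.0, x0=320.0, y0=240.0)
demo_uv = project(demo_camera, (1.0, 2.0, 4.0))
```

With a static camera these values are fixed once after calibration and reused for every rendered frame.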
We decided to carry on with one or more human operators/users as in a virtual reality application.
User input
The input from a user is restricted to guiding a virtual human. The user can interact with real or virtual objects through his representation in the virtual world, namely an avatar. In our current system we implemented software interfaces for a remote user for guiding a virtual human with Spaceball and keyboard. In the near future, with proper hardware and software, a remote user will be able to participate through the Internet with a PC or through POTS (Plain Old Telephony) with a touch tone telephone.
Operator input
An operator uses several 3D input devices to modify the virtual world according to the changes in the real world. The operator can move virtual objects, guide virtual humans, or enter high level commands for managing a script. The operator uses the same virtual human guidance tools as a remote user.
Real-time processing and rendering
Real-time processing of the data and rendering of the virtual scene is implemented on an Onyx. Real-time
animation modules, like motion guidance, object transformation, activation of pre-recorded animation sequences. Our application is connected to a file system which contains scene data about 3D models and virtual humans. This data is read during the initialization of the system. The virtual humans are animated with the library AGENTlib [3], dedicated to the coordination of perception and action and the combination of motion generators. The AGENTlib defines agent entities responsible for the integration of a fixed set of perception senses and a dynamic set of actions. The Core integrates the DODYLIB with the AGENTlib. The DODYLIB is responsible for real-time body deformations and integration of the textures for rendering. Finally the virtual scene is rendered using the IRIS Performer toolkit.
[Diagram labels: script data, interfaces, visual data, AGENTlib]
Figure 5. Software architecture block diagram
Figure 4. Merged image
4.
Software architecture
In modeling interactions with virtual humans, the key challenges lie in combining several different topics like artificial intelligence, computer vision and virtual reality. On the other hand, human-machine interfaces should be designed to let human users perform similar tasks with different input devices or media. A scripting ability should be considered too, to create long thematic animations. To integrate all these requirements and to provide an open system we propose the following software architecture with several layers, where the whole animation sequence can either be scripted or interactively driven. Figure 5 presents a block diagram of our software architecture. We have two interface modules for two distinct input sources: one interface module for virtual human guidance and another one for acquiring visual data from the real scene. The input from the real world in the form of 3D device signals is processed by dedicated high level device interfaces. The visual input can be obtained by a computer vision system or by a human operator. This module should update the Core continuously about the current state of the real world. The Core directs output of the interfaces to specific
We defined three different data structures in the Core of our software application. Figure 6 presents the Core data structures and their internal connections.
Figure 6. Core data structures
The ACTOR sets up the connection to DODYLIB for body representation and to the AGENTlib for motion control. The ACTOR data structure contains essential information about the virtual actor concerning rendering, interaction capabilities, current position and state. The primary state is IDLE, where the ACTOR is not managed by any TASK. Depending on the TASK an ACTOR can have several distinct states.
For example, if a user is guiding an ACTOR with a Spaceball, the ACTOR cannot be manipulated by another TASK for body locomotion, but can perform inverse kinematics for the left hand, which should be handled by some other TASK. The second one is OBJECT, which manages the rendering information, current position and state of a virtual object. OBJECTs have similar but simpler states. A TASK sets the state of a virtual object to OCCUPIED to perform picking and to FREE after performing the task. Some static OBJECTs have a permanent STATIC state if they represent a real object. The last data structure is the TASK, which is more complicated than the former ones. The TASK data structure contains rules for specific interactions, and provides connections between ACTORs, motion generators and user interfaces. A TASK may control several ACTORs and OBJECTs. It manages the interactions between ACTORs and OBJECTs, and allows device drivers to control virtual humans. TASKs may call other TASKs sequentially or concurrently. We use the ACTION structure from the AGENTlib to manage concurrent and sequential TASKs [3]. We defined several low level tasks for user interfaces and motion generators. As an example we implemented a high level task for drinking a cup of tea while sitting on a chair, which we term DRINK. This task requires connections to an ACTOR and to several OBJECTs: chair, table and cup. If the DRINK task is activated, it checks whether the virtual human is sitting on the chair, and whether this chair is close to the cup. For simplicity, we assume that this cup is on the table close to the chair. The actor activates a low level task REACH to reach this chair. This low level task controls the walking motor to reach a given point with a given orientation. The REACH task checks continuously if the actor has reached the chair. After reaching the chair successfully, another low level task, SIT, is activated to let the virtual human sit on this chair.
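The data structures and the opening steps of the DRINK task described so far can be sketched in Python as follows. This is an illustration only: the class names, the fields and the `log` list are hypothetical, the REACH and SIT bodies are placeholders standing in for the walking motor and the keyframe player, and nothing here reflects the actual AGENTlib or DODYLIB interfaces.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ObjectState(Enum):
    FREE = "FREE"          # available for picking
    OCCUPIED = "OCCUPIED"  # currently used by a TASK
    STATIC = "STATIC"      # stands for a real object, never moved


@dataclass
class VirtualObject:
    name: str
    state: ObjectState = ObjectState.FREE


@dataclass
class Actor:
    name: str
    state: str = "IDLE"    # IDLE: not managed by any TASK
    sitting: bool = False
    log: List[str] = field(default_factory=list)  # trace of activated tasks


def reach(actor: Actor, target: VirtualObject) -> None:
    """Low level task: placeholder for driving the walking motor."""
    actor.state = "REACH"
    actor.log.append(f"REACH {target.name}")


def sit(actor: Actor, chair: VirtualObject) -> None:
    """Low level task: placeholder for the standing-to-sitting keyframes."""
    actor.state = "SIT"
    actor.sitting = True
    actor.log.append(f"SIT {chair.name}")


def pick(actor: Actor, obj: VirtualObject) -> None:
    """Mark the object OCCUPIED while it is handled, then FREE again."""
    if obj.state is not ObjectState.FREE:
        raise RuntimeError(f"{obj.name} cannot be picked in state {obj.state.value}")
    obj.state = ObjectState.OCCUPIED
    actor.log.append(f"PICK {obj.name}")
    obj.state = ObjectState.FREE


def drink(actor: Actor, chair: VirtualObject, cup: VirtualObject) -> None:
    """High level task: sequence low level tasks, then release the actor."""
    if actor.state != "IDLE":
        raise RuntimeError("actor is already managed by another task")
    actor.state = "DRINK"
    if not actor.sitting:
        reach(actor, chair)
        sit(actor, chair)
    pick(actor, cup)
    actor.state = "IDLE"


chair = VirtualObject("chair", ObjectState.STATIC)  # represents a real chair
table = VirtualObject("table", ObjectState.STATIC)  # represents a real table
cup = VirtualObject("cup")
actor = Actor("virtual actor")
drink(actor, chair, cup)
```

Keeping the low level tasks as separate functions mirrors the idea that a high level TASK only sequences them and checks preconditions.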
This is a keyframe player to animate the ACTOR from standing posture to sitting posture realistically. After finishing the sitting, the DRINK task checks the current position of the cup, and one of the hands of the virtual human may move according to the position of the chair. The DRINK task activates the inverse kinematics motion generator to reach the cup with a certain orientation. Afterwards the virtual human holds the cup, the head is set as the new target for the inverse kinematics module, and the cup is left back on the former position. Figure 7 shows the sequential task management of the DRINK task. Figure 8 is a snapshot of the performance of the DRINK task action.
[Figure 7 diagram labels: IDLE, drink]
Figure 7. A high level TASK example: DRINK
Figure 8. Virtual actor performing DRINK
At the highest level we have the animation loop, where we implement a script like animation flow. The scripting is implemented in two ways: state driven and time driven. We can consider the animation loop as a metatask, where the goal is to perform a script. Creation of several detailed tasks enables us to define complex behaviors with multiple parameters. This complexity is important in performing realistic behaviors.
5 Interaction
By interaction, we mean triggering some meaningful actions in a virtual human in response to body or social gestures or verbal output from a real human. As our AR system does not acquire audio sequences from the real world, verbal communication remains out of the scope of this paper. The interaction between real and virtual actors has several problems to be solved. One key issue is the human machine interface. In our AR application we let the virtual humans share the same stage with the real humans. In this case the human actors cannot use any 3D input devices or magnetic sensors, which would reduce the realism of an application.
One possible solution is using computer vision [5]. We choose to let a human operator manage the interactions between the real and virtual actors according to a scenario. We can modify our system and let participants use input devices for different kinds of applications. In this chapter we will investigate interactions between real humans and virtual objects or humans in general terms, with reference to the current state and possible future improvements of our system.
postures for virtual humans. To ease the transition between several postures we propose to create whole gestures. We define a gesture as a combination of postures to express a specific state of mind, like disagreement. Gestures can be created for each virtual human separately to give them a unique character. On the other hand we can create a repertoire of gestures for long interactions like a discussion, and choose any gesture randomly to animate the virtual human. The management of gestures can be done by autonomous agents [3] or a human operator. To test the acceptance of such an interaction technique with a virtual human we prepared a simple application based on our augmented reality system. We created a set of gestures using the TRACK system. The gestures contained several basic arm and head movements which can be observed during a normal conversation, such as shaking the head. We entered this set into our software environment as keyframe sequences. An operator uses a keyboard to trigger appropriate ones. Our test environment is a real table, two real chairs, one for a real human and the other for a virtual human, and a video monitor facing the real human. As a participant sits on his chair, a virtual human appears on the monitor sitting on the other chair. According to the actions of the participant, an operator triggers some recorded keyframe sequences. In the beginning the test persons had difficulties in watching the video monitor all the time.
We had to adjust the place of the monitor several times. For the realism of the interaction we got positive responses. The test persons enjoyed this new kind of experience. We can list some of the interesting results:
The video monitor, if it is well placed and large enough, is not a major disturbance factor. If the head height of the virtual human and the position of the video monitor are close to each other, test persons experience a similar feeling as if looking at a real person.
We observed that orientation in a synthetic world is much easier with such a setup. In many VR applications the user looks into a virtual world from the first person or third person perspective. In our case he can see the whole mixed world including himself.
Although we had a limited set of recorded gestures, the test persons were not frustrated. The operator sometimes selected wrong gestures on purpose, or interrupted a continuing gesture and started another one immediately, to surprise the test persons. Such semi-random actions added realism to the whole experience. According to this result we will add some improvisation possibilities for the autonomous virtual humans during interactions for future applications.
Figures 9 and 10 are snapshots from a live demonstration. According to a script the real human and virtual human hold a discussion in a bar. Finally they cannot argue anymore and the virtual human leaves the synthetic stage.
5.1 Object manipulation
In our application virtual humans are able to interact with virtual objects or with the virtual representations of real objects. Considering the trivial fact that a virtual human cannot move or deform a real object, interactions between real objects and virtual humans have some limitations. If real objects are static ones like a table or a chair, a virtual human can perform tasks like sitting on a real chair or putting a virtual cup on a real table.
Virtual humans use representations of real objects in the virtual world to interact with them. With well known coordinates and a 3D model of a static object, a virtual human can go close to this object and interact with it. It is quite difficult even for a human to determine the exact coordinates of a moving object. Virtual humans have more difficulty in handling real objects in motion. To interact with moving real objects like a flying ball, we definitely need a 3D computer vision system capable of tracking moving objects in 3D. Real humans have limitations similar to those of their virtual counterparts. They can use haptic VR devices to touch a virtual object, or get immersive 3D visual feedback from a head mounted display. But such techniques are not useful for the potential applications we are considering in this paper. A possible solution is to use computer vision techniques to register the hand movements of a participant, and then deform or translate virtual objects according to these movements. The interaction between real humans and virtual objects is currently performed by an operator, who tracks the movements of the real human and translates the desired virtual objects.
5.2 Real human virtual human interaction
In principle interactions between real and virtual humans should be similar to interactions between real humans. In this paper we are interested in nonverbal interactions between real and virtual humans. Nonverbal interactions are concerned with body postures and their effects on other people’s feelings. We should provide realistic body postures, and transitions between these
European Project VISTA and Swiss National Foundation for Scientific Research.
References
[1] Boulic R., Huang Z., Magnenat Thalmann N., and Thalmann D., “Goal Oriented Design and Correction of Articulated Figure Motion with the TRACK system”, Computers and Graphics, Pergamon Press, Vol. 18, No 4, 1994, pp.
443-452
[2] Boulic R., Capin T., Huang Z., Kalra P., Lintermann B., Magnenat Thalmann N., Moccozet L., Molet T., Pandzic I., Saar K., Schmitt A., Shen J., Thalmann D., “The Humanoid Environment for Interactive Animation of Multiple Deformable Human Characters”, Proceedings of Eurographics’95, 1995
[3] Boulic R., Becheiraz P., and Emering L., “Heterogeneous Actions Integration for Autonomous Virtual Human and Avatar Animation with the AGENTlib framework”, D. Thalmann & N. Magnenat Thalmann Eds., to appear in 1997
[4] Gruen A., “Digital close range photogrammetry: development of methodology and systems”, Chapter 4 of the book “Close Range Photogrammetry and Machine Vision”, Editor K.B. Atkinson, Whittles Publishing, 1996
[5] Maes P., Darrel T., Blumberg B., and Pentland A., “The ALIVE System: Full-body Interaction with Autonomous Agents”, Proc. of the Computer Animation’95 Conference, Geneva, IEEE Press, April 1995
[6] Magnenat Thalmann N. & Thalmann D., “Digital Actors for Interactive Television”, Proc. of the IEEE, Vol. 83, No. 7, July 1995
[7] Molet T., Boulic R., and Thalmann D., “A Real Time Anatomical Converter For Human Motion Capture”, Eurographics workshop on Computer Animation and Simulation’96, R. Boulic & G. Hegron (Eds.), pp. 79-94, ISBN 3-211-828-850, Springer-Verlag Wien
[8] Rohlf J., Helman J., “IRIS Performer: A High Performance Multiprocessing Toolkit for Real-Time 3D Graphics”, Proc. SIGGRAPH’94, ACM Press, 1994
[9] Sederberg T.S. & Parry S.R., “Free-Form Deformation of Solid Geometric Models”, Proc. SIGGRAPH’86, ACM Press, pp. 151-160
[10] Shen J., “Human Body Modelling and Deformations”, PhD Thesis, LIG-EPFL, 1996
[11] State A., Gentaro H., Chen D.T., Garret W.F., and Livingston M.A., “Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking”, Proc. of SIGGRAPH’96, New Orleans, ACM Press, 1996
[12] Thalmann D., Shen J., Chauvineau E., “Fast Realistic Human Body Deformations for Animation and VR Applications”, Proc.
Computer Graphics International’96, Pohang, Korea, 1996
Figure 9. Interaction 1
Figure 10. Interaction 2
6 Conclusions
In this paper we described an AR system which is capable of letting virtual humans perform script driven actions. We described a system to solve the problems of integrating many distinct technologies from VR to computer vision. We presented current problems and possible solutions for interaction between real and virtual humans. Finally we proposed a software architecture to master these problems. In the near future we will concentrate our efforts on developing a 3D computer vision system to acquire data from the real world and perform interactions between real and autonomous virtual humans.
7 Acknowledgments
The authors would like to thank Patrick Keller and Mireille Clavien for 3D models and keyframed animation sequences, and Shrikanth Bandi for proofreading. This research was supported by the