Mobile 3D quality of experience evaluation: A hybrid data collection and analysis approach

Timo Utriainen (a), Jyrki Häyrynen (b), Satu Jumisko-Pyykkö (a), Atanas Boev (b), Atanas Gotchev (b), Miska M. Hannuksela (c)

(a) Human-Centered Technology / (b) Dept. of Signal Processing, Tampere University of Technology, Korkeakoulunkatu 10, P.O. Box 527, FI-33101 Tampere, Finland
(c) Nokia Research Center, P.O. Box 1000, FI-33721 Tampere, Finland

ABSTRACT

The paper presents a hybrid approach to studying the user's experienced quality of 3D visual content on mobile autostereoscopic displays. It combines extensive subjective tests with collection and objective analysis of eye-tracked data. 3D cues which are significant for mobile devices are simulated in the generated 3D test content. The methodology for conducting subjective quality evaluation includes hybrid data collection of quantitative quality preferences, qualitative impressions, and binocular eye-tracking. We present early results of the subjective tests along with eye movement reaction times, areas of interest and heatmaps obtained from raw eye-tracked data after statistical analysis. The study contributes to the question of what is important to visualize on portable autostereoscopic displays and how to maintain and visually enhance the quality of 3D content for such displays.

Keywords: experienced quality, visual quality, 3D, autostereoscopic display, binocular eye-tracking

1. INTRODUCTION

Subjective quality evaluation is used to identify critical system factors for development and objective modeling purposes21. Conventionally, quantitative preference evaluations have been conducted following the guidelines of the International Telecommunication Union36,37. To complement these preference ratings and to gain a deeper understanding of varied factors with heterogeneous stimuli, or with novel or high perceptual qualities, both qualitative descriptive methods42,86 and objective eye-tracking102 have been proposed.
These complementary methods provide a deeper understanding beyond quantitative excellence evaluations. With high-quality or heterogeneous stimuli, preferences can be hard to identify, whereas descriptive data makes it possible to differentiate between the studied factors5. The hybrid methodological approach provides a rich description of subjectively experienced quality, but it also requires systematic co-integration and interpretation of the results in order to give appropriate insight into the phenomenon under study and to benefit further technical development.

This paper examines eye-tracking as an objective method for studying visual 3D video quality on a small display. Eye-tracking reveals information about human visual attention. While several studies17,27,62,70,72,102 have used eye-tracking in data collection, it is difficult to interpret the results meaningfully from the viewpoint of human attention and to compare studies. Attention always contains a stimulus-driven bottom-up component and a task-driven top-down component, and both play a role when measuring quality and defining the tasks for the experiments47. There are also numerous eye-movement parameters83,19,89 which reveal different aspects of human information processing (e.g. fixations for encoding information and saccades for searching, whereas blink rate and pupil size indicate fatigue, emotion and cognitive effort83,10,11). For eye-movement data to complement the understanding of mobile 3D visual quality in a meaningful way, it is important to identify the relevant set of eye-tracking parameters.

The goal of this paper is three-fold. Firstly, we present a review of meaningfully interpretable eye-tracking parameters to be used in quality evaluation research. Secondly, we present an overview of the use of eye-tracking in 3D quality research on small screens.
Finally, we present examples from two subjective quality evaluation studies in which a set of identified interpretable parameters is analyzed for synthetic and natural video contents.

2. OVERVIEW OF METHODS FOR QOE EVALUATION BASED ON EYE TRACKING

Eye-tracking is based on the eye-mind hypothesis, which assumes that the viewer's attention is directed at the object the viewer is looking at13. Even though this assumption is usually valid, there are exceptions. During higher-level cognitive activity, such as intensive thinking, the location of the gaze does not provide accurate information on the focus of attention100. Additionally, the point at which the eyes are focused can deviate up to 1° from the target of attention, and the human visual system is able to recognize targets up to 1.5-2.6° away from the actual target of fixation68,65.

A problem for the validity of eye-tracking experiments is the sheer number of eye-tracking parameters one can extract from the vast data provided by eye-trackers. To further complicate matters, there are no set guidelines as to what each parameter can relay about human visual information processing, and the literature uses the parameters in contradictory ways.

2.1 Eye-tracking parameters

Eye-tracking parameters are usually divided into three main categories: fixation-based, saccade-based and scanpath-based measures88,83. A fixation is a longer period of time during which the eye is relatively stationary and takes in information from the scene19. Saccades are rapid eye movements between fixations during which no information processing takes place24. A scanpath is a complete saccade-fixation-saccade sequence83. The thresholds used to classify eye movements into saccades and fixations are not commonly agreed upon or standardized, which makes comparison of different studies difficult. Even a small change in the thresholds that define a fixation can change the results dramatically48,90.
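The sensitivity to thresholds can be illustrated with a minimal sketch of a velocity-threshold classifier. This is a generic illustration, not the parser of any particular eye-tracker and not the procedure used in this paper; the function names, the synthetic gaze trace and the two threshold values are assumptions chosen for the example. The trace contains a slow 20°/s drift between two stable periods, so lowering the threshold from 30°/s to 15°/s splits one fixation into two.

```python
import math


def classify_ivt(xs, ys, sample_rate_hz, velocity_threshold_deg_s):
    """Label each inter-sample interval 'S' (saccade) or 'F' (fixation).

    xs, ys are gaze positions in degrees of visual angle (hypothetical data).
    """
    dt = 1.0 / sample_rate_hz
    labels = []
    for i in range(1, len(xs)):
        speed = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / dt
        labels.append("S" if speed >= velocity_threshold_deg_s else "F")
    return labels


def count_fixations(labels):
    """Count runs of consecutive 'F' labels."""
    count = 0
    previous = "S"
    for label in labels:
        if label == "F" and previous != "F":
            count += 1
        previous = label
    return count


# Synthetic 100 Hz trace: stable, 20 deg/s drift, stable, 300 deg/s jump, stable.
xs = [0.0] * 5 + [0.2, 0.4, 0.6] + [0.6] * 4 + [3.6, 6.6] + [6.6] * 4
ys = [0.0] * len(xs)

print(count_fixations(classify_ivt(xs, ys, 100, 30.0)))  # 2
print(count_fixations(classify_ivt(xs, ys, 100, 15.0)))  # 3
```

The same raw data therefore yields a different number of fixations (and different fixation durations) depending only on the chosen threshold.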
Because the reported thresholds for acceleration, velocity and duration vary from study to study, results are difficult to compare.

The mechanisms governing eye movements are divided into two competing mechanisms: bottom-up and top-down. Their relative importance is debated within the research community34,6. The bottom-up mechanism is controlled by low-level primitives of the perceived scene (such as contrast, luminance or edge density), termed visual salience75,4,35. The top-down mechanism uses the semantic informativeness of areas (such as faces and people, and other task-dependent important objects) in relation to more complex higher-level cognitive functions (the viewer's task, goals and familiarity with similar scenes)13,107,91. It is believed that bottom-up processes are more influential early after stimulus onset76,35, while the influence of top-down mechanisms increases with time96.

2.2 Recommendations of eye-tracking parameters

The review targeted published work in the fields of psychology, psychophysiology, human-computer interaction and traffic research. A total of 88 publications were reviewed, ranging from basic to applied research. The goal of the review of eye-tracking parameters was two-fold (Table 1): to identify and measure the locations the participants regarded as important in the presented video clips, and to measure the participants' emotional response to the viewed stimuli.

2.2.1 Measures of important locations

Fixation position is the fundamental basis of all eye-tracking studies, as they are based on the eye-mind hypothesis13,107. This hypothesis states that the viewer's visual attention is focused at the position on which the gaze is focused. Higher cognitive functions (such as intensive thinking, speaking or memory recollection) can, however, override the link between gaze and attention100.

Fixation duration is used to indicate important objects13,69,58,16,57,88,70,67.
Ninassi et al. used fixation durations to calculate regions of interest in still images while varying impairments and tasks67. Nyström and Holmqvist used fixation durations to identify important areas to be used as an input for off-line foveation of video content70.

Fixation frequency on an area is used to indicate important locations23,57,58,83. Fitts et al. used it to measure the relative importance of cockpit indicators in aviation research23. Poole and Ball noted that increased fixation frequency on a particular area can indicate greater interest in the target83. The number of re-fixations has also been used to measure region importance with still images and user interfaces88,29,39,83.

First fixation position and latency have been used to indicate important and informative objects in a scene58,20,14,28. Loftus and Mackworth found that first fixation density was greater for semantically informative than uninformative regions and that viewers fixated earlier on informative objects58. Ellis et al. used the time of first fixation to identify the most important objects of web search user interfaces20. Byrne et al. used the location of first fixation in a study of menu items in pull-down menus14. Häkkinen et al. used the locations and times of first fixations between areas to compare gaze distributions for 2D and 3D versions of the same video content28. Later studies have, however, questioned the Loftus and Mackworth conclusion that first fixation placement is affected by region informativeness or consistency18,61,30,29. These studies suggest that the visual features of the scene, rather than its semantic features, determine initial fixation positions.

Percentage of participants fixating on an area can be used as a between-subjects indicator of important objects39,83,6,28. Birmingham et al. used the proportions of fixations as an indicator of important locations with video scenes of social content6. Häkkinen et al. used the percentage of fixations on an area to explore how the locations change when comparing 2D and 3D versions of the same video content28.

Table 1 Identified eye-tracking parameters

Goal | Parameter | Interpreted meaning | Example study | Limitations | References
Identifying important locations | Fixation position | The location of the viewer's visual attention (eye-mind hypothesis) | How does gaze behaviour change when changing the viewer's task107 | Higher cognitive functions can override the link between gaze and attention | 13, 107, 19
Identifying important locations | Fixation duration | Important objects are fixated on longer than unimportant locations | Where do people look while examining works of art13 | Definition of saccades and fixations, lack of engagement, fatigue | 13, 69, 58, 88, 70, 67, 16, 57
Identifying important locations | Fixation frequency | Important objects are fixated often | Relative importance of cockpit controls23 | Applicability with dynamic content | 23, 58, 57, 83
Identifying important locations | First fixation position | Important objects are fixated first | Where do human observers look in videos with social scenes6 | Temporal masking, definition of first fixation | 58, 20, 14, 28, 6
Identifying important locations | Latency to first fixation | Important objects are fixated faster | Where do human observers look in synthetic images with out-of-place objects58 | Temporal masking, definition of first fixation | 58, 20, 14, 28, 6
Identifying important locations | Percentage of participants fixating on an area | The more participants fixate on an object, the more important it is | The importance of eyes, faces, bodies and foreground objects in social scenes6 | Gender, age, experience differences | 58, 39, 83, 28, 6
Measuring emotional response | Pupil size | Pupil size changes according to emotional response | Pupil responses while listening to negative, positive and neutral sounds77 | Lighting level, fatigue, mental workload | 56, 32, 41, 11, 78, 95, 77
Measuring emotional response | Blink rate | Blink rate changes according to emotional response | Human reactions to images showing threatening situations74 | Fatigue, mental workload, visual discomfort, eye dryness | 11, 64, 84, 74
Measuring emotional response | Fixation length (on an object) | Objects with emotional attachment are fixated on longer | Where do people look in images with criminal or neutral content57 | Applicability with dynamic content | 46, 16, 57

2.2.2 Measures of emotional response

Pupil size has a tendency to grow with positive emotions and shrink with negative ones56,32,41,11,78,95. Pupil size can, however, also decrease with fatigue32,59 and increase with mental workload95,82. Pupil size also naturally adjusts to the amount of light entering the eye, so changes in screen brightness can be a problem if the room background illumination is not sufficient95,66.

Blink rate has been shown to increase with negative emotions11,64,84 and decrease with attentional engagement74. However, a higher blink rate can also indicate fatigue11,10 and mental workload95,11,10,94. Blink rate can also change due to external variables, such as visual discomfort, dryness of the eyes and masked emotions associated with deception83,84.

Fixation length on a single object has also been used as a measure of emotional attachment to that object, with a longer fixation denoting stronger attachment46,16,57.

2.2.3 Test design considerations

The typical proportion of incomplete experiments with eye-trackers is between 10 and 20%39,25, although lower yields of acceptable data have also been reported in individual studies92. Eyewear, contact lenses, small or large pupils and eyelids, and head movements can cause tracking difficulties. According to Peli et al., a typical yield of acceptable eye sample data per participant in a successful testing session is between 91 and 98%80. The remaining data is lost due to blinks, temporary loss of pupil tracking, head movements of the participant and random errors introduced by the eye-tracker.

Large individual differences exist between participants in visual strategies and gaze behavior107,15,3,9,26,25,2. For this reason, a within-subject (i.e. repeated measures) design is encouraged25,19.
Age, gender and experience level also affect gaze distribution80,26,23,15,101. Brasel and Gips note that content has an impact on gaze dispersion9. In their experiment with a nature documentary that included advertisement breaks, the adverts had larger gaze dispersion than the documentary. Gaze distribution is more uniform between participants with moving video than with still images; movement is more important than color in drawing viewers' gaze; and written text (such as subtitles or other textual information) tends to draw the viewers' attention97.

Repeated viewings of the same stimuli change viewing behavior70,9,60,55. Mannan et al. found that fixations are not the same when viewing the same scene for the second time60. In a study by Nyström and Holmqvist, viewers described their viewing patterns as natural during the first viewing, whereas after multiple viewings they began “to look more around the video” in search of quality impairments70. Brasel and Gips found that previously seen advertisements exhibit larger gaze dispersion than during their first exposure9. Le Meur et al. found that repeated viewings changed visual attention deployment, even though introducing quality impairments did not55.

The task of the viewer affects how eye movements are distributed when looking at a scene107,46,72,102,67. Yarbus noted that participants looked at faces when the task was to evaluate age, and looked at other areas when instructed to evaluate the prosperity of the family in the same image107. Similarly, a quality evaluation task made participants look at areas they did not look at during a free-viewing task72. Ninassi et al. noted that fixation durations increased and the visual strategy changed when evaluating quality instead of free-viewing67.

3. EYE TRACKING OF 3D CONTENT

3.1 Overview of 3D cues

The human visual system uses several methods to distinguish the distances of objects in a scene.
These separate subsystems work together to enable 3D vision98. The cues used in the perception of depth depend on the distance of the observed target. The visual system combines “layers” of these subsystems to obtain an accurate depth estimate104. These layers of 3D vision are presented in Figure 1.

[Figure 1: depth cue layers (Accommodation, Binocular Disparity, Pictorial Cues, Motion Parallax) plotted over viewing distance, with axis marks at ~10^-1 m, ~10^1 m, ~10^2 m, ~10^3 m and +inf]

Figure 1 Separate “layers” of 3D cues.

Accommodation is the ability of the eye to change the optical power of its lens. This is needed to focus targets at different distances onto the retina. The ciliary muscles of the eye control the curvature of the lens, and the control is based on retinal blur98. As shown in Figure 1, accommodation is used only at short viewing distances.

Binocular disparity results from the fact that the human eyes are separated by a small distance and share part of the same visual field. This enables two binocular depth cues: vergence and stereopsis. In vergence, both eyes are rotated so that the target is imaged on the fovea. The oculomotor system signals the angle of rotation to the visual system, and this information is interpreted as depth98. Because the eyes are separated, they capture slightly different views of the target object; stereopsis uses this disparity for depth estimation. The usability of binocular depth cues is limited by the fact that beyond a certain distance the eyes point straight ahead and the oculomotor signal and the disparity remain the same.

For longer distances the human visual system uses pictorial cues and motion parallax for depth assessment. Both pictorial and motion parallax cues can also be perceived with one eye only. Pictorial cues are, for example, linear perspective, shadows and scale. Occlusion is a strong pictorial cue that gives reliable information about the depth relationship of objects104.
Scale, on the other hand, is related to familiarity with the real sizes of objects such as people, cars and buildings. As illustrated in Figure 1, pictorial depth cues also affect depth perception at shorter distances.

In motion parallax, the observer moves in relation to the surrounding scene. A depth cue is created when objects in the scene move relative to each other. Like pictorial cues, motion parallax can be utilized over a wide range of distances; even at short distances it is possible to observe the effect of motion parallax.

3.2 Optical characteristics of portable auto-stereoscopic displays

Stereoscopic displays create an illusion of depth by projecting separate images to the eyes of the observer. Displays that can create a 3D illusion without requiring the observer to wear special 3D glasses are known as auto-stereoscopic displays. Wearing 3D glasses is considered impractical for mobile applications, and thus most portable 3D displays are auto-stereoscopic. The most common design of such displays involves a TFT-LCD display with an additional optical layer mounted on top. The layer makes the visibility of each TFT color element (also known as a sub-pixel) a function of the observation angle. As a result, a different group of sub-pixels is visible from each observation angle. The image formed by the visible sub-pixels is called a view. Since portable 3D displays are meant for a single observer, they typically have two views, as shown in Figure 2a99,106,1. When the observer is in the proper position (called the sweet-spot), each eye is supposed to see only half of the sub-pixels. However, due to less-than-perfect optics or a wrong observation angle, part of the image intended for one eye may be visible to the other. This process is modeled as inter-channel crosstalk51. In order to be visualized properly, the sub-pixels of a stereo-pair need to be reordered so that each view contains the proper image. This process is called interleaving51.
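As a rough illustration of the reordering step, the sketch below interleaves two views according to a binary per-sub-pixel map and applies a simple linear mix as a stand-in for crosstalk. The column-alternating map, the linear crosstalk model and all function names are assumptions made for this example; the actual map and optics of a given display differ, and resampling and anti-aliasing are omitted.

```python
def column_map(width, height):
    """Hypothetical binary map: 0 = sub-pixel shows the left view, 1 = right view."""
    return [[x % 2 for x in range(width)] for _ in range(height)]


def interleave(left, right, view_map):
    """Pick each sub-pixel from the left or right image according to the map."""
    return [
        [right[y][x] if view_map[y][x] else left[y][x] for x in range(len(row))]
        for y, row in enumerate(view_map)
    ]


def with_crosstalk(intended, other, crosstalk):
    """Linear mixing (an assumed model): a fraction of the other view leaks in."""
    return [
        [(1.0 - crosstalk) * a + crosstalk * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(intended, other)
    ]


# Two tiny single-channel test images: a dark left view and a bright right view.
left = [[10, 10, 10, 10], [10, 10, 10, 10]]
right = [[200, 200, 200, 200], [200, 200, 200, 200]]

frame = interleave(left, right, column_map(4, 2))
seen_by_left_eye = with_crosstalk(left, right, 0.25)

print(frame[0])                # [10, 200, 10, 200]
print(seen_by_left_eye[0][0])  # 57.5
```

With 25% leakage the dark left view is visibly brightened by the right view, which is the ghosting effect associated with crosstalk.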
The binary map which describes the mapping of TFT sub-pixels to one view or the other is known as the interleaving map. Usually, the resolution of one view is lower than the resolution of the TFT matrix. Also, the resolution of one view is typically two times lower in one direction (most often horizontal) than in the other106,1. Thus, the interleaving process involves downsampling and requires an anti-aliasing filter designed specifically for the interleaving map of the target display. The two most pronounced artifacts visible on an auto-stereoscopic display are Moiré (caused by aliasing) and ghosting (caused by crosstalk)51.

The amount of visible crosstalk defines the sweet-spots at which the image is seen with sufficient quality. According to subjective quality experiments52,79, crosstalk levels beyond 25% produce a stereoscopic image of unacceptable quality. The sweet-spot that allows the display to be seen uniformly lit with the least amount of crosstalk is the optimal observation position of the display. The distance between the optimal observation position and the display is known as the optimal viewing distance (OVD)8. In this work, we define the minimum viewing distance (VDmin) and the maximum viewing distance (VDmax) as the distances from which an observer with an interpupillary distance of 65mm is still able to perceive the image with a level of crosstalk lower than 25%. The positions of OVD, VDmin and VDmax are shown in Figure 2b.

[Figure 2: a) optical layer on top of the TFT-LCD, with left view and right view; b) optical layer and TFT-LCD, IPD=65mm, with VDmin, OVD and VDmax marked]

Figure 2 a) Mobile autostereoscopic display - principle of operation and b) optimal observation position.

The display we selected for the experiments is an autostereoscopic 3D display with HDDP arrangement, produced by NEC99. It has a resolution of 427x240 pixels at 157 DPI. One particular feature of the HDDP display is that it has the same resolution in 2D and 3D.
The optimal observation distance of the display is 40cm, and its VDmin and VDmax are 25cm and 28cm respectively7. For the experiments, we selected an observation distance of 30cm as a good compromise between visual quality and eye-tracking precision. Our eye-tracker has a resolution of 0.1 degrees, which at a viewing distance of 30cm yields a tracking precision of approximately 3 pixels.

3.3 Overview of binocular eye tracking

Eye tracking has been utilized in visual attention studies for years. The majority of these studies have been conducted with a monocular test setup. This is reasonable with two-dimensional content, as the hypothesis is that both eyes are fixated on the same point. When eye tracking is used with three-dimensional content the scenario is different, as the disparity between the eyes varies with the content. Therefore both eyes need to be tracked separately in order to calculate the point-of-gaze (point-of-regard/point-of-interest) in three-dimensional space. The same methods that are utilized in monocular eye tracking are also feasible in binocular eye tracking. Research in binocular eye tracking is focused mainly on three things: how to estimate the point-of-gaze in three dimensions, how to utilize this information in user interfaces, and the development of binocular eye tracking systems that allow unrestricted head movements.

Several eye tracker manufacturers currently offer solutions for binocular eye tracking. Both head-mounted and desktop-mounted devices are available. The most widely used method at present is video-based eye tracking with pupil and corneal reflection detection. Manufacturers offer devices with similar technology, and the biggest difference is in the sample rate of the devices. Systems are available from different manufacturers with sampling rates of 120 Hz, 220 Hz, 500 Hz and 1000 Hz. The need for a higher sampling rate depends on the aim of the study.
To track saccades and microsaccades properly, a 500 Hz sampling rate is recommended. In user interface studies the sampling rate can be lower, as the interest is more in easy calibration of the system.

Research groups have constructed their own binocular eye tracking systems for user interface studies. For example, Shih and Liu constructed a binocular tracking system with easy calibration and a 30 Hz sampling rate93. Their system has a good accuracy of 1 degree of visual field; however, the freedom of head movement is limited. Kwon et al. have been working on a binocular eye tracking system that is used to control a user interface on an autostereoscopic display53. Their system has a 15 Hz sampling rate and uses pupil center information to calculate the depth of gaze geometrically.

The biggest difficulties are faced in the calibration process and in the calculation of the gaze depth. The traditional calibration method in video-based eye trackers is to show calibration targets on the display screen and ask the test subject to focus on them. The relation between the tracked eye and the calibration target is solved to obtain the point-of-gaze information. Essig et al. introduced improvements to the traditional calibration and depth calculation process by utilizing an artificial neural network in the estimation of the 3D gaze point22. They utilized a binocular eye tracker and an anaglyph 3D display in their research. The anaglyph display was selected because it produces vergence movements of the eyes similar to those with natural content. Their new 3D calibration process utilized 3x3 calibration grids positioned on three depth levels, forming a 3x3x3 calibration matrix. The 3D gaze point was calculated based on both the traditional 2D calibration and the new 3D calibration. Depth calculation from the 2D calibration was based on a geometrical solution of gaze depth, whereas in the 3D calibration the gaze depth was estimated with an artificial neural network based on parameterized self-organizing maps (PSOM).
They obtained considerably better 3D gaze depth estimates than with geometrical methods. The downside of their method is the complexity of the calibration process.

The work of Essig et al. has been repeated with similar results by Pfeiffer et al.81. The test setup in the study by Pfeiffer et al. was more extensive, as they utilized two eye tracking systems and more complex test stimuli. Both eye trackers were head-mounted devices, and the 3D display was based on shutter glasses. Pfeiffer et al. studied three points: the benefits of PSOM calibration versus geometrical calibration, an expensive eye tracker versus a cheap one, and the usefulness of depth information in object selection. The results clearly showed better performance of PSOM compared to geometrical estimation of 3D gaze depth. The cheaper eye tracker also achieved partly better results in the experiments. However, this can be partly explained by calibration difficulties with the more expensive eye tracker. The shutter glasses used disturbed the eye tracker cameras. The result for the third research interest was perhaps the most interesting: according to Pfeiffer et al., the use of depth information did not benefit their object selection test when compared to the 2D method. Only in situations where the objects were occluded did the depth information give slight improvements. The calibration problems caused by the test setup were noted as one source of poor results.

In more recent research, Hennessey and Lawrence introduced a novel binocular eye tracking system for 3D point-of-gaze estimation31. Their system consisted of an eye tracking camera with a 200 Hz sampling rate and a volumetric display setup with a 2D display and a Plexiglas mounted on rails. The calibration process was performed at different depth levels, similarly to the study by Essig et al.22.
The system developed by Hennessey and Lawrence offers significant benefits, as it is non-contact (the user does not wear glasses for 3D visualization) and head-free (as opposed to other systems where the user needs to wear the eye tracker or use a chin support or bite-bar). The biggest difference from most other eye tracking studies is the use of eye models in 3D point-of-gaze estimation. By improving the eye model, the tracking results for the 3D point-of-gaze can also be improved, in contrast to geometrical or neural network solutions.

Binocular eye tracking has also been used in more traditional studies of visual system properties. Jainta et al. studied binocularity during reading40. They tested the disparity during fixations and determined the minimum disparity values. Their work was performed on two-dimensional content, but the results on varying disparity values in different parts of a fixation are interesting, because in the three-dimensional case the gaze depth is calculated based on eye disparity. Wismeijer et al. performed binocular eye tracking studies on the correlation between perception and spontaneous eye movements105. They utilized a polarization-multiplexed 3D display and an SR Research EyeLink II eye tracker. The experiment stimuli contained two depth cues, disparity and perspective, with a varying amount of conflicting information. Their results indicated that for small conflicts the depth cues were averaged, but for large conflicts one of the depth cues dominated, and the dominating cue varied between test subjects.

To the best of the authors' knowledge, binocular eye tracking studies have not been performed with a high-speed eye tracking system and an autostereoscopic display. Previous studies with autostereoscopic displays are limited to the research by Kwon et al., and in their case the focus was on gaze interaction with a 3D display. The use of a mobile display is also a new aspect in binocular eye tracking research.

4. EARLY RESULTS

4.1 Experiment 1

The goal of the experiment was to examine the influence of depth, motion and object size on visual attention with synthetic content.

4.1.1 Test content design - synthetic content

The synthetic content consists of 63 stereoscopic movies, each 10 seconds long. The movies were prepared using POV-Ray85, a software package for rendering 3D images. In each movie, one simple object (a ball) moves in 3D space, with its position and apparent depth known for every frame. The movement is restricted to a space with a minimum apparent depth of 28.5cm (corresponding to -20px disparity) and a maximum apparent depth of 31.5cm (corresponding to 20px disparity). The content of each synthetic movie (Figure 3b) is controlled by four parameters:

1. Ball size – we used two ball sizes in our experiment: “small”, with an angular size of less than 1 degree, and “big”, with an angular size of more than 2 degrees. The reason is to have objects whose angular size is smaller or bigger than the fovea. In our case, the “small” ball is 20px wide and the “big” one is 100px wide.
2. Movement direction – there are 5 types of movement direction: “x”, “y”, “xy”, “z”, and “xyz”. Types “x”, “y” and “xy” are planar movements in the display plane, where the ball translates in the horizontal, vertical or diagonal direction respectively. Type “z” indicates movement in depth without changing the planar coordinates, and type “xyz” is unrestricted 3D movement in an arbitrary direction.
3. Ball speed – the movement in our experiments has three possible speeds: “slow”, “fast” and “sudden”. “Slow” (2 deg/s) is within the smooth-pursuit tracking speed of the eyes. “Fast” (11 deg/s) is comparable with fast-moving objects in cinematic content. In “sudden” movement, the object changes its position within a single frame. Movements in depth are calculated so that the object moves with constant apparent speed: 1cm/s for “slow” and 5.8cm/s for “fast” speed.
The “fast” speed is calculated for an object passing across a cinema screen in 5 seconds, with a typical angular screen size of 55 degrees.
4. Background type – this is a combination of background texture type and background depth. There are three possible values for this parameter: “none”, “textured” and “deep”. If set to “none”, the background is of uniform black color without any particular depth. A “textured” background has rich texture and apparent depth at display level (zero disparity). A “deep” background has rich texture and an apparent depth of 31.5cm (20px disparity).

4.1.2 Research method

Participants – A total of 13 naïve assessors, equally stratified by gender and by age (between 18 and 45 years), participated in the study.

Procedure – The test procedure contained three parts. Firstly, sensorial tests (visual acuity (20/40), color vision and acuity of stereo vision (.6)) were conducted and the Simulator Sickness Questionnaire (SSQ49) was filled in. Secondly, calibration with 9 measurement points was conducted to the required accuracy (worst point error <1.5 degrees, average error <1.0 degree). During the actual tests, participants were given the task of following the ball presented in the scenes. After completing the actual evaluation task (duration 15 min), the SSQ was filled in again.

Viewing conditions and presentation of stimuli – The experiment took place in controlled laboratory circumstances38. The 3D display used was the NEC horizontally double-density pixel arrangement (HDDP) display, using the native resolution of 427x240px at 155DPI and a physical size of 3.5”99. The display utilizes lenticular sheet technology to provide the stereoscopic 3D effect. The 3D display was placed on a mount above the eye-tracker's desktop unit, which housed the eye-tracking camera and IR lights. The participant's head was kept still with a headrest and the viewing distance to the display was set to 30cm.
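With the viewing distance fixed at 30cm, the angular sizes and apparent depths quoted in Section 4.1.1 can be checked with basic viewing geometry. The sketch below is only an illustration: the similar-triangle depth model, the function names and the sign convention (positive disparity places the object behind the screen) are assumptions, and the pixel pitch is taken from the 157 DPI figure in Section 3.2 (using 155 DPI changes the results only marginally). Under these assumptions the ±20px disparity range reproduces the quoted 28.5cm and 31.5cm to within about 1mm.

```python
import math

# Values taken from the text; the depth model itself is an assumption.
PIXEL_PITCH_MM = 25.4 / 157.0   # panel pixel pitch at 157 DPI
VIEWING_DISTANCE_MM = 300.0     # 30 cm viewing distance
IPD_MM = 65.0                   # interpupillary distance


def pixels_to_degrees(n_pixels):
    """Visual angle subtended by n_pixels on the screen."""
    size_mm = n_pixels * PIXEL_PITCH_MM
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * VIEWING_DISTANCE_MM)))


def degrees_to_pixels(angle_deg):
    """Number of screen pixels covered by a given visual angle."""
    size_mm = 2.0 * VIEWING_DISTANCE_MM * math.tan(math.radians(angle_deg) / 2.0)
    return size_mm / PIXEL_PITCH_MM


def apparent_depth_mm(disparity_px):
    """Distance at which the two lines of sight intersect (similar triangles)."""
    disparity_mm = disparity_px * PIXEL_PITCH_MM
    return VIEWING_DISTANCE_MM * IPD_MM / (IPD_MM - disparity_mm)


print(round(pixels_to_degrees(20), 2))         # "small" ball, about 0.62 degrees
print(round(pixels_to_degrees(100), 2))        # "big" ball, about 3.09 degrees
print(round(degrees_to_pixels(0.1), 1))        # 0.1 degree, about 3.2 pixels
print(round(apparent_depth_mm(20) / 10, 1))    # about 31.6 cm
print(round(apparent_depth_mm(-20) / 10, 1))   # about 28.6 cm
```

The same conversion shows why a tracker resolution of 0.1 degrees corresponds to roughly 3 pixels on this display.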
The stimuli were presented one by one in a randomized order with a 5 second pre-stimulus marker between each 10 second stimulus (Figure 3b). The pre-stimulus marker consisted of a 3 second mid-gray (50%) clip, followed immediately by a 2 second pre-stimulus marker with a white (100%) cross at the center of the screen and a dark-gray (25%) background. The cross had a disparity value of 0, i.e. it was situated on the display surface.
Figure 3 a) Viewing conditions, and b) sequence of stimuli presentation (pre-stimulus marker 2s, stimulus 10s, break 3s, repeated).
Apparatus – The eye-tracker used was an EyeLink 1000 from SR Research Ltd. The eye-tracker's Desktop Mount unit follows the eye using a combination of pupil and corneal reflection tracking. The eye-tracker was set to use binocular tracking at a 1000Hz sampling frequency. The desktop mount unit was placed facing the participant below the display (Figure 3a). The eye-tracker used the following settings during the experiments: Saccade velocity threshold = 30°/sec (min. velocity), Saccade acceleration threshold = 8000°/sec² (min. acceleration), Saccade motion threshold = 0.1° (min. motion), Saccade pursuit fixup = 60°/sec (max. pursuit velocity), Fixation update interval = 50ms, Fixation update accumulate = 50ms, Blink offset verify time = 12ms.
4.2 Experiment 2
The goal of the experiment was to examine the influence of depth on visual attention and emotion with natural content.
4.2.1 Test content design - Natural content
The natural content consists of 30 stereoscopic movies with mobile resolution. We used 10 high-resolution multi-camera sequences as source. From each multiview sequence, three stereo-movies were created. One of the cameras was selected as the “left” camera, and was used in all stereo-movies created from this particular multiview sequence.
Typically, this was the leftmost camera, but in some cases a different camera was used in order to avoid cases where neighboring cameras had noticeably different color balance. The “left” camera (after cropping and resizing) was paired with various “right” cameras from the multiview sequence, thus creating stereo-pairs with different camera baselines. The movies were resized and, where necessary, cropped to achieve the desired final frame size and aspect ratio. From each multi-camera sequence the following pairs were created: monoscopic, where the same camera was used for both channels, narrow (short) baseline stereo, and wide baseline stereo. The “wide” camera baseline was selected to represent 3D video with pronounced depth, while still within the range allowing comfortable observation. The “short” baseline was selected to represent down-scaled HD content with limited, shallow depth, but still distinguishable from 2D content. The resulting disparity ranges can be seen in Table 2, while screenshots of each content are available in Figure 4.
Table 2 Natural content properties and disparity ranges.
Sequence name      Original     Spatial   Temporal   Short baseline     Wide baseline
                   resolution   details   motion     disparity range    disparity range
                                                     min      max       min      max
Newspaper          1024x768     Medium    Low        -6       12        -7       17
Dog                1280x960     Medium    Low         0        3         0        6
Nagoya balloons    1024x960     Medium    Medium     -2        8        -2       11
Pantomime          1280x960     Medium    Medium     -4        6        -8       11
Undo dancer        1920x1080    Low       Medium     -2        2        -4        3
Undo sneakers      1920x1080    High      Low        -1        3        -1        5
Ghost town duel    1920x1080    Medium    Low         0        4         0        7
Kendo              1024x960     Medium    Medium     -4        5        -5        8
Lovebirds 1        1024x768     High      Low         1        7         5       17
Ghost town flight  1920x1080    Low       High       -5        5        -4       10
Figure 4 Screenshots of the natural content stimuli, from top left to bottom right: a) Newspaper, b) Dog, c) Nagoya balloons, d) Pantomime, e) Undo dancer, f) Undo sneakers, g) Ghost town duel, h) Kendo, i) Lovebirds1, j) Ghost town flight
4.2.2 Research method
Participants – A total of 40 naïve assessors, equally stratified by gender and by age between 18 and 45 years, participated in the study. The sample contained mostly naïve or untrained (80%) participants who had no prior experience of quality evaluation experiments, were not experts in technical implementation and were not studying, working or otherwise engaged in information technology or multimedia processing37,38.
Procedure – The test procedure contained four parts. At the beginning, the sensorial tests and the SSQ were administered identically to experiment 1. The actual test contained two parts; calibration was conducted before, and the SSQ filled in after, each of them:
1) Free-viewing test – the participant's task was to view the content; no additional tasks were given.
2) Quality evaluation task – the participant's task was to evaluate overall quality on a discrete unlabeled scale from 0 to 10 and the acceptance of quality for viewing mobile 3DTV (binary yes/no scale) during a 5-second answer time between clips38,44.
Combined anchoring and training, in which participants were shown the extremes of the sample qualities and all contents, was conducted prior to the start of the evaluation task. During the evaluation task, the stimuli were presented one by one, and rated independently and retrospectively. To reduce participant movements during the evaluation task, the assessors stated their quality judgment aloud and the moderator marked it on the answer sheet. To reduce experimenter expectancies, the moderator did not see the stimuli being assessed. In the final part, the participants' impressions of quality per stimulus were gathered on 17 descriptive dimensions, constructed from the model of Descriptive Quality of Experience for Mobile 3D Video45.
Viewing conditions, presentation of stimuli, apparatus – The laboratory conditions, devices and apparatus were identical to experiment 1. A total of 30 stimuli were presented in all evaluation tasks in random order. The duration of the stimuli was 10s and the use of pre-stimulus markers was identical to experiment 1.
Stimuli content – Ten visual stimuli contents with variable spatial and temporal details were used. The contents were named ‘Newspaper’, ‘Dog’, ‘Nagoya balloons’, ‘Pantomime’, ‘Undo dancer’, ‘Undo sneakers’, ‘Ghost town duel’, ‘Kendo’, ‘Lovebirds1’ and ‘Ghost town flight’.
4.3 Early results on synthetic content
We started working on our 3D gaze model based on the eye-tracking results from the experiments with synthetic content. In the first stage we worked on the content that contains small and big objects moving suddenly in the x-, y- and z-directions. The object movement is so fast that it needs to be followed with saccadic eye movements. The parameters we are interested in are reaction time, travel time and time to arrive. Reaction time is the time from the target movement to the time when the eyes start moving towards the new location of the target. This time contains the time to react and launch the saccadic eye movement.
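The reaction time defined above can be estimated from a gaze trace with a simple velocity criterion. The following is a minimal sketch, not the EyeLink parser and not the analysis code used in this study: it assumes a one-dimensional gaze position in degrees sampled at 1000Hz and reuses the 30°/sec saccade velocity threshold from the tracker settings. The function and variable names are hypothetical, and real recordings would typically need noise filtering and blink handling first.

```python
def reaction_time_ms(gaze_deg, target_onset_idx,
                     sample_rate_hz=1000.0, velocity_threshold_deg_s=30.0):
    """Latency from the target jump to the first supra-threshold eye velocity.

    gaze_deg          -- gaze position samples in degrees (one axis)
    target_onset_idx  -- index of the sample at which the target jumped
    Returns the latency in milliseconds, or None if no saccade is found.
    """
    dt = 1.0 / sample_rate_hz
    # Velocity at sample i is estimated from the difference to sample i - 1.
    for i in range(max(target_onset_idx, 1), len(gaze_deg)):
        velocity = abs(gaze_deg[i] - gaze_deg[i - 1]) / dt
        if velocity >= velocity_threshold_deg_s:
            return (i - target_onset_idx) * 1000.0 / sample_rate_hz
    return None


def synthetic_trace(saccade_start_idx, n_samples=1000,
                    amplitude_deg=5.0, step_deg=0.2):
    """Hypothetical test trace: steady gaze, then a constant-velocity jump."""
    trace = []
    for i in range(n_samples):
        if i < saccade_start_idx:
            trace.append(0.0)
        else:
            trace.append(min(amplitude_deg, (i - saccade_start_idx + 1) * step_deg))
    return trace


if __name__ == "__main__":
    # Target jumps at sample 100; the simulated eye starts moving at sample 300.
    trace = synthetic_trace(saccade_start_idx=300)
    print(reaction_time_ms(trace, target_onset_idx=100))
```

The same threshold crossing could be evaluated separately for the horizontal, vertical and vergence components to compare reaction times across movement directions.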
Figure 5a illustrates the target movement and the related eye-tracking data. The target is moving in the x-direction at screen level. Target movement is plotted in red and eye movement in blue.
Figure 5 a) Example of eye movement parameter measurement from observer data and b) table of average reaction times.
Reaction times (saccadic latencies) have been studied previously with monocular eye tracking 103 and in the reading context88. Initial results from our experiments are shown in Figure 5b. The results obtained are in accordance with the reaction times of 150–400ms reported previously in the literature88,103. According to our research, the reaction time for horizontal movement is shorter than for vertical and depth movement. Our initial expectation was that the reaction time in depth would be significantly longer than the other reaction times, but it turned out to be close to the reaction time for vertical movement.
4.4 Early results on natural content
This section presents the influence of depth on visual attention, measured with three different parameters, and on emotion, measured through the analysis of blink rates. The analysis is conducted for the free-viewing tasks.
4.4.1 Visual attention
Analysis – In this paper, the analysis is presented for the content called ‘Newspaper’ from experiment 2. The analysis procedure contained 3 steps:
1) Identification of Areas of Interest (AOIs, Figure 6a). They were determined by examining frame-by-frame heatmaps (Figure 6b) and frame-by-frame disparity maps for the studied content. Disparity maps were used to identify objects that were at different depths within the scenes; typical AOIs include protruding objects, moving objects and people97. EyeLink Data Viewer v1.10.123 by SR Research Ltd was used in this part of the analysis. The left eye data was used for the analysis as the calibration was based on it and, in contrast to the right eye, it did not change with the applied disparity.
2) Analysis was based on the identified AOIs and three parameters: a) The relative importance of each AOI as the percentage of participants fixating on it. The higher the percentage, the more important the AOI. b) The important locations as the latency of the first fixation on that AOI (Mean IA First fixation time). The lower the latency, the more important the AOI. c) The total fixation duration (Mean IA Dwell time) on each AOI. The higher the total duration, the more important the AOI.
3) Finally, statistical analysis was conducted.
The relative importance of AOI – The AOIs man, plant and woman were the most fixated objects in the scene, while table was significantly less fixated (Figure 7a). The AOI screen represents the entire area of the display. The results show that the depth level influenced the frequency of participants fixating on the identified AOIs (Friedman: FR=8.167, df=2, p<.05) when averaged across the AOIs. When content was presented with the stereoscopic short baseline, a significantly higher number of participants fixated on the important objects in the scenes (Figure 7a; Wilcoxon: p<.01), and a similar tendency was shown in the content-by-content analysis. This result indicates that the stereoscopic presentation mode can attract more visual attention than the monoscopic presentation mode.
Figure 6 a) Areas of Interest (AOI) for Newspaper content, b) Heatmap of frame 25 of Newspaper content. The more participants fixating, the brighter the area.
The important locations as the latency of the first fixation – The man was the fastest fixated object in the scene (within the first second after stimulus onset), followed by the significantly slower fixated plant (in 2-4 seconds) and woman (in 4-5 seconds) (Figure 7b). The results show that the depth level did not significantly influence the latency of the first fixation when averaged across the AOIs (Friedman: FR=3.320, df=2, p=.190; Figure 7b).
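The three AOI parameters used in this analysis (percentage of participants fixating, first fixation time and dwell time) can be sketched as follows. This is an illustrative simplification rather than the EyeLink Data Viewer computation: it assumes a static rectangular AOI, whereas the AOIs in this study were identified frame by frame, and it averages first fixation and dwell times only over the participants who fixated on the AOI. All class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Fixation:
    start_ms: float      # fixation onset relative to stimulus onset
    duration_ms: float   # fixation duration
    x: float             # gaze position on the display, in pixels
    y: float


@dataclass
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        """True if the fixation falls inside this (static, rectangular) AOI."""
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def aoi_metrics(fixations_by_participant, aoi):
    """Compute the three importance parameters for one AOI."""
    first_times = []
    dwell_times = []
    for fixations in fixations_by_participant.values():
        hits = [f for f in fixations if aoi.contains(f)]
        if hits:
            first_times.append(min(f.start_ms for f in hits))
            dwell_times.append(sum(f.duration_ms for f in hits))
    n_participants = len(fixations_by_participant)
    return {
        "percent_fixating": (100.0 * len(first_times) / n_participants
                             if n_participants else 0.0),
        "mean_first_fixation_ms": mean(first_times) if first_times else None,
        "mean_dwell_ms": mean(dwell_times) if dwell_times else None,
    }


# Made-up example data for four participants and one AOI.
example = {
    "p1": [Fixation(200, 300, 50, 50), Fixation(600, 400, 300, 100)],
    "p2": [Fixation(400, 500, 60, 40)],
    "p3": [Fixation(100, 250, 350, 120)],
    "p4": [Fixation(150, 200, 55, 45), Fixation(900, 100, 52, 48)],
}
man = AOI("man", 0, 0, 100, 100)
result = aoi_metrics(example, man)
```

Computing these values per presentation mode (mono, short baseline, wide baseline) gives the repeated-measures data on which non-parametric tests such as Friedman and Wilcoxon can then be run.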
There is a small tendency for the latency of the first fixation to be shorter when the stereoscopic presentation mode is used, but the differences are not statistically significant. However, for the most important object, man, the spread of the first fixation time is narrower for the stereoscopic short baseline than for the other modes.
The total fixation duration on each AOI – The AOI man has a significantly higher total dwell duration, confirming its importance in the scene compared to the other objects (Figure 8a). For this object, the results show a tendency for its importance to decrease with stereoscopic presentation compared to monoscopic presentation (Friedman: FR=4.850, df=2, p=.088; Mono vs. wide baseline: Wilcoxon: Z=-2.13, p<.05).
4.4.2 Emotion – blink rate
Analysis – Blink rate is defined here as the number of times the participant blinks during a 10 second stimulus. The analysis presented here covers six contents, excluding ‘Dog’, ‘Undo dancer’, ‘Undo sneakers’ and ‘Ghost town duel’. The results show that blink rate slightly decreases when stereoscopic presentation is used (Friedman: FR=13.029, df=2, p<.01; Figure 8b). Blink rate decreased with increasing 3D effect during free viewing, indicating the presence of positive emotions (Wilcoxon: Wide baseline vs. others: p<.01; Short baseline vs. Mono = ns).
Figure 7 a) Percentage of participants fixating on an AOI. Higher value indicates more important AOI. b) First fixation time. Lower value indicates more important AOI.
Figure 8 a) Total time spent on each AOI (mean IA dwell time). Higher value indicates more important AOI. b) Mean blink count. Blink rate increases with negative emotions and visual fatigue, and decreases with positive emotions.
5. CONCLUSIONS
Subjective quality evaluation is used to identify critical system factors for development and objective modeling purposes. Conventionally, quantitative preference evaluation methods have been conducted following the guidelines of the International Telecommunication Union.
To complement these preference ratings and gain a deeper understanding of varied factors with heterogeneous stimuli, both qualitative descriptive methods and objective eye-tracking have been proposed. The importance of these complementing methods is to provide a deeper explanation of the quantitative quality preference ratings. No previously published work systematically combines all three data sets to understand visual quality. While several studies have used eye-tracking in data collection, it is difficult to interpret the results meaningfully from the viewpoint of human attention and to compare them between studies. A review of the literature was performed to identify a meaningful set of eye-tracking parameters to be used for hybrid evaluation experiments. The review targeted published work in the fields of psychology, psychophysiology, human-computer interaction and traffic research. The goal of the review was two-fold: to identify the locations the participants regarded as important in the presented video clips and to measure the participants' emotional response to the viewed stimuli. Our early results show that the identified parameters provide meaningful results with binocular eye-tracking and small portable 3D displays, and are in line with previous research. Future work will continue to deepen the understanding of visual quality on 3D displays.
REFERENCES
1. 3D-LCD product brochure, MasterImage, available online at http://masterimage.co.kr/new_eng/data/masterimage.zip?pos=60
2. Aaltonen, A., Hyrskykari, A. and Räihä, K., “101 Spots, or how do users read menus?,” Proc. CHI 98 Human Factors in Computing Systems, 132-139 (1998).
3. Andrews, T. J. and Coppola, D. M., “Idiosyncratic characteristics of saccadic eye movements when viewing different visual environments,” Vision Research 39(17), 2947-2953 (1999). ISSN 0042-6989
4. Baddeley, R. J. and Tattler, B.
W., “High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis,” Vision Research 46, 2824-2833 (2006).
5. Bech, S. and Zacharov, N., “Perceptual audio evaluation - Theory, method and application,” J. Acoust. Soc. Am. 122(1), 16-16 (2007).
6. Birmingham, E., Bischof, W. F. and Kingstone, A., “Saliency does not account for fixations to eyes within social scenes,” Vision Research 49(24), 2992-3000 (2009). ISSN 0042-6989, doi:10.1016/j.visres.2009.09.014
7. Boev, A. and Gotchev, A., “Comparative study of autostereoscopic displays for mobile devices,” Proc. Multimedia on Mobile Devices, Electronic Imaging Symposium 2011, (2011).
8. Boher, P., Leroux, T., Bignon, T. and Collomb-Patton, V., “A new way to characterize auto-stereoscopic 3D displays using Fourier optics instrument,” Proc. SPIE Stereoscopic displays and applications XX, (2008).
9. Brasel, S. A. and Gips, J., “Points of view: Where do people look when we watch TV?,” Perception 37, 1890-1894 (2008).
10. Brookings, J. B., Wilson, G. F. and Swain, C. R., “Psychophysiological responses to changes in workload during simulated air traffic control,” Biological Psychology 42(3), 361-377 (1996). ISSN 0301-0511, doi:10.1016/0301-0511(95)05167-8
11. Bruneau, D., Sasse, M. A. and McCarthy, J. D., “The eyes never lie: The use of eye tracking data in HCI research,” Proc. CHI '02 Workshop on Physiological Computing, (2002).
12. Burt, P. and Julesz, B., “Modifications of the classical notion of Panum's fusional area,” Perception 9(6), 671-682 (1980).
13. Buswell, G., [How people look at pictures: A study of the psychology of perception in art], The University of Chicago Press, Chicago, Illinois (1935).
14. Byrne, M. D., Anderson, J. R., Douglas, S. and Matessa, M., “Eye tracking the visual search of click-down menus,” Proc. HCI 99, 402-409 (1999).
15. Card, S. K., “Visual search of computer command menus,” In Bouma H. & Bouwhuis D.G.
(Eds) Attention and Performance X, Control of Language Processes, Hillsdale, NJ: Lawrence Erlbaum Associates (1984).
16. Christianson, S. Å., Loftus, E. F., Hoffman, H. and Loftus, G. R., “Eye fixations and memory for emotional events,” Journal of Experimental Psychology: Learning, Memory, and Cognition 17(4), 693-701 (1991).
17. Cui, L. C., “Do experts and naive observers judge printing quality differently?,” Proc. SPIE 5294, 132-145 (2003).
18. De Graef, P., Christiaens, D. and d'Ydewalle, G., “Perceptual effects of scene context on object identification,” Psychology Research 52, 317-29 (1990).
19. Duchowski, A. T., [Eye Tracking Methodology: Theory and Practice], Second Edition, Springer-Verlag, New York (2007).
20. Ellis, S., Candrea, R., Misner, J., Craig, C. S., Lankford, C. P. and Hutshonson, T. E., “Windows to the soul? What eyes tell us about software usability,” Proc. Usability Professionals Association Conference 1998, 151-178 (1998).
21. Engeldrum, P. G., [Psychometric Scaling. A Toolkit for Imaging systems development], Winchester: Imcotek Press (2000).
22. Essig, K., Pomplun, M. and Ritter, H., “A neural network for 3D gaze recording with binocular eye trackers,” International Journal of Parallel, Emergent and Distributed Systems 21(2), 79-95 (2006).
23. Fitts, P. M., Jones, R. E. and Milton, J. L., “Eye movements of aircraft pilots during instrument-landing approaches,” Aeronautical Engineering Review 9(2), 24-29 (1950).
24. Fuchs, A. F., “The saccadic system,” In Bach-y-Rita, P., Collins, C. C. & Hyde, J. E. (Eds) The Control of Eye Movements, 343-362, NY: Academic Press (1971).
25. Goldberg, H.J. and Wichansky, A. M., “Eye tracking in usability evaluation: A practitioner's guide,” In Hyönä, J., Radach, R., & Deubel, H. (Eds) The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, Amsterdam, Elsevier, 493-516 (2003).
26. Goldstein, R. B., Woods, R. L.
and Peli, E., “Where people look when watching movies: Do all viewers look at the same place?,” Computers in Biology and Medicine 37(7), 957-964 (2007).
27. Gulliver, S. R. and Ghinea, G., “Stars in their eyes: what eye-tracking reveals about multimedia perceptual quality,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 34(4), 472-482 (2004).
28. Häkkinen, J., Kawai, T., Takatalo, J., Mitsuya, R. and Nyman, G., “What do people look at when they watch stereoscopic movies?,” In Woods, A. J., Holliman, N. S. & Dodgson, N. A. (Eds) Electronic Imaging: Stereoscopic Displays & Applications XXI 7524(1), 75240E, 10 pages (2010).
29. Henderson, J. M. and Hollingworth, A., “High-level scene perception,” Annual Review of Psychology 50, 243-71 (1999).
30. Henderson, J. M., Weeks, P. A. Jr. and Hollingsworth, A., “The effects of semantic consistency on eye movements during complex scene viewing,” Journal of Experimental Psychology: Human Perception and Performance 25(1), 210-228 (1999).
31. Hennessey, C. and Lawrence, P., “Noncontact Binocular Eye-Gaze Tracking for Point-of-Gaze Estimation in Three Dimensions,” IEEE Transactions on Biomedical Engineering 56(3), 790-799 (2009).
32. Hess, E. H., “Pupillometrics,” In Greenfield, N.S. & Sternbach, R. A. (Eds) Handbook of Psychophysiology, Holt, Richard & Winston, New York, NY, 491-531 (1972).
33. Itti, L., Koch, C. and Niebur, E., “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254-1259 (1998).
34. Itti, L. and Koch, C., “Computational modelling of visual attention,” Nat. Rev. Neurosci., 2001/03 (2001). doi:10.1038/35058500
35. Itti, L., “Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes,” Visual Cognition 12(6), 1093-1123 (2006).
36. ITU-R BT.1438, “Subjective assessment of stereoscopic television pictures,” Rec. ITU-R BT.1438, ITU Telecom. Sector of ITU, (2000).
37. ITU-R BT.500-11, “Methodology for the subjective assessment of the quality of television pictures,” International Telecommunications Union – Radiocommunication sector, (2002).
38. ITU-T P.911 Recommendation, “Subjective audiovisual quality assessment methods for multimedia applications,” International Telecommunications Union (ITU) – Telecommunication sector (1998).
39. Jacob, R. J. K. and Karn, K. S., “Eye tracking in Human–Computer Interaction and usability research: Ready to deliver the promises,” In Hyona, Radach & Deubel (Eds) The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, Oxford, England (2003).
40. Jainta, S., Hoormann, J., Kloke, W. B. and Jaschinski, W., “Binocularity during reading fixations: Properties of the minimum fixation disparity,” Vision Research 50(18), 1775-1785 (2010).
41. Janisse, M. P., “Pupil size, affect and exposure frequency,” Social Behavior and Personality 2, 125-146 (1974).
42. Jumisko-Pyykkö, S., Häkkinen, J. and Nyman, G., “Experienced quality factors - Qualitative evaluation approach to audiovisual quality,” Proc. IST/SPIE conference Electronic Imaging, Multimedia on Mobile Devices, (2007).
43. Jumisko-Pyykkö, S. and Hannuksela, M. M., “Does context matter in quality evaluation of mobile television?,” Proc. 10th MobileHCI '08, 63-72 (2008).
44. Jumisko-Pyykkö, S., Malamal Vadakital, V. K. and Hannuksela, M. M., “Acceptance threshold: Bidimensional research method for user-oriented quality evaluation studies,” International Journal of Digital Multimedia Broadcasting 712380, 20 pages (2008). doi:10.1155/2008/712380
45. Jumisko-Pyykkö, S., Strohmeier, D., Utriainen, T. and Kunze, K., “Descriptive quality of experience for mobile 3D television,” Proc. NordiCHI 2010, 1-10 (2010). ISBN 978-1-60558-934-3
46. Just, M. A. and Carpenter, P. A., “Eye fixations and cognitive processes,” Cognitive Psychology 8, 441-480 (1976).
47. Kahneman, D., [Attention and effort], Englewood Cliffs, NJ: Prentice-Hall (1973).
48. Karsh, R. and Breitenbach, F.
W., “Looking at looking: The amorphous fixation measure,” In Groner, R., Menz, C., Fisher, D. F., & Monty, R. A. (Eds) Eye Movements and Psychological Functions: International Views, Hillsdale, NJ: Erlbaum, 53-64 (1983).
49. Kennedy, R., Lane, N., Berbaum, K. and Lilienthal, M., “Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness,” Int. J. Aviation Psychology 3(3), 203-220 (1993).
50. Knoche, H. and Sasse, M. A., “The sweet spot: How people trade off size and definition on mobile devices,” Proc. ACM Multimedia 2008, 21-30 (2008).
51. Konrad, J. and Agniel, P., “Subsampling models and anti-alias filters for 3-D automultiscopic displays,” IEEE Trans. Image Process. 15, 128-140 (2006).
52. Kooi, F. and Toet, A., “Visual comfort of binocular and 3D displays,” Displays 25(2-3), 99-108 (2004). ISSN 0141-9382, doi:10.1016/j.displa.2004.07.004
53. Kwon, Y., Jeon, K. and Kim, S., “Research on gaze-based interaction to 3D display system,” Proc. SPIE 6392, 63920J (2006).
54. Le Meur, O., Le Callet, P. and Barba, D., “Predicting visual fixations on video based on low-level visual features,” Vision Research 47(19), 2483-2498 (2007). ISSN 0042-6989, doi:10.1016/j.visres.2007.06.015
55. Le Meur, O., Ninassi, A., Le Callet, P. and Barba, D., “Do video coding impairments disturb the visual attention deployment?,” Signal Processing: Image Communication 25(2010), 597-609 (2010).
56. Loewenfeld, I. E., “Pupil size,” Survey of Ophthalmology 11, 291-294 (1966).
57. Loftus, E. F., Loftus, G. R. and Messo, J., “Some facts about ‘weapon focus’,” Law and Human Behavior 11, 55-62 (1987).
58. Loftus, G. R. and Mackworth, N. H., “Cognitive determinants of fixation location during picture viewing,” Journal of Experimental Psychology: Human Perception and Performance 4(4), 565-572 (1978).
59. Lowenstein, O. and Loewenfeld, I.
E., “The sleep-waking cycle and pupillary activity,” Annals of the New York Academy of Sciences 117, 142-156 (1964).
60. Mannan, S. K., Ruddock, K. H. and Wooding, D. S., “Fixation sequences made during visual examination of briefly presented 2D images,” Spatial Vision 11, 157-78 (1997).
61. Mannan, S., Ruddock, K. H. and Wooding, D. S., “Automatic control of saccadic eye movements made in visual inspection of briefly presented 2-D images,” Spatial Vision 9, 363-86 (1995).
62. McCarthy, J. D., Sasse, M. A. and Miras, D., “Sharp or smooth?: Comparing the effect of quantization vs. frame rate for streamed video,” Proc. 2004 conference on Human factors in computing systems, 535-542 (2004).
63. Moorthy, A. K. and Bovik, A. C., “Perceptually significant spatial pooling techniques for image quality assessment,” Proc. SPIE Electronic Imaging 2009 7240, (2009).
64. Narayanan, N. H. and Schrimpster, D. J., “Extending eye tracking to analyse interactions with multimedia information presentations,” Proc. HCI 2000 on People and Computer XIV - Usability or else, 271-286 (2000).
65. Nelson, W. W. and Loftus, G. R., “The functional visual field during picture viewing,” Journal of Experimental Psychology: Human Learning and Memory 6, 391-399 (1980).
66. Ninassi, A., Le Meur, O., Le Callet, P. and Barba, D., “Does where you gaze on an image affect your perception of quality? Applying visual attention to image quality metric,” Image Processing 2007: ICIP 2007 2, II-169 – II-172 (2007).
67. Ninassi, A., Le Meur, O., Le Callet, P., Barba, D. and Tirel, A., “Task impact on the visual attention in subjective image quality assessment,” 14th European Signal Processing Conference: EUSIPCO 2006, 5 pages (2006).
68. Nodine, C. F., Carmody, D. P. and Herman, E., “Eye movements during visual search for artistically embedded targets,” Bulletin of the Psychonomic Society 13, 371-374 (1979).
69. Nodine, C. F., Carmody, D. P. and Kundel, H. L., “Searching for Nina,” In Senders, J., Fisher, D. F. & Monty, R.
(Eds) Eye movements and the higher psychological functions, Hillsdale, NJ: Erlbaum, 241-258 (1978).
70. Nyström, M. and Holmqvist, K., “Deriving and evaluating eye-tracking controlled volumes of interest for variable resolution video compression,” Journal of Electronic Imaging 16(1), 013006 (2007).
71. Nyström, M. and Holmqvist, K., “Effect of compressed offline foveated video on viewing behavior and subjective quality,” ACM Trans. on Multimedia Computing, Communications, and Applications 6(1), 14 pages (2008).
72. Nyström, M. and Holmqvist, K., “Semantic override of low-level features in image viewing – Both initially and overall,” Journal of Eye Movement Research 2(2), 2:1-2:11 (2008).
73. Oulasvirta, A., Tamminen, S., Roto, V. and Kuorelahti, J., “Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI,” Proc. CHI 2005, 919-928 (2005).
74. Palomba, D., Sarlo, M., Angrilli, A., Mini, A. and Stegagno, L., “Cardiac responses associated with affective processing of unpleasant film stimuli,” International Journal of Psychophysiology 36, 45-57 (1999).
75. Parkhurst, D. J. and Niebur, E., “Scene content selected by active vision,” Spatial Vision 16, 125-154 (2003).
76. Parkhurst, D., Law, K. and Niebur, E., “Modeling the role of salience in the allocation of overt visual attention,” Vision Research 42, 107-123 (2002).
77. Partala, T. and Surakka, V., “Pupil size variation as an indication of affective processing,” Int. J. Hum.-Comput. Stud. 59(1-2), 185-198 (2003). doi:10.1016/S1071-5819(03)00017-X
78. Partala, T., Jokiniemi, M. and Surakka, V., “Pupillary responses to emotionally provocative stimuli,” Proc. 2000 Symposium on Eye Tracking Research & Applications: ETRA '00, 123-129 (2000). doi:10.1145/355017.355042
79. Pastoor, S., “Human factors of 3D images: Results of recent research at Heinrich-Hertz-Institut Berlin,” Proc. IDW'95 3D-7, 69-72 (1995).
80. Peli, E., Goldstein, R. B. and Woods, R.
L., “Scanpaths of motion sequences: Where people look when watching movies,” Starkfest Conference on Vision and Movement in Man and Machines, 18-21 (2005).
81. Pfeiffer, T., Latoschik, M. E. and Wachsmuth, I., “Evaluation of binocular eye trackers and algorithms for 3D gaze interaction in virtual reality environments,” Journal of Virtual Reality and Broadcasting 5(16), (2008).
82. Pomplun, N. and Sunkara, S., “Pupil dilation as an indicator of cognitive workload in Human-Computer Interaction,” Proc. HCI International 2003(3), 542-546 (2003).
83. Poole, A. and Ball, L. J., “Eye tracking in Human-Computer Interaction and usability research: Current status and future prospects,” In Ghaoui, C. (Ed) Encyclopedia of Human Computer Interaction, (2004).
84. Porter, S. and ten Brinke, L., “Reading between the lies: Identifying concealed and falsified emotions in universal facial expressions,” Psychological Science 19(5), 508-514 (2008).
85. POV-Ray, “The Persistence of Vision Raytracer,” available online at http://www.povray.org
86. Radun, J., Leisti, T., Häkkinen, J., Nyman, G., Olives, J.-L., Ojanen, H. and Vuori, T., “Content and quality: Interpretation-based estimation of image quality,” ACM Transactions on Applied Perception 4(4):2, (2008).
87. Rajashekar, U., Bovik, A. C. and Cormack, L. K., “Gaffe: A gaze-attentive fixation finding engine,” IEEE Transactions on Image Processing 17, 564-573 (2008).
88. Rayner, K., “Eye movements in reading and information processing: 20 years of research,” Psychological Bulletin 124(3), 372-422 (1998).
89. Rötting, M., “Parametersystematik der Augen- und Blickbewegungen für arbeitswissenschaftliche Untersuchungen,” PhD Thesis, Technische Universität Berlin, (2001).
90. Salvucci, D. D. and Goldberg, J. H., “Identifying fixations and saccades in eye-tracking protocols,” Proc. 2000 Symposium on Eye tracking research & applications: ETRA '00, 71-78 (2000).
doi:10.1145/355017.355028
91. Sarter, M., Givens, B. and Bruno, J. P., “The cognitive neuroscience of sustained attention: Where top-down meets bottom-up,” Brain Research Reviews 35(2), 146-160 (2001).
92. Schnipke, S. K. and Todd, M. W., “Trials and tribulations of using an eye-tracking system,” CHI '00 Extended Abstracts on Human Factors in Computing Systems: CHI '00, 273-274 (2000).
93. Shih, S. and Liu, J., “A novel approach to 3-D gaze tracking using stereo cameras,” IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 34(1), 234-245 (2004).
94. Stern, J. A. and Dunham, D. N., “The ocular system,” In Cacioppo, J. T. & Tassinary, L. G. (Eds) Principles of Psychophysiology: physical, social and inferential elements, Cambridge: Cambridge University Press, 513-553 (1990).
95. Takahashi, K., Nakayama, M. and Shimizu, Y., “The response of eye-movement and pupil size to audio instruction while viewing a moving target,” Proc. 2000 Symposium on Eye Tracking Research & Applications: ETRA '00, 131-138 (2000). doi:10.1145/355017.355043
96. Tattler, B. W., Baddeley, R. J. and Gilchrist, I. D., “Visual correlates of fixation selection: effects of scale and time,” Vision Research 45(5), 643-659 (2005). ISSN 0042-6989, doi:10.1016/j.visres.2004.09.017
97. Tosi, V., Mecacci, L. and Pasquali, E., “Scanning eye movements made when viewing film: Preliminary observations,” Int. J. Neuroscience 92(1-2), 47-52 (1997).
98. Tovee, M. J., [An introduction to the visual system], Cambridge University Press, Cambridge (2008).
99. Uehara, S., Hiroya, T., Kusanagi, H., Shigemura, K. and Asada, H., “1-inch diagonal transflective 2D and 3D LCD with HDDP arrangement,” Proc. SPIE-IS&T Electronic Imaging 2008, Stereoscopic Displays and Applications XIX 6803, (2008).
100. Viirre, E., Van Orden, K., Wing, S., Chase, B., Pribe, C., Taliwal, V. and Kwak, J., “Eye movements during visual and auditory task performance,” Society for Information Display SID 04 Digest, (2004).
101. Vu, C. T., Larson, E. C. and Chandler, D.
M., “Visual fixation patterns when judging image quality: Effects of distortion type, amount, and subject experience,” Proc. 2008 IEEE Southwest Symposium on Image Analysis and Interpretation: SSIAI, 73-76 (2008). doi:10.1109/SSIAI.2008.4512288
102. Vuori, T., Olkkonen, M., Pölönen, M., Siren, A. and Häkkinen, J., “Can eye movements be quantitatively applied to image quality studies?,” Proc. Third Nordic Conference on Human-Computer interaction: NordiCHI '04 82, 335-338 (2004). doi:10.1145/1028014.1028067
103. Walker, R. and McSorley, E., “The parallel programming of voluntary and reflexive saccades,” Vision Research 46(13), 2082-2093 (2006).
104. Wandell, B. A., [Foundations of vision], Sinauer Associates, Inc, Sunderland, Massachusetts, (1995).
105. Wismeijer, D. A., Erkelens, C. J., van Ee, R. and Wexler, M., “Depth cue combination in spontaneous eye movements,” J Vis 10(6):25, (2010).
106. Woodgate, G. J. and Harrold, J., “Autostereoscopic display technology for mobile 3DTV applications,” Proc. SPIE 6490A-19, (2007).
107. Yarbus, A. L., [Eye movements and vision], NY: Plenum Press, (1967).