A Survey on Mobile Affective Computing

Shengkai Zhang and Pan Hui
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
{szhangaj, panhui}@cse.ust.hk

arXiv:1410.1648v2 [cs.HC] 11 Oct 2014

Abstract—This survey presents recent progress on Affective Computing (AC) using mobile devices. AC has been one of the most active research topics for decades. A primary limitation of traditional AC research is its assumption of impermeable emotions, a criticism that is most prominent when emotions are investigated outside social contexts. This is problematic because some emotions are directed at other people and arise from interactions with them. The development of smart mobile and wearable devices (e.g., Apple Watch, Google Glass, iPhone, Fitbit) enables wild, natural studies of AC from a computer science perspective. This survey emphasizes AC studies and systems built on smart wearable devices. Various models, methodologies and systems are discussed in order to examine the state of the art. Finally, we discuss remaining challenges and future work.

Index Terms—Affective computing, mobile cloud computing, wearable devices, smartphones

I. INTRODUCTION

Affective Computing (AC) aspires to narrow the gap between the highly emotional human and the emotionally challenged machine by developing computational systems that can recognize, respond to, and express emotions [1]. Long-term research on emotion theory in psychology and neuroscience [2]–[4] has established that emotion has a significant effect on human communication, decision making, perception, and more. In other words, emotion is an important part of intelligence. Although research on artificial intelligence and machine learning has made great progress, computer systems still lack proactive interaction with humans because they do not understand users' implicit affective states. Therefore, computer scientists have put more and more effort into AC to fulfill the dream of constructing highly intelligent computer systems that enrich intelligent and proactive human-computer interaction.

Existing works detect and recognize affect in three major steps: (1) collecting affect-related signals; (2) classifying emotions with machine learning and pattern recognition techniques; (3) evaluating the performance against benchmarks. To evaluate the performance and effectiveness of models and methods, many multimedia emotion databases have been proposed [5]–[13]. However, such databases were built from "artificial" material of deliberately expressed emotions, either by asking subjects to perform a series of emotional expressions in front of a camera and/or microphone, or by using film clips to elicit emotions in the lab [14]. Research suggests that deliberate behavior differs from spontaneously occurring behavior. [15] shows that posed emotions in spoken language may differ in the choice of words and timing from corresponding performances in natural settings. [16] demonstrates that spontaneous and deliberately displayed facial behavior differ both in the facial muscles used and in their dynamics. Hence, there is a great need for data that include spontaneous displays of affective behavior. Several studies have emerged on the analysis of spontaneous facial expressions [17]–[20] and vocal expressions [21], [22]. These studies recruited a few participants to record their speech and facial expressions for a certain time (several days to months).
The major remaining problem is the scale of such databases. Recently, researchers have addressed this problem by exploiting crowdsourcing [23]–[28]. Basically, these works design a mechanism (e.g., a game [27], a web search engine [29]) to collect users' affective states at large scale. [25] designed an Amazon Mechanical Turk [30] setup to generate affective annotations for a corpus. An alternative that both addresses the problem and creates new research opportunities for affective computing is the combination of smart mobile and wearable devices (e.g., Apple Watch, Google Glass, iPhone, Fitbit) with AC. We simply call it Mobile Affective Computing (MAC). Such devices are used widely around the world; for example, in the third quarter of 2012, one billion smartphones were in use worldwide [31]. The key feature of smart devices is their abundant sensors, which enable unobtrusive monitoring of various affect-related signals. Exploring the possibility of using smart mobile devices for AC, with their potential for long-term unobtrusive monitoring of users' affective states, will benefit at least three areas:
• Influencing the affect-related research literature through wild, natural and unobtrusive studies;
• Establishing spontaneous affect databases efficiently, so that new AC methods, models and systems can be evaluated more accurately;
• Enhancing user-centered human-computer interaction (HCI) design for future ubiquitous computing environments [1], [32]–[35].

It is important to note that computer scientists and engineers can also contribute to emotion research. Scientific research in the area of emotion stretches back to the 19th century, when Charles Darwin [2], [36] and William James [3] proposed theories of emotion. Despite this long history, Parkinson [37] contends that traditional emotion theory research has made three problematic assumptions while studying individual emotions: (1) emotions are instantaneous; (2) emotions are passive; (3) emotions are impermeable. These assumptions simplify a considerably more complex reality. There is considerable merit in the argument that some affective phenomena cannot be understood at the level of a small group of people and must instead be studied as a social process [37]–[42]. With the widespread usage and rich sensors of smart wearable devices, we are able to verify emotion theories by designing and implementing MAC systems that carry out wild, natural and unobtrusive studies.

Establishing a large-scale and trustworthy labeled corpus of human affective expressions is a prerequisite for designing automatic affect recognizers. Although crowdsourcing provides a promising way to generate large-scale affect databases, it is either expensive (participants must be paid) or error prone. In addition, it cannot monitor human affective events in daily life, and so fails to capture spontaneous affect. Current spontaneous emotional expression databases obtain the ground truth by manually labeling movies, photos and voice recordings, which is very time consuming and expensive. MAC research [43], [44] has designed smartphone software systems that run continuously in the background to monitor users' moods and emotional states. With properly designed mechanisms to collect feedback from users, we can establish large-scale spontaneous affect databases efficiently and at very low cost.

Furthermore, ubiquitous computing environments have shifted human-computer interaction (HCI) design from computer-centered to human-centered [1], [32]–[35].
Mobile HCI designs currently involve mobile interface devices such as the smartphone, smart band, and smart watch. These devices transmit explicit messages while ignoring implicit information about the user, i.e., the affective state. As emotion theory shows, the user's affective state is a fundamental component of human-human communication and motivates human actions. Consequently, a mobile HCI that understands the user's affective state would be able to perceive the user's affective behaviour and initiate interactions intelligently, rather than simply responding to the user's commands.

This paper introduces and surveys these recent advances in AC research. In contrast to previously published survey papers and books in the field [33], [45]–[68], it focuses on approaches and systems for unobtrusive emotion recognition through human-computer interfaces and on AC using smart mobile and wearable devices. It discusses new research opportunities and examines state-of-the-art methods that have not been reviewed in previous surveys. Finally, we discuss the remaining challenges and outline future research avenues for MAC.

This paper is organized as follows. Section II describes the significance of AC. Section III reviews unobtrusive emotion recognition based on human-computer interfaces. Section IV introduces the affect-related sensing abilities of mobile devices and reviews in detail the studies that combine smart mobile devices with AC. Section V outlines challenges and research opportunities for MAC. A summary and closing remarks conclude this paper.

II. AFFECTIVE COMPUTING

Affective computing allows computers to recognize, express, and "have" emotions [1]. It is a form of human-computer interaction in which a device can detect and appropriately respond to its user's emotions and other stimuli. Affective computing takes its name from psychology, in which "affect" is a synonym for "emotion". At first this may seem absurd. Marvin Minsky wrote in The Society of Mind: "The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without emotions" [69]. Scientists agree that emotion is not logic, and strong emotions can impair rational decision making; acting emotionally implies acting irrationally, with poor judgment. Computers are supposed to be logical, rational, and predictable, and these properties have long been treated as the very foundations of intelligence and the focus of computer scientists seeking to build intelligent machines. At first blush, emotions seem like the last thing we would want in an intelligent machine. However, this negative aspect of emotion is less than half of the story. Today we have evidence that emotions are an active part of intelligence, especially perception, rational thinking, decision making, planning, creativity, and so on. They are critical in social interactions, to the point where psychologists and educators have redefined intelligence to include emotional and social skills. Emotion thus has a critical role in cognition and in human-computer interaction. In this section, we demonstrate the importance of affect and how it can be incorporated into computing.

A. Background and motivation

Emotions are important in human intelligence, rational decision making, social interaction, perception, memory and more. Emotion theories can largely be examined in terms of two components: 1) emotions are cognitive, emphasizing their mental component; 2) emotions are physical, emphasizing their bodily component. The cognitive component focuses on understanding the situations that give rise to emotions.
For example, "My brother is dead, therefore I am sad." The physical component focuses on the physiological response that accompanies an emotion; for example, your heart rate increases when you are nervous. The physical component helps computers learn the mappings between emotional states and emotional expressions, so that the former can be inferred from the latter. Physical aspects of emotion include facial expression, vocal intonation, gesture, posture, and other physiological responses (blood pressure, pulse, etc.). Nowadays, the abundant sensors on computers, mobile devices and human bodies are able to see, hear, and sense such responses. The cognitive component helps computers perceive and reason about situations in terms of the emotions they raise. By observing human behaviour and analyzing a situation, a computer can infer which emotions are likely to be present.

If computers can recognize, express and "have" emotions, we can benefit at least entertainment, learning, and social development. In entertainment, a music application can select music to match a person's present mood or to modify that mood. In learning, an affective computer tutor [70] can teach more effectively by attending to the student's interest, pleasure, and distress. In social development, an affect-sensitive system can coach people who lack social communication skills to act in a friendlier way. More sophisticated applications can be developed by recognizing the user's affect, reasoning with emotional cues, and understanding how to respond intelligently to the user's situation. We next provide more depth on the key technologies of AC.

B. Existing technologies

An affective computing system consists of affective data collection, affect recognition, affect expression, and so on. Currently, researchers focus on affect detection and recognition. A computing device can gather cues to user emotion from a variety of sources. This paper focuses on reviewing recent progress in Mobile Affective Computing (MAC), i.e., affective computing studies that use mobile devices. The biggest difference between traditional AC research and MAC is the data collection method; we therefore review typical works in terms of the different signals used for AC. Previous studies show that the following signals can indicate a user's emotional state and can be detected and interpreted by computing devices: facial expressions, posture, gestures, speech, the force or rhythm of keystrokes, and physiological signals (e.g., EEG). Table I summarizes the traditional data collection methods for each signal.

TABLE I: TRADITIONAL SIGNAL COLLECTION METHODS.
Signals            | Collection methods      | Typical references
Facial expression  | Camera                  | [17], [18], [71]–[75]
Voice & speech     | Microphone              | [7], [21], [76]–[81]
Gesture & posture  | Pressure sensors        | [82]–[85]
Key strokes        | Keyboard                | [86]–[91]
EEG [92]           | EEG sensors             | [93]–[96]
Text               | Human writing material  | [97]–[100]

The majority of affect detection research has focused on facial expressions, since this signal is easily captured by a camera. Microphones are commonly built into computers, smartphones, etc., to record speech and vocal expressions. Gesture and posture can offer information that is sometimes unavailable from the conventional nonverbal channels such as the face and speech. Traditionally, researchers install pressure sensors on pads, keyboards, mice, chairs, etc., to analyze the temporal transitions of posture patterns.
The pressure-sensitive keyboard used in [86] is the one described by Dietz et al. [101]. For each keystroke, the keyboard provides readings from 0 (no contact) to 254 (maximum pressure). A custom-made keyboard logger allows researchers to gather these pressure readings for keystroke-pattern analysis. EEG studies often use special hardware to record EEG signals. For example, AlZoubi et al. [93] used a wireless sensor headset developed by Advanced Brain Monitoring, Inc. (Carlsbad, CA). It is an integrated hardware and software solution for the acquisition and real-time analysis of EEG, and it has demonstrated the feasibility of acquiring high-quality EEG in real-world environments including workplace, classroom and military operational settings.

Given data collected with these methods and tools, affect detection studies choose appropriate feature selection techniques and classifiers based on the emotion model used (dimensional [102], [103] or categorical [61]). The performance of these systems and algorithms can be evaluated by subjective self-report, expert report, or standard databases. Subjective self-report asks the participants of an experiment to report their affective states periodically, while expert report asks affect-recognition experts or peers to label the affective states of participants from recordings (videos, photos, etc.). The third evaluation approach relies on established affect databases: researchers test algorithms and systems on the labeled data to examine classification accuracy. However, such databases were usually created from deliberate emotions, which differ in visual appearance, audio profile, and timing from spontaneously occurring emotions. In addition, the experiments of traditional AC research are obtrusive, requiring participants to be recruited and kept under controlled conditions. Two obvious limitations follow: 1) the data scale is limited, since it is not possible to collect affect-related data from a large population; 2) the experiment duration is limited, since participants cannot be involved for too long.

With the advent of powerful smart devices such as smartphones, smart glasses, smart watches, and smart bands, AC is entering a new era that eliminates the limitations above, for three reasons:
• Widespread use: as of 2013, 65 percent of U.S. mobile consumers owned smartphones, the European mobile device market had reached 860 million devices, and in China smartphones represented more than half of all handset shipments in the second quarter of 2012 [104];
• "All-the-time" usage: 3 in 4 smartphone owners check their phones when they wake up in the morning, and up to 40% check them every 10 minutes; when they go on vacation, 8 in 10 owners say they take their devices along and use them all the time, compared to slightly more than 1 in 10 who take them but rarely use them [105];
• Powerful sensors and networking: smart devices boast an array of sensors, including the accelerometer, light sensor, proximity sensor and more, and the device itself can collect the sensor output, store and process it locally, or communicate it to a remote server [106].

The next section reviews the state of the art in unobtrusive emotion recognition.
III. UNOBTRUSIVE EMOTION RECOGNITION

This section elaborates on related work on emotion recognition based on human-computer interfaces such as the keyboard and webcam. Unlike the settings assumed by many existing emotion recognition technologies, users often operate a device with an expressionless face or in silence, and they do not want to feel any burden associated with the recognition process. There have been several attempts to satisfy these requirements, i.e., to recognize the user's emotions inconspicuously and at minimum cost.

Poh et al. [107] presented a simple, low-cost method for measuring multiple physiological parameters using a basic webcam. They applied independent component analysis (ICA) to the color channels of facial images captured by a webcam and extracted vital signs such as heart rate (HR), respiratory rate, and HR variability (HRV). To prove the lightness and applicability of their approach, they used a common webcam as the sensing unit. The approach shows significant potential for affective computing, because bio-signals such as HR correlate closely with emotion and the measurement requires no attention from the user.

Independent component analysis is a relatively recent technique for uncovering independent signals from a set of observations that are linear mixtures of the underlying sources. The underlying source signal of interest in this study is the blood volume pulse (BVP) that propagates throughout the body. During the cardiac cycle, volumetric changes in the facial blood vessels modify the path length of the incident ambient light, so that the subsequent changes in the amount of reflected light indicate the timing of cardiovascular events. By recording a video of the facial region with a webcam, the red, green, and blue (RGB) colour sensors pick up a mixture of the reflected plethysmographic signal along with other sources of fluctuation in light due to artefacts. Because hemoglobin absorptivity differs across the visible and near-infrared spectral range, each color sensor records a mixture of the original source signals with slightly different weights. The observed signals from the RGB color sensors are denoted y1(t), y2(t), y3(t), the amplitudes of the recorded signals at time t, and the three underlying source signals are denoted x1(t), x2(t), x3(t). The ICA model assumes that the observed signals are linear mixtures of the sources, i.e., y(t) = Ax(t), where the column vectors y(t) = [y1(t), y2(t), y3(t)]^T and x(t) = [x1(t), x2(t), x3(t)]^T, and the square 3 × 3 matrix A contains the mixture coefficients a_ij. The aim of ICA is to find a demixing matrix W, an approximation of the inverse of the original mixing matrix A, whose output x̂(t) = Wy(t) is an estimate of the vector x(t) containing the underlying source signals. To uncover the independent sources, W must maximize the non-Gaussianity of each source. In practice, iterative methods are used to maximize or minimize a cost function that measures non-Gaussianity.

The study was approved by the Institutional Review Board of the Massachusetts Institute of Technology. The sample consisted of 12 participants of both genders (four females), different ages (18–31 years) and different skin colours. All participants provided informed consent. The experiments were conducted indoors, with a varying amount of ambient sunlight entering through windows as the only source of illumination.
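Before returning to the experimental protocol, the sketch below illustrates the ICA demixing step just described: it applies ICA to three RGB channel traces and selects the recovered component with the strongest spectral peak in the normal heart-rate band. This is a minimal illustration under stated assumptions, not Poh et al.'s implementation; the array layout, sampling rate, and synthetic data are assumptions made here for the example.

```python
# Minimal sketch of ICA-based pulse recovery from webcam RGB traces.
# Assumptions: `rgb` is an (n_samples, 3) array of spatially averaged
# R, G, B values of the face region, sampled at `fs` Hz.
import numpy as np
from sklearn.decomposition import FastICA

def estimate_heart_rate(rgb, fs=15.0):
    # Detrend and normalize each colour channel (the y(t) signals in the text).
    y = (rgb - rgb.mean(axis=0)) / rgb.std(axis=0)

    # Recover three independent source estimates x_hat(t) = W y(t).
    ica = FastICA(n_components=3, random_state=0)
    x_hat = ica.fit_transform(y)          # shape: (n_samples, 3)

    # Pick the component whose spectrum peaks inside 0.75-4 Hz (45-240 bpm).
    freqs = np.fft.rfftfreq(len(x_hat), d=1.0 / fs)
    band = (freqs >= 0.75) & (freqs <= 4.0)
    best_bpm, best_power = None, -np.inf
    for k in range(3):
        spectrum = np.abs(np.fft.rfft(x_hat[:, k])) ** 2
        peak = np.argmax(spectrum[band])
        if spectrum[band][peak] > best_power:
            best_power = spectrum[band][peak]
            best_bpm = 60.0 * freqs[band][peak]
    return best_bpm

if __name__ == "__main__":
    # Synthetic example: a 1.2 Hz (72 bpm) pulse mixed into three noisy channels.
    fs = 15.0
    t = np.arange(0, 60, 1 / fs)
    pulse = np.sin(2 * np.pi * 1.2 * t)
    mixing = np.array([[0.2], [0.5], [0.3]])
    rgb = (mixing * pulse).T + 0.1 * np.random.randn(len(t), 3)
    print("Estimated HR (bpm):", estimate_heart_rate(rgb, fs))
```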
Participants were seated at a table in front of a laptop, at a distance of approximately 0.5 m from the built-in webcam (iSight camera). During the experiment, participants were asked to keep still, breathe spontaneously, and face the webcam while their video was recorded for one minute. With this experimental design, the authors demonstrated the feasibility of using a simple webcam to measure multiple physiological parameters, including vital signs such as HR and respiratory rate as well as correlates of cardiac autonomic function through HRV. The data demonstrate a strong correlation between the parameters derived from webcam recordings and those from standard reference sensors. Regarding the choice of measurement epoch, a recording of 1–2 minutes is needed to assess the spectral components of HRV, and averaging over 60 beats improves the confidence of the single timing measurement from the BVP waveform [108]. The face detection algorithm is, however, subject to limits on head rotation.

Epp et al. [87] proposed determining the emotions of computer users by analyzing the rhythm of their typing patterns on a standard keyboard. This work highlights two problems with the approaches mentioned above: they can be invasive and can require costly equipment. Their solution is to detect users' emotional states from their typing rhythms on a common computer keyboard. Known as keystroke dynamics, this is an approach borrowed from user authentication research that shows promise for emotion detection in human-computer interaction (HCI). Identifying emotional state through keystroke dynamics addresses the problems of previous methods: it requires no costly equipment and is nonintrusive to the user. To investigate the efficacy of keystroke dynamics for determining emotional state, they conducted a field study that gathered keystrokes as users performed their daily computer tasks. Using an experience-sampling approach, users labeled the data according to their level of agreement with 15 emotional states and provided additional keystrokes by typing fixed pieces of text. This approach allowed users' emotions to emerge naturally, with minimal influence from the study itself and without emotion elicitation techniques. From the raw keystroke data, the authors extracted a number of features derived mainly from key duration (dwell time) and key latency (flight time). They created decision-tree classifiers for the 15 emotional states using the derived feature set, and successfully modelled six emotional states, including confidence, hesitance, nervousness, relaxation, sadness, and tiredness, with accuracies ranging from 77.4% to 87.8%. They also identified two emotional states (anger and excitement) that show potential for future work.
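The sketch below illustrates the two timing features at the core of this approach, dwell time and flight time, and feeds them to a decision-tree classifier. It is a simplified stand-in (scikit-learn's CART tree rather than the C4.5 algorithm used in the study), and the key-event format, samples, and labels are hypothetical.

```python
# Sketch of dwell/flight keystroke features with a decision-tree classifier.
# Assumptions: key events are (key, t_down, t_up) tuples in seconds, and each
# short typing sample carries one self-reported emotion label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def keystroke_features(events):
    """events: list of (key, t_down, t_up) tuples sorted by t_down."""
    dwell = [t_up - t_down for _, t_down, t_up in events]       # key held down
    flight = [events[i + 1][1] - events[i][2]                   # gap between keys
              for i in range(len(events) - 1)]
    return [np.mean(dwell), np.std(dwell), np.mean(flight), np.std(flight)]

# Hypothetical labelled typing bursts (e.g., "relaxed" vs "nervous").
samples = [
    ([("a", 0.00, 0.12), ("b", 0.30, 0.41), ("c", 0.65, 0.78)], "relaxed"),
    ([("a", 0.00, 0.07), ("b", 0.11, 0.17), ("c", 0.22, 0.29)], "nervous"),
    ([("d", 0.00, 0.13), ("e", 0.33, 0.45), ("f", 0.70, 0.82)], "relaxed"),
    ([("d", 0.00, 0.06), ("e", 0.10, 0.16), ("f", 0.21, 0.27)], "nervous"),
]
X = np.array([keystroke_features(ev) for ev, _ in samples])
y = [label for _, label in samples]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([keystroke_features([("g", 0.00, 0.08),
                                        ("h", 0.12, 0.19),
                                        ("i", 0.25, 0.31)])]))
```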
Zimmermann et al. [109] developed and evaluated a method to measure affective states through motor-behavioural parameters obtained from the standard input devices, mouse and keyboard. The experiment was designed to investigate the influence of induced affect on motor-behaviour parameters while completing a computer task. Film clips were used as affect elicitors. The task was an online-shopping task that required participants to shop for office supplies on an e-commerce website. 96 students (46 female, 50 male, aged between 17 and 38) participated in the experiment. During the experiment, all mouse and keyboard actions were recorded to a log file by software running in the background of the operating system, invisible to the subject. Log-file entries contained the exact time and type of each action (e.g., mouse button down, mouse position x and y coordinates, which key was pressed).

After subjects arrived at the laboratory, sensors and respiratory measurement bands were attached and connected, and subjects were asked to complete subject and health data questionnaires on the computer. All instructions during the experiment were given at the appropriate stages on the computer interface, and the procedure was automated (as shown in Fig. 1). The participants first familiarized themselves with the online-shopping task and then indicated their mood on graphical and verbal differentials. Afterwards, the first movie clip, expected to be affectively neutral, was presented (control run). Participants then filled out the mood assessment questionnaires, completed the online-shopping task, and filled out the questionnaires again. Then a second movie clip was randomly chosen, inducing one of the moods PVHA, PVLA, NVHA, NVLA or nVnA (P=positive, N=negative, H=high, L=low, n=neutral, V=valence, A=arousal). The film was followed by the graphical and verbal differential questionnaires, then the task and the two questionnaires again. The experiment ended after 1.5 to 2 hours.

Fig. 1. Procedure of the experiment (PH=positive valence/high arousal, PL=positive valence/low arousal, NH=negative valence/high arousal, NL=negative valence/low arousal).

They analyzed the mouse and keyboard actions from the log files. The following parameters were tested for correlations with the five affective states PVHA, PVLA, NVHA, NVLA and nVnA (the list may be extended): number of mouse clicks per minute, average duration of mouse clicks (from button-down until button-up event), total distance of mouse movements in pixels, average distance of a single mouse movement, number and length of pauses in mouse movement, number of "heavy mouse movement" events (more than 5 changes of direction in 2 seconds), maximum, minimum and average mouse speed, keystroke rate per second, average duration of a keystroke (from key-down until key-up event), and task performance.

Vizer et al. [91] explored the possibility of detecting cognitive and physical stress by monitoring keyboard interactions, with the eventual goal of detecting acute or gradual changes in cognitive and physical function. The research analyzed keystroke and linguistic features of spontaneously generated text. The study was designed with two experimental conditions (physical stress and cognitive stress) and one control condition (no stress). The data were analyzed with machine learning techniques using numerous features, including specific types of words, timing characteristics, and keystroke frequencies, to generate reference models and classify test samples. They achieved correct classification rates of 62.5% for physical stress and 75% for cognitive stress (for 2 classes), which they state is comparable to other affective computing solutions. They also note that their solution should be tested with varying typing abilities and keyboards, with varying physical and cognitive abilities, and in real-world stressful situations.

Twenty-four participants with no documented cognitive or physical disabilities were recruited for the study. The data collected per keystroke consisted of the type of keyboard event (key up or key down), a timestamp, and the key code. The timestamp was recorded at a resolution of 10 ms using the Ticks call to the system clock in Visual Basic. Key codes allowed for subsequent analysis of the keys pressed and words typed. All data were collected using a single desktop computer and a standard keyboard. At the conclusion of the experimental session, each participant completed a demographic survey and reported his or her subjective level of stress for each cognitive and physical stress task on an 11-point Likert scale.

The authors extracted features from the timing and keystroke data captured during the experiment. Data were normalized per participant and per feature using z-scores: for each participant, the baseline samples were used to calculate means and standard deviations per feature, and the control and experimental samples were then normalized using those values. Both the raw and normalized data were analyzed to determine whether accounting for individual differences in this way would yield higher classification accuracy.

Machine learning methods were employed to classify stress conditions. The algorithms used in this analysis included Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (kNN), AdaBoost, and Artificial Neural Network (ANN). These techniques were selected primarily because they had previously been used to analyze keyboard behaviour (e.g., kNN) or had shown good performance across a variety of applications (e.g., SVM). The primary goal was to confirm whether the selected features and automated models could successfully detect stress, rather than to optimize how well they detect stress. The authors therefore adopted the most commonly used parameter settings when training and testing stress detection models and used the same set of features for all five machine learning methods. The parameter settings were chosen based on experience and knowledge of the problem at hand rather than through exhaustive testing of all possible parameter combinations; for example, three-fold cross validation was chosen to strike a balance between the number of folds and the number of data points in each fold.

The primary contribution of this research is to highlight the potential of leveraging everyday computer interactions to detect the presence of cognitive and physical stress. The initial empirical study described above (1) illustrated the use of a unique combination of keystroke and linguistic features for the analysis of free text; (2) demonstrated the use of machine learning techniques to detect the presence of cognitive or physical stress from free-text data; (3) confirmed the potential of monitoring keyboard interactions for an application other than security; and (4) highlighted the opportunity to use keyboard interaction data to detect the presence of cognitive or physical stress. The rates of correct classification of cognitive stress samples were consistent with those currently obtained using affective computing methods. The rates for physical stress samples, though not as high, also warrant further research.
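A minimal sketch of the per-participant baseline z-score normalization described above is given below; the feature matrix, the baseline/experimental split, and the values are hypothetical.

```python
# Sketch of per-participant z-score normalization against baseline samples:
# means and standard deviations come from each participant's baseline only.
import numpy as np

def normalize_by_baseline(baseline, samples):
    """baseline, samples: (n_samples, n_features) arrays for ONE participant."""
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant features
    return (samples - mu) / sigma

# Hypothetical data: 5 baseline and 3 stress-condition samples, 4 features each.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=10.0, scale=2.0, size=(5, 4))
stress = rng.normal(loc=12.0, scale=2.0, size=(3, 4))
print(normalize_by_baseline(baseline, stress))
```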
Hernandez et al. [86] explored the possibility of using a pressure-sensitive keyboard and a capacitive mouse to discriminate between stressful and relaxed conditions in a laboratory study. During a 30-minute session, 24 participants performed several computerized tasks consisting of expressive writing, text transcription, and mouse clicking. Under the stressful conditions, the large majority of participants showed significantly increased typing pressure (>79% of the participants) and more contact with the surface of the mouse (75% of the participants). The purpose of this work is to study whether a pressure-sensitive keyboard and a capacitive mouse can be used to sense the manifestation of stress; the authors therefore devised a within-subjects laboratory experiment in which participants performed several computerized tasks under stressed and relaxed conditions.

In order to monitor stress comfortably and unobtrusively, the study gathered behavioural activity from the keyboard and the mouse. These devices are not only among the most common channels of communication for computer users, but also represent a unique opportunity to non-intrusively capture longitudinal information that can help detect long-term conditions such as chronic stress and their associated behavioral effects. Instead of analyzing traditional keyboard and mouse dynamics based on the timing or frequency of certain buttons, this work focuses on pressure. In particular, the authors used a pressure-sensitive keyboard and a capacitive mouse (Fig. 2).

Fig. 2. Pressure-sensitive keyboard (left) and capacitive mouse (right).

For each keystroke, the keyboard provides readings from 0 (no contact) to 254 (maximum pressure). The authors implemented a custom-made keyboard logger in C++ that gathered the pressure readings at a sampling rate of 50 Hz. The capacitive mouse used in this work is the Touch Mouse from Microsoft, based on the Cap Mouse described in [110]. This mouse has a grid of 13 × 15 capacitive pixels with values ranging from 0 (no capacitance) to 15 (maximum capacitance); higher capacitive readings while handling the mouse are usually associated with increased hand contact with its surface. Taking an approach similar to the one described by Gao et al. [111], they estimated the pressure on the mouse from the capacitive readings, using a custom-made mouse logger in Java that gathered information for each capacitive pixel at a sampling rate of 120 Hz.

To examine whether people under stress use the input devices differently, the authors designed several tasks that required the use of the keyboard or mouse under two conditions: stressed and relaxed. The chosen tasks were text transcription, expressive writing, and mouse clicking. To check whether the two versions of each task elicited the intended emotions (i.e., stressed or relaxed), participants were asked to report their valence, arousal and stress levels on a 7-point Likert scale after completing each task. Although stress can be positive or negative, the authors expected that high stress levels in the experiment would be associated with higher arousal and negative valence. Additionally, throughout the experiment participants wore the Affectiva Q wristband sensor, which continuously measured electrodermal activity, skin temperature, and 3-axis acceleration at a sampling rate of 8 Hz.

To examine the differences between relaxed and stressed conditions, they performed a within-subjects laboratory study in which all participants performed all tasks and conditions. After providing written consent, participants were seated at an adjustable computer workstation with an adjustable chair and asked to provide some demographic information. Next, they were asked to wear the Q biosensor on the left wrist and to adjust the wristband so that it did not disturb them while typing. All data were collected and synchronized on a single desktop computer with a 30-inch monitor, and the same pressure-sensitive keyboard and capacitive mouse were used for all participants. The room and lighting conditions were also the same for all users. Fig. 3 shows the experimental setup.

The results of this study indicate that increased levels of stress significantly influence the typing pressure (>83% of the participants) and the amount of mouse contact (100% of the participants) of computer users. More than 79% of the participants consistently showed more forceful typing pressure, and 75% showed a greater amount of mouse contact.
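One standard, nonparametric way to check such a per-participant difference is a rank-sum test on the two pressure distributions, sketched below. This is an illustration only, not a reproduction of the authors' analysis; the pressure readings are synthetic.

```python
# Sketch: compare keystroke-pressure readings (0-254) between a relaxed and a
# stressed condition for one participant using a Wilcoxon rank-sum test.
# The pressure samples below are hypothetical.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
relaxed_pressure = rng.normal(loc=90, scale=15, size=200).clip(0, 254)
stressed_pressure = rng.normal(loc=110, scale=15, size=200).clip(0, 254)

stat, p_value = ranksums(relaxed_pressure, stressed_pressure)
print(f"rank-sum statistic={stat:.2f}, p={p_value:.4f}")
if p_value < 0.05:
    print("typing pressure differs significantly between conditions")
```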
Furthermore, they determined that considerably smaller subsets of the collected data (e.g., less than 4 seconds of data for the mouse-clicking task) sufficed to obtain similar results, which could potentially lead to quicker and timelier stress assessments. This work is the first to demonstrate that stress influences keystroke pressure in a controlled laboratory setting, and the first to show the benefit of a capacitive mouse in the context of stress measurement. The findings of this study are very promising and pave the way for the creation of less invasive systems that can continuously monitor stress in real-life settings.

Fig. 3. The experimental setup.

Table II summarizes the features of the presented solutions; several of these conclusions are worth considering when building a new system.

TABLE II: UNOBTRUSIVE EMOTION RECOGNITION.
- Poh et al. [107]. Sources: webcam. Labelling method: none. Method and classifier: statistical analysis. Results: strong correlation between the parameters derived from webcam recordings and standard reference sensors.
- Epp et al. [87]. Sources: keystrokes. Labelling method: questionnaire (5-point Likert scale). Method and classifier: C4.5 decision tree. Results: accuracy of 77.4–87.8% for confidence, hesitance, nervousness, relaxation, sadness and tiredness.
- Zimmermann et al. [109]. Sources: keystrokes and mouse movements. Labelling method: questionnaire (self-assessment). Method and classifier: statistical analysis. Results: possible discrimination between the neutral category and the four others.
- Vizer et al. [91]. Sources: keystrokes, language parameters. Labelling method: questionnaire (11-point Likert scale). Method and classifier: decision trees, support vector machine, k-NN, AdaBoost, neural networks. Results: accuracy of 75.0% for cognitive stress (k-NN) and 62.5% for physical stress (AdaBoost, SVM, neural networks); the number of mistakes made while typing decreases under stress.
- Hernandez et al. [86]. Sources: keystrokes, typing pressures, biosensors, mouse movements. Labelling method: labels assigned according to the tasks that elicited the intended emotions. Method and classifier: statistical analysis. Results: increased levels of stress significantly influence typing pressure (>83% of the participants) and amount of mouse contact (100% of the participants).

With the rapid development of smartphones and the abundance of on-board sensors mentioned in Sec. II, research on smartphone-based affective computing is becoming more and more popular. The next section discusses affect detection using smartphones.

IV. SMARTPHONE-BASED AFFECTIVE COMPUTING

This section discusses the state of the art in affective computing using mobile and wearable devices. First, we briefly introduce the sensors equipped on modern wearable devices. Powerful sensors, together with the long-term usage of such devices, enable unobtrusive affective data collection, which promises to improve traditional affective computing research.
Moreover, the widespread use of smartphones has changed human lives; modern technology turns what was once impossible into the natural [112]. As a result, researchers are seeking to explore new affect-related features in users' interactions with mobile and wearable devices (touch behaviors, usage events, etc.). We summarize the related works in this respect below.

A. Affect sensing

As mentioned above, traditional affective studies rely on cameras, microphones, sensors attached to human bodies, and other human-computer interfaces to collect raw data for affect detection and analysis. The widespread use and rich sensors of modern smart wearable devices overcome these limitations. We briefly introduce the affect-related sensors and tools equipped on smart devices.

Accelerometer: It measures proper acceleration (the acceleration it experiences relative to free fall) felt by people or objects, in units of m/s^2 or g (1 g when standing on Earth at sea level). Most smartphone accelerometers trade a large value range for high precision; the iPhone 4, for example, has a range of 2 g with a precision of 0.018 g. Fig. 4 shows a smartphone accelerometer and a space-flight accelerometer. This sensor can be used to detect body movements and gestures. Traditional approaches [82]–[85] have to attach pressure sensors to chairs, keyboards, mice, etc.; the accelerometer on a smartphone clearly benefits unobtrusive, long-term data collection of body movement, gestures, and other human behaviors.

Fig. 4. Accelerometers: (a) smartphone accelerometer, (b) space-flight accelerometer.

Gyroscope: A gyroscope is a very useful tool because it moves in peculiar ways and appears to defy gravity. Gyroscopes have been around for a century and are now used everywhere from airplanes and toy helicopters to smartphones. A gyroscope allows a smartphone to measure and maintain orientation; gyroscopic sensors can monitor and control device position, orientation, direction, angular motion and rotation. In a smartphone, a gyroscopic sensor commonly supports gesture recognition and helps determine the position and orientation of the phone. The on-board gyroscope and accelerometer work in combination with the smartphone's operating system or specific applications to perform these and other functions. Fig. 5 shows the digital gyroscope embedded in a smartphone. Some works [113] take advantage of it to record users' movement and position for affect detection.

Fig. 5. iPhone's digital gyroscope.
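Before moving on to the remaining sensors, the sketch below shows one way a logged 3-axis accelerometer trace might be reduced to simple movement features for affect-related analysis. The log format, sampling rate, window length, and synthetic trace are assumptions made for the example.

```python
# Sketch: simple movement features from a logged 3-axis accelerometer trace.
# Assumptions: `acc` is an (n_samples, 3) array in m/s^2 sampled at `fs` Hz.
import numpy as np

def movement_features(acc, fs=50.0, window_s=2.0):
    """Return per-window [mean, std, peak] of the acceleration magnitude."""
    magnitude = np.linalg.norm(acc, axis=1) - 9.81   # rough gravity removal
    win = int(fs * window_s)
    feats = []
    for start in range(0, len(magnitude) - win + 1, win):
        w = magnitude[start:start + win]
        feats.append([w.mean(), w.std(), np.abs(w).max()])
    return np.array(feats)

# Hypothetical trace: 10 s of mostly-still data with a short burst of motion.
fs = 50.0
n = int(10 * fs)
acc = np.tile([0.0, 0.0, 9.81], (n, 1)) + 0.05 * np.random.randn(n, 3)
acc[200:260] += np.random.randn(60, 3) * 3.0         # simulated gesture
print(movement_features(acc, fs).round(3))
```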
GPS: Location sensors detect the location of the smartphone using 1) GPS; 2) lateration/triangulation of cell towers or WiFi networks (with a database of known locations for towers and networks); or 3) the location of the associated cell tower or WiFi network. For GPS localization, connections to 3 satellites are required for a 2D fix (latitude/longitude) and 4 satellites for a 3D fix (altitude), as shown in Fig. 6. More visible satellites increase the precision of positioning. Typical precision is 20–50 m and maximum precision is 10 m. However, GPS does not work indoors and can quickly drain the battery. Smartphones can automatically select the best-suited location provider (GPS, cell towers, WiFi), mostly based on the desired precision. With location, we can study the relationship between life patterns and affective states; for example, most people in a playground feel happy, while most people in a cemetery feel sad.

Fig. 6. How GPS works.

Location provides additional information to verify the subjective reports from participants of affective studies. It may also help build a confidence mechanism [114] for subjective reports: attaching the location to a subjective report would produce confidence weights that measure the significance of collected reports. For example, if a participant reported that he was happy in a cemetery but, by common sense, people in a cemetery are usually sad, we could assign a low weight (e.g., 0.2) as the confidence value of that report.

Microphone: Traditional affect studies require participants to sit in a lab and speak into a microphone. Now the built-in microphone of a smartphone is able to record voice in the wild. If you are looking to use your phone as a voice recorder, for recording personal notes, meetings, or impromptu sounds around you, all you need is a recording application. Important features to look for are: 1) the ability to adjust gain levels; 2) the ability to change sampling rates; 3) a display of the recording levels on the screen, so you can make any necessary adjustments; and, perhaps less important, 4) the ability to save files in multiple formats (at least .wav and .mp3). Also very handy is the ability to email a recording or save it to cloud storage such as Dropbox. Some of the most highly recommended applications for Android include Easy Voice Recorder Pro, RecForge Pro, Hi-Q mp3 Voice Recorder, Smart Voice Recorder, and Voice Pro; for iOS, Audio Memos, Recorder Plus and Quick Record appear to be good applications. Any of these applications can be used to collect voice recordings for affect analysis.
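As a rough illustration of how such recordings can be reduced to affect-related acoustic features, the sketch below computes frame-level energy and zero-crossing rate from a WAV file. The file name is hypothetical and the feature set is deliberately minimal; it is a sketch of typical prosodic preprocessing, not any specific system's pipeline.

```python
# Sketch: minimal acoustic features (RMS energy, zero-crossing rate) from a
# mono WAV recording, e.g. one captured by a smartphone voice-recorder app.
# The file name below is hypothetical.
import numpy as np
from scipy.io import wavfile

def frame_features(path, frame_ms=25, hop_ms=10):
    fs, x = wavfile.read(path)
    x = x.astype(np.float64)
    if x.ndim > 1:                        # stereo -> mono
        x = x.mean(axis=1)
    x /= (np.abs(x).max() or 1.0)         # normalize amplitude
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    feats = []
    for start in range(0, len(x) - frame + 1, hop):
        w = x[start:start + frame]
        rms = np.sqrt(np.mean(w ** 2))                      # frame energy
        zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2.0    # zero crossings
        feats.append((rms, zcr))
    return np.array(feats)                # one (energy, ZCR) pair per frame

if __name__ == "__main__":
    print(frame_features("speech_sample.wav")[:5])
```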
Camera: The principal advantages of camera phones are cost and compactness; for a user who carries a mobile phone anyway, the additional size and cost are negligible. Smartphones with cameras (Fig. 7) may run mobile applications that add capabilities such as geotagging and image stitching. A few high-end phones can use their touch screen to direct the camera to focus on a particular object in the field of view, giving even an inexperienced user a degree of focus control exceeded only by seasoned photographers using manual focus. These properties clearly lend themselves to capturing spontaneous facial expressions for affective studies (e.g., building spontaneous facial expression databases).

Fig. 7. Samsung Galaxy K Zoom.

Touch screen: The increasing number of people using touchscreen mobile phones (Fig. 8) raises the question of whether touch behaviors reflect users' emotional states. Recent studies in the psychology literature [115] have shown that touch behavior may convey not only the valence of an emotion but also its type (e.g., happy vs. upset). Unfortunately, these studies have been carried out mainly in a social context (person-to-person communication) and only through acted scenarios.

Fig. 8. Touch-screen mobile phone.

[111] explored the possibility of using this modality to capture a player's emotional state in the naturalistic setting of touch-based computer games; the findings could be extended to other application areas where touch-based devices are used. Using stroke behavior to discriminate between a set of affective states in the context of touch-based game devices is very promising, and it paves the way for further exploration of different contexts and different types of tactile behavior. Further studies are needed in a variety of contexts to establish a better understanding of this relationship and to identify whether, and how, these models can be generalized over different types of tactile behavior, activity, context and personality traits.

Smartphone usage log: Since smartphone usage has become a natural part of human life, some works [43], [116] have shown the power of application logs for affect detection. Smartphone usage data can indicate personality traits when machine learning approaches are used to extract features. It is also promising to study the relationships between users and personalities by building social networks from the rich contextual information available in application usage, call and SMS logs. In addition, analyzing other modalities such as accelerometer and GPS logs remains an open topic in mobile affective computing.

B. Mood and emotion detection

Having introduced the powerful affect sensing abilities of smart devices, we now summarize the state-of-the-art affective computing research using them, focusing on emotion and mood detection. First, we need to clarify three closely intertwined terms: affect, emotion, and mood. Affect is a generic term that covers a broad range of feelings that people experience; it is an umbrella concept that encompasses both emotions and moods [117]. Emotions are intense feelings that are directed at someone or something [118]. Moods are feelings that tend to be less intense than emotions and that often (though not always) lack a contextual stimulus. Most experts believe that emotions are more fleeting than moods [119]. For example, if someone is rude to you, you will feel angry, but that intense feeling of anger probably passes fairly quickly; when you are in a bad mood, though, you can feel bad for several hours.

LiKamWa et al. [43] designed a smartphone software system for mood detection, MoodScope, which infers the mood of its user from how the smartphone is used. Unlike smartphone sensors that measure acceleration, light, and other physical properties, MoodScope is a "sensor" that measures the mental state of the user and provides mood as an important input to context-aware computing. They found that smartphone usage correlates well with users' moods.
Users use different applications and communicate with different people depending on their moods. Using only six types of usage information, namely SMS, email, phone calls, application usage, web browsing, and location, the authors could build statistical usage models to estimate moods. They developed an iOS application that allowed users to report their moods conveniently; Fig. 9(a) shows its primary GUI. Users could also review their previous inputs through a calendar mode, shown in Fig. 9(b), and through a chart mode. The authors created a logger to collect each participant's smartphone interactions and link them with the collected moods. The logger captured user behaviour through daemons operating in the background, requiring no user interaction, and the data were archived nightly to a server over a cellular data or Wi-Fi connection. The collected data fell into two categories: 1) social interaction records, in that phone calls, text messages (SMS) and emails signal changes in social interactions; and 2) routine activity records, in that patterns in browser history, application usage and location history are coarse indicators of routine activity. They then used least-squares multiple linear regression to perform the modeling, applying the regression to the usage feature table labeled by the daily averages of mood. Sequential Forward Selection (SFS) [120] was used to choose a subset of relevant features and accelerate the learning process.

Fig. 9. MoodScope application: (a) primary GUI, (b) mood calendar view.
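The sketch below mirrors this modelling setup at a toy scale: forward feature selection followed by least-squares linear regression onto a daily mood score. The usage features and mood labels are synthetic, and scikit-learn's SequentialFeatureSelector stands in for the SFS procedure; this is not MoodScope's implementation.

```python
# Sketch: MoodScope-style mood modelling -- sequential forward feature
# selection plus least-squares linear regression on daily usage features.
# The feature table and mood scores below are synthetic.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_days, n_features = 60, 10           # e.g. per-app usage counts, call/SMS stats
X = rng.normal(size=(n_days, n_features))
true_w = np.zeros(n_features)
true_w[[0, 3, 7]] = [0.8, -0.5, 0.3]  # only a few features actually matter
mood = X @ true_w + 0.1 * rng.normal(size=n_days)   # daily average mood score

selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=3,
                                     direction="forward", cv=3)
selector.fit(X, mood)
keep = selector.get_support()
model = LinearRegression().fit(X[:, keep], mood)
print("selected features:", np.flatnonzero(keep))
print("R^2 on training data:", model.score(X[:, keep], mood))
```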
Rachuri et al. [44] proposed EmotionSense, a mobile sensing platform for social psychology studies based on mobile phones. Its key characteristics include the ability to sense individual emotions as well as activities, verbal interactions and proximity interactions among members of social groups. EmotionSense gathers participants' emotions as well as proximity and conversation patterns by processing the outputs of smartphone sensors. The system consists of several sensor monitors, a programmable adaptive framework based on a logic inference engine, and two declarative databases (the Knowledge Base and the Action Base). Each monitor is a thread that logs events to the Knowledge Base, a repository of all the information extracted from the on-board sensors of the phones. The system is based on a declarative specification (using first-order logic predicates) of facts, i.e., the knowledge extracted by the sensors about user behaviour and the environment (such as the identity of the people involved in a conversation), and actions, i.e., the set of sensing activities that the sensors have to perform with different duty cycles, such as recording voices (if any) for 10 seconds each minute or extracting the current activity every 2 minutes. By means of the inference engine and a user-defined set of rules (a default set is provided), sensing actions are periodically generated. The actions to be executed are stored in the Action Base, which is periodically accessed by the EmotionSense Manager; the manager invokes the corresponding monitors according to the scheduled actions. Users can define sensing tasks and rules that the inference engine interprets in order to dynamically adapt the sensing actions performed by the system.

The authors presented the design of two novel subsystems for emotion detection and speaker recognition built on a mobile phone platform. These are based on Gaussian Mixture models for the detection of emotions and speaker identities. EmotionSense automatically recognizes speakers and emotions by means of classifiers running locally on off-the-shelf mobile phones. A programmable adaptive system is proposed with declarative rules, expressed as first-order logic predicates and interpreted by a logic engine; the rules can trigger new sensing actions (such as starting to sample a sensor) or modify existing ones (such as the sampling interval of a sensor). The results of a real deployment, designed in collaboration with social psychologists, are presented. It was found that the distribution of the emotions detected by EmotionSense generally reflected the participants' self-reports.

Gao et al. [111] addressed the question of whether touch behaviours reflect players' emotional states. Since an increasing number of people play games on touch-screen mobile phones, the answer would be valuable not only as an evaluation indicator for game designers but also for real-time personalization of the game experience. Psychology studies of acted touch behaviour show the existence of discriminative affective profiles. In this work, finger-stroke features during gameplay on an iPod were extracted and their discriminative power was analyzed. Machine learning algorithms were used to build systems for automatically discriminating between four emotional states (Excited, Relaxed, Frustrated, Bored), between two levels of arousal, and between two levels of valence. Accuracy reached between 69% and 77% for the four emotional states, and higher results (89%) were obtained for discriminating between two levels of arousal and two levels of valence.

The iPhone game Fruit Ninja was selected for this study. Fruit Ninja is an action game that requires the player to squish, slash and splatter fruit: players swipe their fingers across the screen, simulating a ninja warrior. Fruit Ninja has been a phenomenon in the app store with large sales and has inspired the creation of many similar drawing-based games. As Fruit Ninja is not an open-source app, an open-source variation of it (called Samurai Fruit), developed by Ansca Mobile with the Corona SDK, was used instead. In order to elicit different kinds of emotions from players and to capture players' touch behaviour, a few modifications to Ansca's game were implemented. The resulting New Samurai Fruit game software captures and stores players' finger-stroke behaviour during gameplay: the coordinates of each point of a stroke, the contact area of the finger at each point, and the time duration of a stroke, i.e., the time from the moment the finger touches the display to the moment the finger is lifted. This information is gathered directly using the tracking functions of the Corona SDK.
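The sketch below shows one way such stroke traces (coordinates, contact area, timing) can be turned into a small feature vector and classified. The stroke format, feature set, samples, and labels are hypothetical, and an RBF-kernel SVM stands in for the classifiers compared in the study.

```python
# Sketch: finger-stroke features (length, speed, duration, mean contact area)
# fed to an SVM, in the spirit of touch-based affect recognition.
# Stroke format and labels are hypothetical.
import numpy as np
from sklearn.svm import SVC

def stroke_features(points):
    """points: list of (x, y, contact_area, t) samples along one stroke."""
    xy = np.array([(p[0], p[1]) for p in points], dtype=float)
    area = np.array([p[2] for p in points], dtype=float)
    t = np.array([p[3] for p in points], dtype=float)
    length = np.sum(np.linalg.norm(np.diff(xy, axis=0), axis=1))
    duration = t[-1] - t[0]
    speed = length / duration if duration > 0 else 0.0
    return [length, speed, duration, area.mean()]

# Two hypothetical labelled strokes per class ("excited" vs "bored").
strokes = [
    ([(0, 0, 30, 0.00), (80, 10, 32, 0.08), (160, 15, 31, 0.15)], "excited"),
    ([(0, 0, 28, 0.00), (90, 20, 30, 0.07), (170, 30, 29, 0.14)], "excited"),
    ([(0, 0, 22, 0.00), (20, 5, 21, 0.30), (45, 8, 23, 0.65)], "bored"),
    ([(0, 0, 20, 0.00), (25, 2, 22, 0.35), (50, 6, 21, 0.70)], "bored"),
]
X = np.array([stroke_features(s) for s, _ in strokes])
y = [label for _, label in strokes]
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([stroke_features([(0, 0, 25, 0.0), (70, 12, 26, 0.1),
                                     (150, 18, 27, 0.2)])]))
```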
Gao et al. [111] addressed the question of whether touch behaviours reflect players' emotional states. Since an increasing number of people play games on touch-screen mobile phones, the answer would be valuable not only as an evaluation indicator for game designers but also for real-time personalization of the game experience. Psychology studies on acted touch behaviour show the existence of discriminative affective profiles. In this work, finger-stroke features during gameplay on an iPod were extracted and their discriminative power was analyzed. Machine learning algorithms were used to build systems for automatically discriminating between four emotional states (Excited, Relaxed, Frustrated, Bored), two levels of arousal, and two levels of valence. Accuracy reached between 69% and 77% for the four emotional states, and higher results (89%) were obtained for discriminating between two levels of arousal and between two levels of valence. The iPhone game Fruit Ninja was selected for this study. Fruit Ninja is an action game that requires the player to squish, slash and splatter fruits: players swipe their fingers across the screen to slash fruit, simulating a ninja warrior. Fruit Ninja has been a phenomenon in the app store with large sales and has inspired many similar drawing-based games. As Fruit Ninja is not an open-source app, an open-source variation of it (called Samurai Fruit), developed by Ansca Mobile with the Corona SDK, was used instead. In order to elicit different kinds of emotions from players and to capture players' touch behaviour, a few modifications to Ansca's game were implemented. The resulting New Samurai Fruit game captures and stores players' finger-stroke behaviour during gameplay: the coordinates of each point of a stroke, the contact area of the finger at each point, and the duration of a stroke, i.e., the time from the moment the finger touches the display to the moment the finger is lifted. This information is gathered directly through the tracking functions of the Corona SDK. The contact area is used as a measure of the pressure exerted by the participants, as the device does not allow pressure to be measured directly. In order to take into account physical differences between individuals, i.e., the dimension of the fingertip, a baseline pressure area is computed before the game starts: each participant is asked to perform a stroke, and the average contact area over the points of this stroke is used as that participant's pressure baseline. The pressure values collected during the game are then normalized by subtracting the participant's baseline value from them. The authors introduced and explained the experiment process, the iPod touch device and the rules of New Samurai Fruit to the participants and answered their questions. The participants were then asked to play the game and get familiar with it. After this training session, the participants were asked to play and try to complete 20 consecutive levels of the game. As a form of motivation, they were informed that the participant with the highest final score would be rewarded with an iTunes gift card worth 25 pounds. The gameplay was video recorded. In order to establish the ground truth (i.e., the emotional state associated with each level), participants were first asked to fill out a self-assessment questionnaire after every game level. To reduce the effect of the game results on the labelling, the Cued Recall Brief approach [121] was used: at the end of the 20 levels, the player was shown the recorded video of the gameplay s/he had just carried out, and while the video was playing s/he was encouraged to recall his/her thoughts and emotional states and, if necessary, relabel the levels.
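A minimal sketch of this kind of stroke-feature extraction and baseline pressure normalization is given below; the field names and sample values are hypothetical, not taken from the study.

```python
# Illustrative sketch: per-stroke features from touch samples, with the
# contact-area "pressure" proxy normalized by a per-participant baseline.
import numpy as np

def stroke_features(points, baseline_area):
    """points: list of (t, x, y, contact_area) samples for one stroke."""
    t = np.array([p[0] for p in points])
    xy = np.array([(p[1], p[2]) for p in points], dtype=float)
    area = np.array([p[3] for p in points], dtype=float)
    duration = t[-1] - t[0]
    length = float(np.sum(np.linalg.norm(np.diff(xy, axis=0), axis=1)))
    speed = length / duration if duration > 0 else 0.0
    pressure = area - baseline_area  # subtract the participant's baseline contact area
    return {"duration": duration, "length": length, "mean_speed": speed,
            "mean_pressure": float(pressure.mean()), "max_pressure": float(pressure.max())}

# Baseline: average contact area over one calibration stroke performed before the game.
calibration = [(0.00, 10, 10, 31.0), (0.02, 12, 11, 33.0), (0.04, 15, 13, 32.0)]
baseline = float(np.mean([p[3] for p in calibration]))

stroke = [(0.00, 5, 5, 35.0), (0.03, 20, 9, 40.0), (0.06, 42, 15, 38.0)]
print(stroke_features(stroke, baseline))
```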
Visual inspection and feature extraction showed that finger-stroke behaviour allows for the discrimination of the four affective states investigated, as well as for the discrimination between two levels of arousal and between two levels of valence. To investigate whether person-independent emotion recognition models could reach similar results, three modelling algorithms were selected: Discriminant Analysis (DA), Artificial Neural Networks (ANN) with back propagation, and Support Vector Machine (SVM) classifiers; the last two were selected for their popularity and their generalization ability. The results showed very high performance on both cross-validations on unseen data. The models for arousal produced the best recognition results, ranging from 86.7% (kernel SVM) to 88.7% (linear SVM). In the case of valence, the performance ranged from 82% (linear SVM) to 86% (kernel SVM). The results for the discrimination of the four emotion labels were also well above chance level, ranging from 69% (kernel SVM) to 77% (linear SVM).
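The classification step can be illustrated with cross-validated SVMs over per-stroke feature vectors, as sketched below on synthetic data; this is not the authors' pipeline, and the binary arousal labels are simulated.

```python
# Illustrative sketch: linear vs. RBF-kernel SVMs on synthetic per-stroke
# feature vectors, evaluated with cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))  # e.g., duration, length, speed, pressure statistics
arousal = (X[:, 2] + 0.3 * rng.normal(size=300) > 0).astype(int)  # high vs. low arousal

for name, kernel in [("linear SVM", "linear"), ("kernel SVM", "rbf")]:
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    acc = cross_val_score(clf, X, arousal, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```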
Overall, this work investigated the possibility of using stroke behaviour to discriminate between a set of affective states in the context of touch-based game devices. The results are very promising and pave the way for further explorations with respect to not only different contexts but also different types of tactile behaviour. Further studies are needed in a variety of contexts to establish a better understanding of this relationship and to identify if, and how, these models could be generalized over different types of tactile behaviour, activity, context and personality traits.

Lee et al. [122] proposed an approach to recognize the emotions of the user by inconspicuously collecting and analyzing user-generated data from different types of sensors on the smartphone. To achieve this, they adopted a machine learning approach to gather, analyze and classify device usage patterns, and developed a social network service client for Android smartphones which unobtrusively finds various behavioural patterns and the current context of users. Social networking services (SNS) such as Twitter, Facebook or Google+ are online service platforms which aim at building and managing social relationships between people. On an SNS, users represent themselves in various ways, such as profiles, pictures, or text messages, and interact with other people including their acquaintances. By using SNS applications on mobile devices, furthermore, users can share ideas, interests or activities at any time and anywhere with their friends. An SNS user often expresses her/his feelings or emotional state directly with emoticons or indirectly with written text, and thereby followers respond to the message more actively or even empathize with the user; in psychology, this kind of phenomenon is called emotional contagion. That is, communication between users is newly induced or enriched by the sharing of emotion. The authors defined these kinds of communicational activities as affective social communication, as depicted in Fig. 10. However, SNS users hardly show what they feel, so their friends cannot sense or react to their emotional states appropriately, probably because they are unfamiliar with expressing emotion or are not sufficiently aware of their own emotions. A possible solution to this problem is to adopt emotion recognition technologies, which have been extensively studied by the affective computing research community.

Fig. 10. Diagram of affective social communication: gather and analyze sensor data, recognize the user's emotion unobtrusively, and append the emotion to the communication.

Existing emotion recognition technologies can be divided into three major categories depending on what kind of data is analyzed: physiological signals, facial expressions, or voice. Physiological emotion recognition shows acceptable performance but has critical weaknesses that prevent its widespread use, since it is obtrusive to users and needs special equipment or devices. To perceive the current emotion of the user, they instead gathered various types of sensor data from the smartphone. These data can be categorized into two types, the behaviour and the context of the user, and were collected while the user used a certain application on her/his smartphone. As a proof of concept, they developed an Android application named AT Client which collects sensor data whenever the user sends a text message (i.e., a tweet) to Twitter. Via a two-week pilot study, they collected a dataset of 314 records including the self-reported emotional states of the user, and utilized it to build a Bayesian Network classifier, a probabilistic classification algorithm based on the Bayes theorem. Through repetitive experiments including preprocessing, feature selection, and inference (i.e., supervised learning), the classifier can classify each tweet written by the user into 7 types with a satisfactory accuracy of 67.52% on average. The classifiable emotions include Ekman's six basic emotions [123] and one neutral state.
To determine the current emotion of users, they adopted a machine learning approach consisting of data collection, data analysis, and classification. For these tasks, they used the popular machine learning toolkit Weka [124]. The data collection process gathered various sensor data from the smartphone when a participant used AT Client installed on their device. In more detail, if a participant felt a specific emotion at a certain moment in their everyday life, they would write a short text message and were asked to report their current emotion via AT Client; meanwhile, AT Client collected various sensor data during this period. It is also possible that some users write a tweet without a particular emotion; such a state may be neutral, and these cases were also taken as training data because the neutral emotion itself is a frequently observed state in emotion recognition [125]. From the gathered sensor data, they extracted 14 features which seem to have potential correlations with emotion, and all of these features are formalized in the attribute-relation file format (ARFF) for Weka. They then analyzed the collected training data using Weka, ranking all features by their information gain with respect to each class. Through repetitive experiments using 10-fold cross validation, they chose the Bayesian Network classifier as the inference model for their system, because it showed the highest classification accuracy among other machine learning classifiers such as Naïve Bayes, decision trees, or neural networks. Fig. 11 shows a sample Bayesian Network. The inference model was continuously updated with unknown real-world data, which it classified into 7 emotional states.

Fig. 11. Sample Bayesian Network.

Through the two-week pilot study, they gathered the 314-record real-world dataset from one participant and utilized it to validate the emotion recognition approach by measuring classification accuracy. One subject from the research group (a male in his 30s) was recruited for the pilot study; he wrote tweets and indicated his emotional state whenever he felt a certain emotion in his daily life. By evaluating all features with the information gain attribute evaluation algorithm, they selected 10 features as the primary attributes of the training data for the Bayesian Network classifier. The feature with the highest correlation to emotions was the speed of typing; in addition, the length of the inputted text, shaking of the device, and the user's location were also important features for emotion recognition. The average classification accuracy is 67.52% for 7 emotions. For emotions such as happiness, surprise, and neutral, the approach performed appreciably better than chance (i.e., 50%), and in the case of anger and disgust it also outperformed random classification. Classification accuracy is irregular across emotion types, but they found a general correlation between the number of observed cases and classification accuracy, which suggests that the low accuracy for sadness and fear may be improved with additional data collection.
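The feature-ranking and classification steps can be approximated as in the sketch below, which uses mutual information as an information-gain analogue and a Naive Bayes classifier as a simple stand-in for the Weka Bayesian Network; the feature names and data are synthetic placeholders.

```python
# Illustrative sketch: rank synthetic features by an information-gain analogue
# (mutual information), then cross-validate a simple probabilistic classifier.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
feature_names = ["typing_speed", "text_length", "shake_count",
                 "backspace_count", "hour_of_day", "illuminance"]
X = rng.normal(size=(314, len(feature_names)))
y = rng.integers(0, 7, size=314)          # 7 emotion classes (6 basic + neutral)
y = np.where(X[:, 0] > 0.5, 0, y)         # inject a weak typing-speed signal

ranking = sorted(zip(mutual_info_classif(X, y, random_state=0), feature_names),
                 reverse=True)
print("features ranked by estimated information gain:")
for score, name in ranking:
    print(f"  {name}: {score:.3f}")

top = [feature_names.index(name) for _, name in ranking[:4]]
acc = cross_val_score(GaussianNB(), X[:, top], y, cv=10).mean()  # 10-fold CV, as in the study
print("10-fold CV accuracy with top features:", round(acc, 3))
```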
Petersen et al. [126] rendered brain activations in a 3D brain model on a smartphone. They combined a wireless EEG headset with a smartphone to capture brain imaging data reflecting everyday social behaviour in a mobile context. They applied a Bayesian approach to reconstruct the neural sources, demonstrating the ability to distinguish among emotional responses reflected in different scalp potentials when viewing pleasant and unpleasant pictures compared to neutral content. This work may not only facilitate the differentiation of emotional responses but also provide an intuitive interface for touch-based interaction, allowing both for modelling the mental state of users and for novel bio-feedback applications. The setup was based on a portable wireless Emotiv Research Edition neuroheadset (http://emotiv.com) which transmits the EEG and control data to a receiver USB dongle originally intended for a Windows PC running the Emotiv Research Edition SDK. They connected the wireless receiver dongle to a USB port on a Nokia N900 smartphone running Maemo 5. Running in USB host mode, they decrypted the raw binary EEG data transmitted from the wireless headset, and in order to synchronize the stimuli with the data they timestamped the first and last packets arriving at the beginning and end of the EEG recording. Eight male volunteers from the Technical University of Denmark, between the ages of 26 and 53, participated in the experiment.
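A simplified flavour of the signal-processing side (not the authors' Bayesian source reconstruction) is sketched below: band-pass filtering a raw EEG channel and averaging stimulus-locked epochs per condition, on synthetic data with assumed stimulus onsets.

```python
# Illustrative sketch: ERP-style preprocessing of one synthetic EEG channel,
# comparing stimulus-locked averages for two assumed conditions.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 128                                    # Emotiv-class headsets sample around 128 Hz
t = np.arange(0, 60, 1 / fs)
eeg = np.random.default_rng(4).normal(size=t.size)   # one synthetic channel

b, a = butter(4, [1, 30], btype="bandpass", fs=fs)    # keep roughly 1-30 Hz
eeg = filtfilt(b, a, eeg)

onsets = {"pleasant": [5.0, 20.0, 40.0], "neutral": [10.0, 30.0, 50.0]}  # assumed stimuli

def epoch_average(signal, stim_times, pre=0.2, post=0.8):
    """Average fixed-length windows locked to stimulus onsets (a simple ERP estimate)."""
    windows = []
    for s in stim_times:
        i0, i1 = int((s - pre) * fs), int((s + post) * fs)
        seg = signal[i0:i1]
        windows.append(seg - seg[: int(pre * fs)].mean())  # baseline-correct on pre-stimulus
    return np.mean(windows, axis=0)

for condition, times in onsets.items():
    erp = epoch_average(eeg, times)
    print(condition, "ERP peak amplitude:", round(float(np.abs(erp).max()), 3))
```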
Fig. 12 shows the smartphone-based interface: the wireless neuroheadset transmits the brain imaging data via a receiver USB dongle connected directly to a Nokia N900 smartphone. Applying a Bayesian approach to reconstruct the underlying neural sources may thus provide a differentiation of emotional responses based on raw EEG data captured online in a mobile context, as the current implementation is able to visualize the activations on a smartphone with a latency of 150 ms. The early and late ERP components are not limited to the stark emotional contrasts characterizing images selected from the IAPS collection: whether people read a word with affective connotations, come across something similar in an image, or recognize from a facial expression that somebody looks sad, the electrophysiological patterns in the brain seem to suggest that the underlying emotional processes might be the same. The ability to continuously capture these patterns by integrating wireless EEG sets with smartphones for online processing of brain imaging data may offer completely new opportunities for modelling the mental state of users in real-life scenarios as well as providing a basis for novel bio-feedback applications.

TABLE III
MOBILE AFFECT DETECTION

References | Data | Sensor monitors | Modeling technique and classifier | Participants
LiKamWa et al. [43] | SMS, email, phone call, application usage, web browsing, and location | GPS | Multiple linear regression, Sequential Forward Selection | 32
Rachuri et al. [44] | Location, voice records, sensing data | GPS, accelerometer, Bluetooth, microphone | Gaussian Mixture Model classifier, Hidden Markov Models, Maximum A Posteriori Estimation | 18
Gao et al. [111] | Touch pressure, duration, etc. | Touch panel | Discriminant Analysis, Artificial Neural Network with Back Propagation, Support Vector Machine | 15
Lee et al. [122] | Touch behaviour, shake, illuminance, text, location, time, weather | Touch panel, GPS, accelerometer | Bayesian Network classifier | 314-record real-world dataset from 1 participant
Petersen et al. [126] | EEG data | Wireless EEG headset, touch panel | Bayesian EM approach | 8
Kim et al. [113] | Movement, position and application usage | Accelerometer, touch panel, gyroscope, application logs | Naive Bayesian, Bayesian network, simple logistic, best-first decision tree, C4.5 decision tree and naive Bayesian decision tree | N/A

Kim et al. [113] investigated human behaviours related to the touch interface on a smartphone as a way to understand users' emotional states. As modern smartphones have various embedded sensors such as an accelerometer and a gyroscope, they aimed to utilize data from these embedded sensors for recognizing human emotion and, further, for finding emotional preferences for smartphone applications. They collected 12 attributes from 3 sensors during users' touch behaviours and recognized seven basic emotions with off-line analysis. Finally, they generated a preference matrix of applications by calculating the difference between the prior and posterior emotional states around the application usage. The pilot study showed an average f1-measure of 0.57 (and 0.82 with decision-tree-based methods) over 455 cases of touch behaviour. They discovered that simple touching behaviour has potential for recognizing users' emotional states. The paper proposed an emotion recognition framework for smartphone applications together with the analysis of an experimental study.
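The pilot evaluation can be mimicked on synthetic data as below: a decision tree over hypothetical touch attributes with a cross-validated, macro-averaged f1-measure. This is an illustration, not the authors' feature set or toolchain.

```python
# Illustrative sketch: classify touch events into emotions from synthetic
# accelerometer/gyroscope/touch attributes and report an averaged f1-measure.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

rng = np.random.default_rng(5)
X = rng.normal(size=(455, 12))                 # 12 attributes per touch behaviour, 455 cases
y = rng.integers(0, 7, size=455)               # seven emotion labels
y = np.where(X[:, 0] + X[:, 3] > 1.0, 1, y)    # inject a weak, learnable pattern

pred = cross_val_predict(DecisionTreeClassifier(max_depth=5, random_state=0), X, y, cv=10)
print("macro-averaged f1:", round(f1_score(y, pred, average="macro"), 3))
```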
The framework consists of an emotion recognition process and an emotional preference learning algorithm which learns the difference between two emotional states, the prior and the posterior, for each entity in a mobile device such as downloaded applications, media contents and contacts. Once the emotional preference of an entity has been learned, the system can recommend smartphone applications, media contents and mobile services that fit the user's current emotional state. Through the emotion recognition process, they tracked the temporal changes of the user's emotional states. The proposed emotion-based framework has the potential to help users experience emotional closeness and personalized recommendations. Fig. 13 shows the overall process, which consists of a monitoring, a recognition and an emotional preference learning module.

Fig. 13. The conceptual process of emotion recognition, generation and emotional preference learning.

The monitoring module collects the device's sensory data, such as touch and accelerometer readings, together with their occurrence time. The emotion recognition module processes the collected data to infer the user's emotional state; when necessary, it can use additional attributes such as memory, interaction and moods to enhance its accuracy and performance. As in the bottom part of Fig. 13, the emotional preference learning module is responsible for building an emotional preference matrix for the smartphone applications (and their embedded objects) by learning the difference of the quantified emotional state between prior and posterior recognition behaviours. People have their own preferences; to discover them, many technologies depend on user-based or item-based stochastic approaches, but most ignore the users' emotional factor. Using the emotion recognition module, they gave each application and object an emotional preference by learning the differences of the user's emotional state during the use of the application.
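In its simplest form, the emotional preference learning sketched in Fig. 13 reduces to accumulating prior/posterior differences per application, as the hypothetical example below illustrates; the app names and valence scores are invented for illustration.

```python
# Illustrative sketch: per-application emotional preference as the average
# shift between the emotional state before (prior) and after (posterior) use.
from collections import defaultdict

# (app, prior_valence, posterior_valence) triples from hypothetical usage sessions;
# valence is a scalar summary of the recognized emotional state.
sessions = [
    ("game_app", 0.1, 0.6),
    ("game_app", -0.2, 0.3),
    ("email_app", 0.4, 0.1),
    ("music_app", 0.0, 0.5),
]

shifts = defaultdict(list)
for app, prior, posterior in sessions:
    shifts[app].append(posterior - prior)   # positive shift => app tends to improve mood

preference_matrix = {app: sum(d) / len(d) for app, d in shifts.items()}
for app, score in sorted(preference_matrix.items(), key=lambda kv: -kv[1]):
    print(f"{app}: preference score {score:+.2f}")
```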
Smartphone users interact with their devices through sequential or non-sequential activities such as dragging, flicking, tapping, executing applications, viewing contents and so on. For the training of the user's emotional state, they periodically received emotion feedback. They tried to minimize noise during the experiment: because the user's movement and position can influence the accelerometer and the gyroscope, they asked the subject to sit and keep the same position whenever performing an action with the smartphone. With the emotion recognition module, they found each application's emotional preferences by learning the differences of the emotion attributes. Although the experimental results showed comparatively low performance, they demonstrated the possibility of a touch-based emotion recognition method, which has wide applicability since modern smart devices depend on touch interaction. They argued that the recognition performance could be enhanced by additional data collection, and they identified remedies for practical issues as future work. Collecting self-reported emotion data is one of the biggest challenges for this research: some emotions, such as anger, fear and surprise, do not occur frequently in real-life situations and are hard to catch during experiments. In summary, this work proposed emotion-recognition-based learning of smartphone application preferences. From the experiment, they discovered that simple touching behaviour can be used for recognizing smartphone users' emotional states, and that reducing the number of emotions can improve the recognition accuracy. They also found that the appropriate window size for recognition depends on the type of emotion and the classification method.

V. CHALLENGES AND OPPORTUNITIES

Unobtrusive affect detection requires sensors deployed on commonly used devices, and mobile affective computing takes advantage of the sensors embedded in smart devices. Even though researchers have put in a lot of effort, we have only touched the tip of the iceberg: although the accuracy of some works has reached a relatively high level, we are still far from practical usefulness. This section highlights the challenges of MAC and discusses some research potentials.
A. Challenges

So far, the research only scratches the surface of affective computing problems. The conclusions either propose a high correlation between the collected data and the results or show reasonable accuracy of affect detection in studies with very few participants. All of these give us hope and inspire more and more talent to be devoted to solving the remaining problems.

The primary challenge is the establishment of ground truth. To our knowledge, researchers recruited participants to fill in questionnaires to obtain ground truth. Alternatively, some appropriate databases (audio, visual expressions) can be used to train the classifier. However, most databases are built by recording the emotion expressions of actors, and psychological research suggests that deliberate emotion expressions differ from spontaneously occurring expressions [15], [16]. Establishing spontaneous emotion expression databases is expensive and inefficient, in that the raw data have to be labeled manually. Nevertheless, researchers have conquered these difficulties to build large spontaneous affective data sets. The existing spontaneous affective data were collected in the following scenarios: human-human conversation, HCI, and use of a video kiosk. Human-human conversation scenarios include face-to-face interviews [17], [127]; HCI scenarios include computer-based dialogue systems [21], [79]; and in video kiosk settings, the subjects' affective reactions are recorded while they watch emotion-inducing videos [47], [128], [129]. In most of the existing databases, discrete emotion categories are used as the emotion descriptors, and the labels of prototypical emotions are often used, especially in databases of deliberate affective behaviour. In databases of spontaneous affective behaviour, coarse affective states such as positive versus negative, dimensional descriptions in the evaluation-activation space, and some application-dependent affective states are usually used as the data labels. Currently, several large audio, visual, and audiovisual sets of human spontaneous affective behaviour have been collected, some of which have been released for public use.

Friendly application design is also challenging for mobile emotion detection. Some works (e.g., [130]) have designed interesting applications (Fig. 9), and a friendly design can make users feel more comfortable with affective feedback collection. The arising problem is self-interference: such a design may itself affect the user's affect, and it is nontrivial to evaluate this self-interference and compensate for it in the results. Besides, the widespread use of smart devices may change users' social behaviour, which affects traditional emotional social relationships.

The lack of a coordination mechanism for affective parameters under multi-modal conditions considerably limits affective understanding and affective prompts. The amalgamation of different channels is not just their combination, but the discovery of the mutual relations among all channel information. These mutual relations could enable better integration of the different channels during interaction phases, for both recognition/understanding and information generation.

B. Potential research

Smart devices have enough power and popularity to offer ubiquitous affect detection, in that they keep tracking people's everyday lives and can constantly collect affective data. Through the mobile application markets (e.g., the Apple App Store, Google Play), we could deploy affect detection applications among an almost unlimited number of users. This enables individual affective services, in which an application learns individual mood and emotion patterns and provides customized affect detection. In this way, we can keep collecting people's spontaneous affective data in their daily lives, which will eventually and efficiently build scalable, customized spontaneous affective databases that help to extract common affective features of humans and to find distinguishable fingerprints among different users.
In addition, many new HCI and smart wearable devices have gained popularity, so that affective fingerprints (e.g., voice, facial expression, gesture, heart rate and so on) could be fully investigated by combining multiple wearable devices. Therefore, designing an affective eco-system that coordinates all devices for data fusion is very promising for boosting affect detection performance and exploring further affective computing techniques. Moreover, the emergence of the World Wide Web, the Internet and new technologies (e.g., mobile augmented reality [131], [132]) has changed human lifestyles. It is necessary to discover new emotional features, which may exist in application logs, smart device usage patterns, locations, order histories, etc. There is a great need to thoroughly monitor and investigate these new emotional expression fingerprints. In other words, establishing new affective databases in terms of new affective fingerprints should be a significant research topic.

VI. CONCLUSION

Research on human affect detection has witnessed a good deal of progress. For a long time, affective computing research relied on well-controlled experiments in the lab. Available data sets are either deliberate emotional expressions from actors or small-scale spontaneous expressions established at a substantial cost, and obtrusive in-lab data collection methods also restrict the effectiveness of the research. Today, new technologies bring new opportunities for affect detection in an unobtrusive and smart way. A number of promising methods for unobtrusive tool-based (webcam, keystroke, mouse) and smartphone-based human affect detection have been proposed. This paper has focused on summarising and discussing these novel approaches to the analysis of human affect and on revealing the most important issues yet to be addressed in the field, which include the following:
• Building scalable, customized databases from which common affect features and distinguishable affect fingerprints of humans can be extracted,
• Devising an affective ecosystem that coordinates different smart wearable devices and takes into consideration the most comprehensive affective data for boosting affect detection accuracy,
• Investigating potential new affect-related features arising from changing lifestyles.
Since the complexity of these issues, concerned with the interpretation of human behaviour at a very deep level, is tremendous and requires highly interdisciplinary collaboration, we believe true breakthroughs in this field can be achieved by establishing an interdisciplinary international program directed toward computer intelligence of human behaviours. Research on unobtrusive and mobile affective computing can help advance research in multiple related fields, including education, psychology and social science.

REFERENCES
[1] R. W. Picard, Affective Computing. Cambridge, MA, USA: MIT Press, 1997.
[2] C. Darwin, The expression of the emotions in man and animals. Oxford University Press, 1998.
[3] W. James, What is an Emotion? Wilder Publications, 2007.
[4] W. D. Taylor and S. K. Damarin, “The second self: Computers and the human spirit,” Educational Technology Research and Development, vol. 33, no. 3, pp. 228–230, 1985.
[5] J. C. Hager, P. Ekman, and W. V. Friesen, “Facial action coding system,” Salt Lake City, UT: A Human Face, 2002.
[6] R. Cowie, E. Douglas-Cowie, and C. Cox, “Beyond emotion archetypes: Databases for emotion modelling using neural networks,” Neural networks, vol.
18, no. 4, pp. 371–388, 2005. [7] R. Banse and K. R. Scherer, “Acoustic profiles in vocal emotion expression.” Journal of personality and social psychology, vol. 70, no. 3, p. 614, 1996. [8] L. S.-H. Chen, “Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction,” Ph.D. dissertation, Citeseer, 2000. [9] R. Gross, “Face databases,” in Handbook of Face Recognition. Springer, 2005, pp. 301–327. [10] T. Kanade, J. F. Cohn, and Y. Tian, “Comprehensive database for facial expression analysis,” in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000, pp. 46–53. [11] H. Gunes and M. Piccardi, “A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior,” in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, vol. 1. IEEE, 2006, pp. 1148–1153. [12] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, “Web-based database for facial expression analysis,” in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on. IEEE, 2005, pp. 5–pp. [13] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato, “A 3d facial expression database for facial behavior research,” in Automatic face and gesture recognition, 2006. FGR 2006. 7th international conference on. IEEE, 2006, pp. 211–216. [14] J. J. Gross and R. W. Levenson, “Emotion elicitation using films,” Cognition & Emotion, vol. 9, no. 1, pp. 87–108, 1995. [15] C. Whissell, “The dictionary of affect in language.” Emotion: Theory, Research, and Experience. The Measurement of Emotions, vol. 4, pp. 113–131, 1989. [16] P. Ekman and E. L. Rosenberg, What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford University Press, 1997. [17] M. S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan, “Recognizing facial expression: machine learning and application to spontaneous behavior,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 568–573. [18] A. B. Ashraf, S. Lucey, J. F. Cohn, T. Chen, Z. Ambadar, K. M. Prkachin, and P. E. Solomon, “The painful face–pain expression recognition using active appearance models,” Image and Vision Computing, vol. 27, no. 12, pp. 1788–1796, 2009. [19] J. F. Cohn and K. L. Schmidt, “The timing of facial motion in posed and spontaneous smiles,” International Journal of Wavelets, Multiresolution and Information Processing, vol. 2, no. 02, pp. 121–132, 2004. [20] M. F. Valstar, M. Pantic, Z. Ambadar, and J. F. Cohn, “Spontaneous vs. posed facial behavior: automatic analysis of brow actions,” in Proceedings of the 8th international conference on Multimodal interfaces. ACM, 2006, pp. 162–170. [21] C. M. Lee and S. S. Narayanan, “Toward detecting emotions in spoken dialogs,” Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 2, pp. 293–303, 2005. [22] A. Batliner, K. Fischer, R. Huber, J. Spilker, and E. N¨oth, “How to find trouble in communication,” Speech communication, vol. 40, no. 1, pp. 117–143, 2003. [23] S. Greengard, “Following the crowd,” Communications of the ACM, vol. 54, no. 2, pp. 20–22, 2011. [24] R. Morris, “Crowdsourcing workshop: the emergence of affective crowdsourcing,” in Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems. ACM, 2011. [25] M. Soleymani and M. 
Larson, “Crowdsourcing for affective annotation of video: Development of a viewer-reported boredom corpus,” in Proceedings of the ACM SIGIR 2010 workshop on crowdsourcing for search evaluation (CSE 2010), 2010, pp. 4–8. [26] S. Klettner, H. Huang, M. Schmidt, and G. Gartner, “Crowdsourcing affective responses to space,” Kartographische Nachrichten-Journal of Cartography and Geographic Information, vol. 2, pp. 66–73, 2013. [27] G. Tavares, A. Mour˜ao, and J. Magalhaes, “Crowdsourcing for affective-interaction in computer games,” in Proceedings of the 2nd ACM international workshop on Crowdsourcing for multimedia. ACM, 2013, pp. 7–12. 16 [28] B. G. Morton, J. A. Speck, E. M. Schmidt, and Y. E. Kim, “Improving music emotion labeling using human computation,” in Proceedings of the acm sigkdd workshop on human computation. ACM, 2010, pp. 45–48. [29] S. D. Kamvar and J. Harris, “We feel fine and searching the emotional web,” in Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011, pp. 117–126. [30] “Amazon mechanical turk,” https://www.mturk.com/mturk/welcome. [31] “Worldwide smartphone user base hits 1 billion,” http://www.cnet.com/ news/worldwide-smartphone-user-base-hits-1-billion/. [32] J. F. Cohn, “Foundations of human computing: facial expression and emotion,” in Proceedings of the 8th international conference on Multimodal interfaces. ACM, 2006, pp. 233–238. [33] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, “Emotion recognition in human-computer interaction,” Signal Processing Magazine, IEEE, vol. 18, no. 1, pp. 32– 80, 2001. [34] M. Pantic, A. Pentland, A. Nijholt, and T. S. Huang, “Human computing and machine understanding of human behavior: a survey,” in Artifical Intelligence for Human Computing. Springer, 2007, pp. 47– 71. [35] A. Pentland, “Socially aware, computation and communication,” Computer, vol. 38, no. 3, pp. 33–40, 2005. [36] C. Darwin, The expression of the emotions in man and animals. University of Chicago press, 1965, vol. 526. [37] B. Parkinson, Ideas and realities of emotion. Psychology Press, 1995. [38] J. J. Gross, “Emotion regulation: Affective, cognitive, and social consequences,” Psychophysiology, vol. 39, no. 3, pp. 281–291, 2002. [39] ——, “The emerging field of emotion regulation: An integrative review.” Review of general psychology, vol. 2, no. 3, p. 271, 1998. ¨ [40] A. Ohman and J. J. Soares, “”unconscious anxiety”: phobic responses to masked stimuli.” Journal of abnormal psychology, vol. 103, no. 2, p. 231, 1994. [41] ——, “On the automatic nature of phobic fear: conditioned electrodermal responses to masked fear-relevant stimuli.” Journal of abnormal psychology, vol. 102, no. 1, p. 121, 1993. [42] M. S. Clark and I. Brissette, “Two types of relationship closeness and their influence on people’s emotional lives.” 2003. [43] R. LiKamWa, Y. Liu, N. D. Lane, and L. Zhong, “Moodscope: Building a mood sensor from smartphone usage patterns,” in Proceeding of the 11th annual international conference on Mobile systems, applications, and services. ACM, 2013, pp. 389–402. [44] K. K. Rachuri, M. Musolesi, C. Mascolo, P. J. Rentfrow, C. Longworth, and A. Aucinas, “Emotionsense: a mobile phones based adaptive platform for experimental social psychology research,” in Proceedings of the 12th ACM international conference on Ubiquitous computing. ACM, 2010, pp. 281–290. [45] B. Fasel and J. Luettin, “Automatic facial expression analysis: a survey,” Pattern Recognition, vol. 36, no. 
1, pp. 259–275, 2003. [46] O. Pierre-Yves, “The production and recognition of emotions in speech: features and algorithms,” International Journal of Human-Computer Studies, vol. 59, no. 1, pp. 157–183, 2003. [47] M. Pantic and M. S. Bartlett, “Machine analysis of facial expressions,” 2007. [48] M. Pantic and L. J. Rothkrantz, “Toward an affect-sensitive multimodal human-computer interaction,” Proceedings of the IEEE, vol. 91, no. 9, pp. 1370–1390, 2003. [49] M. Pantic, N. Sebe, J. F. Cohn, and T. Huang, “Affective multimodal human-computer interaction,” in Proceedings of the 13th annual ACM international conference on Multimedia. ACM, 2005, pp. 669–676. [50] N. Sebe, I. Cohen, and T. S. Huang, “Multimodal emotion recognition,” Handbook of Pattern Recognition and Computer Vision, vol. 4, pp. 387–419, 2005. [51] A. Samal and P. A. Iyengar, “Automatic recognition and analysis of human faces and facial expressions: A survey,” Pattern recognition, vol. 25, no. 1, pp. 65–77, 1992. [52] Y.-L. Tian, T. Kanade, and J. F. Cohn, “Facial expression analysis,” in Handbook of face recognition. Springer, 2005, pp. 247–275. [53] A. Kleinsmith and N. Bianchi-Berthouze, “Affective body expression perception and recognition: A survey,” Affective Computing, IEEE Transactions on, vol. 4, no. 1, pp. 15–33, 2013. [54] J. Tao and T. Tan, “Affective computing: A review,” in Affective computing and intelligent interaction. Springer, 2005, pp. 981–995. [55] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 1, pp. 39–58, 2009. [56] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and trends in information retrieval, vol. 2, no. 1-2, pp. 1–135, 2008. [57] S. Brave and C. Nass, “Emotion in human-computer interaction,” The human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, pp. 81–96, 2003. [58] K. R. Scherer, T. B¨anziger, and E. Roesch, A Blueprint for Affective Computing: A sourcebook and manual. Oxford University Press, 2010. [59] D. Gokcay, G. Yildirim, and I. Global, Affective computing and interaction: Psychological, cognitive, and neuroscientific perspectives. Information Science Reference, 2011. [60] R. A. Calvo and S. K. D’Mello, New perspectives on affect and learning technologies. Springer, 2011, vol. 3. [61] J. A. Russell, “Core affect and the psychological construction of emotion.” Psychological review, vol. 110, no. 1, p. 145, 2003. [62] J. A. Russell, J.-A. Bachorowski, and J.-M. Fern´andez-Dols, “Facial and vocal expressions of emotion,” Annual review of psychology, vol. 54, no. 1, pp. 329–349, 2003. [63] L. F. Barrett, “Are emotions natural kinds?” Perspectives on psychological science, vol. 1, no. 1, pp. 28–58, 2006. [64] L. F. Barrett, B. Mesquita, K. N. Ochsner, and J. J. Gross, “The experience of emotion,” Annual review of psychology, vol. 58, p. 373, 2007. [65] T. Dalgleish, B. D. Dunn, and D. Mobbs, “Affective neuroscience: Past, present, and future,” Emotion Review, vol. 1, no. 4, pp. 355–368, 2009. [66] T. Dalgleish, M. J. Power, and J. Wiley, Handbook of cognition and emotion. Wiley Online Library, 1999. [67] M. Lewis, J. M. Haviland-Jones, and L. F. Barrett, Handbook of emotions. Guilford Press, 2010. [68] R. J. Davidson, K. R. Scherer, and H. Goldsmith, Handbook of affective sciences. Oxford University Press, 2003. [69] M. Minsky, The Society of Mind. 
New York, NY, USA: Simon & Schuster, Inc., 1986. [70] S. DMello, R. Picard, and A. Graesser, “Towards an affect-sensitive autotutor,” IEEE Intelligent Systems, vol. 22, no. 4, pp. 53–61, 2007. [71] M. S. Bartlett, G. Littlewort, B. Braathen, T. J. Sejnowski, and J. R. Movellan, “A prototype for automatic recognition of spontaneous facial actions,” in Advances in neural information processing systems. MIT; 1998, 2003, pp. 1295–1302. [72] J. F. Cohn, L. I. Reed, Z. Ambadar, J. Xiao, and T. Moriyama, “Automatic analysis and recognition of brow actions and head motion in spontaneous facial behavior,” in Systems, Man and Cybernetics, 2004 IEEE International Conference on, vol. 1. IEEE, 2004, pp. 610–616. [73] A. Kapoor, W. Burleson, and R. W. Picard, “Automatic prediction of frustration,” International Journal of Human-Computer Studies, vol. 65, no. 8, pp. 724–736, 2007. [74] G. C. Littlewort, M. S. Bartlett, and K. Lee, “Faces of pain: automated measurement of spontaneousallfacial expressions of genuine and posed pain,” in Proceedings of the 9th international conference on Multimodal interfaces. ACM, 2007, pp. 15–21. [75] I. Arroyo, D. G. Cooper, W. Burleson, B. P. Woolf, K. Muldner, and R. Christopherson, “Emotion sensors go to school.” in AIED, vol. 200, 2009, pp. 17–24. [76] L. Devillers, L. Vidrascu, and L. Lamel, “Challenges in real-life emotion annotation and machine learning based detection,” Neural Networks, vol. 18, no. 4, pp. 407–422, 2005. [77] L. Devillers and L. Vidrascu, “Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs.” in Interspeech, 2006. [78] B. Schuller, J. Stadermann, and G. Rigoll, “Affect-robust speech recognition by dynamic emotional adaptation,” in Proc. speech prosody, 2006. [79] D. J. Litman and K. Forbes-Riley, “Predicting student emotions in computer-human tutoring dialogues,” in Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2004, p. 351. [80] B. Schuller, R. J. Villar, G. Rigoll, M. Lang et al., “Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition,” in Proc. ICASSP, vol. 5, 2005, pp. 325–328. [81] R. Fernandez and R. W. Picard, “Modeling drivers speech under stress,” Speech Communication, vol. 40, no. 1, pp. 145–159, 2003. [82] S. Mota and R. W. Picard, “Automated posture analysis for detecting learner’s interest level,” in Computer Vision and Pattern Recognition Workshop, 2003. CVPRW’03. Conference on, vol. 5. IEEE, 2003, pp. 49–49. 17 [83] S. D’Mello and A. Graesser, “Automatic detection of learner’s affect from gross body language,” Applied Artificial Intelligence, vol. 23, no. 2, pp. 123–150, 2009. [84] N. Bianchi-Berthouze and C. L. Lisetti, “Modeling multimodal expression of users affective subjective experience,” User Modeling and User-Adapted Interaction, vol. 12, no. 1, pp. 49–84, 2002. [85] G. Castellano, M. Mortillaro, A. Camurri, G. Volpe, and K. Scherer, “Automated analysis of body movement in emotionally expressive piano performances,” 2008. [86] J. Hernandez, P. Paredes, A. Roseway, and M. Czerwinski, “Under pressure: sensing stress of computer users,” in Proceedings of the 32nd annual ACM conference on Human factors in computing systems. ACM, 2014, pp. 51–60. [87] C. Epp, M. Lippold, and R. L. Mandryk, “Identifying emotional states using keystroke dynamics,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2011, pp. 715–724. [88] I. Karunaratne, A. 
S. Atukorale, and H. Perera, “Surveillance of humancomputer interactions: A way forward to detection of users’ psychological distress,” in Humanities, Science and Engineering (CHUSER), 2011 IEEE Colloquium on. IEEE, 2011, pp. 491–496. [89] P. Khanna and M. Sasikumar, “Recognising emotions from keyboard stroke pattern,” International Journal of Computer Applications, vol. 11, no. 9, pp. 8975–8887, 2010. [90] F. Monrose and A. D. Rubin, “Keystroke dynamics as a biometric for authentication,” Future Generation computer systems, vol. 16, no. 4, pp. 351–359, 2000. [91] L. M. Vizer, L. Zhou, and A. Sears, “Automated stress detection using keystroke and linguistic features: An exploratory study,” International Journal of Human-Computer Studies, vol. 67, no. 10, pp. 870–886, 2009. [92] E. Niedermeyer and F. L. da Silva, Electroencephalography: basic principles, clinical applications, and related fields. Lippincott Williams & Wilkins, 2005. [93] O. AlZoubi, R. A. Calvo, and R. H. Stevens, “Classification of eeg for affect recognition: an adaptive approach,” in AI 2009: Advances in Artificial Intelligence. Springer, 2009, pp. 52–61. [94] A. Heraz and C. Frasson, “Predicting the three major dimensions of thelearner-s emotions from brainwaves.” [95] A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, “A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals,” Journal of Neural engineering, vol. 4, no. 2, p. R32, 2007. [96] F. Lotte, M. Congedo, A. L´ecuyer, F. Lamarche, B. Arnaldi et al., “A review of classification algorithms for eeg-based brain–computer interfaces,” Journal of neural engineering, vol. 4, 2007. [97] C. O. Alm, D. Roth, and R. Sproat, “Emotions from text: machine learning for text-based emotion prediction,” in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005, pp. 579–586. [98] T. Danisman and A. Alpkocak, “Feeler: Emotion classification of text using vector space model,” in AISB 2008 Convention Communication, Interaction and Social Intelligence, vol. 1, 2008, p. 53. [99] C. Strapparava and R. Mihalcea, “Learning to identify emotions in text,” in Proceedings of the 2008 ACM symposium on Applied computing. ACM, 2008, pp. 1556–1560. [100] C. Ma, H. Prendinger, and M. Ishizuka, “A chat system based on emotion estimation from text and embodied conversational messengers,” in Entertainment Computing-ICEC 2005. Springer, 2005, pp. 535–538. [101] P. H. Dietz, B. Eidelson, J. Westhues, and S. Bathiche, “A practical pressure sensitive computer keyboard,” in Proceedings of the 22nd annual ACM symposium on User interface software and technology. ACM, 2009, pp. 55–58. [102] I. J. Roseman, M. S. Spindel, and P. E. Jose, “Appraisals of emotioneliciting events: Testing a theory of discrete emotions.” Journal of Personality and Social Psychology, vol. 59, no. 5, p. 899, 1990. [103] I. J. Roseman, “Cognitive determinants of emotion: A structural theory.” Review of Personality & Social Psychology, 1984. [104] “Smartphone,” http://en.wikipedia.org/wiki/Smartphone. [105] “Natural usage of smartphone,” http: //www.marketingcharts.com/wp/online/ 8-in-10-smart-device-owners-use-them-all-the-time-on-vacation-28979/. [106] K. Dolui, S. Mukherjee, and S. K. Datta, “Smart device sensing architectures and applications,” in Computer Science and Engineering Conference (ICSEC), 2013 International. IEEE, 2013, pp. 91–96. [107] M.-Z. Poh, D. J. 
McDuff, and R. W. Picard, “Advancements in noncontact, multiparameter physiological measurements using a webcam,” Biomedical Engineering, IEEE Transactions on, vol. 58, no. 1, pp. 7– 11, 2011. [108] J. Allen, “Photoplethysmography and its application in clinical physiological measurement,” Physiological measurement, vol. 28, no. 3, p. R1, 2007. [109] P. Zimmermann, S. Guttormsen, B. Danuser, and P. Gomez, “Affective computing–a rationale for measuring mood with mouse and keyboard,” International journal of occupational safety and ergonomics, vol. 9, no. 4, pp. 539–551, 2003. [110] N. Villar, S. Izadi, D. Rosenfeld, H. Benko, J. Helmes, J. Westhues, S. Hodges, E. Ofek, A. Butler, X. Cao et al., “Mouse 2.0: multi-touch meets the mouse,” in Proceedings of the 22nd annual ACM symposium on User interface software and technology. ACM, 2009, pp. 33–42. [111] Y. Gao, N. Bianchi-Berthouze, and H. Meng, “What does touch tell us about emotions in touchscreen-based gameplay?” ACM Transactions on Computer-Human Interaction (TOCHI), vol. 19, no. 4, p. 31, 2012. [112] “Transforming the impossible to the natural,” https://www.youtube. com/watch?v=3jmmIc9GvdY. [113] H.-J. Kim and Y. S. Choi, “Exploring emotional preference for smartphone applications,” in Consumer Communications and Networking Conference (CCNC), 2012 IEEE. IEEE, 2012, pp. 245–249. [114] G. Tan, H. Jiang, S. Zhang, Z. Yin, and A.-M. Kermarrec, “Connectivity-based and anchor-free localization in large-scale 2d/3d sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 10, no. 1, p. 6, 2013. [115] M. J. Hertenstein, R. Holmes, M. McCullough, and D. Keltner, “The communication of emotion via touch.” Emotion, vol. 9, no. 4, p. 566, 2009. [116] G. Chittaranjan, J. Blom, and D. Gatica-Perez, “Who’s who with bigfive: Analyzing and classifying personality traits with smartphones,” in Wearable Computers (ISWC), 2011 15th Annual International Symposium on. IEEE, 2011, pp. 29–36. [117] J. M. George, “State or trait: Effects of positive mood on prosocial behaviors at work.” Journal of Applied Psychology, vol. 76, no. 2, p. 299, 1991. [118] N. H. Frijda, “The place of appraisal in emotion,” Cognition & Emotion, vol. 7, no. 3-4, pp. 357–387, 1993. [119] P. E. Ekman and R. J. Davidson, The nature of emotion: Fundamental questions. Oxford University Press, 1994. [120] A. Jain and D. Zongker, “Feature selection: Evaluation, application, and small sample performance,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, no. 2, pp. 153–158, 1997. [121] T. Bentley, L. Johnston, and K. von Baggo, “Evaluation using cuedrecall debrief to elicit information about a user’s affective experiences,” in Proceedings of the 17th Australia conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future. Computer-Human Interaction Special Interest Group (CHISIG) of Australia, 2005, pp. 1–10. [122] H. Lee, Y. S. Choi, S. Lee, and I. Park, “Towards unobtrusive emotion recognition for affective social communication,” in Consumer Communications and Networking Conference (CCNC), 2012 IEEE. IEEE, 2012, pp. 260–264. [123] P. Ekman, “Universals and cultural differences in facial expressions of emotion.” in Nebraska symposium on motivation. University of Nebraska Press, 1971. [124] “Weka 3,” http://www.cs.waikato.ac.nz/ml/weka/. [125] R. Tato, R. Santos, R. Kompe, and J. M. Pardo, “Emotional space improves emotion recognition.” in INTERSPEECH, 2002. [126] M. K. Petersen, C. Stahlhut, A. Stopczynski, J. E. 
Larsen, and L. K. Hansen, “Smartphones get emotional: mind reading images and reconstructing the neural sources,” in Affective Computing and Intelligent Interaction. Springer, 2011, pp. 578–587. [127] E. Douglas-Cowie, N. Campbell, R. Cowie, and P. Roach, “Emotional speech: Towards a new generation of databases,” Speech communication, vol. 40, no. 1, pp. 33–60, 2003. [128] A. J. O’Toole, J. Harms, S. L. Snow, D. R. Hurst, M. R. Pappas, J. H. Ayyad, and H. Abdi, “A video database of moving faces and people,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 5, pp. 812–816, 2005. [129] N. Sebe, M. S. Lew, Y. Sun, I. Cohen, T. Gevers, and T. S. Huang, “Authentic facial expression analysis,” Image and Vision Computing, vol. 25, no. 12, pp. 1856–1863, 2007. [130] R. LiKamWa, Y. Liu, N. D. Lane, and L. Zhong, “Can your smartphone infer your mood?” in Proc. PhoneSense Workshop, 2011. 18 [131] T. H¨ollerer and S. Feiner, “Mobile augmented reality,” Telegeoinformatics: Location-Based Computing and Services. Taylor and Francis Books Ltd., London, UK, vol. 21, 2004. [132] Z. Huang, P. Hui, C. Peylo, and D. Chatzopoulos, “Mobile augmented reality survey: a bottom-up approach.” CoRR, vol. abs/1309.4413, 2013. [Online]. Available: http://dblp.uni-trier.de/db/ journals/corr/corr1309.html#HuangHPC13