Metallurgical and Materials Engineering, Research Paper

Jeevani Singireddy (1), Botlagunta Preethish Nandan (2), Phanish Lakkarasu (3), Venkata Narasareddy Annapareddy (4), Jai Kiran Reddy Burugulla (5)

(1) Sr. Software Engineer, Intuit Inc, jeevanisingreddy@gmail.com
(2) SAP Delivery Analytics, ASML, preethishnananbotlagunta@gmail.com
(3) Senior Site Reliability Engineer, Qualys, phanishlakarasu@gmail.com
(4) Sr. Enterprise Application Developer, venkata.narsreddy@gmail.com
(5) Senior Engineer, jaikirrann@gmail.com

Abstract: As artificial agents develop beyond mere tools and begin to perform roles traditionally associated with humans, expectations of their performance are evolving in step. Not only must agents be able to accomplish their tasks; they must also do so in a manner that observers consider socially or contextually appropriate. In social interactions where the agent and human are co-performers, adherence to social cues that signal emergent aspects of a relationship, such as intimacy or status, is paramount to the experience of the interacting humans. For autonomous agents that function alone, adaptive behavioral modeling and user-state awareness are critical to the impact of the agent's actions on humans. Such contextual social behavior is a requirement for complex applications including physically embodied social robots; virtual avatars in gaming, online social environments, and customer service; and proactive virtual assistants. Humans have sophisticated socio-emotional capacities that enable them to behaviorally coordinate their interactions with others, inferring mental states that may lie far beyond explicit observable cues. Furthermore, emotional expressions are multimodal and result from a complex interaction between inherent affective states and the interaction context. The Human-Centered Intelligent Systems conceptual framework describes a pathway whereby artificial agents may also achieve aspects of this intelligence through rich user-state modeling based on deep multimodal analysis of big data that captures social behavior and interaction context. In this chapter, we describe this user-state modeling approach and exemplify its applicability to a spectrum of agent applications.

Keywords: Human-Centered Intelligent Systems, User-State Modeling, Socially Adaptive Agents, Socio-Emotional Intelligence, Multimodal Interaction Analysis, Artificial Social Agents, Autonomous Agent Behavior, Context-Aware Intelligence, Human-Agent Co-Performance, Virtual Avatars, Social Robots, Emotional Expression Recognition, Interaction Context Awareness, Affective Computing, Deep Behavioral Modeling, Adaptive Virtual Assistants, Online Social Environments, Gaming AI, Customer Service Bots, Mental State Inference.
The advent of the Fourth Industrial Revolution has ushered in an era of unprecedented investment in and reliance on intelligent automated systems. Artificial agents are omnipresent in everyday life, from
household appliances featuring real-time processing and control of autonomous functions, to service robots employed in hostile environments, to smart software agents affecting investment patterns in volatile financial markets, to cognitive computing systems in fields as diverse as commerce, healthcare, governance, and education. These machines, however, lack Emotional Intelligence (EI): the ability to process cognitive, affective, and motivational signals in human interactions, to interpret sentiment and empathy, and to provide emotionally informed assessments for modeling social behaviors and intentions. As a consequence, while they have achieved high-level intelligent functions, they remain unable to interact with human beings in a socially and emotionally sensitive manner. These shortcomings can lead to poor relationships with human partners and may prevent artificial agents from reaching the level of coordinated, supportive performance currently available almost exclusively in human-to-human contexts, where it correlates with the quality of interpersonal empathy. To bridge this gap, we formulate and present a computational model of EI in artificial agents, extending the concept of Deep Multimodal Big Data (DMB2D) beyond raw processing toward a contextual understanding of ongoing human activity.
Fig 1: Prediction of Emotional Empathy in Intelligent Agents
The objective of this chapter is to contextualize and operationalize the key terms, the issues at stake, and the motivations behind the undertaking, and to provide a roadmap of how the chapter unfolds. More specifically, Section 2 provides an overview of the study's purpose and scope. Section 3 defines DMB2D and reviews related topics in big data, particularly its multimodal nature. Section 4 reviews the limitations of previous systems and develops the theoretical, technological, and practical implications of constructing a DMB2D-enabled, EI-aware architecture for the deep understanding and contextualization of human-to-human interactions of social and business importance. Section 5 opens up research challenges based on the limitations of current solutions, while Section 6 provides practical conclusions.

1.1. Overview of the Study's Purpose and Scope

Many professions, and everyday life in general, call for the recognition and appropriate processing of emotional and socially relevant signals. Human beings possess the unique ability to communicate through facial expressions, bodily posture, gestures, voice tone, social context, and cultural artifacts. These nonverbal channels, produced consciously and/or unconsciously, convey declarative meanings, signal interpersonal relations, and facilitate spontaneous, socially desirable learning processes, such as in teaching or caregiving settings with children and the elderly, as well as information processing from a multitude of artifacts. However, these functions are performed far better in human-to-human settings than in human-to-device settings. Existing devices are not yet emotionally intelligent enough to reliably model behaviors, infer mutual expectations, respond accordingly, and consequently assist in teaching, tutoring, consoling, entertaining, or monitoring roles. On the one hand, there is a real need for devices to become more emotionally intelligent; on the other hand, improving devices' capacities in these essential supportive functions is a major challenge for investigators in robotics, social signal processing, and nonverbal communication. How to accomplish this goal in a scientific, realistic, and affordable way is the task of the research developed in this chapter. The undertaking is multi-disciplinary, in the sense that devices cannot become more emotionally intelligent without a substantial effort to gather evidence from parallel fields. Additional requirements for success are the proper selection of modalities, the contextualization of the social scenario considered, and prior expectations of the participants' roles. Thus,
fed by a pipeline of multi-modal big data drawn from wide-ranging data generation and exploration, this effort may eventually lead to the development of artificial agents endowed with emotional intelligence capabilities.
Equation 1: Multimodal Emotion Inference Function (MEIF):
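The equation body did not survive extraction. As a hedged sketch (our notation, not necessarily the authors'), a MEIF consistent with the multimodal fusion pipeline described later in the chapter might take the form:

\[
\hat{e} \;=\; \mathrm{MEIF}(x_v, x_a, x_t, c) \;=\; \operatorname{softmax}\big(W\,[\phi_v(x_v);\; \phi_a(x_a);\; \phi_t(x_t);\; \psi(c)] + b\big),
\]

where \(x_v, x_a, x_t\) are the visual, audio, and text observations, \(c\) is the interaction context, \(\phi_v, \phi_a, \phi_t\) and \(\psi\) are modality- and context-specific encoders, \([\cdot\,;\cdot]\) denotes concatenation, and \(\hat{e}\) is the inferred distribution over emotion categories.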
2.1. Definition and Components

Emotional intelligence (EI), commonly denoted as the emotional quotient (EQ), can be defined as how well we integrate feelings and thoughts, how well we relate to our own motivation and that of others, and how well we deal with other people's emotions and manage social interactions. The term emotional intelligence was first introduced by American psychologists in 1990, was popularized by a journalist in 1995, and was subsequently promoted through bestselling books. Since then, the definition of EI, and consequently its components, has received significant attention, which has clarified the concept while making it less homogeneous across theorists' formulations. Although early work attempted to portray EI as a type of intelligence, close to the cognitive school's conceptualization, a growing body of psychological literature has since treated it as a personality trait. This shift opened the possibility of a multidimensional construct associated with several evolving social and psychological capacities, rather than a single unitary measurement. The ensuing particularization proposed several facets or dimensions, which are measured and quantified differently across the academic panorama. Both lines follow several similar directions of investigation. The cognitive line is organized around the Perceived, Ability, and Mixed modalities and focuses on academic performance, cognitive tasks, and planning, while the Mixed line addresses performance in the workplace and external issues such as stress and mental health. All this discussion
has resulted in the creation of several popular commercial and academic psychometric tests, allowing empirical validation in the area of study.

2.2. Importance in Human Interaction

Today, people communicate not only through words. In most exchanges, non-verbal behavior is vital to fully understanding messages and meanings. Non-verbal communication, which encompasses explicit and implicit body language, facial expression, gestures, and para-verbal channels including posture, eye gaze, proximity, touch, and voice quality, supports or undermines verbal communication because it conveys important cues. Non-verbal signals can reveal the core intentions of a conversation: whether it is a signal of urgency or a call for support, whether it is a moment of learning or a moment of teaching, we depend on the non-verbal cues provided in the interaction. Receiving and expressing the appropriate non-verbal signals is as important as the words themselves. Non-verbal communication is also crucially dependent on the context of the interaction. The context defines the interaction itself, the roles we assume within it, and the participants who join it, as well as what triggers it. It is important to establish and maintain the channel of implicit communication that accompanies the verbal message. Most of the messages exchanged during a conversation are expressed and received as implicit signals of emotion rather than as factual content: compare the words used with a patient during surgery with those used at a friendly, polished press conference. Understanding and expressing emotional intent that is not overtly conveyed through words is needed not only in conversation but also when observing a person's behavior as they interact with others in their environment. Anticipating another person's emotional response, to some extent even before the interaction has started, by drawing on an understanding of social roles, interactions, and interaction contexts, can also allow us to prepare for the interaction. It can be argued that the ability to understand and appropriately react to the full complexity of human emotions should be a must-have for any AI agent involved in human-computer interaction. Sociability in human-agent interaction, however, is not limited to the main interaction itself. Richly and accurately modeling humans within the contexts of our daily lives will allow more efficient integration of intelligent technology into those lives. Making everyday intelligent technology aware of the information that is relevant to the user, and proactively supporting decision-making or scheduling tasks, will allow for better interactions and open up new possibilities for both users and agents.
Such deficits currently challenge full semantic and affective understanding capabilities, as well as intelligent agents' adaptive learning abilities.
Fig 2: Artificial Agents and Emotional Intelligence

3.1. Current State of AI Emotional Intelligence

Research in Artificial Intelligence (AI), which aims to build machines that can display intelligent, human-like behavior, has experienced unprecedented interest and growth in recent years. Many tools capable of solving complex decision problems are now available. Current systems, however, do not address general problems; they specialize in narrow issues. One sector that has progressed rapidly is the imitation of human intelligence: machines have been built that can imitate, albeit in a more limited way, tasks that previously only humans could perform. More recently, very deep neural networks combined with large data banks for training have conquered harder tasks that were previously manual, such as facial or speech recognition. With these improvements, it is believed that machines will shortly be able to imitate humans in a more generalized way. But these achievements remain narrow: technical capacity has made machines ever more expert at doing ever less. Given this specializing movement, it is believed that Artificial Intelligence will be responsible for synergistic tools. Our perspective is that machines will be responsible for augmenting human action, interspersed in the conduct of tasks that require logical-rational reasoning, equipped with tools and algorithms capable of being far more competent at solving these tasks than we humans are. In this context, the current study aims to propose and review algorithms that could allow artificial agents to display behaviors related to emotional competence, as an essential component of this synergistic process. To that end, this chapter presents a brief review of the concept of Emotional Intelligence, what characterizes it, and what functions it fulfills in interpersonal relationships, which together form the basis of our proposition. Subsequently, we examine, in the context of neurological and computational models, the main pillars of the emotional signaling displayed by human beings: the perception and recognition of non-verbal signals associated with emotional messaging; the how and the what of the internal processes that trigger action; and, finally, the display of non-verbal behavioral characteristics and their role in the communicative function of emotional indicators. In the last part of the chapter, we present current neural models and the preliminary results and obstacles to be faced in implementing such systems.

3.2. Challenges in Implementation

Throughout history, people have attempted to create intelligent artifacts, and nowadays many researchers are dedicated to creating AI agents able to interact socially with humans through carefully designed conversational behaviors. The application of conversational artificial intelligence to robots, virtual avatars, automatic classifiers of user emotion or sentiment, and even our mobile devices is a technological reality. However, most work on conversational AI uses very simple and shallow behavioral models. Despite recent works showing great progress, research in this domain is still in its infancy.
This chapter aims to introduce the concept of emotional intelligence applied to interaction with conversational AI, and the implementation of artificial agents built on deep multimodal big data that learn online and adapt to the user during the interaction.
A crucial issue is that developing agents able to emulate apparent human social and emotional intelligence is a very difficult task, and truly believable conversational artificial agents are still scarce. The challenges in creating them lie in understanding the meaning the user wants to convey, including emotions and latent states, and in responding to the user in a congruent way. We must take into consideration that humans judge the emotional state of another person from a set of nonverbal cues: gestures, postures, facial expressions, body movements, head or body orientation, eye gaze, and biological signals, together with verbal cues such as the content of speech, linguistic style, and modulation. Many issues remain in affect and emotional state recognition, such as recognizing with high accuracy the affective state of a user or a group in a multimodal way through speech, gaze, and facial or body movements; detecting and classifying individual and group attributes; and interpreting conflicting information from different modalities.
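As an illustration of the last point, conflicting modalities are often handled with late-fusion heuristics. The following sketch is our own illustration, not a method from the chapter; the weighting scheme (agreement with the consensus, via inverse KL divergence) and the emotion set are assumptions:

```python
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]

def fuse(predictions: dict) -> np.ndarray:
    """predictions maps modality name -> probability vector over EMOTIONS."""
    probs = np.stack(list(predictions.values()))   # shape (modalities, emotions)
    consensus = probs.mean(axis=0)                 # simple average as reference
    # Weight each modality by its agreement with the consensus
    # (inverse KL divergence, a common heuristic choice).
    eps = 1e-9
    kl = (probs * np.log((probs + eps) / (consensus + eps))).sum(axis=1)
    weights = 1.0 / (1.0 + kl)
    weights /= weights.sum()
    fused = (weights[:, None] * probs).sum(axis=0)
    return fused / fused.sum()

face  = np.array([0.1, 0.7, 0.1, 0.1])   # smiling face
voice = np.array([0.2, 0.1, 0.1, 0.6])   # tense voice: the conflicting cue
text  = np.array([0.2, 0.5, 0.2, 0.1])
fused = fuse({"face": face, "voice": voice, "text": text})
print(dict(zip(EMOTIONS, fused.round(3))))
```

The divergent vocal channel is down-weighted rather than discarded, which is one plausible way to soften, though not resolve, cross-modal conflict.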
and the issue of rapport. These interactive emotional signals are usually provided in multimodal form, in the vocal channel and, when available, in the visual channel. In addition to emotional expression, the development of empathetic and socially engaged artificial agents requires an extensive repertoire of behaviors appropriate for varying situations, applications, and users. Such agents also need to adjust their communicative behaviors accordingly, by modeling users' individual traits, their current emotional state, their type of intent, and their needs for autonomy, social connection, and so on. Such functions may be described as contextual adaptive multimodal behavior modeling.

4.2. Sources of Multimodal Data

The first part of our definition of deep multimodal big data suggests that an agent must be exposed to the real world under diverse and challenging contexts for the accumulated data to realize the goals of its design. High-level emotional, cognitive, and social functions require the agent to face complex multimodal data, at least during its learning phase, much as human infants do. In practice, formalizing rich stimulus-response relations within naturalistic stakeholder-agent interaction exploits the efficient decomposition of the essential primary sensory systems. This data integration can harness either direct social signal transfer toward the information-seeking agent or social stimulus elicitation, in which the social behavioral responses of other participants in the setting are modulated by active interaction cues from the agent. Traditional acquisition setups, such as conventional motion capture systems for behavioral information, still rely on pre-specified and controllable scenarios. Only a few data acquisition techniques can generate multimodal datasets of self-initiated social behavior, and the collected data are limited to small samples. Webcam videos capture spontaneous social responses, but action detection and synchronization are cumbersome. Such spontaneous data samples can be found serendipitously in personal video collections or within social media feeds. Social media activity has grown drastically over the past two decades: individuals share personal reactions with family and friends in a continuous loop, and external observers can infer genuine reactions even without being participants in the initial stimulus-response interaction. These reciprocity cues, together with the large amount of available multimodal social data, are at the core of social signal processing, which aims to develop technology capable of processing and analyzing the implicit messages of these stimuli and responses.

4.3. Data Processing Techniques

Deep multimodal big data originating from various sources must first be processed to produce clean input suitable for multimodal fusion. Each modality may require different pre- and post-processing. For instance, visual processing of video data may require face, hand, and body detection and recognition, object recognition, pose estimation, and tracking. Videos need synchronization with other modalities to ensure they reflect the same time and context. Research has found facial expression data preprocessing beneficial for emotion recognition with deep learning. Facial landmarks also need a sustained interval of reliable expression data to be useful in affect recognition, and eye-centered CNN models have been found effective for emotion recognition.
Audio data may require segmentation for paralinguistic and emotion-related content and filtering of acoustic data and features. For EEG signals, pre-processing steps such as band-pass filtering, artifact removal, dimensionality reduction, and normalization are usually employed, and long input sequences are often reduced using sliding windows. Features extracted from raw EEG data are usually spatial, temporal, or spectral. Machine learning classifiers are typically built in a supervised, semi-supervised, or unsupervised manner; few-shot learning on EEG data is a recently proposed technique for training CNNs when only a few subjects are available for training. Text input may require noise cleaning through lowercasing, stop-word removal, and stemming, and such preprocessing can improve the results of sentiment classifiers. Graph data may likewise require cleaning and filtering. This multimodal big data is then fused to increase accuracy and enhance performance in sentiment and emotion analysis and in decision support for different use cases.
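A minimal sketch of the EEG steps just listed (band-pass filtering, crude artifact removal, sliding windows, and normalization); the sampling rate, pass band, and rejection threshold below are assumptions for illustration, not values from the chapter:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128             # Hz, assumed sampling rate
LOW, HIGH = 1.0, 45.0  # assumed pass band in Hz

def bandpass(eeg: np.ndarray) -> np.ndarray:
    """eeg: (channels, samples). Zero-phase Butterworth band-pass filter."""
    b, a = butter(4, [LOW / (FS / 2), HIGH / (FS / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=1)

def windows(eeg: np.ndarray, win_s=2.0, hop_s=0.5):
    """Yield normalized sliding windows of shape (channels, window_samples)."""
    win, hop = int(win_s * FS), int(hop_s * FS)
    for start in range(0, eeg.shape[1] - win + 1, hop):
        w = eeg[:, start:start + win]
        if np.abs(w).max() > 100e-6:   # amplitude-based artifact rejection
            continue
        yield (w - w.mean(axis=1, keepdims=True)) / (w.std(axis=1, keepdims=True) + 1e-12)

raw = np.random.randn(32, 10 * FS) * 20e-6   # 32 channels, 10 s of synthetic data
segments = list(windows(bandpass(raw)))
print(len(segments), segments[0].shape)       # e.g. 17 windows of (32, 256)
```

The resulting windows would then feed a classifier or a deep network; spectral features (for example, band power per channel) could be computed per window at this stage.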
Fig 3: Machine learning for cognitive behavioral analysis
closely related in form to the contextual modeling of social behavior. Syntactically, at its core, Role Labeling models low-level, atomic components of the speech flow. Equation 2: Contextual Social Response Model (CSRM):
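The body of Equation 2 is likewise missing from the extracted text. A plausible hedged form, consistent with the surrounding discussion of context-conditioned response selection (our notation, not necessarily the authors'), is:

\[
r_t \;=\; \arg\max_{r \in \mathcal{R}} \; P\big(r \mid s_t,\, c_t,\, h_{t-1}\big),
\]

where \(s_t\) is the user state inferred at time \(t\), \(c_t\) the social context, \(h_{t-1}\) the interaction history, and \(\mathcal{R}\) the agent's repertoire of communicative behaviors.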
5.2. Role of Context in Emotional Intelligence

The ability of emotionally intelligent humans to accurately read the mental states of others in a social context (who is present, what they are doing, and why they are doing it) is often seen as a factor supporting the validity of social signal decoding. Context not only provides an additional stream of information for the human perception of others' cues but is also important for inference and for developing cognitive models of why others are presenting the behavior being observed. For example, whereas a given degree of attention could indicate attraction, a person paying little attention could have a completely different reason for that behavior, such as admiring from a distance; in many cases, therefore, there are no clear-cut causal links. Moreover, the emotional state of an agent depends largely on his or her environment, which involves external stimuli such as objects in the environment as well as the presence and actions of other agents. A cognitive model of an emotional agent would try to infer the reason for a behavior from the context made up of these environmental aspects. Given the importance of social context representation in social signal processing, it is surprising that there has been little to no mention of context in theoretical models of emotional intelligence. The great majority of the work on modeling specific social signals for emotional intelligence, such as belief-desire-intention models of other people's cognitions and intentions, person perception for detecting an agent's social category, social gaze, emotional expression, and the recognition and fusion of mood, prosody, and speech content, is done in an isolated stimulus-response manner with little regard for knowledge of the social context. This stands in stark contrast to the role of context in human ability and behavior, which is largely facilitated by emotional intelligence; salient support comes from the literature on role, motivational, and social relations, and on spatial and temporal context in ordinary human-human interaction. At the same time, the observed qualities of emotionally intelligent humans, together with the accompanying issues of cue validity and multimodal fusion, suggest that modeling emotional content jointly with its context, and decoding additional latent context variables concurrently with emotional content, can help support the further development of emotional intelligence in artificial agents.
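To make the point concrete, the following toy sketch (all probability tables are invented for illustration) shows how the same cue, sustained gaze, yields different mental-state inferences once conditioned on context:

```python
# Context-conditioned Bayesian inference over mental states.
# All likelihoods and priors below are assumptions for illustration.

STATES = ["attraction", "hostility", "evaluation"]

# P(sustained_gaze | state, context)
LIKELIHOOD = {
    "first_date":    {"attraction": 0.7, "hostility": 0.1, "evaluation": 0.2},
    "job_interview": {"attraction": 0.1, "hostility": 0.1, "evaluation": 0.8},
}
PRIOR = {s: 1 / 3 for s in STATES}   # uniform prior over mental states

def infer(context: str) -> dict:
    """Posterior over mental states given the gaze cue in a given context."""
    post = {s: LIKELIHOOD[context][s] * PRIOR[s] for s in STATES}
    z = sum(post.values())
    return {s: round(p / z, 3) for s, p in post.items()}

print(infer("first_date"))      # the same gaze reads mostly as attraction
print(infer("job_interview"))   # ... and mostly as evaluation here
```

The cue itself is identical in both calls; only the context term changes the inference, which is precisely the dependency that stimulus-response models without context cannot capture.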
Adaptive behavioral modeling is important for the development of intelligent systems capable of higher cognitive functions across many domains involving multilayered human-system interaction. For proper recognition according to the emotional and situational context, the agent should adapt to different users in different scenarios. In this chapter, we will elaborate on the principles and methods of adaptive behavioral modeling.
from internal and external factors or systems of association. Individuals with adaptive difficulties struggle to develop independence and to lead an independent life.
6.1. Concepts of Adaptive Behavior

Adaptive behaviors are often observed in social beings interacting in a variety of contexts. The common concept relies on the property that the same agent may not adopt the same response under equal conditions. In the simplest version, an agent develops relative preferences for different reward channels and policies so as to maximize its future utility. A degree of randomness, exploratory behavior, and, more generally, non-stationarity is also observed in nature, typically in the initial phase of training or exposure to new knowledge; this promotes learning by enhancing the variability of performance. Some of the mechanisms underlying adaptation and learning rely on evolutionary principles, especially in robotics. In all the identified definitions, adaptation is closely related to experiential learning or to a form of social learning. Bringing together these notions of behavioral adaptation, the first important feature is that adaptation implies a dynamic modification of the behavioral patterns adopted by the agent during interaction. Such changes may be abrupt or very limited in extent, both in time and in factor space. Modeling the time course of adaptation, and mapping the input (the stimulus perceived by the agent, or some of its internal indicators) onto the variation of behavioral characteristics, are perhaps the main open issues in adaptive behavior. Adaptive behaviors allow one to use time as a dimension for analysis, prediction, and modeling in complex tasks, especially those performed by computational agents.
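A minimal sketch of these ideas in reinforcement-learning terms: an assumed epsilon-greedy agent with a constant step size, which is one standard way to keep tracking non-stationary rewards. None of the parameters or the toy environment come from the chapter:

```python
import random

class AdaptiveAgent:
    """Keeps running preference estimates for several reward channels and
    mixes exploitation with exploratory randomness."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, step: float = 0.1):
        self.q = [0.0] * n_actions   # estimated value per behavior
        self.epsilon = epsilon       # exploration rate (the "randomness")
        self.step = step             # constant step size -> recency weighting

    def act(self) -> int:
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))                 # explore
        return max(range(len(self.q)), key=self.q.__getitem__)   # exploit

    def learn(self, action: int, reward: float) -> None:
        self.q[action] += self.step * (reward - self.q[action])

# Toy environment whose best behavior changes halfway through (non-stationarity).
agent = AdaptiveAgent(n_actions=3)
for t in range(2000):
    best = 0 if t < 1000 else 2
    a = agent.act()
    agent.learn(a, 1.0 if a == best else 0.0)
print([round(v, 2) for v in agent.q])   # preference has shifted toward action 2
```

The constant step size realizes the "dynamical modification of behavioral patterns" described above: older experience is exponentially forgotten, so the agent's preferences track the change in the environment.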
Fig 4: Adaptive Learning in AI Agents

6.2. Models of Behavioral Adaptation

The concept of an emotional agent capable of simulating human emotions, reusing the behavioral architecture of an intelligent agent, was proposed long ago, but it has recently received increasing interest in the artificial intelligence, robotics, computer graphics, sociology, and psychology communities. Different real-time behavioral and perception architectures have previously been described to support autonomous virtual character behaviors within a specific emotional model. We attempt to associate an influence function with existing intelligent agent models, all of which rely on psychological and social influence principles long established in the psychology research community. Such an emotional influence function would enable aspects of behavioral adaptation in social interactions in which an agent could join or leave certain social groups and also change its behavior. In analyzing what is at stake in the behavioral modeling of an artifact, we will attempt to evaluate to what extent we can, or should, reproduce the behavioral attributes of humans endowed with emotional states. The complexity and dynamics of the real world are sometimes brutally and uncontrollably obvious. Artifacts whose appearance can elicit some kind of human emotion, and whose functionality can partially resemble human behavior and social interaction, are frequently encountered; one can readily feel the difference between an animated toy and a robotic humanoid in the company of disabled children, the elderly, or even marketers. How should ethical models of social
behavior adaptation of artificial agents be applied to artifacts that can be part of our society or social networks and how useful or acceptable would such social and individual user behaviors be?
Deep learning technologies represent a new paradigm in algorithmic development for artificial intelligence because they can tackle extremely difficult problems, such as classifying images or recognizing voice patterns, without requiring expert developers to hand-code algorithms or feature extractors. Deep neural networks learn features automatically from a set of raw data and corresponding labels; they rely solely on the given labels to optimize the parameters of both the feature extractors and the classifier. Further, by using multiple layers of simple feature extractors and optimizing feature extraction and classification end to end, deep neural networks can model the hierarchical layers of features present in complex data such as images. In these networks, the input to the first layer is the raw data, such as the pixel intensities of images or the spectral energy of audio signals. The first layer learns a simple feature of the input, such as the presence of a certain pattern of pixel intensities or spectral energy. The output of this first layer is then treated as the input to a second layer of neurons, which learns a more complex feature, such as a combination of the patterns identified by the first layer. The output of the second layer, in turn, is the input to a third layer that learns an even more complex feature, and so on, up to the final classification layer.

7.1. Neural Networks for Emotion Recognition

The processes of information acquisition, processing, and storage in the human brain work in a parallel and associative manner. We can take advantage of these mechanisms by combining data from different modalities to improve the quality of emotion, affect, and mood detection and analysis. Massive multimodal data collections, enriched with social context and information, can now be gathered through multiple systems and platforms that capture the collective human experience in real time. Various neural network structures responsible for the feedforward hierarchical representation of data are being developed that can automatically discover complex feature representations in a relatively unsupervised manner. These can be successfully utilized for detecting and analyzing user emotions, modeling dynamic relationships, and creating complex user profiles containing temporal trajectories of mood and affect. Because neural networks are inspired by the biological structure of the human brain, and were initially developed to mimic human cognitive functions, many people wrongly believe that they can only be used with large databases. Over the past decades, tremendous computing power has been assembled, allowing us to train very large neural networks, with many weights and deep hierarchies, on large datasets; the most representative architectures are Deep Neural Networks and Convolutional Neural Networks. It is important to note, however, that neural networks are not only useful for big datasets: architectures exist that have been used successfully on datasets of only a few hundred examples.
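To make the layer-by-layer hierarchy concrete, here is a minimal sketch (assuming PyTorch; the layer sizes and the seven-class output are illustrative assumptions, not the chapter's specification) of a network whose successive layers learn increasingly complex facial features:

```python
import torch
import torch.nn as nn

class ExpressionNet(nn.Module):
    """Toy facial-expression classifier illustrating the feature hierarchy."""

    def __init__(self, n_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # edges, blobs
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # parts (eyes, mouth)
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # configurations
        )
        self.classifier = nn.Linear(64 * 6 * 6, n_classes)

    def forward(self, x):                  # x: (batch, 1, 48, 48) grayscale faces
        h = self.features(x)
        return self.classifier(h.flatten(1))

logits = ExpressionNet()(torch.randn(4, 1, 48, 48))
print(logits.shape)   # torch.Size([4, 7])
```

The comments mark the intuition from the text: early convolutions respond to simple intensity patterns, and deeper layers combine them into progressively more abstract facial structure.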
7.2. Integration of Multimodal Inputs

One of the main advantages of deep learning approaches to multimodal emotional analysis is the possibility of deriving learned representations for all of the involved modalities and integrating them efficiently. The analysis of speech prosody and facial movement, for example, has mostly been driven by models in which multimodal integration happens relatively late in the analysis pipeline: low-level features, typically extracted from the signals of the two modalities, are modeled independently of each other before being fed into a multimodal integration stage. When neural networks are used for this, the low-level feature extraction stages have often been based on simple linear projections, since training deep extractors requires large amounts of labeled data. The independent modeling of the auditory and visual modalities has typically relied on hidden Markov models or Gaussian mixture models that transduce the low-level features over time into an emotional decision over time. The advantages of integrating multimodal emotional analysis, or at least some of its processing steps, within the same supervised learning framework have been demonstrated in various works. Deep learning models represent a good environment for this task, and their architecture provides natural mechanisms for modal integration.
In particular, modalities can be integrated at different levels of abstraction, during training and/or testing, depending on the computational resources available and the desired trade-off between training speed, model efficiency, and resulting performance. In the most basic case, two separate deep networks are used to learn the optimal representation associated with each signal modality. This can be considered the classical approach, in which only the very top layer of the architecture learns a joint latent representation of the inputs from the two modalities: the monomodal models are fed from the very beginning with suboptimal low-level features, but the top layer can learn deep representations that are not constrained to be modular.
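A minimal late-fusion sketch of this classical arrangement (assuming PyTorch; the feature dimensions, such as 40 acoustic features and 136 landmark coordinates, and the four-class output are illustrative assumptions):

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Two modality-specific branches; fusion happens only at the top layer."""

    def __init__(self, audio_dim=40, visual_dim=136, n_classes=4):
        super().__init__()
        self.audio = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU(),
                                   nn.Linear(64, 32), nn.ReLU())
        self.visual = nn.Sequential(nn.Linear(visual_dim, 64), nn.ReLU(),
                                    nn.Linear(64, 32), nn.ReLU())
        # The only point where the two modalities meet.
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, a, v):
        return self.head(torch.cat([self.audio(a), self.visual(v)], dim=-1))

net = LateFusionNet()
out = net(torch.randn(8, 40), torch.randn(8, 136))
print(out.shape)   # torch.Size([8, 4])
```

Moving the `torch.cat` deeper into the branches would yield earlier, less modular fusion, which is exactly the abstraction-level trade-off described above.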
Fig 5: Top Multimodal AI Use Cases
Systems and tools for contextually aware social interaction and for culturally and emotionally adaptive user modeling, capable of improving user experience across application domains, are becoming ubiquitous, and demand for them is steadily increasing. Although most applications of interactional AI lie in user interface design, emotionally intelligent agents can also be found in domains such as healthcare, customer service, education, and gaming and entertainment, where they fulfill functions such as therapeutic aid, sociodemographic pseudo-person, or emotional advisor. Some notable applications of virtual humans, animated characters, and social or virtual agents equipped with contextual emotional intelligence are described below by application domain.

8.1. Healthcare and Therapy

There is much ongoing work on using artificially intelligent avatars and chatbots for emotional and cognitive assessment and therapy. The need for such systems has grown sharply in recent years as many populations, particularly the young and the elderly, have been hit hard by crisis conditions. Research shows that people are more willing to share personal information, such as distressed feelings and mental trauma, with chatbots than with real-life therapists. For many patients, particularly in countries with a dearth of mental health professionals, emotionally intelligent conversational agents may be the only available support for mental health issues. Virtual humans and conversational agents are currently being applied to the diagnosis and treatment of anxiety and depression in children; the therapy is conducted in a friendly and supportive setting that enhances engagement, fidelity to the task, and confidence, and over 60% of the children assigned to agent-delivered therapy preferred the solution delivered via virtual humans. The functions and implications of behavior, and its modulation by emotionally and socially intelligent agents, may be a key factor in the development of systems for increasingly realistic, human-like, face-to-face interaction. In these future applications, virtual agents may be used as independent "virtual therapists" or as "social therapeutic tools" for modulating the behavior of human patients, such as children and elderly people with developmental conditions, phobias, autism, or Alzheimer's disease, both in the therapeutic process and during rehabilitation. We can anticipate therapeutic protocols in which virtual agents modulate the human patient's cognition, emotion, and behavior by communicating naturally through dialogues that take advantage of the flexible multimodal
communicative potentials provided by speech, gesture, mimicry, facial expression, and body motion. Such systems can offer particularly innovative solutions through the use of biofeedback devices able to monitor emotive and cognitive conditions and states in real time, guiding the dyadic interaction modalities and the influence of the agent on the human patient. Biofeedback technologies used within interactions with emotionally and socially intelligent agents can serve to match states of arousal, or to modulate "asocial" behaviors by inducing behavioral inhibition through social interaction. The first function supports communication within an intersubjective dialogical context, while the second assists during therapeutic protocols, discouraging maladaptive behaviors and reinforcing adaptive ones through real-time closed-loop biofeedback. Such systems can therefore model the socio-emotional behavior of the human participating in an interactive process with the intelligent agent and modulate the behavior of the patient engaged in therapy.

8.2. Customer Service and Support

Chatbots and virtual agents have become increasingly important in providing customer service and support. Typically deployed on websites, chatbots can handle inquiries that do not require direct contact with a human agent, e.g., providing information about business hours, prices, or product availability. However, the benefit of not requiring human resources is offset by the limited interaction quality of such agents. Emotionally intelligent agents are capable of more complex interactions: by processing multimodal signals from the user and responding both verbally and non-verbally in a more empathetic and engaging way, they can significantly increase user satisfaction. They can also provide a more personal service, as they are capable of remembering users' preferences. More and more companies are integrating so-called intelligent virtual agents into their customer care processes, which is to be expected, as such agents can absorb a relevant share of the customer service budget. Emotionally intelligent agents can assist customer service in different forms, such as embodied or video-chat avatars, 2D or 3D chatbots, and lifelike virtual assistants, and can offer services such as administration, answering frequently asked questions, and automating routine tasks. Over the past years, a growing number of companies have adopted similar services, offering personalized customer assistance on their websites. Such embodied agents process the customer's inquiries through speech recognition and natural language understanding, consult a backend holding the information needed to answer the questions, store the history of the interaction with the customer, and then produce, at least in spoken form, the necessary response, synchronizing it with emotional facial expressions and with the appropriate emotional state of the virtual character. When such an agent becomes hard to distinguish from a real person, it will be able to provide rich, human-like interactions across a variety of uses.
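The pipeline just described can be summarized in sketch form; every component below is a stub and an assumption on our part, and none of the names reflect an actual deployed system:

```python
class EmbodiedServiceAgent:
    """Toy sketch: ASR -> NLU -> backend lookup -> history -> emotionally
    synchronized response, as described in the text above."""

    def __init__(self, backend: dict):
        self.backend = backend       # knowledge needed to answer questions
        self.history = []            # per-customer interaction memory

    def handle_turn(self, audio):
        text = self.recognize_speech(audio)   # speech recognition (stub)
        intent = self.understand(text)        # natural language understanding (stub)
        answer = self.backend.get(intent, "Let me connect you to a colleague.")
        # Synchronize the spoken answer with an emotional expression.
        emotion = "friendly" if intent in self.backend else "apologetic"
        self.history.append((text, intent, answer))
        return {"speech": answer, "facial_expression": emotion}

    def recognize_speech(self, audio) -> str:
        return audio                          # pretend audio is already text

    def understand(self, text: str) -> str:
        return "opening_hours" if "open" in text.lower() else "unknown"

agent = EmbodiedServiceAgent({"opening_hours": "We are open 9-17, Monday to Friday."})
print(agent.handle_turn("When are you open?"))
```

In a real deployment each stub would be a substantial subsystem, and the emotion channel would drive the virtual character's facial animation rather than a string label.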
8.3. Education and Learning Environments

Although concerns about the moral and ethical issues arising from the development of socially intelligent machines are certainly legitimate, and even more so when it comes to designing machines that exhibit emotional intelligence through the modeling of behavior and the expression of emotions, building such machines to assist children in educational contexts has opened a stimulating and fruitful line of research. There now exists a significant body of empirical evidence suggesting that non-verbal cues play a critical role in conveying a wide variety of social messages during interactions between learners and educational agents and, more importantly, that empathetic, socially intelligent agents can have positive effects on students' cognitive, emotional, and social development as well as on the effectiveness of the learning task. For example, it has been shown that the presence of an empathetic virtual character in conversational learning has a very positive influence on students' learning experience and reduces their frustration. More generally, interaction with emotionally aware agents has been seen to foster motivation, learning outcomes, engagement, and retention, and to enhance the learning of students with learning difficulties, such as autistic children, and of disruptive pupils (pupils who tend to misbehave during instruction and who, according to their teachers, need more emotional and social support to stay focused on the requirements of the learning tasks). Furthermore, emotion-aware educational agents can complement traditional assessment strategies by providing a continuous assessment of students' cognitive needs and motivation, accommodating individual learning styles to further optimize the learning experience.
The primary goal of this research is to explore novel technology-assisted behavior modeling paradigms that can equip artificial agents (AAs) with the social intelligence to support naturalistic interaction, and more naturalistic human-AI relationships, in multiparty interactions through contextually relevant and appropriate interaction management strategies. Specifically, by proposing deep learning-based methods that can automatically leverage a multidisciplinary, multimodal big data pipeline, we provide crucial response prediction solutions for programming the behavior models of AAs. In this direction, we focus on the interactive peer social behavior modeling capabilities of AAs, to ensure naturalistic partnerships grounded in context awareness. Such a research agenda could help answer what the next generations of AAs could achieve as autonomous social agents in deep interaction with human perception, experience, and behavior, and how these newly achieved cognitive and behavioral capabilities could assist with everyday challenges and unveil further, potentially transformative applications across many aspects of everyday and interpersonal life. The rapid advancement of AI technology, and its ubiquitous deployment across an ever-expanding scope of applications and user bases, have dramatically transformed our everyday, interpersonal, and organizational lives. The deployment of AAs has become deeply intertwined with human existence, and the user experience provided depends heavily on the proper behavioral design of the device or application. However, the way currently deployed AAs respond during interactions hinders the full potential of this revolution in automation, particularly in achieving emotionally, socially, and behaviorally rewarding human-AI relationships. The social intelligence exhibited by existing AAs is nascent and highly dependent on handcrafted architectures and expert knowledge in the specific domains of application, making the design process tedious and difficult to iterate. The response strategy selection process is also cumbersome, particularly in adapting to the contextual relevance and appropriateness required in actual interactions.
Equation 3: Adaptive Empathy Adjustment (AEA):

9.1. Advancements in AI Technology

There has been sustained interest in the application of Artificial Intelligence (AI) across multiple interdisciplinary domains and across the services provided in almost every society. Recent leaps in deep learning computing capabilities, the emergence of Deep Multimodal Big Data (large amounts of complex, heterogeneous data and information processed through Deep Learning and Deep Neural Network architectures), and breakthroughs in deep transfer learning methods built on such data are together enabling a much higher level of performance in AI applications and services. In this context, we discuss the evolution of AI over time and the actual impact of these recent technological advancements, and we propose a set of more open and ambitious future research challenges toward more robust, adaptable, and reliable AI technologies. Such next-generation principles would address the multiple, increasing types of complexity we now face, escalating toward a more general AI capability, and would help connect the dots from AI computational and technological efforts to application domains and use cases, toward truly impactful, general "Intelligent Associates".
AI researchers have been developing AI capabilities for years, and almost all of those efforts have focused on capabilities that perform specific tasks with high levels of performance. Such Specific or Narrow AI, based on the machine learning approach, incorporates a certain level of task proficiency grounded in expert intervention. Despite the incredible progress reported for these AI applications and services, radical conceptual, technological, and algorithmic breakthroughs have been required to initiate the deep evolution in technology behind today's AI services. In this section, we review these earlier AI domains and types of services, and the radical technological, conceptual, performance, and capability shifts reported over the last decades of AI research.
Fig 6: Being an Emotionally Intelligent Leader through the Nine-Layer Model of Emotional Intelligence

9.2. Potential Research Areas

Social emotions represent deep and complex aspects of human psychology, related to the way people experience and filter their interactions with social agents at varying levels of the synesthetic, empathic, emotional, and affective perspectives. Social agents, stemming from agent-based systems and from social agents in virtual environments, have existed for some time; however, progress in emotion research and AI technologies has stimulated emerging research areas focusing on particular aspects of artificial agents' emotional intelligence. CAMB predicts affective changes based on social context. Personal Simulations investigates different means and techniques to simulate the emotional intelligence of subjective agents. Psychological Insights studies how empathy can motivate acts of heroism in precarious social environments. Story Understanding focuses on emotional predictions based on agent beliefs and goals. Artificial Sociopathy examines the costs and benefits of agents lacking empathy. Humorous Agents explores the times, situations, and characters appropriate for humor modeling. Frustrating Agents investigates how setting frustrating goals can make a game challenging. Credibility discusses how violations of reality principles can break an audience's suspension of disbelief, while Believable Agents models the factors that make an artificial agent seem credible and believable. As we push toward a more multimodal AI era, competencies inspired by human cognitive psychology cannot, and should not, consider artificial agents in isolation. Real cognitive agents are surrounded and influenced by different, real, proximate physical objects. This lays the foundation for AI cyber-physical-social systems, in which artificial agents intelligently interact with human users and the world around them. These interactions must be multimodal and symmetric: users should experience a level of personalization that makes the interactions complex and believable, while artificial agents should have a symmetric cognition at the appropriate level, imitating the competencies of human users and responding with appropriate behavior. In this scenario, the primary research areas we believe will flourish in the coming years include action planning, autonomous robotic movement, social-emotional recognition and responsive interaction, long-term user modeling, theory-of-mind modeling, collaborative object use, humor and fun, story and knowledge modeling, learning from experience and knowledge transfer, and non-matching social behaviors.
Artificial agents are typically modeled and developed as 'dumb' machines that serve when controlled by humans. We conclude that by relying heavily on models of human behavioral prediction to compensate for the lack of emotional intelligence in artificial agents, we humans risk becoming 'dumber' ourselves, reducing our supposedly superior capabilities as intelligent social beings to the merely autocratic and automatic. It would be more pertinent to apply coaching and mentoring functions to humans, with their natural emotional intelligence, while using prediction capabilities for behavioral simulations in the non-reactive, interactive, and proactive functioning of artificial agents. Well-molded thinking patterns based on purely logical modeling, largely neglecting individual emotional traits, could have an adverse effect on our higher thinking functions. In summary, many researchers will have come across rhetorical questions in the introduction and related-work sections of publications. Such questions are generally used when proposing an idea for which no published answer yet exists, and they may also serve to suggest, with a claim to exclusivity, how existing problems and issues might be solved. Fortunately, those who achieve such first personal goals through their publications also tend to be the ones who ignite, with their scientific contributions, a fire that continually improves prevailing conditions for all of us. Nonetheless, artificial intelligence remains, from an informational point of view, an outcome that should be explicitly and extensively defined in order to yield consequential arguments and capabilities for the improved living environments envisaged. Future work might clarify whether the development of advanced emotion-oriented software applications can only be facilitated by emotionally intelligent machines, or whether their ultimate goal is the intelligent insight associated with conscious deliberation about emotional aspects, enabling humans to fully embrace who they are. A final observation in this progression might concern evolutionary accounts that attribute human learning and development to innate emotional components alone.

10.1. Final Thoughts and Broader Implications

Advances in deep learning and big data now enable us to design multimodal agents that learn contextually informed mappings from low-level multimodal perception data to high-level agent traits and states. Further, these trait and state inferences can be guided by rich underlying databases of behavioral features, such as affective and personality measures, which require both big data and deep data, spanning the ordinary and the extraordinary, across many agents and a wide array of contexts. When agents understand the affective states and behavioral preferences of their interlocutors, they can adapt their appearance and behavior, for example, to be more trustworthy and motivating. Anxiety-detecting agents may learn soothing behaviors; people may perceive stability-enhancing cues through behavioral signals related to personality traits such as neatness and structure, and may even trust such agents more with delicate tasks during anxiety episodes.
Such adaptive behaviors by multimodal agents, enhanced through deep learning capabilities, may have reciprocal effects on the emotional and behavioral states and traits of their interactants. However, although agents as predictors of underlying behavioral traits are very attractive, we must remain cognizant of the ethical issues at play, such as privacy concerns over the extracted information and the motivations behind the design of the algorithmic technology that implements these responses. For example, if an algorithmic technology that generates the moderating appearance and actions of an agent to address an interlocutor's feelings does not affect the individual in a desired and beneficial way, we may be implementing an agent of deception and harm rather than an agent of benevolence and stability.