The Art of the Face: How Design Shapes Our Bond with Humanoid Robots

As national strategies and industrial plans worldwide propel the development of robotics, the application domains of humanoid robots are expanding rapidly. They are moving from controlled industrial settings into homes, workplaces, and public spaces, where they are beginning to take on social tasks and roles that require interaction and cooperation with people. In this new paradigm, the visual design of a humanoid robot, particularly its face, becomes a critical communication channel. Extensive research indicates that users direct more attention to a humanoid robot’s face than to its other body parts. Consequently, the specific configuration of facial features (the eyes, mouth, facial contour, and their proportions) can profoundly influence a user’s emotional experience, perceived trustworthiness, and overall willingness to engage. Understanding this impact is therefore essential for designing humanoid robots that are not only functional but also intuitively acceptable and pleasant to interact with.

Current investigations into the effects of facial design on user perception have yielded initial insights. Studies suggest that omitting a mouth on large displays might aid emotional conveyance, while the shape and curvature of the mouth—such as an upturned or neutral line—can significantly affect perceived credibility. Similarly, eye design is crucial; medium-sized, classic round eyes are often associated with approachability and familiarity. However, a significant portion of this existing body of knowledge relies primarily on subjective user reports and ratings. While valuable, these methods can be susceptible to biases and do not reveal the underlying, automatic cognitive and affective processes triggered by a robot’s visage. There remains a substantial gap in objectively evaluating how the face of a humanoid robot shapes user experience at a physiological and neural level.

This is where multimodal measurement approaches offer a powerful lens. Eye-tracking technology provides an objective window into visual attention, quantifying where a user looks (fixations), for how long (fixation duration), and in what sequence. These metrics reveal which design features are salient and engaging. Complementing this, neurophysiological tools like Electroencephalography (EEG) and its derived Event-Related Potentials (ERP) allow us to peer into the brain’s real-time processing. Specific ERP components, such as N1, P2, and P3, are time-locked neural responses that indicate different stages of cognitive processing: early sensory registration, feature discrimination, and higher-order evaluation/attention allocation, respectively. By integrating subjective preferences with eye-tracking and EEG data, we can construct a more holistic and objective understanding of user response. This study employs precisely this triadic methodology to dissect the influence of key facial design features—Facial Width-to-Height Ratio (FWHR), face shape, eye shape, and mouth expression—on users’ affective cognition toward humanoid robots.

Methodological Framework: Measuring Perception

The core of this investigation was an experimental study designed to isolate and measure the effects of specific facial design variables. A set of standardized humanoid robot facial prototypes was created, systematically varying four key attributes based on prevalent design trends and prior research. The variables and their levels were as follows:

| Design Variable | Levels |
|---|---|
| Facial Width-to-Height Ratio (FWHR) | High, Medium, Low |
| Face Shape | Round, Square |
| Eye Shape | Round, Square |
| Mouth Expression | Smiling (upturned curve), Neutral Linear |

A full factorial combination would yield 24 unique faces. However, a preliminary subjective screening phase was conducted where participants rated robots across the three FWHR levels on a “disliked-liked” scale. Statistical analysis revealed a significant main effect for FWHR, with the medium ratio being significantly preferred over both high and low ratios. Consequently, to maintain a focused and manageable stimulus set for the detailed physiological measurements, the subsequent eye-tracking and EEG experiments utilized prototypes with the preferred medium FWHR, combined with the two levels of the other three variables, resulting in 8 distinct humanoid robot faces (2 face shapes × 2 eye shapes × 2 mouth expressions).
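The stimulus counts follow directly from multiplying the number of levels (3 × 2 × 2 × 2 = 24, then 1 × 2 × 2 × 2 = 8 once FWHR is fixed). A short Python sketch makes this explicit by enumerating the full factorial design and filtering it to the medium-FWHR faces; the level labels come from the design-variable table, while the variable names are illustrative.

```python
from itertools import product

# Levels as listed in the design-variable table
FWHR = ["High", "Medium", "Low"]
FACE = ["Round", "Square"]
EYES = ["Round", "Square"]
MOUTH = ["Smiling", "Neutral"]

# Full factorial design: every combination of the four variables (3 x 2 x 2 x 2)
full_design = list(product(FWHR, FACE, EYES, MOUTH))

# Reduced set for the EEG and eye-tracking experiments: FWHR fixed at "Medium"
reduced_design = [combo for combo in full_design if combo[0] == "Medium"]

print(len(full_design), len(reduced_design))
for _, face, eyes, mouth in reduced_design:
    print(f"{face} face, {eyes} eyes, {mouth} mouth")
```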

Two separate but complementary experiments were conducted with a cohort of participants. In the EEG experiment, participants viewed each robot face in a randomized sequence while their brain activity was recorded via a 64-channel system. Each face was presented briefly, after which the participant made an explicit binary choice regarding their affective preference (like/dislike). This protocol allowed for the analysis of neural ERPs time-locked to the onset of the visual stimulus, correlating specific brainwave components with the different facial features.
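Time-locked ERP analysis generally means cutting the continuous recording into epochs around each stimulus onset, baseline-correcting them, and averaging per condition. The minimal NumPy sketch below illustrates that idea; the article does not report its epoch window, baseline, or preprocessing, so the -200 ms to 800 ms window, the pre-stimulus baseline, and the `average_erp` helper are assumptions. In practice, toolboxes such as MNE-Python (`mne.Epochs`) handle this step.

```python
import numpy as np

def average_erp(eeg, onsets, sfreq, tmin=-0.2, tmax=0.8):
    """Average stimulus-locked epochs from a continuous EEG recording (sketch).

    eeg:    array (n_channels, n_samples) of continuous EEG
    onsets: sample indices at which one condition's face stimuli appeared
    sfreq:  sampling rate in Hz
    Returns (times, erp): times in seconds, erp of shape (n_channels, n_times).
    """
    start = int(round(tmin * sfreq))  # samples before onset (negative)
    stop = int(round(tmax * sfreq))   # samples after onset
    if start >= 0:
        raise ValueError("tmin must leave at least one pre-stimulus baseline sample")
    epochs = []
    for onset in onsets:
        # Skip trials whose window would run past either end of the recording
        if onset + start < 0 or onset + stop > eeg.shape[1]:
            continue
        epoch = eeg[:, onset + start : onset + stop]
        # Baseline correction: subtract each channel's mean over the pre-stimulus samples
        baseline = epoch[:, :-start].mean(axis=1, keepdims=True)
        epochs.append(epoch - baseline)
    if not epochs:
        raise ValueError("no complete epochs could be extracted")
    times = np.arange(start, stop) / sfreq
    # Averaging across trials attenuates activity that is not time-locked to face onset
    return times, np.mean(epochs, axis=0)
```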

The eye-tracking experiment presented the same 8 faces for a longer, free-viewing period. Using a remote eye-tracker, participants’ gaze patterns were recorded as they naturally observed each humanoid robot face. Areas of Interest (AOIs) were defined post-hoc for the eyes, mouth, face contour, and background. Key metrics extracted for analysis included the total number of fixations within an AOI (indicating attentional capture) and the total fixation duration on an AOI (indicating cognitive processing depth). Subjective preference ratings for the 8 designs were also collected using standardized scales.
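Both eye-tracking metrics can be computed from a list of fixations once the AOIs are defined. The sketch below assumes rectangular AOIs in screen pixels and a simple first-match rule for overlapping regions; the coordinates, the `Fixation` type, and the `aoi_metrics` helper are hypothetical, since the article does not describe the AOI geometry or the eye-tracker's export format.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float            # gaze position in screen pixels
    y: float
    duration_ms: float  # fixation duration in milliseconds

# Hypothetical rectangular AOIs: name -> (x_min, y_min, x_max, y_max) in pixels.
# Eyes and mouth are listed before the face contour, which encloses them.
AOIS = {
    "eyes": (300, 200, 700, 320),
    "mouth": (380, 450, 620, 540),
    "face_contour": (200, 100, 800, 650),
}

def aoi_metrics(fixations, aois):
    """Return {aoi: (fixation_count, total_fixation_duration_ms)}.

    Each fixation is credited to the first AOI (in dict order) containing it;
    fixations outside every AOI are counted as "background".
    """
    metrics = {name: [0, 0.0] for name in list(aois) + ["background"]}
    for f in fixations:
        target = "background"
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= f.x <= x1 and y0 <= f.y <= y1:
                target = name
                break
        metrics[target][0] += 1
        metrics[target][1] += f.duration_ms
    return {name: tuple(v) for name, v in metrics.items()}
```

The ordering of the AOI dictionary is a deliberate design choice here: because the contour region encloses the eyes and mouth, the order decides which region a fixation on the eyes is credited to.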

Decoding the Results: A Triangulated Perspective

The data from the three measurement streams—subjective, ocular, and neural—painted a detailed and convergent picture of how facial design influences perception of a humanoid robot.

1. The Power of Proportion and Contour: The initial screening confirmed that a medium Facial Width-to-Height Ratio is a robust foundation for a likable humanoid robot face. In the main experiments, face shape emerged as a powerful factor. Subjectively, robots with a square face contour were rated as more likable than those with a round face. The EEG data provided a neural correlate for this preference. The N1 component, an early negative deflection around 100–150 ms post-stimulus associated with initial attentional resource allocation, showed a significantly larger (more negative) amplitude for square faces, suggesting that the square contour triggered stronger early visual engagement. The subsequent P2 component (a positive peak around 200–250 ms), often linked to stimulus classification and feature analysis, was also more pronounced for square faces. Together, these effects suggest that the brain devoted more early processing resources to the square contour, in line with its higher subjective liking score.

2. The Allure of a Smile and the Gaze toward Round Eyes: Mouth expression was another critical determinant of appeal. Subjectively, the smiling mouth was overwhelmingly preferred over the neutral linear mouth. Eye-tracking data showed that this preference was mirrored in visual behavior: the smiling mouth attracted significantly more fixations. The neural data reinforced this finding. The P3 component, a later positive wave (300–400 ms) that reflects higher-order cognitive processes, context updating, and the allocation of attentional resources to motivationally significant stimuli, showed a larger amplitude for faces with a smiling mouth. This suggests that the smiling expression was not merely noticed but was evaluated more deeply and held greater motivational relevance for the viewer, consistent with the explicit preference.

Eye shape primarily influenced the visual exploration pattern. Although it did not produce a main effect on subjective liking in this stimulus set, the eye-tracking metrics were clear: round eyes attracted significantly more fixations and a longer total fixation duration than square eyes. This underscores the high visual salience and engagement potential of round eyes in humanoid robot face design. A significant interaction also appeared in the neural data: the combination of a smiling mouth and round eyes elicited the largest P3 amplitude. This points to a synergistic effect in which these two “positive” or “classic” features together create a particularly engaging and affectively salient stimulus that draws the most cognitive-evaluative processing.
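What a “significant interaction” means can be made concrete with the standard 2 × 2 interaction contrast: the smile effect measured with round eyes minus the smile effect measured with square eyes. The cell values in this Python sketch are placeholders chosen for illustration and are not the study's P3 results; the `interaction_contrast` helper is hypothetical.

```python
def interaction_contrast(cell_means):
    """2 x 2 interaction contrast for mouth expression x eye shape.

    cell_means maps (mouth, eyes) -> mean amplitude, e.g. ("smile", "round").
    Returns (smile effect with round eyes) - (smile effect with square eyes).
    A value of 0 means the two factors combine additively (no interaction).
    """
    smile_effect_round = cell_means[("smile", "round")] - cell_means[("neutral", "round")]
    smile_effect_square = cell_means[("smile", "square")] - cell_means[("neutral", "square")]
    return smile_effect_round - smile_effect_square


# Placeholder cell means for illustration only; NOT the study's data.
example = {
    ("smile", "round"): 6.0,
    ("smile", "square"): 4.5,
    ("neutral", "round"): 3.5,
    ("neutral", "square"): 3.0,
}
print(interaction_contrast(example))  # (6.0 - 3.5) - (4.5 - 3.0) = 1.0
```

A single cell can have the largest mean even when this contrast is zero, so it is the interaction test, not the ranking of cell means, that supports the “synergy” reading. Testing significance additionally requires per-participant values, for example in a repeated-measures ANOVA.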

The table below summarizes the key findings from the eye-tracking analysis, highlighting where users focused their attention:

| Facial Feature | Key Eye-Tracking Finding | Interpretation |
|---|---|---|
| Mouth Expression | Higher fixation count for smiling mouth | Smiling mouth is more visually engaging and attention-capturing. |
| Eye Shape | Higher fixation count and duration for round eyes | Round eyes have higher salience and prompt deeper visual processing. |
| Overall Pattern | Longest fixation duration on eyes and face contour areas | These are the primary zones for information gathering from a humanoid robot face. |

The ERP component amplitudes, which form the core neural evidence, are summarized below:

| ERP Component | Design Feature with Stronger Amplitude | Cognitive Stage & Implication |
|---|---|---|
| N1 (~100–150 ms) | Square face > Round face | Enhanced early attentional capture by the square contour. |
| P2 (~200–250 ms) | Square face > Round face | Increased perceptual feature analysis of the square face shape. |
| P3 (~300–400 ms) | Smiling mouth > Neutral mouth; strongest for the Smile + Round Eyes combination | Greater cognitive evaluation and motivated attention to positive/synergistic features. |
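The latency windows in the ERP summary table translate into a simple amplitude measure: average the ERP over each window. The article does not state whether mean or peak amplitude was analyzed, so this Python sketch assumes mean amplitude and reuses the approximate windows from the table; `WINDOWS` and `mean_amplitude` are illustrative names.

```python
import numpy as np

# Approximate latency windows (seconds) taken from the ERP summary table
WINDOWS = {"N1": (0.100, 0.150), "P2": (0.200, 0.250), "P3": (0.300, 0.400)}

def mean_amplitude(times, erp, window):
    """Mean ERP amplitude within a latency window (inclusive bounds).

    times:  1-D array of time points in seconds, aligned with the last axis of erp
    erp:    array (n_times,) or (n_channels, n_times)
    window: (start, end) in seconds
    """
    start, end = window
    mask = (times >= start) & (times <= end)
    if not mask.any():
        raise ValueError("the window contains no samples")
    # Select the in-window samples on the time axis and average them
    return erp[..., mask].mean(axis=-1)
```

Comparing conditions is then a subtraction, for example `mean_amplitude(times, erp_square, WINDOWS["N1"]) - mean_amplitude(times, erp_round, WINDOWS["N1"])`, where `erp_square` and `erp_round` are hypothetical condition averages. Because N1 is a negative component, a negative difference corresponds to the larger N1 reported for square faces.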

Theoretical Integration: A Framework for Design Cognition

The results can be synthesized into a coherent cognitive processing model for humanoid robot facial perception. The journey from visual stimulus to affective preference involves sequential yet overlapping stages, each measurable through specific tools.

Stage 1: Preattentive & Early Attention (N1/P2 Window). Upon viewing the humanoid robot face, the brain processes low-level visual features. The significant N1/P2 effects for face shape indicate that geometric contour is a primitive feature that differentially modulates very early neural responses. A square contour may create a more distinct or “cartoonish” visual signature that engages sensory pathways efficiently. This stage can be modeled as an initial filtering of design feature $F_i$ (e.g., contour sharpness):

$$
\text{Neural Engagement}_i = \alpha \cdot \text{Salience}(F_i) + \beta
$$

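where $\alpha > 0$ weights the feature’s salience and $\beta$ is a baseline level of engagement, so that higher salience (e.g., of a square vs. round contour) leads to a larger amplitude in early components such as N1.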
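Applied to the face-shape contrast, the Stage 1 model makes a simple prediction. Writing the model once for the square contour and once for the round contour and subtracting, under the assumption that the same $\alpha$ and $\beta$ apply to both and that N1 “amplitude” refers to the magnitude of the negative deflection:

$$
\text{Neural Engagement}_{\text{square}} - \text{Neural Engagement}_{\text{round}} = \alpha \left[\text{Salience}(F_{\text{square}}) - \text{Salience}(F_{\text{round}})\right]
$$

The baseline $\beta$ cancels, so the model attributes the square-versus-round difference entirely to the salience gap scaled by $\alpha$. With $\alpha > 0$, a more salient square contour predicts a larger early response, which is the direction reported for N1 and P2.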

Stage 2: Focused Visual Analysis & Feature Integration (Eye-Tracking Window). As viewing continues, overt attention is directed by both stimulus-driven salience and goal-driven interest. The eye-tracking data reveal how this finite resource is allocated. The higher fixation counts on round eyes and smiling mouths, together with the longer fixation durations on round eyes, suggest that these features act as attentional “hubs.” This can be conceptualized as a visual sampling process in which the probability of fixating on an AOI $A_j$ is a function of its feature-based salience and its informational value for forming an affective impression $I$:

$$
P(\text{Fixation on } A_j) \propto \text{Salience}(A_j) + \gamma \cdot \text{Relevance}(A_j, I)
$$
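The proportionality sign hides a normalizing constant: turning the scores into probabilities means dividing each AOI's score by the sum over all AOIs. The Python sketch below does exactly that; the salience and relevance values, the choice of $\gamma$, and the `fixation_probabilities` helper are hypothetical, since the article does not quantify either term.

```python
def fixation_probabilities(salience, relevance, gamma):
    """Normalize Salience(A_j) + gamma * Relevance(A_j, I) into fixation probabilities.

    salience:  dict AOI name -> non-negative salience score (hypothetical, unitless)
    relevance: dict AOI name -> relevance to the affective impression I (missing = 0)
    gamma:     weight given to relevance
    """
    scores = {aoi: salience[aoi] + gamma * relevance.get(aoi, 0.0) for aoi in salience}
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative to act as probabilities")
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one AOI needs a positive score")
    # Dividing by the total supplies the constant implied by the proportionality
    return {aoi: s / total for aoi, s in scores.items()}


# Illustrative scores only; the study does not report salience or relevance values.
salience = {"eyes": 0.4, "mouth": 0.3, "face_contour": 0.2, "background": 0.1}
relevance = {"eyes": 0.5, "mouth": 0.5}
probs = fixation_probabilities(salience, relevance, gamma=1.0)
for aoi, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{aoi:>12}: {p:.2f}")
```

Raising `gamma` shifts probability toward the AOIs that matter for the affective impression (here the eyes and mouth) and away from the contour and background.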

Stage 3: Cognitive-Evaluative Processing (P3 Window & Subjective Judgment). The extracted visual information is integrated and evaluated against internal schemas (e.g., for friendliness or anthropomorphism). The P3 component is a hallmark of this stage. Its enhancement by the smiling mouth and by the synergistic “smile + round eyes” combination marks the culmination of this evaluation. These features likely match a “positive affective” or “highly anthropomorphic” schema, triggering a larger P3 as the brain updates its context with this salient appraisal. In this model, the final stage feeds the conscious subjective rating $S$:

$$
S = \delta \cdot \text{P3 Amplitude} + \epsilon
$$

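where $\delta > 0$ scales the contribution of the P3 and $\epsilon$ captures the remaining influences on the rating, so that a larger P3 amplitude, reflecting deeper evaluative processing of motivationally significant features, predicts a higher liking score.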
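If per-face P3 amplitudes and liking ratings are available, $\delta$ can be estimated by ordinary least squares. This Python sketch assumes $\epsilon$ is estimated as a constant intercept, with whatever the line does not explain left as residual error; the `fit_rating_model` helper and the example numbers are hypothetical and do not come from the study.

```python
import numpy as np

def fit_rating_model(p3_amplitudes, ratings):
    """Least-squares estimate of S = delta * P3 + epsilon.

    Assumption: epsilon is estimated as a constant intercept; unexplained
    variation is treated as residual error. Returns (delta, intercept).
    """
    x = np.asarray(p3_amplitudes, dtype=float)
    y = np.asarray(ratings, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need matching arrays with at least two observations")
    # A degree-1 polynomial fit returns coefficients highest power first: [slope, intercept]
    delta, intercept = np.polyfit(x, y, 1)
    return float(delta), float(intercept)


# Hypothetical per-face values (microvolts, rating units); NOT data from the study.
p3 = [3.1, 4.0, 4.8, 5.5, 6.2, 3.6, 5.0, 6.8]
liking = [2.9, 3.4, 4.1, 4.3, 4.9, 3.0, 4.2, 5.4]
delta, intercept = fit_rating_model(p3, liking)
print(f"delta = {delta:.2f}, intercept = {intercept:.2f}")
```

A positive $\delta$ is what the Stage 3 account predicts; here its sign and size reflect only the invented numbers. With only eight faces, fitting across participants (for example with a mixed-effects model) would usually be more informative than a single fit on eight means.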

This framework aligns with neuroaesthetic and affective computing principles, suggesting that an effective humanoid robot face needs features that carry the viewer through this three-stage cascade: capturing early attention, sustaining visual engagement, and positively resolving the cognitive-evaluative process.

Conclusion and Implications

This multimodal investigation demonstrates that the face of a humanoid robot is far more than a decorative shell; it is a complex interface that systematically modulates human cognition and affect. The findings provide robust, objective evidence to guide design principles:

  1. A medium Facial Width-to-Height Ratio provides a balanced and generally preferred foundational structure.
  2. A square face contour enhances early neural engagement and attentional capture and is associated with higher subjective liking, possibly because it offers a clear, stylized aesthetic.
  3. Incorporating a smiling mouth expression is highly effective, attracting more visual fixations, driving deeper cognitive evaluation (as seen in P3), and resulting in stronger affective preference.
  4. Using round eyes significantly increases the visual salience and attention-holding capacity of the humanoid robot’s face.
  5. Designing for feature synergy—such as combining a smiling mouth with round eyes—can create a particularly powerful positive effect, eliciting the strongest neural signatures of evaluative processing.

The integration of subjective, eye-tracking, and neural measures moves beyond mere opinion to reveal the mechanisms of perception. This approach offers a rigorous framework for evaluating and iterating on the design of social and collaborative humanoid robots. Future work can build on this by exploring dynamic faces, cultural differences in feature perception, and longer-term interaction effects. Ultimately, by grounding design in the science of human perception and cognition, we can create humanoid robots that are not only more capable but also more intuitively understandable and positively received by the people they are meant to serve and work alongside.
