A Study on Affinity in Humanoid Robot Design Using AI-Driven Methods

The integration of humanoid robots into daily life and commercial service scenarios is accelerating. Their success, however, hinges not only on functional prowess but critically on social acceptance. A key determinant of this acceptance is the robot’s exterior design, which directly influences user perception, emotional response, and ultimately, trust. Designs perceived as cold, mechanical, or intimidating can trigger user aversion, while those exhibiting affinity can foster positive interaction. This paper explores a structured methodology to define, quantify, and generate high-affinity appearances for humanoid robots by integrating Kansei Engineering with targeted training of Stable Diffusion (SD) models.

Figure: An example of a humanoid robot in a service context.

The concept of affinity in design refers to the quality of a product that aligns with human physiological and psychological factors, eliciting feelings of pleasure, comfort, and ease. For a humanoid robot, this translates to an appearance that feels approachable, friendly, and safe. Kansei Engineering provides a robust framework for translating such subjective, affective responses (“Kansei”) into concrete, objective design parameters. It establishes a mapping between the user’s emotional space and the designer’s physical space of elements like form, color, and material.

Meanwhile, generative Artificial Intelligence (AI), particularly diffusion models like Stable Diffusion, has revolutionized creative workflows. These models can produce a vast array of high-quality visual concepts from textual descriptions. However, their raw output is often generic and lacks precision for specialized domains like humanoid robot design. The challenge lies in steering these powerful models to generate concepts that adhere to specific, research-backed design principles—in this case, principles of affinity. This study bridges this gap by using quantitative insights from Kansei analysis to curate training data and guide the fine-tuning process of an SD model, creating a dedicated tool for affinity-focused humanoid robot design.

1. Defining Affinity: A Kansei Engineering Approach

To move beyond vague descriptors, we first deconstruct the “affinity” of a humanoid robot into measurable dimensions. Through literature review and expert discussion, affinity-related semantic pairs were collected and categorized. A panel of design professionals then selected the most representative pairs, resulting in three primary evaluation dimensions:

  • Affinity Degree: Measures the direct feeling of closeness and approachability (e.g., Cold ↔ Warm).
  • Gentleness Degree: Captures the stylistic and temperamental impression of care and softness (e.g., Hard ↔ Gentle).
  • Liveliness Degree: Reflects the perceived vitality and dynamism (e.g., Serious ↔ Lively).

These three dimensions were used to construct a semantic differential survey. The physical design elements of a humanoid robot were abstracted and systematically varied to create stimulus material. Key elements included:

| Design Element | Variations |
|---|---|
| Head Form | Square, Circle, Vertical Stadium, Horizontal Stadium, Semicircle |
| Eye Form | Circle, Square, Vertical Stadium, Horizontal Stadium |
| Body Proportion | Slim (Child, Teen) vs. Stocky (Child, Teen, Adult) builds; varying heights |
| Material & Finish | Plastic (glossy/matte), Fabric (woven/furry), Metal (smooth/brushed), Transparent (clear/frosted) |
| Color | Warm/Cool hues at High/Low saturation; Neutral whites, grays, blacks |
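The factorial combination of these variations defines the stimulus space. As a minimal sketch, assuming the hypothetical string encodings below, the full grid can be enumerated with `itertools.product`; an actual survey would present only a sampled fraction of it:

```python
from itertools import product

# Hypothetical encoding of the design-element variations above;
# the names follow the table, not any published dataset.
elements = {
    "head_form": ["square", "circle", "vertical stadium", "horizontal stadium", "semicircle"],
    "eye_form": ["circle", "square", "vertical stadium", "horizontal stadium"],
    "body_proportion": ["slim child", "slim teen", "stocky child", "stocky teen", "stocky adult"],
    "material": ["glossy plastic", "matte plastic", "woven fabric", "furry fabric",
                 "smooth metal", "brushed metal", "clear transparent", "frosted transparent"],
    "color": ["warm high-sat", "warm low-sat", "cool high-sat", "cool low-sat",
              "white", "gray", "black"],
}

# Full factorial space of stimuli; in practice a fractional design
# would be sampled from this grid.
stimuli = [dict(zip(elements, combo)) for combo in product(*elements.values())]
print(len(stimuli))  # 5 * 4 * 5 * 8 * 7 = 5600
```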

Participants rated each stimulus across the three affinity dimensions on a Likert scale. Statistical analysis (Kruskal-Wallis H tests, Mann-Whitney U tests, Spearman/Pearson correlation) of 645 valid responses yielded the following insights:
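A sketch of this analysis pipeline using `scipy.stats`, on fabricated ratings whose distributions and variable names are illustrative only (the real analysis used the 645 survey responses):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Fabricated ratings for three head-form variants on the Affinity
# dimension, each drawn from a different range of a 7-point scale.
square  = rng.integers(1, 5, size=100)   # skews low (1-4)
circle  = rng.integers(3, 8, size=100)   # skews high (3-7)
stadium = rng.integers(4, 8, size=100)   # skews highest (4-7)

# Omnibus test: do the rating distributions differ across variants?
h_stat, p_kw = stats.kruskal(square, circle, stadium)

# Pairwise follow-up between two variants.
u_stat, p_mw = stats.mannwhitneyu(square, circle, alternative="two-sided")

# Rank correlation, e.g. robot height vs. affinity rating.
heights = rng.uniform(0.8, 1.8, size=300)
ratings = 7 - 3 * heights + rng.normal(0, 0.5, size=300)  # taller -> lower rating
rho, p_rho = stats.spearmanr(heights, ratings)

print(f"H={h_stat:.1f} (p={p_kw:.3g}), U={u_stat:.0f}, rho={rho:.2f}")
```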

Form: Rounded, smooth, and simple contours scored highest. A horizontal stadium shape for the head and a vertical stadium shape for the eyes were most favorable. A slim, teenage-like body proportion was preferred over bulkier or extremely child-like proportions. The correlation between height and affinity was negative ($\rho_s = -0.340, p<0.01$), especially for stocky builds.

Material: Soft materials like fabric (particularly with a furry texture) generated the strongest affinity. Plastic and transparent materials with matte/frosted finishes were neutral. Glossy or metallic surfaces consistently scored low on affinity and gentleness, though sometimes higher on liveliness, indicating a trade-off.

Color: High brightness and low saturation enhanced affinity. Warm colors (e.g., light yellow, pale orange) were generally preferred. For cool colors, reducing saturation and increasing brightness improved affinity scores. Among neutrals, white was most favorable, followed by gray, with black being least favorable ($\rho_s \approx -0.62, p<0.01$ for the white-to-black trend).
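The brightness/saturation pattern can be inspected numerically. The sketch below converts hypothetical RGB swatches (chosen to illustrate the finding, not taken from the study's stimuli) to HSV with the standard-library `colorsys` module; the preferred colors show high value (brightness) and low saturation:

```python
import colorsys

# Hypothetical RGB swatches (components in 0-1) for the color findings.
swatches = {
    "light yellow": (1.00, 0.96, 0.75),
    "pale orange":  (1.00, 0.85, 0.70),
    "deep blue":    (0.10, 0.15, 0.55),
    "black":        (0.05, 0.05, 0.05),
}

# colorsys.rgb_to_hsv returns (hue, saturation, value).
hsv = {name: colorsys.rgb_to_hsv(*rgb) for name, rgb in swatches.items()}

for name, (h, s, v) in hsv.items():
    print(f"{name:12s} hue={h:.2f} saturation={s:.2f} value={v:.2f}")
```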

The three evaluation dimensions were significantly positively intercorrelated for most elements (e.g., for head form, Affinity-Gentleness Kendall's $\tau = 0.356$, $p<0.01$), indicating that they tap a shared underlying construct. However, for materials and some color comparisons, Affinity and Liveliness were occasionally negatively correlated, suggesting that maximizing affinity might require tempering extreme dynamism.

These findings were synthesized into a quantitative Affinity Scoring Table, assigning points to each design feature variant. This table served as the objective ground truth for both evaluating existing designs and guiding the creation of new ones. The overall affinity score $S_A$ for a design can be modeled as a weighted sum of its feature scores:

$$
S_A = \sum_{i=1}^{n} w_i \cdot s(f_i)
$$

where $f_i$ represents a design feature (e.g., head shape), $s(f_i)$ is its score from the lookup table, and $w_i$ is its relative importance weight derived from survey analysis.
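A minimal sketch of this scoring model in Python; the table values and weights below are illustrative placeholders, not the study's actual Affinity Scoring Table:

```python
# Illustrative lookup table s(f_i): per-feature scores for each variant.
SCORE_TABLE = {
    "head_form": {"horizontal stadium": 5, "circle": 4, "semicircle": 3, "square": 1},
    "eye_form":  {"vertical stadium": 5, "circle": 4, "horizontal stadium": 3, "square": 1},
    "material":  {"furry fabric": 5, "matte plastic": 3, "glossy metal": 1},
    "color":     {"white": 5, "light yellow": 5, "gray": 3, "black": 1},
}

# Illustrative importance weights w_i (sum to 1.0).
WEIGHTS = {"head_form": 0.3, "eye_form": 0.2, "material": 0.3, "color": 0.2}

def affinity_score(design: dict) -> float:
    """S_A = sum_i w_i * s(f_i) over the features present in the design."""
    return sum(WEIGHTS[f] * SCORE_TABLE[f][v] for f, v in design.items())

concept = {"head_form": "horizontal stadium", "eye_form": "vertical stadium",
           "material": "furry fabric", "color": "white"}
print(affinity_score(concept))  # 0.3*5 + 0.2*5 + 0.3*5 + 0.2*5 = 5.0
```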

2. Training a Stable Diffusion Model for Affinity

With clear affinity criteria established, the next step was to train a generative AI model to produce humanoid robot concepts adhering to these principles. Stable Diffusion, a latent diffusion model, was chosen for its flexibility and open-source nature. The core challenge was to “teach” the model the visual language of affinity as defined by our Kansei study, moving beyond generic prompt engineering.

The training pipeline involved four key stages:

2.1 Training Sample Curation & Creation
A high-quality, consistent dataset is crucial. Initial images of humanoid robots were gathered and then meticulously edited using a combination of AI tools (SD’s inpainting/ControlNet) and manual digital painting. The editing goal was to align each sample with high-scoring features from the Affinity Scoring Table: applying rounded forms, soft material textures, friendly color palettes, and appropriate proportions. This resulted in a curated set of images that were visually cohesive and explicitly embodied the target affinity traits.

2.2 Dataset Tagging
Each training image was annotated with descriptive text captions. Automated tagging was supplemented with meticulous manual annotation to ensure accuracy and comprehensiveness. Tags included overarching styles (“affinity robot”, “organic form”), specific design features (“oval head”, “vertical eyes”, “white body”), materials (“plastic material”), and scene context (“full body, standing”). This textual data allows the SD model to learn the association between the affinity visual features and the corresponding descriptive words.
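Many SD fine-tuning toolchains expect one plain-text caption file alongside each image. A sketch of generating such sidecar files, with illustrative tag lists and a hypothetical `dataset/` folder:

```python
from pathlib import Path

# Illustrative captions: filename -> tag list, mixing style, feature,
# material, and scene-context tags as described above.
captions = {
    "robot_001.png": ["affinity robot", "oval head", "vertical eyes",
                      "white body", "plastic material", "full body, standing"],
    "robot_002.png": ["affinity robot", "organic form", "furry fabric",
                      "light yellow body", "full body, standing"],
}

dataset_dir = Path("dataset")  # hypothetical training folder
dataset_dir.mkdir(exist_ok=True)

# Write one same-named .txt caption file per image (a common convention).
for image_name, tags in captions.items():
    caption_path = dataset_dir / Path(image_name).with_suffix(".txt").name
    caption_path.write_text(", ".join(tags), encoding="utf-8")

print(sorted(p.name for p in dataset_dir.glob("*.txt")))
```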

2.3 Model Fine-Tuning & Iteration
We employed the DreamBooth fine-tuning method to create a specialized model checkpoint. The process was iterative. An initial model was trained on the first batch of affinity-focused images. Sample outputs were then generated and evaluated against the Affinity Scoring Table. To address shortcomings (e.g., lack of realism, incorrect proportions), new training samples were created or existing ones modified, and the model was fine-tuned again. This cycle refined both the dataset and the model.

| Training Round | Focus of Sample Improvement | Avg. Sample Affinity Score |
|---|---|---|
| 1 | Basic affinity form & color | 12 |
| 2 | Increased realism, adjusted proportions | 14 |
| 3 | Optimized proportions (slim teen), added detail diversity | 16 |

Key training parameters were set as follows: Learning Rate = $1 \times 10^{-4}$, Batch Size = 5, Epochs = 10, using the Cosine scheduler and 8-bit Adam optimizer. The training loss curve was monitored; convergence to roughly 0.07-0.08 in later epochs was taken to indicate effective learning without overfitting.
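The cosine scheduler anneals the learning rate smoothly from its base value toward zero. A minimal sketch with the parameters quoted above; the steps-per-epoch count is an assumption, since it depends on dataset size and the batch size of 5:

```python
import math

BASE_LR = 1e-4          # base learning rate from the paper
EPOCHS = 10
STEPS_PER_EPOCH = 100   # assumed: depends on dataset size / batch size
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH

def cosine_lr(step: int) -> float:
    """Cosine annealing from BASE_LR down to 0 over TOTAL_STEPS."""
    return BASE_LR * 0.5 * (1 + math.cos(math.pi * step / TOTAL_STEPS))

print(cosine_lr(0))                 # base LR at the start
print(cosine_lr(TOTAL_STEPS // 2))  # half the base LR at the midpoint
print(cosine_lr(TOTAL_STEPS))       # ~0 at the end
```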

2.4 Model Selection and Application
After training, multiple model checkpoints (from different epochs) were generated. An X-Y plot was used to visualize outputs from different checkpoints at various conceptual strengths (weights). The best-performing model was selected based on both the loss value and a qualitative assessment of output quality and adherence to affinity traits using the scoring table.

The final trained model acts as a dedicated affinity design generator. It can be used alone or combined with other Low-Rank Adaptation (LoRA) style models to create a diverse Style Matrix of humanoid robot concepts. Each concept in the matrix can be rapidly evaluated using the Affinity Scoring Table, allowing designers to efficiently identify the highest-scoring, most promising designs from a large batch of AI-generated options.
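Screening such a Style Matrix can be automated: tag each generated concept with its feature variants, score it via the lookup table, and sort. A sketch with illustrative placeholder scores and concepts, not the study's data:

```python
# Illustrative per-feature scores for screening generated concepts.
FEATURE_SCORES = {
    "head_form": {"horizontal stadium": 5, "square": 1},
    "material":  {"furry fabric": 5, "glossy metal": 1},
    "color":     {"white": 5, "black": 1},
}

def concept_score(concept: dict) -> int:
    """Unweighted lookup-table sum; a weighted sum could be used instead."""
    return sum(FEATURE_SCORES[f][v] for f, v in concept.items())

# Hypothetical style matrix of AI-generated concepts.
style_matrix = [
    {"head_form": "horizontal stadium", "material": "furry fabric", "color": "white"},
    {"head_form": "square", "material": "glossy metal", "color": "black"},
    {"head_form": "horizontal stadium", "material": "glossy metal", "color": "white"},
]

# Rank concepts from highest to lowest affinity score and keep the best.
ranked = sorted(style_matrix, key=concept_score, reverse=True)
print(concept_score(ranked[0]), ranked[0])
```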

3. Discussion and Implications

This study presents a closed-loop methodology for affective design in robotics. The Kansei Engineering phase translates the subjective goal of “affinity” into an objective, quantifiable framework linked to specific design elements of the humanoid robot. This framework does not remain merely analytical; it directly fuels the development of a sophisticated generative AI tool. The trained Stable Diffusion model internalizes these affinity principles, enabling the rapid exploration of a design space that is pre-constrained towards human-friendly outcomes.

The practical implications are significant. Designers can move from slow, manual sketching to AI-assisted generation of hundreds of pre-validated concepts. The Affinity Scoring Table provides a consistent, evidence-based filter for selecting the best directions. This accelerates the early design phase and enhances the likelihood of creating a humanoid robot that users will find approachable and trustworthy.

However, limitations exist. The cultural generality of the affinity preferences identified through our survey warrants further investigation. Future work should involve cross-cultural studies to validate and potentially adapt the scoring framework. Furthermore, the interaction between static appearance and dynamic movement/behavior is critical for overall perception; a holistic approach combining affective form design with affective motion design is a necessary next step.

In conclusion, by marrying the human-centric, analytical rigor of Kansei Engineering with the generative power of modern AI, this research offers a novel and effective pipeline for the affective design of humanoid robots. It provides a methodological blueprint for creating robotic appearances that bridge the gap between technological capability and social acceptance, paving the way for more seamless and positive human-robot interaction in our everyday environments.
