Exploring Affinity Appearance in Humanoid Robots and Training Stable Diffusion Models

In recent years, the integration of humanoid robots into daily life and commercial service scenarios has accelerated, highlighting the need for designs that foster user acceptance and trust. As a researcher focused on human-robot interaction, I have observed that the external appearance of a humanoid robot significantly influences emotional and functional perceptions. However, many existing humanoid robot designs prioritize engineering perspectives, resulting in cold, rigid aesthetics that lack affinity. This deficiency can trigger user resistance and reduce willingness to engage with the robot. To address this, I have embarked on a study to explore the affinity aspects of humanoid robot appearances and develop an AI-driven approach using Stable Diffusion (SD) models for efficient design generation. Affinity in design refers to the alignment of a product with human physiological and psychological factors, delivering pleasant, comfortable, and relaxed emotional experiences. By leveraging Kansei Engineering, I aim to translate subjective perceptions of affinity into quantifiable design elements, such as form, material, and color, and use these insights to train SD models for rapid, high-affinity design prototyping.

Affinity design for humanoid robots involves a multi-dimensional evaluation to capture user emotions. Through Kansei Engineering, I identified three key dimensions for measuring affinity: affinity degree (reflecting closeness and approachability), gentleness degree (representing warmth and care in style), and liveliness degree (indicating vitality and dynamism). These dimensions were derived from a selection of semantic differential pairs, such as “indifferent-affectionate,” “cold-gentle,” and “serious-lively,” which were voted on by design professionals to ensure relevance. To gather user feedback, I designed a questionnaire featuring abstracted elements of humanoid robot appearances, including head shapes (e.g., square, circle, vertical oval, horizontal oval, semi-circle), eye shapes (e.g., round, square, vertical oval, horizontal oval), body proportions (categorized into infant-like, child-like, adolescent, and adult figures with slim or robust builds), materials (plastic, fabric, metal, transparent with varying surface treatments), and colors (warm, cool, and neutral hues with adjusted saturation and brightness). A total of 645 valid responses were collected and analyzed using statistical methods like Kendall’s correlation, Spearman’s correlation, and Kruskal-Wallis H tests to determine the impact of each design element on affinity. The results revealed that head shape, facial expression, color, material, and body proportion, in descending order, most significantly influence affinity. For instance, horizontal oval heads and vertical oval eyes scored highest in affinity, while softer materials like fabric and low-saturation warm colors enhanced user perceptions. The correlation analysis showed strong positive relationships among the three affinity dimensions, confirming their consistency in evaluating humanoid robot designs.
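As a rough illustration of the rank-correlation step mentioned above, the following sketch computes Spearman's correlation between two rating dimensions in plain Python. The ratings are hypothetical placeholder values, not questionnaire data, and the helper names (`average_ranks`, `pearson`, `spearman`) are my own; an actual analysis would more likely rely on a statistics package.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical 5-point ratings for two dimensions (illustration only).
affinity_ratings = [5, 4, 4, 2, 1, 3]
gentleness_ratings = [5, 5, 3, 2, 1, 3]
rho = spearman(affinity_ratings, gentleness_ratings)
print(f"Spearman rho = {rho:.3f}")
```

A value of `rho` close to 1 would correspond to the strong positive relationship reported among the three dimensions.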

The quantitative analysis allowed me to develop an affinity scoring table for design elements, which serves as a guideline for creating high-affinity humanoid robot appearances. For head shapes, horizontal oval (A4) and circular (A2) forms received the highest scores, whereas angular shapes like square (A3) scored lowest. Eye shapes such as vertical oval (B3) were most preferred, and body proportions like infant-like (C1) and adolescent slim (C3) builds were rated highly. Materials like fabric with textured surfaces (D3) outperformed smooth metals (D5) in affinity, and colors with high brightness and low saturation, such as light yellows and oranges, were favored. This scoring system makes design requirements precise. Each design element is assigned a score based on its category, and the total affinity score $S$ is the sum of the individual element scores, $S = \sum_{i=1}^{n} s_i$, where $s_i$ is the score for the $i$-th design element and $n$ is the number of elements scored. The score for a humanoid robot with a horizontal oval head, vertical oval eyes, adolescent slim body, plastic material, and white color, for example, is obtained by adding the five corresponding entries. This formula helps in evaluating and optimizing designs before prototyping. The table below summarizes the affinity scores for key design elements, derived from the questionnaire analysis.

Design Element	Category	Affinity Score
Head Shape	Horizontal Oval (A4)	3
Head Shape	Circle (A2)	3
Head Shape	Square (A3)	0
Eye Shape	Vertical Oval (B3)	3
Eye Shape	Round (B1)	1
Body Proportion	Adolescent Slim (C3)	4
Body Proportion	Infant-like (C1)	3
Material	Fabric with Texture (D3)	3
Material	Plastic with Matte Finish (D2)	2
Color	Low-Saturation Warm (E5)	3
Color	White (E1)	4
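The summation $S = \sum_{i=1}^{n} s_i$ can be sketched in a few lines of Python using the scores from the table above. The dictionary layout and the function name `total_affinity` are illustrative choices of mine, and I assume here that the "plastic material" in the worked example corresponds to the matte plastic entry (D2).

```python
# Scores copied from the table above; keys are (design element, category code).
AFFINITY_SCORES = {
    ("head", "A4"): 3,      # horizontal oval
    ("head", "A2"): 3,      # circle
    ("head", "A3"): 0,      # square
    ("eye", "B3"): 3,       # vertical oval
    ("eye", "B1"): 1,       # round
    ("body", "C3"): 4,      # adolescent slim
    ("body", "C1"): 3,      # infant-like
    ("material", "D3"): 3,  # fabric with texture
    ("material", "D2"): 2,  # plastic with matte finish
    ("color", "E5"): 3,     # low-saturation warm
    ("color", "E1"): 4,     # white
}

def total_affinity(design):
    """Return S, the sum of s_i over the category chosen for each element."""
    return sum(AFFINITY_SCORES[(element, code)] for element, code in design.items())

# Example from the text: horizontal oval head, vertical oval eyes,
# adolescent slim body, plastic (assumed to be D2), white color.
example_design = {
    "head": "A4",
    "eye": "B3",
    "body": "C3",
    "material": "D2",
    "color": "E1",
}
example_score = total_affinity(example_design)
print(example_score)  # 3 + 3 + 4 + 2 + 4 = 16
```

Under that assumption the example design reaches 16, the highest total obtainable from the entries listed in the table.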

To translate these insights into practical designs, I employed Stable Diffusion (SD) models, an AI-based image generation technology that uses latent diffusion processes. The SD framework consists of a text encoder (e.g., CLIP), a latent diffusion model built around a U-Net, and a variational autoencoder (VAE) decoder, which work together to denoise random inputs into coherent images. However, generating high-affinity humanoid robot designs solely through prompt-based methods proved challenging, as SD often lacks specificity for detailed aesthetic requirements. Therefore, I focused on training custom SD models using the affinity scores as a foundation. The training process involved several steps: sample preparation, annotation, iterative training, and model selection. For sample preparation, I curated images that embodied high-affinity features, such as rounded forms and soft textures, and used tools like ControlNet and Photoshop to refine details. Each sample was annotated, first automatically and then manually, with tags like “affinity robot,” “organic form,” and “white body” to ensure accurate learning. The training utilized DreamBooth for full-model adjustments and LoRA for efficient fine-tuning, with parameters set as follows: Learning Rate of $1 \times 10^{-4}$, Iteration of 10, Batch Size of 5, Epoch of 10, Optimizer as 8-bit Adam, Scheduler as Cosine, DIM of 128, and Alpha of 64. The loss function during training, denoted as $L(\theta)$, where $\theta$ represents model parameters, was minimized by gradient-based optimization, whose basic update is $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$, with $\eta$ as the learning rate. This iterative process allowed the model to learn affinity features effectively, as evidenced by loss values decreasing to around 0.08 in later epochs.
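To make the update rule $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$ concrete, here is a minimal sketch of plain gradient descent on a toy one-dimensional loss. It is not the actual training loop, which used 8-bit Adam with a cosine schedule on a diffusion loss; the quadratic loss, the starting point, and the step size below are hypothetical values chosen only so the behavior is easy to follow.

```python
def gradient_descent(grad, theta0, lr, steps):
    """Apply theta <- theta - lr * grad(theta) and record every iterate."""
    theta = theta0
    history = [theta]
    for _ in range(steps):
        theta = theta - lr * grad(theta)
        history.append(theta)
    return history

def toy_grad(theta):
    """Gradient of the toy loss L(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

# Hypothetical settings for illustration; the minimum of the toy loss is at 3.
trajectory = gradient_descent(toy_grad, theta0=0.0, lr=0.1, steps=50)
print(trajectory[1], trajectory[-1])
```

Each step shrinks the distance to the minimum by a constant factor, which mirrors, in a very simplified way, the steadily decreasing loss observed over the training epochs.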

The model training involved multiple rounds of iteration to enhance affinity. In the first round, samples with cartoonish styles were used, but they lacked realism; subsequent rounds incorporated more realistic proportions and details, leading to improved affinity scores. For example, the affinity score for round three samples reached 16, based on the scoring table. To evaluate the trained models, I generated XY crossplots showing outputs at different epochs and weights, and selected the best-performing model (epoch 6 at 0.6-0.8 weight) based on affinity assessments. The resulting SD model can generate diverse humanoid robot designs quickly, and when combined with style-specific models, it produces a style matrix of options. Each design in the matrix can be scored for affinity, enabling rapid identification of optimal solutions. For instance, in a style matrix with five variations, the first design often scored highest (e.g., 15 out of a possible 16), demonstrating the method’s efficacy. The table below illustrates a subset of such a style matrix with affinity scores, highlighting how this approach streamlines the design process.

Style ID	Design 1 Score	Design 2 Score	Design 3 Score	Design 4 Score	Design 5 Score
1	15	12	10	9	8
2	15	14	14	14	14
3	15	14	14	12	12
4	14	12	12	9	9
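Selecting the strongest candidate from each row of the style matrix is straightforward to automate. The sketch below uses the scores from the table above; the names `STYLE_MATRIX` and `best_design` are illustrative, and ties are resolved in favor of the earliest design, which is an arbitrary choice on my part.

```python
# Affinity scores from the style matrix table: style ID -> scores of designs 1..5.
STYLE_MATRIX = {
    1: [15, 12, 10, 9, 8],
    2: [15, 14, 14, 14, 14],
    3: [15, 14, 14, 12, 12],
    4: [14, 12, 12, 9, 9],
}

def best_design(scores):
    """Return (design number, score) of the top design; ties go to the earliest."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index]

best_per_style = {style: best_design(scores) for style, scores in STYLE_MATRIX.items()}
mean_per_style = {style: sum(scores) / len(scores) for style, scores in STYLE_MATRIX.items()}

for style in STYLE_MATRIX:
    number, score = best_per_style[style]
    print(f"style {style}: best = design {number} (score {score}), mean = {mean_per_style[style]:.1f}")
```

The per-style mean is included as a simple way to compare how consistently each style yields high-affinity results, not only its single best output.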

In conclusion, this study quantifies the affinity of humanoid robot appearances through Kansei Engineering and leverages AI-assisted design with Stable Diffusion models to generate high-affinity solutions efficiently. The affinity scoring system provides a clear framework for evaluating design elements, while the SD training process enables rapid prototyping and optimization. This approach not only enhances design efficiency but also improves user acceptance by aligning robot aesthetics with human emotional needs. However, limitations exist, such as the relatively homogeneous sample group in the questionnaire, which may affect generalizability. Future work could expand to cross-cultural studies, incorporate more diverse styles, and integrate behavioral interactions with appearance design. By advancing this methodology, I aim to foster broader adoption of humanoid robots in various service scenarios, ultimately enhancing human-robot collaboration through emotionally resonant designs.

The application of AI in humanoid robot design represents a paradigm shift, allowing for iterative improvements based on user feedback. The SD model’s ability to generate numerous variations quickly means that designers can explore a wider design space, focusing on affinity-driven features. For example, the relationship between material texture and affinity can be modeled mathematically: if $A$ represents affinity score, $T$ texture roughness, and $C$ color saturation, a multiple regression equation like $A = \beta_0 + \beta_1 T + \beta_2 C + \epsilon$ could predict affinity, where the $\beta$ coefficients are derived from empirical data and $\epsilon$ is the residual error. Similarly, in training, the noise addition process in SD can be described as $x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon$, where $x_t$ is the noisy image at step $t$, $x_0$ is the original, $\bar{\alpha}_t$ is the cumulative noise-schedule coefficient at step $t$, and $\epsilon$ here denotes Gaussian noise. By fine-tuning this process with affinity-based samples, the model learns to generate images that emphasize soft, rounded forms and warm colors, key traits for humanoid robot acceptance. This integration of qualitative insights and quantitative AI tools paves the way for more humane and engaging robotic companions, reinforcing the importance of affinity in the evolving landscape of humanoid robot design.
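The forward noising step $x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon$ can be illustrated with a small, dependency-free sketch. The linear beta schedule (from $10^{-4}$ to $0.02$ over 1000 steps) is a common default that I assume here, not a setting reported in this study, and the three-element list stands in for an image latent purely for illustration.

```python
import math
import random

def cumulative_alpha_bar(betas):
    """Return alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar_t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

# Assumed linear beta schedule; not taken from the study.
num_steps = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (num_steps - 1) for t in range(num_steps)]
alpha_bars = cumulative_alpha_bar(betas)

rng = random.Random(0)       # fixed seed so repeated runs give the same samples
x0 = [0.5, -0.25, 1.0]       # toy stand-in for an image latent
x_early = add_noise(x0, alpha_bars[0], rng)    # almost identical to x0
x_late = add_noise(x0, alpha_bars[-1], rng)    # almost pure Gaussian noise
print(alpha_bars[0], alpha_bars[-1])
```

Because $\bar{\alpha}_t$ shrinks toward zero as $t$ grows, early steps preserve most of the original signal while late steps are dominated by noise; the denoising network is trained to reverse this progression.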