Exploration of Affinity Appearance in Humanoid Robots and Training of Stable Diffusion Models

In recent years, the integration of humanoid robots into daily life and commercial services has accelerated, necessitating designs that foster user acceptance and trust. A critical aspect of this integration is the aesthetic appeal of humanoid robots, which significantly influences human-robot interaction. Specifically, the concept of “affinity” in design—referring to the ability of a product to evoke positive emotional responses such as comfort, warmth, and approachability—plays a pivotal role in enhancing user engagement. Traditional design approaches for humanoid robots often prioritize engineering functionality over emotional appeal, resulting in rigid and impersonal appearances that can trigger user aversion. To address this gap, we employ Kansei Engineering, a methodology that translates subjective human emotions into quantifiable design parameters, to systematically investigate the affinity characteristics of humanoid robot exteriors. By correlating user perceptions with design elements such as form, material, and color, we establish a robust framework for evaluating and generating affinity-driven designs. Furthermore, we leverage artificial intelligence (AI), particularly Stable Diffusion (SD) models, to automate and optimize the design process. This study details our approach to training SD models using affinity-based criteria, enabling the rapid generation of diverse, high-affinity humanoid robot design proposals. Our methodology not only improves design efficiency but also provides a scalable solution for creating humanoid robots that resonate emotionally with users, thereby facilitating their adoption in various real-world scenarios.

The foundation of our research lies in Kansei Engineering, which bridges the gap between human sensibility and product design. We define affinity through three key dimensions: approachability (reflecting social closeness), gentleness (conveying a soft and caring demeanor), and liveliness (indicating vitality and dynamism). These dimensions are operationalized using semantic differential scales, where participants rate design elements on a continuum from negative to positive poles. For instance, approachability is measured from “cold” to “affectionate,” gentleness from “hard” to “soft,” and liveliness from “serious” to “playful.” To gather comprehensive data, we designed a questionnaire featuring abstract representations of humanoid robot components, including head shapes (e.g., square, circle, vertical oval), eye designs (e.g., circular, rectangular), body proportions (e.g., child-like, adolescent, adult), materials (e.g., plastic, fabric, metal), and color schemes (e.g., warm, cool, neutral tones). A total of 645 valid responses were collected and analyzed using statistical methods such as Kendall’s tau for ordinal data and Pearson’s correlation for continuous variables. The results revealed strong positive correlations among the three affinity dimensions, confirming their consistency in capturing the essence of affinity. For example, head shape and eye design showed correlation coefficients above 0.3 between approachability and gentleness, indicating that rounded, organic forms enhance both dimensions simultaneously. This analysis allowed us to derive an affinity scoring system, where each design feature is assigned a numerical value based on its impact on user perceptions. The scoring table serves as a guideline for selecting and optimizing design elements in subsequent AI model training.
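The correlation step above can be sketched in a few lines; the rating vectors below are hypothetical stand-ins for the survey responses, and the hand-rolled statistics are the textbook tau-a and Pearson formulas rather than the study's exact procedure:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / all pairs (ignores ties)."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def pearson_r(x, y):
    """Pearson correlation for interval-scale scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical semantic-differential ratings (1-5) from ten respondents;
# the real study analyzed 645 responses.
approachability = [4, 5, 3, 4, 5, 2, 4, 3, 5, 4]
gentleness = [4, 4, 3, 5, 5, 2, 3, 3, 4, 4]

print(f"tau = {kendall_tau(approachability, gentleness):.3f}")
print(f"r   = {pearson_r(approachability, gentleness):.3f}")
```

In practice a statistics package would be used for the significance tests; the sketch only shows where the correlation coefficients come from.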

Affinity Scores for Key Design Elements of Humanoid Robots

| Design Element | Category | Approachability Score | Gentleness Score | Liveliness Score | Total Affinity Score |
|---|---|---|---|---|---|
| Head Shape | Square (A1) | 3 | 3 | 3 | 9 |
| Head Shape | Circle (A2) | 3 | 3 | 3 | 9 |
| Head Shape | Vertical Oval (A3) | 0 | 0 | 1 | 1 |
| Head Shape | Horizontal Oval (A4) | 4 | 4 | 4 | 12 |
| Head Shape | Semi-circle (A5) | 0 | 1 | 1 | 2 |
| Eye Design | Circular (B1) | 1 | 1 | 1 | 3 |
| Eye Design | Rectangular (B2) | 1 | 1 | 1 | 3 |
| Eye Design | Vertical Oval (B3) | 3 | 3 | 3 | 9 |
| Eye Design | Horizontal Oval (B4) | 0 | 0 | 0 | 0 |
| Body Proportion | Child-like (C1) | 3 | 3 | 3 | 9 |
| Body Proportion | Adolescent (C2) | 2 | 2 | 2 | 6 |
| Body Proportion | Adult (C3) | 1 | 1 | 1 | 3 |
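One plausible way to operationalize the table is an additive lookup that sums each selected feature's total score; this is a sketch of such a scoring pass, not the study's exact aggregation rule:

```python
# Total-affinity scores transcribed from the table above.
AFFINITY_SCORES = {
    "A1": 9, "A2": 9, "A3": 1, "A4": 12, "A5": 2,  # head shapes
    "B1": 3, "B2": 3, "B3": 9, "B4": 0,            # eye designs
    "C1": 9, "C2": 6, "C3": 3,                     # body proportions
}

def total_affinity(*features):
    """Sum the table scores for a combination of design features
    (assumes the per-feature totals combine additively)."""
    return sum(AFFINITY_SCORES[f] for f in features)

# Highest-scoring combination in the table: horizontal oval head (A4),
# vertical oval eyes (B3), child-like proportions (C1).
print(total_affinity("A4", "B3", "C1"))  # → 30
```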

The statistical analysis further quantified the relationships between design elements and affinity dimensions. For instance, the correlation between height and approachability was negative ($r_s = -0.340$, $p < 0.01$), indicating that shorter humanoid robots are perceived as more approachable. Similarly, color analysis showed that neutral colors like white had higher affinity scores ($r_s = -0.613$ for whiteness to blackness, $p < 0.01$), while warm colors with low saturation outperformed high-saturation counterparts. These findings are summarized in the affinity score table, which we use to guide the curation of training data for AI models. The scores are derived from multivariate regression models, such as:

$$ \text{Affinity} = \beta_0 + \beta_1 \cdot \text{HeadShape} + \beta_2 \cdot \text{EyeDesign} + \beta_3 \cdot \text{BodyProportion} + \epsilon $$

where $\beta$ coefficients represent the weight of each design element, and $\epsilon$ is the error term. This equation helps in predicting the overall affinity of a design based on its features, enabling precise optimization.
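A minimal least-squares fit of this regression can be sketched as follows; the design/rating pairs are entirely hypothetical (the real coefficients come from the 645-response survey):

```python
import numpy as np

# Hypothetical training data: each row holds the per-feature table scores
# of one design (head shape, eye design, body proportion); y is a
# hypothetical user-rated overall affinity for that design.
X = np.array([
    [12, 9, 9],   # A4, B3, C1
    [ 9, 3, 6],   # A1, B1, C2
    [ 1, 0, 3],   # A3, B4, C3
    [ 2, 3, 9],   # A5, B2, C1
    [ 9, 9, 3],   # A2, B3, C3
], dtype=float)
y = np.array([9.1, 5.4, 1.2, 4.0, 6.5])

# Prepend an intercept column and solve the least-squares problem, i.e.
# fit Affinity = b0 + b1*HeadShape + b2*EyeDesign + b3*BodyProportion.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(head, eye, body):
    return beta @ np.array([1.0, head, eye, body])

print(predict(12, 9, 9))  # predicted affinity for the top combination
```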

To translate these insights into practical designs, we employ Stable Diffusion (SD) models, a class of latent diffusion models that generate images from text prompts or input images. The SD framework consists of components like a text encoder (e.g., CLIP), a latent diffusion model (including U-Net), and an autoencoder (VAE). The core process involves iteratively denoising a random noise image to produce a coherent output. However, standard SD models often struggle with generating specific affinity characteristics without targeted training. Thus, we adopt two training methods: DreamBooth, which fine-tunes the entire SD model on custom datasets, and LoRA (Low-Rank Adaptation), which injects trainable layers for efficient adaptation. DreamBooth is used for primary model training to capture broad affinity features, while LoRA models are employed for stylistic variations, balancing quality and computational efficiency. The training process involves four key steps: sample preparation, annotation, iterative training, and model selection.
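The arithmetic behind LoRA, which motivates its efficiency relative to DreamBooth-style full fine-tuning, can be sketched with a toy weight matrix (the dimensions here are illustrative; the trained models use DIM 128):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one frozen attention weight matrix inside the U-Net.
d, rank, alpha = 320, 8, 4.0
W = rng.standard_normal((d, d))

# LoRA freezes W and trains only a low-rank pair (B, A); at inference the
# adapted weight is W' = W + (alpha / rank) * B @ A.
B = rng.standard_normal((d, rank)) * 0.01  # trainable down-projection
A = rng.standard_normal((rank, d)) * 0.01  # trainable up-projection
W_adapted = W + (alpha / rank) * B @ A

# The adapter trains far fewer parameters than full fine-tuning of W:
print(W.size, B.size + A.size)  # → 102400 5120
```

This parameter gap is why LoRA is the practical choice for the stylistic variants, while DreamBooth is reserved for the base model.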

In sample preparation, we curate a dataset of humanoid robot images that exemplify high-affinity traits based on our scoring table. Each image is processed using tools like ControlNet and Photoshop to refine details such as rounded edges, soft textures, and harmonious color palettes. For instance, we modify initial designs to incorporate horizontal oval heads and vertical oval eyes, which scored highest in affinity. The annotation phase involves automatically and manually tagging images with descriptive prompts that emphasize affinity elements, such as “affinity robot, organic form, soft materials, pastel colors.” This ensures that the SD model learns to associate these terms with visual features. During iterative training, we adjust parameters over multiple epochs, using the affinity scores to evaluate and select samples for each round. The training parameters are optimized as follows:

Training Parameters for Stable Diffusion Model

| Parameter | Value | Description |
|---|---|---|
| Learning Rate | $1 \times 10^{-4}$ | Step size for weight updates |
| Iterations | 10 | Number of steps per epoch |
| Batch Size | 5 | Number of samples processed together |
| Epochs | 10 | Full passes through the dataset |
| Optimizer | 8-bit Adam | Algorithm for gradient descent |
| Scheduler | Cosine | Learning rate schedule |
| DIM | 128 | LoRA network dimension (rank) |
| Alpha | 64 | LoRA scaling parameter |
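The cosine scheduler in the table anneals the learning rate smoothly toward zero over training; a minimal sketch of that schedule using the table's values (assuming no warmup and no restarts):

```python
import math

BASE_LR = 1e-4        # learning rate from the table
EPOCHS = 10           # full passes through the dataset
STEPS_PER_EPOCH = 10  # iterations per epoch, as in the table

def cosine_lr(step, total_steps, base_lr=BASE_LR):
    """Cosine annealing from base_lr down to 0."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))

total = EPOCHS * STEPS_PER_EPOCH
for epoch in (0, 4, 9):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: lr = {cosine_lr(step, total):.2e}")
```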

The training loss function, which measures the difference between generated and target images, is minimized over epochs. The loss $L$ is defined as:

$$ L = \mathbb{E}_{z, \epsilon, t} \left[ \| \epsilon - \epsilon_\theta(z_t, t, c) \|^2 \right] $$

where $z$ is the latent representation, $\epsilon$ is the noise, $t$ is the timestep, $c$ is the conditioning prompt, and $\epsilon_\theta$ is the denoising function. We monitor the loss curve, aiming for values around 0.08 by epochs 7–9, indicating effective convergence. After training, we generate XY cross-plot matrices to visualize outputs across different model weights and epochs, selecting the best-performing combinations based on affinity scores. For example, a model at epoch 6 with a weight of 0.6–0.8 produced designs with the highest affinity scores, as validated by our scoring table.
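The noise-prediction objective can be illustrated with a toy numerical sketch; the cosine noise schedule and the omission of the conditioning term $c$ are simplifications made here for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_loss(eps_theta, z0, t, noise):
    """MSE between the true noise and the model's prediction eps_theta(z_t, t)."""
    # Toy variance-preserving forward process: z_t mixes signal and noise.
    alpha_bar = np.cos(t * np.pi / 2) ** 2  # toy noise schedule
    z_t = np.sqrt(alpha_bar) * z0 + np.sqrt(1 - alpha_bar) * noise
    pred = eps_theta(z_t, t)
    return np.mean((noise - pred) ** 2)

# A perfect "oracle" denoiser returning the exact noise drives the loss to 0,
# which is what the downward-trending loss curve approximates during training.
z0 = rng.standard_normal((4, 4))
noise = rng.standard_normal((4, 4))
oracle = lambda z_t, t: noise
print(diffusion_loss(oracle, z0, t=0.5, noise=noise))  # → 0.0
```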

The trained SD models enable the generation of diverse humanoid robot designs, which we organize into a style matrix. This matrix combines our affinity-optimized base model with various LoRA models to produce multiple stylistic interpretations, such as minimalist, futuristic, or organic themes. Each design in the matrix is evaluated using the affinity score table, allowing for rapid identification of high-affinity options. For instance, designs featuring low-saturation warm colors, fabric textures, and adolescent proportions consistently achieve scores above 12, indicating strong user appeal. This approach not only accelerates the design process but also ensures that emotional factors are systematically incorporated. In practice, generating a style matrix with 5 styles and 5 variations each yields 25 designs, from which we can filter top candidates in minutes—a task that would take days manually. The integration of AI thus democratizes affinity-driven design, making it accessible for iterative development and customization.
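The style-matrix filtering step can be sketched as follows; the style names, scoring function, and per-seed variation are hypothetical placeholders, with only the 5×5 layout and the score-above-12 threshold taken from the text:

```python
from itertools import product

# Hypothetical style axes: five LoRA styles crossed with five seed
# variations yields the 25 candidate designs described above.
styles = ["minimalist", "futuristic", "organic", "retro", "playful"]
seeds = range(5)

def affinity_score(style, seed):
    """Hypothetical scores; in practice these come from the score table."""
    base = {"minimalist": 11, "futuristic": 9, "organic": 13,
            "retro": 8, "playful": 12}[style]
    return base + (seed % 3)  # toy variation across seeds

matrix = [(s, k, affinity_score(s, k)) for s, k in product(styles, seeds)]
top = [d for d in matrix if d[2] > 12]  # keep designs scoring above 12

print(len(matrix), len(top))  # → 25 9
```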

In conclusion, our study demonstrates a comprehensive framework for enhancing the affinity of humanoid robots through quantifiable design principles and AI-driven generation. By applying Kansei Engineering, we establish a clear link between user emotions and design elements, resulting in an affinity scoring system that guides both human and algorithmic decision-making. The training of Stable Diffusion models with affinity-optimized datasets facilitates the efficient production of diverse and emotionally resonant designs, addressing the limitations of traditional engineering-centric approaches. This methodology has significant implications for the widespread adoption of humanoid robots in service-oriented roles, where user trust and comfort are paramount. Future work could expand this approach to cross-cultural studies, incorporate dynamic interactions, and explore real-time customization based on user feedback. Ultimately, our research underscores the potential of AI as a collaborative tool in creative processes, paving the way for more humane and engaging robotic companions.
