From Affinity Scores to AI Models: A Methodological Framework for Designing Humanoid Robots

The proliferation of humanoid robots across service, healthcare, and domestic environments necessitates a design philosophy that extends beyond pure functionality. The acceptance and trust of human users are critical factors for successful human-robot interaction. A key element influencing this acceptance is the robot’s appearance. This paper details a comprehensive methodological framework aimed at systematically enhancing the perceived affinity of humanoid robots. We employ Kansei Engineering to quantify the link between user perception and design elements, develop a scoring system for affinity, and utilize these findings to guide the targeted training of AI generative models, specifically Stable Diffusion. This approach allows for the efficient generation and evaluation of a multitude of aesthetically pleasing and highly acceptable humanoid robot designs.

The figure above illustrates the growing presence of robotic forms in our environments. However, many contemporary humanoid robot designs, often driven by engineering constraints, project a cold, mechanical, and intimidating aesthetic. This lack of affinity can trigger user discomfort and resistance, significantly hindering adoption. Therefore, translating the abstract concept of “affinity” into concrete, actionable design parameters is a crucial challenge. This research posits that affinity is not monolithic but can be deconstructed into measurable perceptual dimensions. We then correlate these dimensions with specific design elements—form, proportion, material, texture, and color—to create a predictive scoring system. This system, in turn, provides the structured data required to train an AI model to become a specialized designer of high-affinity humanoid robots.

Methodology: Exploring Affinity through Kansei Engineering

Our research employs Kansei Engineering as the foundational methodology. Kansei Engineering is a well-established technique that translates human feelings and impressions (Kansei) into concrete product design parameters. The core process involves collecting user feedback on various design samples, analyzing the data to extract semantic dimensions, and mapping these dimensions onto physical design elements. Our adapted process for the humanoid robot domain consists of four primary phases: Affinity Dimension Definition, Stimulus Preparation, User Study, and Quantitative Analysis.

1. Defining Perceptual Dimensions of Affinity

To accurately measure “affinity,” we first deconstructed it into three distinct yet correlated perceptual dimensions. An initial pool of 30 adjective pairs describing robot appearance was gathered and categorized. Through expert panel discussion and a vote among 25 design professionals, the most representative pairs for each category were selected.

Table 1: Selected Affinity Dimensions and Descriptors
Dimension Category	Selected Adjective Pair	Core Concept
Social Impression	Indifferent – Affectionate	Direct measure of perceived closeness and warmth.
Temperament & Style	Cold/Hard – Gentle	Perception of kindness, softness, and approachability in style.
Vitality & Dynamism	Serious – Lively	Perception of energy, playfulness, and non-threatening behavior.

These three dimensions—Affectionate-ness, Gentleness, and Liveliness—form the multi-faceted scale upon which all subsequent design samples are evaluated.

2. Preparation of Design Stimuli

To isolate the impact of individual design elements, we created abstracted representations of the humanoid robot form, focusing on key components. Each component was varied systematically based on an analysis of existing robot designs.

Head Silhouette: Five common shapes were identified: Square, Circle, Vertical Stadium, Horizontal Stadium, and Semi-circle.
Eye Shape: Four types: Circle, Square, Vertical Stadium, Horizontal Stadium.
Body Proportion: Seven schematics representing growth stages from infant-like to adult, categorized into Slender (infant, child, adolescent) and Sturdy (child, adolescent, adult, and broad adult) builds.
Material & Finish: Four material categories (Plastic, Fabric, Metal, Transparent) each with two surface treatments (e.g., smooth/matte, woven/furry).
Color: Three warm hues and three cool hues at two saturation levels (high and low), plus neutral tones (White, Gray, Black).

These elements were presented to participants in controlled pairings to assess their individual and combined effect on the three affinity dimensions.

3. Data Collection & Statistical Analysis

A questionnaire was distributed, yielding 645 valid responses. Participants rated design samples on 7-point semantic differential scales for the three dimensions. The analysis employed a combination of statistical tests tailored to the data type:

Kendall’s Tau-b and Pearson’s r: Used to assess the correlation between the three affinity dimensions for different element types (ordinal vs. continuous data). The strong positive correlations confirmed the internal consistency of our measurement scale. For most elements, higher Affectionate-ness scores strongly predicted higher Gentleness and Liveliness scores. The relationship for some material finishes and color saturations was more nuanced, providing detailed design insight.
Kruskal-Wallis H Test and Mann-Whitney U Test: Applied to compare rating differences across multiple categories (e.g., head shapes) and between two surface treatments, respectively. These non-parametric tests identified which specific design feature within a category scored significantly higher in affinity.
Spearman’s rho and Linear Regression: Used to analyze trends within ordered sequences (e.g., height progression, color hue shift) and to compare scores between different groups (e.g., high vs. low saturation).
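
As an illustration of the category comparison described above, the Kruskal-Wallis H statistic can be computed from pooled ranks. The following is a minimal sketch in plain Python, not the analysis code used in the study; the function names and the sample ratings are assumptions made for illustration only.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of rating lists."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Accumulate (rank sum)^2 / n for each group, walking the pooled list in order.
    h = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        h += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else 0.0


if __name__ == "__main__":
    # Made-up 7-point ratings for three head shapes (illustrative, not study data).
    ratings = {
        "Horizontal Stadium": [6, 7, 6, 5, 6],
        "Circle": [5, 6, 5, 6, 5],
        "Square": [2, 3, 2, 4, 3],
    }
    print("H =", round(kruskal_wallis_h(list(ratings.values())), 3))
```

The statistic is compared against a chi-squared distribution with (number of groups minus 1) degrees of freedom to obtain a p-value; a statistics package would normally handle that step.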

Results: The Affinity Design Code for Humanoid Robots

The comprehensive analysis yielded a clear, quantifiable “code” for designing a humanoid robot with high perceived affinity. The influence of design elements on overall affinity was ranked as follows: Head Shape & Facial Features > Color > Material > Body Proportion.

1. Form & Proportion

Simple, flowing, and rounded contours consistently received higher affinity ratings.

Head Shape: The Horizontal Stadium oval was rated most affectionate and gentle, followed closely by the Circle. The Semi-circle and Vertical Stadium scored lowest, with the angular Square only marginally higher.
Eye Shape: The Vertical Stadium oval was strongly preferred. The Horizontal Stadium oval (often resembling a grimace or determined look) scored negatively on affinity.
Body Proportion: Affinity decreased significantly with increasing height, especially for sturdier builds. The most favorable proportions were those resembling a slender adolescent or child-like form. The trend can be summarized by the following relationship, where $A_p$ represents Affinity score for proportion, $h$ is a normalized height index, and $b$ is a build coefficient (higher for sturdy builds):
$$ A_p \propto -(h \cdot b) $$

Table 2: Affinity Scoring for Form Elements
Element	High-Affinity Feature	Affectionate-ness Score	Gentleness Score	Liveliness Score
Head Shape	Horizontal Stadium	High (6)	High (6)	High (6)
Eye Shape	Vertical Stadium	High (6)	High (6)	High (6)
Body Build	Slender Adolescent	High (6)	Medium-High (5-6)	Medium-High (5-6)

2. Material & Texture

Softness and tactile appeal are paramount for affinity.

Fabric materials, particularly with a furry texture, received the highest affinity scores.
Plastic and transparent materials scored moderately, with matte/frosted finishes consistently rated as more gentle and affectionate than glossy finishes.
Metal, regardless of finish, scored lowest on Affectionate-ness and Gentleness, though it could score moderately on Liveliness. This highlights a key finding: surface texture can decouple the dimensions. For example, a rougher finish can raise Gentleness while lowering the Liveliness that a shiny surface would otherwise convey.

Table 3: Material Affinity Score Summary
Material Type	Example Finish	Composite Affinity Score (0-4)	Key Characteristic
Fabric	Furry, Woven	3 – 4	Highest warmth and softness.
Plastic	Matte, Frosted	1 – 2	Moderate, improved by non-glossy finish.
Transparent	Frosted	1	Moderate, can feel technical.
Metal	Brushed, Polished	0	Low warmth, high mechanical feel.

3. Color

Color psychology plays a significant role, with preference following predictable rules based on hue, saturation (S), and value (V).

Neutral Colors: White > Gray > Black. White achieved the highest affinity scores across all dimensions.
Warm Colors (e.g., yellows, oranges): Lower saturation tones (e.g., cream, peach) were preferred over high-saturation ones (e.g., bright yellow). Affinity tended to increase as hue shifted towards softer yellows and pinks.
Cool Colors (e.g., blues, greens): Similarly, low saturation tones (e.g., light blue, mint) scored higher than vivid ones. Affinity increased as hue shifted towards softer blue-greens.

The overall affinity hierarchy for color schemes, read from the Table 4 scores, was: Low-Saturation Warms ≈ White > Low-Saturation Cools > High-Saturation Warms > High-Saturation Cools. Conceptually, color affinity $A_c$ can be modeled as a constant $k$ plus a function $f(S, V)$ that decreases with increasing saturation and increases with increasing value:
$$ A_c = k + f(S, V) $$
For warm hues, an additional positive hue shift term $+\Delta H_w$ applies, while for cool hues, a different positive shift $+\Delta H_c$ applies.
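
To make the form of $A_c = k + f(S, V)$ concrete, the sketch below assumes a simple linear $f$ and reads $S$ and $V$ from an RGB color. The coefficients are illustrative choices, not values fitted in the study, and the hue-shift argument is a placeholder for $\Delta H_w$ or $\Delta H_c$, whose magnitudes the text does not give.

```python
import colorsys

# Illustrative coefficients only; the source does not report fitted values.
K = 0.0           # baseline constant k
SAT_WEIGHT = 2.0  # penalty per unit of saturation S
VAL_WEIGHT = 4.0  # reward per unit of value V (brightness)


def f(saturation, value):
    """Assumed linear f(S, V): decreases with S, increases with V."""
    return VAL_WEIGHT * value - SAT_WEIGHT * saturation


def color_affinity(rgb, hue_shift=0.0):
    """A_c = k + f(S, V) plus an optional hue-shift term, for RGB values in [0, 1]."""
    _, saturation, value = colorsys.rgb_to_hsv(*rgb)
    return K + f(saturation, value) + hue_shift


if __name__ == "__main__":
    samples = {
        "white": (1.0, 1.0, 1.0),
        "cream": (1.0, 0.99, 0.82),
        "bright yellow": (1.0, 1.0, 0.0),
        "black": (0.0, 0.0, 0.0),
    }
    for name, rgb in samples.items():
        print(f"{name}: {color_affinity(rgb):.2f}")
```

With these assumed weights, a low-saturation cream scores above a fully saturated yellow, and white scores above black, which is the qualitative behavior the model describes.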

4. The Consolidated Affinity Scoring Table

Based on all findings, we constructed a consolidated scoring table. This table assigns a quantitative affinity value (0-4) to each specific design feature, transforming qualitative preferences into a design guidance matrix. This table became the cornerstone for the next phase: training the AI model.

Table 4: Consolidated Design Feature Affinity Score Table
Category	Feature Code	Feature Description	Affinity Score
Head Shape	HS1	Square	1
	HS2	Circle	3
	HS3	Vertical Stadium	0
	HS4	Horizontal Stadium	4
	HS5	Semi-circle	0
Eye Shape	ES1	Circle	2
	ES2	Square	2
	ES3	Vertical Stadium	4
	ES4	Horizontal Stadium	0
Body (Slender)	BP1	Infant-like	3
	BP3	Child-like	4
	BP4	Adolescent-like	3
Body (Sturdy)	BP2	Child-like	2
	BP5	Adolescent-like	2
	BP6	Adult-like	1
	BP7	Adult-like (Broad)	0
Material/Finish	M1	Plastic, Glossy	1
	M2	Plastic, Matte	2
	M3	Fabric, Furry	4
	M4	Fabric, Woven	3
	M5	Metal, Polished	0
	M6	Metal, Brushed	0
	M7	Transparent, Clear	1
	M8	Transparent, Frosted	1
Color Palette	C1	White / Light Neutral	4
	C2	Medium Gray	2
	C3	Low-Saturation Warm	4
	C4	High-Saturation Warm	2
	C5	Low-Saturation Cool	3
	C6	High-Saturation Cool	1
	C7	Black / Dark Neutral	0
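
Table 4 can be used directly as a lookup for scoring a candidate design. The sketch below encodes the table as a dictionary and sums the scores of the features a design exhibits. The function name and the one-feature-per-category rule are assumptions; the text later quotes composites out of 16, so which categories enter the sum is not specified here, and this sketch simply reports the maximum for whichever categories are supplied.

```python
# Scores transcribed from Table 4, grouped by category.
AFFINITY_SCORES = {
    "head_shape": {"HS1": 1, "HS2": 3, "HS3": 0, "HS4": 4, "HS5": 0},
    "eye_shape": {"ES1": 2, "ES2": 2, "ES3": 4, "ES4": 0},
    "body": {"BP1": 3, "BP2": 2, "BP3": 4, "BP4": 3, "BP5": 2, "BP6": 1, "BP7": 0},
    "material": {"M1": 1, "M2": 2, "M3": 4, "M4": 3,
                 "M5": 0, "M6": 0, "M7": 1, "M8": 1},
    "color": {"C1": 4, "C2": 2, "C3": 4, "C4": 2, "C5": 3, "C6": 1, "C7": 0},
}

# Flat index: feature code -> (category, score).
CODE_INDEX = {
    code: (category, score)
    for category, features in AFFINITY_SCORES.items()
    for code, score in features.items()
}


def composite_score(feature_codes):
    """Return (total, maximum) for a design, allowing one feature per category."""
    seen = set()
    total = 0
    for code in feature_codes:
        category, score = CODE_INDEX[code]  # raises KeyError for unknown codes
        if category in seen:
            raise ValueError(f"more than one feature given for {category!r}")
        seen.add(category)
        total += score
    return total, 4 * len(seen)  # each category tops out at 4


if __name__ == "__main__":
    # Horizontal-stadium head, vertical eyes, slender adolescent, matte plastic, white.
    print(composite_score(["HS4", "ES3", "BP4", "M2", "C1"]))  # (17, 20)
```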

Phase II: Training Stable Diffusion with the Affinity Code

The affinity scoring table provided a precise blueprint for creating a specialized AI designer. Our goal was to train a Stable Diffusion (SD) model to generate images of humanoid robots that inherently possess high-affinity characteristics, moving beyond simple text prompting.

1. Training Sample Curation & Annotation

The quality of the training dataset is critical. We curated and created a set of humanoid robot images that explicitly embodied high-scoring features from Table 4.

Curation: Initial images were selected based on overall compliance with affinity traits (e.g., rounded forms, soft colors).
Iterative Refinement: Using SD’s inpainting, ControlNet, and image-to-image features, alongside digital editing, we refined these images. This iterative process ensured each training sample maximized its composite affinity score. For example, a robot with a good head shape might be modified to have a more suitable body proportion and material finish.
Textual Annotation: Each final training image was annotated with descriptive tags. Beyond automatic captioning, we manually added detailed tags based on our affinity code: e.g., “affinity_robot, organic_form, oval_head, vertical_eyes, white_body, plastic_matte, teenage_figure”. This linked the visual features to the textual language the model learns from.
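
The manual tagging step can be sketched as a mapping from feature codes to caption tags. The tag strings below are taken from the example caption above, but their pairing with specific Table 4 codes, the `BASE_TAGS` list, and the function name are assumptions made for this illustration.

```python
# Hypothetical mapping from Table 4 feature codes to caption tags.
# The tag strings come from the example caption in the text; which code
# each tag corresponds to is an assumption of this sketch.
FEATURE_TAGS = {
    "HS4": "oval_head",
    "ES3": "vertical_eyes",
    "C1": "white_body",
    "M2": "plastic_matte",
    "BP4": "teenage_figure",
}

# Tags applied to every training image (assumed).
BASE_TAGS = ["affinity_robot", "organic_form"]


def build_caption(feature_codes, auto_caption_tags=()):
    """Join base tags, affinity tags, and auto-caption tags, dropping duplicates."""
    tags = list(BASE_TAGS)
    tags += [FEATURE_TAGS[code] for code in feature_codes if code in FEATURE_TAGS]
    tags += list(auto_caption_tags)
    # dict.fromkeys keeps the first occurrence of each tag and preserves order.
    return ", ".join(dict.fromkeys(tags))


if __name__ == "__main__":
    print(build_caption(["HS4", "ES3", "C1", "M2", "BP4"]))
```

Generating captions from the same feature codes used for scoring keeps the vocabulary consistent across the dataset, which helps the model associate each tag with one visual trait.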

2. Model Training & Iteration

We employed the DreamBooth fine-tuning method to create a dedicated base model, as it adjusts the core weights of the SD neural network so that the model internalizes the concept of a “high-affinity humanoid robot.” The process was iterative:

Round 1: Training on samples with high-affinity head/face features. Results were stylized/cartoonish.
Round 2: Added more realistic samples, but generated proportions became too diminutive.
Round 3: Focused on optimizing body proportions (prioritizing slender adolescent builds) and diversifying details. The resulting model outputs showed a strong alignment with affinity goals.

The training parameters were set as follows: Learning Rate ($\eta$): $1 \times 10^{-4}$, Batch Size: 5, Epochs: 10, Optimizer: 8-bit Adam, Network Dimension ($d_{net}$): 128. The training aims to minimize a loss function $\mathcal{L}(\theta)$ over the dataset $\mathcal{D}$:
$$ \theta^* = \arg\min_{\theta} \mathbb{E}_{x, c \sim \mathcal{D}} \left[ \mathcal{L}(x, c; \theta) \right] $$
where $\theta$ are the model parameters, $x$ is the image, and $c$ is the conditioning text embedding based on our detailed annotations.
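
The text does not spell out the form of $\mathcal{L}$. For latent diffusion models such as Stable Diffusion, the usual choice is a noise-prediction objective; assuming that is what was minimized here, the per-sample loss would take the form:

$$ \mathcal{L}(x, c; \theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I),\, t} \left[ \left\| \epsilon - \epsilon_\theta(z_t, t, c) \right\|_2^2 \right] $$

Here $z_t$ is the latent encoding of image $x$ after noise $\epsilon$ has been added at timestep $t$, and $\epsilon_\theta$ is the denoising network. These symbols are introduced for this illustration and do not appear in the original notation. DreamBooth can optionally add a prior-preservation term to this loss; the text does not state whether one was used.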

3. Model Evaluation and Application

Model performance was assessed both quantitatively and qualitatively.

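Loss Value: The training loss stabilized around 0.07-0.08 in later epochs, which suggests that training had converged. Because training loss alone cannot rule out overfitting, checkpoints were also compared on their generated outputs, as described next.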
XY Plot Analysis: We generated a grid of images (X-axis: model weight from 0.2 to 1.0; Y-axis: different model checkpoints from epochs 3, 6, 9). Each output was evaluated using the affinity score table. The model from epoch 6, used at a weight of 0.6-0.8, consistently generated humanoid robot designs with the highest composite affinity scores (e.g., 14/16 points).
Style Matrix Generation: The trained base model can be combined with different stylistic LoRA models (e.g., “cyberpunk,” “organic,” “minimalist”). This allows for the rapid generation of a vast style matrix—a diverse array of humanoid robot concepts that all share the underlying high-affinity DNA. Designers can then quickly filter this matrix using the affinity scoring table to identify the most promising concepts for any given application.

Table 5: Affinity Score Evaluation of Generated Style Matrix Samples
Style LoRA	Sample 1 Score	Sample 2 Score	Sample 3 Score	Sample 4 Score	Sample 5 Score
Minimalist	15	12	10	9	8
Organic	15	14	14	14	14
Futuristic	15	14	14	12	12
Cyberpunk	14	12	12	9	9
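
The filtering step can be sketched with the Table 5 scores. The summary statistics and the cutoff of 14 (borrowed from the 14/16 example above) are assumptions for illustration; the text does not specify a selection threshold.

```python
from statistics import mean

# Scores transcribed from Table 5 (five samples per style LoRA).
STYLE_MATRIX = {
    "Minimalist": [15, 12, 10, 9, 8],
    "Organic": [15, 14, 14, 14, 14],
    "Futuristic": [15, 14, 14, 12, 12],
    "Cyberpunk": [14, 12, 12, 9, 9],
}


def summarize(matrix, threshold):
    """Per style: mean score and number of samples at or above `threshold`."""
    return {
        style: {
            "mean": mean(scores),
            "passing": sum(score >= threshold for score in scores),
        }
        for style, scores in matrix.items()
    }


def rank_styles(matrix):
    """Style names ordered from highest to lowest mean score."""
    return sorted(matrix, key=lambda style: mean(matrix[style]), reverse=True)


if __name__ == "__main__":
    summary = summarize(STYLE_MATRIX, threshold=14)  # assumed cutoff
    for style in rank_styles(STYLE_MATRIX):
        stats = summary[style]
        print(f"{style}: mean={stats['mean']:.1f}, passing={stats['passing']}/5")
```

On the Table 5 data, the Organic style has both the highest mean (14.2) and the most samples at or above the cutoff (5 of 5), while Minimalist shows the widest spread.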

Discussion and Conclusion

This research presents a closed-loop methodological framework for the affective design of humanoid robots. By applying Kansei Engineering, we successfully translated the subjective feeling of affinity into an objective, quantifiable scoring system for key design elements—form, material, and color. This code moves beyond vague guidelines, providing actionable metrics such as “prioritize a horizontal stadium head shape (score=4) over a square one (score=1)” or “select low-saturation warms (score=4) over high-saturation cools (score=1).”

The subsequent integration of this code with AI generative model training represents a significant advancement in design methodology. We demonstrated that a Stable Diffusion model, when trained on a dataset curated and refined according to the affinity scoring table, learns to embody these principles. It becomes a proficient generator of humanoid robot concepts that are not only diverse and stylistically varied but also pre-optimized for user acceptance. The ability to rapidly produce a style matrix and filter it using the same scoring table dramatically accelerates the conceptual design phase, allowing designers to explore a wider solution space and make informed decisions based on empirical user preference data.

This work has limitations. The participant pool for the Kansei study, while substantial, may not capture full global cultural diversity in perceptions of robot affinity. Future work should include cross-cultural validation of the affinity scores. Furthermore, the AI training focused primarily on static appearance. The next logical step is to integrate dynamic elements (e.g., gait, gesture, lighting patterns) into the affinity framework and model training. Finally, the interaction between appearance and actual robot behavior is a critical area for study; a friendly-looking humanoid robot must be paired with appropriate social behaviors to sustain user trust.

In conclusion, this framework bridges the gap between human perceptual psychology and advanced AI-driven design tools. It offers a systematic, data-informed, and efficient pathway to design humanoid robots that people are more likely to welcome into their homes, workplaces, and daily lives. By grounding the design of these machines in a deep understanding of human affinity, we can foster more natural, comfortable, and successful human-robot collaboration.