Research Whitepaper

Digital Twin Agent Training Report

Achieving 73.90% Human Behavioral Similarity

January 7, 2026
MimixLabs Research Team

Key Results

Weighted Optimized Similarity: 73.90%
Improvement vs Baseline: +12.50 percentage points
Events Analyzed: 42M

Abstract

This report documents the methodology, implementation, and results of training AI agents to simulate human e-commerce behavior. Using a combination of personality-driven prompting, LLM-based evaluation, and behavioral clustering, we achieved a weighted optimized similarity of 73.90%, exceeding our 70% target. The system draws 13,492 subjects from the REES46 e-commerce dataset and uses Google's Gemini 3 Flash model, accessed via OpenRouter, for cost-effective LLM evaluation.

Training Configuration

Parameter            Quick Test    Full Training
Total Subjects       13,492        13,492
Training Subjects    100           10,793 (80%)
Validation Subjects  50            2,699 (20%)
Clusters             3             3
Max Iterations       3             15
LLM Model            google/gemini-3-flash-preview (both configurations)
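
The 80/20 split in the Full Training column can be reproduced with a short sketch. The function name, the fixed seed, and the use of integer placeholder IDs are assumptions made for illustration; the report does not say how subjects were assigned to each set.

```python
import random


def split_subjects(subject_ids, train_fraction=0.8, seed=0):
    """Shuffle subject IDs reproducibly and split them into train/validation lists."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    # int() floors the product, so 13,492 * 0.8 = 10,793.6 becomes 10,793.
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Placeholder IDs 0..13,491 stand in for the real subject identifiers.
train_ids, val_ids = split_subjects(range(13_492))
print(len(train_ids), len(val_ids))  # 10793 2699
```

Flooring the training count is what yields 10,793 and 2,699 rather than a rounded 10,794 and 2,698.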

1. Introduction

Problem Statement

Creating AI agents that accurately simulate human behavior in e-commerce environments is crucial for:

  • A/B Testing: Testing UI changes without real user traffic
  • User Research: Understanding how different personality types interact with products
  • Predictive Analytics: Forecasting conversion rates and user journeys

The challenge is ensuring these agents behave like real humans, not idealized or random actors.

2. Dataset: REES46 E-Commerce

We used the REES46 eCommerce Behavior Dataset from Kaggle, one of the largest publicly available e-commerce clickstream datasets.

Total Events: 42+ million
Unique Users: 1.4+ million

Action Types Distribution

view_product: ~85%
cart: ~10%
purchase: ~5%
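
A distribution like the one above can be computed from any list of action labels. The sketch below uses a small hand-built sample chosen to match the approximate shares; it is not drawn from the dataset, and the function name is hypothetical.

```python
from collections import Counter


def action_distribution(actions):
    """Return each action's share of the total number of events."""
    counts = Counter(actions)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


# Hand-built sample of 20 events, for illustration only.
sample = ["view_product"] * 17 + ["cart"] * 2 + ["purchase"]
shares = action_distribution(sample)
for action, share in shares.items():
    print(f"{action}: {share:.0%}")
```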

3. Behavioral Identity Persona (BIP) Model

The BIP model consists of 6 layers that define an agent's complete behavioral profile:

The 6 Layers

Layer 1: PERSONALITY — OCEAN traits + derived
Layer 2: SKILLS — Learned action patterns
Layer 3: HEURISTICS — Quick decision rules
Layer 4: EMOTIONS — Dynamic state + thresholds
Layer 5: MEMORY — What to remember
Layer 6: PROVENANCE — Training data source
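
As a rough illustration, the six layers could be held in a single record. Everything beyond the layer names is an assumption: the field types, the 0 to 1 trait scale, the default values, and the example contents are hypothetical and not taken from the MimixLabs implementation.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Personality:
    # OCEAN traits, assumed here to lie on a 0.0-1.0 scale.
    openness: float = 0.5
    conscientiousness: float = 0.5
    extraversion: float = 0.5
    agreeableness: float = 0.5
    neuroticism: float = 0.5


@dataclass
class BehavioralIdentityPersona:
    personality: Personality                                   # Layer 1
    skills: list[str] = field(default_factory=list)            # Layer 2
    heuristics: list[str] = field(default_factory=list)        # Layer 3
    emotions: dict[str, float] = field(default_factory=dict)   # Layer 4
    memory: list[str] = field(default_factory=list)            # Layer 5
    provenance: str = ""                                       # Layer 6


# Hypothetical persona; only the 0.65 conscientiousness value comes from the report.
persona = BehavioralIdentityPersona(
    personality=Personality(conscientiousness=0.65),
    heuristics=["compare several products before adding to cart"],
    provenance="REES46",
)
print([f.name for f in fields(persona)])
```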

OCEAN Personality Model

Trait              Low Behavior   High Behavior      Shopping Impact
Openness           Conventional   Creative, curious  Exploration breadth
Conscientiousness  Spontaneous    Detail-oriented    Research depth
Extraversion       Reserved       Outgoing           Social proof weight
Agreeableness      Skeptical      Trusting           Price sensitivity
Neuroticism        Calm           Anxious            Friction tolerance
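
One way personality-driven prompting might use this table is to turn trait scores into a short textual description for the agent's prompt. This is a minimal sketch: the 0.5 threshold, the output wording, and the function name are assumptions, and the actual prompt format is not described in this report.

```python
# (low behavior, high behavior, shopping impact), taken from the OCEAN table.
OCEAN_TABLE = {
    "openness": ("conventional", "creative and curious", "exploration breadth"),
    "conscientiousness": ("spontaneous", "detail-oriented", "research depth"),
    "extraversion": ("reserved", "outgoing", "social proof weight"),
    "agreeableness": ("skeptical", "trusting", "price sensitivity"),
    "neuroticism": ("calm", "anxious", "friction tolerance"),
}


def describe_personality(scores, threshold=0.5):
    """Map trait scores to the table's low/high descriptors, one line per trait."""
    lines = []
    for trait, (low, high, impact) in OCEAN_TABLE.items():
        score = scores[trait]
        descriptor = high if score >= threshold else low
        lines.append(f"- {trait.capitalize()} {score:.2f}: {descriptor} (influences {impact})")
    return "\n".join(lines)


# Hypothetical scores; only conscientiousness = 0.65 appears in the report.
scores = {
    "openness": 0.4,
    "conscientiousness": 0.65,
    "extraversion": 0.3,
    "agreeableness": 0.5,
    "neuroticism": 0.2,
}
prompt_fragment = describe_personality(scores)
print(prompt_fragment)
```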

4. Similarity Metrics

The overall similarity score is a weighted combination of seven metrics:

Overall Score = 0.25 × Sequence Similarity (LCS)
              + 0.20 × Action Distribution Match
              + 0.15 × Outcome Match
              + 0.15 × Intent Sequence Similarity
              + 0.10 × Temporal Similarity
              + 0.10 × Step Count Penalty
              + 0.05 × N-gram Similarity

Sequence Similarity

25% weight — LCS algorithm comparing action sequences
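
A longest-common-subsequence (LCS) comparison of two action sequences can be sketched as follows. Dividing by the longer sequence's length to obtain a score between 0 and 1 is an assumption; the report does not state which normalization is used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(human, agent):
    """LCS length divided by the longer sequence's length (assumed normalization)."""
    if not human and not agent:
        return 1.0
    return lcs_length(human, agent) / max(len(human), len(agent))


human = ["view_product", "view_product", "cart", "purchase"]
agent = ["view_product", "cart", "view_product", "purchase"]
print(sequence_similarity(human, agent))  # 0.75
```

The two sequences share a subsequence of three actions out of four, so the score is 0.75 even though the order of the middle steps differs.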

Intent Sequence

15% weight — Focus on purchase-intent actions
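
The weighted formula at the start of this section translates directly into code. The weights come from the formula; the dictionary keys and the example metric values are hypothetical, and the sketch assumes every component, including the step count penalty, is already expressed as a score between 0 and 1 where higher is better.

```python
WEIGHTS = {
    "sequence_similarity": 0.25,
    "action_distribution_match": 0.20,
    "outcome_match": 0.15,
    "intent_sequence_similarity": 0.15,
    "temporal_similarity": 0.10,
    "step_count_penalty": 0.10,
    "ngram_similarity": 0.05,
}


def overall_score(metrics):
    """Weighted sum of the seven component scores."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


# Hypothetical component scores for a single agent/human session pair.
example_metrics = {
    "sequence_similarity": 0.75,
    "action_distribution_match": 0.90,
    "outcome_match": 1.00,
    "intent_sequence_similarity": 0.60,
    "temporal_similarity": 0.50,
    "step_count_penalty": 0.80,
    "ngram_similarity": 0.40,
}
print(f"{overall_score(example_metrics):.4f}")
```

Because the weights sum to 1, the overall score stays within the same 0 to 1 range as its components.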

5. Training Methodology

Pipeline Overview

1. CLUSTER Users
2. OPTIMIZE Per-Cluster
3. EVALUATE Training
4. VALIDATE Holdout
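
The control flow of the four steps can be sketched with the clustering, optimization, and scoring logic passed in as functions. All names here are hypothetical placeholders, and the toy functions in the usage example stand in for the real components, which this report does not specify.

```python
from collections import defaultdict
from statistics import mean


def group_by_cluster(users, assign_cluster):
    """Group users by the cluster ID returned from assign_cluster."""
    groups = defaultdict(list)
    for user in users:
        groups[assign_cluster(user)].append(user)
    return dict(groups)


def run_pipeline(train_users, val_users, assign_cluster, optimize_persona, score):
    # 1. CLUSTER users in the training set.
    train_groups = group_by_cluster(train_users, assign_cluster)
    # 2. OPTIMIZE one persona per cluster.
    personas = {c: optimize_persona(us) for c, us in train_groups.items()}
    # 3. EVALUATE each persona against its own training users.
    train_scores = {
        c: mean(score(personas[c], u) for u in us) for c, us in train_groups.items()
    }
    # 4. VALIDATE on held-out users, skipping clusters with no trained persona.
    val_groups = group_by_cluster(val_users, assign_cluster)
    val_scores = {
        c: mean(score(personas[c], u) for u in us)
        for c, us in val_groups.items()
        if c in personas
    }
    return personas, train_scores, val_scores


# Toy stand-ins: a "user" is a purchase count and a "persona" is the cluster mean.
personas, train_scores, val_scores = run_pipeline(
    train_users=[0, 0, 2, 4],
    val_users=[0, 3],
    assign_cluster=lambda purchases: 0 if purchases > 0 else 2,
    optimize_persona=lambda users: mean(users),
    score=lambda persona, user: 1 / (1 + abs(persona - user)),
)
print(personas, train_scores, val_scores)
```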

User Clustering Results

Cluster  Name              Size     Description
0        High Converters   ~7-14%   High purchase rate, focused behavior
1        Deep Browsers     ~16-23%  Low conversion, extensive product viewing
2        Quick Abandoners  ~63-77%  Brief sessions, rapid exit

6. Results

Per-Cluster Results

High Converters
Baseline: 85.89%, Optimized: 85.89%
Deep Browsers
Baseline: 58.4%, Optimized: 75.9%, Change: +17.5%
Quick Abandoners
Baseline: 59.8%, Optimized: 72.4%, Change: +12.6%
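
The report does not state how the per-cluster scores combine into the headline figure. One reading that is consistent with the published numbers is a size-weighted average. The sketch below assumes cluster shares of 7%, 16%, and 77%; each falls within the size ranges in the clustering table, but the exact shares are an assumption, not a reported value.

```python
# Scores are from the per-cluster results; "share" values are assumed.
CLUSTERS = {
    "High Converters": {"share": 0.07, "baseline": 85.89, "optimized": 85.89},
    "Deep Browsers": {"share": 0.16, "baseline": 58.4, "optimized": 75.9},
    "Quick Abandoners": {"share": 0.77, "baseline": 59.8, "optimized": 72.4},
}


def weighted_average(key):
    """Average a per-cluster score, weighting each cluster by its share of users."""
    return sum(c["share"] * c[key] for c in CLUSTERS.values())


baseline = weighted_average("baseline")
optimized = weighted_average("optimized")
print(f"baseline:  {baseline:.2f}%")                 # baseline:  61.40%
print(f"optimized: {optimized:.2f}%")                # optimized: 73.90%
print(f"gain:      {optimized - baseline:+.2f} pp")  # gain:      +12.50 pp
```

Under these assumed shares the result matches both the 73.90% weighted optimized similarity and the 12.50-point improvement over baseline listed in the key results.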

Key Discovery: Conscientiousness Drives Deep Browser Behavior

Raising the conscientiousness trait to 0.65 improved Deep Browsers similarity from 58.40% to 75.90%, a gain of 17.50 percentage points. Users who browse extensively but don't purchase aren't "random clickers"; they're deliberate researchers.

7. Conclusion

We successfully trained AI agents to achieve 73.90% behavioral similarity with real human e-commerce users, exceeding our 70% target. Key achievements:

  • LLM-Based Evaluation: Using Gemini 3 Flash for cost-effective, accurate evaluation
  • Per-Cluster Training: Recognizing that different user types need different personas
  • Conscientiousness Discovery: High conscientiousness (0.65) is key for modeling "researcher" behavior
  • Robust Metrics: Added intent sequence similarity for better outcome prediction
