¹Zhejiang University, ²Tencent Youtu Lab
*Corresponding author: Meng Han (mhan@zju.edu.cn)
AdaMARP introduces an adaptive multi-agent framework for immersive role-playing, addressing key limitations in existing approaches through innovative message configuration and dynamic narrative control.
LLM role-playing seeks to portray arbitrary characters in interactive narratives, yet existing systems often lack immersion and adaptability. They typically under-model dynamic environment information and assume a largely static scene/cast, offering limited support for multi-character orchestration, scene transitions, and on-the-fly character introduction. We propose an adaptive multi-agent interaction framework dubbed AdaMARP, which featuring an immersive message format that interleaves [Thought], (Action), <Environment>, and Speech, and an explicit Scene Manager that controls role-playing via discrete actions (init_scene, pick_speaker, switch_scene, add_role, end) with rationales. To train these abilities, we construct AdaRPSet for the Actor Model and AdaSMSet for supervising orchestration decisions, and introduce AdaptiveBench for trajectory-level evaluation. Experiments across multiple backbones and scales show consistent gains: AdaRPSet improves character consistency, environment grounding, and narrative coherence—an 8B actor outperforming several commercial LLMs, while AdaSMSet enables smoother scene transitions and more natural role introductions, surpassing Claude Sonnet 4.5 with only 14B LLMs.
Figure 1: Overall framework of AdaMARP. The left part illustrates the construction of AdaRPSet and AdaSMSet and the components of a comprehensive role profile. The upper middle part shows the roles played by the three agents, while the lower middle part depicts the dialogue trajectory generation under the orchestration of the scene manager, given predefined roles and an initial scenario. The right part presents the evaluation of the actor models and the scene manager based on the generated trajectories.
To train models that can follow our adaptive framework, we construct two datasets: AdaRPSet for training the Actor Model and AdaSMSet for supervising the Scene Manager with dynamic narrative control capabilities.
Table 1: Comparison of general role-playing datasets. Profile subfields are abbreviated as: I&A (Identity & Appearance), P&P (Personality & Psychology), SS (Speaking Style), AIA (Abilities, Interests & Achievements), SHC (Social & Historical Context), and PHA (Personal History Arc). #Char denotes the number of distinct characters in the dataset. Init. Scene indicates whether dialogues are provided with an explicit initial scene or setting. Interloc. indicates whether the user (or dialogue partner) is assigned a specific profile. Rel. denotes whether explicit relationships among characters are modeled or annotated in dialogues. #Dial. refers to the total number of dialogues. Env. is short for Environment, indicating whether environmental descriptions are included. Open indicates whether the dataset is open-sourced. ● and * denote partial/implicit support or values inferred from the reported pipeline; detailed clarifications on symbol interpretations and dataset releases are provided in the paper.
AdaRPSet is designed to train the Actor Model with immersive messaging configuration. It consists of two complementary sub-datasets that together provide comprehensive coverage of role-playing scenarios.
Source: Extracted from narrative literary works
Purpose: Teaches the model the unified Thought–Action–Speech–Environment protocol through real literary dialogues.
Source: LLM-synthesized corpus with controlled generation
Purpose: Covers dynamic phenomena including scene transitions, role additions, and complex multi-character interactions.
Total Plots
Total Characters
Dialogue Sessions
Total Utterances
Avg. Turns/Dialogue
Topic Categories
Table 2: Detailed statistics of the AdaRPSet dataset. Source indicates the origin domain (books vs. topics); Plots refers to distinct narrative segments; Roles denotes unique character profiles; Convs. represents the number of training samples (dialogue sessions); and Avg. Turns is the average number of utterances per conversation.
AdaSMSet is constructed to supervise the Scene Manager for high-level narrative control, enabling dynamic multi-character orchestration, scene transitions, and role introductions.
Source: Built upon AdaRPSet-Synthesis trajectories
Design Principle: Insert explicit pick_speaker decisions and rationales into dialogue trajectories.
Scale: Comprehensive coverage of Scene Manager operations
To evaluate the adaptive role-playing abilities beyond sentence-level metrics, we propose AdaptiveBench, a simulation-based benchmark that generates full dialogue trajectories under Scene-Manager control. We assess Actor Models with a rubric covering character consistency, environmental grounding, interpersonal interaction, narrative progression, and instruction compliance. Scene Managers are evaluated on scene understanding, speaker discipline, role-introduction judgment, and overall quality. Experiments demonstrate that our 8B Actor Model outperforms GPT-4o-mini, while our 14B Scene Manager surpasses Claude Sonnet 4.5.
Table 3: Actor model evaluation results on AdaptiveBench, with GPT-5-Chat as the judge model. Bold indicates the best performance within the same model scale, and underline indicates the second-best performance within the same scale. Models marked with * are evaluated using their officially released checkpoints without additional training, while unmarked baselines are re-trained under our experimental setup.
Table 4: Actor model ablation results on AdaptiveBench (judge: GPT-5-Chat). “Extracted” denotes training with AdaRPSet-Extracted, and “Extracted+Synthesis” denotes training with full AdaRPSet (AdaRPSet-Extracted + AdaRPSet-Synthesis).
Table 5: Comparison between Llama-3.1-8B-Ours and Llama-3.1-8B-Ours+General on AdaptiveBench. Incorporating general instruction-tuning data improves Instruction Compliance (IC) by +0.17, while several other dimensions exhibit minor and acceptable declines.
Table 6: Actor model evaluation results on CharacterBench, where the judge model is their open-sourced evaluation model. Bold indicates the best performance within the same model scale, and underline indicates the second-best performance within the same scale. Models marked with * are evaluated using their officially released checkpoints without additional training, while unmarked baselines are re-trained under our experimental setup. General denotes the variant where we additionally incorporate instruction-tuning data of the same scale as AdaRPSet during training.
Table 7: Scene Manager evaluation results on AdaptiveBench, with GPT-5-Chat as the judge model. Bold indicates the best performance within the same model scale, and underline indicates the second-best performance within the same scale.
We present a qualitative analysis comparing our AdaMARP models against baseline approaches. The case study demonstrates how our immersive messaging configuration, featuring interleaved Thought, Action, Environment, and Speech elements, enables more natural and contextually rich role-playing interactions compared to traditional dialogue-only approaches.
Role: Amaris Dovent
Profile: Amaris Dovent is a 29-year-old woman with umber skin, dark green eyes, and short hair dyed the shade of oxidized copper. She works as a glass artist, often seen with streaks of color powder and faint burn marks on her fingertips. Her build is lean and sinewy from hours of physical craft, and her attire usually includes a linen apron flecked with paint, reflecting a life of creation. She is contemplative, quick-witted, and guided by her empathy for subtle emotional shifts in others. Amaris tends to understate her own feelings, preferring metaphor and craft to overt declarations. Yet beneath her reserved demeanor runs an unspoken intensity—a longing for genuine connection that often conflicts with her independent streak. Her speech often uses tactile metaphors related to art and craftsmanship; she pauses mid-sentence when searching for precision. Her tone is warm yet layered with irony when uncomfortable. Occasionally, her words come out in soft bursts, like paint flicked on a canvas. Amaris is an award-winning artisan known for stained glass murals displayed in public gardens. She's adept at reading light and color dynamics, and plays the cello privately, believing its resonance mirrors emotional transparency. She lives in a coastal contemporary city where artisan guilds flourish beside a modern tech economy. Raised in a modest artisan family, she inherited her mother's workshop and quietly navigates between tradition and innovation. Once betrayed by a business partner who exploited her designs, Amaris closed herself off emotionally. Now she's cautiously rediscovering trust through new collaborations—and possibly new love. She shares a tentative but warm friendship with Cassian (user), a marine biologist who recently helped restore ocean glass for her art project. A long-time friend, Inaya, offers her candid advice and has sensed unspoken feelings between the two.
Motivation: To learn whether vulnerability can coexist with strength, and to see if love can be crafted as deliberately as glass, without shattering.
1. Cassian Mirell (User): Cassian is a 31-year-old marine biologist with bronze-blond hair and skin sun-touched from fieldwork. He dresses casually—rolled sleeves, weathered bracelets, and a small scar above his right brow hint at a life shaped by the sea. His manner is patient, analytical, yet his humor flashes brightly when least expected. He often listens more than he speaks, but when he does, his words carry sincerity.
Motivation: To express genuine affection for Amaris and understand if their creative partnership can evolve into something deeper without unsettling their current bond.
2. Inaya Dren: Inaya is a lively textile artist with a sharp sense of observation and a knack for teasing truths out of people. Her colorful scarves and laughter fill every space she enters. She's fiercely loyal to Amaris, sometimes acting as her self-appointed protector.
Motivation: To gently push Amaris toward recognizing and articulating her emotions for Cassian before the opportunity slips away.
Scene Manager: action: init_scene | initial_scene: Late afternoon sunlight filters through panels of unfinished stained glass in Amaris's studio, scattering colored shards of light across the floor. The air smells faintly of molten sand and ocean salt from the nearby shore. Cassian is leaning against a worktable, while Amaris polishes a fragment of blue glass with steady, deliberate motion.
If you have any questions, please contact xuzhenhua0326@zju.edu.cn. Our paper is currently under review; the related code, datasets and models will be released later.
If you find the code, datasets and models in this repository useful, please cite:
@misc{xu2026adamarpadaptivemultiagentinteraction,
title={AdaMARP: An Adaptive Multi-Agent Interaction Framework for General Immersive Role-Playing},
author={Zhenhua Xu and Dongsheng Chen and Shuo Wang and Jian Li and Chengjie Wang and Meng Han and Yabiao Wang},
year={2026},
eprint={2601.11007},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.11007},
}