# WalkXR AI & Simulation: Master Design Document

**Version: 2.0** | **Last Updated: 2025-06-20**

---

## **PART 1: STRATEGIC & SCIENTIFIC FOUNDATIONS**

### **1.1. Core Vision**

The WalkXR AI system is an evolving, **multi-agent emotional ecosystem**. Its purpose is to develop a coordinated network of emotionally intelligent agents that grow alongside the user’s inner and outer journey. These agents are designed not merely to respond, but to adapt, learn, and co-create meaning within a modular, therapeutic, and scientifically grounded orchestration system.

### **1.2. Development Philosophy: Experience-Centric & Iterative**

Our strategy is to build complete, multi-agent **"Walk Experiences"** as cohesive, end-to-end units. We perfect the full user journey for one walk before generalizing the platform. This approach mitigates risk, ensures quality, and creates a library of reusable, validated AI components.

1. **Build Foundational Systems (E01-E06)**: Construct the core components required for all future development, including the RAG pipeline, agent memory, fine-tuning process, and the master orchestrator.
2. **Assemble a Full Walk Experience (E07-E09)**: Develop the complete cohort of specialized agents required for a single, flagship walk (e.g., 'Small Moments').
3. **Integrate & Battle-Test (E10)**: Orchestrate the full agent cohort into a seamless user journey and rigorously test the end-to-end experience through simulation and adversarial 'red teaming'.
4. **Generalize the Platform (E11-E12)**: Once the first walk is perfected, refactor the architecture into a reusable "Walk Factory" and build a dynamic orchestration engine.

### **1.3. The Four Development Tracks of WalkXR AI**

Our work is organized into four highly focused, interconnected development tracks. Together, they form the engine that builds, evaluates, and scales the WalkXR Emotional OS, ensuring every component is as advanced and customized as possible.

1. **Track 1: EI Design & Evaluation**: **The Mind & Measure.** This track is the source of truth for what makes an agent 'emotionally intelligent.' It translates cutting-edge psychological research into concrete engineering rubrics and builds the automated evaluation systems (LLM-as-Evaluator, etc.) to ensure our agents are not just functional, but effective, safe, and aligned with our therapeutic goals.
2. **Track 2: Simulation & Data**: **The Experience Engine.** This track builds the data flywheel for the entire project. It is responsible for evolving our simulation system from its current state (Google Sheets/Apps Script) into a powerful Python-based platform. This future system will use `LangGraph` and RL techniques to generate rich, diverse, and emotionally tagged datasets that are the lifeblood of our agent training and evaluation efforts.
3. **Track 3: Agents & Memory**: **The Core Intelligence.** This track architects and implements the agentic systems themselves. We explicitly prioritize custom, stateful agent development with `LangGraph` to build sophisticated, multi-agent orchestrations that deliver a uniquely personalized experience. This track also owns the development of our hybrid memory architecture, combining vector-based RAG with long-term graph memory (`Neo4j`) to give agents true context and continuity.
4. **Track 4: Full-Stack & Infrastructure**: **The Production Backbone.** This track ensures our advanced AI can be delivered to users reliably and at scale. It owns the `FastAPI` backend, the deployment infrastructure (e.g., `Modal`), and the creation of internal tools and UIs (using `Streamlit`). This team builds the bridge between our custom agentic research and a robust, production-ready product.
### **1.4. Guiding Principles & Scientific Foundations**

All design and technical decisions are guided by these core principles:

| Principle | Description |
| :--- | :--- |
| **Modularity First** | Everything (walks, prompts, agents, datasets) must be modular, reusable, and composable. |
| **Simulation-Led Development** | Agent behaviors, interventions, and tone logic are always tested in simulation before deployment. |
| **Therapeutic Guardrails** | Agents never diagnose, judge, or instruct. They co-regulate, scaffold awareness, and offer reflection. |
| **Persona-Rich Perspective** | No feature is built for a generic user. Every element is tested across diverse simulated personas. |
| **Inter-Agent Compatibility** | Agents must interoperate, hand off responsibility, and reference each other’s behaviors. |
| **Memory with Boundaries** | Memory systems are bounded, interpretable, and consent-driven. |
| **Ethical & Safe AI** | We integrate principles from constructivist emotion theory, narrative psychology, and constitutional AI to ensure fairness, transparency, and well-being. |

---

## **PART 2: THE WALKXR SIMULATION SYSTEM**

> **Note:** This section provides a summary. For full operational details, see the [Simulation Methodology Documentation & Instructions](https://docs.google.com/document/u/0/d/1CbwNpUoPpjZS4ban37_qKymBhGKFPA-_wdqKq6Iv9FI/edit).

### **2.1. Purpose & Importance**

The Simulation System is our structured method for testing how diverse, psychologically rich personas experience each element of a walk. It is a **live, continuous feedback and training ecosystem** that uses LLMs to output structured JSON data, informing emotional pacing, walk design, and agent development.

### **2.2. System Components**

The system is currently managed via a [central Google Sheet](https://docs.google.com/spreadsheets/u/0/d/13IkJHcrIRIHoa1SH9jwHEy_G_xo1B-Ko86AuLJSjMpE/edit) containing:

* **UI**: Dropdown menus to generate simulation prompts.
* **Databases**: Structured data for Walks, Modules, and Personas.
* **Templates**: Engineered prompt templates for different testing modes.
* **Execution**: Apps Script integration with OpenRouter for LLM calls.
* **Tracking**: A master sheet for all structured outputs.

### **2.3. Simulation Modes & Execution**

We utilize four primary modes, each designed for a specific testing purpose. **Mode 2 (Full Walk) and Mode 3 (All Personas)** are the most effective for generating agent training data.

| Mode | Purpose | Use Case |
| :--- | :--- | :--- |
| **1: Single Module × Single Persona** | Evaluate emotional tone or clarity | Micro-testing |
| **2: Full Walk × Single Persona** | Capture emotional arcs and pacing | Agent Persona Testbed |
| **3: Single Module × All Personas** | Identify emotional range & divergence | Diversity & Inclusion QA |
| **4: Reflection-Only** | Assess coaching or journaling prompts | Post-Walk Agent Testing |

While currently manual, the system is designed for future migration to scripted (Python/LangChain) or agent-orchestrated (LangGraph) execution flows.

### **2.4. Data & Tagging Schema**

Simulation output is stored as structured JSON/CSV and tagged across key axes, including: `emotional_tone`, `friction_type`, `insight_depth`, `pacing_flow`, and `agent_opportunity`.
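For illustration, a single tagged record from a Mode 2 run might look like the sketch below. This is a hypothetical example: the five axis names come from the schema above, while the surrounding field names and all values are assumptions rather than the production schema.

```python
# Hypothetical shape of one tagged simulation output record (Mode 2: Full Walk x Single Persona).
# The five tag axes come from the schema above; every other field name and value is illustrative.
simulation_record = {
    "walk_id": "small_moments",             # assumed identifier
    "module_id": "module_03_storytelling",  # assumed identifier
    "persona_id": "overwhelmed_caregiver",  # assumed persona label
    "mode": 2,
    "response_text": "I hadn't noticed how tense I was until this prompt asked me to slow down.",
    "tags": {
        "emotional_tone": "tender",
        "friction_type": "none",
        "insight_depth": "moderate",
        "pacing_flow": "slightly_rushed",
        "agent_opportunity": "reflection_coach",
    },
}
```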
This structured data is the lifeblood of our agent development process.

---

## **PART 3: AGENTIC ARCHITECTURE & DEVELOPMENT LIFECYCLE**

### **3.1. Agentic Philosophy**

WalkXR’s agents are not general-purpose chatbots. They are **specialized, context-sensitive, and emotionally attuned interventions**. Each is a modular tool designed to operate at a precise moment in a user’s journey, orchestrated to create a cooperative, supportive ecosystem.

### **3.2. Agent Taxonomy**

We classify our agents into functional categories:

| Category | Purpose & Examples |
| :--- | :--- |
| **Narrative Agents** | Guide the story and core experience (e.g., `StorytellingAgent`, `ProjectiveExerciseAgent`). |
| **Ritual Agents** | Facilitate specific interactive exercises (e.g., `MadLibsAgent`, `PersonaChoiceAgent`). |
| **Coaching Agents** | Support user reflection and processing (e.g., `ReflectionCoachAgent`, `ScienceCoachAgent`). |
| **Utility Agents** | Provide scaffolding and support (e.g., `ClosingAgent`, `EmotionalReadinessAgent`). |

### **3.3. The Agent Development Lifecycle**

Each agent evolves through a rigorous, five-stage lifecycle:

1. **Stage 0: Identification**
   * **Trigger**: A cluster of `friction` or `opportunity` tags appears in simulation data.
   * **Output**: A one-page design brief outlining the agent's purpose, trigger conditions, and success metrics.
2. **Stage 1: Manual Prompt Prototyping**
   * **Process**: The agent's core logic is drafted as a detailed prompt in a notebook or a tool like OpenAI's Playground.
   * **Output**: A versioned prompt file (e.g., `prompts/v0/reflection_coach.json`) and 10-20 tagged example transcripts.
3. **Stage 2: Encapsulation in `BaseAgent`**
   * **Process**: The validated prompt logic is encapsulated within our standard `BaseAgent` class in the `src/walkxr_ai/agents/` directory.
   * **Output**: A new Python module (e.g., `reflection_coach.py`) with proper typing, state management, and tool definitions (see the sketch at the end of this part).
4. **Stage 3: Integration & Evaluation**
   * **Process**: The agent is added as a node to the appropriate LangGraph orchestration graph. It's tested with unit tests and end-to-end simulations.
   * **Output**: A passing test suite and evaluation reports from LangSmith, confirming performance and behavior.
5. **Stage 4: Deployment & Continuous Improvement**
   * **Process**: The agent is deployed. Live interaction data is collected and fed back into the simulation system to identify new friction points.
   * **Output**: A version bump for the agent and new design briefs for future improvements or new agents.

### **3.4. The 'Small Moments' Agent Cohort**

The first full application of our architecture is the agent cohort for the 'Small Moments' walk, orchestrated in a sequence:

`ProjectiveExerciseAgent` → `StorytellingAgent` → `ScienceCoachAgent` → `PersonaChoiceAgent` → `EmotionalReadinessAgent` → `MadLibsAgent` → `ReflectionCoachAgent` → `ClosingAgent`
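To make Stages 2 and 3 concrete, below is a minimal sketch of the `BaseAgent` encapsulation pattern, using `ReflectionCoachAgent` as the example. The `BaseAgent` name, the `src/walkxr_ai/agents/` location, and `prompts/v0/reflection_coach.json` come from the lifecycle above; the method names, state fields, and placeholder reply logic are assumptions, not the production interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseAgent(ABC):
    """Hypothetical sketch of the Stage 2 pattern: one versioned prompt wrapped in a uniform interface."""

    name: str = "base_agent"
    prompt_version: str = "v0"

    @abstractmethod
    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        """Read the shared walk state, do the agent's work, and return state updates."""
        ...


class ReflectionCoachAgent(BaseAgent):
    """Illustrative encapsulation of the prompts/v0/reflection_coach.json logic."""

    name = "reflection_coach"

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        user_text = state.get("last_user_message", "")
        # The real module would render the versioned prompt and call an LLM here;
        # this placeholder only shows the state-in / state-out contract.
        reply = f"What felt most alive for you in: '{user_text[:60]}'?"
        return {"agent_reply": reply, "last_agent": self.name}
```

Because every agent exposes the same state-in / state-out contract, the Stage 3 integration step reduces to registering the agent's `run` method as a node in the walk's orchestration graph.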
---

## **PART 4: THE WALKXR EMOTIONAL OS (ORCHESTRATION & ROADMAP)**

### 4.1. Vision for the OS

The WalkXR OS is a dynamic, learning system that moves beyond static, pre-scripted experiences. It will orchestrate agents, narrative content, and user state in real time to create personalized, emergent journeys that are maximally effective and emotionally resonant.

### 4.2. Core Architectural Components

```mermaid
graph TD
    subgraph User
        A[User Input]
    end
    subgraph WalkXR OS
        B[Master Orchestrator] -- routes --> D[Agentic Layer]
        A -- processed by --> C[Emotional State Engine]
        C -- provides state vector to --> B
        D -- accesses --> E[Knowledge & Memory Layer]
        F[Evaluation & Learning Loop] -- improves --> D
        F -- improves --> B
    end
    subgraph Output
        G[Personalized Experience]
    end
    D --> G
```

* **Master Orchestrator (Epic E06)**
  * **Technology**: `LangGraph`.
  * **Function**: The central state machine managing the application flow. Each node in the graph is an agent or a logic gate. It routes control based on the output of the Emotional State Engine and the predefined walk structure. The state it manages includes `user_id`, `walk_id`, `current_step`, and an `agent_scratchpad` for inter-agent communication.
* **Knowledge & Memory Layer**
  * **RAG Pipeline (Epic E01)**: Implemented via a `RetrievalEngine` class using `LlamaIndex`. It connects to a `ChromaDB` vector store containing embeddings of walk content, scientific papers, and tagged simulation data. This provides agents with factual, grounded knowledge.
  * **Persistent Graph Memory (Future - Epic E04)**: A graph database (e.g., `Neo4j`) will be developed to store long-term relational data about the user's journey, such as recurring emotional patterns, key insights from past walks, and relationships between concepts. This enables true personalization and growth tracking.
* **Emotional State Engine (Epic E05)**
  * **Function**: This engine calculates a real-time emotional state vector for the user. It is not a single emotional label but a multi-dimensional vector that provides a nuanced understanding of the user's state.
  * **Inputs**: The user's text input, interaction history, and (in the future) tone of voice.
  * **Processing**: Inputs are processed by a series of models and classifiers:
    1. **Text-based Emotion/Tone Classifier**: A fine-tuned model from the Hugging Face Hub or a powerful API (e.g., OpenAI) to classify primary emotion and tone.
    2. **Vulnerability/Openness Scorer**: A custom model or heuristic to score the level of self-disclosure.
    3. **Friction Detector**: A rule-based or model-based system to detect signs of confusion or frustration, informed by our simulation tagging schema.
  * **Output**: A state vector (e.g., `{ "valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1 }`) that is passed to the Master Orchestrator to influence routing decisions (see the sketch at the end of this section).
* **Evaluation & Continuous Learning Loop (Future - Epic E12)**
  * **Evaluation**: We will use `LangSmith` for tracing and debugging. For quality assessment, we will implement an LLM-as-evaluator framework to score agent responses on criteria like `empathic_accuracy`, `relevance`, and `alignment_with_therapeutic_goals`.
  * **Learning (RLAIF)**: High-quality, human-reviewed interactions will be used to create a preference dataset. This dataset will be used to fine-tune a reward model, which will then guide the fine-tuning of our agents using Reinforcement Learning from AI Feedback (RLAIF), enabling them to continuously improve.
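As a concrete illustration of how the state vector could influence routing, here is a minimal, hypothetical sketch of an orchestrator fragment built with `LangGraph`. The `WalkState` fields mirror the design above, but the node names, the friction threshold, and the placeholder node bodies are assumptions, and the snippet assumes a recent `langgraph` release exposing the `StateGraph` API.

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph  # assumes a recent langgraph release


class WalkState(TypedDict, total=False):
    """Assumed shape of the shared state; field names mirror the design above."""
    user_id: str
    walk_id: str
    current_step: str
    agent_scratchpad: str
    emotional_state: dict  # e.g. {"valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1}


def emotional_state_engine(state: WalkState) -> WalkState:
    # Placeholder for the classifier/scorer pipeline described above.
    return {"emotional_state": {"valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1}}


def reflection_coach(state: WalkState) -> WalkState:
    return {"agent_scratchpad": "reflection_coach handled this turn", "current_step": "reflection"}


def emotional_readiness(state: WalkState) -> WalkState:
    return {"agent_scratchpad": "emotional_readiness offered a grounding step", "current_step": "readiness"}


def route_on_friction(state: WalkState) -> str:
    # Hypothetical rule: high friction diverts to a readiness/grounding agent first.
    return "emotional_readiness" if state["emotional_state"].get("friction", 0.0) > 0.5 else "reflection_coach"


graph = StateGraph(WalkState)
graph.add_node("emotional_state_engine", emotional_state_engine)
graph.add_node("reflection_coach", reflection_coach)
graph.add_node("emotional_readiness", emotional_readiness)
graph.set_entry_point("emotional_state_engine")
graph.add_conditional_edges(
    "emotional_state_engine",
    route_on_friction,
    {"reflection_coach": "reflection_coach", "emotional_readiness": "emotional_readiness"},
)
graph.add_edge("reflection_coach", END)
graph.add_edge("emotional_readiness", END)
orchestrator = graph.compile()
```

Calling `orchestrator.invoke({"user_id": "u1", "walk_id": "small_moments"})` would run one routed turn; the production graph would chain the full 'Small Moments' cohort rather than two placeholder nodes.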
### 4.3. Phased Roadmap to the OS

Our development roadmap is explicitly defined by the 12-epic backlog. The three phases represent a deliberate progression from building foundational tools to deploying a complete experience, and finally to creating a scalable, self-improving platform.

| Phase | Epics | Goal & Key Activities |
| :--- | :--- | :--- |
| **1: Foundational Systems** | E01-E06 | **Build the Core, Reusable Infrastructure.**<br>- **E01: `RetrievalEngine`:** Implement the core RAG pipeline using `LlamaIndex` and `ChromaDB` (sketched after this table).<br>- **E02: `BaseAgent` Class:** Define a standardized abstract base class for all agents.<br>- **E03: Simulation Data Pipeline:** Create the workflow to ingest and embed tagged simulation logs.<br>- **E04: Persistent Memory (Prototype):** Develop a simple file/JSON-based memory system.<br>- **E05: `EmotionalStateEngine` v1:** Build the initial state engine with basic emotion classification.<br>- **E06: `MasterOrchestrator` v1:** Implement the initial `LangGraph` state machine. |
| **2: 'Small Moments' Walk v1.0** | E07-E10 | **Assemble & Test the First Full Experience.**<br>- **E07: Agent Cohort Development:** Build the full set of specialized agents for the 'Small Moments' walk.<br>- **E08: Full Orchestration Flow:** Integrate the agent cohort into `LangGraph` with all state transitions.<br>- **E09: End-to-End Simulation:** Run the complete walk through hundreds of simulated scenarios.<br>- **E10: Evaluation & Red Teaming:** Conduct rigorous testing with `LangSmith` and adversarial attacks. |
| **3: The WalkXR OS Platform** | E11-E12+ | **Generalize, Scale, and Learn.**<br>- **E11: "Walk Factory" Refactor:** Abstract components into a reusable framework for creating new walks.<br>- **E12: Continuous Learning Loop:** Implement the full RLAIF pipeline using preference data to fine-tune a reward model.<br>- **Future: Dynamic Narrative Engine:** Evolve the orchestrator to adjust walk content in real time.<br>- **Future: Graph Memory Integration:** Migrate from the prototype to the full `Neo4j` graph memory. |
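To ground E01 from Phase 1, the following is a minimal, illustrative sketch of how a `RetrievalEngine` could wrap `LlamaIndex` and `ChromaDB`. It assumes the post-0.10 LlamaIndex package layout with the `llama-index-vector-stores-chroma` integration installed and a configured embedding model; the class interface, paths, and collection name are assumptions rather than the actual implementation.

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore  # separate integration package


class RetrievalEngine:
    """Illustrative E01 wrapper; names, paths, and defaults are assumptions."""

    def __init__(self, persist_dir: str = "./chroma_db", collection: str = "walkxr_knowledge"):
        # Persistent Chroma collection holding embeddings of walk content,
        # scientific papers, and tagged simulation data.
        client = chromadb.PersistentClient(path=persist_dir)
        chroma_collection = client.get_or_create_collection(collection)
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        self._storage = StorageContext.from_defaults(vector_store=vector_store)
        self._index = None

    def ingest(self, docs_dir: str) -> None:
        """Load a local folder of documents and embed them into the vector store."""
        documents = SimpleDirectoryReader(docs_dir).load_data()
        self._index = VectorStoreIndex.from_documents(documents, storage_context=self._storage)

    def retrieve(self, query: str, top_k: int = 4):
        """Return the top-k grounding passages for an agent's current turn."""
        if self._index is None:
            raise RuntimeError("Call ingest() before retrieve().")
        retriever = self._index.as_retriever(similarity_top_k=top_k)
        return retriever.retrieve(query)
```

An agent node could then call `engine.retrieve(...)` on the user's current turn and weave the returned passages into its prompt, which is the grounding step the RAG Pipeline in Section 4.2 describes.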