# WalkXR AI & Simulation: Master Design Document

**Version: 2.0** | **Last Updated: 2025-06-20**

---

## **PART 1: STRATEGIC & SCIENTIFIC FOUNDATIONS**

### **1.1. Core Vision**

The WalkXR AI system is an evolving, **multi-agent emotional ecosystem**. Its purpose is to develop a coordinated network of emotionally intelligent agents that grow alongside the user’s inner and outer journey. These agents are designed not merely to respond, but to adapt, learn, and co-create meaning within a modular, therapeutic, and scientifically grounded orchestration system.

### **1.2. Development Philosophy: Experience-Centric & Iterative**

Our strategy is to build complete, multi-agent **"Walk Experiences"** as cohesive, end-to-end units. We perfect the full user journey for one walk before generalizing the platform. This approach mitigates risk, ensures quality, and creates a library of reusable, validated AI components.

1. **Build Foundational Systems (E01-E06)**: Construct the core components required for all future development, including the RAG pipeline, agent memory, fine-tuning process, and the master orchestrator.
2. **Assemble a Full Walk Experience (E07-E09)**: Develop the complete cohort of specialized agents required for a single, flagship walk (e.g., 'Small Moments').
3. **Integrate & Battle-Test (E10)**: Orchestrate the full agent cohort into a seamless user journey and rigorously test the end-to-end experience through simulation and adversarial 'red teaming'.
4. **Generalize the Platform (E11-E12)**: Once the first walk is perfected, refactor the architecture into a reusable "Walk Factory" and build a dynamic orchestration engine.

### **1.3. The Four Development Tracks of WalkXR AI**

Our work is organized into four highly focused, interconnected development tracks. Together, they form the engine that builds, evaluates, and scales the WalkXR Emotional OS, ensuring every component is as advanced and customized as possible.

1. **Track 1: EI Design & Evaluation**: **The Mind & Measure.** This track is the source of truth for what makes an agent 'emotionally intelligent.' It translates cutting-edge psychological research into concrete engineering rubrics and builds the automated evaluation systems (LLM-as-Evaluator, etc.) to ensure our agents are not just functional, but effective, safe, and aligned with our therapeutic goals.
2. **Track 2: Simulation & Data**: **The Experience Engine.** This track builds the data flywheel for the entire project. It is responsible for evolving our simulation system from its current state (Google Sheets/Apps Script) into a powerful Python-based platform. This future system will use `LangGraph` and RL techniques to generate rich, diverse, and emotionally tagged datasets that are the lifeblood of our agent training and evaluation efforts.
3. **Track 3: Agents & Memory**: **The Core Intelligence.** This track architects and implements the agentic systems themselves. We explicitly prioritize custom, stateful agent development with `LangGraph` to build sophisticated, multi-agent orchestrations that deliver a uniquely personalized experience. This track also owns the development of our hybrid memory architecture, combining vector-based RAG with long-term graph memory (`Neo4j`) to give agents true context and continuity.
4. **Track 4: Full-Stack & Infrastructure**: **The Production Backbone.** This track ensures our advanced AI can be delivered to users reliably and at scale. It owns the `FastAPI` backend, the deployment infrastructure (e.g., `Modal`), and the creation of internal tools and UIs (using `Streamlit`). This team builds the bridge between our custom agentic research and a robust, production-ready product.
### **1.4. Guiding Principles & Scientific Foundations**

All design and technical decisions are guided by these core principles:

| Principle | Description |
| :--- | :--- |
| **Modularity First** | Everything (walks, prompts, agents, datasets) must be modular, reusable, and composable. |
| **Simulation-Led Development** | Agent behaviors, interventions, and tone logic are always tested in simulation before deployment. |
| **Therapeutic Guardrails** | Agents never diagnose, judge, or instruct. They co-regulate, scaffold awareness, and offer reflection. |
| **Persona-Rich Perspective** | No feature is built for a generic user. Every element is tested across diverse simulated personas. |
| **Inter-Agent Compatibility** | Agents must interoperate, hand off responsibility, and reference each other’s behaviors. |
| **Memory with Boundaries** | Memory systems are bounded, interpretable, and consent-driven. |
| **Ethical & Safe AI** | We integrate principles from constructivist emotion theory, narrative psychology, and constitutional AI to ensure fairness, transparency, and well-being. |

---

## **PART 2: THE WALKXR SIMULATION SYSTEM**

> **Note:** This section provides a summary. For full operational details, see the [Simulation Methodology Documentation & Instructions](https://docs.google.com/document/u/0/d/1CbwNpUoPpjZS4ban37_qKymBhGKFPA-_wdqKq6Iv9FI/edit).

### **2.1. Purpose & Importance**

The Simulation System is our structured method for testing how diverse, psychologically rich personas experience each element of a walk. It is a **live, continuous feedback and training ecosystem** that uses LLMs to output structured JSON data, informing emotional pacing, walk design, and agent development.

### **2.2. System Components**

The system is currently managed via a [central Google Sheet](https://docs.google.com/spreadsheets/u/0/d/13IkJHcrIRIHoa1SH9jwHEy_G_xo1B-Ko86AuLJSjMpE/edit) containing:

* **UI**: Dropdown menus to generate simulation prompts.
* **Databases**: Structured data for Walks, Modules, and Personas.
* **Templates**: Engineered prompt templates for different testing modes.
* **Execution**: Apps Script integration with OpenRouter for LLM calls.
* **Tracking**: A master sheet for all structured outputs.

### **2.3. Simulation Modes & Execution**

We utilize four primary modes, each designed for a specific testing purpose. **Mode 2 (Full Walk) and Mode 3 (All Personas)** are the most effective for generating agent training data.

| Mode | Purpose | Use Case |
| :--- | :--- | :--- |
| **1: Single Module × Single Persona** | Evaluate emotional tone or clarity | Micro-testing |
| **2: Full Walk × Single Persona** | Capture emotional arcs and pacing | Agent Persona Testbed |
| **3: Single Module × All Personas** | Identify emotional range & divergence | Diversity & Inclusion QA |
| **4: Reflection-Only** | Assess coaching or journaling prompts | Post-Walk Agent Testing |

While currently manual, the system is designed for future migration to scripted (Python/LangChain) or agent-orchestrated (LangGraph) execution flows.

### **2.4. Data & Tagging Schema**

Simulation output is stored as structured JSON/CSV and tagged across key axes, including: `emotional_tone`, `friction_type`, `insight_depth`, `pacing_flow`, and `agent_opportunity`.
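For illustration, a single tagged record from a Mode 2 run might look like the sketch below. This is a hypothetical example: the five axis names come from the schema above, while the surrounding field names and all values are assumptions rather than the production schema.

```python
# Hypothetical shape of one tagged simulation output record (Mode 2: Full Walk x Single Persona).
# The five tag axes come from the schema above; every other field name and value is illustrative.
simulation_record = {
    "walk_id": "small_moments",             # assumed identifier
    "module_id": "module_03_storytelling",  # assumed identifier
    "persona_id": "overwhelmed_caregiver",  # assumed persona label
    "mode": 2,
    "response_text": "I hadn't noticed how tense I was until this prompt asked me to slow down.",
    "tags": {
        "emotional_tone": "tender",
        "friction_type": "none",
        "insight_depth": "moderate",
        "pacing_flow": "slightly_rushed",
        "agent_opportunity": "reflection_coach",
    },
}
```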
This structured data is the lifeblood of our agent development process.

---

## **PART 3: AGENTIC ARCHITECTURE & DEVELOPMENT LIFECYCLE**

### **3.1. Agentic Philosophy**

WalkXR’s agents are not general-purpose chatbots. They are **specialized, context-sensitive, and emotionally attuned interventions**. Each is a modular tool designed to operate at a precise moment in a user’s journey, orchestrated to create a cooperative, supportive ecosystem.

### **3.2. Agent Taxonomy**

We classify our agents into functional categories:

| Category | Purpose & Examples |
| :--- | :--- |
| **Narrative Agents** | Guide the story and core experience (e.g., `StorytellingAgent`, `ProjectiveExerciseAgent`). |
| **Ritual Agents** | Facilitate specific interactive exercises (e.g., `MadLibsAgent`, `PersonaChoiceAgent`). |
| **Coaching Agents** | Support user reflection and processing (e.g., `ReflectionCoachAgent`, `ScienceCoachAgent`). |
| **Utility Agents** | Provide scaffolding and support (e.g., `ClosingAgent`, `EmotionalReadinessAgent`). |

### **3.3. The Agent Development Lifecycle**

Each agent evolves through a rigorous, five-stage lifecycle:

1. **Stage 0: Identification**
   * **Trigger**: A cluster of `friction` or `opportunity` tags appears in simulation data.
   * **Output**: A one-page design brief outlining the agent's purpose, trigger conditions, and success metrics.
2. **Stage 1: Manual Prompt Prototyping**
   * **Process**: The agent's core logic is drafted as a detailed prompt in a notebook or a tool like OpenAI's Playground.
   * **Output**: A versioned prompt file (e.g., `prompts/v0/reflection_coach.json`) and 10-20 tagged example transcripts.
3. **Stage 2: Encapsulation in `BaseAgent`**
   * **Process**: The validated prompt logic is encapsulated within our standard `BaseAgent` class in the `src/walkxr_ai/agents/` directory.
   * **Output**: A new Python module (e.g., `reflection_coach.py`) with proper typing, state management, and tool definitions (see the sketch at the end of this part).
4. **Stage 3: Integration & Evaluation**
   * **Process**: The agent is added as a node to the appropriate LangGraph orchestration graph. It's tested with unit tests and end-to-end simulations.
   * **Output**: A passing test suite and evaluation reports from LangSmith, confirming performance and behavior.
5. **Stage 4: Deployment & Continuous Improvement**
   * **Process**: The agent is deployed. Live interaction data is collected and fed back into the simulation system to identify new friction points.
   * **Output**: A version bump for the agent and new design briefs for future improvements or new agents.

### **3.4. The 'Small Moments' Agent Cohort**

The first full application of our architecture is the agent cohort for the 'Small Moments' walk, orchestrated in a sequence:

`ProjectiveExerciseAgent` → `StorytellingAgent` → `ScienceCoachAgent` → `PersonaChoiceAgent` → `EmotionalReadinessAgent` → `MadLibsAgent` → `ReflectionCoachAgent` → `ClosingAgent`
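To make Stages 2 and 3 concrete, below is a minimal sketch of the `BaseAgent` encapsulation pattern, using `ReflectionCoachAgent` as the example. The `BaseAgent` name, the `src/walkxr_ai/agents/` location, and `prompts/v0/reflection_coach.json` come from the lifecycle above; the method names, state fields, and placeholder reply logic are assumptions, not the production interface.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseAgent(ABC):
    """Hypothetical sketch of the Stage 2 pattern: one versioned prompt wrapped in a uniform interface."""

    name: str = "base_agent"
    prompt_version: str = "v0"

    @abstractmethod
    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        """Read the shared walk state, do the agent's work, and return state updates."""
        ...


class ReflectionCoachAgent(BaseAgent):
    """Illustrative encapsulation of the prompts/v0/reflection_coach.json logic."""

    name = "reflection_coach"

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        user_text = state.get("last_user_message", "")
        # The real module would render the versioned prompt and call an LLM here;
        # this placeholder only shows the state-in / state-out contract.
        reply = f"What felt most alive for you in: '{user_text[:60]}'?"
        return {"agent_reply": reply, "last_agent": self.name}
```

Because every agent exposes the same state-in / state-out contract, the Stage 3 integration step reduces to registering the agent's `run` method as a node in the walk's orchestration graph.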
---

## **PART 4: THE WALKXR EMOTIONAL OS (ORCHESTRATION & ROADMAP)**

### 4.1. Vision for the OS

The WalkXR OS is a dynamic, learning system that moves beyond static, pre-scripted experiences. It will orchestrate agents, narrative content, and user state in real time to create personalized, emergent journeys that are maximally effective and emotionally resonant.

### 4.2. Core Architectural Components

```mermaid
graph TD
    subgraph User
        A[User Input]
    end
    subgraph WalkXR OS
        B[Master Orchestrator] -- routes --> D[Agentic Layer]
        A -- processed by --> C[Emotional State Engine]
        C -- provides state vector to --> B
        D -- accesses --> E[Knowledge & Memory Layer]
        F[Evaluation & Learning Loop] -- improves --> D
        F -- improves --> B
    end
    subgraph Output
        G[Personalized Experience]
    end
    D --> G
```

* **Master Orchestrator (Epic E06)**
  * **Technology**: `LangGraph`.
  * **Function**: The central state machine managing the application flow. Each node in the graph is an agent or a logic gate. It routes control based on the output of the Emotional State Engine and the predefined walk structure. The state it manages includes `user_id`, `walk_id`, `current_step`, and an `agent_scratchpad` for inter-agent communication.
* **Knowledge & Memory Layer**
  * **RAG Pipeline (Epic E01)**: Implemented via a `RetrievalEngine` class using `LlamaIndex`. It connects to a `ChromaDB` vector store containing embeddings of walk content, scientific papers, and tagged simulation data. This provides agents with factual, grounded knowledge.
  * **Persistent Graph Memory (Future - Epic E04)**: A graph database (e.g., `Neo4j`) will be developed to store long-term relational data about the user's journey, such as recurring emotional patterns, key insights from past walks, and relationships between concepts. This enables true personalization and growth tracking.
* **Emotional State Engine (Epic E05)**
  * **Function**: This engine calculates a real-time emotional state vector for the user. It is not a single emotional label but a multi-dimensional vector that provides a nuanced understanding of the user's state.
  * **Inputs**: The user's text input, interaction history, and (in the future) tone of voice.
  * **Processing**: Inputs are processed by a series of models and classifiers:
    1. **Text-based Emotion/Tone Classifier**: A fine-tuned model from the Hugging Face Hub or a powerful API (e.g., OpenAI) to classify primary emotion and tone.
    2. **Vulnerability/Openness Scorer**: A custom model or heuristic to score the level of self-disclosure.
    3. **Friction Detector**: A rule-based or model-based system to detect signs of confusion or frustration, informed by our simulation tagging schema.
  * **Output**: A state vector (e.g., `{ "valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1 }`) that is passed to the Master Orchestrator to influence routing decisions (see the sketch at the end of this section).
* **Evaluation & Continuous Learning Loop (Future - Epic E12)**
  * **Evaluation**: We will use `LangSmith` for tracing and debugging. For quality assessment, we will implement an LLM-as-evaluator framework to score agent responses on criteria like `empathic_accuracy`, `relevance`, and `alignment_with_therapeutic_goals`.
  * **Learning (RLAIF)**: High-quality, human-reviewed interactions will be used to create a preference dataset. This dataset will be used to fine-tune a reward model, which will then guide the fine-tuning of our agents using Reinforcement Learning from AI Feedback (RLAIF), enabling them to continuously improve.
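As a concrete illustration of how the state vector could influence routing, here is a minimal, hypothetical sketch of an orchestrator fragment built with `LangGraph`. The `WalkState` fields mirror the design above, but the node names, the friction threshold, and the placeholder node bodies are assumptions, and the snippet assumes a recent `langgraph` release exposing the `StateGraph` API.

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph  # assumes a recent langgraph release


class WalkState(TypedDict, total=False):
    """Assumed shape of the shared state; field names mirror the design above."""
    user_id: str
    walk_id: str
    current_step: str
    agent_scratchpad: str
    emotional_state: dict  # e.g. {"valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1}


def emotional_state_engine(state: WalkState) -> WalkState:
    # Placeholder for the classifier/scorer pipeline described above.
    return {"emotional_state": {"valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1}}


def reflection_coach(state: WalkState) -> WalkState:
    return {"agent_scratchpad": "reflection_coach handled this turn", "current_step": "reflection"}


def emotional_readiness(state: WalkState) -> WalkState:
    return {"agent_scratchpad": "emotional_readiness offered a grounding step", "current_step": "readiness"}


def route_on_friction(state: WalkState) -> str:
    # Hypothetical rule: high friction diverts to a readiness/grounding agent first.
    return "emotional_readiness" if state["emotional_state"].get("friction", 0.0) > 0.5 else "reflection_coach"


graph = StateGraph(WalkState)
graph.add_node("emotional_state_engine", emotional_state_engine)
graph.add_node("reflection_coach", reflection_coach)
graph.add_node("emotional_readiness", emotional_readiness)
graph.set_entry_point("emotional_state_engine")
graph.add_conditional_edges(
    "emotional_state_engine",
    route_on_friction,
    {"reflection_coach": "reflection_coach", "emotional_readiness": "emotional_readiness"},
)
graph.add_edge("reflection_coach", END)
graph.add_edge("emotional_readiness", END)
orchestrator = graph.compile()
```

Calling `orchestrator.invoke({"user_id": "u1", "walk_id": "small_moments"})` would run one routed turn; the production graph would chain the full 'Small Moments' cohort rather than two placeholder nodes.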
### 4.3. Phased Roadmap to the OS

Our development roadmap is explicitly defined by the 12-epic backlog. The three phases represent a deliberate progression from building foundational tools to deploying a complete experience, and finally to creating a scalable, self-improving platform.

| Phase | Epics | Goal & Key Activities |
| :--- | :--- | :--- |
| **1: Foundational Systems** | E01-E06 | **Build the Core, Reusable Infrastructure.**<br>- **E01: `RetrievalEngine`:** Implement the core RAG pipeline using `LlamaIndex` and `ChromaDB` (sketched after this table).<br>- **E02: `BaseAgent` Class:** Define a standardized abstract base class for all agents.<br>- **E03: Simulation Data Pipeline:** Create the workflow to ingest and embed tagged simulation logs.<br>- **E04: Persistent Memory (Prototype):** Develop a simple file/JSON-based memory system.<br>- **E05: `EmotionalStateEngine` v1:** Build the initial state engine with basic emotion classification.<br>- **E06: `MasterOrchestrator` v1:** Implement the initial `LangGraph` state machine. |
| **2: 'Small Moments' Walk v1.0** | E07-E10 | **Assemble & Test the First Full Experience.**<br>- **E07: Agent Cohort Development:** Build the full set of specialized agents for the 'Small Moments' walk.<br>- **E08: Full Orchestration Flow:** Integrate the agent cohort into `LangGraph` with all state transitions.<br>- **E09: End-to-End Simulation:** Run the complete walk through hundreds of simulated scenarios.<br>- **E10: Evaluation & Red Teaming:** Conduct rigorous testing with `LangSmith` and adversarial attacks. |
| **3: The WalkXR OS Platform** | E11-E12+ | **Generalize, Scale, and Learn.**<br>- **E11: "Walk Factory" Refactor:** Abstract components into a reusable framework for creating new walks.<br>- **E12: Continuous Learning Loop:** Implement the full RLAIF pipeline using preference data to fine-tune a reward model.<br>- **Future: Dynamic Narrative Engine:** Evolve the orchestrator to adjust walk content in real time.<br>- **Future: Graph Memory Integration:** Migrate from the prototype to the full `Neo4j` graph memory. |
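To ground E01 from Phase 1, the following is a minimal, illustrative sketch of how a `RetrievalEngine` could wrap `LlamaIndex` and `ChromaDB`. It assumes the post-0.10 LlamaIndex package layout with the `llama-index-vector-stores-chroma` integration installed and a configured embedding model; the class interface, paths, and collection name are assumptions rather than the actual implementation.

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore  # separate integration package


class RetrievalEngine:
    """Illustrative E01 wrapper; names, paths, and defaults are assumptions."""

    def __init__(self, persist_dir: str = "./chroma_db", collection: str = "walkxr_knowledge"):
        # Persistent Chroma collection holding embeddings of walk content,
        # scientific papers, and tagged simulation data.
        client = chromadb.PersistentClient(path=persist_dir)
        chroma_collection = client.get_or_create_collection(collection)
        vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
        self._storage = StorageContext.from_defaults(vector_store=vector_store)
        self._index = None

    def ingest(self, docs_dir: str) -> None:
        """Load a local folder of documents and embed them into the vector store."""
        documents = SimpleDirectoryReader(docs_dir).load_data()
        self._index = VectorStoreIndex.from_documents(documents, storage_context=self._storage)

    def retrieve(self, query: str, top_k: int = 4):
        """Return the top-k grounding passages for an agent's current turn."""
        if self._index is None:
            raise RuntimeError("Call ingest() before retrieve().")
        retriever = self._index.as_retriever(similarity_top_k=top_k)
        return retriever.retrieve(query)
```

An agent node could then call `engine.retrieve(...)` on the user's current turn and weave the returned passages into its prompt, which is the grounding step the RAG Pipeline in Section 4.2 describes.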