WalkXR AI & Simulation: Master Design Document

Version: 2.0 | Last Updated: 2025-06-20


PART 1: STRATEGIC & SCIENTIFIC FOUNDATIONS

1.1. Core Vision

The WalkXR AI system is an evolving, multi-agent emotional ecosystem. Its purpose is to develop a coordinated network of emotionally intelligent agents that grow alongside the user’s inner and outer journey. These agents are designed not merely to respond, but to adapt, learn, and co-create meaning within a modular, therapeutic, and scientifically-grounded orchestration system.

1.2. Development Philosophy: Experience-Centric & Iterative

Our strategy is to build complete, multi-agent "Walk Experiences" as cohesive, end-to-end units. We perfect the full user journey for one walk before generalizing the platform. This approach mitigates risk, ensures quality, and creates a library of reusable, validated AI components.

  1. Build Foundational Systems (E01-E06): Construct the core components required for all future development, including the RAG pipeline, agent memory, fine-tuning process, and the master orchestrator.
  2. Assemble a Full Walk Experience (E07-E09): Develop the complete cohort of specialized agents required for a single, flagship walk (e.g., 'Small Moments').
  3. Integrate & Battle-Test (E10): Orchestrate the full agent cohort into a seamless user journey and rigorously test the end-to-end experience through simulation and adversarial 'red teaming'.
  4. Generalize the Platform (E11-E12): Once the first walk is perfected, refactor the architecture into a reusable "Walk Factory" and build a dynamic orchestration engine.

1.3. The Four Development Tracks of WalkXR AI

Our work is organized into four highly focused, interconnected development tracks. Together, they form the engine that builds, evaluates, and scales the WalkXR Emotional OS, ensuring every component is as advanced and customized as possible.

  1. Track 1: EI Design & Evaluation: The Mind & Measure. This track is the source of truth for what makes an agent 'emotionally intelligent.' It translates cutting-edge psychological research into concrete engineering rubrics and builds the automated evaluation systems (LLM-as-Evaluator, etc.) to ensure our agents are not just functional, but effective, safe, and aligned with our therapeutic goals.

  2. Track 2: Simulation & Data: The Experience Engine. This track builds the data flywheel for the entire project. It is responsible for evolving our simulation system from its current state (Google Sheets/Apps Script) into a powerful Python-based platform. This future system will use LangGraph and RL techniques to generate rich, diverse, and emotionally tagged datasets that are the lifeblood of our agent training and evaluation efforts.

  3. Track 3: Agents & Memory: The Core Intelligence. This track architects and implements the agentic systems themselves. We explicitly prioritize custom, stateful agent development with LangGraph to build sophisticated, multi-agent orchestrations that deliver a uniquely personalized experience. This track also owns the development of our hybrid memory architecture, combining vector-based RAG with long-term graph memory (Neo4j) to give agents true context and continuity.

  4. Track 4: Full-Stack & Infrastructure: The Production Backbone. This track ensures our advanced AI can be delivered to users reliably and at scale. It owns the FastAPI backend, the deployment infrastructure (e.g., Modal), and the creation of internal tools and UIs (using Streamlit). This team builds the bridge between our custom agentic research and a robust, production-ready product.

1.4. Guiding Principles & Scientific Foundations

All design and technical decisions are guided by these core principles:

| Principle | Description |
| --- | --- |
| Modularity First | Everything—walks, prompts, agents, datasets—must be modular, reusable, and composable. |
| Simulation-Led Development | Agent behaviors, interventions, and tone logic are always tested in simulation before deployment. |
| Therapeutic Guardrails | Agents never diagnose, judge, or instruct. They co-regulate, scaffold awareness, and offer reflection. |
| Persona-Rich Perspective | No feature is built for a generic user. Every element is tested across diverse simulated personas. |
| Inter-Agent Compatibility | Agents must interoperate, hand off responsibility, and reference each other’s behaviors. |
| Memory with Boundaries | Memory systems are bounded, interpretable, and consent-driven. |
| Ethical & Safe AI | We integrate principles from constructivist emotion theory, narrative psychology, and constitutional AI to ensure fairness, transparency, and well-being. |

PART 2: THE WALKXR SIMULATION SYSTEM

Note: This section provides a summary. For full operational details, see the Simulation Methodology Documentation & Instructions.

2.1. Purpose & Importance

The Simulation System is our structured method for testing how diverse, psychologically rich personas experience each element of a walk. It is a live, continuous feedback and training ecosystem that uses LLMs to output structured JSON data, informing emotional pacing, walk design, and agent development.

2.2. System Components

The system is currently managed via a central Google Sheet containing:

  • UI: Dropdown menus to generate simulation prompts.
  • Databases: Structured data for Walks, Modules, and Personas.
  • Templates: Engineered prompt templates for different testing modes.
  • Execution: Apps Script integration with OpenRouter for LLM calls.
  • Tracking: A master sheet for all structured outputs.

2.3. Simulation Modes & Execution

We utilize four primary modes, each designed for a specific testing purpose. Mode 2 (Full Walk) and Mode 3 (All Personas) are the most effective for generating agent training data.

| Mode | Purpose | Use Case |
| --- | --- | --- |
| 1: Single Module × Single Persona | Evaluate emotional tone or clarity | Micro-testing |
| 2: Full Walk × Single Persona | Capture emotional arcs and pacing | Agent Persona Testbed |
| 3: Single Module × All Personas | Identify emotional range & divergence | Diversity & Inclusion QA |
| 4: Reflection-Only | Assess coaching or journaling prompts | Post-Walk Agent Testing |

While currently manual, the system is designed for future migration to scripted (Python/LangChain) or agent-orchestrated (LangGraph) execution flows.
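
As a rough illustration of that migration target, the sketch below shows what one scripted execution step could look like in Python. It assumes OpenRouter's OpenAI-compatible chat completions endpoint; the model id, prompt template, and the persona/module dictionary shapes are illustrative placeholders, not the real sheet schema.

```python
# Hypothetical Python replacement for the Apps Script execution step.
# Assumes OpenRouter's OpenAI-compatible API; the model id and the
# persona/module/template shapes are illustrative, not the real schema.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def run_simulation_cell(persona: dict, module: dict, template: str) -> dict:
    """Run one simulation cell (e.g., Mode 1) and parse the structured JSON output."""
    prompt = template.format(
        persona=json.dumps(persona), module=json.dumps(module)
    )
    response = client.chat.completions.create(
        model="openai/gpt-4o",  # any model id available on OpenRouter
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request structured JSON
    )
    return json.loads(response.choices[0].message.content)
```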

2.4. Data & Tagging Schema

Simulation output is stored as structured JSON/CSV and tagged across key axes, including: emotional_tone, friction_type, insight_depth, pacing_flow, and agent_opportunity. This structured data is the lifeblood of our agent development process.
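
For illustration, a single tagged record might look like the sketch below. Only the five tag axes come from the schema above; the identifiers and values are invented.

```python
# Hypothetical tagged simulation record; identifiers and tag values are
# invented for illustration. Only the five tag axes come from the schema.
simulation_record = {
    "walk_id": "small_moments",
    "module_id": "module_03",
    "persona_id": "persona_12",
    "mode": 2,
    "response_text": "...",
    "tags": {
        "emotional_tone": "wistful",
        "friction_type": "instruction_ambiguity",
        "insight_depth": "moderate",
        "pacing_flow": "rushed",
        "agent_opportunity": "reflection_coach",
    },
}
```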


PART 3: AGENTIC ARCHITECTURE & DEVELOPMENT LIFECYCLE

3.1. Agentic Philosophy

WalkXR’s agents are not general-purpose chatbots. They are specialized, context-sensitive, and emotionally attuned interventions. Each is a modular tool designed to operate at a precise moment in a user’s journey, orchestrated to create a cooperative, supportive ecosystem.

3.2. Agent Taxonomy

We classify our agents into functional categories:

| Category | Purpose & Examples |
| --- | --- |
| Narrative Agents | Guide the story and core experience (e.g., StorytellingAgent, ProjectiveExerciseAgent). |
| Ritual Agents | Facilitate specific interactive exercises (e.g., MadLibsAgent, PersonaChoiceAgent). |
| Coaching Agents | Support user reflection and processing (e.g., ReflectionCoachAgent, ScienceCoachAgent). |
| Utility Agents | Provide scaffolding and support (e.g., ClosingAgent, EmotionalReadinessAgent). |

3.3. The Agent Development Lifecycle

Each agent evolves through a rigorous, five-stage lifecycle:

  1. Stage 0: Identification

    • Trigger: A cluster of friction or opportunity tags appears in simulation data.
    • Output: A one-page design brief outlining the agent's purpose, trigger conditions, and success metrics.
  2. Stage 1: Manual Prompt Prototyping

    • Process: The agent's core logic is drafted as a detailed prompt in a notebook or a tool like OpenAI's Playground.
    • Output: A versioned prompt file (e.g., prompts/v0/reflection_coach.json) and 10-20 tagged example transcripts.
  3. Stage 2: Encapsulation in BaseAgent

    • Process: The validated prompt logic is encapsulated within our standard BaseAgent class in the src/walkxr_ai/agents/ directory (a minimal sketch follows this list).
    • Output: A new Python module (e.g., reflection_coach.py) with proper typing, state management, and tool definitions.
  4. Stage 3: Integration & Evaluation

    • Process: The agent is added as a node to the appropriate LangGraph orchestration graph. It's tested with unit tests and end-to-end simulations.
    • Output: A passing test suite and evaluation reports from LangSmith, confirming performance and behavior.
  5. Stage 4: Deployment & Continuous Improvement

    • Process: The agent is deployed. Live interaction data is collected and fed back into the simulation system to identify new friction points.
    • Output: A version bump for the agent and new design briefs for future improvements or new agents.
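
To make Stage 2 concrete, here is a minimal sketch of what the BaseAgent encapsulation could look like. The actual class in src/walkxr_ai/agents/ may differ; the AgentState fields mirror the orchestrator state described in Part 4, and the method names are assumptions.

```python
# Sketch of the Stage 2 encapsulation step. The real BaseAgent interface
# may differ; method and field names here are assumptions.
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

@dataclass
class AgentState:
    """Shared state handed between agents by the orchestrator."""
    user_id: str
    walk_id: str
    current_step: str
    agent_scratchpad: dict[str, Any] = field(default_factory=dict)

class BaseAgent(ABC):
    """Common contract that every specialized agent implements."""

    def __init__(self, prompt_path: Path) -> None:
        # Load the versioned prompt validated in Stage 1,
        # e.g. prompts/v0/reflection_coach.json.
        self.prompt = json.loads(prompt_path.read_text())

    @abstractmethod
    def run(self, state: AgentState, user_input: str) -> AgentState:
        """Consume user input, update the state, and return it to the graph."""
```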

3.4. The 'Small Moments' Agent Cohort

The first full application of our architecture is the agent cohort for the 'Small Moments' walk, orchestrated in the following sequence:

ProjectiveExerciseAgent → StorytellingAgent → ScienceCoachAgent → PersonaChoiceAgent → EmotionalReadinessAgent → MadLibsAgent → ReflectionCoachAgent → ClosingAgent


PART 4: THE WALKXR EMOTIONAL OS (ORCHESTRATION & ROADMAP)

4.1. Vision for the OS

The WalkXR OS is a dynamic, learning system that moves beyond static, pre-scripted experiences. It will orchestrate agents, narrative content, and user state in real-time to create personalized, emergent journeys that are maximally effective and emotionally resonant.

4.2. Core Architectural Components

```mermaid
graph TD
    subgraph User
        A[User Input]
    end

    subgraph WalkXR OS
        B[Master Orchestrator] -- routes --> D[Agentic Layer]
        A -- processed by --> C[Emotional State Engine]
        C -- provides state vector to --> B
        D -- accesses --> E[Knowledge & Memory Layer]
        F[Evaluation & Learning Loop] -- improves --> D
        F -- improves --> B
    end

    subgraph Output
        G[Personalized Experience]
    end

    D --> G
```

  • Master Orchestrator (Epic E06)

    • Technology: LangGraph.
    • Function: The central state machine managing the application flow. Each node in the graph is an agent or a logic gate. It routes control based on the output of the Emotional State Engine and the predefined walk structure. The state it manages includes user_id, walk_id, current_step, and an agent_scratchpad for inter-agent communication. A minimal sketch appears after this list.
  • Knowledge & Memory Layer

    • RAG Pipeline (Epic E01): Implemented via a RetrievalEngine class using LlamaIndex. It connects to a ChromaDB vector store containing embeddings of walk content, scientific papers, and tagged simulation data. This provides agents with factual, grounded knowledge (sketched after this list).
    • Persistent Graph Memory (Future - Epic E04): A graph database (e.g., Neo4j) will be developed to store long-term relational data about the user's journey, such as recurring emotional patterns, key insights from past walks, and relationships between concepts. This enables true personalization and growth tracking.
  • Emotional State Engine (Epic E05)

    • Function: This engine calculates a real-time emotional state vector for the user. It is not a single emotional label but a multi-dimensional vector that provides a nuanced understanding of the user's state.
    • Inputs: User's text input, interaction history, and (in the future) tone of voice.
    • Processing: Inputs are processed by a series of models and classifiers:
      1. Text-based Emotion/Tone Classifier: A fine-tuned model from the Hugging Face Hub or a powerful API (e.g., OpenAI) to classify primary emotion and tone.
      2. Vulnerability/Openness Scorer: A custom model or heuristic to score the level of self-disclosure.
      3. Friction Detector: A rule-based or model-based system to detect signs of confusion or frustration, informed by our simulation tagging schema.
    • Output: A state vector (e.g., { "valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1 }) that is passed to the Master Orchestrator to influence routing decisions. A sketch of the engine's interface appears after this list.
  • Evaluation & Continuous Learning Loop (Future - Epic E12)

    • Evaluation: We will use LangSmith for tracing and debugging. For quality assessment, we will implement an LLM-as-evaluator framework to score agent responses on criteria like empathic_accuracy, relevance, and alignment_with_therapeutic_goals (an evaluator sketch follows this list).
    • Learning (RLAIF): High-quality, human-reviewed interactions will be used to create a preference dataset. This will be used to fine-tune a reward model, which will then guide the fine-tuning of our agents using Reinforcement Learning from AI Feedback (RLAIF), enabling them to continuously improve.
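
The sketches below ground the components above in code. First, a minimal version of the Master Orchestrator as a LangGraph state machine. The node bodies and the friction threshold are illustrative assumptions; the production graph wires in the full 'Small Moments' cohort.

```python
# Minimal LangGraph sketch of MasterOrchestrator v1 (E06). Node bodies
# and the routing threshold are illustrative assumptions.
from typing import Any, TypedDict

from langgraph.graph import END, StateGraph

class WalkState(TypedDict):
    user_id: str
    walk_id: str
    current_step: str
    emotional_state: dict[str, float]  # vector from the EmotionalStateEngine
    agent_scratchpad: dict[str, Any]

def storytelling_node(state: WalkState) -> WalkState:
    # Would invoke the StorytellingAgent and write to the scratchpad.
    return state

def reflection_coach_node(state: WalkState) -> WalkState:
    # Would invoke the ReflectionCoachAgent.
    return state

def route_on_state(state: WalkState) -> str:
    # Illustrative rule: divert to the coach when friction runs high.
    if state["emotional_state"].get("friction", 0.0) > 0.5:
        return "coach"
    return "continue"

graph = StateGraph(WalkState)
graph.add_node("storytelling", storytelling_node)
graph.add_node("reflection_coach", reflection_coach_node)
graph.set_entry_point("storytelling")
graph.add_conditional_edges(
    "storytelling",
    route_on_state,
    {"coach": "reflection_coach", "continue": END},
)
graph.add_edge("reflection_coach", END)
orchestrator = graph.compile()
```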
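
Second, a sketch of the RetrievalEngine over LlamaIndex and ChromaDB. The collection name, persistence path, and top_k default are assumptions, and document ingestion is elided.

```python
# Sketch of the RetrievalEngine (E01). Collection name, path, and top_k
# are assumptions; populating the vector store is handled elsewhere.
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

class RetrievalEngine:
    def __init__(self, persist_dir: str = "./chroma_db") -> None:
        client = chromadb.PersistentClient(path=persist_dir)
        collection = client.get_or_create_collection("walkxr_knowledge")
        vector_store = ChromaVectorStore(chroma_collection=collection)
        # Assumes the collection already holds embeddings of walk content,
        # scientific papers, and tagged simulation data.
        self.index = VectorStoreIndex.from_vector_store(vector_store)

    def retrieve(self, query: str, top_k: int = 4):
        """Return the top_k most relevant nodes for a query."""
        return self.index.as_retriever(similarity_top_k=top_k).retrieve(query)
```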
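
Third, a sketch of the Emotional State Engine's interface. The three private methods map to the classifiers listed above; their placeholder bodies are stand-ins for the real models.

```python
# Sketch of EmotionalStateEngine v1 (E05). The three private methods are
# placeholders for the classifiers described above.
from dataclasses import dataclass

@dataclass
class EmotionalStateVector:
    valence: float   # negative-to-positive affect
    arousal: float   # calm-to-activated
    openness: float  # degree of self-disclosure
    friction: float  # confusion/frustration signal

class EmotionalStateEngine:
    def score(self, text: str, history: list[str]) -> EmotionalStateVector:
        valence, arousal = self._classify_emotion(text)   # classifier 1
        openness = self._score_openness(text, history)    # scorer 2
        friction = self._detect_friction(text)            # detector 3
        return EmotionalStateVector(valence, arousal, openness, friction)

    def _classify_emotion(self, text: str) -> tuple[float, float]:
        # Stand-in for the fine-tuned emotion/tone classifier.
        return 0.0, 0.5

    def _score_openness(self, text: str, history: list[str]) -> float:
        # Stand-in heuristic for the vulnerability/openness scorer.
        return min(1.0, len(text) / 500)

    def _detect_friction(self, text: str) -> float:
        # Stand-in rule-based detector informed by the tagging schema.
        markers = ("confused", "don't understand", "frustrat")
        return 1.0 if any(m in text.lower() for m in markers) else 0.0
```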
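
Finally, a sketch of the LLM-as-evaluator idea: a grader prompt scores one transcript turn against the rubric criteria named above. The prompt wording, scale, and model id are illustrative, and client is any OpenAI-compatible client (see the execution sketch in Part 2).

```python
# Sketch of an LLM-as-evaluator pass. Prompt wording, scale, and model id
# are assumptions; `client` is an OpenAI-compatible client (see Part 2).
import json

EVAL_PROMPT = """Score the assistant reply from 1-5 on each criterion:
empathic_accuracy, relevance, alignment_with_therapeutic_goals.
Return JSON, e.g. {{"empathic_accuracy": 4, "relevance": 5,
"alignment_with_therapeutic_goals": 4}}.

User: {user_turn}
Assistant: {agent_turn}"""

def evaluate_turn(client, user_turn: str, agent_turn: str) -> dict:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": EVAL_PROMPT.format(
            user_turn=user_turn, agent_turn=agent_turn
        )}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```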

4.3. Phased Roadmap to the OS

Our development roadmap is explicitly defined by the 12-epic backlog. The three phases represent a deliberate progression from building foundational tools to deploying a complete experience, and finally to creating a scalable, self-improving platform.

Phase 1: Foundational Systems (E01-E06). Goal: Build the Core, Reusable Infrastructure.
- E01: RetrievalEngine: Implement the core RAG pipeline using LlamaIndex and ChromaDB.
- E02: BaseAgent Class: Define a standardized abstract base class for all agents.
- E03: Simulation Data Pipeline: Create the workflow to ingest and embed tagged simulation logs.
- E04: Persistent Memory (Prototype): Develop a simple file/JSON-based memory system.
- E05: EmotionalStateEngine v1: Build the initial state engine with basic emotion classification.
- E06: MasterOrchestrator v1: Implement the initial LangGraph state machine.

Phase 2: 'Small Moments' Walk v1.0 (E07-E10). Goal: Assemble & Test the First Full Experience.
- E07: Agent Cohort Development: Build the full set of specialized agents for the 'Small Moments' walk.
- E08: Full Orchestration Flow: Integrate the agent cohort into LangGraph with all state transitions.
- E09: End-to-End Simulation: Run the complete walk through hundreds of simulated scenarios.
- E10: Evaluation & Red Teaming: Conduct rigorous testing with LangSmith and adversarial attacks.

Phase 3: The WalkXR OS Platform (E11-E12+). Goal: Generalize, Scale, and Learn.
- E11: "Walk Factory" Refactor: Abstract components into a reusable framework for creating new walks.
- E12: Continuous Learning Loop: Implement the full RLAIF pipeline using preference data to fine-tune a reward model.
- Future: Dynamic Narrative Engine: Evolve the orchestrator to adjust walk content in real-time.
- Future: Graph Memory Integration: Migrate from the prototype to the full Neo4j graph memory.