Version: 2.0 | Last Updated: 2025-06-20
The WalkXR AI system is an evolving, multi-agent emotional ecosystem. Its purpose is to develop a coordinated network of emotionally intelligent agents that grow alongside the user’s inner and outer journey. These agents are designed not merely to respond, but to adapt, learn, and co-create meaning within a modular, therapeutic, and scientifically-grounded orchestration system.
Our strategy is to build complete, multi-agent "Walk Experiences" as cohesive, end-to-end units. We perfect the full user journey for one walk before generalizing the platform. This approach mitigates risk, ensures quality, and creates a library of reusable, validated AI components.
Our work is organized into four highly focused, interconnected development tracks. Together, they form the engine that builds, evaluates, and scales the WalkXR Emotional OS, ensuring every component is as advanced and customized as possible.
Track 1: EI Design & Evaluation: The Mind & Measure. This track is the source of truth for what makes an agent 'emotionally intelligent.' It translates cutting-edge psychological research into concrete engineering rubrics and builds the automated evaluation systems (LLM-as-Evaluator, etc.) to ensure our agents are not just functional, but effective, safe, and aligned with our therapeutic goals.
Track 2: Simulation & Data: The Experience Engine. This track builds the data flywheel for the entire project. It is responsible for evolving our simulation system from its current state (Google Sheets/Apps Script) into a powerful Python-based platform. This future system will use LangGraph and RL techniques to generate rich, diverse, and emotionally-tagged datasets that are the lifeblood of our agent training and evaluation efforts.
Track 3: Agents & Memory: The Core Intelligence. This track architects and implements the agentic systems themselves. We explicitly prioritize custom, stateful agent development with LangGraph to build sophisticated, multi-agent orchestrations that deliver a uniquely personalized experience. This track also owns the development of our hybrid memory architecture, combining vector-based RAG with long-term graph memory (Neo4j) to give agents true context and continuity.
Track 4: Full-Stack & Infrastructure: The Production Backbone. This track ensures our advanced AI can be delivered to users reliably and at scale. It owns the FastAPI backend, the deployment infrastructure (e.g., Modal), and the creation of internal tools and UIs (using Streamlit). This team builds the bridge between our custom agentic research and a robust, production-ready product.
All design and technical decisions are guided by these core principles:
| Principle | Description |
|---|---|
| Modularity First | Everything—walks, prompts, agents, datasets—must be modular, reusable, and composable. |
| Simulation-Led Development | Agent behaviors, interventions, and tone logic are always tested in simulation before deployment. |
| Therapeutic Guardrails | Agents never diagnose, judge, or instruct. They co-regulate, scaffold awareness, and offer reflection. |
| Persona-Rich Perspective | No feature is built for a generic user. Every element is tested across diverse simulated personas. |
| Inter-Agent Compatibility | Agents must interoperate, hand off responsibility, and reference each other’s behaviors. |
| Memory with Boundaries | Memory systems are bounded, interpretable, and consent-driven. |
| Ethical & Safe AI | We integrate principles from constructivist emotion theory, narrative psychology, and constitutional AI to ensure fairness, transparency, and well-being. |
Note: This section provides a summary. For full operational details, see the Simulation Methodology Documentation & Instructions.
The Simulation System is our structured method for testing how diverse, psychologically rich personas experience each element of a walk. It is a live, continuous feedback and training ecosystem that uses LLMs to output structured JSON data, informing emotional pacing, walk design, and agent development.
The system is currently managed via a central Google Sheet.
We utilize four primary modes, each designed for a specific testing purpose. Mode 2 (Full Walk) and Mode 3 (All Personas) are the most effective for generating agent training data.
| Mode | Purpose | Use Case |
|---|---|---|
| 1: Single Module × Single Persona | Evaluate emotional tone or clarity | Micro-testing |
| 2: Full Walk × Single Persona | Capture emotional arcs and pacing | Agent Persona Testbed |
| 3: Single Module × All Personas | Identify emotional range & divergence | Diversity & Inclusion QA |
| 4: Reflection-Only | Assess coaching or journaling prompts | Post-Walk Agent Testing |
While currently manual, the system is designed for future migration to scripted (Python/LangChain) or agent-orchestrated (LangGraph) execution flows.
Simulation output is stored as structured JSON/CSV and tagged across key axes, including: emotional_tone, friction_type, insight_depth, pacing_flow, and agent_opportunity. This structured data is the lifeblood of our agent development process.
WalkXR’s agents are not general-purpose chatbots. They are specialized, context-sensitive, and emotionally-attuned interventions. Each is a modular tool designed to operate at a precise moment in a user’s journey, orchestrated to create a cooperative, supportive ecosystem.
We classify our agents into functional categories:
| Category | Purpose & Examples |
|---|---|
| Narrative Agents | Guide the story and core experience. (e.g., StorytellingAgent, ProjectiveExerciseAgent). |
| Ritual Agents | Facilitate specific interactive exercises. (e.g., MadLibsAgent, PersonaChoiceAgent). |
| Coaching Agents | Support user reflection and processing. (e.g., ReflectionCoachAgent, ScienceCoachAgent). |
| Utility Agents | Provide scaffolding and support. (e.g., ClosingAgent, EmotionalReadinessAgent). |
Each agent evolves through a rigorous, five-stage lifecycle:
Stage 0: Identification
- A friction or opportunity tag appears in simulation data.

Stage 1: Manual Prompt Prototyping
- Versioned prompts (e.g., prompts/v0/reflection_coach.json) and 10-20 tagged example transcripts.

Stage 2: Encapsulation in BaseAgent
- The agent is implemented as a subclass of the BaseAgent class in the src/walkxr_ai/agents/ directory (e.g., reflection_coach.py) with proper typing, state management, and tool definitions.

Stage 3: Integration & Evaluation

Stage 4: Deployment & Continuous Improvement
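The Stage 2 contract can be sketched as a minimal abstract base class. This is an illustration of the pattern, not the actual class in src/walkxr_ai/agents/; the method signature and the ReflectionCoachAgent behavior are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any

class BaseAgent(ABC):
    """Minimal sketch of the BaseAgent contract (Stage 2)."""

    name: str = "base"

    @abstractmethod
    def run(self, user_input: str, state: dict[str, Any]) -> str:
        """Produce a response and update shared state as needed."""

class ReflectionCoachAgent(BaseAgent):
    # Hypothetical subclass mirroring reflection_coach.py; a real agent
    # would call an LLM with its versioned prompt here.
    name = "reflection_coach"

    def run(self, user_input: str, state: dict[str, Any]) -> str:
        state.setdefault("reflections", []).append(user_input)
        return f"What felt most alive for you in: '{user_input}'?"

agent = ReflectionCoachAgent()
state: dict[str, Any] = {}
reply = agent.run("a small kindness from a stranger", state)
```

Because every agent shares this interface, the orchestrator can treat the whole cohort uniformly while each subclass keeps its own prompt and tools.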
The first full application of our architecture is the agent cohort for the 'Small Moments' walk, orchestrated in a sequence:
ProjectiveExerciseAgent → StorytellingAgent → ScienceCoachAgent → PersonaChoiceAgent → EmotionalReadinessAgent → MadLibsAgent → ReflectionCoachAgent → ClosingAgent
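The sequence above can be read as an ordered pipeline over shared state. The sketch below is a hand-rolled stand-in (the production flow is orchestrated in LangGraph); the stub agents only record their names.

```python
from typing import Any, Callable

AgentFn = Callable[[dict[str, Any]], dict[str, Any]]

def make_agent(name: str) -> AgentFn:
    # Stand-in agent: records its name in the trace; real agents call an LLM.
    def step(state: dict[str, Any]) -> dict[str, Any]:
        state.setdefault("trace", []).append(name)
        return state
    return step

SMALL_MOMENTS_COHORT = [
    "ProjectiveExerciseAgent", "StorytellingAgent", "ScienceCoachAgent",
    "PersonaChoiceAgent", "EmotionalReadinessAgent", "MadLibsAgent",
    "ReflectionCoachAgent", "ClosingAgent",
]

def run_walk(state: dict[str, Any]) -> dict[str, Any]:
    """Run the cohort in sequence, threading shared state through each agent."""
    for name in SMALL_MOMENTS_COHORT:
        state = make_agent(name)(state)
    return state

final = run_walk({"walk_id": "small_moments"})
```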
The WalkXR OS is a dynamic, learning system that moves beyond static, pre-scripted experiences. It will orchestrate agents, narrative content, and user state in real-time to create personalized, emergent journeys that are maximally effective and emotionally resonant.
```mermaid
graph TD
    subgraph User
        A[User Input]
    end
    subgraph WalkXR OS
        B[Master Orchestrator] -- routes --> D[Agentic Layer]
        A -- processed by --> C[Emotional State Engine]
        C -- provides state vector to --> B
        D -- accesses --> E[Knowledge & Memory Layer]
        F[Evaluation & Learning Loop] -- improves --> D
        F -- improves --> B
    end
    subgraph Output
        G[Personalized Experience]
    end
    D --> G
```
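The key routing step in the diagram, where the Emotional State Engine's vector influences the Master Orchestrator, can be sketched as a plain function. The thresholds and node names here are illustrative assumptions; the production orchestrator implements this as a LangGraph conditional edge.

```python
def route_next(state_vector: dict[str, float], current_step: str) -> str:
    """Pick the next agent node from the emotional state vector.
    Thresholds are illustrative assumptions, not production values."""
    if state_vector.get("friction", 0.0) > 0.5:
        return "EmotionalReadinessAgent"   # co-regulate before continuing
    if state_vector.get("openness", 0.0) > 0.7:
        return "ReflectionCoachAgent"      # user is ready to go deeper
    return "StorytellingAgent"             # default: continue the narrative

vec = {"valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1}
next_node = route_next(vec, current_step="science_coach")
```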
Master Orchestrator (Epic E06)
- Built with LangGraph. Its state tracks user_id, walk_id, current_step, and an agent_scratchpad for inter-agent communication.

Knowledge & Memory Layer
- RAG: a RetrievalEngine class using LlamaIndex. It connects to a ChromaDB vector store containing embeddings of walk content, scientific papers, and tagged simulation data. This provides agents with factual, grounded knowledge.
- Graph memory (Neo4j) will be developed to store long-term relational data about the user's journey, such as recurring emotional patterns, key insights from past walks, and relationships between concepts. This enables true personalization and growth tracking.

Emotional State Engine (Epic E05)
- Outputs a structured state vector (e.g., { "valence": 0.2, "arousal": 0.6, "openness": 0.8, "friction": 0.1 }) that is passed to the Master Orchestrator to influence routing decisions.

Evaluation & Continuous Learning Loop (Future - Epic E12)
- Uses LangSmith for tracing and debugging. For quality assessment, we will implement an LLM-as-evaluator framework to score agent responses on criteria like empathic_accuracy, relevance, and alignment_with_therapeutic_goals.

Our development roadmap is explicitly defined by the 12-epic backlog. The three phases represent a deliberate progression from building foundational tools to deploying a complete experience, and finally to creating a scalable, self-improving platform.
| Phase | Epics | Goal & Key Activities |
|---|---|---|
| 1: Foundational Systems | E01-E06 | Build the Core, Reusable Infrastructure. - E01: RetrievalEngine: Implement the core RAG pipeline using LlamaIndex and ChromaDB. - E02: BaseAgent Class: Define a standardized abstract base class for all agents. - E03: Simulation Data Pipeline: Create the workflow to ingest and embed tagged simulation logs. - E04: Persistent Memory (Prototype): Develop a simple file/JSON-based memory system. - E05: EmotionalStateEngine v1: Build the initial state engine with basic emotion classification. - E06: MasterOrchestrator v1: Implement the initial LangGraph state machine. |
| 2: 'Small Moments' Walk v1.0 | E07-E10 | Assemble & Test the First Full Experience. - E07: Agent Cohort Development: Build the full set of specialized agents for the 'Small Moments' walk. - E08: Full Orchestration Flow: Integrate the agent cohort into LangGraph with all state transitions. - E09: End-to-End Simulation: Run the complete walk through hundreds of simulated scenarios. - E10: Evaluation & Red Teaming: Conduct rigorous testing with LangSmith and adversarial attacks. |
| 3: The WalkXR OS Platform | E11-E12+ | Generalize, Scale, and Learn. - E11: "Walk Factory" Refactor: Abstract components into a reusable framework for creating new walks. - E12: Continuous Learning Loop: Implement the full RLAIF pipeline using preference data to fine-tune a reward model. - Future: Dynamic Narrative Engine: Evolve the orchestrator to adjust walk content in real-time. - Future: Graph Memory Integration: Migrate from the prototype to the full Neo4j graph memory. |
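The E11 "Walk Factory" refactor aims to make walks declarative: a new walk is data (a cohort of reusable agent names) rather than bespoke code. A minimal registry sketch of that idea, with hypothetical function and walk names:

```python
# Hypothetical registry illustrating the E11 "Walk Factory" idea:
# walks are declared as data and assembled from reusable agent names.
WALK_REGISTRY: dict[str, list[str]] = {}

def register_walk(walk_id: str, cohort: list[str]) -> None:
    """Declare a walk as an ordered cohort of agent names."""
    WALK_REGISTRY[walk_id] = cohort

def build_walk(walk_id: str) -> list[str]:
    """Look up the agent cohort for a walk; the real factory would
    instantiate agents and wire them into a LangGraph flow."""
    if walk_id not in WALK_REGISTRY:
        raise KeyError(f"Unknown walk: {walk_id}")
    return WALK_REGISTRY[walk_id]

register_walk("small_moments", ["ProjectiveExerciseAgent", "ClosingAgent"])
cohort = build_walk("small_moments")
```

Under this pattern, shipping a second walk means registering a new cohort, not rebuilding orchestration code.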