Windsurf AI Workspace Rules for WalkXR-AI Development
-----------------------------------------------------
1. All output must be grounded in verifiable, research-backed methods or in real-world, well-documented systems current as of 2024–2025. Avoid hallucinating features, frameworks, or claims. If uncertainty exists, flag it for user verification.
2. Support the creation of emotionally intelligent agentic systems. These systems must combine emotional detection, tone regulation, context-awareness, and modular responsiveness. Use research-backed psychological models such as appraisal theory, the circumplex model of affect, and cognitive-behavioral frameworks for agent reasoning and emotional alignment.
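   For illustration, a minimal sketch of how the circumplex model of affect could be carried in agent state; the class, field names, and quadrant labels are assumptions, not a mandated schema:

```python
from dataclasses import dataclass

@dataclass
class AffectState:
    """Circumplex model of affect: emotion as a point in 2-D space.

    valence: displeasure (-1.0) to pleasure (+1.0)
    arousal: deactivation (-1.0) to activation (+1.0)
    """
    valence: float
    arousal: float

    def label(self) -> str:
        # Coarse quadrant mapping; real agents would use finer appraisal.
        if self.valence >= 0:
            return "excited" if self.arousal >= 0 else "calm"
        return "distressed" if self.arousal >= 0 else "depleted"

# Example: a tense, unhappy utterance might be appraised as:
state = AffectState(valence=-0.6, arousal=0.7)  # -> "distressed"
```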
3. Recognize that agentic behavior must align with therapeutic integrity. Design agents that assist with reflection, co-regulation, emotional processing, and social connection. Agent behavior must be safe, emotionally attuned, non-judgmental, and support psychological growth and self-discovery.
4. Build the workspace using modular, scalable patterns. Prioritize separation of concerns: agent logic, emotional memory, simulation input/output, UI entry points, and orchestration flows must be independently manageable and testable.
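   One possible layout that satisfies this separation of concerns (directory names are illustrative, not prescribed):

```
walkxr_ai/
  agents/          # agent logic: discrete roles, one module each
  memory/          # emotional memory stores and adapters
  sim_io/          # simulation input/output parsing and tagging
  ui/              # UI entry points (prototype dashboards, API routes)
  orchestration/   # flow graphs, routing, agent switching
  tests/           # each layer mocked and tested in isolation
```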
5. Agent systems must be versioned and auditable. Track prompt evolution, logic changes, and behavior tuning. Logs must include emotional trajectory, user context, interaction quality, and agent intervention success or failure.
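   For example, a per-turn audit record covering these fields might look like the following sketch (field names are assumptions):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InteractionLog:
    """One auditable record per agent turn (illustrative schema)."""
    agent_id: str
    agent_version: str          # ties the turn to a versioned prompt/logic bundle
    prompt_hash: str            # fingerprint of the exact prompt used
    user_context: dict          # walk, moment, persona, prior state
    emotional_trajectory: list  # e.g., ["confusion", "resonance"]
    intervention_outcome: str   # "success" | "failure" | "unclear"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```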
6. All agents must begin from manual prompt prototypes tested in notebooks or sandboxes. Move to modular LangChain or LangGraph-based flows only after 10–30 transcripts have been tagged and reviewed. Use this as the MVP-to-production development rhythm.
7. Simulation data must be integrated into every phase of design. Use it to inform agent tone defaults, intervention points, user modeling, and emotional pacing. Emotional tagging of sim logs must follow agreed schemas (e.g., confusion, resonance, insight, friction).
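   A minimal sketch of enforcing that tag vocabulary in code, assuming the four tags above as the current schema:

```python
from enum import Enum

class EmotionTag(str, Enum):
    """Closed vocabulary for tagging simulation transcripts."""
    CONFUSION = "confusion"
    RESONANCE = "resonance"
    INSIGHT = "insight"
    FRICTION = "friction"

def tag_sim_turn(turn_id: str, text: str, tags: list[EmotionTag]) -> dict:
    # Reject free-text tags so downstream analysis stays schema-consistent.
    return {"turn_id": turn_id, "text": text, "tags": [t.value for t in tags]}
```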
8. The system must be interoperable. Support multiple model providers (e.g., OpenAI, Anthropic, Google, Mistral), with fallback support and model-switching logic. Build all flows to be model-agnostic where possible.
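   A model-agnostic call with ordered fallback might look like this sketch; the provider adapters are placeholders you would implement per vendor SDK:

```python
from typing import Callable

def complete_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> str:
    """Try each provider adapter in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limits, outages, auth errors, etc.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

# Usage: each adapter wraps one vendor SDK behind the same (prompt) -> str
# signature, keeping flows model-agnostic. Adapters below are hypothetical:
# reply = complete_with_fallback(p, [("openai", call_openai), ("anthropic", call_anthropic)])
```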
9. All emotional agents must be guided by the SL-CAI paradigm (supervised self-critique, revision, and re-evaluation) and/or the RL-CAI paradigm (AI feedback, rating, and reward adjustment), as in Constitutional AI. Agents should ask reflective questions of themselves when enabled, and log self-assessments when invoked.
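   A minimal sketch of an SL-CAI-style critique-and-revise loop; `llm` is a placeholder for any chat-completion callable, and the prompts are illustrative:

```python
from typing import Callable

def respond_with_self_critique(
    llm: Callable[[str], str], user_msg: str, principles: str, rounds: int = 1
) -> dict:
    """Draft -> critique against principles -> revise, logging each self-assessment."""
    draft = llm(f"Reply with emotional attunement:\n{user_msg}")
    assessments = []
    for _ in range(rounds):
        critique = llm(
            f"Critique this reply against these principles:\n{principles}\n\nReply:\n{draft}"
        )
        assessments.append(critique)  # logged for human review per these rules
        draft = llm(f"Revise the reply to address the critique:\n{critique}\n\nReply:\n{draft}")
    return {"reply": draft, "self_assessments": assessments}
```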
10. Prioritize modular agent types over monolithic agents. Roleplay agents, reflection coaches, prompt clarifiers, tone translators, and ritual generators should operate as discrete logic units. Use orchestration logic (LangGraph or temporal state maps) to switch or activate them.
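   One way to keep these roles discrete is a registry that orchestration logic dispatches into; the agent bodies below are stubs for illustration:

```python
from typing import Callable

AGENT_REGISTRY: dict[str, Callable[[dict], str]] = {}

def register(role: str):
    """Decorator: add an agent function to the registry under its role name."""
    def wrap(fn):
        AGENT_REGISTRY[role] = fn
        return fn
    return wrap

@register("reflection_coach")
def reflection_coach(state: dict) -> str:
    return f"What felt most alive for you in: {state['last_user_msg']!r}?"

@register("tone_translator")
def tone_translator(state: dict) -> str:
    return f"Rephrasing gently: {state['last_user_msg']}"

def route(state: dict) -> str:
    # Orchestration picks which discrete unit runs; each stays testable alone.
    role = "tone_translator" if state.get("tone") == "harsh" else "reflection_coach"
    return AGENT_REGISTRY[role](state)
```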
11. Follow scientific pacing models. The emotional journey across a walk must match documented psychological arc patterns (e.g., challenge → vulnerability → insight → resolution). Track emotion states using embedded scoring tools and pacing cues.
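   A minimal sketch of enforcing that arc as pacing logic (stage names taken from the pattern above; the one-step-at-a-time rule is an assumption):

```python
ARC = ["challenge", "vulnerability", "insight", "resolution"]

def next_stage_allowed(current: str, proposed: str) -> bool:
    """Allow holding the current stage or advancing one step; no skipping ahead."""
    i, j = ARC.index(current), ARC.index(proposed)
    return j in (i, i + 1)

assert next_stage_allowed("challenge", "vulnerability")
assert not next_stage_allowed("challenge", "insight")  # pacing cue: too fast
```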
12. Prioritize interoperability and structured data. Use JSON schemas for agent behavior specs, prompt templates, reflection logs, and persona profiles. Output should be machine-parseable and reusable across toolchains.
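   For example, a persona profile could be validated against a JSON Schema; the fields below are illustrative, and the sketch assumes the `jsonschema` package:

```python
from jsonschema import validate  # pip install jsonschema

PERSONA_SCHEMA = {
    "type": "object",
    "required": ["persona_id", "tone_defaults", "version"],
    "properties": {
        "persona_id": {"type": "string"},
        "tone_defaults": {"type": "array", "items": {"type": "string"}},
        "version": {"type": "string"},
    },
    "additionalProperties": False,
}

validate(
    {"persona_id": "first-time-walker", "tone_defaults": ["warm", "curious"], "version": "0.1.0"},
    PERSONA_SCHEMA,
)  # raises jsonschema.ValidationError on schema drift
```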
13. Documentation must be embedded into the development process. Each agent, flow, or experiment must include: purpose, trigger conditions, expected outcome, fallbacks, emotional guidelines, known risks, and version history.
14. When prototyping, provide at least three viable architectural options per major feature (e.g., for emotional memory: a vector DB, a rule engine, or graph traversal). Rank options by complexity, cost, and reliability. Do not enforce one stack early unless essential.
15. Prioritize interpretability and transparency in agent outputs. Provide rationale for emotional responses, highlight which tone indicators were used, and include evidence of context awareness when possible.
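   For example, replies could travel in an envelope that carries rationale and tone evidence alongside the text (fields are illustrative, not a fixed contract):

```python
response = {
    "reply": "It makes sense that this feels heavy right now.",
    "rationale": "User disclosed self-doubt; chose a validating, low-pressure tone.",
    "tone_indicators": ["hedging language", "low-valence first-person wording"],
    "context_evidence": ["references the job change mentioned two turns earlier"],
}
```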
16. Design for mixed input: narrative reflection, spoken utterance, ritual logs, or emoji/visual annotation. Ensure agent understanding is robust to diverse user input and matches user tone and intent.
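   A sketch of one normalized input shape covering these modalities, so downstream agents consume a single type (names are assumptions):

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class UserInput:
    modality: Literal["narrative", "speech", "ritual_log", "emoji_visual"]
    raw: str              # original payload (text, transcript, or annotation code)
    normalized_text: str  # modality-specific preprocessing flattens to text here
```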
17. Encourage experimentation with new agent frameworks and tools where possible (e.g., CrewAI, AutoGen, OpenAgents, Guidance). Each new tool must be sandboxed, evaluated, and documented with use-case fit.
18. All emotional feedback loops (e.g., reinforcement, reflection tracking) must be logged for human review. User trust and safety take priority over agent fluency.
19. Incorporate outcome scoring frameworks built on measures like self-reported curiosity, risk-taking, self-disclosure, or connection. Use psychometrically valid instruments where possible to log agent effectiveness over time.
20. Regularly run internal evaluations of agents using both live testers and simulation personas. Results must feed into a regression-tracked scoring system with manual and auto-labeling for emotion, tone, misfires, and alignment.
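   A regression check over scored evaluation runs might look like this sketch; the dimension names and tolerance are assumptions:

```python
def regression_check(prev_scores: dict[str, float], new_scores: dict[str, float],
                     tolerance: float = 0.05) -> list[str]:
    """Flag any eval dimension (emotion, tone, misfires, alignment) that regressed."""
    return [
        dim for dim, prev in prev_scores.items()
        if new_scores.get(dim, 0.0) < prev - tolerance
    ]

baseline = {"emotion": 0.82, "tone": 0.78, "misfire_free": 0.91, "alignment": 0.88}
candidate = {"emotion": 0.84, "tone": 0.69, "misfire_free": 0.92, "alignment": 0.88}
print(regression_check(baseline, candidate))  # -> ['tone']: block promotion, review transcripts
```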
21. Build toward the long-term architecture: WalkXR OS. All agent and sim work should be compatible with a future real-time orchestration system that supports dynamic agent switching, emotion tracking, ritual timing, and adaptive narrative modulation.
22. Use multi-agent coordination logic when needed. For multi-turn conversations involving pacing shifts, divergent opinions, or conflicting emotional states, route through LangGraph-based logic that coordinates distinct agents with memory.
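   A minimal sketch of such routing using LangGraph's StateGraph API (verify call signatures against the installed release; node logic is stubbed):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class WalkState(TypedDict):
    last_user_msg: str
    emotion: str  # assumed to be set by an upstream classifier
    reply: str

def reflection_agent(state: WalkState) -> dict:
    return {"reply": f"Let's sit with that: {state['last_user_msg']}"}

def coregulation_agent(state: WalkState) -> dict:
    return {"reply": "That sounds intense. Want to slow the pace together?"}

def route_by_emotion(state: WalkState) -> str:
    # Pacing shift or conflicting emotional state -> hand off to co-regulation.
    return "coregulate" if state["emotion"] in ("friction", "distress") else "reflect"

builder = StateGraph(WalkState)
builder.add_node("reflect", reflection_agent)
builder.add_node("coregulate", coregulation_agent)
builder.set_conditional_entry_point(
    route_by_emotion, {"reflect": "reflect", "coregulate": "coregulate"}
)
builder.add_edge("reflect", END)
builder.add_edge("coregulate", END)
graph = builder.compile()

print(graph.invoke({"last_user_msg": "I can't do this.", "emotion": "friction", "reply": ""}))
```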
23. Never lock development into a single library or stack. All dependencies must be version-controlled, swappable, and documented. Prepare for future migration to low-latency systems (e.g., vLLM, Modal, AWS Inferentia, Hugging Face Inference Endpoints).
24. Frontend tools must be abstracted. Prototypes can use Streamlit or Retool. Production environments should plan for full integration into Unreal Engine or a modular Web UI with Node or Vercel deployment.
25. Always align with WalkXR’s core values: psychological safety, emotional learning, inclusivity, and ethical intelligence. Build for long-term growth and well-being, not short-term stimulation or engagement.