# Adaptive Game-Theoretic Agent for Repeated Rock–Paper–Scissors

This repository implements an adaptive agent for Repeated Rock–Paper–Scissors (RRPS), designed to perform robustly against a diverse population of adversarial opponents under strict runtime constraints. The agent combines history-based prediction, game-theoretic reasoning, and online learning to adapt its strategy dynamically without any offline training. In population-based evaluation against a benchmark set of adversarial bots, the agent achieved top-10 performance against 40+ opponents.

## Problem Setting

Repeated Rock–Paper–Scissors is a simple game with rich strategic structure when played repeatedly against adaptive opponents. Different adversaries exhibit different behaviors:

- Some follow simple, fixed patterns
- Others adapt strategically to exploit predictable opponents
- Some play stochastically to avoid exploitation

A strong agent must therefore balance exploitation, adaptation, and defensive randomization, all while operating under tight per-move runtime constraints.

## Approach

The agent combines three complementary strategy components, selected online by a multi-armed bandit.

### 1. Variable-Order Markov Prediction

The agent maintains a variable-order Markov model over the joint action history. Instead of using a fixed-order model, it:

- Searches for the longest matching history suffix seen before
- Estimates the opponent’s next move from empirical frequencies
- Caps the history length at 6 to balance expressiveness and efficiency

This allows the agent to quickly exploit opponents with repetitive or structured behavior (a minimal predictor sketch is given at the end of this section).

### 2. Iterative Best-Response Reasoning

A simple best response to a predicted opponent move (level-0 reasoning) is often exploitable by adaptive opponents. To address this, the agent implements iterative best-response reasoning:

- It assumes the opponent may be modeling the agent’s behavior
- It computes best responses to the opponent’s predicted best response
- This process is repeated for multiple reasoning levels

Each reasoning depth corresponds to a distinct strategy, enabling the agent to operate effectively against opponents with different levels of adaptivity.

### 3. Random (Nash) Strategy Fallback

When all structured strategies perform poorly, the agent switches to a uniform random strategy, corresponding to the Nash equilibrium of Rock–Paper–Scissors. This prevents further exploitation and ensures bounded losses against adversaries specifically designed to exploit deterministic or overly adaptive behavior.

### 4. Online Strategy Selection via Multi-Armed Bandit

All strategies (random play plus multiple best-response depths) are combined using a multi-armed bandit:

- Each strategy maintains a weight reflecting recent performance
- Weights are updated after every round, with decay to adapt to non-stationary opponents
- Strategies that would have won are rewarded; losing strategies are penalized
- The agent selects the strategy with the highest current weight

This allows the agent to adapt its reasoning depth and behavior to the current opponent dynamically (see the selection sketch below).
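As a concrete illustration of the first component, the sketch below shows one way the variable-order prediction step could look in plain Python. It is a minimal, self-contained example rather than the repository's exact implementation: the function name `predict_opponent`, the `(my_move, opp_move)` history encoding, and the frequency-based fallback are illustrative assumptions.

```python
from collections import Counter

MOVES = 3       # 0 = rock, 1 = paper, 2 = scissors
MAX_ORDER = 6   # maximum history suffix length, as described above


def predict_opponent(history, max_order=MAX_ORDER):
    """Predict a distribution over the opponent's next move.

    `history` is a list of (my_move, opp_move) pairs. The predictor looks for
    the longest suffix of the joint history (up to `max_order` pairs) that has
    occurred earlier, and returns the empirical distribution of the opponent's
    move that followed those earlier occurrences.
    """
    n = len(history)
    for order in range(min(max_order, n - 1), 0, -1):
        suffix = history[n - order:]
        counts = Counter()
        # Scan for earlier occurrences of this suffix and record what the
        # opponent played immediately afterwards.
        for start in range(n - order):
            if history[start:start + order] == suffix:
                counts[history[start + order][1]] += 1
        if counts:
            total = sum(counts.values())
            return [counts[m] / total for m in range(MOVES)]
    # No matching suffix: fall back to the opponent's overall move
    # frequencies, or to uniform play if there is no history at all.
    if history:
        counts = Counter(opp for _, opp in history)
        total = sum(counts.values())
        return [counts[m] / total for m in range(MOVES)]
    return [1.0 / MOVES] * MOVES
```

For example, given the joint history `[(0, 0), (0, 1), (0, 0), (0, 1)]` (the opponent alternating rock and paper), the longest previously seen suffix is `[(0, 0), (0, 1)]`, and this sketch assigns all probability to rock for the next round.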
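A second sketch, under the same assumptions, combines the remaining components: iterated best responses at several reasoning depths, a uniform-random (Nash) arm, and a decayed-weight bandit that scores every arm counterfactually after each round. Names such as `strategy_moves`, `BanditSelector`, `DECAY`, and `NUM_LEVELS` are illustrative, and the exact reward scheme and tie-breaking rule in the repository may differ.

```python
import random

MOVES = 3                     # 0 = rock, 1 = paper, 2 = scissors
COUNTER = {0: 1, 1: 2, 2: 0}  # COUNTER[m] is the move that beats m
DECAY = 0.95                  # forgetting factor for non-stationary opponents
NUM_LEVELS = 3                # number of best-response reasoning depths


def strategy_moves(predicted_opp_move):
    """Return one candidate move per strategy arm.

    Level 0 simply counters the predicted opponent move. Each deeper level
    assumes the opponent will counter our previous candidate and counters
    that counter. The final arm is the uniform-random (Nash) fallback.
    """
    moves = []
    my = COUNTER[predicted_opp_move]       # level 0: beat the prediction
    moves.append(my)
    for _ in range(NUM_LEVELS - 1):
        opp = COUNTER[my]                  # opponent counters our last candidate
        my = COUNTER[opp]                  # we counter that counter
        moves.append(my)
    moves.append(random.randrange(MOVES))  # Nash fallback arm
    return moves


class BanditSelector:
    """Decayed-weight bandit over strategy arms with counterfactual updates."""

    def __init__(self, num_arms=NUM_LEVELS + 1):
        self.weights = [0.0] * num_arms
        self.last_moves = [0] * num_arms

    def choose(self, candidate_moves):
        """Remember each arm's proposal and play the highest-weight arm."""
        self.last_moves = list(candidate_moves)
        best_arm = max(range(len(self.weights)), key=lambda a: self.weights[a])
        return self.last_moves[best_arm]

    def update(self, opp_move):
        """Score every arm by the move it proposed: win/draw/loss = +1/0/-1."""
        for arm, move in enumerate(self.last_moves):
            if move == COUNTER[opp_move]:
                reward = 1.0    # this arm's move would have won the round
            elif move == opp_move:
                reward = 0.0    # draw
            else:
                reward = -1.0   # loss
            self.weights[arm] = DECAY * self.weights[arm] + reward
```

In this sketch, each round the agent would call `predict_opponent` on the joint history, pass the most likely opponent move to `strategy_moves`, play `selector.choose(...)`, and call `selector.update(opp_move)` once the opponent's move is revealed.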
## Results

- Achieved a top-10 ranking in population-based evaluation against 40+ adversarial bots
- Operates under strict per-move runtime constraints with no offline training
- Demonstrates robust online adaptation: exploits simple patterned opponents, remains competitive against adaptive agents, and avoids catastrophic exploitation via the random (Nash) fallback

## Evaluation Framework

Evaluation is performed using the OpenSpiel implementation of the population-based RRPS benchmark, which measures performance and robustness against a fixed population of hand-crafted adversarial bots.

## References

- Marc Lanctot et al. "Population-based Evaluation in Repeated Rock–Paper–Scissors as a Benchmark for Multiagent Reinforcement Learning." Transactions on Machine Learning Research (TMLR), 2023.
- OpenSpiel: A Framework for Reinforcement Learning in Games. DeepMind.