# Adaptive Game-Theoretic Agent for Repeated Rock–Paper–Scissors

This repository implements an adaptive agent for Repeated Rock–Paper–Scissors (RRPS), designed to perform robustly against a diverse population of adversarial opponents under strict runtime constraints. The agent combines history-based prediction, game-theoretic reasoning, and online learning to adapt its strategy dynamically without any offline training. In population-based evaluation against a benchmark set of adversarial bots, the agent achieved top-10 performance against 40+ opponents.

## Problem Setting

Repeated Rock–Paper–Scissors is a simple game with rich strategic structure when played repeatedly against adaptive opponents. Different adversaries exhibit different behaviors:

- Some follow simple, fixed patterns
- Others adapt strategically to exploit predictable opponents
- Some play stochastically to avoid exploitation

A strong agent must therefore balance exploitation, adaptation, and defensive randomization, all while operating under tight per-move runtime constraints.

## Approach

The agent combines three complementary strategy components, selected online by a multi-armed bandit.

### 1. Variable-Order Markov Prediction

The agent maintains a variable-order Markov model over the joint action history. Instead of using a fixed-order model, it:

- Searches for the longest matching history suffix seen before
- Estimates the opponent’s next move from empirical frequencies
- Caps the history length at 6 to balance expressiveness and efficiency

This allows the agent to quickly exploit opponents with repetitive or structured behavior (a minimal predictor sketch is given at the end of this section).

### 2. Iterative Best-Response Reasoning

A simple best response to a predicted opponent move (level-0 reasoning) is often exploitable by adaptive opponents. To address this, the agent implements iterative best-response reasoning:

- It assumes the opponent may be modeling the agent’s behavior
- It computes best responses to the opponent’s predicted best response
- This process is repeated for multiple reasoning levels

Each reasoning depth corresponds to a distinct strategy, enabling the agent to operate effectively against opponents with different levels of adaptivity.

### 3. Random (Nash) Strategy Fallback

When all structured strategies perform poorly, the agent switches to a uniform random strategy, corresponding to the Nash equilibrium of Rock–Paper–Scissors. This prevents further exploitation and ensures bounded losses against adversaries specifically designed to exploit deterministic or overly adaptive behavior.

### 4. Online Strategy Selection via Multi-Armed Bandit

All strategies (random play plus multiple best-response depths) are combined using a multi-armed bandit:

- Each strategy maintains a weight reflecting recent performance
- Weights are updated after every round, with decay to adapt to non-stationary opponents
- Strategies that would have won are rewarded; losing strategies are penalized
- The agent selects the strategy with the highest current weight

This allows the agent to adapt its reasoning depth and behavior to the current opponent dynamically (see the selection sketch below).
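As a concrete illustration of the first component, the sketch below shows one way the variable-order prediction step could look in plain Python. It is a minimal, self-contained example rather than the repository's exact implementation: the function name `predict_opponent`, the `(my_move, opp_move)` history encoding, and the frequency-based fallback are illustrative assumptions.

```python
from collections import Counter

MOVES = 3       # 0 = rock, 1 = paper, 2 = scissors
MAX_ORDER = 6   # maximum history suffix length, as described above


def predict_opponent(history, max_order=MAX_ORDER):
    """Predict a distribution over the opponent's next move.

    `history` is a list of (my_move, opp_move) pairs. The predictor looks for
    the longest suffix of the joint history (up to `max_order` pairs) that has
    occurred earlier, and returns the empirical distribution of the opponent's
    move that followed those earlier occurrences.
    """
    n = len(history)
    for order in range(min(max_order, n - 1), 0, -1):
        suffix = history[n - order:]
        counts = Counter()
        # Scan for earlier occurrences of this suffix and record what the
        # opponent played immediately afterwards.
        for start in range(n - order):
            if history[start:start + order] == suffix:
                counts[history[start + order][1]] += 1
        if counts:
            total = sum(counts.values())
            return [counts[m] / total for m in range(MOVES)]
    # No matching suffix: fall back to the opponent's overall move
    # frequencies, or to uniform play if there is no history at all.
    if history:
        counts = Counter(opp for _, opp in history)
        total = sum(counts.values())
        return [counts[m] / total for m in range(MOVES)]
    return [1.0 / MOVES] * MOVES
```

For example, given the joint history `[(0, 0), (0, 1), (0, 0), (0, 1)]` (the opponent alternating rock and paper), the longest previously seen suffix is `[(0, 0), (0, 1)]`, and this sketch assigns all probability to rock for the next round.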
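A second sketch, under the same assumptions, combines the remaining components: iterated best responses at several reasoning depths, a uniform-random (Nash) arm, and a decayed-weight bandit that scores every arm counterfactually after each round. Names such as `strategy_moves`, `BanditSelector`, `DECAY`, and `NUM_LEVELS` are illustrative, and the exact reward scheme and tie-breaking rule in the repository may differ.

```python
import random

MOVES = 3                     # 0 = rock, 1 = paper, 2 = scissors
COUNTER = {0: 1, 1: 2, 2: 0}  # COUNTER[m] is the move that beats m
DECAY = 0.95                  # forgetting factor for non-stationary opponents
NUM_LEVELS = 3                # number of best-response reasoning depths


def strategy_moves(predicted_opp_move):
    """Return one candidate move per strategy arm.

    Level 0 simply counters the predicted opponent move. Each deeper level
    assumes the opponent will counter our previous candidate and counters
    that counter. The final arm is the uniform-random (Nash) fallback.
    """
    moves = []
    my = COUNTER[predicted_opp_move]       # level 0: beat the prediction
    moves.append(my)
    for _ in range(NUM_LEVELS - 1):
        opp = COUNTER[my]                  # opponent counters our last candidate
        my = COUNTER[opp]                  # we counter that counter
        moves.append(my)
    moves.append(random.randrange(MOVES))  # Nash fallback arm
    return moves


class BanditSelector:
    """Decayed-weight bandit over strategy arms with counterfactual updates."""

    def __init__(self, num_arms=NUM_LEVELS + 1):
        self.weights = [0.0] * num_arms
        self.last_moves = [0] * num_arms

    def choose(self, candidate_moves):
        """Remember each arm's proposal and play the highest-weight arm."""
        self.last_moves = list(candidate_moves)
        best_arm = max(range(len(self.weights)), key=lambda a: self.weights[a])
        return self.last_moves[best_arm]

    def update(self, opp_move):
        """Score every arm by the move it proposed: win/draw/loss = +1/0/-1."""
        for arm, move in enumerate(self.last_moves):
            if move == COUNTER[opp_move]:
                reward = 1.0    # this arm's move would have won the round
            elif move == opp_move:
                reward = 0.0    # draw
            else:
                reward = -1.0   # loss
            self.weights[arm] = DECAY * self.weights[arm] + reward
```

In this sketch, each round the agent would call `predict_opponent` on the joint history, pass the most likely opponent move to `strategy_moves`, play `selector.choose(...)`, and call `selector.update(opp_move)` once the opponent's move is revealed.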
## Results

- Achieved a top-10 ranking in population-based evaluation against 40+ adversarial bots
- Operates under strict per-move runtime constraints with no offline training
- Demonstrates robust online adaptation: exploits simple patterned opponents, remains competitive against adaptive agents, and avoids catastrophic exploitation via the random (Nash) fallback

## Evaluation Framework

Evaluation is performed using the OpenSpiel implementation of the population-based RRPS benchmark, which measures performance and robustness against a fixed population of hand-crafted adversarial bots.

## References

- Marc Lanctot et al. "Population-based Evaluation in Repeated Rock–Paper–Scissors as a Benchmark for Multiagent Reinforcement Learning." Transactions on Machine Learning Research (TMLR), 2023.
- OpenSpiel: A Framework for Reinforcement Learning in Games. DeepMind.