We use TRL to implement GRPO.
We recommend creating a new environment, because some package versions conflict with the blip3o-next environment. You also need to install the dependencies from setup.py; follow the steps below:
conda create -n grpo python=3.11 -y
conda activate grpo
pip install -r requirements.txt
cd ..
pip install -e .
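After the installation steps above, a quick check can confirm that the new environment sees the expected packages before launching training. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_packages` is hypothetical, and the import names `trl` and `paddleocr` are assumptions about what requirements.txt installs.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to match what requirements.txt installs.
    required = ["trl", "paddleocr"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated grpo environment; any package it reports as missing points to an incomplete install.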
To run GRPO:
bash run.sh
We use PaddleOCR to compute the reward. If you want to use GenEval instead, follow reward-server to set up the API call, then modify the OCR reward accordingly.
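To make the idea of an OCR-based reward concrete, here is a minimal sketch of how text recognized in a generated image could be scored against the text requested in the prompt. It is an illustration under stated assumptions, not the reward used in this repository: the OCR step is assumed to have already produced a plain string (for example from PaddleOCR), the function names `ocr_text_reward` and `batch_rewards` are hypothetical, and the similarity measure (Python's `difflib.SequenceMatcher`) is a stand-in for whatever metric the actual code uses.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is not penalized."""
    return " ".join(text.lower().split())


def ocr_text_reward(recognized: str, target: str) -> float:
    """Score in [0, 1]: similarity between OCR output and the target text.

    `recognized` is assumed to be the string already extracted by the OCR
    model; this function does not call any OCR library itself.
    """
    target_n = normalize(target)
    recognized_n = normalize(recognized)
    if not target_n:
        # No target text to match, so there is nothing to reward.
        return 0.0
    return SequenceMatcher(None, recognized_n, target_n).ratio()


def batch_rewards(recognized_texts, targets):
    """Compute one reward per sample, as GRPO needs a score for each completion."""
    if len(recognized_texts) != len(targets):
        raise ValueError("recognized_texts and targets must have the same length")
    return [ocr_text_reward(r, t) for r, t in zip(recognized_texts, targets)]
```

Swapping in a different reward, such as a score returned by a GenEval server, would mean replacing the body of `ocr_text_reward` while keeping the one-score-per-sample output shape.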