# BLIP3o-NEXT (GRPO)

We use [trl](https://github.com/huggingface/trl) to implement GRPO.

We recommend creating a new environment, because some package versions conflict with the blip3o-next environment. You also need to install the dependencies from [setup.py](https://github.com/JiuhaiChen/BLIP3o/blob/BLIP3o-NEXT/setup.py). Follow the steps below:

```Shell
conda create -n grpo python=3.11 -y
conda activate grpo
pip install -r requirements.txt
cd ..
pip install -e .
```

To run GRPO:

```Shell
bash run.sh
```

We use [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) to compute the reward. If you want to use Geneval instead, follow [reward-server](https://github.com/yifan123/reward-server) to set up the API call, and modify the [OCR reward](https://github.com/JiuhaiChen/BLIP3o/blob/BLIP3o-NEXT/trl/trl/trainer/grpo_trainer.py#L1331) accordingly.
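
To give a rough idea of what an OCR-based reward can look like, here is a minimal sketch. It is not the code in `grpo_trainer.py`: it assumes the OCR step has already produced a plain string for each generated image, and it scores that string against the text requested in the prompt using character-level similarity from Python's standard `difflib`. The names `normalize`, `ocr_text_reward`, and `batch_rewards` are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not penalized."""
    return " ".join(text.lower().split())


def ocr_text_reward(target_text: str, recognized_text: str) -> float:
    """Return a similarity score in [0, 1] between the target text and the OCR output.

    Assumption: `recognized_text` is whatever string the OCR model read from
    the generated image (empty if nothing was detected).
    """
    target = normalize(target_text)
    recognized = normalize(recognized_text)
    if not target:
        # No text was requested, so there is nothing to score.
        return 0.0
    return SequenceMatcher(None, target, recognized).ratio()


def batch_rewards(targets: list[str], recognized: list[str]) -> list[float]:
    """Score a batch of samples; both lists must be aligned one-to-one."""
    if len(targets) != len(recognized):
        raise ValueError("targets and recognized must have the same length")
    return [ocr_text_reward(t, r) for t, r in zip(targets, recognized)]


if __name__ == "__main__":
    # Hypothetical example: two generated images for text-rendering prompts.
    print(batch_rewards(["Grand Opening", "SALE"], ["grand opening", ""]))
```

A continuous score like this gives partially correct renderings some credit, which tends to be more informative for group-relative comparison than an exact-match 0/1 reward. Swapping in a Geneval score would mean replacing `ocr_text_reward` with a call to your reward server.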