BLIP3o-NEXT (GRPO)

We use TRL to implement GRPO.

We recommend creating a new environment, since some package versions conflict with those in the blip3o-next environment. You also need to install the dependencies from setup.py. Follow the steps below:

conda create -n grpo python=3.11 -y
conda activate grpo
pip install -r requirements.txt
cd ..
pip install -e .

To run GRPO:

bash run.sh

We use PaddleOCR to compute the reward. To use GenEval instead, follow reward-server to set up the API call, then modify the OCR reward accordingly.
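
As a rough illustration of what an OCR-based reward can look like, here is a minimal sketch that scores how closely the text recognized in a generated image matches the text requested in the prompt. It is not the reward used in this repository: the function names (normalize, ocr_text_reward, batch_rewards) are hypothetical, the similarity measure is Python's standard difflib rather than anything from PaddleOCR, and the step that runs OCR on the image is assumed to have already produced the recognized string. It returns one float per sample, which is the shape TRL expects from a custom reward function.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial OCR differences are ignored."""
    return " ".join(text.lower().split())


def ocr_text_reward(recognized: str, target: str) -> float:
    """Return a similarity score in [0, 1] between OCR output and the target text.

    Both arguments are plain strings; producing `recognized` from an image
    (for example with PaddleOCR) is assumed to happen elsewhere.
    """
    rec, tgt = normalize(recognized), normalize(target)
    if not tgt:
        # No text was requested: reward only an image with no detected text.
        return 1.0 if not rec else 0.0
    return SequenceMatcher(None, rec, tgt).ratio()


def batch_rewards(recognized_texts, target_texts):
    """Score a batch of samples; the result has one float per sample."""
    return [ocr_text_reward(r, t) for r, t in zip(recognized_texts, target_texts)]
```

Swapping in a different reward (such as one served over an API) would, under this sketch, only mean replacing the body of ocr_text_reward while keeping the one-float-per-sample output.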