MVA-2021 / reinforcement_learning / hw4_model_selection_bandits