Comment on Should have seen it coming

theunknownmuncher@lemmy.world 2 days ago

arxiv.org/abs/2405.20304 - they invented their own reinforcement learning framework, called Group Relative Policy Optimization (GRPO).
