The Pile is an 825 GiB diverse, open-source language modelling data set
Submitted 1 year ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans