llm.c: multi-GPU, bfloat16, flash attention, ~7% faster than PyTorch
Submitted 1 year ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans
Submitted 1 year ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans