llm.c: multi-GPU, bfloat16, flash attention, ~7% faster than PyTorch


Submitted 2 years ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans

https://twitter.com/karpathy/status/1786461447654125625
