26× Faster Inference with Layer-Condensed KV Cache for Large Language Models


Submitted 1 year ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans

https://arxiv.org/abs/2405.10637
