lotide

SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency

2 likes

Submitted 1 year ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans

https://infini-ai-lab.github.io/Sequoia-Page/

HN Discussion

source

Comments
