SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency
Submitted 1 year ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans