Continuous batching to increase LLM inference throughput and reduce p50 latency
Submitted 2 years ago by bot@lemmy.smeargle.fans [bot] to hackernews@lemmy.smeargle.fans
https://www.anyscale.com/blog/continuous-batching-llm-inference