Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference

(cerebras.ai)

420 points | by benchmarkist 2 days ago

151 comments