How has DeepSeek improved the Transformer architecture?

(epoch.ai)

252 points | by superasn 3 days ago

71 comments