16 points | by tosh 10 hours ago
6 comments
LMSYS had a package (FlexGen) that did much of the same kind of work (swapping between GPU, RAM, and disk).
Not sure if it's still being maintained.
I applaud how hardcore this is: streaming the model from disk and keeping only the KV cache in CPU RAM.
Can someone ELI5 please?
DeepSeek is huge, with 671B parameters. They keep the weights on disk and load them piece by piece into RAM. The innovation is that they evict everything except the KV cache from RAM.
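To make the ELI5 concrete, here is a minimal toy sketch of the idea, not DeepSeek's or this project's actual code. Each layer's weights sit in its own file on disk and are loaded only while that layer runs; the per-layer KV cache is the only state kept in RAM from one token to the next. Everything here is hypothetical and chosen for illustration: the names `StreamingDecoder`, `save_layers`, and `load_layer`, pickle files standing in for real weight shards, and a single-head attention layer with no MLP or normalization.

```python
import math
import os
import pickle
import tempfile


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def save_layers(layers, directory):
    """Hypothetical helper: write each layer's weights to its own file (stand-in for disk shards)."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(layer, f)
        paths.append(path)
    return paths


def load_layer(path):
    """Hypothetical helper: read one layer's weights from disk into RAM."""
    with open(path, "rb") as f:
        return pickle.load(f)


class StreamingDecoder:
    """Toy decoder: weights are streamed from disk per layer; only the KV cache stays resident."""

    def __init__(self, layer_paths):
        self.layer_paths = layer_paths
        # One KV cache per layer: the only per-sequence state kept in RAM between tokens.
        self.kv_cache = [{"k": [], "v": []} for _ in layer_paths]
        self.layers_loaded = 0  # counts disk loads, to show weights are re-read every token

    def step(self, x):
        for i, path in enumerate(self.layer_paths):
            w = load_layer(path)  # bring this layer's weights in from disk
            self.layers_loaded += 1
            q, k, v = matvec(w["wq"], x), matvec(w["wk"], x), matvec(w["wv"], x)
            cache = self.kv_cache[i]
            cache["k"].append(k)
            cache["v"].append(v)
            # Scaled dot-product attention over every cached position of this layer.
            scale = 1.0 / math.sqrt(len(q))
            scores = [scale * sum(a * b for a, b in zip(q, kk)) for kk in cache["k"]]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            attn = [sum(e / total * vv[j] for e, vv in zip(exps, cache["v"])) for j in range(len(v))]
            # Residual connection through the output projection.
            x = [xi + oi for xi, oi in zip(x, matvec(w["wo"], attn))]
            del w  # drop our reference to the weights; the KV cache is what persists
        return x


# Usage: three tiny identity-weight layers, decoding two tokens.
identity = [[1.0, 0.0], [0.0, 1.0]]
layers = [{"wq": identity, "wk": identity, "wv": identity, "wo": identity} for _ in range(3)]
with tempfile.TemporaryDirectory() as d:
    dec = StreamingDecoder(save_layers(layers, d))
    first = dec.step([1.0, 0.0])
    second = dec.step([0.0, 1.0])
```

After two tokens, every layer has been read from disk twice, while each layer's cache holds two key/value pairs. The trade-off is that weights are re-read from disk for every generated token, so disk bandwidth rather than memory capacity becomes the bottleneck; a practical implementation would likely prefetch the next layer while the current one computes.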
Thank you :)