Show HN: Speed up model inference on CPU with hand crafted layer implementations

(github.com)

2 points | by wanderinglight 15 hours ago

2 comments

pbrowne011 13 hours ago
Is the main idea to convert the model implementation from Python into C, then hardcode all possible values? Do you do this yourself in the generator code, or could you let the C preprocessor/compiler handle something like this by using macros? (might help with compile time/memory)
"NOTE: Ensure the device you are running on has no form of hardware acceleration like GPU or the results will be skewed"
How much does adding GPUs affect your performance gains? I understand that the point of this optimization is CPU-only machines, but it would be interesting to consider the effect your optimizations have when running on GPUs as well.
- wanderinglight 6 hours ago
  We let the generator code hardcode the weights into the generated source.
  Using a GPU affects performance significantly, by as much as 20X. This project is only intended for cases where a GPU is not available, or not desired because of cost or other constraints.