
Fix GPU engine

Siyuan Wang requested to merge wangsy/wukong:fix-gpu-engine into master

Fix GPU engine performance. The mainstream GPU engine was slower than the ATC version because one kernel performed two extra device memory accesses, which I had added for debugging. The performance doc is now out of date.
