
Fixed the GPU engine performance

Siyuan Wang requested to merge wangsy/wukong:gpu-async-engine-baseline into master

Fixed the GPU engine performance. The mainline GPU engine was slower than the ATC version because one kernel performed two extra device memory accesses, which I had added for debugging.

Edited by Siyuan Wang
