Download PDFOpen PDF in browserA Compiler-assisted locality aware CTA Mapping Scheme12 pages•Published: March 13, 2019AbstractGeneral purpose GPU (GPGPU) is an effective many-core architecture that can yield high throughput for many scientific applications with thread-level parallelism. However, several challenges still limit further performance improvements and make GPU program- ming challenging for programmers who lack the knowledge of GPU hardware architecture. In this paper, we design a compiler-assisted locality aware CTA (cooperative thread array) mapping scheme for GPUs to take advantage of the inter CTA data reuses in the GPU kernels. Using the data reuse analysis based on the polyhedron model, we can detect inter CTA data reuse patterns in the GPU kernels and control the CTA mapping pattern to improve the data locality on each SM. The compiler-assisted locality aware CTA mapping scheme can also be combined with the programmable warp scheduler to further improve the performance. The experimental results show that our CTA mapping algorithm can improve the overall performance of the input GPU programs by 23.3% on average and by 56.7% when combined with the programmable warp scheduler.Keyphrases: compiler, cta mapping, gpgpu, polyhedron model, reuse In: Gordon Lee and Ying Jin (editors). Proceedings of 34th International Conference on Computers and Their Applications, vol 58, pages 168-179.
|