Computing DTWs on CPU, GPU and FPGA with SYCL

EasyChair Preprint 14488, version 2
12 pages • Date: September 13, 2024

Abstract

One of the most time-consuming kernels of an epileptic seizure detection application is the computation of the Dynamic Time Warping (DTW) Distance Matrix. This kernel is a good candidate for heterogeneous CPU/GPU/FPGA execution. In this paper, we explore the design space of heterogeneous CPU, GPU, and FPGA implementations of this kernel. We start by optimizing the CPU implementation of the DTW Distance Matrix computation, leveraging the latest C++26 SIMD library, and compare it with the SYCL implementation for CPU, which also exploits the SIMD units. Next, we take advantage of the portability of SYCL to run the code on an on-chip GPU (iGPU) as well as on a discrete NVIDIA GPU (dGPU). Finally, we present the SYCL implementation of the kernel on an Intel Stratix 10 MX FPGA. Our evaluations demonstrate that SYCL seems well suited to exploit the available SIMD capabilities of modern CPU cores, and that it also shows promising results for the accelerator devices considered in this work.

Keyphrases: DTW, FPGA, GPU, SIMD, SYCL, energy efficiency, heterogeneous architecture
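For readers unfamiliar with the kernel named in the abstract, the following is a minimal scalar sketch of a DTW distance matrix in plain C++. It is not the paper's implementation: it assumes one-dimensional series, an absolute-difference local cost, and no warping-window constraint, and the function names `dtw_distance` and `dtw_distance_matrix` are hypothetical. The SIMD, SYCL, and FPGA variants the abstract refers to are not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Textbook DTW distance between two 1-D series (assumption: local cost is
// |a[i] - b[j]| and there is no warping-window constraint).
// Only two rows of the cumulative-cost table are kept in memory.
double dtw_distance(const std::vector<double>& a, const std::vector<double>& b) {
    const double inf = std::numeric_limits<double>::infinity();
    if (a.empty() || b.empty()) {
        return (a.empty() && b.empty()) ? 0.0 : inf;
    }
    std::vector<double> prev(b.size() + 1, inf);
    std::vector<double> curr(b.size() + 1, inf);
    prev[0] = 0.0;  // aligning two empty prefixes costs nothing
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = inf;  // a non-empty prefix cannot align with an empty one
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const double cost = std::fabs(a[i - 1] - b[j - 1]);
            // Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + std::min({prev[j], curr[j - 1], prev[j - 1]});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Symmetric matrix of pairwise DTW distances over a set of series.
// Every (i, j) entry is independent of the others, which is what makes
// this kind of kernel a natural fit for parallel devices.
std::vector<std::vector<double>> dtw_distance_matrix(
    const std::vector<std::vector<double>>& series) {
    const std::size_t n = series.size();
    std::vector<std::vector<double>> dist(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j]);
        }
    }
    return dist;
}
```

Calling `dtw_distance_matrix` on a vector of series returns a square matrix with zeros on the diagonal. The inner recurrence carries a dependency between neighbouring cells, so in this sketch the obvious parallelism lies across matrix entries rather than within a single DTW computation.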