Phenikaa University Library: Search


Search Results

Results 1-2 of 2 (Search time: 0.001 seconds).

Micro-kernels for portable and efficient matrix multiplication in deep learning

Authors: Guillermo Alaejos; Adrián Castelló; Héctor Martínez (2023)

Our work exposes the structure of the template-based micro-kernels for ARM Neon (128-bit SIMD), ARM SVE (variable-length SIMD) and Intel AVX512 (512-bit SIMD), showing considerable performance on an NVIDIA Carmel processor (ARM Neon), a Fujitsu A64FX processor (ARM SVE) and an AMD EPYC 7282 processor (256-bit SIMD).

Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

Authors: Manuel F. Dolz; Sergio Barrachina; Héctor Martínez (2023)

In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd con...