Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels

Raffaele Montella; Diana Di Luccio; Ciro Giuseppe De Vita; Gennaro Mellone; Marco Lapegna; Giuliano Laccetti; Sokol Kosta; Giulio Giunta

doi:10.1109/CCGrid54584.2022.00099

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

1 Citation (Scopus)

Abstract

Hardware accelerators, which rely on code and data offloading to overcome the limits of CPU cores, have been one of the most distinctive trends in high-end computing and related applications over the last decade. However, while code offloading has become a commonly used paradigm for improving performance, memory access and management remain a source of bottlenecks because the application must interact with different address spaces. To address this, NVIDIA introduced the CUDA Unified Memory model, which avoids explicit memory copies between the machine hosting the accelerator device and the device itself, in either direction. This paper presents a novel design and implementation of support for CUDA Unified Memory in open-source GPGPU virtualization services. The performance evaluation demonstrates that the overhead due to virtualization and remoting is acceptable given the possibility of sharing CUDA-enabled GPUs among heterogeneous machines hosted at the edge, in cloud infrastructures, or as accelerator nodes in an HPC scenario. A prototype implementation of the proposed solution is available as open source.

Original language	English
Title	Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
Editors	Maria Fazio, Dhabaleswar K. Panda, Radu Prodan, Valeria Cardellini, Burak Kantarci, Omer Rana, Massimo Villari
Number of pages	8
Publisher	IEEE Signal Processing Society
Publication date	2022
Pages	834-841
ISBN (Print)	978-1-6654-9957-6
ISBN (Electronic)	978-1-6654-9956-9
DOI	https://doi.org/10.1109/CCGrid54584.2022.00099
Status	Published - 2022
Event	22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022 - Taormina, Italy; Duration: 16 May 2022 → 19 May 2022

Conference

Conference	22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
Country/Territory	Italy
City	Taormina
Period	16/05/2022 → 19/05/2022

Name	Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

CUDA virtualization and remoting for GPGPU based acceleration offloading at the edge
Mentone, A., Di Luccio, D., Landolfi, L., Kosta, S. & Montella, R., 10 Nov 2019, The 12th International Conference on Internet and Distributed Computing Systems. Montella, R., Ciaramella, A., Fortino, G., Guerrieri, A. & Liotta, A. (eds.). Springer, Vol. 11874. p. 414-423 10 p. (Lecture Notes in Computer Science).
Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Open access
File
4 Citations (Scopus)

269 Downloads (Pure)

A virtualized software based on the NVIDIA cuFFT library for image denoising: performance analysis
Galletti, A., Marcellino, L., Montella, R., Santopietro, V. & Kosta, S., 17 Sep 2017, In: Procedia Computer Science. 113, p. 496-501 6 p.
Publication: Contribution to journal › Conference article › Research › peer-review

Open access
File
1 Citation (Scopus)

173 Downloads (Pure)

Citation formats

Montella, R., Di Luccio, D., De Vita, C. G., Mellone, G., Lapegna, M., Laccetti, G., Kosta, S., & Giunta, G. (2022). Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels. In M. Fazio, D. K. Panda, R. Prodan, V. Cardellini, B. Kantarci, O. Rana, & M. Villari (Eds.), Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022 (pp. 834-841). IEEE Signal Processing Society. https://doi.org/10.1109/CCGrid54584.2022.00099

Montella, Raffaele ; Di Luccio, Diana ; De Vita, Ciro Giuseppe et al. / Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels. Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022. ed. / Maria Fazio ; Dhabaleswar K. Panda ; Radu Prodan ; Valeria Cardellini ; Burak Kantarci ; Omer Rana ; Massimo Villari. IEEE Signal Processing Society, 2022. p. 834-841 (Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022).

@inproceedings{db55a3a7441c498ea2ec33c8ce4ccd5a,

title = "Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels",

abstract = "The use of hardware accelerators, based on code and data offloading devoted to overcoming the CPU limitations in cores, is one of the main distinctive trends in high-end computing and related applications in the last decade. However, while code offloading is convenient for performance improvement, becoming a commonly used paradigm, memory access and management are a source of bottlenecks due to the need to interact with different address spaces. In this regard, NVidia introduced the CUDA Unified Memory model to avoid explicit memory copies between the machine hosting the accelerator device and the device itself and vice-versa. This paper shows a novel design and implementation of the support to the CUDA Unified Memory in open-source GPGPU virtualization services. The performance evaluation demonstrates that the overhead due to the virtualization and remoting is acceptable considering the possibility of sharing CUDA-enabled GPUs between various and heterogeneous machines hosted at the edge, in cloud infrastructures, or as accelerator nodes in an HPC scenario. A prototype implementation of the proposed solution is available as open-source.",

keywords = "CUDA, GPU, offloading, remoting, unified memory access, virtualization",

author = "Raffaele Montella and {Di Luccio}, Diana and {De Vita}, {Ciro Giuseppe} and Gennaro Mellone and Marco Lapegna and Giuliano Laccetti and Sokol Kosta and Giulio Giunta",

note = "Funding Information: This work is supported in part by the grant ADMIRE (Adaptive Multi-tier intelligent data manager for Exascale - H2020-JTI-EuroHPC-2019-1), in part by the grant MytilAI (Modelling mytilus farming with Artificial Intelligence technologies - CUP I65F21000040002), and it is conducted in the framework of the research agreement “High-Performance Computing at the Edge” between the Department of Mathematics and Applications of the University of Naples Federico II and the Department of Sciences and Technologies of the University of Naples Parthenope. Publisher Copyright: {\textcopyright} 2022 IEEE.; 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022 ; Conference date: 16-05-2022 Through 19-05-2022",

year = "2022",

doi = "10.1109/CCGrid54584.2022.00099",

language = "English",

isbn = "978-1-6654-9957-6",

series = "Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022",

pages = "834--841",

editor = "Maria Fazio and Panda, {Dhabaleswar K.} and Radu Prodan and Valeria Cardellini and Burak Kantarci and Omer Rana and Massimo Villari",

booktitle = "Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022",

publisher = "IEEE Signal Processing Society",

address = "United States",

}

Montella, R, Di Luccio, D, De Vita, CG, Mellone, G, Lapegna, M, Laccetti, G, Kosta, S & Giunta, G 2022, Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels. in M Fazio, DK Panda, R Prodan, V Cardellini, B Kantarci, O Rana & M Villari (eds), Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022. IEEE Signal Processing Society, Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, pp. 834-841, 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy, 16/05/2022. https://doi.org/10.1109/CCGrid54584.2022.00099

Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels. / Montella, Raffaele; Di Luccio, Diana; De Vita, Ciro Giuseppe et al.
Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022. ed. / Maria Fazio; Dhabaleswar K. Panda; Radu Prodan; Valeria Cardellini; Burak Kantarci; Omer Rana; Massimo Villari. IEEE Signal Processing Society, 2022. p. 834-841 (Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022).

TY - GEN

T1 - Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels

AU - Montella, Raffaele

AU - Di Luccio, Diana

AU - De Vita, Ciro Giuseppe

AU - Mellone, Gennaro

AU - Lapegna, Marco

AU - Laccetti, Giuliano

AU - Kosta, Sokol

AU - Giunta, Giulio

N1 - Funding Information: This work is supported in part by the grant ADMIRE (Adaptive Multi-tier intelligent data manager for Exascale - H2020-JTI-EuroHPC-2019-1), in part by the grant MytilAI (Modelling mytilus farming with Artificial Intelligence technologies - CUP I65F21000040002), and it is conducted in the framework of the research agreement “High-Performance Computing at the Edge” between the Department of Mathematics and Applications of the University of Naples Federico II and the Department of Sciences and Technologies of the University of Naples Parthenope. Publisher Copyright: © 2022 IEEE.

PY - 2022

Y1 - 2022

N2 - The use of hardware accelerators, based on code and data offloading devoted to overcoming the CPU limitations in cores, is one of the main distinctive trends in high-end computing and related applications in the last decade. However, while code offloading is convenient for performance improvement, becoming a commonly used paradigm, memory access and management are a source of bottlenecks due to the need to interact with different address spaces. In this regard, NVidia introduced the CUDA Unified Memory model to avoid explicit memory copies between the machine hosting the accelerator device and the device itself and vice-versa. This paper shows a novel design and implementation of the support to the CUDA Unified Memory in open-source GPGPU virtualization services. The performance evaluation demonstrates that the overhead due to the virtualization and remoting is acceptable considering the possibility of sharing CUDA-enabled GPUs between various and heterogeneous machines hosted at the edge, in cloud infrastructures, or as accelerator nodes in an HPC scenario. A prototype implementation of the proposed solution is available as open-source.

AB - The use of hardware accelerators, based on code and data offloading devoted to overcoming the CPU limitations in cores, is one of the main distinctive trends in high-end computing and related applications in the last decade. However, while code offloading is convenient for performance improvement, becoming a commonly used paradigm, memory access and management are a source of bottlenecks due to the need to interact with different address spaces. In this regard, NVidia introduced the CUDA Unified Memory model to avoid explicit memory copies between the machine hosting the accelerator device and the device itself and vice-versa. This paper shows a novel design and implementation of the support to the CUDA Unified Memory in open-source GPGPU virtualization services. The performance evaluation demonstrates that the overhead due to the virtualization and remoting is acceptable considering the possibility of sharing CUDA-enabled GPUs between various and heterogeneous machines hosted at the edge, in cloud infrastructures, or as accelerator nodes in an HPC scenario. A prototype implementation of the proposed solution is available as open-source.

KW - CUDA

KW - GPU

KW - offloading

KW - remoting

KW - unified memory access

KW - virtualization

UR - http://www.scopus.com/inward/record.url?scp=85135763356&partnerID=8YFLogxK

U2 - 10.1109/CCGrid54584.2022.00099

DO - 10.1109/CCGrid54584.2022.00099

M3 - Article in proceeding

AN - SCOPUS:85135763356

SN - 978-1-6654-9957-6

T3 - Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022

SP - 834

EP - 841

BT - Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022

A2 - Fazio, Maria

A2 - Panda, Dhabaleswar K.

A2 - Prodan, Radu

A2 - Cardellini, Valeria

A2 - Kantarci, Burak

A2 - Rana, Omer

A2 - Villari, Massimo

PB - IEEE Signal Processing Society

T2 - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022

Y2 - 16 May 2022 through 19 May 2022

ER -

Montella R, Di Luccio D, De Vita CG, Mellone G, Lapegna M, Laccetti G et al. Enabling the CUDA Unified Memory model in Edge, Cloud and HPC offloaded GPU kernels. In Fazio M, Panda DK, Prodan R, Cardellini V, Kantarci B, Rana O, Villari M, editors, Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022. IEEE Signal Processing Society. 2022. p. 834-841. (Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022). doi: 10.1109/CCGrid54584.2022.00099
