Item type |
Default item type (full)(1) |
Publication date |
2023-03-18 |
Title |
|
|
Title |
Efficient convolution pooling on the GPU |
|
Language |
en |
Creator |
Suita, Shunsuke
Nishimura, Takahiro
Tokura, Hiroki
Nakano, Koji
Itou, Yasuaki
Kasagi, Akihiko
Tabaru, Tsuguchika
|
Access rights |
|
|
Access rights |
open access |
|
Access rights URI |
http://purl.org/coar/access_right/c_abf2 |
Rights |
|
|
Rights |
© 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Rights |
|
|
Rights |
This is not the published version. Please cite only the published version. |
Subject |
|
|
Subject scheme |
Other |
|
Subject |
Deep learning |
Subject |
|
|
Subject scheme |
Other |
|
Subject |
Neural Networks |
Subject |
|
|
Subject scheme |
Other |
|
Subject |
Convolution |
Subject |
|
|
Subject scheme |
Other |
|
Subject |
Average pooling |
Subject |
|
|
Subject scheme |
Other |
|
Subject |
GPU |
Description |
|
|
Description |
The main contribution of this paper is to present efficient GPU implementations of convolution-pooling, in which pooling follows the multiple convolution. Since the multiple convolution and pooling operations are performed alternately in the earlier stages of many Convolutional Neural Networks (CNNs), it is very important to accelerate convolution-pooling. Our new GPU implementation uses two techniques: (1) convolution interchange with direct sum, and (2) conversion to matrix multiplication. These techniques reduce both the computational cost and the memory access cost. Furthermore, the convolution interchange is converted to matrix multiplication, which can be computed very efficiently by cuBLAS. Experimental results using a Tesla V100 GPU show that our new GPU implementation of convolution-pooling, which is compatible with cuDNN, is expected to be 2.90 times and 1.43 times faster for fp32 and fp16, respectively, than performing the multiple convolution followed by the pooling with cuDNN, the most popular library of primitives for implementing CNNs on the GPU. |
|
Language |
en |
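The abstract's idea of merging pooling into the convolution rests on the fact that both convolution and average pooling are linear. The following is a minimal single-channel sketch of that property in plain Python, not the paper's GPU implementation: it checks that a convolution followed by non-overlapping average pooling gives the same result as one strided convolution with an enlarged kernel. The function names (`conv2d`, `avg_pool`, `fuse_kernel`) and the test data are hypothetical and chosen only for illustration. The multi-channel direct sum and the cuBLAS matrix multiplication described in the abstract are not modeled here.

```python
# Illustrative sketch only (assumption: single channel, "valid" padding,
# non-overlapping p x p average pooling). Not the paper's GPU code.

def conv2d(x, w, stride=1):
    """Valid 2D cross-correlation of image x with kernel w (lists of lists)."""
    kh, kw = len(w), len(w[0])
    oh = (len(x) - kh) // stride + 1
    ow = (len(x[0]) - kw) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def avg_pool(y, p):
    """Non-overlapping p x p average pooling."""
    oh, ow = len(y) // p, len(y[0]) // p
    return [[sum(y[i * p + a][j * p + b]
                 for a in range(p) for b in range(p)) / (p * p)
             for j in range(ow)] for i in range(oh)]

def fuse_kernel(w, p):
    """Enlarged kernel: w accumulated at every p x p pooling offset, scaled by 1/p^2."""
    kh, kw = len(w), len(w[0])
    f = [[0.0] * (kw + p - 1) for _ in range(kh + p - 1)]
    for a in range(kh):
        for b in range(kw):
            for da in range(p):
                for db in range(p):
                    f[a + da][b + db] += w[a][b] / (p * p)
    return f

# Hypothetical test data: a 6 x 6 image and a 3 x 3 kernel.
x = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]
w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Two-step path: convolution, then 2 x 2 average pooling.
two_step = avg_pool(conv2d(x, w), 2)
# Fused path: one stride-2 convolution with the enlarged 4 x 4 kernel.
fused = conv2d(x, fuse_kernel(w, 2), stride=2)

assert all(abs(two_step[i][j] - fused[i][j]) < 1e-9
           for i in range(len(fused)) for j in range(len(fused[0])))
```

The fused path reads the input once and never materializes the intermediate convolution output, which is the kind of saving in computation and memory access the abstract refers to. How the paper organizes this across channels and maps it onto matrix multiplication is described in the published article.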
Publisher |
|
|
Publisher |
Elsevier |
Language |
|
|
Language |
eng |
Resource type |
|
|
Resource type identifier |
http://purl.org/coar/resource_type/c_6501 |
|
Resource type |
journal article |
Version type |
|
|
Version type |
AO |
|
Version type resource |
http://purl.org/coar/version/c_b1a7d7d4d402bcce |
Relation |
|
|
|
Identifier type |
DOI |
|
|
Related identifier |
10.1016/j.jpdc.2019.12.006 |
Relation |
|
|
|
Identifier type |
DOI |
|
|
Related identifier |
https://doi.org/10.1016/j.jpdc.2019.12.006 |
Source identifier |
|
|
Source identifier type |
ISSN |
|
Source identifier |
0743-7315 |
Start page |
|
|
Start page |
222 |
Bibliographic information |
Journal of Parallel and Distributed Computing
Vol. 138,
p. 222-229,
Issue date 2020-04
|
Old ID |
50422 |
Notes |
Post-print version/PDF may be used in an institutional repository after an embargo period of 24 months. |