Efficient convolution pooling on the GPU

Suita, Shunsuke; Nishimura, Takahiro; Tokura, Hiroki; Nakano, Koji; Itou, Yasuaki; Kasagi, Akihiko; Tabaru, Tsuguchika

doi:10.1016/j.jpdc.2019.12.006

https://hiroshima.repo.nii.ac.jp/records/2007457

File

JPDC_138_222.pdf (126.3 KB)

Item type

Default item type (full) (1)

Release date

2023-03-18

Title

Efficient convolution pooling on the GPU

Creators

Suita, Shunsuke
Nishimura, Takahiro
Tokura, Hiroki
Nakano, Koji
Itou, Yasuaki
Kasagi, Akihiko
Tabaru, Tsuguchika

Access rights

open access

Access rights URI

http://purl.org/coar/access_right/c_abf2

Rights

This is not the published version. Please cite only the published version.

Subjects (scheme: Other)

Deep learning
Neural Networks
Convolution
Average pooling
GPU

Description

The main contribution of this paper is to show efficient implementations of the convolution-pooling on the GPU, in which the pooling follows the multiple convolution. Since the multiple convolution and the pooling operations are performed alternately in the earlier stages of many Convolutional Neural Networks (CNNs), it is very important to accelerate the convolution-pooling. Our new GPU implementation uses two techniques: (1) convolution interchange with direct sum, and (2) conversion to matrix multiplication. These techniques reduce both the computational cost and the memory access cost. Further, the convolution interchange is converted to matrix multiplication, which can be computed very efficiently by cuBLAS. Experimental results using a Tesla V100 GPU show that our new cuDNN-compatible GPU implementation of the convolution-pooling is expected to be 2.90 times and 1.43 times faster for fp32 and fp16, respectively, than performing the multiple convolution and then the pooling with cuDNN, the most popular library of primitives for implementing CNNs on the GPU.
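
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the linearity that makes fusing a convolution with a following average pooling possible. It is not the authors' GPU implementation. It assumes a single channel, a "valid" cross-correlation, and non-overlapping p x p average pooling; the function names (correlate2d, avg_pool2d, merge_with_pool) and the test data are hypothetical and chosen for illustration. Under those assumptions, pooling the convolution output gives the same result as one strided correlation with a merged kernel of size (k + p - 1) x (k + p - 1).

```python
def correlate2d(x, w, stride=1):
    """Valid cross-correlation of image x with kernel w (lists of lists)."""
    kh, kw = len(w), len(w[0])
    oh = (len(x) - kh) // stride + 1
    ow = (len(x[0]) - kw) // stride + 1
    return [[sum(x[i * stride + a][j * stride + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def avg_pool2d(y, p):
    """Non-overlapping p x p average pooling (stride p)."""
    oh, ow = len(y) // p, len(y[0]) // p
    return [[sum(y[i * p + u][j * p + v]
                 for u in range(p) for v in range(p)) / (p * p)
             for j in range(ow)]
            for i in range(oh)]


def merge_with_pool(w, p):
    """Fold p x p average pooling into kernel w.

    Each kernel weight is spread over a p x p window and scaled by 1/p^2,
    giving a (k + p - 1) x (k + p - 1) kernel.
    """
    kh, kw = len(w), len(w[0])
    m = [[0.0] * (kw + p - 1) for _ in range(kh + p - 1)]
    for a in range(kh):
        for b in range(kw):
            for u in range(p):
                for v in range(p):
                    m[a + u][b + v] += w[a][b] / (p * p)
    return m


# Illustrative data (assumption): a 6 x 6 image and a 3 x 3 kernel.
x = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]
w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Two-step path: convolution, then 2 x 2 average pooling.
two_step = avg_pool2d(correlate2d(x, w), 2)

# Fused path: one stride-2 correlation with the merged 4 x 4 kernel.
merged = correlate2d(x, merge_with_pool(w, 2), stride=2)

print(two_step)
print(merged)
```

In this toy case the two-step path evaluates 16 convolution outputs with 9 multiplications each before pooling, while the fused path evaluates 4 outputs with 16 multiplications each. Each fused output is a dot product between an input patch and the flattened merged kernel, which is the kind of computation that can be batched into a matrix multiplication.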

Publisher

Elsevier

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Version type

Version type resource

http://purl.org/coar/version/c_b1a7d7d4d402bcce

Versions

Ver.1

2025-02-21 03:52:37.274235
