

Publication Information


Title
Japanese: 
English: Large-scale Distributed Sorting for GPU-based Heterogeneous Supercomputers
Authors
Japanese: 社本 秀之, 白幡 晃一, DROZD Aleksandr, 佐藤 仁, 松岡 聡.
English: Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, Satoshi Matsuoka.
Language: English
Journal/Book title
Japanese: 
English: 
Volume, issue, pages: 
Publication date: October 27, 2014
Publisher
Japanese: 
English: 
Conference
Japanese: 
English: IEEE BigData 2014
Venue
Japanese: 
English: Washington DC
Official link: http://cci.drexel.edu/bigdata/bigdata2014/
 
Abstract: Splitter-based parallel sorting algorithms are known to be highly efficient for distributed sorting because of their low communication complexity. Although GPU accelerators can generally reduce computation cost, their effectiveness for distributed sorting on large-scale heterogeneous GPU-based systems remains unclear. We investigate the applicability of GPU devices to splitter-based algorithms and extend HykSort, an existing splitter-based algorithm, by offloading its costly computation phases to GPUs. We also handle GPU memory overflow with an iterative approach that sorts multiple chunks and merges them into one array. We evaluate the performance of our implementation with local sort acceleration on the TSUBAME2.5 supercomputer, which comprises over 4000 NVIDIA K20x GPUs. In a weak-scaling evaluation, we achieve a 389-fold speedup with 0.25 TB/s throughput when sorting 4 TB of 64-bit integers on 1024 nodes, compared with running on 1 node; in the CPU vs. GPU comparison, however, our implementation achieves only a 1.40-fold speedup on 1024 nodes. Detailed analysis reveals that this limitation is almost entirely due to the bottleneck in CPU-GPU host-to-device bandwidth. With orders-of-magnitude improvements planned for next-generation GPUs, the performance boost will be tremendous, in line with other successful GPU accelerations.
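The two ideas in the abstract, splitter-based partitioning across nodes and the iterative chunk-sort-then-merge approach for data that exceeds GPU memory, can be illustrated with a small single-process sketch. This is not the authors' implementation and not HykSort itself: it assumes a generic sample-sort style splitter selection, uses Python's `sorted()` as a stand-in for a GPU sort, and introduces a hypothetical `device_capacity` parameter for the number of keys that fit in device memory. Each bucket plays the role of the data one node holds after the exchange; all function names are illustrative.

```python
import heapq
import random
from bisect import bisect_right


def chunked_local_sort(keys, device_capacity):
    """Sort keys when at most device_capacity of them fit in device memory at once.

    Each chunk stands in for one host-to-device transfer plus a device-side sort;
    Python's sorted() replaces the GPU sort in this sketch.
    """
    runs = []
    for start in range(0, len(keys), device_capacity):
        chunk = keys[start:start + device_capacity]  # would be copied to the GPU
        runs.append(sorted(chunk))                   # stand-in for the GPU sort
    # Combine the sorted runs into one array with a k-way merge.
    return list(heapq.merge(*runs))


def choose_splitters(sample, num_buckets):
    """Pick num_buckets - 1 evenly spaced splitters from a sample of the keys."""
    sample = sorted(sample)
    step = len(sample) / num_buckets
    return [sample[int(step * i)] for i in range(1, num_buckets)]


def partition_by_splitters(keys, splitters):
    """Send each key to the bucket whose splitter range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for k in keys:
        buckets[bisect_right(splitters, k)].append(k)
    return buckets


def simulated_distributed_sort(data, num_buckets, device_capacity, sample_size=256):
    """Single-process simulation: each bucket plays the role of one node."""
    sample = random.sample(data, min(sample_size, len(data)))
    splitters = choose_splitters(sample, num_buckets)
    buckets = partition_by_splitters(data, splitters)
    # Concatenating locally sorted buckets in splitter order gives a global order.
    return [x for b in buckets for x in chunked_local_sort(b, device_capacity)]


if __name__ == "__main__":
    random.seed(0)
    data = [random.getrandbits(64) for _ in range(10_000)]
    result = simulated_distributed_sort(data, num_buckets=4, device_capacity=1000)
    print(result == sorted(data))
```

Because each chunk is sorted independently, the number of transfers in this sketch grows with `len(keys) / device_capacity`, which is one way the host-to-device bandwidth named in the abstract enters the cost of the local sort.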

©2007 Institute of Science Tokyo All rights reserved.