Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning
A. Awan, K. Hamidouche, A. Venkatesh, D. Panda
The 23rd European MPI Users' Group Meeting (EuroMPI '16),
Sep 2016.