This page lists publications from the group related to designing High Performance MPI on InfiniBand. In addition, the group is also actively engaged in other research directions (PVFS and MPI-IO, Micro-Benchmark suite, Distributed Shared Memory, ARMCI, and Datacenter) related to modern interconnects. Publications related to these research directions are also included in the corresponding links.

Journals (32)

1 T. Tran, B. Ramesh, B. Michalowicz, M. Abduljabbar, H. Subramoni, A. Shafi, and DK Panda, Accelerating Communication with Multi-HCA Aware Collectives in MPI, Concurrency and Computation: Practice and Experience (CCPE), July 2023,
2 K. Suresh, K. Khorassani, C. Chen, B. Ramesh, M. Abduljabbar, A. Shafi, H. Subramoni, and DK Panda, Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries, IEEE Micro, Jan 2023.
3 K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, High Performance MPI over the Slingshot Interconnect, Special Issue of Journal of Computer Science and Technology (JCST), Feb 2023.
4 DK Panda, H. Subramoni, C. Chu, and M. Bayatpour, The MVAPICH project: Transforming Research into High-Performance MPI Library for HPC Community , Journal of Computational Science (JOCS), Special Issue on Translational Computer Science, Oct 2020.
5 J. Hashmi, C. Chu, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, FALCON-X: Zero-copy MPI Derived Datatype Processing on Modern CPU and GPU Architectures, Journal of Parallel and Distributed Computing (JPDC), Volume 144, October 2020, Pages 1-13, doi.org/10.1016/j.jpdc.2020.05.008,
6 Ammar Awan, A. Jain, C. Chu, H. Subramoni, and DK Panda, Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects, IEEE Micro, vol. 40, no. 1, pp. 35-43, 1 Jan.-Feb. 2020.,
7 A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha, and DK Panda, Effcient Design for MPI Asynchronous Progress without Dedicated Resources, Parallel Computing - Systems & Applications, Volume 85, July 2019, Pages 13-26, https://doi.org/10.1016/j.parco.2019.03.003,
8 Ammar Awan, K. Vadambacheri Manian, C. Chu, H. Subramoni, and DK Panda, Optimized Large-Message Broadcast for Deep Learning Workloads: MPI, MPI+NCCL, or NCCL2?, Volume 85, July 2019, Pages 141-152, https://doi.org/10.1016/j.parco.2019.03.005,
9 S. Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, DK Panda, Martin Schulz, and H. Subramoni, EReinit: Scalable and Efficient Fault Tolerance for Bulk-Synchronous MPI Applications, Concurrency and Computation: Practice and Experience, 14 August 2018, https://doi.org/10.1002/cpe.4863,
10 S. Ramesh, A. Mahéo, S. Shende, A. Malony, H. Subramoni, A. Ruhela, and DK Panda, MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU, ISSN 0167-8191, Volume 77, Sep 2018.
11 H. Wang, S. Potluri, D. Bureddy, and DK Panda, GPU-Aware MPI on RDMA-Enabled Cluster: Design, Implementation and Evaluation, IEEE Transactions on Parallel & Distributed Systems, Vol. 25, No. 10, pp. 2595-2605, Oct 2014.
12 S. Sur, S. Potluri, K. Kandalla, H. Subramoni, K. Tomko, and DK Panda, Co-Designing MPI Library and Applications for InfiniBand Clusters IEEE Computer, Nov 2011.
13 P. Lai, P. Balaji, R. Thakur, and DK Panda, ProOnE: A General-Purpose Protocol Onload Engine for Multi- and Many-Core Architectures Computer Science: Research and Development, Special Issue of Scientific Papers from ISC '09, Jun 2009.
14 A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, and DK Panda, Topology Agnostic Hot-Spot Avoidance with InfiniBand Concurrency and Computation: Practice and Experience, Special Issue of Best Papers from CCGrid '07, Jan 2008.
15 H. Jin, P. Balaji, C. Yoo, J. -Y. Choi, and DK Panda, Exploiting NIC Architectural Support for Enhancing IP based Protocols on High Performance Networks OSU-CISRC-5/04-TR37, Nov 2005.
16 J. Liu, A. Mamidala, A. Vishnu, and DK Panda, Performance Evaluation of InfiniBand with PCI Express, IEEE Micro, Jan 2005.
17 J. Liu, J. Wu, and DK Panda, High Performance RDMA-Based MPI Implementation over InfiniBand, Int'l Journal of Parallel Programming: Volume 32, Number 3, Jun 2004.
18 J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. Kini, P. Wyckoff, and DK Panda, Micro-Benchmark Performance Comparison of High-Speed Cluster Interconnects IEEE Micro, Jan 2004.
19 A. Wagner, D. Buntinas, R. Brightwell, and DK Panda, Application-Bypass Reduction for Large-Scale Clusters. Int'l Journal of High Performance Computing and Networking Internationall Journal of High Performance Computing and Networking, Cluster 2003 Special Issue. In Press, Dec 2003.
20 R. Sivaram, C. Stunkel, and DK Panda, HIPIQS: A High-Performance Switch Architecture using Input Queuing IEEE Transactions on Parallel and Distributed Systems. Vol. 13, No. 3, pp. 275-289, Mar 2002.
21 M. Banikazemi, B. Abali, L. Herger, and DK Panda, Design Alternatives for Virtual Interface Architecture (VIA) and an Implementation on IBM Netfinity NT Cluster Journal of Parallel and Distributed Computing, Special Issue on Clusters, Volume 61, Number 11, pp. 1512-1545, Nov 2001.
22 M. Banikazemi, R. K. Govindaraju, R. Blackmore, and DK Panda, MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 10, pp. 1081-1093, Oct 2001.
23 B. Abali, C. B. Stunkel, J. Herring, M. Banikazemi, DK Panda, C. Aykanat, and Y. Aydogan, Adaptive Routing on the New Switch Chip for IBM SP Systems Journal of Parallel and Distributed Computing, Special Issue on Routing in Computer and Communication Networks, Volume 61, Number 9, pp. 1148-1179, Sep 2001.
24 R. Kesavan, and DK Panda, Efficient Multicast on Irregular Switch-based Cut-Through Networks with Up-Down Routing IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 8, pp. 808-828, Aug 2001.
25 R. Sivaram, R. Kesavan, DK Panda, and C. Stunkel Architectural Support for Efficient Multicasting in Irregular Networks, Architectural Support for Efficient Multicasting in Irregular Networks IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 5, pp. 489-513, May 2001.
26 R. Sivaram, C. Stunkel, and DK Panda, Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 8, pp. 794-812, Aug 2000.
27 R. Kesavan, and DK Panda, Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 4, pp. 371-393, Apr 1999.
28 D. Dai, and DK Panda, Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation IEEE Transactions on Computers, Special Issue on Cache Memory, Vol. 48, No. 2, pp. 236-244, Feb 1999.
29 R. Prakash, and DK Panda, Designing Communication Strategies for Heterogeneous Parallel Systems, Parallel Computing, Volume 24, pp. 2035-2052, Dec 1998.
30 R. Sivaram, DK Panda, and C. B. Stunkel, Efficient Broadcast and Multicast on Multistage Interconnection Networks using Multiport Encoding, IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 10, pp. 1004-1028, Oct 1998.
31 D. Basak, and DK Panda, Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp. 962-978, Sep 1996.
32 Srinivasan Ramesh, Aurele Maheo, Sameer Shende, Allen Malony, H. Subramoni, and DK Panda, MPI Performance Engineering with the MPI Tool Interface: the Integration of MVAPICH and TAU, Sep 2018.

Book Chapter (2)

1 X. Lu, J. Zhang, and DK Panda, Building Efficient HPC Cloud with SR-IOV Enabled InfiniBand: The MVAPICH2 Approach , Book "Research Advances in Cloud Computing", edited by Sanjay Chaudhary, Gaurav Somani, and Rajkumar Buyya, Springer International Publishing , Aug 2017.
2 X. Lu, and DK Panda, Contribution on Multiple Chapters related to OpenStack, Virtualized HPC, HPC Network Fabric, and HPC Workload Management , Book "The Crossroads of Cloud and HPC: OpenStack for Scientific Research; Exploring OpenStack Cloud Computing for Scientific Workloads", Edited by Stig Telfer - OpenStack Foundation Publishing (Invited Book Chapter) , Nov 2016.

Conferences & Workshops (425)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
  • Experiences with Software MPEG-2 Video Decompression on an SMP PC

  • A. Bala, D. Shah, W.-C. Feng, and DK Panda,
  • ICPP Workshop, Aug 1998
  • [Bib - Plain]
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425

Technical Reports (8)

1 K. Vaidyanathan, P. Lai, S. Narravula, and DK Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems, OSU-CISRC-8/07-TR53
2 K. Vaidyanathan, H. Jin, S. Narravula, and DK Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks OSU-CISRC-7/05-TR49
3 G. Marsh, A. Sampat, S. Potluri, and DK Panda, Scaling Advanced Message Queuing Protocol (AMQP) Architecture with Broker Federation and InfiniBand OSU Technical Report (OSU-CISRC-5/09-TR17)
4 W. Huang, J. Liu, B. Abali, and DK Panda, InfiniBand Support in Xen Virtual Machine Environment, OSU-CISRC-2/06--TR18
5 P. Balaji, W. Feng, and DK Panda, The Convergence of Ethernet and Ethernot: A 10-Gigabit Ethernet Perspective, OSU-CISRC-1/06-TR10
6 H. Jin, S. Narravula, G. Brown, K. Vaidyanathan, P. Balaji, and DK Panda, Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC, OSU-CISRC-6/05-TR40
7 K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and DK Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers,
8 S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand,

Ph.D. Disserations (36)

1 M. Bayatpour, Designing High Performance Hardware-assisted Communication Middlewares for Next-Generation HPC Systems, May 2021
2 C. Chu, Accelerator-enabled Communication Middleware for Large-scale Heterogeneous HPC Systems with Modern Interconnects, Jul 2020
3 J. Hashmi, Designing High Performance Shared-Address-Space and Adaptive Communication Middlewares for Next-Generation HPC Systems, Apr 2020
4 Ammar Awan, Co-designing Communication Middleware and Deep Learning Frameworks for High-Performance DNN Training on HPC Systems, Apr 2020
5 S. Chakraborty, High Performance and Scalable Cooperative Communication Middleware for Next Generation Architectures, Jun 2019
6 J. Zhang, Designing and Building Efficient HPC Cloud with Modern Networking Technologies on Heterogeneous HPC Clusters, Jul 2018
7 M. Li, Designing High-Performance Remote Memory Access for MPI and PGAS Models with Modern Networking Technologies on Heterogeneous Clusters, Nov 2017
8 A. Venkatesh, High-Performance Heterogeneity/Energy-Aware Communication for MultiPetaflop HPC Systems, Dec 2016
9 R. Rajachandrasekar, Designing Scalable And Efficient I/O Middleware for Fault-Resilient High-performance Computing Clusters, Nov 2014
10 J. Jose, Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware, Aug 2014
11 S. Potluri, Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects, May 2014
12 K. Kandalla, High Performance Non-Blocking Collective Communication for Next Generation InfiniBand Clusters, Jul 2013
13 M. Luo, Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand and Heterogeneous System, Jul 2013
14 H. Subramoni, Topology-Aware MPI communication and Scheduling for High Performance Computing Systems, Jul 2013
15 X. Ouyang, Efficient Storage Middleware Design in InfiniBand Clusters for High-End Computing, Mar 2012
16 G. Santhanaraman, Designing Scalable And High Performance One Sided Communication Middleware For Modern Interconnects, Jun 2009
17 M. Koop, High-Performance Multi-Transport MPI Design For Ultra-Scale Infiniband Clusters, Jun 2009
18 L. Chai, High Performance And Scalable MPI Intra-Node Communication Middleware For Multi-Core Clusters, Mar 2009
19 W. Huang, High Performance Network I/O In Virtual Machines Over Modern Interconnects, Aug 2008
20 R. Noronha, Designing High-Performance and Scalable Clustered Network Attached Storage With InfiniBand, Aug 2008
21 S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
22 A. Mamidala, Scalable and High Performance Collective Communication For Next Generation Multicore InfiniBand Clusters, May 2008
23 K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
24 A. Vishnu, High Performance and Network Fault Tolerant MPI with Multi-Pathing Over InfiniBand, Dec 2007
25 S. Sur, Scalable and High Performance MPI Design for Very Large InfiniBand Clusters, Aug 2007
26 W. Yu, Enhancing MPI with Modern Networking Mechanisms in Cluster Interconncts, Jun 2006
27 P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006
28 J. Liu, Designing High Performance and Scalable MPI over InfiniBand, Sep 2004
29 J. Wu, Communication and Memory Management in Networked Storage Systems, Sep 2004
30 D. Buntinas, Improving Cluster Performance through the Use of Programmable Network Interfaces, Jun 2003
31 M. Banikazemi, Design and Implementation of High Performance Communication Subsystems for Clusters, Dec 2000
32 D. Dai, Designing Efficient Communication Subsystems for Distributed Shared Memory (DSM) Systems, Mar 1999
33 R. Kesavan, Communication Mechanisms and Algorithms for Supporting Scalable Collective Communication on Parallel Systems, Oct 1998
34 R. Sivaram, Architectural Support for Efficient Communication in Scalable Parallel Systems, Aug 1998
35 D. Basak, Designing High Performance Parallel Systems: A Processor-Cluster Based Approach, Jul 1996
36 V. Dixit-Radiya, Mapping on Wormhole-routed Distributed-Memory Systems: A Temporal Communication Graph-based Approach, Mar 1995

M.S. Thesis (31)

1 S. Srivastava, MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI Library, May 2021
2 N. Senthil Kumar, Designing Optimized MPI+NCCL Hybrid Collective Communication Routines for Dense Many-GPU Clusters, May 2021
3 Kamal Raj Sankarapandian, Profiling MPI Primitives in Real-time Using OSU INAM, Apr 2020
4 R. Biswas, Benchmarking and Accelerating TensorFlow-based Deep Learning on Modern HPC Systems, Jul 2018
5 A. Augustine, Designing a Scalable Network Analysis and Monitoring Tool with MPI Support, Aug 2016
6 V. Dhanraj, Enhancement of LIMIC-Based Collectives for Multi-core Clusters, Aug 2012
7 A. Singh, Optimizing All-to-all and Allgather Communications on GPGPU Clusters, Apr 2012
8 S. Pai Raikar, Network Fault-Resilient MPI for Multi-Rail InfiniBand Clusters, Dec 2011
9 N. Dandapanthula, InfiniBand Network Analysis and Monitoring using OpenSM, Aug 2011
10 V. Meshram, Distributed Metadata Management for Parallel Systems, Aug 2011
11 G. Marsh, Evaluation of High Performance Financial Messaging on Modern Multi-core Systems, Mar 2010
12 K. Gopalakrishnan, Enhancing Fault Tolerance in MPI for Modern InfiniBand Clusters, Aug 2009
13 T. Gangadharappa, Designing Support For MPI-2 Programming Interfaces On Modern Interconnects, Jun 2009
14 J. Sridhar, Scalable Job Startup And Inter-Node Communication In Multi-Core Infiniband Clusters, Jun 2009
15 R. Kumar, Enhancing MPI Point-to-Point and Collectives for Clusters with Onloaded/Offloaded InfiniBand Adapters, Aug 2008
16 S. Bhagvat, Designing and Enhancing the Sockets Direct Protocol (SDP) over iWARP and InfiniBand, Aug 2006
17 S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004
18 W. Jiang, High Performance MPICH2 One-Sided Communication Implementation over InfiniBand, Jun 2004
19 A. Wagner, Static and Dynamic Processing Offload on Myrinet Clusters with Programmable NIC Support, Jun 2004
20 A. Moody, NIC-based Reduction on Large-Scale Quadrics Clusters, Dec 2003
21 B. Chandrasekharan, Micro-benchmark Level Performance Evaluation and Comparison of High Speed Cluster Interconnects, Sep 2003
22 S. Kini, Efficient Collective Communication using Multicast and RDMA Operations for InfiniBand-based Clusters, Jun 2003
23 S. Senapathi, QoS-Aware Middleware to Support Interactive and Resource Adaptive Applications on Myrinet Clusters, Sep 2002
24 P. Shivam, High Performance User Level Protocol on Gigabit Ethernet, Aug 2002
25 R. Gupta, Efficient Collective Communication using Remote Memory Operations on VIA-Based Clusters, Aug 2002
26 A. Saify, Optimizing Collective Communication Operations in ARMCI, Jul 2002
27 S. Desai, Mechanisms for Implementing Efficient Collective Communication in Clusters with Application Bypass, Jun 2002
28 V. Tipparaju, Optimizing ARMCI Get/Put Operations on Myrinet/GM, Sep 2001
29 A. Gulati, A Proportional Bandwidth Allocation Scheme for Myrinet Clusters, Jun 2001
30 V. Kota, Designing Efficient Inter-Cluster Communication Layer for Distributed Computing, Jun 2001
31 S. Kutlug, Performance Evaluation and Analysis of User Level Networking Protocols in Clusters, Jun 2000