This page lists publications from the group related to designing High Performance MPI on InfiniBand. The group is also actively engaged in other research directions related to modern interconnects (PVFS and MPI-IO, Micro-Benchmark suite, Distributed Shared Memory, ARMCI, and Datacenter). Publications related to these research directions are included in the corresponding links.

Journals (21)

1 H. Wang, S. Potluri, D. Bureddy, and D. K. Panda, GPU-Aware MPI on RDMA-Enabled Cluster: Design, Implementation and Evaluation , IEEE Transactions on Parallel & Distributed Systems, Vol. 25, No. 10, pp. 2595-2605 , Oct 2014.
2 S. Sur, S. Potluri, K. Kandalla, H. Subramoni, K. Tomko, and D. K. Panda, Co-Designing MPI Library and Applications for InfiniBand Clusters , IEEE Computer , Nov 2011.
3 P. Lai, P. Balaji, R. Thakur, and D. K. Panda, ProOnE: A General-Purpose Protocol Onload Engine for Multi- and Many-Core Architectures , Computer Science: Research and Development, Special Issue of Scientific Papers from ISC '09 , Jun 2009.
4 A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, and D. K. Panda, Topology Agnostic Hot-Spot Avoidance with InfiniBand , Concurrency and Computation: Practice and Experience, Special Issue of Best Papers from CCGrid '07 , Jan 2008.
5 H. Jin, P. Balaji, C. Yoo, J. -Y. Choi, and D. K. Panda, Exploiting NIC Architectural Support for Enhancing IP based Protocols on High Performance Networks , OSU-CISRC-5/04-TR37 , Nov 2005.
6 J. Liu, A. Mamidala, A. Vishnu, and D. K. Panda, Performance Evaluation of InfiniBand with PCI Express , IEEE Micro , Jan 2005.
7 J. Liu, J. Wu, and D. K. Panda, High Performance RDMA-Based MPI Implementation over InfiniBand , Int'l Journal of Parallel Programming: Volume 32, Number 3 , Jun 2004.
8 J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. Kini, P. Wyckoff, and D. K. Panda, Micro-Benchmark Performance Comparison of High-Speed Cluster Interconnects , IEEE Micro , Jan 2004.
9 A. Wagner, D. Buntinas, R. Brightwell, and D. K. Panda, Application-Bypass Reduction for Large-Scale Clusters , International Journal of High Performance Computing and Networking, Cluster 2003 Special Issue. In Press , Dec 2003.
10 R. Sivaram, C. Stunkel, and D. K. Panda, HIPIQS: A High-Performance Switch Architecture using Input Queuing , IEEE Transactions on Parallel and Distributed Systems. Vol. 13, No. 3, pp. 275-289 , Mar 2002.
11 M. Banikazemi, B. Abali, L. Herger, and D. K. Panda, Design Alternatives for Virtual Interface Architecture (VIA) and an Implementation on IBM Netfinity NT Cluster , Journal of Parallel and Distributed Computing, Special Issue on Clusters, Volume 61, Number 11, pp. 1512-1545 , Nov 2001.
12 M. Banikazemi, R. K. Govindaraju, R. Blackmore, and D. K. Panda, MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems , IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 10, pp. 1081-1093 , Oct 2001.
13 B. Abali, C. B. Stunkel, J. Herring, M. Banikazemi, D. K. Panda, C. Aykanat, and Y. Aydogan, Adaptive Routing on the New Switch Chip for IBM SP Systems , Journal of Parallel and Distributed Computing, Special Issue on Routing in Computer and Communication Networks, Volume 61, Number 9, pp. 1148-1179 , Sep 2001.
14 R. Kesavan and D. K. Panda, Efficient Multicast on Irregular Switch-based Cut-Through Networks with Up-Down Routing , IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 8, pp. 808-828 , Aug 2001.
15 R. Sivaram, R. Kesavan, D. K. Panda, and C. Stunkel, Architectural Support for Efficient Multicasting in Irregular Networks , IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 5, pp. 489-513 , May 2001.
16 R. Sivaram, C. Stunkel, and D. K. Panda, Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact , IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 8, pp. 794-812 , Aug 2000.
17 R. Kesavan and D. K. Panda, Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks , IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 4, pp. 371-393 , Apr 1999.
18 D. Dai and D. K. Panda, Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation , IEEE Transactions on Computers, Special Issue on Cache Memory, Vol. 48, No. 2, pp. 236-244 , Feb 1999.
19 R. Prakash and D. K. Panda, Designing Communication Strategies for Heterogeneous Parallel Systems , Parallel Computing, Volume 24, pp. 2035-2052 , Dec 1998.
20 R. Sivaram, D. K. Panda, and C. B. Stunkel, Efficient Broadcast and Multicast on Multistage Interconnection Networks using Multiport Encoding , IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 10, pp. 1004-1028 , Oct 1998.
21 D. Basak and D. K. Panda, Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements , IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp. 962-978 , Sep 1996.

Conferences & Workshops (311)

Technical Reports (8)

1 K. Vaidyanathan, P. Lai, S. Narravula, and D. K. Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems , OSU-CISRC-8/07-TR53
2 K. Vaidyanathan, H. Jin, S. Narravula, and D. K. Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks , OSU-CISRC-7/05-TR49
3 W. Huang, J. Liu, B. Abali, and D. K. Panda, InfiniBand Support in Xen Virtual Machine Environment , OSU-CISRC-2/06-TR18
4 P. Balaji, W. Feng, and D. K. Panda, The Convergence of Ethernet and Ethernot: A 10-Gigabit Ethernet Perspective , OSU-CISRC-1/06-TR10
5 H. Jin, S. Narravula, G. Brown, K. Vaidyanathan, P. Balaji, and D. K. Panda, Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC , OSU-CISRC-6/05-TR40
6 K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and D. K. Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers
7 S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and D. K. Panda, Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand
8 G. Marsh, A. Sampat, S. Potluri, and D. K. Panda, Scaling Advanced Message Queuing Protocol (AMQP) Architecture with Broker Federation and InfiniBand , OSU Technical Report (OSU-CISRC-5/09-TR17)

Ph.D. Dissertations (28)

1 R. Rajachandrasekar, Designing Scalable And Efficient I/O Middleware for Fault-Resilient High-performance Computing Clusters, Nov 2014
2 J. Jose, Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware, Aug 2014
3 S. Potluri, Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects, May 2014
4 K. Kandalla, High Performance Non-Blocking Collective Communication for Next Generation InfiniBand Clusters, Jul 2013
5 H. Subramoni, Topology-Aware MPI communication and Scheduling for High Performance Computing Systems, Jul 2013
6 M. Luo, Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand and Heterogeneous System, Jul 2013
7 X. Ouyang, Efficient Storage Middleware Design in InfiniBand Clusters for High-End Computing, Mar 2012
8 G. Santhanaraman, Designing Scalable And High Performance One Sided Communication Middleware For Modern Interconnects, Jun 2009
9 M. Koop, High-Performance Multi-Transport MPI Design For Ultra-Scale Infiniband Clusters, Jun 2009
10 L. Chai, High Performance And Scalable MPI Intra-Node Communication Middleware For Multi-Core Clusters, Mar 2009
11 W. Huang, High Performance Network I/O In Virtual Machines Over Modern Interconnects, Aug 2008
12 S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
13 R. Noronha, Designing High-Performance and Scalable Clustered Network Attached Storage With InfiniBand, Aug 2008
14 A. Mamidala, Scalable and High Performance Collective Communication For Next Generation Multicore InfiniBand Clusters, May 2008
15 K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
16 A. Vishnu, High Performance and Network Fault Tolerant MPI with Multi-Pathing Over InfiniBand, Dec 2007
17 S. Sur, Scalable and High Performance MPI Design for Very Large InfiniBand Clusters, Aug 2007
18 W. Yu, Enhancing MPI with Modern Networking Mechanisms in Cluster Interconnects, Jun 2006
19 P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006
20 J. Liu, Designing High Performance and Scalable MPI over InfiniBand, Sep 2004
21 J. Wu, Communication and Memory Management in Networked Storage Systems, Sep 2004
22 D. Buntinas, Improving Cluster Performance through the Use of Programmable Network Interfaces, Jun 2003
23 M. Banikazemi, Design and Implementation of High Performance Communication Subsystems for Clusters, Dec 2000
24 D. Dai, Designing Efficient Communication Subsystems for Distributed Shared Memory (DSM) Systems, Mar 1999
25 R. Kesavan, Communication Mechanisms and Algorithms for Supporting Scalable Collective Communication on Parallel Systems, Oct 1998
26 R. Sivaram, Architectural Support for Efficient Communication in Scalable Parallel Systems, Aug 1998
27 D. Basak, Designing High Performance Parallel Systems: A Processor-Cluster Based Approach, Jul 1996
28 V. Dixit-Radiya, Mapping on Wormhole-routed Distributed-Memory Systems: A Temporal Communication Graph-based Approach, Mar 1995

M.S. Theses (26)

1 V. Dhanraj, Enhancement of LIMIC-Based Collectives for Multi-core Clusters, Aug 2012
2 A. Singh, Optimizing All-to-all and Allgather Communications on GPGPU Clusters, Apr 2012
3 S. Pai Raikar, Network Fault-Resilient MPI for Multi-Rail InfiniBand Clusters, Dec 2011
4 N. Dandapanthula, InfiniBand Network Analysis and Monitoring using OpenSM, Aug 2011
5 V. Meshram, Distributed Metadata Management for Parallel Systems, Aug 2011
6 G. Marsh, Evaluation of High Performance Financial Messaging on Modern Multi-core Systems, Mar 2010
7 K. Gopalakrishnan, Enhancing Fault Tolerance in MPI for Modern InfiniBand Clusters, Aug 2009
8 T. Gangadharappa, Designing Support For MPI-2 Programming Interfaces On Modern Interconnects, Jun 2009
9 J. Sridhar, Scalable Job Startup And Inter-Node Communication In Multi-Core Infiniband Clusters, Jun 2009
10 R. Kumar, Enhancing MPI Point-to-Point and Collectives for Clusters with Onloaded/Offloaded InfiniBand Adapters, Aug 2008
11 S. Bhagvat, Designing and Enhancing the Sockets Direct Protocol (SDP) over iWARP and InfiniBand, Aug 2006
12 S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004
13 A. Wagner, Static and Dynamic Processing Offload on Myrinet Clusters with Programmable NIC Support, Jun 2004
14 W. Jiang, High Performance MPICH2 One-Sided Communication Implementation over InfiniBand, Jun 2004
15 A. Moody, NIC-based Reduction on Large-Scale Quadrics Clusters, Dec 2003
16 B. Chandrasekharan, Micro-benchmark Level Performance Evaluation and Comparison of High Speed Cluster Interconnects, Sep 2003
17 S. Kini, Efficient Collective Communication using Multicast and RDMA Operations for InfiniBand-based Clusters, Jun 2003
18 S. Senapathi, QoS-Aware Middleware to Support Interactive and Resource Adaptive Applications on Myrinet Clusters, Sep 2002
19 R. Gupta, Efficient Collective Communication using Remote Memory Operations on VIA-Based Clusters, Aug 2002
20 P. Shivam, High Performance User Level Protocol on Gigabit Ethernet, Aug 2002
21 A. Saify, Optimizing Collective Communication Operations in ARMCI, Jul 2002
22 S. Desai, Mechanisms for Implementing Efficient Collective Communication in Clusters with Application Bypass, Jun 2002
23 V. Tipparaju, Optimizing ARMCI Get/Put Operations on Myrinet/GM, Sep 2001
24 V. Kota, Designing Efficient Inter-Cluster Communication Layer for Distributed Computing, Jun 2001
25 A. Gulati, A Proportional Bandwidth Allocation Scheme for Myrinet Clusters, Jun 2001
26 S. Kutlug, Performance Evaluation and Analysis of User Level Networking Protocols in Clusters, Jun 2000