Overview of the OSU INAM Project
As InfiniBand (IB) based High Performance Computing (HPC) installations grow in size and scale, predicting the behavior of the IB network in terms of link usage and performance becomes increasingly challenging. Further, as computing and networking technologies continue to evolve in HPC platforms, it becomes essential to understand the interactions between HPC middleware and the high-performance communication fabric it relies on. The OSU InfiniBand Network Analysis and Monitoring tool (OSU INAM) monitors IB clusters in real time by querying various subnet management entities in the network. It can also interact with the MVAPICH2-X software stack to gain insight into the communication pattern of an application and classify the data transferred as Point-to-Point, Collective, or Remote Memory Access (RMA). In conjunction with MVAPICH2-X, OSU INAM can also remotely monitor the CPU utilization of MPI processes. For an overview of OSU INAM's features and visualization capabilities, please see the Features and Visualization Capabilities section.
Documents and Downloads
OSU InfiniBand Network Analysis and Monitoring (OSU INAM) Tool v1.1 (03/11/24) is available on our download page here.
- The OSU INAM package is distributed under the BSD License.
- A detailed user guide with instructions to build, install and run OSU INAM is available here. This document also contains guidelines for troubleshooting and best practice deployment.
- Please see CHANGES for the full changelog.
- To estimate the expected size of the database on your system, please see the Database Size Calculator section.
- For an overview of OSU INAM's features and visualization capabilities, please see the Features and Visualization Capabilities section.
Visualizing your HPC jobs using OSU INAM
OSU INAM Tutorial
Community Engagement and Dissemination
- OSU INAM is deployed at the following HPC centers: OSC @ USA, TACC @ USA, NOAA @ USA, U. of Utah @ USA, CAE Services @ Germany, Pratt & Whitney, Ghent University @ Belgium, Cyfronet @ Poland, and Georgia Tech Univ @ USA.
- OSU INAM has been downloaded more than 7,000 times.
In-production Performance Evaluation of INAM with Different Database Options
We have incorporated three database options into INAM and deployed it on the OSC cluster to run high-fidelity profiling stress tests and validate our findings. The tests used a 1-second interval for profiling the InfiniBand network, a 5-second interval for profiling both MPI and job metrics with an 80% cluster load, and background deletion of data older than 1 hour. This evaluation therefore reflects a real-world deployment of INAM with each database option. Detailed timing measurements for each component are available on our performance page here.
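As a rough illustration of what these settings imply, the sketch below counts how many collection sweeps fall inside the 1-hour retention window at each profiling interval. The constant and function names are hypothetical and are not INAM configuration keys; the number of rows written per sweep depends on the cluster (ports, processes, jobs) and is not modeled here.

```python
# Settings taken from the stress-test description above.
# The names are illustrative only; they are not INAM configuration options.
IB_INTERVAL_S = 1        # InfiniBand network profiling interval (seconds)
MPI_JOB_INTERVAL_S = 5   # MPI and job metric profiling interval (seconds)
RETENTION_S = 60 * 60    # data older than 1 hour is deleted in the background


def sweeps_retained(interval_s: int, retention_s: int = RETENTION_S) -> int:
    """Number of collection sweeps whose data lies inside the retention window."""
    return retention_s // interval_s


ib_sweeps = sweeps_retained(IB_INTERVAL_S)
mpi_job_sweeps = sweeps_retained(MPI_JOB_INTERVAL_S)
print(ib_sweeps, mpi_job_sweeps)  # prints: 3600 720
```

Multiplying each count by the per-sweep row count of a given cluster gives a first-order estimate of the steady-state table size under this retention policy.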
This figure presents a comparison of the total latency involved in gathering and storing PC and PE data from the network across various databases. Each point reflects the total latency across all threads for insertion and collection. Eight threads were employed for data insertion, and this experiment was repeated for 2,400 samples. Notably, ClickHouse consistently exhibited superior performance stability compared to the other databases.
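The measurement described above can be sketched as follows. This is a minimal illustration, not INAM's implementation: it assumes that "total latency across all threads" means the sum of per-thread elapsed times for one sample, and `timed_insert` is a hypothetical stand-in for a real database insert.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_INSERT_THREADS = 8  # thread count stated in the text


def timed_insert(batch):
    """Hypothetical stand-in for a database insert; returns elapsed seconds."""
    start = time.perf_counter()
    sum(batch)  # placeholder work in place of a real insert call
    return time.perf_counter() - start


def measure(num_samples, batches):
    """Return one total latency per sample, summed over all insert threads."""
    totals = []
    with ThreadPoolExecutor(max_workers=NUM_INSERT_THREADS) as pool:
        for _ in range(num_samples):
            # Each batch is handled by a worker thread; add up their latencies.
            totals.append(sum(pool.map(timed_insert, batches)))
    return totals


# Small demonstration run; the experiment in the text used 2,400 samples.
batches = [list(range(1000)) for _ in range(NUM_INSERT_THREADS)]
totals = measure(5, batches)
print(len(totals))  # prints: 5
```

Plotting `totals` per sample for each database would give a curve comparable in shape to the one described, with its spread indicating how stable each database's latency is.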
Citing OSU INAM
If you use OSU INAM in a scholarly work, we recommend that you cite one of the papers below, which are listed from newest to oldest.
- INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications, P. Kousha, S. D. Kamal Raj, M. Kedia, H. Subramoni, A. Jain, A. Shafi, DK Panda, H. Na, T. Dockendorf, and K. Tomko. Practice and Experience in Advanced Research Computing 2021, Jul 2021 [Download - Plain]
- Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM, P. Kousha, S. D. Kamal Raj, H. Subramoni, DK Panda, H. Na, T. Dockendorf, and K. Tomko. Practice and Experience in Advanced Research Computing 2020, Jul 2020 [Download - Bib - Plain]
- Designing a Profiling and Visualization Tool for Scalable and In-Depth Analysis of High-Performance GPU Clusters, P. Kousha, B. Ramesh, K. Suresh, C. Chu, A. Jain, N. Sarkauskas, H. Subramoni, and DK Panda. 26th IEEE International Conference on High Performance Computing, Data, Analytics and Data Science, Dec 2019 [Bib - Plain]
- INAM^2: InfiniBand Network Analysis & Monitoring with MPI, H. Subramoni, A. Augustine, M. Arnold, J. Perkins, X. Lu, K. Hamidouche, and DK Panda. International Supercomputing Conference, Jun 2016 [Slides] [Bib - Plain]
- INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool, N. Dandapanthula, H. Subramoni, J. Vienne, K. Kandalla, S. Sur, DK Panda, and R. Brightwell. 4th International Workshop on Productivity and Performance (PROPER 2011), Aug 2011 [Slides] [Bib - Plain]