1. Overview of the OSU INAM Project

As InfiniBand (IB) based High Performance Computing (HPC) installations grow in size and scale, predicting the behavior of the IB network in terms of link usage and performance becomes increasingly challenging. Further, as computing and networking technologies continue to evolve in HPC platforms, it becomes increasingly essential to understand the interactions between HPC middleware infrastructures and the high-performance communication fabric they rely on. The OSU InfiniBand Network Analysis and Monitoring tool (OSU INAM) monitors IB clusters in real time by querying various subnet management entities in the network. It can also interact with the MVAPICH2-X software stack to gain insight into the communication pattern of an application and classify the data transferred into Point-to-Point, Collective and Remote Memory Access (RMA). In conjunction with MVAPICH2-X, OSU INAM can also remotely monitor the CPU utilization of MPI processes.

This document contains necessary information for users to download, install, test, use, tune and troubleshoot OSU INAM v0.9.1. We continuously fix bugs and update this document as per user feedback. Therefore, we strongly encourage you to refer to our web page for updates.

2. Features

OSU INAM supports profiling InfiniBand Network traffic. It also has support to introspect the communication pattern of pure MPI programs and MPI+OpenMP programs built with MVAPICH2-X 2.2rc1. High level features of OSU INAM v0.9.1 are listed below.

2.1. Performance and Scalability Features

  • Capability to analyze and profile network-level activities with many parameters (data and errors) at user specified granularity

  • Significant enhancements to user interface to enable scaling to clusters with thousands of nodes

  • Improved database insert times through bulk inserts

  • Improved network load time through clustering of individual nodes

  • Capability to look up list of nodes communicating through a network link

  • Capability to visualize the data transfer happening in a ‘live’ fashion - Live View for

    • Entire Network - Live Network Level View

    • One or multiple Jobs - Live Job Level View

    • One or multiple Nodes - Live Node Level View

  • Capability to visualize data transfer that happened in the network at a time duration in the past - Historical View for

    • Entire Network - Historical Network Level View

    • One or multiple Jobs - Historical Job Level View

    • One or multiple Nodes - Historical Node Level View

2.2. MVAPICH2-X Specific Features

  • Capability to analyze and profile node-level, job-level and process-level activities for MPI communication (Point-to-Point, Collectives and RMA) at user specified granularity

  • Capability to profile and report the following parameters of MPI processes at node-level, job-level and process-level at user specified granularity

    • CPU Utilization

    • Memory Utilization

    • Inter-node communication buffer usage for RC transport

    • Inter-node communication buffer usage for UD transport

  • Capability to profile and report process to node communication matrix for MPI processes at user specified granularity

  • Capability to visualize utilization of a given network link in a live fashion - Live View for

    • Data transferred via a link at Job Level

    • Data transferred via a link at Process Level

2.3. MVAPICH2-X + SLURM Specific Features

  • Support for "Job Page" to display jobs in ascending/descending order of various performance metrics

3. Download and Installation Instructions

The OSU INAM package can be downloaded from http://mvapich.cse.ohio-state.edu/downloads/#osu-inam. Select the link for your distribution. All OSU INAM RPMs are relocatable. As an initial technology preview, we are providing RHEL6 RPMs. We provide RPMs compatible with Mellanox OFED 2.2.

SLURM accounting is required in order to use job tracking. Because OSU INAM only reads data from the SLURM accounting database, you can either provide the existing SLURM credentials or create a SELECT-only account for OSU INAM. For more information, visit https://computing.llnl.gov/linux/slurm/accounting.html.
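
One way to create such a read-only account is sketched below. The helper name print_slurm_readonly_sql and the account name used in the usage note are hypothetical. The database name slurm_acct_db is taken from the sample osu-inam.properties file in Section 3.4 and may differ on your site, and the 'localhost' host part assumes the web server runs on the same host as the SLURM accounting database.

```shell
# Hypothetical helper: print SQL that creates a read-only MySQL account
# for OSU INAM on the SLURM accounting database.
# $1 = account name, $2 = password, $3 = database (default: slurm_acct_db)
print_slurm_readonly_sql() {
    inam_user=$1
    inam_pass=$2
    inam_db=${3:-slurm_acct_db}
    cat <<EOF
CREATE USER '${inam_user}'@'localhost' IDENTIFIED BY '${inam_pass}';
GRANT SELECT ON ${inam_db}.* TO '${inam_user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

The output can be reviewed and then piped into the MySQL client, for example print_slurm_readonly_sql inamro 'change-me' | mysql -uroot. The same account name and password would then go into osuinam.datasource.slurmusername and osuinam.datasource.slurmpassword.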

3.1. RHEL6/CentOS6 packages

3.1.1. The following packages are required to get the OSU INAM tool working

  • mysql

  • mysql-devel

  • tomcat webserver

  • java 1.7.0

yum install -y mysql mysql-devel mysql-server tomcat6 java-1.7.0-openjdk

Installation Instructions
service mysqld start
# If this path changes, use --prefix=$OSU_INAM_INSTALL_PREFIX when installing the rpm
export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam

# Setup DB
mysql -uroot
CREATE DATABASE osuinamdb;
CREATE USER 'osuinamuser'@'localhost' IDENTIFIED BY 'osuinampassword';
GRANT ALL PRIVILEGES ON osuinamdb.* TO 'osuinamuser'@'localhost';
FLUSH PRIVILEGES;
exit

# Web Server Install Steps
tar -zxvf osu-inam-0.9.1.el6.tar.gz
cd osu-inam-0.9.1.el6
# If you changed the path of OSU_INAM_INSTALL_PREFIX, add --prefix=$OSU_INAM_INSTALL_PREFIX
rpm -ivh osu-inam-0.9.1-1.el6.x86_64.rpm
# Copy the war file to $CATALINA_BASE/webapps
cp osu-inam-0.9.1.war /usr/share/tomcat6/webapps/osu-inam-0.9.1.war
# Copy web server properties file, this path can be set in the osu-inam-0.9.1.xml file
cp osu-inam.properties /etc/osu-inam.properties
chmod 644 /etc/osu-inam.properties
# Copy the web server xml file to $CATALINA_BASE/conf/Catalina/<hostname>/
cp osu-inam-0.9.1.xml /usr/share/tomcat6/conf/Catalina/localhost/osu-inam-0.9.1.xml

# OSU INAM daemon steps
mkdir -p $OSU_INAM_INSTALL_PREFIX
cp $OSU_INAM_INSTALL_PREFIX/share/doc/osu-inam-0.9.1/osu-inamd.conf.example $OSU_INAM_INSTALL_PREFIX/osu-inamd.conf
# Start the opensm reading daemon, be sure the conf file can be written to as
# the information will be updated with the pid
$OSU_INAM_INSTALL_PREFIX/bin/osu-inamd -c $OSU_INAM_INSTALL_PREFIX/osu-inamd.conf -f $OSU_INAM_INSTALL_PREFIX/osu-inam.conf -p /var/run/osu-inam.pid

# After all conf files are in place, restart tomcat
service tomcat6 restart

3.2. RHEL7/CentOS7 packages

3.2.1. The following packages are required to get the OSU INAM tool working

  • mariadb-server (formerly mysql)

  • mariadb-devel

  • tomcat webserver

  • java 1.7.0

yum install -y mariadb-server mariadb-devel tomcat java-1.7.0-openjdk

Installation Instructions
systemctl start mariadb
# If this path changes, use --prefix=$OSU_INAM_INSTALL_PREFIX when installing the rpm
export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam

# Setup DB
mysql -uroot
CREATE DATABASE osuinamdb;
CREATE USER 'osuinamuser'@'localhost' IDENTIFIED BY 'osuinampassword';
GRANT ALL PRIVILEGES ON osuinamdb.* TO 'osuinamuser'@'localhost';
FLUSH PRIVILEGES;
exit

# Web Server Install Steps
tar -zxvf osu-inam-0.9.1.el7.tar.gz
cd osu-inam-0.9.1.el7
# If you changed the path of OSU_INAM_INSTALL_PREFIX, add --prefix=$OSU_INAM_INSTALL_PREFIX
rpm -ivh osu-inam-0.9.1-1.el7.x86_64.rpm
# Copy the war file to $CATALINA_BASE/webapps
cp osu-inam-0.9.1.war /usr/share/tomcat/webapps/osu-inam-0.9.1.war
# Copy web server properties file, this path can be set in the osu-inam-0.9.1.xml file
cp osu-inam.properties /etc/osu-inam.properties
chmod 644 /etc/osu-inam.properties
# Copy the web server xml file to $CATALINA_BASE/conf/Catalina/<hostname>/
cp osu-inam-0.9.1.xml /usr/share/tomcat/conf/Catalina/localhost/osu-inam-0.9.1.xml

# OSU INAM daemon steps
mkdir -p $OSU_INAM_INSTALL_PREFIX
cp $OSU_INAM_INSTALL_PREFIX/share/doc/osu-inam-0.9.1/osu-inamd.conf.example $OSU_INAM_INSTALL_PREFIX/osu-inamd.conf
# Start the opensm reading daemon, be sure the conf file can be written to as
# the information will be updated with the pid
$OSU_INAM_INSTALL_PREFIX/bin/osu-inamd -c $OSU_INAM_INSTALL_PREFIX/osu-inamd.conf -f $OSU_INAM_INSTALL_PREFIX/osu-inam.conf -p /var/run/osu-inam.pid

# After all conf files are in place, restart tomcat
systemctl restart tomcat

3.3. Testing the Web Server

After the above installation commands have completed successfully, you should be able to view the OSU INAM home page by visiting http://<server-ip>:8080/osu-inam-0.9.1/
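
The same check can be done from the command line. The sketch below is only an illustration: the function names inam_url and inam_http_status are hypothetical, and it assumes curl is installed on the machine you test from.

```shell
# Build the OSU INAM home page URL.
# $1 = server host or IP, $2 = port (default 8080), $3 = version (default 0.9.1)
inam_url() {
    echo "http://$1:${2:-8080}/osu-inam-${3:-0.9.1}/"
}

# Print the HTTP status code returned by the home page.
# Takes the same arguments as inam_url.
inam_http_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$(inam_url "$@")"
}
```

For example, inam_http_status localhost prints the status code for http://localhost:8080/osu-inam-0.9.1/. A 200 suggests that Tomcat has deployed the application, a 404 usually means the war file was not deployed under that name, and 000 typically means the server could not be reached at all.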

3.4. Sample Configuration Files

These files are provided in the root of the OSU INAM tarball.

osu-inam-0.9.1.xml
<?xml version='1.0' encoding='utf-8'?>
<Context>
    <Environment name="inamconfig" value="/etc/osu-inam.properties"
     type="java.lang.String" override="false"/>
</Context>

osu-inam.properties
# Global interval (in seconds) for refreshing information on different pages
osuinam.counterinterval=30
# Max Cluster Size (in number of nodes). For clusters larger than this,
# the leaf nodes will be collapsed by default to improve visual appeal and
# rendering time. Default value: 500
osuinam.clustering_threshold=500
osuinam.datasource.driverclass=com.mysql.jdbc.Driver
# Properties for opensm datasource configuration
osuinam.datasource.opensmurl=jdbc:mysql://localhost:3306/osuinamdb
osuinam.datasource.username=osuinamuser
osuinam.datasource.password=osuinampassword
# Configure SLURM location, username, password, and cluster
osuinam.datasource.slurmurl=jdbc:mysql://localhost:3306/slurm_acct_db
osuinam.datasource.slurmusername=slurmuser
osuinam.datasource.slurmpassword=slurmpassword
osuinam.clustername=osuinamcluster
# Initial size of the database connection pool. Default value:20
osuinam.datasource.initialsize=20
# The maximum number of connections that can remain idle in the pool, without
# extra ones being released, or negative for no limit. Default value:20
osuinam.datasource.maxIdle=20
# The cap on the number of objects that can be allocated by the
# pool. Default value:50
osuinam.datasource.maxtotal=50
# The maximum amount of time (in milliseconds) the borrowObject() method should
# block before throwing an exception when the pool is exhausted. Default
# value:10000
osuinam.datasource.maxwaitmillis=10000

osu-inam.conf
MV2_TOOL_QPN=X
MV2_TOOL_LID=X
MV2_TOOL_COUNTER_INTERVAL=30
MV2_TOOL_REPORT_CPU_UTIL=0
MV2_TOOL_REPORT_MEM_UTIL=0
MV2_TOOL_REPORT_IO_UTIL=0
MV2_TOOL_REPORT_COMM_GRID=0

Ensure the conf files are readable by tomcat
# Either set the permissions to be readable by everyone
chmod 644 /etc/osu-*
# Or change the ownership to tomcat
chown tomcat: /etc/osu-*
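
A mistyped key prefix in osu-inam.properties is easy to overlook. The following sketch flags lines whose key does not start with one of the prefixes used in this guide (osuinam. and phantomjs.). The helper name check_inam_properties is hypothetical, and the prefix list is an assumption based on the sample files shown here, not an official list of accepted keys.

```shell
# Hypothetical helper: list lines in a properties file that are neither
# comments, blank, nor keys starting with "osuinam." or "phantomjs.".
# Returns 0 if every key has an expected prefix, 1 if unexpected lines
# were found (they are printed with line numbers), 2 if the file is unreadable.
check_inam_properties() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 2
    fi
    if grep -n -v -E '^[[:space:]]*(#|$|osuinam\.|phantomjs\.)' "$1"; then
        return 1   # grep selected at least one unexpected line
    fi
    return 0
}
```

Running check_inam_properties /etc/osu-inam.properties prints nothing when all keys look plausible.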

Please email us at mvapich-help@cse.ohio-state.edu if you experience any trouble installing the package on your system.

3.5. Upgrading from an older version

Upgrading from older versions involves a subset of steps from the complete installation.

Upgrade Steps
# If this path changes, use --prefix=$OSU_INAM_INSTALL_PREFIX when installing the rpm
export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam
# Kill the currently running inam daemon
pkill osu-inamd
# Install the latest rpm, uninstallation of the old rpm may be necessary
# If you changed the path of OSU_INAM_INSTALL_PREFIX, add --prefix=$OSU_INAM_INSTALL_PREFIX
rpm -Uvh osu-inam-*.x86_64.rpm
# Copy the war file to $CATALINA_BASE/webapps (/usr/share/tomcat6/webapps on RHEL6/CentOS6)
cp *.war /usr/share/tomcat/webapps/
# Start the osu-inam daemon again
$OSU_INAM_INSTALL_PREFIX/bin/osu-inamd -c $OSU_INAM_INSTALL_PREFIX/osu-inamd.conf -f $OSU_INAM_INSTALL_PREFIX/osu-inam.conf -p /var/run/osu-inam.pid
# Restart the web server (the service is named tomcat6 on RHEL6/CentOS6);
# if the war file name has changed, be sure to use the new URL
service tomcat restart

osu-inam.properties - additional properties since v0.9-2
osuinam.clustering_threshold=500

4. Basic Usage Instructions

If the Tomcat installation was successful and the service has been started, you should see the OSU INAM home page when you point your web browser to http://localhost:8080/osu-inam-0.9.1/ or http://<server_ip>:8080/osu-inam-0.9.1/, depending on where the server was installed. If the server is behind a firewall, see Section 7.1 for some pointers.

4.1. Using the Network View

The Network View provides an overview of the entire network fabric. The network topology is presented as an interactive display that can be moved, dragged or zoomed as required. The nodes are represented by blue circles and switches are represented by red circles. They are labeled by their respective LIDs. The interconnects are colored according to their current load as indicated in the legend.

4.1.1. Network Metrics

The ‘Network Metrics’ drop-down box lists a set of port counters available from the switch. By default, total traffic on the link (Transmitted + Received Bytes) is shown. For the full list of supported counters, refer to Section 6 (List of Supported Network Metrics).

4.1.2. Live View

When Live View is selected, the display is refreshed every 30 seconds. This frequency can be changed through the run time parameter OSU_INAM_FABRIC_QUERY_INTERVAL (see Section 5.1.1). The view can also be updated manually by right-clicking a node or switch and selecting ‘Update Network’. To get a live view of a switch (red circle) or a node (blue circle), right-click the corresponding circle and select ‘Open Node Info’. This opens a new tab or window for that element.

4.1.3. Historical View

Looking at the past behavior of a network is often useful while investigating an issue. The Historical View shows the condition of the network from the ‘Start Time’ to the ‘End Time’. The Play/Pause button can be used to start and stop the display. By default, the snapshots are shown in real time, but playback can be sped up to 2x, 4x, or 8x speed. The display can also be restarted by clicking the Rewind button.

By using the check-boxes under ‘Link Usage’, the view can be restricted to links within a certain range of traffic. For example, idle links can be excluded by unchecking the 0-5% check box. For metrics indicating errors, the links with or without that error can be selected.

4.1.5. Node Information

Right-clicking on a node presents a context menu. Selecting ‘Open Node Info’ will show detailed information about that node. If the node is running MVAPICH2-X, aggregate CPU usage and usage by each rank will be available.

4.1.6. Switch Information

Detailed information about a switch can be obtained by right-clicking on a switch followed by ‘Open Switch Info’. Clicking on a port will show the port counter information for that particular port.

4.1.7. Route Information

Multiple nodes can be selected on the display by CTRL+Clicking (CMD+Click for OS X) on them. Once multiple nodes are selected, right click followed by ‘Find Routes Between Selected Nodes’ will highlight the available routes between them.

Detailed information about the utilization of a link can be viewed by right-clicking the link and selecting ‘Open Link Info’. For jobs using MVAPICH2-X, the data transferred via that link is available. Selecting a job id additionally shows process-level link utilization.

Left-click the link of choice to select it, then right-click and select the ‘Find routes going through this link’ option. This displays all routes between hosts that use the selected link.

4.1.10. Using the Job Level View

OSU INAM can work with the resource manager to show information pertinent to a single job instead of the entire cluster. The Job Level View can be activated by selecting ‘Job Id’ in ‘Filter By’ and entering a job id. In this view, only the nodes and switches participating in that job are displayed. The features present in the Network Level View (see Section 4.1), such as Live View, Historical View and Link Usage, are supported in the Job Level View as well.

In Historical View, the start and end times of the job are populated automatically. For a running job, the end time is set to the current time. The user can also select specific start and end times.

4.1.11. Using the Node View

For supported MPI libraries, OSU INAM can display process level CPU and network utilization information. This mode can be selected by choosing ‘Node Id’ and selecting one or more nodes from the list of nodes. If an MPI job is running on that node, OSU INAM will display aggregate or per core CPU usage. The list of MPI ranks is also shown, and each of the ranks can be selected to view their network usage over a period of time.

4.2. Using the Live Jobs Tab

The Live Jobs tab allows the user to see various selectable metrics (defined below) for all jobs using MVAPICH2-X 2.2b on a cluster.

4.2.1. CPU User Usage

This displays the aggregated CPU utilization (percentage) for a specific job.

4.2.2. Virtual Memory Usage

This displays the total virtual memory utilization (in bytes) for a specific job.

4.2.3. Total I/O

This displays the total I/O data read and written (in bytes) for a specific job.

4.2.4. Total Communication

This displays the sum of all inter-node and intra-node communication performed by this job.

4.2.5. Total Intra Node Communication

This displays the number of bytes exchanged by the job between processes running on the same node.

4.2.6. Total Inter Node Communication

This displays the number of bytes exchanged by the job between processes running on different nodes.

4.2.7. Total Collective

This displays the number of bytes exchanged by processes (for the specific job) during collective communication (e.g. MPI_Bcast) only.

4.2.8. RMA Sent

This displays the number of bytes sent for one-sided communication (e.g. MPI_Put) by processes of a specific job.

4.2.9. Total Pt-to-pt

This displays the number of bytes sent and received by point to point operations (e.g. MPI_Send or MPI_Recv) for a specific job.

4.2.10. Inter-node Communication Buffers Allocated

This displays the number of buffers allocated for communication across nodes.

4.2.11. Inter-node Communication Buffers Used

This displays the number of buffers actually used for communication across nodes.

5. Runtime Parameters

All runtime parameters supported by OSU INAM v0.9.1 are listed below. All of these parameters can be set in the OSU INAM configuration file. If you tune any of these values, restart the daemon so that the change takes effect.

5.1. General Parameters

5.1.1. OSU_INAM_FABRIC_QUERY_INTERVAL

  • Class: Run time

  • Default: 3600 seconds

  • Specifies the interval in seconds at which OSU INAM should query the fabric to identify change in state for switches, nodes, links and routes.

5.1.2. OSU_INAM_PERF_COUNTER_QUERY_INTERVAL

  • Class: Run time

  • Default: 30 seconds

  • Specifies the interval in seconds at which OSU INAM should query the switches to obtain counter information.

5.2. MVAPICH2-X Specific Parameters

5.2.1. OSU_INAM_PROC_COUNTER_QUERY_INTERVAL

  • Class: Run time

  • Default: 30 seconds

  • Specifies the interval at which MVAPICH2-X should report node, job and process level information.

5.2.2. OSU_INAM_TOOL_REPORT_CPU_UTIL

  • Class: Run time

  • Default: 0 (Disabled)

  • Specifies whether MVAPICH2-X should report process level CPU utilization information.

5.2.3. OSU_INAM_TOOL_REPORT_MEM_UTIL

  • Class: Run time

  • Default: 0 (Disabled)

  • Specifies whether MVAPICH2-X should report process level memory utilization information.

5.2.4. OSU_INAM_TOOL_REPORT_IO_UTIL

  • Class: Run time

  • Default: 0 (Disabled)

  • Specifies whether MVAPICH2-X should report process level IO information.

5.2.5. OSU_INAM_TOOL_REPORT_COMM_GRID

  • Class: Run time

  • Default: 0 (Disabled)

  • Specifies whether MVAPICH2-X should report process communication grid information.

5.3. OSU INAM Database Configuration Parameters

5.3.1. OSU_INAM_DATABASE_HOST

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the name of the host where the MySQL database daemon is running.

5.3.2. OSU_INAM_DATABASE_PORT

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the port on OSU_INAM_DATABASE_HOST at which the MySQL database daemon is listening for incoming connections.

5.3.3. OSU_INAM_DATABASE_NAME

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the name of MySQL database OSU INAM should use to store data.

5.3.4. OSU_INAM_DATABASE_USER

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the name of user who has privileges to enter data into the MySQL database with name OSU_INAM_DATABASE_NAME.

5.3.5. OSU_INAM_DATABASE_PASSWD

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the password associated with user id OSU_INAM_DATABASE_USER.

5.3.6. OSU_INAM_DATA_RETENTION_PERIOD

  • Class: Run time

  • Default: 7 days

  • Specifies the duration in days the profiling data should be stored in the MySQL database.

5.3.7. OSU_INAM_PURGE_QUERY_INTERVAL

  • Class: Run time

  • Default: 3600 seconds

  • Specifies the interval between two purge queries used to delete profiling information from the database.

5.3.8. OSU_INAM_DATABASE_BULK_ACTIVE

  • Class: Run time

  • Default: 1 (Enabled)

  • Specifies whether the entries should be inserted in a bulk manner.

5.3.9. OSU_INAM_DATABASE_BULK_SIZE

  • Class: Run time

  • Default: 100

  • Specifies the number of records inserted in a bulk insert.
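
As an illustration, the database parameters above could be set in osu-inamd.conf as sketched below. The values are assumptions that simply mirror the database created in Section 3.1 and the MySQL port used in the sample osu-inam.properties file; adjust them to your site.

```shell
# Database section of osu-inamd.conf (illustrative values, see Section 3.1)
OSU_INAM_DATABASE_HOST=localhost
OSU_INAM_DATABASE_PORT=3306
OSU_INAM_DATABASE_NAME=osuinamdb
OSU_INAM_DATABASE_USER=osuinamuser
OSU_INAM_DATABASE_PASSWD=osuinampassword
# Optional: these two lines only restate the documented defaults
OSU_INAM_DATABASE_BULK_ACTIVE=1
OSU_INAM_DATABASE_BULK_SIZE=100
```

The first five parameters have no default and must be set before the daemon is started.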

6. List of Supported Network Metrics

The Network Metrics supported by OSU INAM v0.9.1 are listed below. These metrics can be broadly divided into three sets. The descriptions for InfiniBand port and error counters have been obtained from the InfiniBand Specification Release 1.2.1 by the InfiniBand Trade Association.

6.1. Switch Counters

The following node-level counters are queried from the InfiniBand Switches:

  • Xmit Data

    • Total number of data octets, divided by 4, transmitted on all VLs from the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets.

  • Rcv Data

    • Total number of data octets, divided by 4, received on all VLs from the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets.

  • Max [Xmit Data/Rcv Data]

    • Maximum of the two values above
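
Because Xmit Data and Rcv Data are counted in units of four octets, a raw counter value has to be multiplied by 4 to obtain bytes. The sketch below shows that arithmetic. The function names are hypothetical, and it assumes that the values at hand are raw counter values and that the counter neither wrapped nor was reset between the two samples.

```shell
# Convert an Xmit Data / Rcv Data counter value (units of 4 octets) to bytes.
ib_counter_to_bytes() {
    echo $(( $1 * 4 ))
}

# Average throughput in bytes per second between two counter samples.
# $1 = earlier value, $2 = later value, $3 = seconds between the samples
ib_link_rate() {
    echo $(( ($2 - $1) * 4 / $3 ))
}

# Max [Xmit Data/Rcv Data]: the larger of the two values.
ib_max() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

For example, two samples of 1000 and 7501000 taken 30 seconds apart differ by 7500000 counter units, which is 30000000 bytes, so ib_link_rate 1000 7501000 30 prints 1000000 (bytes per second).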

6.2. Process Level Counters

MVAPICH2-X collects additional information about the process’s network usage which can be displayed by OSU INAM. The following counters are currently supported:

  • Xmit Data

    • Total number of bytes transmitted as part of the MPI application

  • Rcv Data

    • Total number of bytes received as part of the MPI application

  • Max [Xmit Data/Rcv Data]

    • Maximum of the two values above

  • Point to Point Send

    • Total number of bytes transmitted as part of MPI point-to-point operations

  • Point to Point Rcvd

    • Total number of bytes received as part of MPI point-to-point operations

  • Max [Point to Point Sent/Rcvd]

    • Maximum of the two values above

  • Coll Bytes Sent

    • Total number of bytes transmitted as part of MPI collective operations

  • Coll Bytes Rcvd

    • Total number of bytes received as part of MPI collective operations

  • Max [Coll Bytes Sent/Rcvd]

    • Maximum of the two values above

  • RMA Bytes Sent

    • Total number of bytes transmitted as part of MPI RMA operations. Note that due to the nature of the RMA operations, bytes received for RMA operations cannot be counted

  • RC VBUF

    • The number of internal communication buffers used for reliable connection (RC)

  • UD VBUF

    • The number of internal communication buffers used for unreliable datagram (UD)

  • VM Size

    • Total number of bytes used by the program for its virtual memory

  • VM Peak

    • Maximum number of virtual memory bytes for the program

  • VM RSS

    • The number of bytes resident in the memory (Resident set size)

  • VM HWM

    • The maximum number of bytes that have been resident in memory (Peak resident set size or High water mark)

6.3. Error Counters

The following error counters are available both at switch and process level:

  • SymbolErrors

    • Total number of minor link errors detected on one or more physical lanes

  • LinkRecovers

    • Total number of times the Port Training state machine has successfully completed the link error recovery process

  • LinkDowned

    • Total number of times the Port Training state machine has failed the link error recovery process and downed the link

  • RcvErrors

    • Total number of packets containing an error that were received on the port. These errors include:

      • Local physical errors

      • Malformed data packet errors

      • Malformed link packet errors

      • Packets discarded due to buffer overrun

  • RcvRemotePhysErrors

    • Total number of packets marked with the EBP delimiter received on the port.

  • RcvSwitchRelayErrors

    • Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay

  • XmtDiscards

    • Total number of outbound packets discarded by the port because the port is down or congested. Reasons for this include:

      • Output port is not in the active state

      • Packet length exceeded NeighborMTU

      • Switch Lifetime Limit exceeded

      • Switch HOQ Lifetime Limit exceeded

    This may also include packets discarded while in VLStalled State.

  • XmtConstraintErrors

    • Total number of packets not transmitted from the switch physical port for the following reasons:

      • FilterRawOutbound is true and packet is raw

      • PartitionEnforcementOutbound is true and packet fails partition key check or IP version check

  • RcvConstraintErrors

    • Total number of packets not received from the switch physical port for the following reasons:

      • FilterRawInbound is true and packet is raw

      • PartitionEnforcementInbound is true and packet fails partition key check or IP version check

  • LinkIntegrityErrors

    • The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors

  • ExcBufOverrunErrors

    • The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error

  • VL15Dropped

    • Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port

7. Advanced Usage Instructions

7.1. Making OSU INAM visible outside of a firewalled environment

The following snippets should work in basic scenarios where the OSU INAM server is sitting behind a firewalled or NAT’d environment. Please exercise caution as this could expose the server to larger, less secure networks or otherwise upset your network administrators.

Iptables
-A PREROUTING -p tcp -d <external ip> --dport 8080 -j DNAT --to <tomcat server>:8080
-A POSTROUTING -p tcp -d <tomcat server> --dport 8080 -j SNAT --to-source <external ip>

Apache
ProxyPass /inam/ http://<tomcat server>:8080/
ProxyPassReverse /inam/ http://<tomcat server>:8080/

Nginx
server {
    listen 8080 default_server;
    server_name X;

    location /inam {
        rewrite ^/inam(.*)$ $1 break;
        proxy_pass http://<tomcat server>:8080;
    }
}

7.2. Speed up the network map rendering by using PhantomJS

PhantomJS is a headless WebKit that allows OSU INAM to pre-render the network graph so it loads much quicker. The modifications necessary to do this are minimal.

7.2.1. Required Packages

Add the following parameters to your osu-inam.properties file

/etc/osu-inam.properties
#phantomjs
#execdir is the path you placed the phantomjs bin
phantomjs.execdir=/path/to/phantomjs/bin/
#runjs should be the explicit path to the inam.js that is provided in the root of the download tarball
phantomjs.runjs=/path/to/inam.js
#filedir is the location of the phantomjs output for the pre-rendering
phantomjs.filedir=/path/to/phantomjs/working/dir
#cachefile is the location of the file to cache the final phantomjs
#output. On next restart the web application would use the cached data and not
#perform the rendering
phantomjs.cachefile=/path/to/cachefile

Be sure to make the PhantomJS binary executable, the runjs file readable, and the filedir writeable by your webserver. Place the vis.js from the root of the tarball in the same directory as inam.js.

After the positions are calculated by PhantomJS, the cachefile will be generated by the web application.

Once finished, restart the webserver to pull in the new settings and on the next visit to /network/, the view should be rendered nearly instantly.

PhantomJS renders the network graph during the web application’s deployment, which might increase the deployment time. However, this is a ONE TIME COST: for subsequent deployments, the web application loads the network information from the cache file. The time PhantomJS takes to render the network for the first time depends on the complexity of the network and the number of nodes.

Projected ONE TIME Web Application Deployment Time with PhantomJS

These estimates are based on testing with PhantomJS 2.0.0 on a dual socket Intel E5630 with 12GB of memory.

Number of Nodes   Number of Switches   Network Topology   Approximate Time
178               20                   Full Fat-Tree      1 min
1879              212                  Hybrid Fat-Tree    30 mins

8. Best Practices with OSU INAM

8.1. Deployment Recommendations

Due to the multithreaded design of the OSU INAM daemon, for large clusters consisting of thousands of nodes and hundreds of switches, we recommend dedicating four cores on a node to the OSU INAM daemon and one core to the database daemon processes. For smaller clusters consisting of fewer than 500 nodes, the daemon can be run on a non-dedicated node (a head node or login node, for instance).

Based on our experience and feedback we have received from our users, here we include some of the best practices for deploying OSU INAM. If you have any of your own best practices related to OSU INAM, please feel free to contact us by sending an email to mvapich-help@cse.ohio-state.edu

8.2. MySQL Tuning Parameters

For the database, the following parameters can be tuned for better performance at different cluster sizes.

MySQL Tuning Parameter            Significance
innodb_flush_log_at_trx_commit    Controls the balance between strict ACID compliance for commit operations and higher performance
innodb_buffer_pool_size           The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data
innodb_log_buffer_size            The size in bytes of the buffer that InnoDB uses to write to the log files on disk
innodb_log_file_size              The size in bytes of each log file in a log group

8.2.1. Additional Steps Required Before Changing Number or Size of InnoDB Redo Log Files

  • Set innodb_fast_shutdown to 1

mysql> SET GLOBAL innodb_fast_shutdown = 1;

  • Stop the MySQL server and ensure it shuts down without errors

  • Back up the old log files if desired, so that the previous state can be restored

  • Delete old log files

  • Edit my.cnf file and add the lines listed below depending on your cluster size

  • Start MySQL server

8.2.2. Proposed Additions to OSU INAM and MySQL Configuration File for Clusters of Different Sizes

We list some recommended values to be set in my.cnf file for clusters of different sizes.

Additions to my.cnf file for small clusters (<100 nodes)
innodb_flush_log_at_trx_commit=2

Additions to my.cnf file for medium sized clusters (100-500 nodes)
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=4G
innodb_log_buffer_size=16M
innodb_log_file_size=256M

Additions to my.cnf file for large clusters (>500 nodes)
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=16G
innodb_log_buffer_size=32M
innodb_log_file_size=512M

Additions to osu-inamd.conf file for all cluster sizes
#The number of records to be inserted together during bulk insert.
OSU_INAM_DATABASE_BULK_SIZE=100
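
The cluster size recommendations above can be expressed as a small helper that prints the matching my.cnf lines for a given node count. The function name inam_mycnf_additions is hypothetical. The values are the ones recommended in this section, and InnoDB options of this kind normally belong in the [mysqld] section of my.cnf.

```shell
# Print the my.cnf additions recommended in this section for a cluster size.
# $1 = number of nodes
inam_mycnf_additions() {
    echo "innodb_flush_log_at_trx_commit=2"
    if [ "$1" -gt 500 ]; then
        # large clusters (>500 nodes)
        echo "innodb_buffer_pool_size=16G"
        echo "innodb_log_buffer_size=32M"
        echo "innodb_log_file_size=512M"
    elif [ "$1" -ge 100 ]; then
        # medium sized clusters (100-500 nodes)
        echo "innodb_buffer_pool_size=4G"
        echo "innodb_log_buffer_size=16M"
        echo "innodb_log_file_size=256M"
    fi
}
```

For example, inam_mycnf_additions 300 prints the four lines recommended for medium sized clusters. Remember that changing innodb_log_file_size requires the additional steps from Section 8.2.1.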

9. FAQ and Troubleshooting with OSU INAM

Based on our experience and feedback we have received from our users, here we include some of the problems a user may experience and the steps to resolve them. If you are experiencing any other problem, please feel free to contact us by sending an email to mvapich-help@cse.ohio-state.edu

9.1. General Questions and Troubleshooting

9.1.1. Install OSU INAM to a specific location

OSU INAM RPMs are relocatable. Use the --prefix option during RPM installation to install OSU INAM into a specific location. An example is shown below:

rpm -ivh --prefix <specific-location> osu-inam-0.9.1-1.el6.x86_64.rpm

9.1.2. Where can I find the log messages generated by OSU INAM?

OSU INAM will push all the log messages it generates to ‘/var/log/messages’

9.1.3. Why is the webserver taking a long time to load?

OSU INAM uses PhantomJS to cache the rendered network graph with the aim of speeding up subsequent deployments. This caching happens when the web application is deployed for the first time. Please refer to Section 7.2 (Speed up the network map rendering by using PhantomJS) for more details.

9.1.4. I have installed PhantomJS, but my webpage is still rendering very slowly

Here we list some possible reasons why webpage rendering can take more time than expected even though PhantomJS has been installed correctly.

  • Incorrect permissions to the directories

    • The tomcat user should be able to write to and read from the directory pointed to by phantomjs.filedir

  • Using incorrect inam.js file

    • The phantomjs.runjs variable in /etc/osu-inam.properties file should point to the inam.js file included in the tarball

  • vis.js and inam.js not present in the same directory

    • The vis.js file and the inam.js file should be in the same directory

Please refer to the Speed up the network map rendering by using PhantomJS section for more details on how to correctly setup PhantomJS for use with OSU INAM.
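
The checklist above can be automated roughly as follows. This is only a sketch: the helper name check_phantomjs_setup is hypothetical, and it assumes that the PhantomJS binary inside phantomjs.execdir is named phantomjs.

```shell
# Hypothetical helper: verify the PhantomJS prerequisites listed above.
# $1 = phantomjs.execdir, $2 = phantomjs.runjs, $3 = phantomjs.filedir
# Assumption: the binary inside execdir is named "phantomjs".
# Prints one line per problem; returns 0 if none were found, 1 otherwise.
check_phantomjs_setup() {
    problems=0
    if [ ! -x "$1/phantomjs" ]; then
        echo "not executable: $1/phantomjs"; problems=1
    fi
    if [ ! -r "$2" ]; then
        echo "not readable: $2"; problems=1
    fi
    if [ ! -r "$(dirname "$2")/vis.js" ]; then
        echo "vis.js missing next to $2"; problems=1
    fi
    if [ ! -d "$3" ] || [ ! -w "$3" ] || [ ! -r "$3" ]; then
        echo "not a readable and writable directory: $3"; problems=1
    fi
    return "$problems"
}
```

Permissions are checked for the calling user, so the function is only meaningful when run as the user the web server runs as (tomcat in the examples in this guide), with the three values copied from /etc/osu-inam.properties.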

9.1.5. Does OSU INAM support any other job scheduler besides SLURM?

At present, OSU INAM only supports SLURM. We have plans to bring in support for other job launchers like PBS/Torque in the future.

9.1.6. Will OSU INAM work without a supported scheduler?

OSU INAM has been designed so that features that do not depend on the job scheduler (e.g., viewing the network counters) will work even without a supported job scheduler.

Please use the following order while starting OSU INAM and related services

  • Create the database

  • Start up the OSU INAM daemon

  • Once the nodes and links tables are populated by the OSU INAM daemon, deploy the web application

Please use the following order while stopping OSU INAM and related services

  • Stop the web application

  • Stop the OSU INAM daemon

  • Destroy the database

    • This step is only required if you do not want to use OSU INAM again; otherwise you can skip it.

9.1.8. How can I control the size of the database?

OSU INAM can automatically purge data older than a user-defined period from the database. The parameter OSU_INAM_DATA_RETENTION_PERIOD controls this period and can be set to any desired value. By default, it is set to seven days; you can reduce it to a lower value, such as one day.

There is another parameter OSU_INAM_PURGE_QUERY_INTERVAL that tells the daemon how frequently it should check for older data. The default value for this is 3600 seconds. You can modify this as well.

Once you’ve changed the value, please restart the daemon so that it takes effect.
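
To get a feeling for how the retention period affects the amount of stored data, the following sketch estimates the number of counter samples kept per port. The function name is hypothetical, and the estimate assumes that one sample is stored per port for every OSU_INAM_PERF_COUNTER_QUERY_INTERVAL; the actual database layout may differ.

```shell
# Rough number of counter samples kept per port.
# $1 = OSU_INAM_DATA_RETENTION_PERIOD (days)
# $2 = OSU_INAM_PERF_COUNTER_QUERY_INTERVAL (seconds)
# Assumption: one sample is stored per port per query interval.
inam_samples_per_port() {
    echo $(( $1 * 86400 / $2 ))
}
```

With the defaults (7 days, 30 seconds), inam_samples_per_port 7 30 prints 20160. Setting OSU_INAM_DATA_RETENTION_PERIOD=1 in the daemon configuration file reduces this to 2880 under the same assumption.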

9.1.9. Why does INAMD not exit after being shutdown?

INAMD checks for exit signals at fixed intervals specified by OSU_INAM_PERF_COUNTER_QUERY_INTERVAL (default value: 30 seconds). Thus, a shutdown command may not take effect immediately.
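
A script that stops the daemon can therefore wait for the process to disappear instead of assuming an immediate exit. The sketch below is one way to do this. The function name wait_for_pid_exit is hypothetical, and the usage note assumes that /var/run/osu-inam.pid (the file passed with -p in Section 3) contains the PID of the daemon.

```shell
# Wait for a process to exit.
# $1 = PID, $2 = timeout in seconds
# Returns 0 once the process is gone, 1 if it is still alive after the timeout.
# Note: kill -0 only probes the process; run this as the daemon's owner or
# as root, otherwise a permission error is mistaken for "process gone".
wait_for_pid_exit() {
    waited=0
    while kill -0 "$1" 2>/dev/null; do
        if [ "$waited" -ge "$2" ]; then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

A possible use is to read the PID first, send the signal, and then wait slightly longer than the query interval: pid=$(cat /var/run/osu-inam.pid); kill "$pid"; wait_for_pid_exit "$pid" 35.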