1. Overview of the OSU INAM Project

As InfiniBand (IB)-based High Performance Computing (HPC) installations grow in size and scale, predicting the behavior of the IB network in terms of link usage and performance becomes an increasingly challenging task. Further, as computing and networking technologies continue to evolve in HPC platforms, it becomes increasingly important to understand the interactions between HPC middleware infrastructures and the high-performance communication fabric they rely on. The OSU InfiniBand Network Analysis and Monitoring tool (OSU INAM) monitors IB clusters in real time by querying various subnet management entities in the network. It can also interact with the MVAPICH2-X software stack to gain insights into the communication pattern of an application and classify the data transferred into Point-to-Point, Collective and Remote Memory Access (RMA) traffic. In conjunction with MVAPICH2-X, OSU INAM can also remotely monitor the CPU utilization of MPI processes.

This document contains the information users need to download, install, test, use, tune and troubleshoot OSU INAM v0.9.3. We continuously fix bugs and update this document based on user feedback. Therefore, we strongly encourage you to refer to our web page for updates.

2. Features

OSU INAM supports profiling InfiniBand network traffic. It can also introspect the communication pattern of pure MPI and MPI+OpenMP programs built with MVAPICH2-X 2.3b. The high-level features of OSU INAM v0.9.3 are listed below.

2.1. Performance and Scalability Features

  • Capability to analyze and profile network-level activities with many parameters (data and errors) at user specified granularity

  • Significant enhancements to user interface to enable scaling to clusters with thousands of nodes

  • Improved database insert times by using bulk inserts

  • Improved network load time by clustering individual nodes

  • Capability to look up list of nodes communicating through a network link

  • Capability to visualize the data transfer happening in a ‘live’ fashion - Live View for

    • Entire Network - Live Network Level View

    • One or multiple Jobs - Live Job Level View

    • One or multiple Nodes - Live Node Level View

  • Capability to visualize data transfer that happened in the network at a time duration in the past - Historical View for

    • Entire Network - Historical Network Level View

    • One or multiple Jobs - Historical Job Level View

    • One or multiple Nodes - Historical Node Level View

2.2. MVAPICH2-X Specific Features

  • Capability to analyze and profile node-level, job-level and process-level activities for MPI communication (Point-to-Point, Collectives and RMA) at user specified granularity

  • Capability to profile and report the following parameters of MPI processes at node-level, job-level and process-level at user specified granularity

    • CPU Utilization

    • Memory Utilization

    • Inter-node communication buffer usage for RC transport

    • Inter-node communication buffer usage for UD transport

  • Capability to profile and report the process-to-node communication matrix for MPI processes at user specified granularity

  • Capability to visualize utilization of a given network link in a live fashion - Live View for

    • Data transferred via a link at Job Level

    • Data transferred via a link at Process Level

  • Support for "Job Page" using data pushed by MVAPICH2-X if SLURM is not enabled

2.3. MVAPICH2-X + SLURM Specific Features

  • Support for "Job Page" to display jobs in ascending/descending order of various performance metrics using SLURM’s sacct command.

3. Download and Installation Instructions

The OSU INAM package can be downloaded from http://mvapich.cse.ohio-state.edu/downloads/#osu-inam. Select the link for your distro. All OSU INAM RPMs are relocatable.

Job tracking requires SLURM accounting. New with this release, OSU INAM runs sacct directly, so no database credentials for SLURM are needed. For more information, visit https://computing.llnl.gov/linux/slurm/accounting.html. To enable SLURM support, set OSU_INAM_ENABLE_SLURM=1 in $OSU_INAM_INSTALL_PREFIX/etc/osu-inamd.conf. NOTE: sacct must be available on the same host as the OSU INAM daemon.
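Because job tracking depends on both the OSU_INAM_ENABLE_SLURM setting and sacct being present on the daemon host, a quick pre-flight check can save debugging time. The following is a minimal Python sketch, not part of OSU INAM; it assumes osu-inamd.conf contains simple KEY=VALUE lines with # comment lines, and the function names are hypothetical.

```python
import shutil


def parse_inamd_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and lines starting with # (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def slurm_problems(conf_text):
    """Return reasons why SLURM job tracking would not work on this host."""
    problems = []
    if parse_inamd_conf(conf_text).get("OSU_INAM_ENABLE_SLURM") != "1":
        problems.append("OSU_INAM_ENABLE_SLURM is not set to 1")
    # sacct must be runnable on the host where the OSU INAM daemon runs.
    if shutil.which("sacct") is None:
        problems.append("sacct was not found on PATH")
    return problems
```

Run it on the daemon host with the contents of $OSU_INAM_INSTALL_PREFIX/etc/osu-inamd.conf; an empty list means both prerequisites are met.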

3.1. RHEL6/CentOS6 packages

3.1.1. The following packages are required to get the OSU INAM tool working:

  • mysql

  • mysql-devel

  • mysql-server

  • java 1.8.0

yum install -y mysql mysql-devel mysql-server java-1.8.0-openjdk
Installation Instructions for INAM daemon
service mysqld start
export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam

# Setup DB
mysql -uroot
CREATE DATABASE osuinamdb;
CREATE USER 'osuinamuser'@'localhost' IDENTIFIED BY 'osuinampassword';
GRANT ALL PRIVILEGES ON osuinamdb.* TO 'osuinamuser'@'localhost';
FLUSH PRIVILEGES;
exit

# Webserver and daemon install steps
rpm -Uvh osu-inam-0.9.3-1.el6.x86_64.rpm

# Start the daemons (all prior steps need to have been run successfully)
service osu-inamd start
service osu-inamweb start

# Make them start at boot time
chkconfig osu-inamd on
chkconfig osu-inamweb on

3.2. RHEL7/CentOS7 packages

3.2.1. The following packages are required to get the OSU INAM tool working:

  • mariadb-server (formerly mysql)

  • mariadb-devel

  • java 1.8.0

yum install -y mariadb-server mariadb-devel java-1.8.0-openjdk
Installation Instructions for INAM daemon
systemctl start mariadb
export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam

# Setup DB
mysql -uroot
CREATE DATABASE osuinamdb;
CREATE USER 'osuinamuser'@'localhost' IDENTIFIED BY 'osuinampassword';
GRANT ALL PRIVILEGES ON osuinamdb.* TO 'osuinamuser'@'localhost';
FLUSH PRIVILEGES;
exit

# Webserver and daemon install steps
rpm -Uvh osu-inam-0.9.3-1.el7.x86_64.rpm

# Start the daemons (all prior steps need to have been run successfully)
systemctl start osu-inamd
systemctl start osu-inamweb

# Make them start at boot time
systemctl enable osu-inamd
systemctl enable osu-inamweb

3.3. Sample Configuration Files

These files are provided in the root of the OSU INAM tarball.

osu-inam.properties
# Global interval (in seconds) for refreshing information on different pages
osuinam.counterinterval=30
# Max Cluster Size (in number of nodes). For clusters larger than this,
# the leaf nodes will be collapsed by default to improve visual appeal and
# rendering time. Default value: 500
osuinam.clustering_threshold=500
osuinam.clustername=osuinamcluster

# Properties for opensm datasource configuration
osuinam.datasource.url=jdbc:mysql://localhost:3306/osuinamdb
osuinam.datasource.username=osuinamuser
osuinam.datasource.password=osuinampassword
# Control connection pool size
osuinam.datasource.initial-size=20
osuinam.datasource.max-active=50

#log file is rotated once it reaches size of 10MB
logging.file=/var/log/osu-inam.log
logging.level.edu.osu.inam=WARN

#control server port number, default is 8080
#server.port = 8080

#phantomjs config
phantomjs.execdir=
phantomjs.runjs=
phantomjs.filedir=
phantomjs.cachefile=

#Specify the path to the inamd conf file, set to the default installation path
#osuinam.daemon.conf=/opt/osu-inam/etc/osu-inamd.conf
osu-inam.conf
MV2_TOOL_QPN=X
MV2_TOOL_LID=X
MV2_TOOL_COUNTER_INTERVAL=30
MV2_TOOL_REPORT_CPU_UTIL=0
MV2_TOOL_REPORT_MEM_UTIL=0
MV2_TOOL_REPORT_IO_UTIL=0
MV2_TOOL_REPORT_COMM_GRID=0
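The X placeholders in the osu-inam.conf sample stand for the UD QPN and LID at which the OSU INAM daemon listens (compare the running example in Using MVAPICH2-X INAM). As an illustration, the hypothetical helper below renders such a file with every MV2_TOOL_* key from the sample, assuming the reporting flags only accept 0 or 1; how you obtain the QPN and LID of your daemon is not covered here.

```python
def render_tool_conf(qpn, lid, counter_interval=30, report_cpu=0,
                     report_mem=0, report_io=0, report_comm_grid=0):
    """Render the MV2_TOOL_* file that MV2_TOOL_INFO_FILE_PATH points to (hypothetical helper)."""
    flags = {
        "MV2_TOOL_REPORT_CPU_UTIL": report_cpu,
        "MV2_TOOL_REPORT_MEM_UTIL": report_mem,
        "MV2_TOOL_REPORT_IO_UTIL": report_io,
        "MV2_TOOL_REPORT_COMM_GRID": report_comm_grid,
    }
    # Assumption: the reporting switches are boolean 0/1 values, as in the sample.
    for name, value in flags.items():
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
    if counter_interval <= 0:
        raise ValueError("MV2_TOOL_COUNTER_INTERVAL must be positive")
    lines = [
        f"MV2_TOOL_QPN={int(qpn)}",
        f"MV2_TOOL_LID={int(lid)}",
        f"MV2_TOOL_COUNTER_INTERVAL={int(counter_interval)}",
    ]
    lines += [f"{name}={value}" for name, value in flags.items()]
    return "\n".join(lines) + "\n"
```

For example, render_tool_conf(473, 208, report_cpu=1) produces a file equivalent to the one used in the MVAPICH2-X running example, plus the remaining reporting flags set to 0.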

Please email us at mvapich-help@cse.ohio-state.edu if you experience any trouble installing the package on your system.

3.4. Upgrading from an older version

Upgrading from an older version involves a subset of the steps for a complete installation. Unlike older versions of INAM, INAM v0.9.3 uses an embedded Tomcat server and does not require a separately installed Tomcat server.

The embedded Tomcat server uses the same default port number (8080) as a standard Tomcat installation. We recommend uninstalling or stopping any existing Tomcat installation before installing the new version of INAM. If Tomcat cannot be uninstalled, change the port number used by INAM with the server.port property in the osu-inam.properties file.
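To find out whether an existing Tomcat (or anything else) already holds port 8080 before starting the new version, you can try to open a TCP connection to it. A minimal Python sketch, assuming the check is run on the host where INAM will be installed; port_in_use is a hypothetical helper, and a successful connection only tells you the port is taken, not which program holds it.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something already accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out or unreachable: nothing is listening.
        return False
```

If port_in_use(8080) returns True, stop the old service or choose another value for server.port in osu-inam.properties.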

Upgrade Steps
# Kill the current running inam daemon
pkill osu-inamd
# Stop and uninstall tomcat6
service tomcat6 stop
yum remove tomcat6
# Install the latest rpm, uninstallation of the old rpm may be necessary
rpm -Uvh osu-inam-0.9.3.el7.x86_64.rpm
# Start the osu-inam daemon again
service osu-inamd start
osu-inam.properties
# New properties since 0.9.3
#log file is rotated once it reaches size of 10MB
#logging.file=<name of log file>
logging.level.edu.osu.inam=WARN
#change the port number used by the server. 8080 is the default port number
server.port = 8080

4. Basic Usage Instructions

If the installation was successful and the services have been started, you should be able to see the OSU INAM homepage by pointing your web browser to http://localhost:8080/ or http://<server_ip>:8080/, depending on where the server was installed. If the server is behind a firewall, see Making OSU INAM visible outside of a firewalled environment for some pointers.

4.1. Using the Network View

The Network View provides an overview of the entire network fabric. The network topology is presented as an interactive display that can be moved, dragged or zoomed as required. The nodes are represented by blue circles and switches are represented by red circles. They are labeled by their respective LIDs. The interconnects are colored according to their current load as indicated in the legend.

4.1.1. Network Metrics

The ‘Network Metrics’ drop-down box lists a set of port counters available from the switch. By default, the total traffic on the link (Transmitted + Received Bytes) is shown. For the full list of supported counters, see List of Supported Network Metrics.

4.1.2. Live View

When Live View is selected, the display is refreshed every 30 seconds. This frequency can be changed with the runtime parameter OSU_INAM_FABRIC_QUERY_INTERVAL (see Runtime Parameters). The view can also be updated manually by right-clicking on a node or switch and selecting ‘Update Network’. To get a live view of a switch (red circle) or a node (blue circle), right-click on the corresponding circle and select ‘Open Switch Info’ or ‘Open Node Info’, respectively. This opens a new tab or window for that element.

4.1.3. Historical View

Looking at the past behavior of a network is often useful while investigating an issue. The Historical View shows the condition of the network from the ‘Start Time’ to the ‘End Time’. The Play/Pause button starts and stops the display. By default, the snapshots are shown in real time, but playback can be sped up to 2x, 4x, or 8x. The display can also be restarted by clicking the Rewind button.

By using the check-boxes under ‘Link Usage’, only the links with a certain range of traffic can be included in the view. For example, idle links can be excluded by unchecking the 0-5% check box. For metrics indicating errors, the links with or without that error can be selected.

4.1.5. Node Information

Right-clicking on a node presents a context menu. Selecting ‘Open Node Info’ will show detailed information about that node. If the node is running MVAPICH2-X, aggregate CPU usage and usage by each rank will be available.

4.1.6. Switch Information

Detailed information about a switch can be obtained by right-clicking on a switch followed by ‘Open Switch Info’. Clicking on a port will show the port counter information for that particular port.

4.1.7. Route Information

Multiple nodes can be selected on the display by CTRL+Clicking (CMD+Click for OS X) on them. Once multiple nodes are selected, right click followed by ‘Find Routes Between Selected Nodes’ will highlight the available routes between them.

Detailed information about the utilization of a link can be viewed by right-clicking on the link and selecting ‘Open Link Info’. For jobs using MVAPICH2-X, the data transferred via that link is shown. In addition, selecting a job id shows process-level link utilization.

To find the routes that use a given link, left-click on the link to select it, then right-click and select ‘Find routes going through this link’. This displays all the routes connecting hosts that use the selected link.

4.1.10. Using the Job Level View

OSU INAM can work with the resource manager to show information pertinent to a single job instead of the entire cluster. The Job Level View can be activated by selecting ‘Job Id’ in ‘Filter By’ and entering a job id. In this view, only the nodes and switches participating in that job are displayed. The features of the Network Level View (see Using the Network View), such as Live View, Historical View and Link Usage, are supported in the Job Level View as well.

In Historical View, the start and end time of the job are automatically populated. For a running job, the end time is populated to the current time. The user can select specific start and end time as well.

4.1.11. Using the Node View

For supported MPI libraries, OSU INAM can display process level CPU and network utilization information. This mode can be selected by choosing ‘Node Id’ and selecting one or more nodes from the list of nodes. If an MPI job is running on that node, OSU INAM will display aggregate or per core CPU usage. The list of MPI ranks is also shown, and each of the ranks can be selected to view their network usage over a period of time.

4.2. Using the Live Jobs Tab

The Live Jobs tab allows the user to see various selectable metrics (defined below) for all jobs using MVAPICH2-X 2.3b on a cluster.

4.2.1. CPU User Usage

This displays the aggregated CPU utilization (percentage) for a specific job.

4.2.2. Virtual Memory Usage

This displays the total virtual memory utilization (in bytes) for a specific job.

4.2.3. Total I/O

This displays the total I/O data read and written (in bytes) for a specific job.

4.2.4. Total Communication

This displays the sum of all inter and intra node communication performed by this job.

4.2.5. Total Intra Node Communication

This displays the number of bytes exchanged by the job between processes running on the same node.

4.2.6. Total Inter Node Communication

This displays the number of bytes exchanged by the job between processes running on different nodes.

4.2.7. Total Collective

This displays the number of bytes exchanged by processes (for the specific job) during collective communication (e.g. MPI_Bcast) only.

4.2.8. RMA Sent

This displays the number of bytes sent for one-sided communication (e.g. MPI_Put) by processes of a specific job.

4.2.9. Total Pt-to-pt

This displays the number of bytes sent and received by point to point operations (e.g. MPI_Send or MPI_Recv) for a specific job.

4.2.10. Inter-node Communication Buffers Allocated

This displays the number of buffers allocated for communication across nodes.

4.2.11. Inter-node Communication Buffers Used

This displays the number of buffers actually used for communication across nodes.
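Most of the job-level values above are sums over the processes of a job; Total Communication, for instance, is the sum of intra-node and inter-node traffic. The sketch below aggregates hypothetical per-process samples into a few of these metrics and derives a buffer usage ratio (buffers used divided by buffers allocated). The ProcessSample fields are illustrative assumptions, not the schema OSU INAM stores.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """Hypothetical per-process counters for one reporting interval."""
    intra_node_bytes: int
    inter_node_bytes: int
    coll_bytes: int
    buffers_allocated: int
    buffers_used: int


def job_summary(samples):
    """Aggregate per-process samples into job-level, Live Jobs style metrics."""
    intra = sum(s.intra_node_bytes for s in samples)
    inter = sum(s.inter_node_bytes for s in samples)
    allocated = sum(s.buffers_allocated for s in samples)
    used = sum(s.buffers_used for s in samples)
    return {
        "total_intra_node": intra,
        "total_inter_node": inter,
        # Total Communication is the sum of intra- and inter-node traffic.
        "total_communication": intra + inter,
        "total_collective": sum(s.coll_bytes for s in samples),
        "buffers_allocated": allocated,
        "buffers_used": used,
        # Fraction of allocated inter-node buffers actually used (0 if none allocated).
        "buffer_usage_ratio": used / allocated if allocated else 0.0,
    }
```

A persistently low usage ratio would suggest that many allocated inter-node buffers sit idle for that job.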

5. Using MVAPICH2-X INAM

5.1. Running Example

Note that this requires the appropriate version of the MVAPICH2-X RPM, built with support for advanced features. This section details how to enable MVAPICH2-X to work in conjunction with OSU INAM.

Please note that MVAPICH2-X must be launched with support for on-demand connection management when running in conjunction with OSU INAM. One can achieve this by setting the MV2_ON_DEMAND_THRESHOLD environment variable to a value less than the number of processes in the job.

The command below launches ./test on nodes n0 and n1 with two processes per node, with support for sending process- and node-level information to the OSU INAM daemon.

MVAPICH2 Running Example
$ mpirun_rsh -rsh -np 4 n0 n0 n1 n1 MV2_ON_DEMAND_THRESHOLD=1 \
      MV2_TOOL_INFO_FILE_PATH=/opt/inam/.mv2-tool-mvapich2.conf ./test
$ cat /opt/inam/.mv2-tool-mvapich2.conf
MV2_TOOL_QPN=473             #UD QPN at which OSU INAM is listening.
MV2_TOOL_LID=208             #LID at which OSU INAM is listening.
MV2_TOOL_COUNTER_INTERVAL=30 #The interval at which MVAPICH2-X should
                             #report node, job and process level information.
MV2_TOOL_REPORT_CPU_UTIL=1   #Specifies whether MVAPICH2-X should report
                             #process level CPU utilization information.
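The same launch can be scripted, which makes it harder to forget the requirement that MV2_ON_DEMAND_THRESHOLD be smaller than the number of processes. A minimal Python sketch; build_mpirun_rsh is a hypothetical helper that only assembles the argument list shown in the example above.

```python
def build_mpirun_rsh(hosts, procs_per_node, program, tool_conf_path,
                     on_demand_threshold=1):
    """Build an mpirun_rsh argument list for a job monitored by OSU INAM."""
    # Repeat each host once per process, e.g. n0 n0 n1 n1 for 2 processes per node.
    host_list = [h for h in hosts for _ in range(procs_per_node)]
    nprocs = len(host_list)
    # On-demand connection management is required with OSU INAM: the
    # threshold must be smaller than the number of processes in the job.
    if on_demand_threshold >= nprocs:
        raise ValueError(
            f"MV2_ON_DEMAND_THRESHOLD={on_demand_threshold} must be < {nprocs}")
    return ["mpirun_rsh", "-rsh", "-np", str(nprocs)] + host_list + [
        f"MV2_ON_DEMAND_THRESHOLD={on_demand_threshold}",
        f"MV2_TOOL_INFO_FILE_PATH={tool_conf_path}",
        program,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a node where mpirun_rsh is on the PATH.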

6. Runtime Parameters

All runtime parameters supported by OSU INAM v0.9.3 are listed below. They can all be set in the OSU INAM daemon configuration file (osu-inamd.conf). If you change any of these values, restart the daemon for the change to take effect.

6.1. General Parameters

6.1.1. OSU_INAM_FABRIC_QUERY_INTERVAL

  • Class: Run time

  • Default: 3600 seconds

  • Specifies the interval in seconds at which OSU INAM should query the fabric to identify changes in the state of switches, nodes, links and routes.

6.1.2. OSU_INAM_PERF_COUNTER_QUERY_INTERVAL

  • Class: Run time

  • Default: 30 seconds

  • Specifies the interval in seconds at which OSU INAM should query the switches to obtain counter information.

6.2. MVAPICH2-X Specific Parameters

6.2.1. OSU_INAM_PROC_COUNTER_QUERY_INTERVAL

  • Class: Run time

  • Default: 30 seconds

  • Specifies the interval in seconds at which MVAPICH2-X should report node, job and process level information.

6.2.2. OSU_INAM_TOOL_REPORT_CPU_UTIL

  • Class: Run time

  • Default: 1

  • Specifies whether MVAPICH2-X should report process level CPU utilization information.

6.2.3. OSU_INAM_TOOL_REPORT_MEM_UTIL

  • Class: Run time

  • Default: 1

  • Specifies whether MVAPICH2-X should report process level memory utilization information.

6.2.4. OSU_INAM_TOOL_REPORT_IO_UTIL

  • Class: Run time

  • Default: 1

  • Specifies whether MVAPICH2-X should report process level IO information.

6.2.5. OSU_INAM_TOOL_REPORT_COMM_GRID

  • Class: Run time

  • Default: 1

  • Specifies whether MVAPICH2-X should report process communication grid information.

6.3. OSU INAM Database Configuration Parameters

6.3.1. OSU_INAM_DATABASE_HOST

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the name of the host where the MySQL database daemon is running.

6.3.2. OSU_INAM_DATABASE_PORT

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the port on OSU_INAM_DATABASE_HOST at which the MySQL database daemon is listening for incoming connections.

6.3.3. OSU_INAM_DATABASE_NAME

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the name of the MySQL database that OSU INAM should use to store data.

6.3.4. OSU_INAM_DATABASE_USER

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the name of the user who has privileges to insert data into the MySQL database named OSU_INAM_DATABASE_NAME.

6.3.5. OSU_INAM_DATABASE_PASSWD

  • Class: Run time

  • Default: Unset (Must be set by user)

  • Specifies the password associated with user id OSU_INAM_DATABASE_USER.
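The web application reads its database settings from the osuinam.datasource.* properties in osu-inam.properties, while the daemon uses the OSU_INAM_DATABASE_* parameters above, so the two must agree. A minimal Python sketch of a consistency check, assuming both files use simple KEY=VALUE lines and that the JDBC URL has the form jdbc:mysql://host:port/database with an explicit port. It compares strings literally, so localhost and 127.0.0.1 are reported as different, and it ignores the escapes and ':' separators that full Java .properties syntax allows. The function names are hypothetical.

```python
import re


def parse_kv(text):
    """Parse simple KEY=VALUE lines (.properties or .conf), skipping # comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def db_mismatches(properties_text, inamd_conf_text):
    """List differences between the web application and daemon database settings."""
    props = parse_kv(properties_text)
    conf = parse_kv(inamd_conf_text)
    url = props.get("osuinam.datasource.url", "")
    match = re.match(r"jdbc:mysql://([^:/]+):(\d+)/([^?]+)", url)
    if not match:
        return [f"unrecognized JDBC URL: {url!r}"]
    web = {
        "OSU_INAM_DATABASE_HOST": match.group(1),
        "OSU_INAM_DATABASE_PORT": match.group(2),
        "OSU_INAM_DATABASE_NAME": match.group(3),
        "OSU_INAM_DATABASE_USER": props.get("osuinam.datasource.username"),
        "OSU_INAM_DATABASE_PASSWD": props.get("osuinam.datasource.password"),
    }
    problems = []
    for key, value in web.items():
        if conf.get(key) != value:
            if key == "OSU_INAM_DATABASE_PASSWD":
                problems.append(f"{key} differs")  # never print passwords
            else:
                problems.append(f"{key}: daemon={conf.get(key)!r} web={value!r}")
    return problems
```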

6.3.6. OSU_INAM_DATA_RETENTION_PERIOD

  • Class: Run time

  • Default: 7 days

  • Specifies the number of days for which profiling data should be kept in the MySQL database.

6.3.7. OSU_INAM_PURGE_QUERY_INTERVAL

  • Class: Run time

  • Default: 3600 seconds

  • Specifies the interval between two purge queries used to delete profiling information from the database.

6.3.8. OSU_INAM_DATABASE_BULK_ACTIVE

  • Class: Run time

  • Default: 1

  • Specifies whether the entries should be inserted in a bulk manner.

6.3.9. OSU_INAM_DATABASE_BULK_SIZE

  • Class: Run time

  • Default: 100

  • Specifies the number of records inserted in a bulk insert.
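Bulk inserts reduce round trips to the database by sending many rows in a single multi-row INSERT statement instead of one statement per row. The sketch below illustrates the idea with Python's built-in sqlite3 module as a stand-in for MySQL, using a hypothetical port_counters table; it is not OSU INAM's implementation, and bulk_size plays the role of OSU_INAM_DATABASE_BULK_SIZE.

```python
import sqlite3


def bulk_insert(conn, rows, bulk_size=100):
    """Insert (lid, port, xmit, rcv) rows with one multi-row INSERT per batch.

    Returns the number of INSERT statements issued.
    """
    statements = 0
    for start in range(0, len(rows), bulk_size):
        batch = rows[start:start + bulk_size]
        # One "(?, ?, ?, ?)" group per row; values are still bound as parameters.
        placeholders = ", ".join(["(?, ?, ?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(
            f"INSERT INTO port_counters (lid, port, xmit, rcv) VALUES {placeholders}",
            params)
        statements += 1
    conn.commit()
    return statements
```

With 250 rows and a bulk size of 100, this issues three INSERT statements instead of 250.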

6.3.10. OSU_INAM_ENABLE_SLURM

  • Class: Run time

  • Default: 0

  • Specifies whether SLURM should be used to obtain live job information. The sacct command is run on the system where the inamd daemon is running to retrieve job information.

6.3.11. OSU_INAM_SLURM_QUERY_INTERVAL

  • Class: Run time

  • Default: 30

  • Specifies how often job information is pulled from SLURM.

6.3.12. OSU_INAM_ENABLE_HCA_NODES

  • Class: Run time

  • Default: 0

  • Specifies whether port counters and port error data should be fetched from all host (HCA) nodes connected to the network instead of only from the switches.

6.3.13. OSU_INAM_SQUEUE_CMD_PATH

  • Class: Run time

  • Specifies the path to the directory that contains squeue command.

7. List of Supported Network Metrics

The Network Metrics supported by OSU INAM v0.9.3 are listed below. These metrics can be broadly divided into three sets. The descriptions for InfiniBand port and error counters have been obtained from the InfiniBand Specification Release 1.2.1 by the InfiniBand Trade Association.

7.1. Switch Counters

The following node-level counters are queried from the InfiniBand Switches:

  • Xmit Data

    • Total number of data octets, divided by 4, transmitted on all VLs from the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets.

  • Rcv Data

    • Total number of data octets, divided by 4, received on all VLs from the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets.

  • Max [Xmit Data/Rcv Data]

    • Maximum of the two values above
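Because Xmit Data and Rcv Data count octets divided by 4, a raw counter difference must be multiplied by 4 to get bytes. For example, if Xmit Data grows by 750 over a 30-second sampling interval (the default OSU_INAM_PERF_COUNTER_QUERY_INTERVAL), the port transmitted 3000 bytes, or 100 bytes/s. A minimal Python sketch of this arithmetic, assuming the counters were not reset and did not saturate between the two samples; the function names are hypothetical.

```python
def data_counter_to_bytes(counter_value):
    """Xmit/Rcv Data count octets divided by 4, so multiply by 4 to get bytes."""
    return counter_value * 4


def link_traffic(prev, curr, interval_s):
    """Compute link traffic from two (xmit_data, rcv_data) raw counter samples.

    The samples are assumed to be taken interval_s seconds apart, with no
    counter reset or saturation in between.
    """
    xmit_bytes = data_counter_to_bytes(curr[0] - prev[0])
    rcv_bytes = data_counter_to_bytes(curr[1] - prev[1])
    return {
        "xmit_Bps": xmit_bytes / interval_s,
        "rcv_Bps": rcv_bytes / interval_s,
        # Default Network View metric: transmitted + received bytes.
        "total_bytes": xmit_bytes + rcv_bytes,
        # Corresponds to Max [Xmit Data/Rcv Data].
        "max_bytes": max(xmit_bytes, rcv_bytes),
    }
```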

7.2. Process Level Counters

MVAPICH2-X collects additional information about the process’s network usage which can be displayed by OSU INAM. The following counters are currently supported:

  • Xmit Data

    • Total number of bytes transmitted as part of the MPI application

  • Rcv Data

    • Total number of bytes received as part of the MPI application

  • Max [Xmit Data/Rcv Data]

    • Maximum of the two values above

  • Point to Point Send

    • Total number of bytes transmitted as part of MPI point-to-point operations

  • Point to Point Rcvd

    • Total number of bytes received as part of MPI point-to-point operations

  • Max [Point to Point Sent/Rcvd]

    • Maximum of the two values above

  • Coll Bytes Sent

    • Total number of bytes transmitted as part of MPI collective operations

  • Coll Bytes Rcvd

    • Total number of bytes received as part of MPI collective operations

  • Max [Coll Bytes Sent/Rcvd]

    • Maximum of the two values above

  • RMA Bytes Sent

    • Total number of bytes transmitted as part of MPI RMA operations. Note that due to the nature of RMA operations, bytes received for RMA operations cannot be counted.

  • RC VBUF

    • The number of internal communication buffers used for reliable connection (RC)

  • UD VBUF

    • The number of internal communication buffers used for unreliable datagram (UD)

  • VM Size

    • Total number of bytes used by the program for its virtual memory

  • VM Peak

    • Peak size (in bytes) of the program's virtual memory

  • VM RSS

    • The number of bytes resident in the memory (Resident set size)

  • VM HWM

    • The maximum number of bytes that have been resident in memory at any point (peak resident set size, or high water mark)
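On Linux, per-process values with the same meanings are exposed in /proc/<pid>/status as VmSize, VmPeak, VmRSS and VmHWM, which makes the definitions easy to check: VmHWM is the peak of VmRSS, just as VmPeak is the peak of VmSize. The Python sketch below reads them from there purely to illustrate; it is not necessarily how MVAPICH2-X collects these counters, and the function names are hypothetical.

```python
from pathlib import Path

# Field names in /proc/<pid>/status (Linux) that match the counters above.
VM_FIELDS = ("VmSize", "VmPeak", "VmRSS", "VmHWM")


def parse_vm_status(text):
    """Return VM Size/Peak/RSS/HWM in bytes from /proc/<pid>/status text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in VM_FIELDS:
            amount, unit = rest.split()
            # The kernel reports these fields in kB (units of 1024 bytes).
            if unit != "kB":
                raise ValueError(f"unexpected unit {unit!r} for {name}")
            values[name] = int(amount) * 1024
    return values


def vm_usage(pid="self"):
    """Read the memory counters of a process (Linux only)."""
    return parse_vm_status(Path(f"/proc/{pid}/status").read_text())
```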

7.3. Error Counters

The following error counters are available both at switch and process level:

  • SymbolErrors

    • Total number of minor link errors detected on one or more physical lanes

  • LinkRecovers

    • Total number of times the Port Training state machine has successfully completed the link error recovery process

  • LinkDowned

    • Total number of times the Port Training state machine has failed the link error recovery process and downed the link

  • RcvErrors

    • Total number of packets containing an error that were received on the port. These errors include:

      • Local physical errors

      • Malformed data packet errors

      • Malformed link packet errors

      • Packets discarded due to buffer overrun

  • RcvRemotePhysErrors

    • Total number of packets marked with the EBP delimiter received on the port.

  • RcvSwitchRelayErrors

    • Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay

  • XmtDiscards

    • Total number of outbound packets discarded by the port because the port is down or congested. Reasons for this include:

      • Output port is not in the active state

      • Packet length exceeded NeighborMTU

      • Switch Lifetime Limit exceeded

      • Switch HOQ Lifetime Limit exceeded

      This count may also include packets discarded while in the VLStalled state.

  • XmtConstraintErrors

    • Total number of packets not transmitted from the switch physical port for the following reasons:

      • FilterRawOutbound is true and packet is raw

      • PartitionEnforcementOutbound is true and packet fails partition key check or IP version check

  • RcvConstraintErrors

    • Total number of packets not received from the switch physical port for the following reasons:

      • FilterRawInbound is true and packet is raw

      • PartitionEnforcementInbound is true and packet fails partition key check or IP version check

  • LinkIntegrityErrors

    • The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors

  • ExcBufOverrunErrors

    • The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error

  • VL15Dropped

    • Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port
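Error counters are cumulative, so what usually matters is whether they increased between two samples, which is also what selecting links with or without a given error in the Historical View is about. A minimal Python sketch that reports such increases per port, assuming snapshots keyed by hypothetical (LID, port) tuples; it treats a value lower than the previous one as a counter reset, which is an assumption about how your data is collected.

```python
def error_deltas(before, after):
    """Report error counters that increased between two snapshots.

    before/after map (lid, port) -> {counter_name: value}. A value lower than
    in the earlier snapshot is assumed to mean the counter was reset, in which
    case the new value itself is taken as the increase.
    """
    report = {}
    for key, counters in after.items():
        old = before.get(key, {})
        increased = {}
        for name, value in counters.items():
            prev = old.get(name, 0)
            delta = value - prev if value >= prev else value
            if delta > 0:
                increased[name] = delta
        if increased:
            report[key] = increased
    return report
```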

8. Advanced Usage Instructions

8.1. Making OSU INAM visible outside of a firewalled environment

The following snippets should work in basic scenarios where the OSU INAM server is sitting behind a firewalled or NAT’d environment. Please exercise caution as this could expose the server to larger, less secure networks or otherwise upset your network administrators.

Iptables
-A PREROUTING -p tcp -d <external ip> --dport 8080 -j DNAT --to <tomcat server>:8080
-A POSTROUTING -p tcp -d <tomcat server> --dport 8080 -j SNAT --to-source <external ip>
Apache
ProxyPass /inam/ http://<tomcat server>:8080/
ProxyPassReverse /inam/ http://<tomcat server>:8080/
Nginx
server {
    listen 8080 default_server;
    server_name X;

    location /inam {
        rewrite ^/inam(.*)$ $1 break;
        proxy_pass http://<tomcat server>:8080;
    }
}

8.2. Speed up the network map rendering by using PhantomJS

PhantomJS is a headless WebKit browser that allows OSU INAM to pre-render the network graph so that it loads much more quickly. Only minimal modifications are needed to enable this.

8.2.1. Required Packages

Add the following parameters to your osu-inam.properties file

/etc/osu-inam.properties
#phantomjs
#execdir is the path you placed the phantomjs bin
phantomjs.execdir=/path/to/phantomjs/bin/
#runjs should be the explicit path to the inam.js that is provided in the root of the download tarball
phantomjs.runjs=/path/to/inam.js
#filedir is the location of the phantomjs output for the pre-rendering
phantomjs.filedir=/path/to/phantomjs/working/dir
#cachefile is the location of the file to cache the final phantomjs
#output. On next restart the web application would use the cached data and not
#perform the rendering
phantomjs.cachefile=/path/to/cachefile

Be sure to make the PhantomJS binary executable, the runjs file readable, and the filedir writable by your webserver. Place the vis.js from the root of the tarball in the same directory as inam.js.
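These permission requirements (and the common mistakes listed in FAQ 10.1.4) can be checked from a script. A minimal Python sketch, assuming the PhantomJS binary is named phantomjs inside phantomjs.execdir; phantomjs_problems is a hypothetical helper. Run it as the user that runs the web application, because os.access() checks the permissions of the calling process.

```python
import os
from pathlib import Path


def phantomjs_problems(execdir, runjs, filedir):
    """Check the phantomjs.* settings from osu-inam.properties for common mistakes."""
    problems = []
    # Assumption: the executable inside execdir is called "phantomjs".
    binary = Path(execdir) / "phantomjs"
    if not os.access(binary, os.X_OK):
        problems.append(f"{binary} is missing or not executable")
    runjs = Path(runjs)
    if not os.access(runjs, os.R_OK):
        problems.append(f"{runjs} is missing or not readable")
    # vis.js must sit next to inam.js.
    if not (runjs.parent / "vis.js").is_file():
        problems.append(f"vis.js is not in the same directory as {runjs.name}")
    if not os.access(filedir, os.R_OK | os.W_OK):
        problems.append(f"{filedir} is not readable and writable")
    return problems
```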

After the positions are calculated by PhantomJS, the cachefile will be generated by the web application.

Once finished, restart the webserver to pull in the new settings and on the next visit to /network/, the view should be rendered nearly instantly.

PhantomJS renders the network graph while the web application is being deployed, which can increase the deployment time. However, this is a ONE TIME COST: on subsequent deployments, the web application loads the network information from the cache file. The time PhantomJS takes to render the network the first time depends on the complexity of the network and the number of nodes.

Projected ONE TIME Web Application Deployment Time with PhantomJS

These estimates are based on testing with PhantomJS 2.0.0 on a dual socket Intel E5630 with 12GB of memory.

Number of Nodes | Number of Switches | Network Topology | Approximate Time
178             | 20                 | Full Fat-Tree    | 1 min
1879            | 212                | Hybrid Fat-Tree  | 30 mins

9. Best Practices with OSU INAM

9.1. Deployment Recommendations

Due to the multithreaded design of the OSU INAM daemon, for large clusters consisting of thousands of nodes and hundreds of switches, we recommend dedicating 4 cores on a node to the daemon and one core to the database daemon processes. For smaller clusters with fewer than 500 nodes, the daemon can run on a non-dedicated node (for instance, a head node or login node).

Based on our experience and feedback we have received from our users, here we include some of the best practices for deploying OSU INAM. If you have any of your own best practices related to OSU INAM, please feel free to contact us by sending an email to mvapich-help@cse.ohio-state.edu

9.2. MySQL Tuning Parameters

For the database, the following parameters can be tuned for better performance at different cluster sizes:

MySQL Tuning Parameter         | Significance
innodb_flush_log_at_trx_commit | Controls the balance between strict ACID compliance for commit operations and the higher performance possible when commit-related I/O operations are rearranged and done in batches
innodb_buffer_pool_size        | The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data
innodb_log_buffer_size         | The size in bytes of the buffer that InnoDB uses to write to the log files on disk
innodb_log_file_size           | The size in bytes of each log file in a log group

9.2.1. Additional Steps Required Before Changing Number or Size of InnoDB Redo Log Files

  • Set innodb_fast_shutdown to 1

mysql> SET GLOBAL innodb_fast_shutdown = 1;
  • Stop MySQL server and ensure it finalizes without errors

  • Backup old log files if desired to enable restoring state

  • Delete old log files

  • Edit my.cnf file and add the lines listed below depending on your cluster size

  • Start MySQL server

9.2.2. Proposed Additions to OSU INAM and MySQL Configuration File for Clusters of Different Sizes

We list some recommended values to be set in my.cnf file for clusters of different sizes.

Additions to my.cnf file for small clusters (<100 nodes)
innodb_flush_log_at_trx_commit=2
Additions to my.cnf file for medium sized clusters (100-500 nodes)
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=4G
innodb_log_buffer_size=16M
innodb_log_file_size=256M
Additions to my.cnf file for large clusters (>500 nodes)
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=16G
innodb_log_buffer_size=32M
innodb_log_file_size=512M
Additions to osu-inamd.conf file for all cluster sizes
#The number of records to be inserted together during bulk insert.
OSU_INAM_DATABASE_BULK_SIZE=100
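The recommendations above can be selected by node count with a small helper. A minimal Python sketch that encodes the three size classes (fewer than 100 nodes, 100 to 500 nodes, more than 500 nodes) and renders the lines under a [mysqld] group, which is where InnoDB server options belong in my.cnf; the names are hypothetical. Remember that changing innodb_log_file_size requires the additional steps in 9.2.1.

```python
# Recommended InnoDB settings from the lists above, keyed by cluster size class.
RECOMMENDED = {
    "small": {"innodb_flush_log_at_trx_commit": "2"},
    "medium": {
        "innodb_flush_log_at_trx_commit": "2",
        "innodb_buffer_pool_size": "4G",
        "innodb_log_buffer_size": "16M",
        "innodb_log_file_size": "256M",
    },
    "large": {
        "innodb_flush_log_at_trx_commit": "2",
        "innodb_buffer_pool_size": "16G",
        "innodb_log_buffer_size": "32M",
        "innodb_log_file_size": "512M",
    },
}


def cluster_tier(num_nodes):
    """Map a node count to the size classes used above."""
    if num_nodes < 100:
        return "small"
    if num_nodes <= 500:
        return "medium"
    return "large"


def mycnf_snippet(num_nodes):
    """Render the recommended my.cnf lines under the [mysqld] group."""
    settings = RECOMMENDED[cluster_tier(num_nodes)]
    lines = ["[mysqld]"] + [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"
```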

10. FAQ and Troubleshooting with OSU INAM

Based on our experience and feedback we have received from our users, here we include some of the problems a user may experience and the steps to resolve them. If you are experiencing any other problem, please feel free to contact us by sending an email to mvapich-help@cse.ohio-state.edu

10.1. General Questions and Troubleshooting

10.1.1. Install OSU INAM to a specific location

OSU INAM RPMs are relocatable. Use the --prefix option during RPM installation to install OSU INAM into a specific location. An example is shown below:

rpm -ivh --prefix <specific-location> osu-inam-0.9.3.el6.x86_64.rpm

10.1.2. Where can I find the log messages generated by OSU INAM?

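The OSU INAM daemon pushes all the log messages it generates to ‘/var/log/messages’. The web application writes its log to the file set by logging.file in osu-inam.properties (/var/log/osu-inam.log in the sample configuration).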

10.1.3. Why is the webserver taking a long time to load?

OSU INAM uses PhantomJS for caching the rendered network graph with the aim of speeding up subsequent deployments. This caching happens when the web application is deployed for the first time. Please refer to Speed up the network map rendering by using PhantomJS for more details.

10.1.4. I have installed PhantomJS, but my webpage is still rendering very slowly

Here we list some possible reasons why the webpage can take longer than expected to render even though PhantomJS has been installed correctly.

  • Incorrect permissions to the directories

    • The user running the web app should be able to write to and read from the directory pointed by phantomjs.filedir

  • Using incorrect inam.js file

    • The phantomjs.runjs variable in /etc/osu-inam.properties file should point to the inam.js file included in the tarball

  • vis.js and inam.js not present in the same directory

    • The vis.js file and the inam.js file should be in the same directory

Please refer to the Speed up the network map rendering by using PhantomJS section for more details on how to correctly setup PhantomJS for use with OSU INAM.

10.1.5. Does OSU INAM support any other job scheduler besides SLURM?

At present, OSU INAM only supports SLURM. We plan to add support for other job schedulers, such as PBS/Torque, in the future.

10.1.6. Will OSU INAM work without a supported scheduler?

OSU INAM has been designed so that features that do not depend on the job scheduler (e.g., viewing network counters) work even without a supported job scheduler.

10.1.7. In what order should I start and stop OSU INAM and related services?

Please use the following order when starting OSU INAM and related services:

  • Create the database

  • Start up the OSU INAM daemon

  • Once the nodes and links tables are populated by the OSU INAM daemon, deploy the web application

Please use the following order when stopping OSU INAM and related services:

  • Stop the web application

  • Stop the OSU INAM daemon

  • Destroy the database

    • This step is only required if you do not want to use OSU INAM again; otherwise, you can skip it.

10.1.8. How can I control the size of the database?

OSU INAM can automatically purge data older than a user-defined period from the database. The parameter OSU_INAM_DATA_RETENTION_PERIOD controls this period and can be set to any desired value. By default, it is set to seven days; you can reduce it to a lower value, such as one day.

Another parameter, OSU_INAM_PURGE_QUERY_INTERVAL, tells the daemon how frequently to check for older data. Its default value is 3600 seconds, and it can be modified as well.

Once you’ve changed the value, please restart the daemon so that it takes effect.
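The two parameters interact. Assuming each purge deletes everything older than the retention period at the moment it runs, a row that crosses that limit just after a purge survives until the next one, so data can be up to OSU_INAM_DATA_RETENTION_PERIOD plus OSU_INAM_PURGE_QUERY_INTERVAL old; with the defaults, 7 days plus 1 hour. A short Python sketch of this arithmetic, with hypothetical helper names.

```python
from datetime import datetime, timedelta


def purge_cutoff(now, retention_days=7):
    """Rows older than this timestamp are eligible for deletion at time `now`."""
    return now - timedelta(days=retention_days)


def max_data_age(retention_days=7, purge_interval_s=3600):
    """Worst-case age of data still present in the database.

    A row that crosses the retention limit just after a purge survives
    until the next purge, one purge interval later.
    """
    return timedelta(days=retention_days, seconds=purge_interval_s)
```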

10.1.9. Why does INAMD not exit after being shut down?

INAMD checks for exit signals at fixed intervals specified by OSU_INAM_PERF_COUNTER_QUERY_INTERVAL (default value: 30 seconds). Thus, a shutdown command may not take effect immediately.

10.1.10. I have errors on different pages with MySQL incompatibility with sql_mode=only_full_group_by.

If you get error messages such as "Expression #N of SELECT list is not in GROUP BY … this is incompatible with sql_mode=only_full_group_by" on any of the web pages, the MySQL server is configured to reject GROUP BY queries that have non-aggregated columns in the select list. INAM requires the MySQL server to be configured without this mode (ONLY_FULL_GROUP_BY). More information about changing the SQL mode in MySQL 5.7 can be found in the Server SQL Modes section of the MySQL 5.7 Reference Manual.
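One way to do this is to read the current mode with SELECT @@GLOBAL.sql_mode;, remove ONLY_FULL_GROUP_BY from the comma-separated list, and write the rest back with SET GLOBAL sql_mode. The Python sketch below only performs the string manipulation; the helper names are hypothetical. Note that SET GLOBAL affects new connections and is lost when the server restarts, so to make the change permanent, also set sql_mode under [mysqld] in my.cnf.

```python
def without_only_full_group_by(sql_mode):
    """Return sql_mode with ONLY_FULL_GROUP_BY removed, keeping all other modes."""
    modes = [m.strip() for m in sql_mode.split(",") if m.strip()]
    return ",".join(m for m in modes if m.upper() != "ONLY_FULL_GROUP_BY")


def set_global_statement(sql_mode):
    """Build the SET GLOBAL statement that applies the reduced mode."""
    return f"SET GLOBAL sql_mode = '{without_only_full_group_by(sql_mode)}';"
```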