1. Overview of the OSU INAM Project

As the scale and complexity of InfiniBand (IB)-based High Performance Computing (HPC) installations continue to expand, accurately predicting the operational dynamics of the IB network, encompassing aspects such as link usage and performance, presents a complex challenge. Simultaneously, with computing and networking technologies within HPC platforms continuously evolving, it becomes increasingly vital to have an in-depth understanding of the intricate interactions between high-performance HPC middleware infrastructures and the high-performance communication fabric they rely upon.

In response to these multifaceted challenges, the Nowlab team at Ohio State University (OSU) has developed the InfiniBand Network Analysis and Monitoring (INAM) tool. OSU INAM is designed to monitor IB clusters in real-time, interrogating a variety of subnet management entities within the network. However, OSU INAM extends beyond the scope of a mere monitoring tool. It’s equipped with the capability to interface with the MVAPICH2-X software stack, which allows it to derive insights into the MPI communication patterns of an application. This enhanced level of integration empowers OSU INAM to facilitate a more detailed classification of data transfers, segregating them into Point-to-Point, Collective, and Remote Memory Access (RMA) communication types.

Augmenting these core functionalities, OSU INAM also offers a suite of additional capabilities. It can remotely monitor CPU utilization by MPI processes, record Lustre I/O traffic, and generate reports on MPI_T Performance Variables (PVARs) for MPI primitives in tandem with MVAPICH2-X.

To guide users through the process of deploying and utilizing this tool, we provide this comprehensive document containing all the information needed to successfully download, install, test, use, optimize, and troubleshoot OSU INAM version 1.1. We constantly strive for improvement, routinely fixing bugs and updating this document based on user feedback. Therefore, we strongly advise users to regularly visit our web page for updates, and we welcome all users to freely share their feedback and experiences with us.

2. Features

OSU INAM supports profiling InfiniBand network traffic. It can also introspect the communication patterns of pure MPI programs and MPI+OpenMP programs built with MVAPICH2-X 2.3. The high-level features of OSU INAM v1.1 are listed below.

2.1. Performance and Scalability Features

  • Capability to analyze and profile network-level activities with many parameters (data and errors) at user-specified granularity

  • Significant enhancements to the user interface to enable scaling to clusters with thousands of nodes

  • Ability to gather InfiniBand performance counters at sub-second granularity for very large (>20,000 nodes) clusters

  • Enhanced performance for fabric discovery using optimized OpenMP-based multi-threaded designs

  • Support for MySQL, ClickHouse and InfluxDB as database backends

  • Enhanced database insertion and querying using the ClickHouse database

  • Support for continuous queries to improve visualization performance

  • Enhanced fault tolerance for database operations

  • Support for data loading progress bars on the UI for all charts

  • Enhanced the UI APIs by making asynchronous calls for data loading

  • Improved database insert times by using bulk inserts

  • Improved database purging times by using bulk deletes

  • Improved network load time by clustering individual nodes

  • Improved debugging support by introducing several debugging levels

  • Support for SLURM multi-cluster configuration

  • Capability to look up the list of nodes communicating through a network link

  • Capability to visualize the data transfer happening in a ‘live’ fashion - Live View for

    • Entire Network - Live Network Level View

    • Job level - Live Job Level View

    • One or multiple Nodes - Live Node Level View

  • Capability to visualize data transfer that happened in the network during a given time window in the past - Historical View for

    • Entire Network - Historical Network Level View

    • Job level - Historical Job Level View

    • One or multiple Nodes - Historical Node Level View

2.2. MVAPICH2-X Specific Features

  • Capability to analyze and profile node-level, job-level, and process-level activities for MPI communication (Point-to-Point, Collectives, and RMA) at user-specified granularity

  • Capability to profile and report the following parameters of MPI processes at node-level, job-level, and process-level at user-specified granularity

    • CPU Utilization

    • Memory Utilization

    • Inter-node communication buffer usage for RC transport

    • Inter-node communication buffer usage for UD transport

  • Capability to profile and report the process-to-node communication matrix for MPI processes at user-specified granularity

  • Capability to visualize utilization of a given network link in a live fashion - Live View for

    • Data transferred via a link at Job Level

    • Data transferred via a link at Process Level

  • Support for "Job Page" using data pushed by MVAPICH2-X if SLURM is not enabled

2.3. MVAPICH2-X + Job Scheduler Specific Features

  • Support for "Job Page" to display jobs in ascending/descending order of various performance metrics using SLURM’s sacct command and PBS’s qstat command from multiple batch servers.

3. Download and Installation Instructions

The OSU INAM package can be downloaded from http://mvapich.cse.ohio-state.edu/tools/osu-inam/. Select the link for your desired distribution. All OSU INAM RPMs are relocatable. OSU INAM has three components: the OSU INAM daemon (osu-inamd), a database component (MySQL, ClickHouse, or InfluxDB), and the OSU INAM web front end (osuinamweb).

The OSU INAM daemon is responsible for gathering data remotely from all nodes and switches across the cluster. The daemon runs on only one node of the cluster. It is also responsible for inserting and purging the data in the storage backend. When using the SLURM job scheduler, the daemon is responsible for querying and inserting SLURM job information.

The database component (MySQL, ClickHouse, or InfluxDB) contains the tables for the gathered data so that osuinamweb can access and read it. The tables include InfiniBand port data counters and errors, a jobs table, MPI process information and the communication grid, MPI_T performance variables (PVARs), and Lustre I/O stats.

The OSU INAM web front end, osuinamweb, is the web UI that presents the metrics in a user-friendly manner. When using the PBS job scheduler, osuinamweb is responsible for gathering and inserting the job information. PhantomJS is used for accelerated rendering of the network topology and link utilization. If PhantomJS is not used, its configuration entries should be commented out in the osuinamweb configuration file.
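As a sketch of disabling PhantomJS (property names taken from the sample osu-inam.properties shown later in this document), the relevant entries would simply be commented out:

```
####################### PHANTOMJS PARAMETERS #######################
#phantomjs.execdir=
#phantomjs.runjs=
#phantomjs.filedir=
#phantomjs.cachefile=
```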

  • OSU INAM Daemon configuration file is located at /etc/osu-inam/osu-inamd.conf.

  • OSU INAM Web configuration file is located at /etc/osu-inam/osu-inam.properties.

  • If MySQL is used, the configuration file by default is located at /etc/my.cnf.

  • If InfluxDB is used, the configuration file by default is located at /etc/influxdb/influxdb.conf.

For OSU INAM to be able to track and report per-job metrics, accounting support must be enabled for SLURM or PBS/TORQUE. OSU INAM uses sacct directly, so no database credentials for SLURM are needed.
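Since per-job metrics depend on sacct being reachable, a quick sanity check before starting the daemon is to confirm the command exists on the node that will run osu-inamd. This is a minimal sketch, not part of OSU INAM itself:

```shell
# Check whether SLURM accounting (sacct) is available on this node.
# Prints a status line either way; does not modify anything.
if command -v sacct >/dev/null 2>&1; then
  status="sacct available"
else
  status="sacct not found - enable SLURM accounting support first"
fi
echo "$status"
```

If sacct is missing, the per-job pages in OSU INAM will stay empty even though network-level monitoring still works.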

Based on the storage selection, the order of setting up the tool is as follows:

  • MySQL or ClickHouse as database:

    • Install the dependencies mentioned below

    • Start MySQL

    • Create tables and users for MySQL

    • Update access for the new user

    • Start osu-inamd and then osuinamweb

  • InfluxDB as database:

    • Install the dependencies mentioned below and InfluxDB

    • Start InfluxDB

    • Create users

    • Provide access for the new user

    • Start osu-inamd and finally osuinamweb

Note
OSU INAM configuration files have moved to /etc/osu-inam. Please move your configurations, or re-apply any customizations you made to your configs, in the new location.

If you do not have root permission, you can use rpm2cpio to extract the library.

Use rpm2cpio to extract the library
$ rpm2cpio <osu-inam-rpm-name>.rpm | cpio -id
Tip
If you are using a Debian-based system such as Ubuntu, you can convert the rpm to a deb using a tool such as alien, or follow the rpm2cpio instructions above.

3.1. RHEL/CentOS packages

The packages to install depend on whether MySQL, ClickHouse, or InfluxDB is used. Users should download the RPM matching their storage selection to avoid installing unnecessary dependencies.

3.1.1. The following packages are required to get the OSU INAM tool working

3.1.2. Instructions for using MySQL as database

yum install -y  mariadb-server mariadb-devel java-1.8.0-openjdk
Installation Instructions for INAM with MySQL
systemctl start mariadb

export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam

# Setup DB
mysql -uroot

CREATE DATABASE osuinamdb;

CREATE USER 'osuinamuser'@'localhost' IDENTIFIED BY 'osuinampassword';

GRANT ALL PRIVILEGES ON osuinamdb.* TO 'osuinamuser'@'localhost';

FLUSH PRIVILEGES;

exit

# Web server and daemon install steps
rpm -Uvh osu-inam-mysql-1.1-1.el7.x86_64.rpm

# Start the daemons (all prior steps need to have been run successfully)
systemctl enable --now osu-inamd

# Allow 10 seconds before you start web front
# For clusters > ~2000 nodes this may initially take up to 20 mins
sleep 10

systemctl enable --now osu-inamweb
Note
If you are using PhantomJS, the first initialization of osuinamweb will take longer as it creates the cache files.

3.1.3. Instructions for using ClickHouse as database

This guide will help you install and configure ClickHouse as the database backend for INAM. ClickHouse is an open-source, column-oriented SQL database management system known for high performance. To use ClickHouse, you will need to install both the ClickHouse server and the ClickHouse-C++ client.

Important
If you’re intending to use ClickHouse, you can use the same settings as MySQL. All you need to do is set dbtype to clickhouse and update the osuinam.datasource.url to reflect the host and port 9004. For instance: jdbc:mysql://localhost:9004/osuinamdb.
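Putting the note above together, the relevant entries in osu-inam.properties would look roughly like the following (values assume the osuinamdb/osuinamuser setup used elsewhere in this guide):

```
osuinam.dbtype=clickhouse
# ClickHouse's MySQL-compatibility port (9004) via the MySQL JDBC driver
osuinam.datasource.url=jdbc:mysql://localhost:9004/osuinamdb
osuinam.datasource.username=osuinamuser
osuinam.datasource.password=osuinampassword
```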
Step 1: Install ClickHouse Server
# Install ClickHouse server
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
sudo yum install -y clickhouse-server clickhouse-client
#or dnf install clickhouse-server clickhouse-client

# Configure ClickHouse server
sudo sed -i 's/<!-- <access_management>1<\/access_management> -->/<access_management>1<\/access_management>/' /etc/clickhouse-server/users.xml
# Only do this once
sudo sed -i '/<default>/a \        <prefer_column_name_to_alias>1<\/prefer_column_name_to_alias>' /etc/clickhouse-server/users.xml
sudo sed -i 's/<keep_alive_timeout>3<\/keep_alive_timeout>/<keep_alive_timeout>30<\/keep_alive_timeout>/g' /etc/clickhouse-server/config.xml

# Restart ClickHouse to apply changes
sudo systemctl start clickhouse-server


# If ClickHouse fails to restart, do this step
# Grant permissions to ClickHouse for the modified files
sudo chown -R clickhouse:clickhouse /etc/clickhouse-server/

# Enable ClickHouse server at boot and start it
sudo systemctl enable --now clickhouse-server

# Test ClickHouse connection (use password_double_sha1_hex type password when prompted to create password)
clickhouse-client -udefault --password
Step 2: Install ClickHouse C++ APIs

#Setting up ClickHouse-client for CPP
git clone https://github.com/ClickHouse/clickhouse-cpp.git
cd clickhouse-cpp/
mkdir build
cd build

# Load necessary modules (this might vary based on your system)
# module load cmake/3.12+ gcc/7.4+ (refer to requirements section)

# Install ClickHouse-C++ client (adapt to your system)
cmake ..  -DBUILD_TESTS=ON -DCMAKE_CXX_FLAGS="-std=c++17" -DCMAKE_CXX_COMPILER=$(which g++) -DCMAKE_C_COMPILER=$(which gcc) -DCMAKE_EXE_LINKER_FLAGS="-L/opt/gcc/7.5.0/lib64"  -DBUILD_SHARED_LIBS=ON
make
make install
Step 3: Set Up the Database and Run INAM
# Create database and user in ClickHouse
clickhouse-client -u default --password
CREATE DATABASE osuinamdb;
CREATE USER osuinamuser IDENTIFIED WITH double_sha1_password BY 'osuinampassword';
GRANT ALL ON osuinamdb.* TO osuinamuser;
exit

# Web server and daemon install steps
rpm -Uvh osu-inam-clickhouse-1.1-1.el7.x86_64.rpm

# Configure INAM service (replace /path/to/ with actual paths)
# After installing Clickhouse-CXX and C++17 compiler
# Edit osu-inamd.service file at `/usr/lib/systemd/system/osu-inamd.service` or create a new env file
# Environment=LD_LIBRARY_PATH=$PATH_TO_LIBRARIES
# Environment=PATH=$PATH_TO_BINARIES
mkdir /etc/systemd/system/osu-inamd.service.d
cat > /etc/systemd/system/osu-inamd.service.d/env.conf <<EOF
[Service]
Environment=LD_LIBRARY_PATH=/path/to/clickhouse/lib64
Environment=PATH=/path/to/bin
EOF
systemctl daemon-reload

# Start the daemons (all prior steps need to have been run successfully)
systemctl start osu-inamd
# Allow 10 seconds before you start web front
sleep 10
# If you are using PhantomJS, allow 1 hour for the website to complete caching
systemctl start osu-inamweb

# Make them start at boot time
systemctl enable osu-inamd
systemctl enable osu-inamweb

3.1.4. Instructions for using InfluxDB as database

Installing and using OSU INAM with InfluxDB consists of three steps.

Step 1: influxdb-cxx is required for the OSU INAM daemon to insert data into InfluxDB. Please follow the installation instructions from the influxdb-cxx repository. We suggest disabling testing when installing influxdb-cxx by passing -DINFLUXCXX_TESTING=OFF to cmake. Then make sure to install the following packages.

Step 1: Dependency Installation for INAM with InfluxDB
yum install -y influxdb java-1.8.0-openjdk curl curl-devel

# Load appropriate modules based on requirements mentioned above

git clone https://github.com/offa/influxdb-cxx.git

cd influxdb-cxx/ && mkdir build && cd build

cmake -DINFLUXCXX_TESTING=OFF -DBOOST_ROOT=$PATH_TO_BOOST_DIR ..

sudo make install
Note
If CentOS cannot find the influxdb package, please do the following to add the InfluxDB yum repository to CentOS.
Adding InfluxDB yum repository to CentOS
cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key
EOF

Step 2: Please enable the retention policy and continuous queries in the InfluxDB config, located by default at /etc/influxdb/influxdb.conf

Step 2: Changes to InfluxDB Configuration for OSU INAM

[retention]
  # Determines whether retention policy enforcement is enabled.
  enabled = true
  # The interval of time when retention policy enforcement checks run.
  check-interval = "30m"

[http]
  # Suggested to improve scalability when handling PVARs for large MPI jobs
  # The maximum size of a client request body, in bytes. Setting this value to 0 disables the limit.
  max-body-size = 50000000

[continuous_queries]
  # Determines whether the continuous query service is enabled.
  enabled = true

  # Controls whether queries are logged when executed by the CQ service.
  log-enabled = true

  # Controls whether queries are logged to the self-monitoring data store.
  query-stats-enabled = true

  # interval for how often continuous queries will be checked if they need to run
  run-interval = "1s"

After installing, the user should set up the users and database, and then the data retention policy. The InfluxDB authentication documentation provides more information on setting up authentication beyond the simple instructions provided below.

The InfluxDB data retention policy defaults to 7 days; data is deleted after this duration. The user has the option to specify a different duration when creating the database. The duration is global and applies to all measurements. Dropping a retention policy deletes the data inserted into measurements (tables) under that policy. We use the autogen retention policy, which is the default retention policy created with the database.

The retention duration can be increased or decreased using the ALTER command, but it cannot be set to less than the shard group duration. Please be cautious when creating retention policies. The workaround for reducing the duration below the shard group duration is to drop and recreate the retention policy, which will drop the data in the database.

If no retention policy or duration is specified, the default policy is autogen with a duration of 7d.
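As an illustration of adjusting the duration (the 14-day value here is an example, not a recommendation), the autogen policy on osuinamdb could be extended from the influx shell:

```sql
ALTER RETENTION POLICY "autogen" ON "osuinamdb" DURATION 14d DEFAULT
```

Remember that the new duration must not be shorter than the shard group duration, per the caveat above.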

Step 3: The final step is to update osu-inamd.service and start the InfluxDB service. OSU INAM with InfluxDB requires extra libraries (influxdb-cxx and gcc 7.5+), and they must be on $PATH and $LD_LIBRARY_PATH for osu-inamd.service. Edit the osu-inamd.service file, located at /usr/lib/systemd/system/osu-inamd.service, to include the PATH and LD_LIBRARY_PATH for these libraries/modules.

An example of performing these steps is shown below.

Step 3: Installation instructions for INAM with InfluxDB
systemctl enable influxdb
systemctl start influxdb

export OSU_INAM_INSTALL_PREFIX=/opt/osu-inam

# After installing influxDB-CXX and C++17 compiler
# Edit osu-inamd.service file at `/usr/lib/systemd/system/osu-inamd.service`
# Environment=LD_LIBRARY_PATH=$PATH_TO_LIBRARIES
# Environment=PATH=$PATH_TO_BINARIES

mkdir /etc/systemd/system/osu-inamd.service.d
cat > /etc/systemd/system/osu-inamd.service.d/env.conf <<EOF
[Service]
Environment=LD_LIBRARY_PATH=/path/to/influx/lib64
Environment=PATH=/path/to/something/bin
EOF
systemctl daemon-reload

# Setup InfluxDB
influx

# Creates database with 7-day duration by default
CREATE DATABASE osuinamdb;
## Create database with 2-day duration instead:
# CREATE DATABASE osuinamdb WITH DURATION 2d

CREATE USER osuinamuser WITH PASSWORD 'osuinampassword';

GRANT ALL ON "osuinamdb" TO "osuinamuser"

SHOW GRANTS FOR "osuinamuser"

exit

# Web server and daemon install steps
rpm -Uvh osu-inam-influx-1.1-1.el7.x86_64.rpm

# Start the daemons (all prior steps need to have been run successfully)
systemctl start osu-inamd

# Allow 10 seconds before you start web front
sleep 10

systemctl start osu-inamweb

# Make them start at boot time
systemctl enable osu-inamd

systemctl enable osu-inamweb
Note
Make sure that you set the correct database in both osuinamweb and osuinamdaemon configuration files otherwise the data will not be written to the database.

3.2. Sample Configuration Files

The sample configuration files are provided in the etc folder of the OSU INAM installation location, e.g., /etc/osu-inam/osu-inamd.conf.

Please note that for osu-inamd, the purge process has different configuration parameters than the other components. This design helps support running OSU INAM with low-frequency performance counter intervals.

Please remember that if you change a runtime parameter in a configuration file, you will need to restart the components whose configuration file you changed. The following steps are recommended for a proper restart of OSU INAM.

Proper restart for OSU INAM with MySQL
# delete the database in MySQL if needed
mysql -uroot

# you can drop the tables that you think you will not need or drop database
drop database osuinamdb;

drop table FOO;

CREATE DATABASE osuinamdb;

CREATE USER 'osuinamuser'@'localhost' IDENTIFIED BY 'osuinampassword';

GRANT ALL PRIVILEGES ON osuinamdb.* TO 'osuinamuser'@'localhost';

FLUSH PRIVILEGES;

exit

# first restart osuinamd and then osuinamweb
systemctl restart osu-inamd

sleep 10

systemctl restart osu-inamweb
osu-inam.properties when using MySQL
######################## GENERAL PARAMETERS ########################

# Specifies the username and password to login into the OSU INAM website from a browser
security.user.name=user
security.user.password=pass

# Control server port number, default is 8080
#server.port = 8080

# Global interval (in seconds) for refreshing information on different pages
osuinam.counterinterval=30
# Max Cluster Size (in number of nodes). For clusters larger than this,
# the leaf nodes will be collapsed by default to improve visual appeal and
# rendering time. Default value: 500
osuinam.clustering_threshold=500

# Cluster name being used in OSU INAM website
osuinam.clustername=osuinamcluster

# Server Session timeout. If a duration suffix is not specified, seconds will be used
server.servlet.session.timeout=30m

# Spring Session timeout. If a duration suffix is not specified, seconds will be used.
spring.session.timeout=30m

# Specify the path to the inamd conf file, set to the default installation path
osuinam.daemon.conf=/PATH/TO/osu-inamd.conf

# Frequency in which graph gets reset to update topology (in milliseconds)
osuinam.graph-reset-rate=1800000

# Frequency in which graph updates the link usage (in milliseconds)
osuinam.graph-update-rate=30000

# Keeps the HTTP connection alive for a certain time (in milliseconds)
spring.mvc.async.request-timeout=900000

# Path to a file mapping switch GUIDs to their names, e.g., /etc/userfriendlynames.txt
# Structure:
# GUID "name"
# Example: 0x7cfe900300a35270 "ib-i2l2s09"
# Leave blank to disable
osuinam.switchNicknameFilePath=

# Required to get port counters metrics for nodes - true/false
osuinam.osu.inam.enable.hca.query=false

# Used in simulated mode - not for production usage
osuinam.simulation.active=false

# Location of the file containing PVAR and its description in csv format
# pvar-desc.csv file comes with inam package
osuinam.pvar-desc-file=/opt/osu-inam/etc/pvar_desc.csv

######################## JOB SCHEDULER ##########################

# Enable PBS support for jobids - true/false
osuinam.usingPbs=false
# Timeout (in seconds) after which, if OSU INAM gets no updates from qstat,
# it assumes the job is completed
osuinam.completionTimeout=90
# Names of the batch servers from which PBS should fetch job information, space-separated
osuinam.batchClusters=@batch-server1 @batch-server2 @batch-server3
# set the working directory of qstat for PBS
osuinam.qstatPath=
# Interval to update jobs from PBS (in seconds)
osuinam.pbsQueryInterval=30

####################### PHANTOMJS PARAMETERS #######################
#phantomjs config
phantomjs.execdir=
phantomjs.runjs=
phantomjs.filedir=
phantomjs.cachefile=

######################## DATA SOURCE PARAMETERS ########################
osuinam.dbtype=mysql
# Properties for opensm datasource configuration
osuinam.datasource.url=jdbc:mysql://localhost:3306/osuinamdb
osuinam.datasource.username=osuinamuser
osuinam.datasource.password=osuinampassword
osuinam.datasource.driver-class-name=com.mysql.jdbc.Driver
osuinam.datasource.initial-size=20
osuinam.datasource.max-active=50
osuinam.datasource.validation-query=select 1
osuinam.datasource.removeAbandonedTimeout=300

# Possible values - true/false
osuinam.datasource.test-on-borrow=true
osuinam.datasource.test-on-return=true
osuinam.datasource.remove-abandoned=true
osuinam.datasource.log-abandoned=true
osuinam.datasource.log-validation-errors=true


######################## LOGGING PARAMETERS ########################
# Log file is rotated once it reaches size of 10MB
logging.level.org.springframework.web=WARN
logging.level.edu.osu.inam.Application=WARN
logging.level.edu.osu.inam=WARN
logging.file=/var/log/osu-inam.log
#logging.path=<directory to which log files are written>

# Limit the number of concurrent connections to tomcat
#server.tomcat.max-connections=

######################## EMAIL PARAMETERS ########################
# Specifies the Email server
spring.mail.host=
spring.mail.port=
# Enable/Disable Email account authentication - true/false
spring.mail.properties.mail.smtp.auth=false
# Username & Password for Email account authentication
# Must be specified if auth is enabled, optional otherwise
spring.mail.username=
spring.mail.password=
# Mail server from address
osuinam.notification-from-address=
# Frequency in which notifications are purged in hours
osuinam.notification-purge-interval=24
# Default Notification email subject
osuinam.notification-email-subject=OSU INAM Notification
# Default Notification email prologue
osuinam.notification-email-prologue=Hello,
# Default Notification email epilogue
osuinam.notification-email-epilogue=-OSU INAM

# TLS , port 587 (optional) - true/false
spring.mail.properties.mail.smtp.starttls.enable=false
# Other Email timeout properties (in milliseconds)
spring.mail.properties.mail.smtp.connectiontimeout=5000
spring.mail.properties.mail.smtp.timeout=5000
spring.mail.properties.mail.smtp.writetimeout=5000
osu-inamd.conf.example when Using MySQL
### INAM debug flags ###

INAM_DEBUG_INIT_VERBOSE=2
INAM_DEBUG_SM_VERBOSE=1
INAM_DEBUG_DB_VERBOSE=2
INAM_DEBUG_NW_VERBOSE=0
INAM_DEBUG_FB_VERBOSE=1
INAM_DEBUG_MEM_VERBOSE=2

### Database connection parameters ###

OSU_INAM_DB_ENABLE_INFLUXDB=0
OSU_INAM_DATABASE_HOST=localhost
OSU_INAM_DATABASE_PORT=3306
OSU_INAM_DATABASE_NAME=osuinamdb
OSU_INAM_DATABASE_USER=osuinamuser
OSU_INAM_DATABASE_PASSWD=osuinampassword
OSU_INAM_DB_RECONNECT=1

## The following timeouts (in seconds) apply to all components except purge.
OSU_INAM_DB_READ_TIMEOUT=300
OSU_INAM_DB_WRITE_TIMEOUT=600
OSU_INAM_DB_CONNECT_TIMEOUT=20
OSU_INAM_DB_WAIT_TIMEOUT=28800

### PURGE CONFIGS ###

## Data retention period (in days)
OSU_INAM_RETENTION_PERIOD=2

## Time period between delete intervals of the purge function (in seconds)
OSU_INAM_DELETE_INTERVAL=2

## Specifies the batch size to delete as number of rows in purge procedure
OSU_INAM_BULK_PURGE_SIZE=100000

## The number of seconds the MySQL database server waits for activity on the
## purge connection before closing it. Default is 3 * purge interval
OSU_INAM_DB_PURGE_WAIT_TIMEOUT=36000

## The number of seconds to wait for more data from a connection before
## aborting the read
OSU_INAM_DB_PURGE_READ_TIMEOUT=36000

## Interval between two purge queries to delete profiling info (in seconds)
OSU_INAM_PURGE_QUERY_INTRVL=7200

### FABRIC and PERFORMANCE COUNTERS CONFIGS ###

## Specifies the number of OMP threads for fabric discovery
OSU_INAM_FABRIC_DISC_NUM_OMP_THREADS=8

## Specifies the number of OMP threads for performance counter collection
## across switches.
OSU_INAM_NUM_OMP_THREADS_FOR_SWITCHES=16

## Specifies the number of OMP threads for performance counter collection
## across the ports of each switch.
OSU_INAM_NUM_OMP_THREADS_FOR_SWITCH_PORTS=1

## Enable OMP threading for switches (should be set to 1 even if you only
## want parallel reads for the ports of switches)
OSU_INAM_USE_OMP_THREADS_FOR_SWITCHES=1
OSU_INAM_USE_OMP_THREADS_FOR_SWITCH_PORT=0

## Enable concurrent writes for performance counters
OSU_INAM_ENABLE_PARALLEL_PERF_COUNTER_DATA_WRITE=1
## Use bulk inserts into db
OSU_INAM_DATABASE_BULK_ACTIVE=1

## Number of records for bulk inserts
OSU_INAM_DATABASE_BULK_SIZE=300

## Interval to detect changes in network (in seconds)
OSU_INAM_FABRIC_QUERY_INTRVL=18000

## Interval to collect switch counters (in milliseconds)
OSU_INAM_PERF_COUNTER_QUERY_INTRVL=20000

### MVAPICH2-X Config information ###
## Interval MVAPICH2-X reports node, job, and process level info (in seconds)
OSU_INAM_PROC_COUNTER_QUERY_INTRVL=30

## MVAPICH2-X should report CPU utilization
OSU_INAM_TOOL_REPORT_CPU_UTIL=1

## MVAPICH2-X should report MEM utilization
OSU_INAM_TOOL_REPORT_MEM_UTIL=1

## MVAPICH2-X should report I/O utilization
OSU_INAM_TOOL_REPORT_IO_UTIL=1

## MVAPICH2-X should report communication grid
OSU_INAM_TOOL_REPORT_COMM_GRID=1

## Time (in seconds) after which a job is marked as complete if no update
## from MVAPICH is received for that job
OSU_INAM_JOB_COMPLETION_TIMEOUT=60

### Job scheduler Config ###

## Determines whether to use SLURM or not
OSU_INAM_ENABLE_SLURM=1

## SLURM query interval (in seconds): how often SLURM should be queried
## for job status
OSU_INAM_SLURM_QUERY_INTERVAL=30

## Specifies the path to squeue cmd.
#OSU_INAM_SQUEUE_CMD_PATH=/usr/bin/

### GENERAL  ###

## Specifies whether port counter and port error data should be fetched from
## all host-connected nodes on the network in addition to the switches.
## Please do not enable this by default, as the traffic on one end of a link
## will be the same as on the other end.
OSU_INAM_ENABLE_HCA_QUERY=0

## Specifies if HCA nodes should be scanned for route information
OSU_INAM_ENABLE_ROUTE_DISCOVERY=1

Here is a sample config file generated by OSU INAM Daemon to pass to MVAPICH jobs

osu-inam.conf
MV2_TOOL_QPN=X
MV2_TOOL_LID=X
MV2_TOOL_COUNTER_INTERVAL=30
MV2_TOOL_REPORT_CPU_UTIL=1
MV2_TOOL_REPORT_MEM_UTIL=1
MV2_TOOL_REPORT_IO_UTIL=1
MV2_TOOL_REPORT_COMM_GRID=1
MV2_TOOL_REPORT_LUSTRE_STATS=0
MV2_TOOL_REPORT_PVARS=1
Additions to osu-inam.properties when using InfluxDB and SLURM
##################### JOB SCHEDULER PARAMETERS ######################
osuinam.using-slurm=true
osuinam.enable-slurm-multi-servers=false
osuinam.slurm-clusters=''
osuinam.slurm-query-interval=30
osuinam.squeue-comd-path=/usr/bin

######################## INFLUXDB PARAMETERS ########################
osuinam.dbtype=influx
osuinam.influx.url=http://localhost:8086
osuinam.influx.database=osuinamdb
osuinam.influx.username=osuinamuser
osuinam.influx.password=osuinampassword
Modifications to osu-inamd.config when using InfluxDB and SLURM
## database connection ##
OSU_INAM_DB_ENABLE_INFLUXDB=1
OSU_INAM_DATABASE_PORT=8086

## Number of records for bulk inserts
OSU_INAM_DATABASE_BULK_SIZE=25000

Please email us at mvapich-help@cse.ohio-state.edu if you experience any trouble installing the package on your system.

3.3. Upgrading from an older version

Upgrading from an older version involves a subset of steps from the complete installation. INAM v1.1 uses an embedded Tomcat server and does not require a separate Tomcat installation, unlike older versions of INAM.

The embedded Tomcat server uses the same default port number, 8080, as a standard Tomcat installation. It is recommended to uninstall or stop the existing Tomcat installation before installing the new version of INAM. If Tomcat cannot be uninstalled, the port number used by INAM can be changed via the server.port property in the osu-inam.properties file.
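For example, to move the embedded server off port 8080 (8081 here is just an arbitrary free port), set in /etc/osu-inam/osu-inam.properties:

```
server.port=8081
```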

Upgrade Steps
# Kill the current running inam daemon
pkill osu-inamd
# Stop and uninstall tomcat6
service tomcat6 stop
yum remove tomcat6
# Install the latest rpm, uninstallation of the old rpm may be necessary
rpm -Uvh osu-inam-mysql-1.1-1.el7.x86_64.mofed4_5.rpm
# Start the osu-inam daemon again
service osu-inamd start

4. Basic Usage Instructions

If the installation was successful and the service has been started, you should be able to see the OSU INAM homepage if you point your web browser to http://localhost:8080/ or http://<server_ip>:8080/, depending on where the server was installed. If the server is behind a firewall, look here for some pointers.

4.1. Using the Network View

The Network View provides an overview of the entire network fabric. The network topology is presented as an interactive display that can be moved, dragged or zoomed as required. The nodes are represented by blue circles and switches are represented by red circles. They are labeled by their respective LIDs. The interconnects are colored according to their current load as indicated in the legend.

4.1.1. Network Metrics

The ‘Network Metrics’ drop-down box lists a set of port counters available from the switch. By default, total traffic on the link (Transmitted + Received Bytes) is shown. For the full list of supported counters, refer to port-counters.

4.1.2. Live View

When Live View is selected, the display is refreshed every 30 seconds. This frequency can be changed via the runtime variable osuinam.counterinterval in the Java-side configuration file, usually /etc/osu-inam/osu-inam.properties. Please note that you should adjust the page refresh interval based on the intervals on the daemon side. For example, if you have OSU_INAM_PERF_COUNTER_QUERY_INTRVL=10000, then setting osuinam.counterinterval to less than 10 seconds is not recommended. The default live page refresh interval is 30 seconds.
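As a concrete example of keeping the two intervals consistent (the values are illustrative): the daemon-side interval is in milliseconds while osuinam.counterinterval is in seconds, so the web refresh should be at least as long as the daemon's collection interval:

```
# osu-inamd.conf: collect switch counters every 20 seconds (value in ms)
OSU_INAM_PERF_COUNTER_QUERY_INTRVL=20000

# osu-inam.properties: refresh pages every 30 seconds (>= 20 s collection)
osuinam.counterinterval=30
```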

The view can be updated manually by selecting a node or switch, right-clicking on it and selecting ‘Update Network’. To get a live view of a switch (red circle) or a node (blue circle), right-click on the appropriate circle and select "Open Node Info". This will open up a new tab / window for the respective element.

4.1.3. Historical View

Looking at the past behavior of a network is often useful while investigating an issue. The Historical View shows the condition of the network from the ‘Start Time’ to the ‘End Time’. The Play/Pause button can be used to start and stop the display. By default, the snapshots are shown in real-time, but playback can be sped up to 2x, 4x, or 8x speed. The display can also be restarted by clicking the Rewind button.

By using the check-boxes under ‘Link Usage’, only the links with a certain range of traffic can be included in the view. For example, idle links can be excluded by unchecking the 0-5% checkbox. For metrics indicating errors, the links with or without that error can be selected.

4.1.5. Node Information

Right-clicking on a node presents a context menu. Selecting ‘Open Node Info’ will show detailed information about that node. If the node is running MVAPICH2-X, aggregate CPU usage and usage by each rank will be available.

4.1.6. Switch Information

Detailed information about a switch can be obtained by right-clicking on a switch followed by ‘Open Switch Info’. Clicking on a port will show the port counter information for that particular port.

4.1.7. Route Information

Multiple nodes can be selected on the display by CTRL+Clicking (CMD+Click for OS X) on them. Once multiple nodes are selected, right-click followed by ‘Find Routes Between Selected Nodes’ will highlight the available routes between them.

Detailed information about the utilization of a link can be viewed by right-clicking on the link followed by ‘Open Link Info’. For jobs using MVAPICH2-X, the data transferred via that link is available. Also, by selecting a job id, process-level link utilization is available.

Left-click on the link of choice to select it. Then right-click and select the ‘Find routes going through this link’ option. This will display all the routes connecting hosts that use the selected link.

4.1.10. Using the Job Level View

OSU INAM can work with the resource manager to show information pertaining to a single job instead of the entire cluster. The Job Level View can be activated by selecting ‘Job Id’ in ‘Filter By’ and entering a job id. In this view, only the nodes and switches participating in that job are displayed. The features present in Network Level View (See network-view) like Live View, Historical View, Link Usage etc. are supported in the Job Level View as well.

In Historical View, the start and end time of the job are automatically populated. For a running job, the end time is populated to the current time. The user can select specific start and end time as well.

4.1.11. Using the Node View

For supported MPI libraries, OSU INAM can display process level CPU and network utilization information. This mode can be selected by choosing ‘Node Id’ and selecting one or more nodes from the list of nodes. If an MPI job is running on that node, OSU INAM will display aggregate or per core CPU usage. The list of MPI ranks is also shown, and each of the ranks can be selected to view their network usage over a period of time.

4.1.12. Using MPI Primitive: most used

The MPI Primitives Most Used chart provides an overview of the most used MPI primitives across the whole cluster, across all jobs. Note that the jobs need to use MVAPICH2-X and send the information to the OSU INAM daemon.

The legend is shown on the right side. The user should choose the metric from the Metric drop-down list. Available metrics are bytes sent/received, count of messages sent/received, and time taken by the MPI primitive. The associated Performance Variable (PVAR) is shown along with node and job information. The user can view the top 5, top 10, or all MPI primitives across the cluster.

4.1.13. Using Lustre Communication Grid

The Lustre communication grid shows the Lustre I/O traffic over time across the whole cluster. The user can select which Lustre server to monitor for read and write operations. The chart gets updated in real-time. For a historical view, the user should choose the job from the historical jobs.

4.2. Using the Live Jobs Tab

The Live Jobs tab allows the user to see various selectable metrics (defined below) for all jobs using MVAPICH2-X 2.3 on a cluster.

Note
If OSU INAM does not get any information from MVAPICH2-X, only the job IDs and the number of nodes per job will be shown.
Note
Resource utilization metrics are associated with the resources that the MVAPICH2-X job is using. For example, if a job uses 4 cores of a node, then the CPU utilization is reported for those 4 cores. Gathering node-level resource utilization is planned as future work.

4.2.1. CPU User Usage

This displays the aggregated CPU utilization (percentage) for a specific job.

4.2.2. Virtual Memory Usage

This displays the total virtual memory utilization (in bytes) for a specific job.

4.2.3. Total I/O

This displays the total I/O data read and written (in bytes) for a specific job.

4.2.4. Total Communication

This displays the sum of all inter and intra node communication performed by this job.

4.2.5. Total Intra Node Communication

This displays the number of bytes exchanged by the job between processes running on the same node.

4.2.6. Total Inter Node Communication

This displays the number of bytes exchanged by the job between processes running on different nodes.

4.2.7. Total Collective

This displays the number of bytes exchanged by processes (for the specific job) during collective communication (e.g. MPI_Bcast) only.

4.2.8. RMA Sent

This displays the number of bytes sent for one-sided communication (e.g. MPI_Put) by processes of a specific job.

4.2.9. Total Pt-to-pt

This displays the number of bytes sent and received by point to point operations (e.g. MPI_Send or MPI_Recv) for a specific job.

4.2.10. Inter-node Communication Buffers Allocated

This displays the number of buffers allocated for communication across nodes.

4.2.11. Inter-node Communication Buffers Used

This displays the number of buffers actually used for communication across nodes.

4.2.12. Global MPI Inter and Intra node data exchange

This section shows the blocking and non-blocking data exchange tables and charts for MPI point-to-point, collective and RMA operations at the job level. Note that collectives use point-to-point operations as the underlying operation, and OSU INAM counts them toward the values shown in this chart. The table shows the amount of data exchanged for different message size buckets. The user can choose different sessions that are used in the profiled program. The sessions listed in the drop-down are the MPI_T sessions in the job.

4.2.13. MPI Primitive: most used

The MPI Primitives Most Used chart provides an overview of the most used MPI primitives at the job or node level, depending on the chosen granularity. Note that the jobs need to use MVAPICH2-X and send the information to the OSU INAM daemon. The chart gets updated live from the moment the user opens the tab. For a historical view of the data, the user can use Historical Jobs.

The legend is shown on the right side. The user should choose the metric from the Metric drop-down list. Available metrics are bytes sent/received, count of messages sent/received, and time taken by the MPI primitive. The associated Performance Variable (PVAR) is shown along with node and job information. The user can view the top 5, top 10, or all MPI primitives across the cluster.

If the user clicks on an MPI primitive, a pop-up window shows the rank-level values and the PVARs used; if the timer flag for PVARs is set, it also shows the time taken for each message bucket on each node. To get the values at the MPI rank level, clicking on a node name takes the user to the live node page with that information.

4.2.14. MPI Primitive: usage over time

This chart shows the usage of MPI primitives at the job level over time. The chart gets updated from the moment the user opens the tab. For a historical view of the usage over a specified period of time, the user can use Historical Jobs from the top tabs. In this chart, the left legend is for aggregate mode and the right legend shows numbers for the delta value. The delta value is the difference between two data points at which PVARs were collected in the MPI runtime.

The user can choose which set of collective or point-to-point operations to view. There is also an option to view the top 3 most used MPI primitives inside the job. Available metrics are bytes sent/received and count of messages sent/received.

4.2.15. Using Lustre Communication Grid

The Lustre communication grid shows the Lustre I/O traffic over time across the whole job. The user can select which Lustre server to monitor for read and write operations. The chart gets updated in real-time. For a historical view, the user should choose the job from the historical jobs.

4.3. Notifications

The Notifications feature in OSU INAM helps monitor the system while the user is offline. The notifications page (http://localhost:8080/notifications or http://<server_ip>:8080/notifications) is used to set criteria for notifications and also lists the notifications that were generated. Once a criterion is met by the system, OSU INAM sends an email notification to the configured email list and logs the event to be displayed on the notifications page. The generated notifications can be configured to be purged at certain intervals using the osuinam.notification-purge-interval setting in the application.properties file.

4.3.1. Setting up Email server configurations

In order to enable OSU INAM to send email notifications, the runtime parameters for osuinamweb have to be configured based on how the email server is set up. spring.mail.host and spring.mail.port specify the email server’s hostname/IP address and port, respectively. spring.mail.properties.mail.smtp.auth specifies whether the email server has authentication enabled. If authentication is enabled, spring.mail.username and spring.mail.password must be specified, and the from-address will be picked from the username. If authentication is disabled, the from-address has to be explicitly specified using osuinam.notification-from-address.
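As an illustrative sketch (hostname, port, and credentials below are placeholders, not defaults), an authenticated SMTP setup in the osuinamweb configuration could look like:

```properties
# Email server location
spring.mail.host=smtp.example.com
spring.mail.port=587
# Server requires authentication; the from-address is taken from the username
spring.mail.properties.mail.smtp.auth=true
spring.mail.username=inam-alerts@example.com
spring.mail.password=changeme
# If authentication were disabled, drop the two lines above and set instead:
# spring.mail.properties.mail.smtp.auth=false
# osuinam.notification-from-address=inam-alerts@example.com
```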

4.3.2. Setting Criteria

Clicking the "Add Notification Criteria" button on the notifications page pops up a dialog box with options to set notification criteria. Criteria can be set on different metrics collected by OSU INAM, classified into two categories, Port Counters and Process Counters; the metrics that can be tracked are listed below. A threshold value can be specified for the chosen metric, along with the desired comparison (greater than, less than or equals). For each criterion, the user can also use the "Is Recurring" flag in the dialog box to specify whether the email is to be sent for every occurrence during the notification purge interval (osuinam.notification-purge-interval) or only for the first occurrence. Users are free to customize the email subject, prologue, and epilogue for emails generated for a specific criterion.

  • Port Counters

    • Link Utilization

    • Bytes sent

    • Bytes received

    • Symbol errors

    • Link recovers

    • Link downed

    • Receive errors

    • Receive Remote Physical errors

    • Receive Switch Relay errors

    • Send discards

    • Send Constraint errors

    • Receive Constraint errors

    • Link Integrity errors

    • VL15 Dropped count

  • Process Counters

    • Bytes sent

    • Bytes received

    • Packets sent

    • Packets received

4.4. Emulator mode

The OSU INAM daemon has an emulator mode for cases where a user wants to test OSU INAM without deploying it on a live fabric. The emulator requires two arguments to activate: -e to enable the emulator and -g followed by the fabric topology. The fabric topology is the output of the ibnetdiscover command, which is passed to the OSU INAM daemon (osu-inamd). Please note that the tables will be populated with random data generated inside the INAM daemon. The emulator does not support the Lustre and MPI_T PVAR simulators in this version. The sim_jobs table will contain random jobs created by the emulator. The user should activate the emulator in the osuinamweb configuration as well.

  • Please refer to Section <Emulator Parameters> for the configuration parameters for the OSU INAM daemon.

  • Please set osuinam.simulation.active=true in osuinamweb configuration.
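Putting these steps together, a typical emulator launch might look like the following sketch (the topology file name is arbitrary):

```shell
# Capture the fabric topology once, on a host with InfiniBand access
ibnetdiscover > fabric.topo

# Start the daemon in emulator mode: -e enables emulation,
# -g supplies the captured topology
osu-inamd -e -g fabric.topo
```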

5. Using MVAPICH2-X INAM

5.1. Running Example

In this section, we provide details on how one should enable MVAPICH2-X to work in conjunction with OSU INAM.

Users should use the appropriate version of the MVAPICH2-X RPM built with support for advanced features in order to use this capability.

Please note that MVAPICH2-X must be launched with support for On-demand Connection Management when running in conjunction with OSU INAM. One can achieve this by setting the MV2_ON_DEMAND_THRESHOLD environment variable to a value less than the number of processes in the job.

MVAPICH2 Running Example without Performance Variables (PVARS)

This command launches test on nodes n0 and n1, two processes per node with support for sending the process and node level information to the OSU INAM daemon.

$ mpirun_rsh -rsh -np 4 n0 n0 n1 n1 MV2_ON_DEMAND_THRESHOLD=1
MV2_TOOL_INFO_FILE_PATH=/opt/inam/.mv2-tool-mvapich2.conf
MV2_TWO_LEVEL_COMM_THRESHOLD=1 MV2_USE_RDMA_CM=0 ./test
$ cat /opt/inam/.mv2-tool-mvapich2.conf
MV2_TOOL_QPN=473             #UD QPN at which OSU INAM is listening.
MV2_TOOL_LID=208             #LID at which OSU INAM is listening.
MV2_TOOL_COUNTER_INTERVAL=30 #The interval at which MVAPICH2-X should
                             #report node, job and process level information.
MV2_TOOL_REPORT_CPU_UTIL=1   #Specifies whether MVAPICH2-X should report
                             #process level CPU utilization information.
MVAPICH2 Running Example with Performance Variables (PVARS)

This command launches test on nodes n0 and n1, two processes per node with support for sending the process and node level information to the OSU INAM daemon. This allows MVAPICH2 to report global PVARs and session-specific PVARs information.

$ mpirun_rsh -rsh -np 4 n0 n0 n1 n1 MV2_ON_DEMAND_THRESHOLD=1
MV2_TOOL_INFO_FILE_PATH=/opt/inam/.mv2-tool-mvapich2.conf
MV2_TWO_LEVEL_COMM_THRESHOLD=1 MV2_USE_RDMA_CM=0 MV2_TOOL_REPORT_PVARS=1 MV2_ENABLE_PVAR_TIMER=1
MV2_ENABLE_PVAR_COUNTER=1 MV2_ENABLE_PVAR_TIMER_BUCKETS=1 MV2_ENABLE_PVAR_COUNTER_BUCKETS=1  MV2_TOOL_REPORT_SESSIONS=1
MV2_TOOL_SESSIONS_DEFAULT_ALL_HANDLES=1 MV2_TOOL_REPORT_LUSTRE_STATS=1 ./test
$ cat /opt/inam/.mv2-tool-mvapich2.conf
MV2_TOOL_QPN=473             #UD QPN at which OSU INAM is listening.
MV2_TOOL_LID=208             #LID at which OSU INAM is listening.
MV2_TOOL_COUNTER_INTERVAL=30 #The interval at which MVAPICH2-X should
                             #report node, job and process level information.
MV2_TOOL_REPORT_CPU_UTIL=1   #Specifies whether MVAPICH2-X should report
                             #process level CPU utilization information.
MVAPICH2 Running Example with Lustre support

This command launches test on nodes n0 and n1, two processes per node with support for sending the process and node level information to the OSU INAM daemon. This allows MVAPICH2 to report Lustre stats.

$ mpirun_rsh -rsh -np 4 n0 n0 n1 n1 MV2_ON_DEMAND_THRESHOLD=1
MV2_TOOL_INFO_FILE_PATH=/opt/inam/.mv2-tool-mvapich2.conf
MV2_TWO_LEVEL_COMM_THRESHOLD=1 MV2_USE_RDMA_CM=0 MV2_TOOL_REPORT_LUSTRE_STATS=1 ./test
$ cat /opt/inam/.mv2-tool-mvapich2.conf
MV2_TOOL_QPN=473             #UD QPN at which OSU INAM is listening.
MV2_TOOL_LID=208             #LID at which OSU INAM is listening.
MV2_TOOL_COUNTER_INTERVAL=30 #The interval at which MVAPICH2-X should
                             #report node, job and process level information.
MV2_TOOL_REPORT_CPU_UTIL=1   #Specifies whether MVAPICH2-X should report
                             #process level CPU utilization information.

6. Runtime Parameters for osuinamd

All runtime parameters supported by OSU INAM v1.1 are listed below. All of these parameters can be set in the configuration file for OSU INAM, usually called "osu-inamd.conf". An example configuration file is provided in /etc/osu-inam/osu-inamd.conf. If the user chooses to tune any of these values, note that a restart of the daemon is required for the change to take effect. All of the parameters listed here apply to the daemon only.
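As a starting point, a minimal osu-inamd.conf for a MySQL-backed deployment might contain only the mandatory database settings plus any intervals to be tuned (all values below are illustrative placeholders):

```properties
# Mandatory database settings (no defaults; must be set)
OSU_INAM_DATABASE_HOST=localhost
OSU_INAM_DATABASE_PORT=3306
OSU_INAM_DATABASE_NAME=inamdb
OSU_INAM_DATABASE_USER=inam
OSU_INAM_DATABASE_PASSWD=changeme
# Optional tuning; the defaults apply when these are omitted
OSU_INAM_FABRIC_QUERY_INTRVL=3600
OSU_INAM_DATA_RETENTION_PERIOD=7
```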

6.1. General Parameters

In this section, runtime parameters for fabric and port data (counters and errors) are presented. These parameters are used to adjust query intervals, set the number of threads for each component, and enable certain features.

6.1.1. OSU_INAM_FABRIC_QUERY_INTRVL

  • Default: 3600 seconds

  • Specifies the interval in seconds at which OSU INAM should query the fabric to identify a change in state for switches, nodes, links, and routes.

6.1.2. OSU_INAM_PERF_COUNTER_QUERY_INTRVL

  • Default: 30 seconds (30000 milliseconds)

  • Specifies the interval in milliseconds at which OSU INAM should query the switches to obtain port counter and port error information.

6.1.3. OSU_INAM_ENABLE_HCA_QUERY

  • Default: 0 (disabled)

  • Specifies whether port counter and port error data should be fetched from all host (HCA) nodes connected to the network in addition to the switches. This is disabled by default since the switch port data is already fetched. Enabling it will increase the time taken to gather port data.

Note
We strongly recommend keeping this option disabled. Enabling this option would increase the latency of reading InfiniBand port counters/errors due to long timeouts.

6.1.4. OSU_INAM_FABRIC_DISC_NUM_OMP_THREADS

  • Default: 8

  • Specifies the number of OMP threads used for performing fabric discovery.

6.1.5. OSU_INAM_NUM_OMP_THREADS_FOR_SWITCHES

  • Default: 8

  • Specifies the number of OMP threads used to gather performance counters across switches. Must be greater than 0.

6.1.6. OSU_INAM_NUM_OMP_THREADS_FOR_SWITCH_PORTS

  • Default: 1

  • Specifies the number of OMP threads used to gather performance counters across the ports of a switch. It cannot be zero.

6.1.7. OSU_INAM_USE_OMP_THREADS_FOR_SWITCHES

  • Default: 1

  • Enables OMP threading for switches. This should be set even if you only want to gather the port information of each switch in parallel, rather than parallelizing across switches.

6.1.8. OSU_INAM_USE_OMP_THREADS_FOR_SWITCH_PORT

  • Default: 0

  • Enables OMP threading across the ports of a switch.

6.1.9. OSU_INAM_ENABLE_PARALLEL_PERF_COUNTER_DATA_WRITE

  • Default: 0

  • Enables concurrent writes of performance counter information into the database.

6.1.10. OSU_INAM_ENABLE_ROUTE_DISCOVERY

  • Default: 1 (enabled)

  • Specifies if HCA nodes should be scanned for route information.

6.2. MVAPICH2-X Specific Parameters

Runtime parameters related to running MVAPICH2-X with osuinamd are presented in this section, including the interval for querying process information and the metrics that MVAPICH2-X should report.

6.2.1. OSU_INAM_PROC_COUNTER_QUERY_INTRVL

  • Default: 30 seconds

  • Specifies the interval at which MVAPICH2-X should report node, job and process level information.

6.2.2. OSU_INAM_TOOL_REPORT_CPU_UTIL

  • Default: 1

  • Specifies whether MVAPICH2-X should report process level CPU utilization information.

6.2.3. OSU_INAM_TOOL_REPORT_MEM_UTIL

  • Default: 1

  • Specifies whether MVAPICH2-X should report process-level memory utilization information.

6.2.4. OSU_INAM_TOOL_REPORT_IO_UTIL

  • Default: 1

  • Specifies whether MVAPICH2-X should report process level IO information.

6.2.5. OSU_INAM_TOOL_REPORT_COMM_GRID

  • Default: 1

  • Specifies whether MVAPICH2-X should report process communication grid information.

6.2.6. OSU_INAM_TOOL_REPORT_PVARS

  • Default: 1

  • Specifies whether MVAPICH2-X jobs should report process level performance variable information.

6.2.7. OSU_INAM_TOOL_REPORT_LUSTRE_STATS

  • Default: 0

  • Specifies whether MVAPICH2-X jobs should report Lustre traffic information.

6.2.8. OSU_INAM_JOB_COMPLETION_TIMEOUT

  • Default: 100 seconds

  • Specifies the time (in seconds) after which a job is marked as complete if no update from MVAPICH2-X is received for that job.

6.3. OSU INAM Database Configuration Parameters

This section presents the runtime parameters related to the MySQL, InfluxDB, or ClickHouse database settings for osuinamd. Some parameters must be set by the user for OSU INAM to work.

Note
You don’t need to set any extra flags to use a ClickHouse database with the inamd daemon. Just update the host, port, dbname, user, and pass settings for ClickHouse.

6.3.1. OSU_INAM_DB_ENABLE_INFLUXDB

  • Default: Unset (0)

  • Enables InfluxDB as the database used by OSU INAM. The user must use the InfluxDB RPM for this to work.

6.3.2. OSU_INAM_DATABASE_HOST

  • Default: Unset (Must be set by user)

  • Specifies the name of the host where the MySQL, ClickHouse, or InfluxDB database daemon is running.

6.3.3. OSU_INAM_DATABASE_PORT

  • Default: Unset (Must be set by the user)

  • Specifies the port on OSU_INAM_DATABASE_HOST at which the MySQL or InfluxDB database daemon is listening for incoming connections.

6.3.4. OSU_INAM_DATABASE_NAME

  • Default: Unset (Must be set by the user)

  • Specifies the name of the database OSU INAM should use to store data.

6.3.5. OSU_INAM_DATABASE_USER

  • Default: Unset (Must be set by the user)

  • Specifies the name of the user who has privileges to insert data into the database named OSU_INAM_DATABASE_NAME.

6.3.6. OSU_INAM_DATABASE_PASSWD

  • Default: Unset (Must be set by the user)

  • Specifies the password associated with user id OSU_INAM_DATABASE_USER.

6.3.7. OSU_INAM_DATA_RETENTION_PERIOD

  • Default: 7 days

  • Specifies the duration in days for which profiling data should be stored in the database. Any data older than that will be purged. The Jobs and Notifications tables are not purged.

6.3.8. OSU_INAM_PURGE_QUERY_INTERVAL

  • Default: 3600 seconds

  • Specifies the interval between two purge queries used to delete profiling information from the MySQL database.

6.3.9. OSU_INAM_DATABASE_BULK_ACTIVE

  • Default: 1 (enable)

  • If enabled, the insertion queries will insert data into the database in bulk.

6.3.10. OSU_INAM_DATABASE_BULK_SIZE

  • Default: 1000 (MySQL), 20000 (ClickHouse), 10000 (InfluxDB)

  • Specifies the number of records inserted per bulk insert. We suggest not changing this.

6.3.11. OSU_INAM_DB_RECONNECT

  • Default: 1 (enabled)

  • Enables reconnection for MySQL. If the connection to the server is lost, the daemon automatically tries to reconnect three times.

6.3.12. OSU_INAM_DB_READ_TIMEOUT

  • Default: 30 seconds

  • Specifies the number of seconds to wait for more data from a MySQL connection before aborting the read.

6.3.13. OSU_INAM_DB_CONNECT_TIMEOUT

  • Default: 10 seconds

  • Specifies the number of seconds that the MySQL server waits for a connect packet before ending the connection due to bad handshake.

6.3.14. OSU_INAM_DB_WRITE_TIMEOUT

  • Default: 60 seconds

  • Specifies the number of seconds to wait for a block to be written to a MySQL connection before aborting the write.

6.3.15. OSU_INAM_DB_WAIT_TIMEOUT

  • Default: 3 times the interval of each component. For example, if the fabric discovery interval is every 8 hours, then the wait_timeout for the fabric connection will be set to 24 hours.

  • Specifies the number of seconds the MySQL database server waits for activity on a connection before closing it.

Note
The following options cover the purge settings for MySQL. For InfluxDB, the user should set the retention policy in the influxdb.config file as mentioned in Step 2 of Section <Instructions for using InfluxDB as database>.

6.3.16. OSU_INAM_BULK_PURGE_SIZE

  • Default: 100000

  • Specifies the batch size, as a number of rows, deleted per iteration of the MySQL purge procedure.

6.3.17. OSU_INAM_DB_PURGE_WAIT_TIMEOUT

  • Default: 3 * OSU_INAM_PURGE_QUERY_INTERVAL

  • Specifies the number of seconds the MySQL database server waits for activity on the purge connection before closing it.

6.3.18. OSU_INAM_DB_PURGE_READ_TIMEOUT

  • Default: 28800 seconds (8 hours)

  • Specifies the number of seconds to wait for more data from the MySQL purge connection before aborting the read.

6.3.19. OSU_INAM_DB_PURGE_WRITE_TIMEOUT

  • Default: 28800 seconds (8 hours)

  • Specifies the number of seconds to wait for a block to be written to the MySQL purge connection before aborting the write.

6.3.20. OSU_INAM_DELETE_INTERVAL

  • Default: 1 second

  • Specifies the time period between successive delete batches of the MySQL purge function.

6.3.21. OSU_INAM_DATABASE_CONN_POLL_SIZE

  • Default: 64

  • Specifies the connection pool size for ClickHouse database connections.

  • Setting a value greater than 64 requires changing the maximum allowed connections setting in the database configuration file.

6.4. OSU INAM SLURM Job Scheduler Configuration Parameters

This section presents the runtime parameters related to the SLURM job scheduler for osuinamd, including the job scheduler query interval and the timeout for jobs.

Note
When using MySQL, SLURM settings are in the osuinamd daemon configuration. When using InfluxDB, SLURM settings are located in the osuinamweb configuration.
Note
PBS/TORQUE configuration parameters are set in the OSU INAM web configuration [PBS Job Scheduler Parameters].
Note
You must unset osuinam.usingPbs in the osuinamweb configuration file if you are using SLURM.
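For example, in a SLURM deployment the osuinamweb configuration should leave the PBS flag unset (shown commented out here for illustration):

```properties
# /etc/osu-inam/osu-inam.properties
# SLURM deployments: leave osuinam.usingPbs unset
#osuinam.usingPbs=true
```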

6.4.1. OSU_INAM_ENABLE_SLURM

  • Default: 1

  • Specifies whether SLURM should be used to get live job information. The sacct command is run on the system where the inamd daemon is running to get the job information.

6.4.2. OSU_INAM_ENABLE_PBS

  • Default: 0

  • Specifies whether PBS should be used to get live job information. You must set osuinam.usingPbs in the osuinamweb configuration file if you are using PBS.

6.4.3. OSU_INAM_SLURM_QUERY_INTERVAL

  • Default: 30 seconds

  • Specifies how often the job information must be pulled in from SLURM.

6.4.4. OSU_INAM_SQUEUE_CMD_PATH

  • No Default

  • Specifies the path to the directory that contains the squeue command.

6.4.5. OSU_INAM_ENABLE_MULTI_SLURM_SERVERS

  • Default: 0

  • Determines whether to use multiple batch servers for SLURM.

6.4.6. OSU_INAM_SLURM_SERVERS

  • No default

  • Specifies the names of the different batch servers for SLURM as a comma-separated list.
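For instance, a daemon configuration that polls two SLURM batch servers could look like this sketch (server names and the squeue path are placeholders):

```properties
# Pull job information from SLURM every 30 seconds
OSU_INAM_ENABLE_SLURM=1
OSU_INAM_SLURM_QUERY_INTERVAL=30
OSU_INAM_SQUEUE_CMD_PATH=/usr/bin
# Query two batch servers instead of one
OSU_INAM_ENABLE_MULTI_SLURM_SERVERS=1
OSU_INAM_SLURM_SERVERS=slurmctl1,slurmctl2
```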

6.5. Debug Parameters

A list of debugging parameters and the verbosity levels of each is shown in this section. Choosing the right level of debugging for each component is useful for finding the source of a problem.

6.5.1. INAM_DEBUG_INIT_VERBOSE

  • Default: 1

  • Prints the given arguments for OSU INAM and the status of each thread.

        Level 0
                - All debugging is disabled
        Level 1
                - General launch information
                - Creation and exiting of Pthreads for components

                    - Examples of debugging information at this level
                        - creation of Fabric, Performance Counter, SLURM, and Network
                          threads
                        - prints the argument passed to osuinamd

6.5.2. INAM_DEBUG_SM_VERBOSE

  • Default: 0

  • Verbosity level for tracking state machines operations.

        Level 0

            - All debugging is disabled

        Level 1

            - Displays the transition of the state machine for fabric discovery,
              performance counter and MPI_T network threads

6.5.3. INAM_DEBUG_DB_VERBOSE

  • Default: 1

  • Verbosity level for the database operations.

        Level 0
            - All debugging is disabled

        Level 1
            - Verification of creating database connections for different components
            - Debugging information related to interactions with SLURM
            - Debugging information related to purging database and database related
              faults


        Level 2
            - Basic debugging information related to creating and altering database
              tables
            - Basic debugging information related to database connections

        Level 3
            - Advanced debugging information related to database connections
            - Advanced debugging information related to database insertions and deletions

6.5.4. INAM_DEBUG_NW_VERBOSE

  • Default: 1

  • Verbosity level for the network operations.

        Level 0

            - All debugging is disabled

        Level 1

            - Displays basic debugging information
                * Device information
                * Details about `osu-inam.conf` file
                * Details about querying and gathering process information from MPI
                  processes

             Examples of debugging information at this level
                - Allocated QPN and LID for listening to MPI info
                - Insertion of Process info into the database
                - The path to `osu-inam.conf` file and the content
                - Start and end of network initialization

        Level 2

            - Acknowledgments for receiving info from MPI jobs

        Level 3

            - Rank information for MPI jobs

6.5.5. INAM_DEBUG_FB_VERBOSE

  • Default: 1

  • Verbosity level for the fabric discovery and performance counters sweep operations.

        Level 0

            - All debugging is disabled

        Level 1

            - Displays basic debugging information for the following features
                * Querying and gathering InfiniBand fabric data
                * Querying and gathering InfiniBand performance counter data
                * High-level details about multi-threading

                Examples of debugging information at this level
                - Starting and finalizing various threads
                - Verification of information gathered from InfiniBand fabric
                - Number of OpenMP threads used for Fabric
                - Number of Nodes detected by INAM

        Level 2

            - Advanced debugging for querying and gathering InfiniBand fabric data
              and InfiniBand performance counter data

            - Details about querying and gathering performance counters for InfiniBand
              switches in the network

             Examples of debugging information at this level
                - Name and number of ports for each switch

        Level 3

            - Debugging information about optimized OpenMP-based multi-threading design

             Examples of debugging information at this level
                - Detailed information for links and node for fabric and performance
                  counter threads
                - Information related to gathering route data

        Level 4

            - Details about route discovery for Fabric thread
            - Reports encountered errors while discovering Fabric

             Examples of debugging information at this level
                - Information about GUID and endpoint nodes of the routes
                - Information about bad forwarding tables

6.6. Emulator Parameters

The following are the parameters for the OSU INAM emulator inside the OSU INAM daemon, used to set up the simulation variables. The user must set these variables; there are no defaults.

6.6.1. OSU_INAM_PROCS_PER_NODE

  • Specifies the number of MPI processes per node

6.6.2. OSU_INAM_MIN_JOB_SIZE

  • Specifies the minimum number of processes in a job

6.6.3. OSU_INAM_MAX_JOB_SIZE

  • Specifies the maximum number of processes in a job

6.6.4. OSU_INAM_MIN_JOB_DURATION

  • Specifies the minimum duration of the job in seconds

6.6.5. OSU_INAM_MAX_JOB_DURATION

  • Specifies the maximum duration of the job in seconds

  • Specifies the maximum bandwidth of links in Gbps (Giga bits per second)

6.6.7. OSU_INAM_COMM_GRID

  • Specifies whether the communication grid should be reported, by setting this to 0 or 1.
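Since none of these variables have defaults, all of them must appear in the daemon configuration when the emulator is used; for example (all values are arbitrary):

```properties
# Emulator workload shape (required; no defaults)
OSU_INAM_PROCS_PER_NODE=16
OSU_INAM_MIN_JOB_SIZE=16
OSU_INAM_MAX_JOB_SIZE=256
OSU_INAM_MIN_JOB_DURATION=60
OSU_INAM_MAX_JOB_DURATION=3600
OSU_INAM_COMM_GRID=1
```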

7. OSU INAM Web Application Runtime Parameters

This section provides a guide to the most frequently used parameters supported by OSU INAM v1.1 for the osuinamweb application. osuinamweb runs on Apache Tomcat, and all the parameters listed in this guide can be set in the configuration file located at /etc/osu-inam/osu-inam.properties.

Note
Any changes to these values will require a restart of osuinamweb to take effect.
Tip
For a comprehensive list of Spring Boot configuration parameters, visit the Spring Boot documentation.

0.1. General Parameters

The general runtime parameters for the website are listed below, including the graph update interval, website credentials, and the server port.

0.2. security.user.name

  • Default: user

  • This parameter specifies the username required to log into the OSU INAM website from a browser.

0.3. security.user.password

  • Default: pass

  • This parameter specifies the password required to log into the OSU INAM website from a browser.

0.4. osuinam.counterinterval

  • Default: 30 seconds

  • This parameter controls the frequency at which the website graphs are refreshed. For instance, to have the live jobs page or the port counter charts update every 5 seconds, set this value to 5.

0.5. osuinam.graph-reset-rate

  • Default: 30 minutes (1800000 ms)

  • This parameter controls the frequency at which the network view graph is reset to pick up topology changes found by fabric discovery in the daemon.

0.6. osuinam.graph-update-rate

  • Default: 30 seconds

  • This parameter controls the frequency at which the network view graph updates the link usage. To change the chart update interval, set osuinam.counterinterval to the desired value.

0.7. osuinam.clustering_threshold

  • Default: 500

  • This parameter sets the Max Cluster Size (in number of nodes). For clusters larger than this size, the leaf nodes will be collapsed by default to improve visual appeal and rendering time.

0.8. osuinam.clustername

  • Default: Unset

  • This parameter sets the cluster name for osuinamweb.

0.9. server.port

  • Default: 8080

  • This parameter controls the server port number.

0.10. osuinam.switchNicknameFilePath

  • Default: Unset (Leave blank to disable)

  • This parameter sets the path to a file (for example, /etc/userfriendlynames.txt) containing the mapping between switch GUIDs and their user-friendly names. Each line has the structure: GUID "name". Example: 0x7cfe900300a35270 "ib-i2l2s09".

     IMPORTANT: If you wish to use this feature, make sure the file contains
     no blank lines or special characters.

0.11. osuinam.daemon.conf

  • Default: Unset

  • This parameter sets the location of the osuinamd configuration file so that it can be viewed in the Debug tab.

0.12. server.servlet.session.timeout

  • Default: 30 minutes

  • This parameter sets the Server Session timeout. If a duration suffix is not specified, seconds will be used.

0.13. spring.session.timeout

  • Default: 30 minutes

  • This parameter sets the Spring Session timeout. If a duration suffix is not specified, seconds will be used. After this time of inactivity, a login is required to access the website.
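
As a hypothetical illustration of the duration-suffix behavior described above (45m is an example value, not a recommendation):

```properties
# Without a suffix the value is interpreted as seconds
server.servlet.session.timeout=1800
# With a duration suffix, e.g. 45 minutes
spring.session.timeout=45m
```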

0.14. osuinam.osu.inam.enable.hca.query

  • Default: inactive (false)

  • When enabled, this parameter makes osuinamweb query port counters directly using the node’s GUID. Otherwise, it finds the switch GUID and port associated with the node and queries based on the switch GUID. It is recommended to leave this disabled. If you enable this feature, you should also set the corresponding parameter in the OSU INAM Daemon configuration file.

0.15. osuinam.pvar-desc-file

  • Default: /opt/osu-inam/etc/pvar_desc.csv

  • This parameter specifies the location of the file containing PVAR names and their descriptions in CSV format. The pvar_desc.csv file is included in the OSU INAM package.

1. Data Source Parameters

The runtime parameters for configuring the data source (MySQL or InfluxDB) via the Spring Boot framework are presented here. Some values, such as the data source login information and database name, are shared between osuinamd and osuinamweb. Please make sure they match the values set in the osuinamd configuration file and used when creating the database in MySQL or InfluxDB.

Data source configuration is controlled by configuration properties such as `osuinam.datasource.*`.

Important
If you intend to use ClickHouse, you can use the same settings as MySQL. All you need to do is set dbtype to clickhouse and update osuinam.datasource.url to reflect the host and port 9004. For instance: jdbc:mysql://localhost:9004/osuinamdb.
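
Putting the note above together, a minimal osu-inam.properties excerpt for ClickHouse might look like this (host and credentials are placeholders):

```properties
osuinam.dbtype=clickhouse
# ClickHouse is reached through its MySQL-compatible port 9004, so a MySQL JDBC URL is used
osuinam.datasource.url=jdbc:mysql://localhost:9004/osuinamdb
osuinam.datasource.username=inamuser
osuinam.datasource.password=changeme
```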

1.0.1. osuinam.dbtype

  • Default: mysql

  • Values: mysql, influx, or clickhouse

  • This parameter selects the database type (MySQL, InfluxDB, or ClickHouse) that osuinamweb will connect to.

Note
If you specify influx or clickhouse as the value, ensure to use the correct RPM of OSU INAM.

1.1. InfluxDB Parameters

1.1.1. osuinam.influx.url

  • Default: Unset (must be set, example: http://localhost:8086)

  • This parameter sets the URL of InfluxDB that osuinamweb will connect to.

1.1.2. osuinam.influx.database

  • Default: Unset (must be set, example: osuinamdb)

  • This parameter sets the database that InfluxDB inside osuinamweb will connect to.

1.1.3. osuinam.influx.username

  • Default: Unset (must be set)

  • This parameter sets the login username for the database.

1.1.4. osuinam.influx.password

  • Default: Unset (must be set)

  • This parameter sets the login password for the database.

1.1.5. osuinam.influx.retentionDays

  • Default: 7

  • This parameter sets the number of days that InfluxDB will retain data for measurements.

1.2. MySQL Parameters

1.2.1. osuinam.datasource.url

  • Default: Unset (must be set)

  • This parameter sets the JDBC URL of MySQL that osuinamweb will connect to.

1.2.2. osuinam.datasource.username

  • Default: Unset (must be set)

  • This parameter sets the login username for the database.

1.2.3. osuinam.datasource.password

  • Default: Unset (must be set)

  • This parameter sets the login password for the database.

1.2.4. osuinam.datasource.driver-class-name

  • Default: Unset

  • This parameter sets the fully qualified name of the JDBC driver. It is auto-detected based on the URL by default.
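
As a sketch, a typical MySQL configuration using the parameters above might be as follows. Host, database name, and credentials are placeholders; 3306 is the default MySQL port:

```properties
osuinam.dbtype=mysql
osuinam.datasource.url=jdbc:mysql://localhost:3306/osuinamdb
osuinam.datasource.username=inamuser
osuinam.datasource.password=changeme
# Usually unnecessary: the driver class is auto-detected from the URL
#osuinam.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
```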

2. Apache Tomcat Parameters

The following parameters are related to the Apache Tomcat server.

Tip
osuinamweb uses Apache Tomcat. For additional settings, refer to the Tomcat connection pool documentation.
Note
To set additional settings for Tomcat, you need to use the osuinam.datasource prefix. A few examples are mentioned below.

2.0.1. osuinam.datasource.initial-size

  • Default: 10

  • This parameter sets the initial number of connections that are created when the Tomcat connection pool is started.

2.0.2. osuinam.datasource.max-active

  • Default: 100

  • This parameter sets the maximum number of active connections that can be allocated from this pool at the same time.

2.0.3. osuinam.datasource.remove-Abandoned

  • Default: false

  • This parameter sets a flag to remove abandoned connections if they exceed the removeAbandonedTimeout.

2.0.4. osuinam.datasource.removeAbandonedTimeout

  • Default: 60 seconds

  • This parameter sets the timeout in seconds before an abandoned (in use) connection can be removed.

2.0.5. spring.mvc.async.request-timeout

  • Default: 900000 milliseconds (15 minutes)

  • This parameter keeps the HTTP connection alive for the specified time in milliseconds; the default corresponds to 15 minutes.
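
For illustration, the pool and timeout settings above could be tuned in osu-inam.properties as follows (the values are examples, not recommendations):

```properties
osuinam.datasource.initial-size=10
osuinam.datasource.max-active=100
osuinam.datasource.remove-abandoned=true
osuinam.datasource.removeAbandonedTimeout=60
spring.mvc.async.request-timeout=900000
```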

3. Logging Parameters

The following parameters are for logging settings for osuinamweb.

3.0.1. logging.file

  • Default: unset (messages go to the system log)

  • This parameter sets the name of the log file location for osuinamweb.

3.0.2. logging.path

  • This parameter sets the directory to which log files are written.

3.0.3. logging.level.edu.osu.inam

  • Default: INFO

  • This parameter sets the logging level for osuinamweb.
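
A hypothetical logging setup combining the parameters above; the paths are placeholders and must exist and be writable by the user running the web application:

```properties
logging.file=/var/log/osu-inam/osuinamweb.log
logging.path=/var/log/osu-inam
logging.level.edu.osu.inam=DEBUG
```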

4. PhantomJS Parameters

As mentioned in Advanced Usage Instructions, osuinamweb uses PhantomJS to accelerate rendering of the network. PhantomJS is not required for functionality of OSU INAM. The runtime parameters for PhantomJS are presented here.

Note
All of PhantomJS config variables are unset by default.

4.0.1. phantomjs.execdir

  • This parameter sets the path where the phantomjs bin is placed.

4.0.2. phantomjs.runjs

  • This parameter should be the explicit path to the inam.js that is provided in the lib folder of the OSU INAM installation, usually /opt/osu-inam/lib/inam.js.

4.0.3. phantomjs.filedir

  • This parameter sets the location of the phantomjs output for the pre-rendering. It’s recommended to ensure that this directory exists.

4.0.4. phantomjs.cachefile

  • This parameter sets the location of the file to cache the final phantomjs output. On the next restart, the web application would use the cached data and not perform the rendering.

5. Job Scheduler Parameters

5.1. PBS Job Scheduler Parameters

This section presents the runtime parameters related to the PBS job scheduler for osuinamweb, including the query interval of the job scheduler and timeout for jobs.

Note
You must always specify whether you are using PBS, even if you want to use SLURM as the job scheduler. All of these parameters should exist in your configuration file for osuinamweb.
Note
SLURM configuration parameters are located in osuinamd (Section 6.4).

5.1.1. osuinam.usingPbs

  • Default: false (disabled)

  • This parameter specifies if PBS should be used to get live job information. The qstat command is run on the system where osuinamweb is running to fetch the job information. When PBS is enabled, please make sure qstat and qselect are in the PATH.

5.1.2. osuinam.completionTimeout

  • Default: 90 seconds

  • This parameter specifies the timeout that PBS will use to mark a job as complete if it does not find it in the qstat output.

5.1.3. osuinam.batchClusters

  • No default

  • This parameter specifies the batch servers that PBS should use to get live jobs information. This option is used when users have different batch servers for their clusters.

  • Example: @batch1 @batch2 @batch3

5.1.4. osuinam.pbsQueryInterval

  • Default: 30 seconds

  • This parameter specifies how often the jobs information must be pulled in from PBS.

5.1.5. osuinam.qstatPath

  • No default

  • This parameter specifies the path to the directory that contains the qstat command for the PBS component.

  • Example: /opt/torque/bin
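
A hypothetical PBS configuration combining the parameters above (the batch server names are placeholders):

```properties
osuinam.usingPbs=true
osuinam.pbsQueryInterval=30
osuinam.completionTimeout=90
osuinam.qstatPath=/opt/torque/bin
# Only needed when using multiple batch servers
osuinam.batchClusters=@batch1 @batch2
```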

5.2. SLURM Job Scheduler Parameters for InfluxDB

This section presents the runtime parameters related to the SLURM job scheduler for osuinamweb when using InfluxDB. If you are using MySQL, please refer to Section [OSU INAM SLURM Job Scheduler Configuration Parameters].

5.2.1. osuinam.using-slurm

  • Default: false (Disabled)

  • This parameter specifies if SLURM should be used to get live job information. For MySQL, the corresponding setting is in the OSU INAM Daemon configuration.

5.2.2. osuinam.slurm-query-interval

  • Default: 30 seconds

  • This parameter specifies how often the jobs information must be pulled in from SLURM.

5.2.3. osuinam.squeue-comd-path

  • No Default

  • This parameter specifies the path to the directory that contains the squeue command.

  • Example: /usr/bin

5.2.4. osuinam.enable-slurm-multi-servers

  • Default: 0

  • This parameter determines whether to use multi-batch servers for SLURM or not.

5.2.5. osuinam.slurm-clusters

  • No default

  • This parameter determines the names of different batch servers for SLURM in a comma-separated manner.
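
As an illustrative osu-inam.properties excerpt for SLURM with InfluxDB (the cluster names are placeholders):

```properties
osuinam.using-slurm=true
osuinam.slurm-query-interval=30
osuinam.squeue-comd-path=/usr/bin
# Optional: enable and list multiple batch servers
osuinam.enable-slurm-multi-servers=1
osuinam.slurm-clusters=clusterA,clusterB
```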

6. Email Notification Parameters

This section presents the runtime parameters related to the Email notification component.

6.0.1. spring.mail.host

  • No default

  • This parameter specifies the hostname of the Email server.

  • Example: email.nowlab.osu.edu

6.0.2. spring.mail.port

  • No default

  • This parameter specifies the port on which the Email server runs.

  • Example: 587

6.0.3. spring.mail.username

  • No default

  • This parameter specifies the username for Email account authentication. It must be specified if authentication is enabled and is optional otherwise.

  • Example: service_account@osu.edu

6.0.4. spring.mail.password

  • No default

  • This parameter specifies the password associated with the Email account for authentication. It must be specified if authentication is enabled and is optional otherwise.

  • Example: password

6.0.5. osuinam.notification-from-address

  • No default

  • This parameter specifies the from address to be set for Email notifications.

  • Example: inam@nowlab.osu.edu

6.0.6. spring.mail.properties.mail.smtp.auth

  • Default value: false

  • This parameter enables or disables Email account authentication.

  • Possible values: true/false

  • Example: true

6.0.7. spring.mail.properties.mail.smtp.starttls.enable

  • Default value: false

  • This parameter enables or disables TLS (optional).

  • Possible values: true/false

  • Example: true

6.0.8. spring.mail.properties.mail.smtp.connectiontimeout

  • Default value: 5000

  • This parameter specifies the connection timeout value for the email client in milliseconds.

  • Example: 5000

6.0.9. spring.mail.properties.mail.smtp.timeout

  • Default value: 5000

  • This parameter specifies the timeout value for the email client in milliseconds.

  • Example: 5000

6.0.10. spring.mail.properties.mail.smtp.writetimeout

  • Default value: 5000

  • This parameter specifies the write timeout value for the email client in milliseconds.

  • Example: 5000

6.0.11. osuinam.notification-purge-interval

  • Default value: 24

  • This parameter specifies the interval, in hours, at which notifications are purged.

  • Example: 24

6.0.12. osuinam.notification-email-subject

  • Default value: OSU INAM Notification

  • This parameter specifies the Email subject for the notifications generated.

  • Example: OSU INAM Notification

6.0.13. osuinam.notification-email-prologue

  • Default value: Hello,

  • This parameter specifies the default Notification email prologue.

  • Example: Greetings,\n Alert from INAM

6.0.14. osuinam.notification-email-epilogue

  • Default value: -OSU INAM

  • This parameter specifies the default Notification email epilogue.

  • Example: -OSU INAM
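
Pulling the examples above together, a hypothetical notification setup could look like this; the host and account values are the illustrative examples given above, not real defaults:

```properties
spring.mail.host=email.nowlab.osu.edu
spring.mail.port=587
spring.mail.username=service_account@osu.edu
spring.mail.password=changeme
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
osuinam.notification-from-address=inam@nowlab.osu.edu
osuinam.notification-email-subject=OSU INAM Notification
```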

7. List of Supported Network Metrics

The Network Metrics supported by OSU INAM v1.1 are listed below. These metrics can be broadly divided into three sets. The descriptions for InfiniBand port and error counters have been obtained from the InfiniBand Specification Release 1.2.1 by the InfiniBand Trade Association. The counters reset after being read.

7.1. Switch Counters

The following node-level counters are queried from the InfiniBand Switches:

Note
If the reading interval is larger than the query interval of port counters, the data from the database will be summed and represented as one data point on the charts.
  • Xmit Data

    • The total number of data octets (bytes) transmitted on all VLs from the port. This includes all octets between (and not including) the start of the packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets. Xmit Data is a 64-bit counter at the InfiniBand (IB) level.

  • Rcv Data

    • The total number of data octets (bytes) received on all VLs from the port. This includes all octets between (and not including) the start of the packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets. Rcv Data is a 64-bit counter at the IB level.

  • Max [Xmit Data/Rcv Data]

    • Maximum of the two values above

  • Xmit Pkts

    • The total number of data packets transmitted on all VLs from the port. This includes all packets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets.

  • Rcv Pkts

    • The total number of data packets received on all VLs from the port. This includes all packets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. Excludes all link packets.

7.2. Process Level Counters

MVAPICH2-X collects additional information about the process’s network usage which can be displayed by OSU INAM. The following counters are currently supported:

  • Xmit Data

    • Total number of bytes transmitted as part of the MPI application

  • Rcv Data

    • Total number of bytes received as part of the MPI application

  • Max [Xmit Data/Rcv Data]

    • Maximum of the two values above

  • Point to Point Send

    • Total number of bytes transmitted as part of MPI point-to-point operations

  • Point to Point Rcvd

    • Total number of bytes received as part of MPI point-to-point operations

  • Max [Point to Point Sent/Rcvd]

    • Maximum of the two values above

  • Coll Bytes Sent

    • The total number of bytes transmitted as part of MPI collective operations

  • Coll Bytes Rcvd

    • The total number of bytes received as part of MPI collective operations

  • Max [Coll Bytes Sent/Rcvd]

    • Maximum of the two values above

  • RMA Bytes Sent

    • Total number of bytes transmitted as part of MPI RMA operations. Note that due to the nature of the RMA operations, bytes received for RMA operations cannot be counted

  • RC VBUF

    • The number of internal communication buffers used for reliable connection (RC)

  • UD VBUF

    • The number of internal communication buffers used for unreliable datagram (UD)

  • VM Size

    • Total number of bytes used by the program for its virtual memory

  • VM Peak

    • Maximum number of virtual memory bytes for the program

  • VM RSS

    • The number of bytes resident in the memory (Resident set size)

  • VM HWM

    • The maximum number of bytes that can be resident in memory (Peak resident set size or High watermark)

7.3. Error Counters

The following error counters are available both at switch and process level:

  • SymbolErrors

    • The total number of minor link errors detected on one or more physical lanes

  • LinkRecovers

    • The total number of times the Port Training state machine has successfully completed the link error recovery process

  • LinkDowned

    • The total number of times the Port Training state machine has failed the link error recovery process and downed the link

  • RcvErrors

    • The total number of packets containing an error that were received on the port. These errors include:

      • Local physical errors

      • Malformed data packet errors

      • Malformed link packet errors

      • Packets discarded due to buffer overrun

  • RcvRemotePhysErrors

    • The total number of packets marked with the EBP delimiter received on the port.

  • RcvSwitchRelayErrors

    • The total number of packets received on the port that were discarded because they could not be forwarded by the switch relay

  • XmtDiscards

    • The total number of outbound packets discarded by the port because the port is down or congested. Reasons for this include:

      • The output port is not in the active state

      • Packet length exceeded NeighborMTU

      • Switch Lifetime Limit exceeded

      • Switch HOQ Lifetime Limit exceeded

    This may also include packets discarded while in the VLStalled state.

  • XmtConstraintErrors

    • The total number of packets not transmitted from the switch physical port for the following reasons:

      • FilterRawOutbound is true and the packet is raw

      • PartitionEnforcementOutbound is true and packet fails partition key check or IP version check

  • RcvConstraintErrors

    • The total number of packets not received from the switch physical port for the following reasons:

      • FilterRawInbound is true and the packet is raw

      • PartitionEnforcementInbound is true and packet fails partition key check or IP version check

  • LinkIntegrityErrors

    • The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors

  • ExcBufOverrunErrors

    • The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error

  • VL15Dropped

    • The number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port

8. Advanced Usage Instructions

8.1. Making OSU INAM visible outside of a firewalled environment

The following snippets should work in basic scenarios where the OSU INAM server is sitting behind a firewalled or NAT’d environment. Please exercise caution as this could expose the server to larger, less secure networks or otherwise upset your network administrators.

Iptables
-A PREROUTING -p tcp -d <external ip> --dport 8080 -j DNAT --to <tomcat server>:8080
-A POSTROUTING -p tcp -d <tomcat server> --dport 8080 -j SNAT --to-source <external ip>
Apache
ProxyPass /inam/ http://<tomcat server>:8080/
ProxyPassReverse /inam/ http://<tomcat server>:8080/
Nginx
server {
    listen 8080 default_server;
    server_name X;

    location /inam {
        rewrite ^/inam(.*)$ $1 break;
        proxy_pass http://<tomcat server>:8080;
    }
}

8.2. Speed up the network map rendering by using PhantomJS

PhantomJS is a headless WebKit that allows OSU INAM to pre-render the network graph so it loads much quicker. The modifications necessary to do this are minimal.

8.2.1. Required Packages

Add the following parameters to your /etc/osu-inam/osu-inam.properties file

/etc/osu-inam/osu-inam.properties
#phantomjs
#execdir is the path you placed the phantomjs bin
phantomjs.execdir=/path/to/phantomjs/bin/
#runjs should be the explicit path to the inam.js that is provided in the root of the download tarball
phantomjs.runjs=/path/to/inam.js
#filedir is the location of the phantomjs output for the pre-rendering
phantomjs.filedir=/path/to/phantomjs/working/dir
#cachefile is the location of the file to cache the final phantomjs
#output. On the next restart, the web application will use the cached data and not
#perform the rendering
phantomjs.cachefile=/path/to/cachefile
Note
Be sure to make the PhantomJS binary executable, the runjs file readable, and the filedir writeable by your web server. Place the vis.js from the root of the tarball in the same directory as inam.js.

After the positions are calculated by PhantomJS, the cachefile will be generated by the web application.

Once finished, restart the webserver to pull in the new settings and on the next visit to /network/, the view should be rendered nearly instantly.

PhantomJS execution for rendering the network graph happens during the web application’s deployment. This might affect the web application deployment load time. However, this is a ONE-TIME COST. For subsequent deployments, the web application will load the network information from the cache file. The time PhantomJS takes to render the network for the first time depends on the complexity of the network and the number of nodes.

Projected ONE TIME Web Application Deployment Time with PhantomJS

These estimates are based on testing with PhantomJS 2.0.0 on a dual socket Intel E5630 with 12GB of memory.

Number of Nodes   Number of Switches   Network Topology   Approximate Time
178               20                   Full Fat-Tree      1 min
1879              212                  Hybrid Fat-Tree    30 mins

9. Best Practices with OSU INAM

9.1. Deployment Recommendations

Based on our experience and feedback we have received from our users, here we include some of the best practices for deploying OSU INAM. If you have any of your own best practices related to OSU INAM, please feel free to contact us by sending an email to mvapich-help@cse.ohio-state.edu

In this section, we provide guidance on proper resource management for all three components of OSU INAM: the daemon, the storage back end (MySQL), and the web-based front end. Node resources, such as hardware cores, should be divided fairly among the components to avoid a bottleneck on any one of them.

There is a trade-off between increasing the availability of the web front end and increasing the performance of profiling. The more users use the tool, the more read traffic flows through the storage component, which can impact its availability and latency. Since MySQL locks tables for insertions and concurrent insertions are serialized, a very short query interval combined with a large number of threads can starve the read queries issued by the website. When all OSU INAM Daemon components are active, the performance of the daemon can vary; this can be mitigated by choosing a proportional core allocation for the daemon.

9.1.1. How should I divide the CPU cores between OSU INAM components?

A challenging question for an OSU INAM deployment is the proper allocation of threads, based on the number of CPU cores, to each module inside the OSU INAM Daemon. The port counter and error tables are the largest tables by storage inside MySQL, and two threads need to scan them when purging old data. As a result, it is recommended to dedicate 2 CPU cores to the purge threads to avoid performance fluctuation. Because the job scheduler module and the MPI_T handler module are invoked less often, two cores can handle them simultaneously.

Since HPC fabrics are mostly stable, the fabric discovery interval can be set to a larger value. In that case, the user can rely on link error detection in the network view tab for fast failure discovery.

This allows allocating more cores to the port inquiry module to maintain sub-second granularity. Given the short port inquiry interval and the importance of fine granularity, the CPU cores allocated to port inquiry should not overlap with the cores used by threads of other components; in other words, the cores allocated for port inquiry should not be oversubscribed. The remaining cores can be divided between fabric discovery and the OSU INAM web front end.

The following table summarizes our suggested core and thread allocation for the OSU INAM Daemon.

Important
Make sure to tune the number of threads in the OSU INAM Daemon config file based on the specifications of the node running OSU INAM.
Cluster size        Fabric discovery   Port inquiry (performance)   MPI_T and job thread   Purge thread
< 500               2                  2+                           1                      2
500 < size < 1000   4                  8+                           1                      2
> 1000              8                  16+                          2                      2

9.2. MySQL Tuning Parameters

For the database, the following parameters can be tuned for better performance at different cluster sizes

MySQL Tuning Parameter           Significance
innodb_flush_log_at_trx_commit   Controls the balance between strict ACID compliance for commits and higher performance
innodb_buffer_pool_size          The size in bytes of the buffer pool, the memory area where InnoDB caches table and index data
innodb_log_buffer_size           The size in bytes of the buffer that InnoDB uses to write to the log files on disk
innodb_log_file_size             The size in bytes of each log file in a log group

9.2.1. Additional Steps Required Before Changing Number or Size of InnoDB Redo Log Files

  • Set innodb_fast_shutdown to 1

mysql> SET GLOBAL innodb_fast_shutdown = 1;
  • Stop MySQL server and ensure it finalizes without errors

  • Backup old log files if desired to enable restoring state

  • Delete old log files

  • Edit my.cnf file and add the lines listed below depending on your cluster size

  • Start MySQL server
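
A hypothetical command sequence for the steps above; the service name, data directory, backup location, and redo log file names are placeholders and may differ on your system:

```shell
# 1. Tell InnoDB to do a fast but clean shutdown
mysql -e "SET GLOBAL innodb_fast_shutdown = 1;"
# 2. Stop MySQL and let it finalize without errors
systemctl stop mysqld
# 3. (Optional) back up the old redo log files
cp /var/lib/mysql/ib_logfile* /backup/location/
# 4. Delete the old redo log files
rm /var/lib/mysql/ib_logfile*
# 5. Edit my.cnf with the settings for your cluster size, then restart
systemctl start mysqld
```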

9.2.2. Proposed Additions to OSU INAM and MySQL Configuration File for Clusters of Different Sizes

We list some recommended values to be set in my.cnf file for clusters of different sizes.

Additions to my.cnf file for small clusters (<100 nodes)
innodb_flush_log_at_trx_commit=2
Additions to my.cnf file for medium sized clusters (100-500 nodes)
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=4G
innodb_log_buffer_size=16M
innodb_log_file_size=256M
Additions to my.cnf file for large clusters (>500 nodes)
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=16G
innodb_log_buffer_size=32M
innodb_log_file_size=512M
Additions to /etc/osu-inam/osu-inamd.conf file for all cluster sizes
#The number of records to be inserted together during bulk insert.
OSU_INAM_DATABASE_BULK_SIZE=100

9.3. InfluxDB Tuning Parameters

For InfluxDB, we strongly recommend always using bulk insertion with a bulk size of 10K records. We understand that not all tables benefit from a 10,000-record batch insertion size, but this tuning helps when gathering metrics from MPI-enabled jobs.

Additions to /etc/osu-inam/osu-inamd.conf file for optimized insertion of data into influxDB
## Use bulk inserts into db
OSU_INAM_DATABASE_BULK_ACTIVE=1

## Number of records for bulk inserts
OSU_INAM_DATABASE_BULK_SIZE=20000

We suggest the following changes for cluster sizes bigger than 1,000 nodes to apply for influxDB configuration. Please note that you must restart influxDB service for the changes to take effect. Note that the default location of influxDB configuration file is /etc/influxdb/influxdb.conf

Additions to influxdb.conf file for tuning influxDB
[coordinator]
  # The default time a write request will wait until a "timeout" error is
  # returned to the caller.
  # If you keep seeing timeout in your influxDB logs please increase the value.
   write-timeout = "2m"

[retention]
  # Determines whether retention policy enforcement is enabled.
  enabled = true

  # The interval of time when retention policy enforcement checks run.
  check-interval = "30m"

[http]

  # The maximum size of a client request body, in bytes. Setting this value to 0 disables the limit.
   max-body-size = 50000000

[continuous_queries]
  # Determines whether the continuous query service is enabled.
  enabled = true

  # Controls whether queries are logged when executed by the CQ service.
  log-enabled = true

  # Controls whether queries are logged to the self-monitoring data store.
  query-stats-enabled = true

  # interval for how often continuous queries will be checked if they need to run
  run-interval = "1s"

# Make sure to restart the influxDB service after applying these changes

10. FAQ and Troubleshooting with OSU INAM

Based on our experience and feedback we have received from our users, here we include some of the problems a user may experience and the steps to resolve them. If you are experiencing any other problem, please feel free to contact us by sending an email to mvapich-help@cse.ohio-state.edu

10.1. General Questions and Troubleshooting

10.1.1. Do I need PhantomJS to run INAM?

PhantomJS is not required for functionality of OSU INAM. It is optional and helps to accelerate rendering of the network view page.

10.1.2. Install OSU INAM to a specific location

OSU INAM RPMs are relocatable. Please use the --prefix option during RPM installation to install OSU INAM into a specific location. An example is shown below:

rpm -ivh --prefix <specific-location or $OSU_INAM_INSTALL_PREFIX> osu-inam-mysql-1.0-1.el7.x86_64.rpm

10.1.3. Where can I find the log messages generated by OSU INAM?

OSU INAM will push all the log messages it generates to ‘/var/log/messages’

10.1.4. Why is the web server taking a long time to load?

OSU INAM uses PhantomJS for caching the rendered network graph with the aim of speeding up subsequent deployments. PhantomJS is not required for functionality of OSU INAM. This caching happens when the web application is deployed for the first time. Please refer to Speed up the network map rendering by using PhantomJS (Section 9.2) for more details.

10.1.5. Why is the web server showing network view mapping buggy?

Please check the database to see if the number of nodes equals the number of compute nodes plus switches in your network. If the numbers match, please delete the PhantomJS cache file and restart the web application. If the issue still persists, please contact us and include your configuration files and the logs of the OSU INAM Daemon and osuinamweb. If the numbers do not match, then the problem is with the OSU INAM Daemon. Use the appropriate debugging level and contact us if necessary.

10.1.6. I have installed PhantomJS, but my webpage is still rendering very slowly

Here we list some possible reasons why the webpage rendering can take more time than expected even though PhantomJS has been installed correctly.

  • Incorrect permissions to the directories

    • The user running the web app should be able to write to and read from the directory pointed to by phantomjs.filedir

  • Using incorrect inam.js file

    • The phantomjs.runjs variable in /etc/osu-inam.properties file should point to the inam.js file included in the tarball

  • vis.js and inam.js not present in the same directory

    • The vis.js file and the inam.js file should be in the same directory
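The first and third checks above can be scripted. As a sketch, the properties file below is fabricated for illustration; on a real deployment, point PROPS at /etc/osu-inam.properties instead:

```shell
# Fabricated sample properties file for illustration only; on a real
# system set PROPS=/etc/osu-inam.properties and skip the setup lines.
PROPS=/tmp/osu-inam.properties
mkdir -p /tmp/inam-demo
touch /tmp/inam-demo/inam.js /tmp/inam-demo/vis.js
printf 'phantomjs.filedir=/tmp/inam-demo\nphantomjs.runjs=/tmp/inam-demo/inam.js\n' > "$PROPS"

# Check: the web-app user must be able to read/write phantomjs.filedir.
FILEDIR=$(awk -F= '$1=="phantomjs.filedir"{print $2}' "$PROPS")
[ -r "$FILEDIR" ] && [ -w "$FILEDIR" ] && echo "filedir permissions OK"

# Check: vis.js must sit in the same directory as inam.js.
RUNJS=$(awk -F= '$1=="phantomjs.runjs"{print $2}' "$PROPS")
[ -f "$(dirname "$RUNJS")/vis.js" ] && echo "vis.js co-located OK"
```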

Please refer to the Speed up the network map rendering by using PhantomJS (Section 9.2) section for more details on how to correctly setup PhantomJS for use with OSU INAM.

10.1.7. Does OSU INAM support any other job scheduler besides SLURM or PBS?

At present, OSU INAM only supports SLURM and PBS (Torque). We have plans to add support for other job schedulers in the future.

10.1.8. Will OSU INAM work without a supported scheduler?

OSU INAM has been designed so that features that do not depend on the job scheduler (e.g., viewing the network counters) will work even without a supported job scheduler. However, the live pages and job-related pages will not work.

10.1.9. What is the difference between port counters and process counters?

Port counters are the InfiniBand counters and are considered hardware counters. Process counters are software counters provided through MVAPICH2-X. Users should use the appropriate version of the MVAPICH2-X RPM built with support for advanced features in order to use this feature.

10.1.10. In what order should I start and stop OSU INAM and related services?

Please use the following order while starting OSU INAM and related services:

  • Create the database

  • Start up the OSU INAM daemon

  • Once the nodes and links tables are populated by the OSU INAM daemon, deploy the web application

Please use the following order while stopping OSU INAM and related services:

  • Stop the web application

  • Stop the OSU INAM daemon

  • Destroy the database

    • This step is only required if you do not want to use OSU INAM again; otherwise, you can skip it.
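The ordering above can be captured in a small wrapper script. This is only a sketch: the systemd unit names (mysqld, osu-inamd, osuinamweb) are assumptions, so substitute the names used at your site. The script is written to a file and syntax-checked rather than executed here:

```shell
# Sketch of a start/stop wrapper encoding the service order above.
# Unit names mysqld, osu-inamd, and osuinamweb are assumptions.
cat > /tmp/inam-ctl.sh <<'EOF'
#!/bin/sh
case "$1" in
  start)
    systemctl start mysqld       # 1. database first
    systemctl start osu-inamd    # 2. daemon populates the nodes/links tables
    systemctl start osuinamweb   # 3. web application last, once tables exist
    ;;
  stop)
    systemctl stop osuinamweb    # reverse order when shutting down
    systemctl stop osu-inamd
    ;;
  *) echo "usage: $0 start|stop" ;;
esac
EOF
chmod +x /tmp/inam-ctl.sh
sh -n /tmp/inam-ctl.sh && echo "script OK"
```

Note that the sketch does not wait for the daemon to finish populating the nodes and links tables before starting the web application; on a real system, verify the tables are populated first.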

10.1.11. How can I control the size of the database?

OSU INAM can automatically purge from the database any data older than a user-defined retention period. The parameter OSU_INAM_DATA_RETENTION_PERIOD controls this, and you can set it to any desired value. By default, it is set to seven days; you can reduce it to a lower value, such as one day.

There is another parameter, OSU_INAM_PURGE_QUERY_INTERVAL, that tells the daemon how frequently it should check for older data. The default value is 3600 seconds. You can modify this as well.

Once you have changed these values, please restart the daemon so that they take effect.
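For example, to keep only one day of data and have the daemon check for purgeable data every 30 minutes, the two parameters could be set as follows (the exact value format, e.g. whether the retention period is expressed in days, should be verified against the existing entries in your configuration file):

```
OSU_INAM_DATA_RETENTION_PERIOD=1      # retain one day of data (default: seven days)
OSU_INAM_PURGE_QUERY_INTERVAL=1800    # check for purgeable data every 1800 s (default: 3600)
```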

10.1.12. Where can I find the MV2_TOOL_INFO_FILE_PATH file to pass to MVAPICH2-X jobs, and how can I modify it?

The file is generated by default in the OSU_INAM_install_path/etc/ folder. By default, the install location is /opt/osu-inam/etc/. You can copy and modify the configuration file that you pass to MVAPICH2-X jobs. Please note that you must not change the MV2_TOOL_QPN and MV2_TOOL_LID parameters. The rest of the parameters can be changed based on your job.
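As a sketch, the copy-and-modify workflow might look like the following. The file name and contents below are fabricated for illustration (the real generated file will differ); the point is that reporting knobs such as MV2_TOOL_REPORT_PVARS may be edited, while MV2_TOOL_QPN and MV2_TOOL_LID must be left exactly as generated:

```shell
# Fabricated stand-in for the generated tool-info file; on a real system
# you would instead copy the file generated under /opt/osu-inam/etc/.
cat > /tmp/my-tool-info <<'EOF'
MV2_TOOL_QPN=1234
MV2_TOOL_LID=7
MV2_TOOL_REPORT_PVARS=0
MV2_TOOL_REPORT_LUSTRE_STATS=0
EOF

# Edit only the reporting knobs; never touch MV2_TOOL_QPN / MV2_TOOL_LID.
sed -i 's/^MV2_TOOL_REPORT_PVARS=.*/MV2_TOOL_REPORT_PVARS=1/' /tmp/my-tool-info
grep '^MV2_TOOL' /tmp/my-tool-info
```

The modified copy would then be the file you point MV2_TOOL_INFO_FILE_PATH at when launching the job.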

10.1.13. Why is there no MPI primitive information reported?

First, please check the pvar_info table inside MySQL and see if it contains data. If there is no data in the pvar_info table, then it is possible that you forgot to set MV2_TOOL_REPORT_PVARS in the file that you passed via MV2_TOOL_INFO_FILE_PATH for your MVAPICH2-X jobs. Make sure that you have MV2_TOOL_REPORT_PVARS=1.

10.1.14. Why is there no Lustre traffic reported?

First, please check the lustre_stats table inside MySQL and see if it contains data. If there is no data in the lustre_stats table, then it is possible that you forgot to set MV2_TOOL_REPORT_LUSTRE_STATS in the file that you passed via MV2_TOOL_INFO_FILE_PATH for your MVAPICH2-X jobs. By default, Lustre traffic is not reported; the user should enable it by setting MV2_TOOL_REPORT_LUSTRE_STATS=1.

10.1.15. Why does INAMD not exit after being shut down?

INAMD checks for exit signals at fixed intervals specified by OSU_INAM_PERF_COUNTER_QUERY_INTERVAL (default value: 30 seconds). Thus, a shutdown command may not take effect immediately.

10.1.16. I have errors on different pages with MySQL incompatibility with sql_mode=only_full_group_by.

If you are getting any error messages saying "Expression #N of SELECT list is not in GROUP BY … this is incompatible with sql_mode=only_full_group_by" on any of the web pages, it means that the MySQL server is configured to disallow GROUP BY queries that have non-aggregated columns in the select list. INAM requires that the MySQL server be configured without this mode. More information about changing the mode in MySQL 5.7 can be found in the MySQL documentation.
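One common, persistent fix is to set sql_mode without ONLY_FULL_GROUP_BY in my.cnf and restart MySQL. The fragment below is a sketch: the modes listed are only an example, so keep whichever other modes your server already uses and merely drop ONLY_FULL_GROUP_BY from the list:

```
[mysqld]
# sql_mode with ONLY_FULL_GROUP_BY removed; the remaining modes are examples
sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
```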