Performance Diagnosis

Info

A monitoring system is strongly recommended to track the environment health and the quality of services.

Diagnostic Utility

Scope Name Used for
OCP OpenShift Monitoring Service OpenShift Cluster and MAS
DB2 IBM DSM DB2 Historical and Realtime Troubleshooting
DB2 db2top DB2 Realtime Troubleshooting
DBTest DBTest An utility to test db network latency and fetching time
Oracle AWR, StatsPack Historical Troubleshooting
JVM IBM Support Assistant Heap Dump and GC Log Analysis
JVM MAT JVM Dump Analysis
Maximo PerfMon - Maximo UI Activity Tracing
- Note: Enabling PerfMon may significantly degrade server performance.
- Recommend for a single user with Dev/Test env only
MongoDB mongotop MongoDB Realtime Troubleshooting
HAR HTTP Archive Viewer HAR Analysis - for web page and client side (browser) performance
SQL Poor SQL Online SQL Formatter
SQL Squirrl Universal SQL Client
SSL SSL Shopper Online certificate decode tool
OS top Process and thread level analysis, hotspot analysis - top is available in most containers and on OCP worker nodes
OS sar a system command be used to monitor system resources like cpu, memory, disk, network...
OCP oc debug node/<node name> Worker node debugging

Factors in system performance

System performance depends on more than the applications and the database. The network architecture affects performance. Application server configuration can hurt or improve performance. The way that you deploy Maximo across servers affects the way the products perform. Many other factors come into play in providing the end-user experience of system performance. Subsequent sections in this paper address the following topics:

  • System architecture setup including OCP, Instance Type, Storage
  • App and DB server configuration
  • Network issues
  • Bandwidth
  • Load balancing
  • Database tuning
  • SQL tuning
  • Scheduled tasks (cron tasks)
  • Reporting
  • Integration with other systems using the integration framework
  • Troubleshooting

Performance Check List

  • check node status. e.g. any NOT Ready worker nodes
  • if there is any pod or node cpu, memeory usage approaching to the limit?
  • if there is any pod restarted many time recently?
  • if there is any JVM Heapdump dump?
  • if there is any JVM Hung Thread
  • if there is any node or pod with a high system or IO wait (20%)?
  • if there is any node memory, disk or pid pressure?
  • if the response time is high (over 2 sec)?
  • if any long running (over 2 sec) or high cpu cost query?
  • if there is network bottleneck (e.g. load-balancer)
  • is app server or db server busy?
    • if app server is busy
      • check the request, limit value for cpu, memory
      • should replic memebers be increased?
    • if db server is busy
      • check cpu, memory, disk current usage and limit value
      • check any utility in the background. e.g. backup
      • check db lock
      • check if there is any high cost query
      • check disk performance