Performance Troubleshooting

Use the methods in the following table to troubleshoot performance issues. When all data has been gathered, include the information in a Jira ticket assigned to R&D.

| Performance Test | Windows | Linux / Solaris |
| --- | --- | --- |
| IO usage | Process Explorer | sar -d, sar -d -f /var/adm/sa/sadd (where dd is the day of month), iostat -zdxnk 2 |
| CPU usage | Process Explorer, Task Manager | sar -u, sar -f /var/adm/sa/sadd (where dd is the day of month), top, mpstat -P ALL 2 |
| Memory usage | Process Explorer | vmstat, sar -r -f /var/adm/sa/sadd (where dd is the day of month) |
| Ping | ping dbserver, ping -l 8000 dbserver, ping -l 32000 dbserver | ping -c5 dbserver, ping -c5 -s8000 dbserver, ping -c5 -s32000 dbserver |
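
The historical sar commands in the table all read a daily data file whose name ends in the day of month. As a minimal sketch, assuming the /var/adm/sa location shown in the table and zero-padded day numbers, the commands for a given date could be assembled like this. The function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
from datetime import date

# Directory taken from the table; other systems may store sar data elsewhere.
SA_DIR = "/var/adm/sa"


def sar_history_commands(day: date) -> dict[str, str]:
    """Build the historical sar command lines for the given calendar day."""
    # The data file is named sa<dd>, where dd is the zero-padded day of month.
    sa_file = f"{SA_DIR}/sa{day.day:02d}"
    return {
        "IO usage": f"sar -d -f {sa_file}",
        # -u (CPU utilization) is written out here; the table omits it
        # because it is sar's default report.
        "CPU usage": f"sar -u -f {sa_file}",
        "Memory usage": f"sar -r -f {sa_file}",
    }


if __name__ == "__main__":
    for test_name, command in sar_history_commands(date.today()).items():
        print(f"{test_name}: {command}")
```

Printing the commands instead of running them keeps the sketch safe to try on a workstation before it is used on the server being investigated.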

Latency Requirements

  • Max 0.2 ms with a 64-byte packet size
  • Max 1.0 ms with a 32K packet size

Additional latency information is included in the Server Network Latency topic here.
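
The latency requirements above can be checked against the summary line that ping prints on Linux and Solaris. The following is a minimal sketch, assuming the summary contains a "min/avg/max" label followed by slash-separated millisecond values; the function names and the sample text are illustrative and are not part of STEP.

```python
import re

# Limits from the Latency Requirements list: packet size in bytes -> max average ms.
LATENCY_LIMITS_MS = {64: 0.2, 32 * 1024: 1.0}

# Matches summary lines such as "rtt min/avg/max/mdev = 0.101/0.152/0.210/0.041 ms".
_RTT_SUMMARY = re.compile(r"min/avg/max[^=]*=\s*([\d.]+)/([\d.]+)/([\d.]+)")


def parse_avg_rtt_ms(ping_output: str) -> float:
    """Return the average round-trip time in ms from ping's summary line."""
    match = _RTT_SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no round-trip summary line found in ping output")
    return float(match.group(2))


def meets_latency_requirement(avg_ms: float, packet_bytes: int) -> bool:
    """Compare an average latency with the documented limit for that packet size."""
    return avg_ms <= LATENCY_LIMITS_MS[packet_bytes]


if __name__ == "__main__":
    # Illustrative text only, not a real measurement.
    sample = "rtt min/avg/max/mdev = 0.101/0.152/0.210/0.041 ms"
    avg = parse_avg_rtt_ms(sample)
    print(avg, meets_latency_requirement(avg, 64))
```

Windows ping reports its summary in a different format, so this sketch would need a separate pattern for output gathered on a Windows server.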

Identify the Problem Layer

  1. From the Start Page, click the STEP System Administration button and supply the login credentials.
  2. On the Profiler tab, set the From and To date / time parameters to define when the performance issues were noticed.
  3. Click the Generate button to load the data.
  4. In the results section, expand the service methods to determine the issue:
  • 'com.stibo.customer' indicates that Custom Solutions should be involved.
  • a long-running SQL statement indicates that it is waiting for the Oracle database. See the Database Server section below.
  • 'com.stibo.ddsconnector' or 'com.stibo.idsconnector' indicates it is waiting for the DTP server. See the DTP Server section below.
  • a problem that occurs for only one specific user or group of users indicates that the problem is related to the user or workstation. See the User Interface / Client Application section below.
  • If none of the previous scenarios fit your issue, consider that the problem might be on the application server. See the Application Server section below.
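
The triage rules in step 4 can be written down as a small decision function. This is a sketch only: the input shape (a method name plus two flags) is hypothetical, and the order in which the rules are tested is an assumption, since the list above does not state a precedence.

```python
DTP_PREFIXES = ("com.stibo.ddsconnector", "com.stibo.idsconnector")


def problem_layer(method_name: str,
                  long_running_sql: bool = False,
                  single_user_only: bool = False) -> str:
    """Map one expanded profiler entry to the section that should be followed."""
    if "com.stibo.customer" in method_name:
        return "Custom Solutions"
    if long_running_sql:
        return "Database Server"
    if any(prefix in method_name for prefix in DTP_PREFIXES):
        return "DTP Server"
    if single_user_only:
        return "User Interface / Client Application"
    # None of the specific scenarios matched.
    return "Application Server"


if __name__ == "__main__":
    # Hypothetical method names, used only to exercise the rules.
    print(problem_layer("com.stibo.customer.acme.Importer"))
    print(problem_layer("com.stibo.idsconnector.Render"))
    print(problem_layer("com.stibo.core.Query", long_running_sql=True))
```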

Database Server

After reviewing the profiling data (as defined above), when the issue is a long-running SQL statement, follow these steps.

  1. From the Start Page, click the STEP System Administration button and supply the login credentials.
  2. On the Activity tab, set the Duration and Date / Time parameters according to when the performance issues were noticed.
  3. Click the Fetch data button to load the data.
  4. In the Details section, on the SQL tab, identify the same SQL statement with the same duration.
  5. Attach the SQL statement, including the execution plan, to the Jira issue.

For more information, see the Database Long-Lasting SQL Queries topic here.

Additional Steps

With the DB ToolBox (typically saved in the '/opt/stibo/step/admin/app-server-toolbox/' folder), you may be able to access the additional information outlined below.

  • Check if the system waits for locks in the database.
  • Check the CPU usage of the database server.
  • Check the I/O usage on the database server.
  • Examine the alert log for errors, as defined in the Database Server Alert Log topic (here).

Record this information using the Technical Support Tasks table in the Troubleshooting Checklists topic here.

DTP Server

Perform the following checks:

  • Check that the configuration for QuarkXPress Server / Adobe InDesign Server is using asset push. For more information, see the Asset Push topic in the Digital Assets documentation (here).
  • Check the logging configuration. The debug level should only be used when required as it will have a negative performance impact. For more information, see the Server Log File Settings topic (here).
  • Compare the number of DTP server renders with license, hardware, and user load.

In case of emergency, use the restart action command: restart dtp-servers.

The services representing sidecars, or even the server itself, can also be restarted. Check the services after any restart.

Record this information using the Technical Support Tasks table in the Troubleshooting Checklists topic here.

Application Server

Perform the following checks:

  • Check the sensor warnings and critical errors, as defined in the Monitoring topic within the Administration Portal documentation (here).
  • Check the memory graph, as defined in the Activity topic within the Administration Portal documentation (here).

If memory usage is constantly high with only small 'garbage collections':

  • Consider increasing heap memory.
  • Check for heap dumps on the server and upload them to a location on garm.stibo.dk (default stibosw logon).
  • Check the CPU usage via Task Manager as well as the admin portal, as defined in the Activity topic in the Administration Portal documentation (here).

If CPU usage is high, check whether GraphicsMagick image processes (gm.exe, etc.) are repeatedly starting and stopping. If so, excessive image conversions may be the cause, and increasing the size of the image caches can potentially fix this.

  • Check the thread graph, as defined in the Activity topic in the Administration Portal documentation (here).

Compare the number of threads to the number of CPU cores in the application server. If the number of threads is higher, processes queue and users experience waits.

  • Ping the database server using the ping command, as listed in the Help Desk Tasks table.
  • Run a trace route to dbserver to look for unwanted network equipment.
  • Check the number of background processes, as defined in the Analyze Background Processes topic (here).
  • Examine the IO usage of the application server. If it is high:
  • Check whether the memory on the server is used up. When more memory is in use than the real memory available, the server swaps.
  • Check if log files are filling up quickly. If so, check if debug logging is enabled, as defined in the Server Log File Settings topic (here).
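
To judge whether log files are "filling up quickly", it can help to measure how fast the log directory grows. The sketch below samples the total size of a directory twice and reports the growth rate. The directory path and the sampling interval are assumptions that must be adapted to the installation, and what counts as "quick" is left to the reader.

```python
import os
import time


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file was rotated or removed between listing and stat.
                pass
    return total


def growth_bytes_per_second(path: str, interval_s: float = 60.0) -> float:
    """Sample the directory size twice and return the growth rate."""
    before = dir_size_bytes(path)
    time.sleep(interval_s)
    after = dir_size_bytes(path)
    return (after - before) / interval_s


if __name__ == "__main__":
    # "." is a placeholder; point this at the server's log directory.
    print(f"{growth_bytes_per_second('.', interval_s=5.0):.1f} bytes/s")
```

A negative rate simply means files were rotated or deleted during the sampling window, in which case the measurement should be repeated.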

Add all of this information to the Technical Support Tasks table in the Troubleshooting Checklists topic here.

User Interface / Client Application

Perform the following checks:

  • Examine Task Manager on the user workstation while reproducing the problem.
  • Does the java process have high CPU usage (above 90 percent)? If so, the problem might be in the UI.
  • Do other processes have a high CPU usage? If so, ask the user to shut down these processes and attempt to reproduce.
  • Does the system have a memory usage beyond the real memory in the system? If so, shut down the other processes until the usage is below real memory and ask the user to reproduce.
  • Reboot the PC and start only STEP on the PC to see if it can still be reproduced.
  • Check the Network Latency indicator in the lower right corner of the STEP workbench client. If it is above 125 ms, examine the network.
  • Open the Java Control Panel, enable ‘Show console’, and reproduce the problem. Attach the console output to the Jira issue.

If these checks and adjustments do not solve the problem, add this information to the Technical Support Tasks table in the Troubleshooting Checklists topic here.