STEP Monitoring Recommended Practices

It is important to monitor STEP for reliability and performance efficiency. These are the recommended practices for monitoring STEP and integrating with third-party management system tools. The following topics are discussed in this section:

  • STEP Sensors
  • Log File Monitoring
  • Port Monitoring
  • Integrating STEP with Application Performance Monitoring (APM) Tools
  • Monitoring STEP Using Amazon Web Services
  • Monitoring STEP Using STEP's REST API

STEP Sensors

STEP Sensors provide the ability to monitor the availability and performance of STEP and are the preferred and simplest method for integrating with third-party management system tools. A wide range of STEP Sensors is available to monitor various aspects of STEP and related dependencies. For example, below is a list of available sensor categories:

  • Java heap usage
  • HTTP
  • Event queues
  • Scheduled jobs
  • STEP Web Services API
  • Security

Each sensor provides a separate URL that can be integrated with any monitoring tool that can make an HTTP request and parse the response. These sensor URLs do not require authentication and provide three output formats:

  1. Simple Status String ('OK,' 'WARNING,' and 'CRITICAL')
  2. Nagios Plugin Output
  3. Full XML

The format used will depend on the monitoring tool's level of sophistication and the ability to parse the output.

Example:

To monitor and integrate the HTTP sensor, Http-local:

  1. Use this sensor URL for simple output:
http(s)://<APPLICATION_SERVER_NAME>/admin/monitoring/Http-local/status
  2. Use this sensor URL for Nagios plugin output:
http(s)://<APPLICATION_SERVER_NAME>/admin/monitoring/Http-local/nagios
  3. Use this sensor URL for full XML output:
http(s)://<APPLICATION_SERVER_NAME>/admin/monitoring/Http-local/xml

Note: In general, the URL format used to access a given sensor is as follows: http(s)://<APPLICATION_SERVER_NAME>/admin/monitoring/<SENSOR_NAME>/<SENSOR_OUTPUT_FORMAT>, where <SENSOR_NAME> is the sensor’s name as it appears on the http(s)://<APPLICATION_SERVER_NAME>/admin/monitoring page and the <SENSOR_OUTPUT_FORMAT> is status, nagios, or xml.
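The URL pattern in the note above can be scripted by any tool that can make an HTTP request. The following Python sketch builds a sensor URL and classifies a simple-status response. The function names and the example host are illustrative only, and the sketch assumes that the simple-status response body begins with one of the three state strings; anything else is reported as UNKNOWN.

```python
from urllib.parse import quote
from urllib.request import urlopen

VALID_FORMATS = ("status", "nagios", "xml")
KNOWN_STATES = ("OK", "WARNING", "CRITICAL")


def sensor_url(server, sensor_name, output_format="status", scheme="https"):
    """Build a sensor URL following the documented pattern."""
    if output_format not in VALID_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    return f"{scheme}://{server}/admin/monitoring/{quote(sensor_name)}/{output_format}"


def parse_simple_status(body):
    """Return the state found at the start of a simple-status response.

    Assumption: the body starts with OK, WARNING, or CRITICAL.
    Unrecognized content is reported as UNKNOWN rather than guessed.
    """
    text = body.strip().upper()
    for state in KNOWN_STATES:
        if text.startswith(state):
            return state
    return "UNKNOWN"


def check_sensor(server, sensor_name, timeout=10):
    """Fetch a sensor's simple status; sensor URLs need no authentication."""
    with urlopen(sensor_url(server, sensor_name), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return parse_simple_status(body)
```

For example, check_sensor("step.example.com", "Http-local") would request the simple-status URL for the Http-local sensor on a hypothetical server named step.example.com.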

You can review the available sensors or check the status of a sensor by logging into the STEP Administration Portal, selecting the Monitoring tab, and clicking the Sensors for external monitoring link at the bottom of the page. For further documentation on accessing the Administration Portal, or on STEP Sensors in general, refer to the STEP Administration Portal section.

Although sensors cannot be customized, it is possible to modify the thresholds of certain categories of sensors. Thresholds are set by adding properties and values to STEP’s property files. See the STEP System Configuration section for additional details on this topic.

The table below provides a matrix of configurable sensors and their associated properties:

Properties (Listed by Category)                               Threshold / Item (Default)

Event Queue Sensor
Monitor.EventQueue.NoOfUnreadEvents.Critical                  100000
Monitor.EventQueue.NoOfUnreadEvents.Warning                   80000

File System Sensor
Monitor.FileSystem.Critical                                   150 ms
Monitor.FileSystem.Warning                                    50 ms

STEP Scheduler Sensor
Monitor.ScheduledBackgroundProcess.MaxEndedAgeInDays          -1
Monitor.ScheduledBackgroundProcess.ReportWaitingAsWarning     TRUE

Oracle Sensor
Oracle.Sensor.AlertLog.Critical                               None
Oracle.Sensor.AlertLog.Warning                                None
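Before adding threshold overrides to a property file, it can help to sanity-check them. The Python sketch below parses key=value lines and flags any Warning threshold that is not below its Critical counterpart, falling back to the defaults from the table above. It rests on several assumptions that are not confirmed by this guide: that overrides are written as plain key=value lines, that the File System thresholds are entered as bare integers (milliseconds), and that a Warning value at or above the Critical value is unintended. The helper names are hypothetical.

```python
# Defaults taken from the table above (File System values in milliseconds).
DEFAULTS = {
    "Monitor.EventQueue.NoOfUnreadEvents.Warning": 80000,
    "Monitor.EventQueue.NoOfUnreadEvents.Critical": 100000,
    "Monitor.FileSystem.Warning": 50,
    "Monitor.FileSystem.Critical": 150,
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_threshold_pairs(props):
    """Return a message for each Warning threshold not below its Critical one.

    Assumption: values are plain integers; missing keys use the defaults.
    """
    problems = []
    prefixes = sorted({key.rsplit(".", 1)[0] for key in DEFAULTS})
    for prefix in prefixes:
        warning_key = f"{prefix}.Warning"
        critical_key = f"{prefix}.Critical"
        warning = int(props.get(warning_key, DEFAULTS[warning_key]))
        critical = int(props.get(critical_key, DEFAULTS[critical_key]))
        if warning >= critical:
            problems.append(
                f"{prefix}: Warning ({warning}) should be below Critical ({critical})"
            )
    return problems
```

An empty result means every checked pair is ordered as expected; the check does not validate the property names themselves against STEP.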

Log File Monitoring

All STEP application logging is managed in a file in a central location on each application server. Housekeeping settings (file rotation, size of log files, and history) are configurable through STEP properties. By default, STEP keeps 20 log files in total, including the current log, and each one can be a maximum of 10 MB in size.

The default location for STEP logs is <STEP HOME>/diag/logs. The most current log is step.0.log, and step.1.log through step.19.log are the rotated logs.

Logging verbosity is set to 'INFO' by default, which means that all informational, warning, and severe actions will be logged.

Available STEP properties to manage log files

  1. Log Verbosity
Log.Level=<FINEST|FINER|FINE|CONFIG|INFO|WARNING|SEVERE>

Default: INFO

  2. Log Location
Log.Root=<PATH>

Default: diag/logs

  3. Log Size
Log.Size=<INTEGER_VALUE> (in MB)

Default: 10 MB

  4. Log Count
Log.Count=<INTEGER_VALUE>

Default: 20
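As a rough illustration of how these settings interact, the Python sketch below lists the log file names implied by Log.Count, estimates the maximum disk footprint from Log.Count and Log.Size, and counts lines in a log file that mention a given level. The helper names are hypothetical, and the line-counting function assumes that level names such as WARNING and SEVERE appear literally in each log line, which should be verified against an actual STEP log before relying on it.

```python
def rotated_log_names(count=20):
    """Names of the current log (step.0.log) and its rotated copies."""
    return [f"step.{index}.log" for index in range(count)]


def max_log_footprint_mb(count=20, size_mb=10):
    """Upper bound on disk space used by the logs, in MB."""
    return count * size_mb


def count_levels(log_path, levels=("WARNING", "SEVERE")):
    """Count lines that mention each level name.

    Assumption: the level name appears as plain text in the log line.
    """
    counts = {level: 0 for level in levels}
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            for level in levels:
                if level in line:
                    counts[level] += 1
    return counts
```

With the default settings, max_log_footprint_mb() gives 200, that is, 20 files of at most 10 MB each.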

Port Monitoring

The table below represents key TCP ports that are used by STEP and are available for monitoring:

TCP Port (Default)     Component                       Server

80                     Apache                          STEP Application Server
443                    Apache                          STEP Application Server
9870                   STEP Application                STEP Application Server
5636                   STEP Cluster                    STEP Application Server
22                     Secure Shell (used by SPOT)     STEP Application Server
1521                   Oracle Database                 STEP Database
9090-9093              InDesign Sidecar                InDesign Server
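A basic availability check for these ports is a TCP connection attempt. The following Python sketch uses only the standard library; the function names are illustrative. A successful connection shows only that something is listening on the port, not that the component behind it is healthy, so this complements rather than replaces the STEP Sensors.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Check several ports on one host; returns {port: True or False}."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Because the ports in the table live on different servers, each host would be checked with its own list, for example check_ports("<STEP_APPLICATION_SERVER>", [80, 443, 9870, 5636, 22]) for the application server.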

Integrating STEP with Application Performance Monitoring (APM) Tools

Integrating STEP with APM tools, such as New Relic and Compuware's Dynatrace, typically requires an external Java agent to be loaded alongside STEP's JVM. This can be achieved by setting STEP's Standalone.JVMArgs property to reference the agent, its path, and any required parameters.

Example:

Integrate STEP with New Relic APM

Add the following property to the config.properties or sharedconfig.properties file:

Standalone.JVMArgs= -javaagent:/<PATH>/newrelic.jar

Important: Stibo Systems does not provide support for external agents loaded alongside STEP's JVM. Such agents might become unstable or cause instability in the STEP application itself.

Monitoring STEP Using Amazon Web Services

Amazon CloudWatch is a monitoring service for Amazon Web Services (AWS) resources and hosted applications that provides real-time, system-wide visibility into resource utilization, application performance, and overall operational health. Amazon CloudWatch can monitor AWS resources by providing out-of-the-box metrics for Amazon EC2 instances, Amazon S3 buckets, and Amazon RDS database instances, as well as the ability to generate and collect custom application metrics, either directly or indirectly via collected log files.

With IT departments provisioning more and more services within AWS, Stibo Systems has recognized the need to tightly integrate its current application monitoring and performance capabilities with AWS’s CloudWatch offering by providing an AWS toolkit. The Stibo AWS Toolkit is a suite of utilities designed to integrate STEP’s application metrics, application logs, database tablespace usage, and statistics with CloudWatch’s functions for storing and visualizing metrics and defining alert-based notifications.

The Stibo AWS Toolkit offers the following functionalities:

  • The ability to consume key log files generated by the STEP application into CloudWatch logs, enabling monitoring in near real time across multiple STEP instances for diagnosing and centralizing log management.
  • A mechanism for consuming Oracle's alert log into CloudWatch logs for centralized log management across multiple Oracle instances.
  • Custom metrics based on STEP's database tablespace usage that can be collected and viewed through CloudWatch.
  • A standard set of filtering metrics that provides custom metrics from STEP-generated log streams for collecting, visualizing, and defining alarms.
  • An AWS Lambda function for remotely monitoring STEP's sensors.
  • An architecture that enables the toolkit to run on STEP's application server and to be delivered, applied, and upgraded seamlessly using Stibo's Patch Operation Tool (SPOT).

For further information on using the AWS Toolkit, see the Integrating STEP with AWS CloudWatch document.

Monitoring STEP Using STEP's REST API

Managing STEP Integration Endpoints (IEP) via external enterprise schedulers, such as BMC Control-M and Tivoli Workload Scheduler, is common practice within organizations. These organizations want the ability to schedule, control, and monitor batch processing where there are often dependencies between STEP and upstream / downstream systems.

Out of the box, STEP provides the ability to invoke an IEP remotely using STEP’s REST API. However, this must be done programmatically (not covered by this guide) by combining several API calls in sequence: invoking the required IEP queue, and then continuously monitoring the status of the triggered background process until it completes.

Note: STEP's REST API is a licensed component.

The following example details the required REST API requests to invoke an Integration Endpoint, gather information about the background process (BGP), and monitor the status of said process by its ID. For added security, STEP's REST API requires authentication.

To access a STEP server’s REST API, the required URL is formed by the protocol (http or https), the STEP application server name, and the corresponding REST API request URI, as seen below:

http(s)://<STEP_APPLICATION_SERVER>/<RESTAPI_URI>

Invoking an integration endpoint

When invoking an IEP, the <RESTAPI_URI> should be of this format:

restapi/integrationendpoints/<QUEUE_NAME>/invoke?context=<CONTEXT>

Note: Adding a context may not be required and depends on the configuration.

Retrieve background process information associated with integration endpoint queue

When retrieving information about an IEP’s background process, the following format should be used for the <RESTAPI_URI>:

restapi/integrationendpoints/<QUEUE_NAME>/backgroundprocesses?context=<CONTEXT>

Note: Adding a context may not be required and depends on the configuration.

Retrieve the status of a specific background process by ID

When checking the status of an IEP’s background process, the <RESTAPI_URI> format should be as follows:

restapi/bgpinstance/<BGP_ID>/status?context=<CONTEXT>

Note: Adding a context may not be required and depends on the configuration.
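The three requests above could be combined along the following lines. This Python sketch only builds the documented URIs and provides a generic polling loop; it is not a definitive client. Several points are assumptions that this guide does not confirm: HTTP Basic is used here as the authentication scheme, the HTTP method for each request is left to the caller, and the format of the status response is unknown, so the caller supplies the is_finished test. All function names are hypothetical.

```python
import base64
import time
from urllib.parse import quote, urlencode
from urllib.request import Request


def rest_url(server, uri, context=None, scheme="https"):
    """Combine protocol, server name, and REST API URI; context is optional."""
    url = f"{scheme}://{server}/{uri}"
    if context:
        url += "?" + urlencode({"context": context})
    return url


def invoke_uri(queue_name):
    return f"restapi/integrationendpoints/{quote(queue_name, safe='')}/invoke"


def background_processes_uri(queue_name):
    return f"restapi/integrationendpoints/{quote(queue_name, safe='')}/backgroundprocesses"


def status_uri(bgp_id):
    return f"restapi/bgpinstance/{quote(bgp_id, safe='')}/status"


def basic_auth_request(url, user, password, method="GET"):
    """Build an authenticated request (assumption: HTTP Basic is accepted)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, method=method, headers={"Authorization": f"Basic {token}"})


def poll_until(fetch_status, is_finished, interval=30.0, max_attempts=120,
               sleep=time.sleep):
    """Call fetch_status() until is_finished(status) is true, then return it.

    fetch_status and is_finished are supplied by the caller because the
    status response format is not described in this guide.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_finished(status):
            return status
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"background process not finished after {max_attempts} checks")
```

An enterprise scheduler job would typically send the invoke request, obtain the background process ID from the background process listing, and then pass a function that requests the status URL to poll_until, sending each request with urllib.request.urlopen or an equivalent HTTP client.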