Fetching Outputs From Java Process Monitoring Tool with Icinga/Nagios
Photo by Mihai Lupascu on Unsplash
Recently, I encountered an issue when executing NRPE, a Nagios agent which runs on servers that are being monitored from Icinga’s head server. Usually NRPE-related calls should run without issues on the target server, since it is declared in the sudoers file (commonly /etc/sudoers
). In this post, I will cover an issue I encountered getting the output from jps (Java Virtual Machine Process Status Tool), which needed to be executed with root privileges.
Method
I wanted to use Icinga to get a Java process’s state (in this case, the process is named “Lucene”) from Icinga’s head server, remotely. jps
works for this, functioning similarly to the ps
command on Unix-like systems.
Usually, NRPE should be able to execute the remote process (on the target server) from Icinga’s head. In this case we are going to create a workaround through the following steps:
- Dump the Java process ID into a text file.
- Dump the running threads into another text file.
- Put item 1 and item 2 above into a single bash script.
- Create a cronjob to automatically run the bash script.
- Create an NRPE plugin to evaluate the output of item 1 and item 2.
Test
To illustrate this, I ran the intended command locally on the target server as the nagios user. Theoretically, this should emulate the NRPE call as if it was executed from Icinga’s server remotely. The file check_lucene_indexing_deprecated
was meant to demonstrate the NRPE execution failure, whereas check_lucene_indexing
is the file which is expected to run the NRPE plugin successfully. The paths to both check_lucene_indexing_deprecated
and check_lucene_indexing
were already declared in /etc/sudoers
file on the target machine.
To show the differences, I ran two different scripts from the Icinga’s head server.
Here is the output from both local script executions: first as the nagios user, then as the root user:
# sudo -s -u nagios ./check_lucene_indexing_deprecated
CRITICAL -- Lucene indexing down
# sudo -s -u root ./check_lucene_indexing_deprecated
OK -- 2 Lucene threads running
As you can see, the script worked fine running as root, but not as the nagios user.
Let’s run the scripts from Icinga’s head server:
# /usr/lib64/nagios/plugins/check_nrpe -t 5 -H <the target server’s FQDN> -c check_lucene_indexing_dep
CRITICAL -- Lucene indexing down (0 found)
# /usr/lib64/nagios/plugins/check_nrpe -t 5 -H <the target server’s FQDN> -c check_lucene_indexing
OK -- 2 Lucene threads running
In the background, we can see different output from running jps
on the target server using the root user compared to the nagios user. Let’s say I want to check the jps
process ID (PID):
# sudo -s -u nagios jps -l
29112 sun.tools.jps.Jps
And as root:
# jps -l
7541 /usr/share/jetty9/start.jar
29131 sun.tools.jps.Jps
The point of running the jps -l
command is to get the process ID of /usr/share/jetty9/start.jar
, which is 7541. However, as indicated above, the nagios user’s execution did not display the intended result, but the root user’s did.
The workaround
We can check the existence of the process ID by dumping it into a text file and letting the NRPE plugin read it instead.
In order to get NRPE to fetch the current state of the process, we will create a cronjob; in our case it will be executed every 10 minutes. This script will dump the PID of the Java process into a text file and later NRPE will run another script which will analyze the contents of the text file.
Cronjob, creating dump files
*/10 * * * * /root/bin/fetch_lucene_pid.sh
The cron script contains the following details:
PID_TARGET=/var/run/nrpe-lucene.pid
THREADS_TARGET=/var/run/nrpe-lucene-thread.txt
/usr/bin/jps -l | grep "start.jar" | cut -d' ' -f1 1>$PID_TARGET 2>/dev/null
PID=$(cat $PID_TARGET)
re='^[0-9]+$'
if [[ -z $PID ]] || ! [[ $PID =~ $re ]] ; then
exit 0
fi
THREADS=$(/usr/bin/jstack $PID | grep -A 2 "ProjectIndexer\|ConsultantIndexer" | grep -c "java.lang.Thread.State: WAITING (parking)")
echo $THREADS > $THREADS_TARGET
So instead of running the jps
command directly as nagios, we let the system run jps
(as root) and dump the result into a file. Our NRPE-based script will read the output later and feed the result to the dashboard.
NRPE plugin file, reading values generated from the cronjob
So we will take a look at what was written in the successfully executed Bash script (that is, check_lucene_indexing
).
The NRPE plugin file, check_lucene_indexing
, contains the following script:
#!/bin/bash
PID_TARGET=/var/run/nrpe-lucene.pid
THREADS_TARGET=/var/run/nrpe-lucene-thread.txt
PID=$(cat $PID_TARGET)
THREADS=$(cat $THREADS_TARGET)
re='^[0-9]+$'
if [[ -z $PID ]] || ! [[ $PID =~ $re ]] ; then
echo "CRITICAL -- Lucene indexing down (a)"
exit 2
fi
if [ $THREADS -eq 2 ]
then
echo "OK -- $THREADS Lucene threads running"
exit 0
else
echo "CRITICAL -- Lucene indexing down (b)"
exit 2
fi
From the NRPE plugin script you can see the following text files being used:
PID_TARGET=/var/run/nrpe-lucene.pid
THREADS_TARGET=/var/run/nrpe-lucene-thread.txt
PID_TARGET
contains the process’s PID, which I used to determine whether the intended process is running or not.
THREADS_TARGET
contains the number of the Java threads which are currently running.
The following is the content of the check_lucene_indexing_deprecated
script:
#!/bin/bash
PID=$(/usr/bin/jps -l | grep "start.jar" | cut -d' ' -f1)
if [[ -z $PID ]]; then
echo "CRITICAL -- Lucene indexing down"
exit 2
fi
THREADS=$(/usr/bin/jstack $PID | grep -A 2 "ProjectIndexer\|ConsultantIndexer" | grep -c "java.lang.Thread.State: WAITING (parking)")
if [ $THREADS -eq 2 ]
then
echo "OK -- $THREADS Lucene threads running"
exit 0
else
echo "CRITICAL -- Lucene indexing down"
exit 2
fi
As you can see, check_lucene_deprecated
was able to get the result if it is being executed locally on the target machine - but not from the remote (Icinga’s head server). This is because jps
will provide limited results when executed as the nagios compared to the local root user. Note that I have defined the path of the script in the sudoers file prior to the script execution.
Defaults: nagios !requiretty
nagios ALL = NOPASSWD: /usr/local/lib/nagios/plugins/check_lucene_indexing
nagios ALL = NOPASSWD: /usr/local/lib/nagios/plugins/check_lucene_indexing_deprecated
Conclusion
The method which I shared above is just one of the ways to use jps reports with Icinga/Nagios plugins. As of now this solution works as expected. If you want to reuse the scripts, please customize them according to your environment to get the results you want. Also, as written in the documentation, getting the output by parsing the output from jps means we need to update the script any time jps changes its output format.
Please comment below if you have experience with jps and Icinga/Nagios, and tell us how you handle the reporting.
Related reading:
Comments