NAGIOS


Nagios installation and configuration


Let us discuss the overview, installation and configuration of Nagios, a powerful open source monitoring solution for host and services.

I. Overview of nagios
II. 8 steps for installing nagios on Linux:
  1. Download the nagios and plugins
  2. Take care of the prerequisites
  3. Create user and group for nagios
  4. Install nagios
  5. Configure the web interface
  6. Compile and install nagios plugins
  7. Start Nagios
  8. Login to web interface
III. Configuration files overview

I. Overview of Nagios

.
Nagios is a host and service monitor tool. Following are some of the features of nagios.
  • Monitor equipments such as servers, switches, routers, firewalls, power supply etc.
  • Monitor services such as disk space, cpu usage, memory usage, temperature of the equipment, HTTP, Mail, SSH etc.
  • Nagios can monitor pretty much anything. for e.g. host, services, databases, applications etc.
  • Nagios has an extensible plugin interface for monitoring user defined services. There are lot of plugins available for Nagios. Visit NagiosPlugins and NagiosExchange for review the available user developed plugins.
  • It can send out various notifications ( email, pager etc.) when the problem occurs and get resolved.
  • Web interface to view current status, notifications, problem history, log files etc.
Following is a partial screenshot of the nagios web dashboard:
Nagios Web UI
Fig: Nagios Web UI (click on the image to enlarge)

II. 8 steps for installing nagios on Linux:

1. Download the nagios and plugins

Download following files from Nagios.org and move to /home/downloads
  • nagios-3.0.1.tar.gz
  • nagios-plugins-1.4.11.tar.gz

2. Take care of the prerequisites

  • Make sure apache is working on the server by verifying from browser: http://localhost
  • Verify whether gcc is installed
[root@localhost]#rpm -qa | grep gcc
      gcc-3.4.6-8
      compat-gcc-32-3.2.3-47.3
      libgcc-3.4.6-8
      compat-libgcc-296-2.96-132.7.2
      compat-gcc-32-c++-3.2.3-47.3
      gcc-c++-3.4.6-8
  • Verify whether GD is installed
[root@localhost]# rpm -qa gd
      gd-2.0.28-5.4E

3. Create user and group for nagios

[root@localhost]# useradd nagios
[root@localhost]# passwd nagios
[root@localhost]# groupadd nagcmd
[root@localhost]# usermod -G nagcmd nagios
[root@localhost]# usermod -G nagcmd apache

4. Install nagios

[root@localhost]# tar xvf nagios-3.0.1.tar.gz
[root@localhost]# cd nagios-3.0.1
[root@localhost]# ./configure --with-command-group=nagcmd
[root@localhost]# make all
[root@localhost]# make install
[root@localhost]# make install-config
[root@localhost]# make install-commandmode
Following are some additional parameters that you can pass to ./configure to customize your installation. I used only --with-command-group as shown above.
--prefix  /opt/nagios Where to put the Nagios files
 --with-cgiurl  /nagios/cgi-bin Web server url where the cgi's will be available
 --with-htmurl  /nagios  Web server url where nagios will be available
 --with-nagios-user nagios  user account under which Nagios will run
 --with-nagios-group nagios  group account under which Nagios will run
 --with-command-group nagcmd  group account which will allow the apache user to submit
      commands to Nagios
At the end of the configure output, it will display a summary as shown below:
*** Configuration summary for nagios 3.0.1 05-28-2008 ***:

General Options:
-------------------------
Nagios executable:  nagios
Nagios user/group:  nagios,nagios
Command user/group:  nagios,nagcmd
Embedded Perl:  no
Event Broker:  yes
Install ${prefix}:  /usr/local/nagios
Lock file:  ${prefix}/var/nagios.lock
Check result directory:  ${prefix}/var/spool/checkresults
Init directory:  /etc/rc.d/init.d
Apache conf.d directory:  /etc/httpd/conf.d
Mail program:  /bin/mail
Host OS:  linux-gnu

Web Interface Options:
------------------------
HTML URL:  http://localhost/nagios/
CGI URL:  http://localhost/nagios/cgi-bin/
Traceroute (used by WAP):  /bin/traceroute

5. Configure the web interface.

[root@localhost]# make install-webconf
[root@localhost# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin

6. Compile and install nagios plugins

[root@localhost]# tar xvf nagios-plugins-1.4.11.tar.gz
[root@localhost]# cd nagios-plugins-1.4.11
[root@localhost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@localhost]# make
[root@localhost]# make install
Note: On Red Hat, the ./configure command mentioned above did not work and was hanging at the when it was displaying the message: checking for redhat spopen problem… Add –enable-redhat-pthread-workaround to the ./configure command as a work-around for the above problem as shown below.
[root@localhost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround

7. Start Nagios

  • Add the nagios to the startup routine:
[root@localhost]# chkconfig --add nagios
      [root@localhost]# chkconfig nagios on
  • Verify to make sure there are no errors in the nagios configuration file:
[root@localhost]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

      Total Warnings: 0
      Total Errors:   0
      Things look okay - No serious problems were detected during the pre-flight check
  • Start the nagios
[root@localhost]# service nagios start
      Starting nagios: done.

8. Login to web interface

Nagios Web URL: http://localhost/nagios/
Use the userid, password that was created from step#5 above.

III. Configuration files overview

.
The first configuration to modify is to change the default value of email address in /usr/local/nagios/etc/objects/contacts.cfg file to your email address.
Following are the three major configuration files located under /usr/local/nagios/etc
  1. nagios.cfg – This is the primary Nagios configuration file where lot of global parameters that controls the nagios can be defined.
  2. cgi.cfg - This files has configuration information for nagios web interface.
  3. resource.cfg – If you have to pass some sensitive information (username, password etc.) to a plugin to monitor a specific service, you can define them here. This file is readable only by nagios user and group.
Following are the other configuration files under /usr/local/nagios/etc/objects directory:
  • contacts.cfg: All the contacts who needs to be notified should be defined here. You can specify name, email address, what type of notifications they need to receive and what is the time period this particular contact should be receiving notifications etc.
  • commands.cfg – All the commands to check services are defined here. You can use $HOSTNAME$ and $HOSTADDRESS$ macro on the command execution that will substitute the corresponding hostname or host ip-address automatically.
  • timeperiods.cfg – Define the timeperiods. for e.g. if you want a service to be monitored only during the business hours, define a time period called businesshours and specify the hours that you would like to monitor.
  • templates.cfg – Multiple host or service definition that has similar characteristics can use a template, where all the common characteristics can be defined. Use template is a time saver.
  • localhost.cfg – Defines the monitoring for the local host. This is a sample configuration file that comes with nagios installation that you can use as a baseline to define other hosts that you would like to monitor.
  • printer.cfg – Sample config file for printer
  • switch.cfg – Sample config file for switch
  • windows.cfg – Sample config file for a windows machine
.

Monitor Remote Linux Host using Nagios



I. Overview
II. 6 steps to install Nagios plugin and NRPE on remote host.
  1. Download Nagios Plugins and NRPE Add-on
  2. Create nagios account
  3. Install Nagios Plugins
  4. Install NRPE
  5. Setup NRPE to run as daemon
  6. Modify the /usr/local/nagios/etc/nrpe.cfg
III. 4 Configuration steps on the Nagios monitoring server to monitor remote host:
  1. Download NRPE Add-on
  2. Install check_nrpe
  3. Create host and service definition for remote host
  4. Restart the nagios service

I. Overview:

.
Following three steps will happen on a very high level when Nagios (installed on the nagios-servers) monitors a service (for e.g. disk space usage) on the remote Linux host.
  1. Nagios will execute check_nrpe command on nagios-server and request it to monitor disk usage on remote host using check_disk command.
  2. The check_nrpe on the nagios-server will contact the NRPE daemon on remote host and request it to execute the check_disk on remote host.
  3. The results of the check_disk command will be returned back by NRPE daemon to the check_nrpe on nagios-server.

Following flow summarizes the above explanation:

Nagios Server (check_nrpe) —–> Remote host (NRPE deamon) —–> check_disk
Nagios Server (check_nrpe) <—– Remote host (NRPE deamon) <—– check_disk (returns disk space usage)

II. 7 steps to install Nagios Plugins and NRPE on the remote host

.

1. Download Nagios Plugins and NRPE Add-on

Download following files from Nagios.org and move to /home/downloads:
  • nagios-plugins-1.4.11.tar.gz
  • nrpe-2.12.tar.gz

2. Create nagios account

[remotehost]# useradd nagios
[remotehost]# passwd nagios

3. Install nagios-plugin

[remotehost]# cd /home/downloads
[remotehost]# tar xvfz nagios-plugins-1.4.11.tar.gz
[remotehost]# cd nagios-plugins-1.4.11
[remotehost]# export LDFLAGS=-ldl

[remotehost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
[remotehost]# make
[remotehost]# make install

[remotehost]# chown nagios.nagios /usr/local/nagios
[remotehost]# chown -R nagios.nagios /usr/local/nagios/libexec/

Note: On Red Hat, For me the ./configure command was hanging with the the message:“checking for redhat spopen problem…”. Add --enable-redhat-pthread-workaround to the ./configure command as a work-around for the above problem.

4. Install NRPE

[remotehost]# cd /home/downloads
[remotehost]# tar xvfz nrpe-2.12.tar.gz
[remotehost]# cd nrpe-2.12

[remotehost]# ./configure
[remotehost]# make all
[remotehost]# make install-plugin
[remotehost]# make install-daemon
[remotehost]# make install-daemon-config
[remotehost]# make install-xinetd

5. Setup NRPE to run as daemon (i.e as part of xinetd):

  • Modify the /etc/xinetd.d/nrpe to add the ip-address of the Nagios monitoring server to the only_from directive. Note that there is a space after the 127.0.0.1 and the nagios monitoring server ip-address (in this example, nagios monitoring server ip-address is: 192.168.1.2)
only_from       = 127.0.0.1 192.168.1.2
  • Modify the /etc/services and add the following at the end of the file.
nrpe 5666/tcp # NRPE
  • Start the service
[remotehost]#service xinetd restart
  • Verify whether NRPE is listening
[remotehost]# netstat -at | grep nrpe
       tcp 0      0 *:nrpe *:*                         LISTEN
  • Verify to make sure the NRPE is functioning properly
[remotehost]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12

6. Modify the /usr/local/nagios/etc/nrpe.cfg

The nrpe.cfg file located on the remote host contains the commands that are needed to check the services on the remote host. By default the nrpe.cfg comes with few standard check commands as samples. check_users and check_load are shown below as an example.
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

In all the check commands, the “-w” stands for “Warning” and “-c” stands for “Critical”. for e.g. in the check_disk command below, if the available disk space gets to 20% of less, nagios will send warning message. If it gets to 10% or less, nagios will send critical message. Change the value of “-c” and “-w” parameter below depending on your environment.
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1

Note: You can execute any of the commands shown in the nrpe.cfg on the command line on remote host and see the results for yourself. For e.g. When I executed the check_disk command on the command line, it displayed the following:
[remotehost]#/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
DISK CRITICAL - free space: / 6420 MB (10% inode=98%);| /=55032MB;51792;58266;0;64741

In the above example, since the free disk space on /dev/hda1 is only 10% , it is displaying the CRITICAL message, which will be returned to nagios server.

III. 4 Configuration steps on the Nagios monitoring server to monitor remote host:

.

1. Download NRPE Add-on

Download nrpe-2.12.tar.gz from Nagios.org and move to /home/downloads:

2. Install check_nrpe on the nagios monitoring server

[nagios-server]# tar xvfz nrpe-2.12.tar.gz
[nagios-server]# cd nrpe-2.1.2
[nagios-server]# ./configure
[nagios-server]# make all
[nagios-server]# make install-plugin

./configure will give a configuration summary as shown below:

*** Configuration summary for nrpe 2.12 05-31-2008 ***:

General Options:
————————-
NRPE port: 5666
NRPE user: nagios
NRPE group: nagios
Nagios user: nagios
Nagios group: nagios

Note: I got the “checking for SSL headers… configure: error: Cannot find ssl headers” error message while performing ./configure. Install openssl-devel as shown below and run the ./configure again to fix the problem.
[nagios-server]# rpm -ivh openssl-devel-0.9.7a-43.16.i386.rpm krb5-devel-1.3.4-47.i386.rpm zlib-devel-1.2.1.2-1.2.i386.rpm e2fsprogs-devel-1.35-12.5.
el4.i386.rpm
warning: openssl-devel-0.9.7a-43.16.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
Preparing… ########################################### [100%]
1:e2fsprogs-devel ########################################### [ 25%]
2:krb5-devel ########################################### [ 50%]
3:zlib-devel ########################################### [ 75%]
4:openssl-devel ########################################### [100%]
Verify whether nagios monitoring server can talk to the remotehost.
[nagios-server]#/usr/local/nagios/libexec/check_nrpe -H 192.168.1.3
NRPE v2.12

Note: 192.168.1.3 in the ip-address of the remotehost where the NRPE and nagios plugin was installed as explained in Section II above.

3. Create host and service definition for remotehost

Create a new configuration file /usr/local/nagios/etc/objects/remotehost.cfg to define the host and service definition for this particular remotehost. It is good to take the localhost.cfg and copy it as remotehost.cfg and start modifying it according to your needs.
host definition sample:
define host{
use linux-server
host_name remotehost
alias Remote Host
address 192.168.1.3
contact_groups admins
}

Service definition sample:
define service{
use generic-service
service_description Root Partition
contact_groups admins
check_command check_nrpe!check_disk
}
Note: In all the above examples, replace remotehost with the corresponding hostname of your remotehost.

4. Restart the nagios service

Restart the nagios as shown below and login to the nagios web (http://nagios-server/nagios/) to verify the status of the remotehost linux sever that was added to nagios for monitoring.
[nagios-server]# service nagios reload

Monitor Remote Windows Machine Using Nagios on Linux


I. Overview
II. 4 steps to install nagios on remote windows host
  1. Install NSClient++ on the remote windows server
  2. Modify the NSClient++ Service
  3. Modify the NSC.ini
  4. Start the NSClient++ Service
III. 6 configuration steps on nagios monitoring server
  1. Verify check_nt command and windows-server template
  2. Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg
  3. Modify /usr/local/nagios/etc/objects/windows.cfg
  4. Define windows services that should be monitored.
  5. Enable Password Protection
  6. Verify Configuration and Restart Nagios.

I. Overview

.
Following three steps will happen on a very high level when Nagios (installed on the nagios-server) monitors a service (for e.g. disk space usage) on the remote Windows host.
  1. Nagios will execute check_nt command on nagios-server and request it to monitor disk usage on remote windows host.
  2. The check_nt on the nagios-server will contact the NSClient++ service on remote windows host and request it to execute the USEDDISKSPACE on the remote host.
  3. The results of the USEDDISKSPACE command will be returned back by NSClient++ daemon to the check_nt on nagios-server.

Following flow summarizes the above explanation:

Nagios Server (check_nt) —–> Remote host (NSClient++) —–> USEDDISKSPACE
Nagios Server (check_nt) <—– Remote host (NSClient++) <—– USEDDISKSPACE (returns disk space usage)

II. 4 steps to setup nagios on remote windows host

.

1. Install NSClient++ on the remote windows server

Download NSCP 0.3.1 (NSClient++-Win32-0.3.1.msi) from NSClient++ Project. NSClient++ is an open source windows service that allows performance metrics to be gathered by Nagios for windows services. Go through the following five NSClient++ installation steps to get the installation completed.

(1) NSClient++ Welcome Screen

(2) License Agreement Screen

(3) Select Installation option and location. Use the default option and click next.

NSClient++ Install Screen

(4) Ready to Install Screen.  Click on Install to get it started.

(5) Installation completed Screen.

2. Modify the NSClient++ Service
Go to Control Panel -> Administrative Tools -> Services. Double click on the “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32″ service and select the check-box that says “Allow service to interact with desktop” as shown below.
NSClient++ Service Modification

3. Modify the NSC.ini

(1) Modify NSC.ini and uncomment *.dll: Edit the C:\Program Files\NSClient++\NSC.ini file and uncomment everything under [modules] except RemoteConfiguration.dll and CheckWMI.dll
[modules]
;# NSCLIENT++ MODULES
;# A list with DLLs to load at startup.
;  You will need to enable some of these for NSClient++ to work.
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
; *                                                               *
; * N O T I C E ! ! ! - Y O U   H A V E   T O   E D I T   T H I S *
; *                                                               *
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
NRPEListener.dll
SysTray.dll
CheckEventLog.dll
CheckHelpers.dll
;CheckWMI.dll
;
; RemoteConfiguration IS AN EXTREM EARLY IDEA SO DONT USE FOR PRODUCTION ENVIROMNEMTS!
;RemoteConfiguration.dll
; NSCA Agent is a new beta module use with care!
NSCAAgent.dll
; LUA script module used to write your own "check deamon" (sort of) early beta.
LUAScript.dll
; Script to check external scripts and/or internal aliases, early beta.
CheckExternalScripts.dll
; Check other hosts through NRPE extreme beta and probably a bit dangerous! :)
NRPEClient.dll

(2) Modify NSC.ini and uncomment allowed_hosts. Edit the C:\Program Files\NSClient++\NSC.ini file and Uncomment allowed_host under settings and add the ip-address of the nagios-server.
;# ALLOWED HOST ADDRESSES
;  This is a comma-delimited list of IP address of hosts that are allowed to talk to the all daemons.
;  If leave this blank anyone can access the deamon remotly (NSClient still requires a valid password).
;  The syntax is host or ip/mask so 192.168.0.0/24 will allow anyone on that subnet access
allowed_hosts=192.168.1.2/255.255.255.0
Note: allowed_host is located under [Settings], [NSClient] and [NRPE] section. Make sure to change allowed_host under [Settings] for this purpose.

(3) Modify NSC.ini and uncomment port. Edit the C:\Program Files\NSClient++\NSC.ini file and uncomment the port# under [NSClient] section
;# NSCLIENT PORT NUMBER
;  This is the port the NSClientListener.dll will listen to.
port=12489

(4) Modify NSC.ini and specify password. You can also specify a password the nagios server needs to use to remotely access the NSClient++ agent.
[Settings]
;# OBFUSCATED PASSWORD
;  This is the same as the password option but here you can store the password in an obfuscated manner.
;  *NOTICE* obfuscation is *NOT* the same as encryption, someone with access to this file can still figure out the
;  password. Its just a bit harder to do it at first glance.
;obfuscated_password=Jw0KAUUdXlAAUwASDAAB
;
;# PASSWORD
;  This is the password (-s) that is required to access NSClient remotely. If you leave this blank everyone will be able to access the daemon remotly.
password=My2Secure$Password

4. Start the NSClient++ Service

Start the NSClient++ service either from the Control Panel -> Administrative tools -> Services -> Select “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32″ and click on start (or) Click on “Start -> All Programs -> NSClient++ -> Start NSClient++ (Win32) . Please note that this will start the NSClient++ as a windows service.

Later if you modify anything in the NSC.ini file, you should restart the “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32″ from the windows service.

III. 6 configuration steps on nagios monitoring server

.

1. Verify check_nt command and windows-server template

Verify that the check_nt is enabled under /usr/local/nagios/etc/objects/commands.cfg
# 'check_nt' command definition
define command{
command_name    check_nt
command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}

Verify that the windows-server template is enabled under /usr/local/nagios/etc/objects/templates.cfg
# Windows host definition template - This is NOT a real host, just a template!
define host{
name                    windows-server  ; The name of this host template
use                     generic-host    ; Inherit default values from the generic-host template
check_period            24x7            ; By default, Windows servers are monitored round the clock
check_interval          5               ; Actively check the server every 5 minutes
retry_interval          1               ; Schedule host check retries at 1 minute intervals
max_check_attempts      10              ; Check each server 10 times (max)
check_command           check-host-alive        ; Default command to check if servers are "alive"
notification_period     24x7            ; Send notification out at any time - day or night
notification_interval   30              ; Resend notifications every 30 minutes
notification_options    d,r             ; Only send notifications for specific host states
contact_groups          admins          ; Notifications get sent to the admins by default
hostgroups              windows-servers ; Host groups that Windows servers should be a member of
register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}

2. Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg

# Definitions for monitoring a Windows machine
cfg_file=/usr/local/nagios/etc/objects/windows.cfg

3. Modify /usr/local/nagios/etc/objects/windows.cfg

By default a sample host definition for a windows server is given under windows.cfg, modify this to reflect the appropriate windows server that needs to be monitored through nagios.
# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation

define host{
use             windows-server              ; Inherit default values from a template
host_name   remote-windows-host      ; The name we're giving to this host
alias            Remote Windows Host     ; A longer name associated with the host
address       192.168.1.4                   ; IP address of the remote windows host
}

4. Define windows services that should be monitored.

Following are the default windows services that are already enabled in the sample windows.cfg. Make sure to update the host_name on these services to reflect the host_name defined in the above step.
define service{
use                     generic-service
host_name               remote-windows-host
service_description     NSClient++ Version
check_command           check_nt!CLIENTVERSION
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Uptime
check_command           check_nt!UPTIME
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     CPU Load
check_command           check_nt!CPULOAD!-l 5,80,90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Memory Usage
check_command           check_nt!MEMUSE!-w 80 -c 90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     C:\ Drive Space
check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     W3SVC
check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Explorer
check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}

5. Enable Password Protection

If you specified a password in the NSC.ini file of the NSClient++ configuration file on the Windows machine, you’ll need to modify the check_nt command definition to include the password. Modify the /usr/local/nagios/etc/commands.cfg file and add password as shown below.
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s My2Secure$Password -v $ARG1$ $ARG2$
}

6. Verify Configuration and Restart Nagios.

Verify the nagios configuration files as shown below.
[nagios-server]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Restart nagios as shown below.
[nagios-server]# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.

[nagios-server]# /etc/rc.d/init.d/nagios start
Starting nagios: done.

Verify the status of the various services running on the remote windows host from the Nagios web UI (http://nagios-server/nagios) as shown below.
Nagios Web UI - Remote Windows Host Status

Monitor Network Switch and Ports Using Nagios


Nagios is hands-down the best monitoring tool to monitor host and network equipments. Using Nagios plugins you can monitor pretty much monitor anything.

I use Nagios intensively and it gives me peace of mind knowing that I will get an alert on my phone, when there is a problem. More than that, if warning levels are setup properly, Nagios will proactively alert you before a problem becomes critical.

Earlier I wrote about, how to setup Nagios to monitor Linux HostWindows Host and VPN device.

In this article, I’ll explain how to configure Nagios to monitor network switch and it’s active ports.

1. Enable switch.cfg in nagios.cfg

Uncomment the switch.cfg line in /usr/local/nagios/etc/nagios.cfg as shown below.
[nagios-server]# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg

2. Add new hostgroup for switches in switch.cfg

Add the following switches hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file.
define hostgroup{
hostgroup_name  switches
alias           Network Switches
}

3. Add a new host for the switch to be monitered

In this example, I’ve defined a host to monitor the core switch in the /usr/local/nagios/etc/objects/switch.cfg file. Change the address directive to your switch ip-address accordingly.
define host{
use             generic-switch
host_name       core-switch
alias           Cisco Core Switch
address         192.168.1.50
hostgroups      switches
}

4. Add common services for all switches

Displaying the uptime of the switch and verifying whether switch is alive are common services for all switches. So, define these services under the switches hostgroup_name as shown below.
# Service definition to ping the switch using check_ping
define service{
use                     generic-service
hostgroup_name          switches
service_description     PING
check_command           check_ping!200.0,20%!600.0,60%
normal_check_interval   5
retry_check_interval    1
}

# Service definition to monitor switch uptime using check_snmp
define service{
use                     generic-service
hostgroup_name          switches
service_description     Uptime
check_command           check_snmp!-C public -o sysUpTime.0
}

5. Add service to monitor port bandwidth usage

check_local_mrtgtraf uses the Multil Router Traffic Grapher – MRTG. So, you need to install MRTG for this to work properly. The *.log file mentioned below should point to the MRTG log file on your system.
define service{
use           generic-service
host_name   core-switch
service_description Port 1 Bandwidth Usage
check_command  check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10
}

6. Add service to monitor an active switch port

Use check_snmp to monitor the specific port as shown below. The following two services monitors port#1 and port#5. To add additional ports, change the value ifOperStatus.n accordingly. i.e n defines the port#.

# Monitor status of port number 1 on the Cisco core switch
define service{
use                  generic-service
host_name            core-switch
service_description  Port 1 Link Status
check_command        check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}

# Monitor status of port number 5 on the Cisco core switch
define service{
use                  generic-service
host_name            core-switch
service_description  Port 5 Link Status
check_command        check_snmp!-C public -o ifOperStatus.5 -r 1 -m RFC1213-MIB
}

7. Add services to monitor multiple switch ports together

Sometimes you may need to monitor the status of multiple ports combined together. i.e Nagios should send you an alert, even if one of the port is down. In this case, define the following service to monitor multiple ports.
# Monitor ports 1 - 6 on the Cisco core switch.
define service{
use                   generic-service
host_name             core-switch
service_description   Ports 1-6 Link Status
check_command         check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB, -o ifOperStatus.2 -r 1 -m RFC1213-MIB, -o ifOperStatus.3 -r 1 -m RFC1213-MIB, -o ifOperStatus.4 -r 1 -m RFC1213-MIB, -o ifOperStatus.5 -r 1 -m RFC1213-MIB, -o ifOperStatus.6 -r 1 -m RFC1213-MIB
}

8. Validate configuration and restart nagios

Verify the nagios configuration to make sure there are no warnings and errors.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
Restart the nagios server to start monitoring the VPN device.
# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.

# /etc/rc.d/init.d/nagios start
Starting nagios: done.
Verify the status of the switch from the Nagios web UI: http://{nagios-server}/nagios as shown below:
[Nagios GUI for Network Switch]
Fig: Nagios GUI displaying status of a Network Switch

9. Troubleshooting

Issue1: Nagios GUI displays “check_mrtgtraf: Unable to open MRTG log file” error message for the Port bandwidth usage
Solution1: make sure the *.log file defined in the check_local_mrtgtraf service is pointing to the correct location.

Issue2: Nagios UI displays “Return code of 127 is out of bounds – plugin may be missing” error message for Port Link Status.
Solution2: Make sure both net-snmp and net-snmp-util packages are installed. In my case, I was missing the net-snmp-utils package and installing it resolved this issue as shown below.
[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2

[nagios-server]# rpm -ivh net-snmp-utils-5.1.2-11.EL4.10.i386.rpm
Preparing...       ########################################### [100%]
1:net-snmp-utils   ########################################### [100%]

[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10
Note: After you’ve installed net-snmp and net-snmp-utils, re-compile and re-install nagios plugins as explained in “6. Compile and install nagios plugins” in the Nagios 3.0 jumpstart guide.

setup url or website monitoring in nagios server

First of all create a configuration directory for writing the rules. You can also create the rules in localhost.cfg but I recommend  to create a separate directory and create the files in it.

#mkdir /etc/nagios/monitor_websites
and cd to this directory

And create file host.cfg in this directory for setting the urls.
#vi host.cfg

Suppose I want to monitor three sites
www.abc.com, www.xyz.com, www.pqr.com

Configure host.cfg as below.
#vi host.cfg

define host{
host_name  abc.com
alias         abc
address    www.abc.com
use        generic-host
}

define host{
host_name  xyz.com
alias      xyz
address    www.xyz.com
use        generic-host
}

define host{
host_name  pqr.com
alias           pqr
address    www.pqr.com
use        generic-host
}

#Defining group of urls  - you should add this if you want to set up an HTTP check service.
define hostgroup {
hostgroup_name    monitor_websites
alias           monitor_urls
members         www.abc.com, www.xyz.com, www.pqr.com
}
:wq #save it

And now create the file services.cfg for setting the service ( http_check )

#vi services.cfg
## Hostgroups services ##
define service {
hostgroup_name                 monitor_websites
service_description             HTTP
check_command                 check_http
use                             generic-service
notification_interval           0
}

Now give the permissions for directory and configuration files.
#chown  -R nagios:nagios monitor_websites

List and check.
[root@mail nagios]#  ll monitor_websites
total 16
-rw-r--r-- 1 nagios nagios 669 Apr 25 23:13 host.cfg
-rw-r--r-- 1 nagios nagios 253 Apr 25 23:15 services.cfg
[root@mail nagios]#

Now give the configuration directory path in main nagios configuration file.
#vi /etc/nagios/nagios.cfg
cfg_dir=/etc/nagios/monitor_websites
:wq

Now restart the nagios service.
#service nagios restart

1 comment:

  1. Thank you for sharing more valuable information on nagios to learn more about this check it once at Devops Online Course Bangalore.

    ReplyDelete