Nagios Exercises

PART I
-----------------------------------------------------------------------

1. Install Nagios version 3

   This may have already been done for you. If so, skip to #2.

   $ sudo apt-get install nagios3

   Create the Web user password file:

   $ sudo htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
   New password:
   Re-type new password:

   Set the password to "tldadmin!"

2. You should already have a working Nagios!

   - Open a browser, and go to

     http://10.X.2.1/nagios3/

   - At the login prompt, login as:

     user: nagiosadmin
     pass: tldadmin!

   - Overview->Map in the left navigation column is a good place to start.

3. Let's look at the interface together...

   Login to your NOC (10.X.2.1)

   # cd /etc/nagios3/
   # ls -l
   -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
   -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
   -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
   drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
   -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
   -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
   -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
   drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

   # ls -l conf.d/
   -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
   -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
   -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
   -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
   -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
   -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
   -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
   -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
   -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

   Notice that not all files have been renamed for the nagios3 package;
   most still carry the nagios2 name.

PART II
-----------------------------------------------------------------------

1. Let's start monitoring your infrastructure.

   # cd /etc/nagios3/conf.d/
   # vi noc.cfg

   Add the following:

   define host {
       use         generic-host
       host_name   noc
       alias       NOC for GroupX
       address     10.X.2.1
   }

   ... Save and quit

   # vi dns.cfg

   Add the following:

   define host {
       use         generic-host
       host_name   dns
       alias       DNS for GroupX
       address     10.X.1.1
   }

   ... Save and quit

2. Let's create a new hostgroup for the occasion, and add our hosts to it.

   - Edit the file hostgroups_nagios2.cfg and add a new group:

   # vi hostgroups_nagios2.cfg

   define hostgroup {
       hostgroup_name  servers
       alias           TLD Servers
       members         noc, dns
   }

   ... Save and quit

3. Now let's associate some services with the hosts

   # vi services_nagios2.cfg

   - Find the section called "check that ssh services are running",
     and change the line:

       hostgroup_name  ssh-servers

     to

       hostgroup_name  ssh-servers, servers

   ... Save and quit

4. Verify that your configuration is OK:

   # nagios3 -v /etc/nagios3/nagios.cfg

   ... You should get:

   Total Warnings: 0
   Total Errors:   0

   Things look okay - No serious problems were detected during the pre-flight check

5. Reload/Restart Nagios

   $ sudo service nagios3 restart

6. Go to the web interface (http://10.X.2.1/nagios3) and check the hosts
   you just added, e.g., Overview->Map

7. Add DNS checking to your DNS host:

   # cd /etc/nagios3/conf.d
   # vi services_nagios2.cfg

   - Add a new service definition at the end. In the check_command
     statement below, replace "MYTLD" with your tld.

   # check that dns services are running
   define service {
       hosts                   dns
       service_description     DNS
       check_command           check_dig!www.MYTLD.
       use                     generic-service
       notification_interval   0 ; set > 0 if you want to be renotified
   }

   ... Save and quit

   - Please note that in the above service definition we assign a single
     host "dns" via the "hosts" statement, rather than an entire
     hostgroup. Why? Because only the host "dns" is running the DNS
     service, and assigning a hostgroup here would make Nagios report
     that the other hosts in the hostgroup are not running DNS.
   Check and restart Nagios.

   # nagios3 -v /etc/nagios3/nagios.cfg
   # service nagios3 restart

========== For this class we will STOP HERE. ===================

You get the idea. Nagios is a very powerful framework for your organization
and is used by many.

8. Now let's define the parent-child relationships for your network.

   - This is achieved by adding the "parents" statement to the host
     definitions you have created:

   define host {
       use         generic-host
       host_name   ...
       alias       ...
       address     ...
       parents     HOSTNAME
   }

   - Note that the parents statement can take more than one parameter,
     in case you have redundant paths through your network.

   - In fact, you only have one host definition to edit.
     Remember that if you have the following topology:

        DNS       NOC(NOC)
         |         |
         +----+----+
              |
              |
           GrpX-RTR   <- (parent of ISP-RTR) -- also called "gateway"
              |
              |
           ISP-RTR

   ... then, seen from the point of view of Nagios:

   - GrpX-RTR is the parent of ISP-RTR
   - GrpX-RTR doesn't have a parent, since it's on the same LAN as
     NOC/Nagios and therefore directly reachable (there are no routers
     between them)

   Therefore, you only need to add the parents statement to the
   isp-rtr.cfg file:

   $ sudo vi isp-rtr.cfg

   Add the "parents gateway" statement to the host definition

   define host {
       use         generic-host
       host_name   isp-rtr
       alias       ISP Router
       address     192.168.96.1
       parents     gateway
   }

   ("gateway" is the name given to your default gateway when Nagios is
   installed on Ubuntu, as you can see in
   /etc/nagios3/conf.d/host-gateway_nagios3.cfg)

PART III
-----------------------------------------------------------------------

1.) We will update our Nagios contacts definition,
    "/etc/nagios3/conf.d/contacts_nagios2.cfg", so that a local user
    receives mail when certain conditions occur.

2.) Edit the file "/etc/nagios3/conf.d/contacts_nagios2.cfg":

    $ sudo vi /etc/nagios3/conf.d/contacts_nagios2.cfg

    Change the email for the "root" contact from:

      root@localhost

    to

      monitoring@localhost

    (save and quit)

3.) Once you have updated your contacts_nagios2.cfg file, run the
    Nagios pre-flight check:

    $ sudo nagios3 -v /etc/nagios3/nagios.cfg

    If it all looks good, then restart Nagios:

    $ sudo service nagios3 restart

Now, let's test that things work...

Login to the GrpX-rtr with ssh

    $ ssh tldadmin@10.X.1.254
    $ configure
    # configure terminal
    # set interface ethernet eth0 disable
    # commit

Wait 5 minutes -- how does Nagios react?

Bring the interface back up

    # del interface ethernet eth0 disable
    # commit

Wait 5 minutes. What do you observe this time? What about the status map?

PART IV
-------

1. Let's define some checks to handle what we have used SWATCH to
   monitor until now. We'll limit this to the SSH attempts and the
   Cisco config changes.

   To do this, we start by defining two new services:

   - One for the GrpX-RTR on your network
   - One for the SSH service on your NOC

   Let's edit /etc/nagios3/conf.d/services_nagios2.cfg and add these
   definitions at the end:

   $ sudo vi /etc/nagios3/conf.d/services_nagios2.cfg

   define service {
       hosts                   gateway
       service_description     CONFIG_ALERT
       check_command           check_ping
       use                     generic-service
       active_checks_enabled   0
       passive_checks_enabled  1
       max_check_attempts      1
       is_volatile             1
       flap_detection_enabled  0
       notification_interval   0 ; set > 0 if you want to be renotified
   }

   define service {
       hosts                   noc
       service_description     SSH_ALERT
       check_command           check_ping
       use                     generic-service
       active_checks_enabled   0
       passive_checks_enabled  1
       max_check_attempts      1
       is_volatile             1
       flap_detection_enabled  0
       notification_interval   0 ; set > 0 if you want to be renotified
   }

   ... Save and Quit

   - Notice how we explicitly override the defaults, enabling passive
     checks and disabling active checking. We set "check_ping" as the
     check_command because Nagios complains if none is given, but this
     command is never called. We'll also talk about the is_volatile,
     max_check_attempts and flap_detection_enabled parameters.

2. We need to make Nagios accept commands from external programs.
   To do this, we edit /etc/nagios3/nagios.cfg.

   $ sudo vi /etc/nagios3/nagios.cfg

   - Find the line

       check_external_commands=0

     and change it to:

       check_external_commands=1

   - Save and quit

   - Then run:

     $ sudo /etc/init.d/nagios3 stop
     $ sudo dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
     $ sudo dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
     $ sudo /etc/init.d/nagios3 start

3. Finally, we need a program SWATCH can call to submit alerts to Nagios.

   $ sudo vi /usr/local/bin/submit_check_result

   Put the following lines into it:

#!/usr/bin/perl -w
use strict;

# Expect exactly: host service value message
if (@ARGV != 4) {
    print "usage: $0 host service value extra_output[|performance_info]\n";
    exit 1;
}

my $host    = $ARGV[0];
my $service = $ARGV[1];
my $value   = $ARGV[2];
my $mesg    = $ARGV[3];
my $time    = time;

# "or" (not "||") so that die fires when the open itself fails
open(CMD, ">>", "/var/lib/nagios3/rw/nagios.cmd")
    or die "cannot open Nagios command file: $!\n";
print CMD "[$time] PROCESS_SERVICE_CHECK_RESULT;$host;$service;$value;$mesg\n";
close CMD;

   - Save and quit, then run:

     $ sudo chmod +x /usr/local/bin/submit_check_result

4. Last, we need to modify /etc/swatch.conf so that SWATCH uses the
   submit_check_result script to submit results to Nagios:

   $ sudo pico /etc/swatch.conf

   - Find the rules for "Invalid SSH Login Attempts"

   - Replace the line containing "mail=..." with the following:

     exec /usr/local/bin/submit_check_result noc SSH_ALERT 1 "src: $4"

   - Save and quit

   - Restart swatch

     $ sudo kill -9 `ps ax | grep swatch | grep -v grep | awk '{ print $1 }'`
     $ sudo swatch -c /etc/swatch.conf --tail-file=/var/log/everything --daemon

   - Now make sure swatch is running:

     $ ps ax | grep -i swatch

   - If swatch is running, you should see a line like:

     12274 ?  Ss  0:00 /usr/bin/swatch -c /etc/swatch.conf --tail-file=/var/log/everything --daemon

5. A bit later we will implement the SWATCH rules for the "Cisco config"
   event. We will see this in the next session.
   Afterwards, you can come back here and, based on the example above,
   add the SWATCH rules that notify Nagios of a config change event,
   sending the notification for the "GrpX-rtr" host on the
   "CONFIG_ALERT" service, using the "exec" command shown above.
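
As a rough sketch of what submit_check_result writes, the shell function
below builds the same PROCESS_SERVICE_CHECK_RESULT line and prints it
instead of appending it to the command file. The function name
format_check_result, the optional fifth timestamp argument, and the
message text are assumptions made for illustration; they are not part of
the exercise. The value follows the usual Nagios return codes (0 = OK,
1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), and the host argument has to
match the host_name used in the service definition, which is "gateway"
for CONFIG_ALERT.

```shell
#!/bin/sh
# Sketch only: format_check_result is a hypothetical helper, not a Nagios tool.
# It prints the external-command line that submit_check_result would append
# to /var/lib/nagios3/rw/nagios.cmd, so the field order can be checked by eye.

# Usage: format_check_result HOST SERVICE VALUE MESSAGE [EPOCH]
format_check_result() {
    if [ "$#" -lt 4 ]; then
        echo "usage: format_check_result host service value message [epoch]" >&2
        return 1
    fi
    # Use the supplied timestamp if given, otherwise the current time
    ts="${5:-$(date +%s)}"
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$ts" "$1" "$2" "$3" "$4"
}

# Example: a WARNING (1) for the CONFIG_ALERT service on host "gateway".
# The message text "config changed" is made up for illustration.
format_check_result gateway CONFIG_ALERT 1 "config changed"
```

Printing the line first is a cheap way to confirm the host and service
names before pointing a SWATCH "exec" rule at the real script.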