Automatic Monitoring with Puppet and Nagios

Part of my current push at work is getting a configuration management system in place and into a usable state. One reason is consistency of common configuration among the many systems and virtual machines I manage, especially when several serve essentially the same function (such as webserver VMs). Since I’m a fallible human being, it’s easier for me to get one webserver configured so that the non-content part of the setup always passes our security scans, and then replicate that to all the other webservers. This not only saves me time and makes me more efficient, but also helps when we do system audits, as I can easily prove that all N systems I manage that serve the same role have identical configurations.

Another push I’ve been working on for a few years now is comprehensive monitoring of our systems. When I started at my job, monitoring was done through a home-grown script that ran once an hour. While not particularly bad, it had some major shortcomings. Primarily, the hour lag time meant that if a problem occurred shortly after the script ran, you wouldn’t know about it for nearly an hour. Another problem was that, while the script would make note of system problems, the only notification mechanism was “someone needs to visit the web page it generates”. During a normal workday, that might be OK, but evenings and weekends? So, at least for my benefit, I set up a monitoring system that could not only check systems more frequently, but also cover more systems and send out alerts via email, cell phones, etc.

For reasons I won’t get into here, what I chose to use for configuration management was Puppet, and I’d already chosen Nagios as my monitoring system. So far, my biggest problem with Nagios has been finding the time to add new systems to it, figuring out what services to check, etc. It’s not a particularly difficult thing to do, but in the grand scheme of things, it was just something that always fell by the wayside in the drive to get more systems set up, deal with user problems, and put out the inevitable fires. That is, until recently.

My original Nagios install was done on an aging box that has been serving as a development web server for our group. I did the install shortly after I’d started, so had I been around longer, I might have either waited, put in a request for new hardware, or found a way to get virtualization into our environment sooner. Overall, it worked, but it was in general a pain dealing with the box, a RHEL3 server, since it was so far out of date. Recently, I’ve set up a VM on our XenServer cluster to be our new Nagios box. Since I’ve also been playing with Puppet, I wanted to automate things as much as possible, since every new system created should be managed by Puppet (though reality slightly differs from that).

Fortunately, the groundwork for automating Nagios monitoring with Puppet is already built into Puppet. It took me a little while to wrap my head around the concepts, but the Exported Resources example helped and served as a base.

Now, even though I’m still in the process of setting things up, I’ve gotten to the point where my new Nagios server is already monitoring about 37% more services (422 vs 308 on the old server), even though it currently covers about half as many hosts (48 vs 99 on the old). All of it is done automatically, and here’s how.

First, set up stored configurations on your puppetmaster. You’ll need to specify a database in which to store your puppet-collected facts and resources. While the default is SQLite, I ran into problems with concurrent access. Since I’m also currently responsible for the handful of MySQL servers we have, I decided to just use one of those. Create a database and user for puppet to use, then tell the puppetmaster about it. Your [puppetmasterd] section of puppet.conf should look something like this when you’re done:

[puppetmasterd]
templatedir = /var/lib/puppet/templates
storeconfigs = true
dbadapter = mysql
dbuser = the_user_you_set_up
dbpassword = the_password_for_dbuser
dbserver = the_database_server
#dbsocket = /var/run/mysqld/mysqld.sock
downcasefacts = true
 

Your paths will likely differ from mine. If your DB server is running on the same host as the puppetmasterd process, dbserver should be “localhost”, and you’d uncomment the dbsocket line and adjust its path. The downcasefacts line is set to “true” so that I can use the $operatingsystem fact later on without having to change its case.

Next, you’ll want to create a nagios module in Puppet. The Exported Resource example linked above served as my template, but I’ve made a few changes to it. My puppet/modules/nagios/manifests/init.pp file currently looks like this:

class nagios {

   package {
      'nagios3':
         ensure  => installed,
         alias   => 'nagios',
   }

   service {
      'nagios3':
         ensure     => running,
         alias      => 'nagios',
         hasstatus  => true,
         hasrestart => true,
         require    => Package[nagios],
   }

   # collect resources and populate /etc/nagios/nagios_*.cfg
   Nagios_host <<||>>
   Nagios_service <<||>>
   Nagios_hostextinfo <<||>>

   class target {
      @@nagios_host { $fqdn:
         ensure => present,
         alias => $hostname,
         address => $ipaddress,
         use => "generic-host",
      }

      @@nagios_hostextinfo { $fqdn:
         ensure => present,
         icon_image_alt => $operatingsystem,
         icon_image => "base/$operatingsystem.png",
         statusmap_image => "base/$operatingsystem.gd2",
      }

      @@nagios_service { "check_ping_${hostname}":
         use => "check_ping",
         host_name => "$fqdn",
      }

      @@nagios_service { "check_users_${hostname}":
         use => "remote-nrpe-users",
         host_name => "$fqdn",
      }

      @@nagios_service { "check_load_${hostname}":
         use => "remote-nrpe-load",
         host_name => "$fqdn",
      }

      @@nagios_service { "check_zombie_procs_${hostname}":
         use => "remote-nrpe-zombie-procs",
         host_name => "$fqdn",
      }

      @@nagios_service { "check_total_procs_${hostname}":
         use => "remote-nrpe-total-procs",
         host_name => "$fqdn",
      }

      @@nagios_service { "check_swap_${hostname}":
         use => "remote-nrpe-swap",
         host_name => "$fqdn",
      }

      @@nagios_service { "check_all_disks_${hostname}":
         use => "remote-nrpe-all-disks",
         host_name => "$fqdn",
      }
   }
}
 

To use it, I simply do an include nagios in the node definition for my Nagios server in puppet, and in my basenode node definition, I’ve done an include nagios::target. Each of the @@ resources is exported, with that machine’s facts filled in, by every puppet-managed machine that inherits from basenode. The “collect resources and populate /etc/nagios/nagios_*.cfg” portion is the real magic, however. Each of those lines causes puppet to collect all the matching exported resources and write them to files in /etc/nagios on the Nagios server. The only real caveat, which I also noticed in the example I built upon, is that I’m having trouble convincing puppet to reload nagios when the files are updated. I brute-force solved that with a periodic cronjob that runs nagios’ init script with “reload”.
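
One commonly suggested alternative to the cron job is to attach a notify override to each collector, so that a change to any collected resource refreshes the service. The following is a minimal sketch, not something verified against this setup. It assumes the collectors stay inside the nagios class and that the service is declared as Service['nagios3'], as in the class above.

```puppet
# Sketch only: these would replace the three bare collectors in class nagios.
# Each collected resource that changes sends a refresh to the nagios3 service.
Nagios_host <<||>> {
   notify => Service['nagios3']
}

Nagios_service <<||>> {
   notify => Service['nagios3']
}

Nagios_hostextinfo <<||>> {
   notify => Service['nagios3']
}
```

By default a refresh restarts the service. Setting the service resource's restart parameter to a reload command should turn that into a reload instead, if a full restart is undesirable.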

I’m also slowly adding nagios entries for each service that puppet manages in some form. Currently, that means things like apache and ssh. For example, in my apache2 module’s init.pp, I have the following in my class:

   @@nagios_service { "check_http_${hostname}":
      use => "check-http",
      host_name => "$fqdn",
   }

   @@nagios_service { "check_http_processes_${hostname}":
      use => "remote-nrpe-httpd-procs",
      host_name => "$fqdn",
   }
 

This monitors both over-the-wire connections to port 80 on webservers, via the check-http command, and the number of httpd processes running on each host, via remote-nrpe-httpd-procs.

Similarly, for ssh, I have:

   @@nagios_service { "check_ssh_${hostname}":
      use => "check-ssh",
      host_name => "$fqdn",
   }
 

to monitor whether sshd is accepting connections on my systems.
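
Since each of these checks has the same shape, the pattern could be wrapped in a defined type. This is only a sketch: nagios::check and its $use_template parameter are hypothetical names that are not part of the module above, and it assumes the service templates (check-ssh, check-http, and so on) already exist in the Nagios configuration.

```puppet
# Hypothetical helper, e.g. in modules/nagios/manifests/check.pp.
# Exports one nagios_service per declaration, named like the checks above.
# Newer Puppet versions may prefer $facts['hostname'] and $facts['fqdn'].
define nagios::check ($use_template) {
   @@nagios_service { "check_${name}_${hostname}":
      use       => $use_template,
      host_name => $fqdn,
   }
}

# Usage inside a service module; this exports check_ssh_<hostname>,
# matching the ssh check shown above.
nagios::check { 'ssh':
   use_template => 'check-ssh',
}
```

The gain is mostly fewer places to touch if the naming scheme or a common attribute ever changes.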

And that, basically, is how I’m automatically monitoring all puppet-managed hosts in my environment. Whenever I set up a new host, I activate puppet on the host to ensure configurations I care about are synced to my master templates, and now as a bonus, puppet automatically tells nagios to start monitoring the services it knows about on the host. By expending a little extra effort once now, I’ve managed to be lazy later on multiple times over, truly something a Systems Administrator should be doing!
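
As a rough sketch of how the pieces fit together, the node definitions might look something like the following. The host names are placeholders, and apache2 is assumed to be the class name of the apache2 module mentioned above. basenode, nagios and nagios::target are the names used in this post.

```puppet
# site.pp (sketch). Host names are placeholders.
# Node inheritance works in the Puppet versions of this era;
# it was removed in Puppet 4.

node basenode {
   include nagios::target   # every managed host exports its basic checks
}

node 'nagios.example.com' inherits basenode {
   include nagios           # collects everything the other nodes export
}

node 'web05.example.com' inherits basenode {
   include apache2          # exports its own http checks as well
}
```

A new host's checks show up on the Nagios server only after the new host has completed a puppet run and the Nagios server has run puppet afterwards.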


25 Comments

  2. Zach Peters says:

    I’m not managing large setups like this, but this is a definite instant bookmark for me!

    Thanks for the great write up.

  4. Nikolay Sturm says:

    Nice write-up, exactly what I needed to get started with including nagios into my puppet setup. :)

  5. Gabès Jean says:

    Hi,

    You should look at Shinken, it’s an enhanced Nagios reimplementation in Python that allows you to have a quick and easy distributed and high availability monitoring environment, and of course with Nagios configuration and plugins compatibility :)

    It’s available (Open Source with an AGPL licence) at www.shinken-monitoring.org with even a demo virtual machine to test it in 5 minutes :)

    Jean Gabès, Shinken developer

  9. Aaron Brown says:

    Why use puppet exported resources instead of utilizing an external node classifier, which doesn’t make the puppet config unreadable? It’s also much more flexible.

  10. Cjeanneret says:

    Nice post!

    For information, we developed a puppet module wrapping naginator stuff: github.com/camptocamp/puppet-nagios
    It allows some more stuff, like exporting resources for more than one monitoring server, clean up resources when node is destroyed and so on.

    Feel free to take/use it, we already use it for three big infrastructures (around 100 srv per infra).

    Cheers,

    C.

  14. Danny says:

    Thanks for the post. Maybe I missed it, but I don’t see any reference to managing the commands, contacts, contactgroups, escalation, etc. Are you managing those configurations outside of Puppet on the Nagios servers?

    Regards

  15. Wouter Verhelst says:

    You said:

    “The only real caveat, which I also noticed in the example I built upon, is that I’m having trouble convincing puppet to reload nagios when the files are updated, which I just brute-force solved with a periodic cronjob to run nagios’ init script with “reload”.”

    I’m sure you already know this after two years, but just in case you don’t: add

    notify => Service['nagios3']

    to each of your resource imports. E.g., you would do:

    Nagios_host <<||>> {
    notify => Service['nagios3']
    }

    This way, when one of your files actually is updated, the nagios3 service will be restarted.

  16. Juan Carlo says:

    Hi, this is great. But I don’t know if this is correct or not. The .cfg files are being populated only on the client?
    How do we get it to populate on the nagios server? Or do I have to manually copy it over?
