Part of my current push at work right now is trying to get some sort of configuration management system in place, and in a usable state. Part of the reason for wanting to do this is consistency of common configuration among the many systems and virtual machines I manage, especially when several serve essentially the same function (such as webserver VMs). Since I’m a fallible human being, it’s easier for me to get one webserver configured so that the non-content part of the setup always passes our security scans, and then replicate that to all the other webservers. This not only saves me time and makes me more efficient, but helps when we do system audits, as I can easily prove that all N systems I manage that serve the same role have identical configurations.
Another push I’ve been working on for a few years now is comprehensive monitoring of our systems. When I started at my job, monitoring was done through a home-grown script that ran once an hour. While not particularly bad, it had some major shortcomings. Primarily, the hour lag meant that if a problem occurred shortly after the script ran, you wouldn’t know about it for nearly an hour. Another problem was that, while the script would make note of system problems, the only notification mechanism was “someone needs to visit the web page it generates”. During a normal workday, that might be OK, but evenings and weekends? So, at least for my benefit, I set up a monitoring system that could not only monitor systems more frequently, but monitor more systems, and send out alerts via email, cell phones, etc.
For reasons I won’t get into here, what I chose to use for configuration management was Puppet, and I’d already chosen Nagios as my monitoring system. So far, my biggest problem with Nagios has been finding the time to add new systems to it, figuring out what services to check, etc. It’s not a particularly difficult thing to do, but in the grand scheme of things, it was just something that always fell by the wayside in the drive to get more systems set up, deal with user problems, and put out the inevitable fires. That is, until recently.
My original Nagios install was done on an aging box that had been serving as a development web server for our group. I did the install shortly after I’d started; had I been around longer, I might have either waited, put in a request for new hardware, or found a way to get virtualization into our environment sooner. Overall it worked, but the box, a RHEL3 server, was in general a pain to deal with, since it was so far out of date. Recently, I set up a VM on our XenServer cluster to be our new Nagios box. Since I’ve also been playing with Puppet, I wanted to automate things as much as possible, since every new system created should be managed by Puppet (though reality differs slightly from that).
Fortunately, the groundwork for automating Nagios monitoring with Puppet is already built into Puppet. It took me a little while to wrap my head around the concepts, but the example helped and served as a base.
Now, even though I’m still in the process of setting things up, I’ve gotten to the point where my new Nagios server is already monitoring about 30% more information about hosts (422 services vs 308 on the old server), even though the number of hosts is currently about half (48 hosts vs 99 on the old). All of it done automatically, and here’s how.
First, set up stored configurations on your puppetmaster. You’ll need to specify a database in which to store your puppet-collected facts and resources. While the default is SQLite, I ran into problems with concurrent access. Since I’m also currently responsible for the handful of MySQL servers we have, I decided to just use one of those. Create a database and user for puppet to use, then tell the puppetmaster about it. The [puppetmasterd] section of puppet.conf should look something like this when you’re done:
[puppetmasterd]
templatedir = /var/lib/puppet/templates
storeconfigs = true
dbadapter = mysql
dbuser = the_user_you_set_up
dbpassword = the_password_for_dbuser
dbserver = the_database_server
#dbsocket = /var/run/mysqld/mysqld.sock
downcasefacts = true
Your paths may well be different than mine. If your DB server is running on the same host as the puppetmasterd process, dbserver should be “localhost”, and you’d uncomment and adjust the path of the dbsocket line. The downcasefacts line is set to “true” so that I can make use of the $operatingsystem fact later on without having to muck with changing the case later.
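As a concrete sketch, creating that database and user on the MySQL server might look like the following. The database name, username, password, and puppetmaster hostname here are placeholders, not the ones from my actual setup:

```
-- Hypothetical names; substitute your own, then plug them into puppet.conf.
CREATE DATABASE puppet;
GRANT ALL PRIVILEGES ON puppet.*
  TO 'puppet_user'@'puppetmaster.example.com'
  IDENTIFIED BY 'choose_a_password';
FLUSH PRIVILEGES;
```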
Next, you’ll want to create a nagios
module in Puppet. The Exported Resource example linked above served as my template, but I’ve made a few changes to it. My puppet/modules/nagios/manifests/init.pp
file currently looks like this:
class nagios {
  package { 'nagios3':
    ensure => installed,
    alias  => 'nagios',
  }

  service { 'nagios3':
    ensure     => running,
    alias      => 'nagios',
    hasstatus  => true,
    hasrestart => true,
    require    => Package['nagios'],
  }

  # collect resources and populate /etc/nagios/nagios_*.cfg
  Nagios_host <<||>>
  Nagios_service <<||>>
  Nagios_hostextinfo <<||>>

  class target {
    @@nagios_host { $fqdn:
      ensure  => present,
      alias   => $hostname,
      address => $ipaddress,
      use     => "generic-host",
    }

    @@nagios_hostextinfo { $fqdn:
      ensure          => present,
      icon_image_alt  => $operatingsystem,
      icon_image      => "base/$operatingsystem.png",
      statusmap_image => "base/$operatingsystem.gd2",
    }

    @@nagios_service { "check_ping_${hostname}":
      use       => "check_ping",
      host_name => "$fqdn",
    }

    @@nagios_service { "check_users_${hostname}":
      use       => "remote-nrpe-users",
      host_name => "$fqdn",
    }

    @@nagios_service { "check_load_${hostname}":
      use       => "remote-nrpe-load",
      host_name => "$fqdn",
    }

    @@nagios_service { "check_zombie_procs_${hostname}":
      use       => "remote-nrpe-zombie-procs",
      host_name => "$fqdn",
    }

    @@nagios_service { "check_total_procs_${hostname}":
      use       => "remote-nrpe-total-procs",
      host_name => "$fqdn",
    }

    @@nagios_service { "check_swap_${hostname}":
      use       => "remote-nrpe-swap",
      host_name => "$fqdn",
    }

    @@nagios_service { "check_all_disks_${hostname}":
      use       => "remote-nrpe-all-disks",
      host_name => "$fqdn",
    }
  }
}
To use it, I simply do an include nagios in the node definition for my Nagios server in puppet, and in my basenode node definition, I’ve done an include nagios::target. Each of the @@ lines exports a resource for each machine managed by puppet that inherits from basenode. The “collect resources and populate /etc/nagios/nagios_*.cfg” portion is the real magic, however: each of those lines causes puppet to collect all the matching exported resources and output them to files in /etc/nagios. The only real caveat, which I also noticed in the example I built upon, is that I’m having trouble convincing puppet to reload nagios when the files are updated, which I just brute-force solved with a periodic cronjob to run nagios’ init script with “reload”.
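For concreteness, the node definitions described above might look something like this. This is a sketch of the include arrangement, not my actual site.pp; the hostnames are placeholders:

```
# Hypothetical site.pp sketch; hostnames are placeholders.
node basenode {
  include nagios::target   # every host exports its nagios resources
}

# The Nagios server collects everything the other nodes export.
node 'nagios.example.com' inherits basenode {
  include nagios
}
```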
I’m also slowly adding nagios entries for each service that puppet manages in some form. Currently, that means things like apache and ssh. For example, in my apache2 module’s init.pp, I have the following in my class:
@@nagios_service { "check_http_${hostname}":
use => "check-http",
host_name => "$fqdn",
}
@@nagios_service { "check_http_processes_${hostname}":
use => "remote-nrpe-httpd-procs",
host_name => "$fqdn",
}
This monitors both over-the-wire connections to port 80 on webservers, via the check-http command, and the number of httpd processes running on each host, via remote-nrpe-httpd-procs.
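The remote-nrpe-* names used throughout are service templates defined on the Nagios side; they aren’t shown in this post. A minimal sketch of what one might look like, with assumed template and command names:

```
# Hypothetical Nagios template definition for remote-nrpe-httpd-procs;
# "register 0" marks it as a template rather than a real service.
define service {
    name                remote-nrpe-httpd-procs
    use                 generic-service
    service_description HTTPD processes
    check_command       check_nrpe!check_httpd_procs
    register            0
}
```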
Similarly, for ssh, I have:
@@nagios_service { "check_ssh_${hostname}":
use => "check-ssh",
host_name => "$fqdn",
}
to monitor whether sshd is accepting connections on my systems.
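On the monitored hosts themselves, the NRPE daemon needs matching command definitions for the checks the templates invoke; something like the following in nrpe.cfg (paths and thresholds here are the stock examples, not necessarily what this setup uses):

```
# Hypothetical /etc/nagios/nrpe.cfg entries on a monitored host.
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
```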
And that, basically, is how I’m automatically monitoring all puppet-managed hosts in my environment. Whenever I set up a new host, I activate puppet on the host to ensure configurations I care about are synced to my master templates, and now as a bonus, puppet automatically tells nagios to start monitoring the services it knows about on the host. By expending a little extra effort once now, I’ve managed to be lazy later on multiple times over, truly something a Systems Administrator should be doing!
I’m not managing large setups like this, but this is a definite instant bookmark for me!
Thanks for the great write up.
Nice write-up, exactly what I needed to get started with including nagios into my puppet setup. 🙂
Hi,
You should look at Shinken. It’s an enhanced Nagios reimplementation in Python that allows you to have a quick and easy distributed and highly available monitoring environment, and of course with Nagios configuration and plugin compatibility 🙂
It’s available (Open Source under the AGPL licence) at http://www.shinken-monitoring.org with even a demo virtual machine to test it in 5 minutes 🙂
Jean Gabès, Shinken developer
Why use puppet exported resources instead of utilizing an external node classifier, which doesn’t make the puppet config unreadable? It’s also much more flexible.
Nice post!
For information, we developed a puppet module wrapping the naginator stuff: https://github.com/camptocamp/puppet-nagios
It allows some more things, like exporting resources for more than one monitoring server, cleaning up resources when a node is destroyed, and so on.
Feel free to take/use it; we already use it for three big infrastructures (around 100 servers per infra).
Cheers,
C.
Thanks for the post. Maybe I missed it, but I don’t see any reference to managing the commands, contacts, contactgroups, escalations, etc. Are you managing those configurations outside of Puppet on the Nagios servers?
Regards
You said:
“The only real caveat, which I also noticed in the example I built upon, is that I’m having trouble convincing puppet to reload nagios when the files are updated, which I just brute-force solved with a periodic cronjob to run nagios’ init script with “reload”.”
I’m sure you already know this after two years, but just in case you don’t: add
notify => Service['nagios3']
to each of your resource imports. E.g., you would do:
Nagios_host <<||>> {
  notify => Service['nagios3'],
}
This way, when one of your files actually is updated, the nagios3 service will be restarted.
Hi, this is great, but I don’t know if this is correct or not: the .cfg files are being populated only on the client?
How do we get them to populate on the nagios server, or do I have to manually copy them over?
Hi
I’m following your instructions but when nagios3 tries to start I get the following error:
Error: Template ‘nrpe-zombie-procs’ specified in service definition could not be not found (config file ‘/etc/nagios3/conf.d/nagios_host.cfg’, starting on line 169)
Error processing object config files!
I think this happens because this service definition is not defined anywhere.
Did I miss something?
Thank you for your time
Regards
Hi,
It seems that this feature is a little buggy:
https://projects.puppetlabs.com/issues/1180
Best regards
Hi
I’ve followed your tutorial but it doesn’t work for me. I don’t understand at all how it works. I mean, my config is the following:
My layout:
– Puppet master (with puppedb) and Nagios server in the same machine
– Remote server named server-1
On Nagios/Puppet Master:
/etc/puppet/modules/nagios/manifests -> Same as yours
/etc/puppet/manifest/site.pp
node 'server-1' {
  include nagios::target
}
When I run the command “puppet agent -t” on server-1, I receive the catalog from the server but nothing more. There are no new files on server-1 or the puppet master with the nagios config. I have checked the permissions and ownership of every folder and it’s fine.
Do you have any idea what is happening? Do I need to set any config on server-1?
Thanks.
Regards
This is great, but when I add Windows servers to puppet, nagios will create the hostname.cfg but not include the host{} section, which kills nagios. Have you seen this?