Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow metrics to be written via python plugin #82

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

cdosborn
Copy link

Hey Shawn,

My name is Connor. I am an intern at iPlant Collaborative, we're an NSF funded organization specializing in bio-cyber-infrastructure to democratize access to U.S. supercomputing capabilities. (we help scientists leverage the cloud)

With my monitoring setup, I have a nagios plugin reporting metrics from a hypervisor. As a result the output concerns all the hypervisor's vm's. Instead of returning metrics for a single host, I'm returning stats for multiple hosts. Here's sample output from my plugin:

virt-stats OK - 2 running of 7 on <hypervisor> | <vm 1>=<cpu>;<mem>;<tx>;<rx>; <vm 2> ...

The metrics I would like to send to carbon:

stats.<hypervisor>.<vm 1>.cpu
...
stats.<hypervisor>.<vm 2>.cpu
...

Currently Graphios doesn't permit this flexibility, which is understandable. I need a dynamic graphite prefix.

I have implemented this behavior. The user simply has to write a python module defining get_metrics. The user is responsible for ensuring that the returned path is valid for their given backend. Here's a sample plugin:

def get_metrics(perfdata, nag):
    """ 
        returns a [(<path>, <value>)] where each is the metric to send to carbon
    """

    path = "%s.%s.%s.%s" % (nag.GRAPHITEPREFIX, nag.HOSTNAME, nag.GRAPHITEPOSTFIX, nag.LABEL)
    value = nag.VALUE

    return [ (path, value) ]

The nagios service definition looks like so:

define service {
     ...
     _graphiteplugin /path/to/plugin.py
     ...
}

So far it has been tested with carbon. If this behavior looks like something you want to be a part of graphios, let me know what else you'd like to see.

@shawn-sterling
Copy link
Owner

Hi there. First of all, I think it's pretty badass that your company is letting you work on open source, and second of all I think it's also pretty badass that you are actually doing it. :)

I don't mind this idea, but I think you might run into problems down the road with nagios depending on what you are using to receive data. nrpe and nsca have tiny payload sizes for perfdata by default; you can modify the source and increase to 4096 bytes, but I've run into problems with that as well. If you are using nrdp this shouldn't be a problem.

I think a better way to introduce this would be as a config file option; then the codepath would could be skipped unless it was purposefully enabled.

The loading of the modules will likely need some more exceptions added, specifically IOError and OSError in case the permissions are messed up for the module.

The code at line 359 -> 363 assumes the data returned is in the format you expect, this may cause problems if it isn't.

removing the clearing of whitespace you did on what was line 387 is likely detrimental for most people running this. There is also another merge I'm going to do (parserfix) that will be cleaning up any spaces in the perfdata (windows plugins and their spaces). Not sure what the right solution is here, you may consider something else to separate entries (space may work for you but won't for others).

Lastly grab the python module flake8 if you want travisci to stop complaining. :)

Have a good day.

-Shawn

@cdosborn
Copy link
Author

cdosborn commented Jul 1, 2015

Config is moved out of nagios definitions. (Do you want command line opts for this?)

I added better error handling reporting, 'missing path/missing method/see example'.

In terms of the white-space substitution, I changed it so that if you go the plugin route no substitutions are made. If you're writing a plugin you have complete flexibility (and complete freedom to shoot yourself in the foot).

This can easily be included with other backends. From a cursory look, librato, influxdb would use the same fix as carbon (perhaps a case for base classing some of the backends). I don't have a convenient way to test those systems.

Should be flake8 compatible :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants