Skip to content

Latest commit

 

History

History
126 lines (86 loc) · 3.97 KB

README.md

File metadata and controls

126 lines (86 loc) · 3.97 KB

lumberjack

o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~

QUESTIONS?

If you have questions and cannot find answers, please join the #logstash irc channel on freenode irc or ask on the [email protected] mailing list.

What is this?

A tool to collect logs locally in preparation for processing elsewhere!

Problem: logstash jar releases are too fat for constrained systems.

Solution: lumberjack

Building it

Make sure you have installed FPM (rubygem) and have outgoing FTP access (ftp.openssl.org).

  • compile: make
  • rpm package: make rpm
  • deb package: make deb

Packages install to /opt/lumberjack. Lumberjack builds all necessary dependencies itself, so there should be no run-time dependencies you need.

Running it

Generally: lumberjack.sh --host somehost --port 12345 /var/log/messages

See lumberjack.sh --help for all the flags

Key points:

  • You'll need an ssl ca to verify the server (host) with.
  • You can specify custom fields with the '--field foo=bar'. Any number of these may be specified. I use them to set fields like 'type' and other custom attributes relevant to each log.
  • Any non-flag argument after is considered a file path. You can watch any number of files.

Use with logstash

In logstash, you'll want to use the lumberjack input, something like:

input {
  lumberjack {
    # The port to listen on
    port => 12345

    # The paths to your ssl cert and key
    ssl_certificate => "path/to/ssl.crt"
    ssl_key => "path/to/ssl.key"

    # Set this to whatever you want.
    type => "somelogs"
  }
}

Goals

  • minimize resource usage where possible (cpu, memory, network)
  • secure transmission of logs
  • configurable event data
  • easy to deploy with minimal moving parts.

Simple inputs only:

  • follow files, respect rename/truncation conditions
  • stdin, useful for things like 'varnishlog | lumberjack ...'

Implementation details

Below is valid as of 2012/09/19

Minimize resource usage

  • sets small resource limits (memory, open files) on start up based on the number of files being watched
  • cpu: sleeps when there is nothing to do
  • network/cpu: sleeps if there is a network failure
  • network: uses zlib for compression

secure transmission

  • uses openssl to transport logs. Currently supports verifying the server certificate only (so you know who you are sending to).

configurable event data

  • the protocol lumberjack uses supports sending a string:string map
  • the lumberjack tool lets you specify arbitrary extra data with --field name=value

easy deployment

  • all dependencies are built at compile-time (openssl, jemalloc, etc) because many os distributions lack these dependencies.
  • 'make deb' (or make rpm) will package everything into a single deb (or rpm)
  • bin/lumberjack.sh makes sure the dependencies are found when run in production

future functional features

  • re-evaluate globs periodically to look for new log files
  • track position of in the log

future protocol discussion

I would love to not have a custom protocol, but nothing I've found implements what I need, which is: encrypted, trusted, compressed, latency-resilient, and reliable transport of events.

  • redis development refuses to accept encryption support, would likely reject compression as well.
  • zeromq lacks authentication, encryption, and compression.
  • thrift also lacks authentication, encryption, and compression, and also is an RPC framework, not a streaming system.
  • websockets don't do authentication or compression, but support encrypted channels with SSL. Websockets also require XORing the entire payload of all messages - wasted energy.
  • SPDY is still changing too frequently and is also RPC. Streaming requires custom framing.
  • HTTP is RPC and very high over head for small events (uncompressable headers, etc). Streaming requires custom framing.