o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~
If you have questions and cannot find answers, please join the #logstash IRC channel on freenode or ask on the [email protected] mailing list.
A tool to collect logs locally in preparation for processing elsewhere!
Problem: logstash jar releases are too fat for constrained systems.
Solution: lumberjack
Make sure you have installed FPM (rubygem) and have outgoing FTP access (ftp.openssl.org).
- compile: make
- rpm package: make rpm
- deb package: make deb
Packages install to /opt/lumberjack. Lumberjack builds all necessary dependencies itself, so there should be no extra run-time dependencies to install.
Generally: lumberjack.sh --host somehost --port 12345 /var/log/messages
See lumberjack.sh --help for all the flags.
Key points:
- You'll need an SSL CA certificate to verify the server (host) with.
- You can specify custom fields with '--field foo=bar'. Any number of these may be specified. I use them to set fields like 'type' and other custom attributes relevant to each log.
- Any non-flag argument after the flags is considered a file path. You can watch any number of files.
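As a rough illustration of the flag handling described above, here is a minimal Python sketch (not lumberjack's actual argument parser) that collects repeated '--field name=value' flags into a string:string map and treats the remaining non-flag arguments as file paths. The function names, the choice to split on the first '=', and the assumption that every other flag takes exactly one value are all hypothetical.

```python
def parse_fields(pairs):
    """Turn ['type=syslog', 'env=prod'] into {'type': 'syslog', 'env': 'prod'}."""
    fields = {}
    for pair in pairs:
        # Assumption: split on the first '=' only, so values may contain '='.
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        fields[name] = value
    return fields


def split_args(argv):
    """Separate repeated '--field name=value' flags from trailing file paths."""
    field_pairs, paths = [], []
    args = iter(argv)
    for arg in args:
        if arg == "--field":
            value = next(args, None)
            if value is None:
                raise ValueError("--field requires a name=value argument")
            field_pairs.append(value)
        elif arg.startswith("--"):
            # Assumption: every other flag (--host, --port, ...) takes one value.
            next(args, None)
        else:
            # Anything that is not a flag is treated as a file path to watch.
            paths.append(arg)
    return parse_fields(field_pairs), paths
```

For example, split_args(["--host", "somehost", "--field", "type=syslog", "/var/log/messages"]) yields the map {'type': 'syslog'} and the single path /var/log/messages.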
In logstash, you'll want to use the lumberjack input, something like:
    input {
      lumberjack {
        # The port to listen on
        port => 12345

        # The paths to your ssl cert and key
        ssl_certificate => "path/to/ssl.crt"
        ssl_key => "path/to/ssl.key"

        # Set this to whatever you want.
        type => "somelogs"
      }
    }
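On the sending side, the key point about needing an SSL CA means the client checks the server's certificate before shipping anything. Lumberjack does this with openssl; the Python sketch below only illustrates the same idea with the standard ssl module. The function name make_client_context and the ca_path parameter are hypothetical.

```python
import ssl


def make_client_context(ca_path=None):
    """Build a TLS client context that insists on a verified server certificate."""
    # PROTOCOL_TLS_CLIENT turns on certificate and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_path is not None:
        # Trust only the CA that signed the server's certificate.
        ctx.load_verify_locations(cafile=ca_path)
    return ctx
```

A connection would then be wrapped with ctx.wrap_socket(sock, server_hostname="somehost"), which fails the handshake if the server's certificate does not chain to the given CA.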
- minimize resource usage where possible (cpu, memory, network)
- secure transmission of logs
- configurable event data
- easy to deploy with minimal moving parts.
Simple inputs only:
- follow files, respect rename/truncation conditions
- stdin, useful for things like 'varnishlog | lumberjack ...'
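To show what "respect rename/truncation conditions" can mean in practice, here is a minimal Python sketch of a file follower. It is not lumberjack's implementation; the class name FollowedFile and the detection rules (compare inode numbers to spot a rename, compare file size against the read offset to spot a truncation) are assumptions chosen for illustration on POSIX systems.

```python
import os


class FollowedFile:
    """Tracks one path and re-opens it after a rename/rotation or truncation."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "rb")
        self.inode = os.fstat(self.handle.fileno()).st_ino

    def read_new_lines(self):
        """Return complete lines appended since the last call."""
        lines = self._drain()
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return lines  # rotated away, replacement not created yet
        if on_disk.st_ino != self.inode:
            # The path now names a different file: it was renamed/rotated.
            self.handle.close()
            self.handle = open(self.path, "rb")
            self.inode = os.fstat(self.handle.fileno()).st_ino
            lines += self._drain()
        elif on_disk.st_size < self.handle.tell():
            # Same file, but shorter than our offset: it was truncated.
            self.handle.seek(0)
            lines += self._drain()
        return lines

    def _drain(self):
        """Read every complete line from the current offset onward."""
        lines = []
        while True:
            start = self.handle.tell()
            line = self.handle.readline()
            if not line.endswith(b"\n"):
                self.handle.seek(start)  # EOF or partial line: retry later
                break
            lines.append(line.rstrip(b"\n").decode("utf-8", "replace"))
        return lines
```

This sketch misses a truncation when the file has already grown past the old offset again by the time it is checked, and it drops a partial final line left in a rotated file; a real follower would need to decide how to handle both.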
Below is valid as of 2012/09/19
- sets small resource limits (memory, open files) on start up based on the number of files being watched
- cpu: sleeps when there is nothing to do
- network/cpu: sleeps if there is a network failure
- network: uses zlib for compression
- uses openssl to transport logs. Currently supports verifying the server certificate only (so you know who you are sending to).
- the protocol lumberjack uses supports sending a string:string map
- the lumberjack tool lets you specify arbitrary extra data with --field name=value
- all dependencies are built at compile-time (openssl, jemalloc, etc) because many os distributions lack these dependencies.
- 'make deb' (or make rpm) will package everything into a single deb (or rpm)
- bin/lumberjack.sh makes sure the dependencies are found when run in production
- re-evaluate globs periodically to look for new log files
- track the read position in each log
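Periodic glob re-evaluation, as listed above, could look roughly like the following Python sketch: keep the set of paths already seen and report only the new matches on each scan. The class name GlobWatcher and its rescan method are hypothetical, and how often to rescan is left to the caller.

```python
import glob


class GlobWatcher:
    """Remembers which paths have matched so far and reports newcomers."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.known = set()

    def rescan(self):
        """Return paths that match now but were not seen on earlier scans."""
        current = set()
        for pattern in self.patterns:
            current.update(glob.glob(pattern))
        new_paths = sorted(current - self.known)
        self.known |= current
        return new_paths
```

Each path returned by rescan() would then be handed to whatever code follows individual files.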
I would love to not have a custom protocol, but nothing I've found implements what I need, which is: encrypted, trusted, compressed, latency-resilient, and reliable transport of events.
- redis development refuses to accept encryption support, and would likely reject compression as well.
- zeromq lacks authentication, encryption, and compression.
- thrift also lacks authentication, encryption, and compression, and also is an RPC framework, not a streaming system.
- websockets don't do authentication or compression, but support encrypted channels with SSL. Websockets also require XORing the entire payload of all messages - wasted energy.
- SPDY is still changing too frequently and is also RPC. Streaming requires custom framing.
- HTTP is RPC and has very high overhead for small events (incompressible headers, etc). Streaming requires custom framing.
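To make the "compressed" requirement concrete, the Python sketch below batches many small events and compresses them as one unit with zlib, which is where log data with repeated field names and similar lines tends to shrink well. The JSON-lines encoding and the function names pack_batch and unpack_batch are assumptions for illustration only; this is not lumberjack's wire format.

```python
import json
import zlib


def pack_batch(events):
    """Serialize a list of string:string maps and compress them together.

    Returns (raw_bytes, compressed_bytes) so the sizes can be compared.
    """
    # Assumption: one JSON object per line; the real protocol differs.
    raw = "\n".join(json.dumps(e, sort_keys=True) for e in events).encode("utf-8")
    return raw, zlib.compress(raw)


def unpack_batch(blob):
    """Reverse pack_batch: decompress and parse each non-empty line."""
    text = zlib.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]
```

Compressing the whole batch rather than each event lets zlib reuse repeated keys and values across events, which per-message framing with fixed headers cannot do.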