The Log Pipeline You Can't Have
I've been working on the logging system for some of our products, and it brought to mind an article I read years ago: Avery Pennarun's The log/event processing pipeline you can't have.
At Pennarun's workplace, application logs are written directly to the
kernel ring buffer via /dev/kmsg so they survive crashes. A log uploader
ships them upstream as plain text files, and only then are they parsed,
structured, and processed. A few of our services take a similar approach,
using write-ahead logs to prevent data loss.
In a world of containers - apart from the single log file to be shipped - we
pretty much have Pennarun's tooling already. Kubernetes writes container logs
under /var/log/pods; Docker keeps them in /var/lib/docker/containers/. For our
simpler projects, we just ship a single static binary, no containers needed.
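For the Kubernetes case, the kubelet lays files out as /var/log/pods/&lt;namespace&gt;_&lt;pod&gt;_&lt;uid&gt;/&lt;container&gt;/&lt;restart&gt;.log, so a shipper's discovery step can be little more than a glob. A minimal sketch (the function name and the `root` parameter are mine, for illustration):

```python
import glob
import os


def find_container_logs(root: str = "/var/log/pods") -> list[str]:
    """Enumerate Kubernetes container log files under the kubelet's
    directory layout: <root>/<ns>_<pod>_<uid>/<container>/<restart>.log
    """
    return sorted(glob.glob(os.path.join(root, "*", "*", "*.log")))
```

A real shipper does more (tracking offsets, handling rotation), but the point stands: the raw files are just sitting there on disk, ready to be forwarded as-is.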
For systemd services you get Pennarun's pipeline with almost no effort. Your
process writes to stdout, journald captures it to a durable on-disk store, and
a shipper like journalbeat or promtail forwards the raw entries to a
central system where they are parsed and structured. The only thing you lose
compared to /dev/kmsg is kernel-panic survivability - journald runs in
userspace - but for typical server workloads that rarely matters.
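Concretely, that pipeline takes almost no unit-file configuration. A minimal sketch for a hypothetical service (`myapp` is a placeholder; the Standard* lines are systemd's defaults anyway, shown here only for clarity):

```ini
[Unit]
Description=My app (example)

[Service]
ExecStart=/usr/local/bin/myapp
# These are the defaults on modern systemd, made explicit:
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```

Once it's running, `journalctl -u myapp -o json` shows the structured records a shipper like promtail would forward upstream.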
As always, I appreciate any feedback. If you want to reach out, I'm
@neuralsandwich on Twitter and most other places.