Operationally speaking, the single most important thing you should be doing is collecting application and system logs and making them easily accessible and usable (and check your backups every now and again). I say this because of the value you gain relative to the small cost. You're being your own worst enemy if you aren't staying on top of error logs.
The OSS solutions are mature and simple to set up. And it isn't something you need to get absolutely correct with 100% uptime. If you're an "average" company, a single server running Logstash+ES+Kibana is probably good enough. There are only two ways you can do this wrong: not doing it at all, or forwarding non-actionable items (which creates a low signal-to-noise ratio, and people will just ignore it).
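For a sense of scale, "good enough" can be a Logstash config this small. This is just a sketch; the Beats port and index name are my own assumptions, not anything from the article:

    # Minimal single-box pipeline: accept logs from Filebeat agents
    # and index them into the local Elasticsearch that Kibana reads.
    input {
      beats {
        port => 5044
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "logs-%{+YYYY.MM.dd}"
      }
    }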
After that come metrics (system and application), which are important but not as trivial to set up.
Quickly looking at LogZoom, I think the more forwarding options we have, the better. They make it very clear that, unlike Logstash, it doesn't help you structure data. On one hand, I don't think that's a big deal. Again, if you're only writing out actionable items, and if you're staying on top of them, almost anything that moves data from your apps/servers into ES+Kibana (or whatever) is going to be good enough.
On the flip side, adding structure to the logs can help as you grow. Grouping/filtering by servers, locations, types (app vs system), versions... is pretty important. I like Logstash; I actually think it's fun to configure (the grok patterns), and it helps you reason about what you're logging and what you're hoping to get from those logs.
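To give a concrete (made-up) example of what a grok pattern buys you, here's a sketch of a Logstash filter that turns a raw line into fields you can group/filter on in Kibana; the log format and field names are illustrative:

    # Parses a line like:
    #   2016-04-25T12:00:00Z ERROR payments-eu1 v1.4.2 timeout talking to gateway
    # into ts/level/host/version/msg fields.
    filter {
      grok {
        match => {
          "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{NOTSPACE:host} v%{NOTSPACE:version} %{GREEDYDATA:msg}"
        }
      }
    }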
PacketZoom founder here. Glad you liked the project. Could not agree more with the importance of tracking logs (and metrics... but that's a topic for another post).
To respond to your point about the absence of a Grok-like facility: avoiding the need to unmarshal and remarshal the data while passing through LogZoom was an explicit design requirement. The blog post refers to our pain with Logstash/Fluentd etc. We were in a situation where our production code was fighting for resources against a log-collecting facility.
In general, it's best to process the data (to the extent possible) closest to its point of origin. It's orders of magnitude cheaper to create a well-structured log line straight from your production code (where it's just some in-memory manipulation of freshly created strings) than in a post-processing step inside a separate process (or on a separate machine).
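To illustrate the idea (this is my sketch, not LogZoom code): if the app emits one JSON object per line, every shipper downstream can forward the bytes untouched, with no grok and no re-parsing. The field names here are made up:

    import json, sys, time

    # Emit one self-describing JSON object per line; downstream
    # forwarders never need to parse or restructure it.
    def log(level, event, **fields):
        record = {"ts": time.time(), "level": level, "event": event}
        record.update(fields)
        sys.stdout.write(json.dumps(record) + "\n")

    log("error", "payment_failed", order_id=1234, gateway="stripe", latency_ms=842)
    # -> {"ts": 1461585600.0, "level": "error", "event": "payment_failed", ...}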
I've spent years dealing with performance problems in global-scale production stacks, and a surprisingly high number of resource bottlenecks (memory/CPU/disk IO) are caused by ignoring this simple principle.
I've lost count of the cases where a simple restructuring of the architecture to avoid a marshal/unmarshal step drastically cut down resource requirements and operational headaches. Unfortunately, a whole lot of industry "best practices" (exemplified by the Grok step in Logstash) encourage the opposite behavior.
>To respond to your point about the absence of a Grok-like facility: avoiding the need to unmarshal and remarshal the data while passing through LogZoom was an explicit design requirement. The blog post refers to our pain with Logstash/Fluentd etc.
I think there are two different (CPU) performance problems conflated into one:
(1) The cost of parsing logs with something like Grok and regexps
(2) The cost of marshaling and unmarshaling data
While both do cost CPU time, based on my experience having talked to literally hundreds of Fluentd users (I'm a maintainer and was a core support member for a while), the cost of (1) dwarfs the cost of (2). (2) is pretty cheap if you use efficient serializers like MessagePack. As for (1), both Logstash and Fluentd support an option to perform zero parsing (in Fluentd, it's "format none"). By using these options, you can bring down CPU time significantly.
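For reference, "format none" is a one-line change in a Fluentd source; a sketch (the path, pos_file, and tag are placeholders):

    # Tail a log file with zero parsing: each line becomes a single
    # "message" field, so no regexp cost is paid at collection time.
    <source>
      @type tail
      path /var/log/app/app.log
      pos_file /var/log/td-agent/app.log.pos
      tag app.raw
      format none
    </source>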
All of this being said, it looks like LogZoom isn't a true competitor to Fluentd or Logstash or Heka. It made different performance/functionality trade-offs, and by doing less, it saves more CPU time: if you forgo the option of parsing logs at the source (and in Logstash and Fluentd's defense, they do a whole lot more), you obviously save resources. On the flip side, you need to post-process your logs to make them useful, and some other servers downstream will pay for that CPU (you might not care about this because your logs have been thrown over the fence and are now the data engineers' job =p)
I think you make a good point that logs should be transformed closer to the source. I work primarily with vendor-provided applications that emit very unstructured log data. Transforming (Grok) these logs is an absolute must; we couldn't look at something that didn't allow transformation. That said, maybe we should be doing the transformation closer to the source before handing it off to a central location. Are you aware of agent-like daemons that do transformation before handoff?
Structured logs are awesome and a great idea. For the next few decades, while standards come and go and everyone gets it all implemented across the board, yes, it sucks to write grok patterns for the flavor of the week. But once you've done it a few times, it takes maybe a few hours of work to get some app cluster with moderate logging volume flowing into ES with all the right types and all the edge cases accounted for. From there, ELK is such a Swiss Army knife that it's worth the trouble: it's then trivial to, e.g., fire PagerDuty alerts off if you hit some exception-level log lines, or post metrics about your logs, or put them on some queue to flow into some big-data pipeline.
You might want to consider NXLog if you need to do transformation at the source. For us this was an explicit design goal. It's also lightweight, and a lot of people use it in place of other fat and bulky solutions; it's quite popular with ELK users.
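For flavor, here's a sketch of what parsing-at-source can look like in NXLog config; the file path, regexp, and collector address are placeholders, not a recommendation:

    # Tail an app log, split each line into fields at the source,
    # and ship the result as JSON to a central collector.
    <Extension json>
        Module  xm_json
    </Extension>

    <Input applog>
        Module  im_file
        File    "/var/log/app/app.log"
        Exec    if $raw_event =~ /^(\S+) (\S+) (.*)$/ { $ts = $1; $level = $2; $msg = $3; }
    </Input>

    <Output central>
        Module  om_tcp
        Host    logs.example.com
        Port    5140
        Exec    to_json();
    </Output>

    <Route r>
        Path    applog => central
    </Route>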