
Log analytics is a big topic, so I'll hit the main points. The approach you take to logging depends on the analysis you want to do after the log event has been recorded. The value of logs diminishes rapidly as the events age. Most places want to keep the logs hot for a period ranging from a day to a week. After that, the logs are compressed using gzip or Google's Snappy compression. Even though they are in compressed form, they should still be searchable.

The most common logging formats I've come across in production environments are:

1. log4j (Java) or NLog (.NET)

2. JSON

3. syslog

Tools that I've used to search, visualize and analyse log data have been:

1. Elasticsearch, Logstash and Kibana (the ELK stack)

2. Splunk (commercial)

3. Logscape (commercial)

Changing the fields that represent your data is expensive with the database approach because you are locked in by the schema, and the schema will never fully capture your understanding of the data. With the tools I've mentioned above you have the option to extract ad-hoc fields at runtime.

Hope this helps.


We're currently evaluating options, but for .NET Serilog is shaping up extremely nicely, and Seq/Logg.ly work well as log sinks...

Seq is great because you can set up your own instance very near to your servers for low-latency/high-bandwidth logging, which really changes the game in terms of what you can feasibly (perf/financially) log. It also has some decent visualization options, and it's got some great integrations, with a decent plugin architecture to create your own real-time log processing code.

Logg.ly has some amazing GUI/search options.


We've been using Serilog/Seq and we're extremely happy with it. I'm a little surprised that you didn't mention the buzzword "Structured Logging", which is the special sauce that makes Serilog stand out. Instead of concatenating strings with values, you assign keywords to values which you can later search on. For example,

Log.Info("Customer# {customerNumber} completed transaction {transactionId}", customerNumber, transactionId);

Then using the Seq log viewer you can simply click on "transactionId" in the log line and filter by "transactionId = 456" or whatever. It's one of the most exciting advancements I've seen in the .Net logging world.

EDIT: I realized I didn't really answer the OP's question regarding space. If you use Serilog, you can set up different sinks to export to, with different options. For example, you could send all your logs to MongoDB, and just a rolling one-week window of recent logs to the Seq server.
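For what it's worth, a minimal sketch of that kind of multi-sink setup (assuming the Serilog, Serilog.Sinks.File and Serilog.Sinks.Seq NuGet packages; the file path, server URL and levels are placeholders, and a daily rolling file stands in for the long-term store):

    using Serilog;
    using Serilog.Events;

    class Program
    {
        static void Main()
        {
            Log.Logger = new LoggerConfiguration()
                // long-term retention: everything goes to a daily rolling file
                .WriteTo.File("logs/app-.txt", rollingInterval: RollingInterval.Day)
                // recent/interesting events are pushed to the Seq server
                .WriteTo.Seq("http://localhost:5341",
                             restrictedToMinimumLevel: LogEventLevel.Information)
                .CreateLogger();

            Log.Information("Customer# {customerNumber} completed transaction {transactionId}",
                            42, 456);

            Log.CloseAndFlush();
        }
    }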


When I've heard "structured logging" used, it has been much more about explicit key-value pairs than just having keywords next to values, e.g.

    Log.Info("customerNum={customerNumber} transactionId={transactionId} state=completed", cN, tID)
or the ever popular logstash-y format:

    Log.Info(LogState.Add("state", "completed").Add("customerId", customerId).Add("transactionId", transactionId));
where `LogState` would build up a key-value dict and its `ToString` would emit the logstash JSON format.

I guess the version that works best depends on the tool that is consuming the log text.
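For concreteness, a rough sketch of the hypothetical `LogState` helper described above (not a real library; the static entry point is simplified to a plain constructor here, and no JSON escaping is done):

    using System.Collections.Generic;
    using System.Linq;

    public class LogState
    {
        private readonly Dictionary<string, object> _fields = new Dictionary<string, object>();

        // Chainable: new LogState().Add("state", "completed").Add("customerId", customerId)
        public LogState Add(string key, object value)
        {
            _fields[key] = value;
            return this;
        }

        // Emit a flat JSON object so a logstash/ELK pipeline can index each field.
        public override string ToString() =>
            "{" + string.Join(",", _fields.Select(kv => $"\"{kv.Key}\":\"{kv.Value}\"")) + "}";
    }

With that, `new LogState().Add("state", "completed").ToString()` would emit `{"state":"completed"}`, which is roughly the flat shape logstash expects.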


In the end, Serilog, depending on the sink, makes your log look like the template and attaches the metadata of your template variable names and replacement values to the message itself.


Are you able to use Serilog for metrics in addition to application events? I'm thinking something like average time for a method to execute, things like that. And if so, what tools do you use to comb through that data (to determine average execution times, for example).

Right now at work everyone just logs to a single CSV with an inconsistent format and it makes me cringe every time I look at it. It's also really difficult to parse.


I recently used the SerilogMetrics [1] NuGet package to determine the elapsed time between method calls. It worked great, although I couldn't figure out how to use my standard logging config, which is carried in a static logging object, and had to redefine the Seq server I wanted those lines logged to in the class itself. This may have just been unfamiliarity on my part.

Your current way does sound like a headache. If your logging lines are in the standard NLog format, you should be able to drop in Serilog without many changes.

[1] https://github.com/serilog-metrics/serilog-metrics
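For reference, the timed-operation usage from that package looks roughly like this (a sketch; it assumes the SerilogMetrics and Serilog.Sinks.Console packages, and exact namespaces/overloads may differ between versions):

    using System.Threading;
    using Serilog; // the SerilogMetrics extension methods are assumed to be importable here

    class TimingExample
    {
        static void Main()
        {
            var logger = new LoggerConfiguration()
                .WriteTo.Console()
                .CreateLogger();

            // BeginTimedOperation logs a start event, then a completion event
            // with the elapsed time when the using block is disposed.
            using (logger.BeginTimedOperation("Processing the nightly batch"))
            {
                Thread.Sleep(2000); // stand-in for the real work being timed
            }
        }
    }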


We've evaluated Loggly, Logentries, Splunk, and Seq. The first three are fine depending on your logging needs. Seq can handle a TON of events thrown at it, and the latest stuff (~1 day old or so) is extremely accessible. The older stuff takes a little longer to search through, though.


We're currently using Splunk (and may move to the ELK stack) for logging, but some types of "application events" are really more useful as metrics. We're using Ganglia for those metrics and limiting application logs to actions that are needed for audit purposes and for warning and error-level application problems.

Using a system like Ganglia (or the Etsy-inspired statsd) is an important idea, since the OP's original question included how to limit the size of logged data. These systems provide a natural way to aggregate data.
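To make that concrete, statsd's wire format is plain text over UDP, so emitting a counter or a timing takes only a few lines (a sketch; the host, port and metric names are placeholders, and a real project would normally use a client library):

    using System.Net.Sockets;
    using System.Text;

    class StatsdExample
    {
        static void Main()
        {
            using (var udp = new UdpClient("statsd.example.internal", 8125))
            {
                // "|c" marks a counter, "|ms" a timing in milliseconds;
                // statsd aggregates these before anything is stored or logged.
                Send(udp, "checkout.completed:1|c");
                Send(udp, "checkout.duration:230|ms");
            }
        }

        static void Send(UdpClient udp, string metric)
        {
            var bytes = Encoding.UTF8.GetBytes(metric);
            udp.Send(bytes, bytes.Length);
        }
    }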


I'd recommend looking at Graylog. It uses Elasticsearch under the hood and surrounds it with an application that focuses specifically on log management. https://www.graylog.org/


Graylog is absolutely brilliant. Storing several TB of data in it, using it for alerting/monitoring (you can configure streams, which you can think of as constrained views of logging data), etc. Highly recommend it.


Is graylog free?


Yes.


My brain is shutting down and I can't parse this phrase.

[1] "If you lack an antigen that 99 per cent of people in the world are positive for, then your blood is considered rare."

[2] "If you lack one that 99.99 per cent of people are positive for, then you have very rare blood."

Surely the author is saying the same thing here?


They're two different official categorizations.

"Rare" = 99 in 100 people have this antigen, while you do not.

"Very Rare" = 9999 in 10000 people have this antigen, while you do not.

That's all it says.

If you lack an antigen, but at least 1 in 100 other people also lack this antigen, then your blood is only of the "Rare" category. If you lack an antigen and you'd have to go through 10 000 other people to find someone else lacking this antigen, your blood qualifies for the illustrious title of "Very Rare".


They are just quantifying the terms rare and very rare.

Rare is 1 in 100, very rare is 1 in 10000.


Number 2 has an extra .99 per cent, so I guess in point 2 only 0.01 per cent of people also lack the antigen (whatever it is), vs 1 per cent of people lacking it in point 1.


Missed the extra 9. I read and re-read it, and sincerely thought he had called two identical categories rare and very rare.

Long day

thanks.


Defining different degrees:

[1] 1 in 100 --> rare

[2] 1 in 10,000 --> very rare


There is some confusion around the article, and it may be because of the way it is written, but here's a brief summary. Hope this helps to clarify:

* The theory/hypothesis is not saying to avoid pronouns.

* It's about relative frequencies, not absolute ones.

* The pronoun frequency is looked at in different scenarios:

   1. between two people who don't know each other 

   2. between two people who do know each other 

   3. pronoun frequencies of an individual in a diary or blog over a period of time.

* The frequency of pronouns in spoken or written language is an unconscious activity. It's something that is hard to fake, unlike body language.

* The words being compared/counted are primarily social identifiers vs determiners and articles.


According to Pennebaker, status is negotiated at the beginning of a new interaction, when the perceived status of the participants is unclear. The frequency of the filler words marks the status/role that each participant has adopted, which is clearly visible in the email examples given.

The article does not imply that this is a learned skill, since the whole theory is based on function words which are accessible at even the most basic proficiency of a language. It is merely describing what happens to our language when we enter a social interaction.

Note that the words used aren't necessarily important; rather, the function of the words is, i.e. whether they reference social objects (the role of pronouns) or concrete or abstract non-social objects.


I haven't seen flats sell as quickly as they are selling now. I was living a three-minute walk from Liverpool Street station, and the flat I was renting sold by the second viewing. It was on the market for three days, but things are cooling off.


When I first started working with Docker I'd use the

  docker inspect $CONTAINER | grep -i VAR

pattern a lot, until I discovered that you can use the container name and go with:

   docker inspect --format '{{ .NetworkSettings.IPAddress }}' replset1

Service discovery is still a pain point with Docker. Serf [1] and etcd [2] are tools that manage a cluster of services and help solve the problem described in the article; a rough sketch of registering a container in etcd follows the links below.

[1] Serf is by the guys behind Vagrant. - http://www.serfdom.io/

[2] etcd - http://coreos.com/using-coreos/etcd/
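For concreteness, registering a container's address in etcd can go straight against its HTTP API; a hedged sketch (assuming the etcd v2 keys API; the URL, key, address and TTL are placeholders, and the address would come from the `docker inspect` call above):

    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;

    class EtcdRegister
    {
        static async Task Main()
        {
            using (var http = new HttpClient())
            {
                // Register the container under a well-known key with a TTL,
                // so the entry expires unless the container refreshes it.
                var body = new FormUrlEncodedContent(new Dictionary<string, string>
                {
                    ["value"] = "172.17.0.2:27017", // hypothetical address from docker inspect
                    ["ttl"]   = "60"
                });

                var resp = await http.PutAsync(
                    "http://127.0.0.1:4001/v2/keys/services/replset1", body);
                Console.WriteLine(resp.StatusCode);
            }
        }
    }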


Ecstasy and water poisoning are related: ecstasy makes users extremely thirsty, and water-poisoning deaths at raves have been reported. About 6 L is enough to put a lot of people in a very dangerous place.


Using a log shipper is probably the best workaround in this scenario:

1. Logrotate at the end of the day.

2. Have your log shipper watch the Docker log folder for each container.

3. The log shipper (or collector) ships files to a central server as they are updated, and you are free to archive or delete as you wish.

Many of the central logging systems can detect rolled logs, so this setup is not much of a stretch.

I do agree more configurable logging is a bit of an oversight, especially for something like Docker.


I remember this book too. It's a classic text in AI. I think what they were trying to say in the intro is that the approach you choose has a significant effect on how much effort is wasted. The engineering effort required to get a plane off the ground using flapping wings is considerably greater than using rotating parts like propellers. Imagine how many decades would have been wasted trying to scale an ornithopter to handle the load of a passenger jet.

This makes me think that AI research isn't going to be a gradual process of research built on other research, but rather a eureka or a-ha moment that changes everything.


Elasticsearch makes its money from enterprise support licenses. There are so many ways of installing, scaling and optimizing it depending on the application, and don't forget the training around it. You have investment banks like Goldman Sachs using it for log analytics, so the services around that are needed and real. I don't think they will need an enterprise version.

