Why you should care about structured logging?

23 March 2020
Mostafa Bleu

When people talk about DevOps, they usually cover everything in detail from planning and development through deployment to production, and then they just stop: "The application is automatically deployed to production and auto scaling is in place ... voilà"

That's understandable: for years, the biggest challenge in the software industry was spending a lot of time and money on products that never shipped to production. So we focused on improving the processes and tooling around agile, continuous integration, unit testing, packaging, and deployment, and we got quite good at shipping code fast and getting it out the door.

The next stage of growing up in this ecosystem is getting better at production monitoring and diagnostics, because the failures we hear about now are downtimes and major bugs caused by fast code delivery.

Monitoring tools are the evolution of the printf and console.WriteLine calls we used for debugging. Depending on how you want to monitor the app, they help keep an eye on the errors that appear in your logs, and some of them provide ways to trace those errors back to their causes.

What are logs used for?

Logs are most of the time used for two things:

  1. Searching, to troubleshoot production problems
  2. Processing, to provide analytics or some sort of business intelligence

Logging is a huge subject with many elements that can be addressed, but for simplicity we will consider the following flow that outlines the steps that a log has to go through:

Generate → Transform → Index → Analyze

There are two common approaches to handling log data through those steps:

  1. Collect plain text and deal with it downstream, maintaining hundreds of regular expressions and trying to figure out how to extract meaning from that text
  2. Collect JSON and struggle to read it without your eyes bleeding from all the curly braces on the screen

In the next paragraphs I will try to give you some information about the second approach, and maybe convince you to start using it.
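To make the contrast concrete, here is a small sketch (in Python, purely for illustration; both line formats are the hypothetical ones used in this article) of what each approach looks like at parse time:

```python
import json
import re

text_line = "2019-02-04T12:23:34Z | INFO | Completed in 12ms while consuming (auth:6ms)"
json_line = '{"dt": "2019-02-04T12:23:34Z", "level": "info", "duration_ms": 12}'

# Approach 1: a regex hand-tailored to this exact text layout.
pattern = re.compile(r"^(?P<dt>\S+) \| (?P<level>\w+) \| .*?in (?P<duration>\d+)ms")
match = pattern.match(text_line)
duration_from_text = int(match.group("duration"))

# Approach 2: a structured line parses with no format-specific code at all.
event = json.loads(json_line)
duration_from_json = event["duration_ms"]
```

The regex works, but it only works for this one message shape; the JSON path stays the same no matter what fields the event carries.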

Structured logging

The example below is from a simple .NET application that uses log4net as its logging framework.

2019-02-04T12:23:34Z | INFO | Completed in 12ms while consuming (auth:6ms)

This kind of log focuses on producing a narrative of events that tells a story of what the code is doing at each step. The problem is that once you start running a lot of services in parallel, with many users interacting with those services, that narrative is no longer linear: the events describing what's happening in the system are all interleaved, which makes it really hard to take all of that log output and extract any meaning from it.

To improve the utility of this log, we should add some context:

2019-02-04T12:23:34Z | INFO | 34.235.155.83 | 0005 | Completed in 12ms while consuming (auth:6ms)

Better, but you can start to see the problems with this direction:

  1. The format is arbitrary. Custom parsing is required to extract the attributes.
  2. There's no guarantee the format won't change, breaking downstream consumers.
  3. You now have strong coupling between the regex used for parsing and the existing logs.
  4. Your parser (a Filebeat agent, for instance) burns more CPU.
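Point 3 is worth seeing in code. Below is a hypothetical sketch (Python, illustration only) of a regex that mirrors the pipe-delimited layout above; the moment someone adds one more field to the line, the parser silently stops matching:

```python
import re

# A regex coupled to the exact pipe-delimited layout: dt | level | host | user | message
pattern = re.compile(
    r"^(?P<dt>\S+) \| (?P<level>\w+) \| (?P<host>\S+) \| (?P<user_id>\d+) \| (?P<message>.+)$"
)

old_line = "2019-02-04T12:23:34Z | INFO | 34.235.155.83 | 0005 | Completed in 12ms while consuming (auth:6ms)"
ok = pattern.match(old_line)  # matches as expected

# A teammate inserts a hostname field before the IP; the regex quietly fails.
new_line = "2019-02-04T12:23:34Z | INFO | api-01 | 34.235.155.83 | 0005 | Completed in 12ms"
broken = pattern.match(new_line)  # None: user_id expects digits, finds an IP
```

Nothing crashes here; the events just vanish from your dashboards until someone notices.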

Wouldn't it be nice if this log event was represented as a simple JSON document?

{
  "dt": "2019-02-04T12:23:34Z",
  "level": "info",
  "message": "Completed 200 OK in 12ms (auth:6ms)",
  "context": {
    "host": "34.235.155.83",
    "user_id": "0005"
  },
  "response": {
    "status": 200,
    "duration_ms": 12.0,
    "auth_ms": 6.0
  }
}
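The article's example app is .NET with log4net, but the idea is framework-agnostic. As a hedged sketch, here is how an event with this shape could be emitted using Python's standard logging module and a small custom JSON formatter (the field names simply mirror the document above; this is not any particular library's built-in layout):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON document per line."""

    def format(self, record):
        event = {
            "dt": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Merge structured properties passed through the `extra` argument.
        for key in ("context", "response"):
            if hasattr(record, key):
                event[key] = getattr(record, key)
        return json.dumps(event)


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "Completed 200 OK in 12ms (auth:6ms)",
    extra={
        "context": {"host": "34.235.155.83", "user_id": "0005"},
        "response": {"status": 200, "duration_ms": 12.0, "auth_ms": 6.0},
    },
)
```

The key move is that context and response travel as real properties on the event, not as words baked into the message string.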

With structured logs, you stop thinking about a narrative of messages and start thinking about a stream of events that have properties, and those properties are what you use to start from some kind of failure and trace all the related events that happened in the system.

Structured logs make it easy for tools like the Elastic Stack to index particular fields, search them, and even generate analytics without any additional processing to extract data.

Use a different format when you are debugging

To keep your eyes from bleeding from all the curly braces on the console while you are debugging, add a separate configuration that is turned on only in debug mode, or only when a console view is attached to the process.

Most logging frameworks nowadays let you choose the format based on the target: for example, you can view text logs on the console while sending JSON logs to a log server like Logstash, or even to a database.
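As one possible sketch of that per-target setup, again using Python's stdlib logging for illustration (the `app.log.json` file name is invented; in production a shipper such as Filebeat would tail it toward Logstash):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """One JSON document per line, for machine consumption."""

    def format(self, record):
        return json.dumps({
            "dt": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        })


logger = logging.getLogger("dual")
logger.setLevel(logging.INFO)

# Target 1: plain text for human eyes while debugging.
console = logging.StreamHandler()
console.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
logger.addHandler(console)

# Target 2: JSON lines to a file that a log shipper can tail.
file_handler = logging.FileHandler("app.log.json")
file_handler.setFormatter(JsonFormatter())
logger.addHandler(file_handler)

logger.info("Completed in 12ms while consuming (auth:6ms)")
```

The same event goes to both handlers; only the formatter attached to each target differs.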

Conclusion

Many organizations skip structuring their logs because logging is not a top priority, and they end up with logs that are simply not cost effective, generating more noise than anything else. Being able to diagnose problems in production is very important, and structured logging is a great way to make your logs sortable and searchable.

Further reading