Tuesday 5 July 2022

The Business Debugger: Beer and Observability Escape IT

I was a guest at .conf, Splunk’s annual event for users and analysts. It was held both in person and online. I was one of the several thousand people there in person, an experience I will also be writing about.


Observability was a big topic, as was Splunk’s journey from log analysis tool to extensible platform.

I find observability quite fascinating. Enterprise IT systems are now so complex, and have grown organically for so many years, that we no longer know what is happening inside them. They work, most of the time, for some definition of work, but when they break we are often at a loss to know why. Observability is the ability to look into that vast, sprawling patchwork. But this is really just the first step into a new and exciting business role: observability is escaping from IT into the broader business.

Part of the .conf22 keynote covered how Heineken is using Splunk to figure out where lost things go. I’m not sure if that includes socks in the washer, but it definitely included tracking down missing invoices as well as understanding why pallets of beer don’t always end up where they should be. They have used Splunk’s combination of data ingestion, analysis and extensibility to build XOMI (pronounced “show me”), which offers up a comprehensive dashboard of where all sorts of things are across the business, including those missing invoices and the vanishing pallets of beer.

The key point here isn’t really knowing where those lost items happen to be; it’s more about working out which wormholes they managed to fall through to get there. This is the magic, and as a software engineer it suddenly felt very familiar to me. Observability provides the same viewpoint into the state of the overall machine of the business as a debugger does when showing the contents of variables at a breakpoint.

It is axiomatic that once you know what is actually happening it is much easier to fix a problem. Some of the missing invoices were due to software allowing dates to be entered into what should have been purely numeric fields, resulting in the invoice being silently rejected by a more fastidious downstream process. Identifying that missing data entry validation was the root cause would have been an epic task without full observability across the network.
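To make that root cause concrete, here is a minimal sketch of the kind of entry-time check that was missing. It is purely illustrative and not Heineken's or Splunk's actual code: the function name, the field name and the digits-only rule are all assumptions I have made up for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical rule for this example: the field may contain ASCII digits only.
DIGITS_ONLY = re.compile(r"[0-9]+")


@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""


def validate_numeric_field(name: str, value: str) -> ValidationResult:
    """Check at data entry time that a field holds only digits."""
    if DIGITS_ONLY.fullmatch(value.strip()):
        return ValidationResult(ok=True)
    # Fail loudly here, rather than letting a stricter downstream
    # process reject the whole invoice without telling anyone.
    return ValidationResult(
        ok=False,
        reason=f"{name} must be numeric, got {value!r}",
    )


if __name__ == "__main__":
    print(validate_numeric_field("order_number", "20220705"))
    print(validate_numeric_field("order_number", "05/07/2022"))
```

The important design choice is that a bad value produces a visible reason at the point of entry, which is exactly the signal that was absent when invoices were being dropped silently further down the line.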

Further, given Splunk’s collection of integrations and ease of adding new connections, it is possible to directly open the appropriate piece of software to fix the problem, at the problem location. This is just like a traditional debugger firing up the IDE from a crash, at the line of code that has failed — but at the macro network level. Incredible power.

This is why observability forms what I think we should call the Business Debugger. Stuff goes wrong all over businesses, from exceptions constantly firing silently into logs to disappearing product, and the combination of data ingestion, analysis and extensibility now provides a powerful tool for sorting it all out. The pandemic and recent supply chain issues have finally put business resilience back on the agenda, and this ability to identify and rapidly correct problems is a hugely important tool for building that resilience.
