As a software practice, observability is simply the ability to determine the internal health of the components of your solution just by examining their externally exposed states.
In the last few years, observability giants such as New Relic and Datadog, along with evolving standards such as OpenTelemetry, have led the way in capturing the exposed states of a solution, from code traces to infrastructure metrics. All these tools provide a way to instrument (how to generate), a time series database (how to collect), and a visualization platform (how to observe).
In a typical organization, the Ops team implements a tool like this, and dev teams subsequently configure their dashboards, alerts, and debugging mechanisms.
However, some fundamental issues remain. There are no guarantees on:
From our conversations with partners and customers, we see that the above issues manifest in the following ways.
We at Facets use the term "by design" to indicate that you can truly guarantee certain outcomes in the SDLC pipeline. For example, if you have promoted a build, it will appear in the production environment within a set time frame, or an alert will be raised if it doesn't.
In the context of observability by design, it means two things.
First, recognize and treat observability components as "artifacts".
An observability artifact is like a release artifact: it can be versioned and deployed.
Second, tightly couple these observability artifacts with each phase of your pipeline so that they are discovered as well as deployed.
Let's track what changes you need to make in your SDLC stages for observability by design:
While planning features, we should define the relevant SLOs and SLAs. Product stakeholders should also define the business metrics that track the new feature's adoption, usage, and performance. Technical leads can then define the metrics that track the feature at a finer granularity, for instance, API-level or database-level metrics.
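To make the planning step concrete, here is a minimal sketch of how an SLO could be written down as data and turned into an error budget. The `SLO` class, the `checkout-api-availability` name, and the 99.9% over 30 days target are all hypothetical values chosen for illustration, not something prescribed by any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical SLO record agreed on during planning."""
    name: str
    target: float      # 0.999 means 99.9% of the window must be healthy
    window_days: int   # rolling window the target applies to

    def error_budget_minutes(self) -> float:
        """Minutes of full outage the window tolerates before the SLO is missed."""
        return (1.0 - self.target) * self.window_days * 24 * 60


# Assumed example values; replace them with the targets your team agrees on.
checkout_slo = SLO(name="checkout-api-availability", target=0.999, window_days=30)
budget = checkout_slo.error_budget_minutes()
print(f"{checkout_slo.name}: {budget:.1f} minutes of error budget")
```

With these assumed numbers the budget works out to roughly 43 minutes per 30 days, which gives product and technical stakeholders a shared figure to plan alerts around.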
The OpenMetrics project introduces "metric discovery", where sources that produce metrics merely expose them in a standard format and collectors "discover" them. Any packaging mechanism for an application should include the metadata on how the metrics for the application can be discovered. For example, if you package your application as a Helm chart, the chart must include a ServiceMonitor or support Prometheus scrape annotations so that the end user can configure metric discovery.
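As a rough illustration of that discovery metadata, the sketch below builds a Prometheus Operator ServiceMonitor and the commonly used `prometheus.io/*` scrape annotations as plain Python dictionaries. The app name `checkout`, the port name `metrics`, and the 30s interval are assumptions; the annotations are a convention that only works if your Prometheus scrape configuration honors them. In a Helm chart the same structure would normally live in a YAML template.

```python
import json


def service_monitor(app, port_name="metrics", path="/metrics", interval="30s"):
    """Build a Prometheus Operator ServiceMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": f"{app}-monitor", "labels": {"app": app}},
        "spec": {
            # Select the Service that fronts the application pods.
            "selector": {"matchLabels": {"app": app}},
            # "port" refers to a named port on that Service.
            "endpoints": [{"port": port_name, "path": path, "interval": interval}],
        },
    }


def scrape_annotations(port, path="/metrics"):
    """Annotation-based alternative for setups without the Prometheus Operator."""
    return {
        "prometheus.io/scrape": "true",
        "prometheus.io/port": str(port),
        "prometheus.io/path": path,
    }


manifest = service_monitor("checkout")  # "checkout" is a hypothetical app name
print(json.dumps(manifest, indent=2))
print(json.dumps(scrape_annotations(8080), indent=2))
```

Shipping one of these alongside the application means the collector can find the metrics endpoint without anyone configuring it by hand after deployment.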
Visualizations and alerts should be bundled with each component rather than configured later. For example, you can ship Grafana dashboards as ConfigMaps and alert definitions as PrometheusRules bundled in your Helm chart. Popular tools like New Relic have started addressing this need for dashboards and alerts as code.
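A minimal sketch of such bundling, again using plain dictionaries: one function wraps a Grafana dashboard in a ConfigMap and another defines an alert as a PrometheusRule. Several details are assumptions made for illustration: the `grafana_dashboard: "1"` label (a convention used by Grafana dashboard sidecars, which may be configured differently in your cluster), the `http_requests_total` metric with `app` and `status` labels, and the 5% threshold.

```python
import json


def dashboard_config_map(app, dashboard):
    """Wrap a Grafana dashboard (a dict) in a ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"{app}-dashboard",
            # Assumed label that a Grafana sidecar watches for.
            "labels": {"grafana_dashboard": "1"},
        },
        "data": {f"{app}.json": json.dumps(dashboard)},
    }


def error_rate_rule(app, threshold=0.05):
    """Build a PrometheusRule that fires when the 5xx ratio exceeds the threshold."""
    # http_requests_total and its labels are hypothetical metric names.
    expr = (
        f'sum(rate(http_requests_total{{app="{app}",status=~"5.."}}[5m])) '
        f'/ sum(rate(http_requests_total{{app="{app}"}}[5m])) > {threshold}'
    )
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": f"{app}-alerts", "labels": {"app": app}},
        "spec": {
            "groups": [{
                "name": f"{app}.rules",
                "rules": [{
                    "alert": "HighErrorRate",
                    "expr": expr,
                    "for": "10m",
                    "labels": {"severity": "page"},
                    "annotations": {"summary": f"High 5xx ratio on {app}"},
                }],
            }],
        },
    }


config_map = dashboard_config_map("checkout", {"title": "Checkout overview", "panels": []})
rule = error_rate_rule("checkout")
print(json.dumps(rule, indent=2))
```

Because both manifests are generated from the same application name, the dashboard and the alert travel with the component through every environment it is deployed to.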
CI ensures code quality, and it should also guarantee observability. At this phase, teams ensure that the defined metrics, alerts, and dashboards adhere to benchmarks and standards. It is relatively straightforward to draw up standards for metrics from sources of the same kind. For example, one could enforce that every gRPC application must expose a pre-defined set of metrics.
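One way such a CI gate could look is sketched below: it reads a metrics payload in the Prometheus text format, collects the metric names declared on `# TYPE` lines, and reports which required names are absent. The required set is an illustrative baseline modeled on metric names commonly exposed by gRPC Prometheus interceptors; your own standard may differ, and the sample payload is made up.

```python
# Illustrative baseline; treat these names as an assumption, not a standard.
REQUIRED_GRPC_METRICS = {
    "grpc_server_started_total",
    "grpc_server_handled_total",
    "grpc_server_handling_seconds",
}


def declared_metrics(exposition):
    """Return the metric names declared on '# TYPE <name> <type>' lines."""
    names = set()
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return names


def missing_metrics(exposition, required):
    """Return the required metric names that the payload does not declare."""
    return set(required) - declared_metrics(exposition)


# Hypothetical payload scraped from a service under test.
sample = """\
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Ping"} 12
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_method="Ping",grpc_code="OK"} 12
"""

missing = missing_metrics(sample, REQUIRED_GRPC_METRICS)
if missing:
    # A CI job would fail the build at this point.
    print("Missing required metrics:", ", ".join(sorted(missing)))
```

In this sample the latency histogram is not declared, so the check flags `grpc_server_handling_seconds` and the build would be rejected before the service ships without it.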
Provisioning of all the metrics, dashboards, and alerts should be centralized and consistent across all environments. Any new feature development most probably requires a change to these defined metrics, so this process is, by definition, continuous.
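A simple consistency check across environments could look like the sketch below. It assumes you can list the observability artifacts provisioned in each environment; the environment names and artifact names are hypothetical, and the "expected" set is approximated as the union of everything seen anywhere. With truly centralized provisioning, the expected set would instead come from the single source of truth.

```python
def find_drift(environments):
    """Map each environment to the artifacts that exist elsewhere but not there."""
    expected = set().union(*environments.values())
    report = {}
    for env, artifacts in environments.items():
        absent = expected - set(artifacts)
        if absent:
            report[env] = sorted(absent)
    return report


# Hypothetical inventory of provisioned artifacts per environment.
deployed = {
    "dev": ["checkout-dashboard", "checkout-error-rate-alert", "checkout-monitor"],
    "staging": ["checkout-dashboard", "checkout-error-rate-alert", "checkout-monitor"],
    "prod": ["checkout-dashboard", "checkout-monitor"],
}

drift = find_drift(deployed)
for env, absent in drift.items():
    print(f"{env} is missing: {', '.join(absent)}")
```

Here the production environment lacks the alert that exists in the other two, which is exactly the kind of gap that manual, per-environment configuration tends to produce.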
All the metrics, dashboards, and alerts that have been set up need to be monitored continuously. Monitoring is required to verify that the benchmarks and thresholds we set are being met and that we are indeed capturing the correct information. Armed with this, stakeholders can feed the learnings from incidents back into the Planning and Development phases to achieve Continuous Feedback.
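As a sketch of what verifying a threshold can mean in practice, the function below compares an observed success ratio against a target and reports how much of the allowed failure budget has been used. The request counts and the 99.9% target are made-up inputs; in a real setup they would come from your metrics backend.

```python
def slo_report(good, total, target):
    """Compare an observed success ratio with its target over one window."""
    if total <= 0:
        raise ValueError("total must be positive")
    failures = total - good
    allowed_failures = (1.0 - target) * total
    if allowed_failures > 0:
        consumed = failures / allowed_failures
    else:
        # A 100% target leaves no budget: any failure exhausts it.
        consumed = 0.0 if failures == 0 else float("inf")
    return {
        "sli": good / total,
        "met": good / total >= target,
        "budget_consumed": consumed,
    }


# Assumed counts for one window: 100,000 requests, 50 of them failed.
report = slo_report(good=99_950, total=100_000, target=0.999)
print(f"SLI {report['sli']:.4%}, met={report['met']}, "
      f"budget used {report['budget_consumed']:.0%}")
```

A report like this, produced on a schedule, gives the planning phase a concrete signal: a target that is routinely missed, or one whose budget is never touched, is a candidate for revision.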
In summary, we must ensure that observability components are treated as artifacts. Once you devise a mechanism to ship these artifacts rather than configuring them, you will see a steady reduction in recurring mistakes. You can even go further and creatively enforce quality and governance programmatically around these artifacts in your delivery pipeline.
Contact us if you want to know more about how Facets can help introduce observability in your SDLC.