Distributed Tracing with Spring Cloud Sleuth and Spring Cloud Zipkin

Tracing is simple, in theory. As a request flows from one component to another in a system, through ingress and egress points, tracers add logic where possible to perpetuate a unique trace ID that’s generated when the first request is made. As a request arrives at a component along its journey, a new span ID is assigned for that component and added to the trace. A trace represents the whole journey of a request, and a span is each individual hop along the way, each request. Spans may contain tags, or metadata, that can be used to later contextualize the request. Spans typically contain common tags like start and stop timestamps, though it’s easy to associate semantically relevant tags like a business entity ID with a span.
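As a concrete, simplified sketch of the idea (not tied to any particular tracer): the first component mints a trace ID, and every subsequent hop copies that trace ID, plus a fresh span ID, into outbound headers. The `X-B3-*` header names used here are the ones Zipkin-style tracers conventionally propagate:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Simplified sketch of how a tracer perpetuates a trace ID across hops via headers.
public class TracePropagationSketch {

	static final Random IDS = new Random();

	// Called at the edge of the system, where no trace exists yet.
	static Map<String, String> startTrace() {
		long traceId = IDS.nextLong();
		Map<String, String> headers = new HashMap<>();
		headers.put("X-B3-TraceId", Long.toHexString(traceId));
		// the root span conventionally reuses the trace ID as its span ID
		headers.put("X-B3-SpanId", Long.toHexString(traceId));
		return headers;
	}

	// Called at each subsequent hop: same trace ID, new span ID.
	static Map<String, String> nextHop(Map<String, String> incoming) {
		Map<String, String> headers = new HashMap<>(incoming);
		headers.put("X-B3-SpanId", Long.toHexString(IDS.nextLong()));
		return headers;
	}

	public static void main(String[] args) {
		Map<String, String> first = startTrace();
		Map<String, String> second = nextHop(first);
		// the trace ID survives the hop; the span ID changes
		System.out.println(first.get("X-B3-TraceId").equals(second.get("X-B3-TraceId")));
	}
}
```

This is only the propagation half of the story; a real tracer also records timing and tags for each span, as described above.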
Spring Cloud Sleuth (org.springframework.cloud:spring-cloud-starter-sleuth), once added to the CLASSPATH, automatically instruments common communication channels:
  • requests over messaging technologies like Apache Kafka or RabbitMQ (or any other Spring Cloud Stream binder)
  • HTTP headers received at Spring MVC controllers
  • requests that pass through a Netflix Zuul microproxy
  • requests made with the RestTemplate, etc.
Spring Cloud Sleuth sets up useful log formatting for you that prints the trace ID and the span ID. Assuming you’re running Spring Cloud Sleuth-enabled code in a microservice whose spring.application.name is my-service-id, you will see something like this in the logging for your microservice:
2016-02-11 17:12:45.404  INFO [my-service-id,73b62c0f90d11e06,73b62c0f90d11e06,false] 85184 --- [nio-8080-exec-1] com.example.MySimpleComponentMakingARequest     : ...
In that example, my-service-id is the spring.application.name, 73b62c0f90d11e06 is the trace ID, and 73b62c0f90d11e06 is the span ID (the root span of a trace reuses the trace ID). This information is very useful. You can publish your logs to log analysis and manipulation tools like Elasticsearch and Splunk. There are various ways to get the data there. Logstash, for example, is a log publisher that will write to Elasticsearch. Cloud Foundry automatically aggregates logs from all instances of a service into a unified stream through a tool called the Loggregator, which can then forward them to any syslog-compatible service, including tools like Splunk or PaperTrail. Whatever approach you take, you can do interesting queries if you have all the logs, and the trace information, in a single place available for query and analysis.
Spring Cloud Sleuth also makes this information available to any Spring Cloud Sleuth-aware Spring application by simply injecting the SpanAccessor. You can also use this to instrument your own components that aren’t already instrumented by Spring Cloud so that they can perpetuate trace information. Naturally, each tracer is going to be different, but Spring Cloud Sleuth’s code itself (e.g.: TraceFeignClientAutoConfiguration) is suggestive of how a typical tracer works:
 // the SpanAccessor exposes the span for the current in-flight request
 private SpanAccessor spanAccessor;

 Span span = this.spanAccessor.getCurrentSpan();
 // copy the trace and span identifiers into the outbound request's headers
 template.header(Span.TRACE_ID_NAME, Span.toHex(span.getTraceId()));
 setHeader(template, Span.SPAN_NAME_NAME, span.getName());
 setHeader(template, Span.SPAN_ID_NAME, Span.toHex(span.getSpanId()));

How Much Data is Enough?

Which requests should be traced? Ideally, you’ll want enough data to see trends reflective of live, operational traffic, but not so much that you overwhelm your logging and analysis infrastructure. Some organizations may keep only one request in every thousand, or every ten, or every million! By default, the sampling percentage is 10%, or 0.1, though you may override it by specifying a sampling percentage:
spring.sleuth.sampler.percentage = 0.2
Alternatively, you may register your own Sampler bean definition and programmatically decide which requests should be sampled. You can make more intelligent choices about which things to trace, for example, by ignoring successful requests, perhaps checking whether some component is in an error state, or really anything else. The Span given as an argument represents the span for the current in-flight request in the larger trace, so you can do interesting, request-specific sampling if you’d like. You might decide to sample only requests that have a 500 HTTP status code, for example. The following Sampler would trace roughly half of all requests:
@Bean
Sampler customSampler() {
 return new Sampler() {
  @Override
  public boolean isSampled(Span span) {
   // trace roughly half of all requests
   return Math.random() > .5;
  }
 };
}
Make sure to set realistic expectations for your application and infrastructure. It may well be that the usage patterns for your applications require something more sensitive or less sensitive to detect trends and patterns. This is meant to be operational data; most organizations don’t warehouse this data more than a few days or, at the upper bound, a week.
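If Math.random()-style sampling feels too nondeterministic, a counter-based rate sampler is a common alternative. The following plain-Java sketch (a hypothetical illustration, not Sleuth’s built-in sampler) keeps exactly one request out of every N:

```java
// Hypothetical counter-based sampler: keeps exactly 1 request in every `outOf`.
// A sketch of the idea, not Spring Cloud Sleuth's own implementation.
public class RateLimitedSampler {

	private final int outOf;
	private long counter;

	public RateLimitedSampler(int outOf) {
		this.outOf = outOf;
	}

	public synchronized boolean isSampled() {
		// every outOf-th call returns true, starting with the first
		return counter++ % outOf == 0;
	}

	public static void main(String[] args) {
		RateLimitedSampler sampler = new RateLimitedSampler(10);
		int sampled = 0;
		for (int i = 0; i < 1000; i++) {
			if (sampler.isSampled()) {
				sampled++;
			}
		}
		System.out.println(sampled); // prints 100: one in ten of 1,000 requests
	}
}
```

A deterministic rate like this makes sampled volume predictable under load spikes, which is often what the capacity planning described above actually needs.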

Seeing the Big Picture with OpenZipkin

Data collection is a start, but the goal is to understand the data, not just collect it. To appreciate the big picture, we need to get beyond individual events.
For this we’ll use the OpenZipkin project. OpenZipkin is the fully open-source version of Zipkin, a project that originated at Twitter in 2010 and is based on the Google Dapper paper.
Previously, the open-source version of Zipkin evolved at a different pace than the version used internally at Twitter. OpenZipkin represents the synchronization of those efforts: OpenZipkin is Zipkin and when we refer to Zipkin in this post, we’re referring to the version reflected in OpenZipkin.
Zipkin provides a REST API that clients talk to directly. Zipkin even supports a Spring Boot-based implementation of this REST API; using it is as simple as using Zipkin’s @EnableZipkinServer directly. The Zipkin Server delegates writes to the persistence tier via a SpanStore. Presently, there is out-of-the-box support for using MySQL or an in-memory SpanStore. As an alternative to REST, we can also publish messages to the Zipkin server over a Spring Cloud Stream binder like RabbitMQ or Apache Kafka. We’ll use this option, and org.springframework.cloud:spring-cloud-sleuth-zipkin-stream’s @EnableZipkinStreamServer, to adapt incoming Spring Cloud Stream-based Sleuth Spans into Zipkin’s Spans and then persist them using the SpanStore. You may use whatever Spring Cloud Stream binding you like, but in this case we’ll use the RabbitMQ binder (org.springframework.cloud:spring-cloud-starter-stream-rabbit).
package com.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.sleuth.zipkin.stream.EnableZipkinStreamServer;

@SpringBootApplication
@EnableZipkinStreamServer
public class ZipkinQueryServiceApplication {

 public static void main(String[] args) {
  SpringApplication.run(ZipkinQueryServiceApplication.class, args);
 }
}
Specify a port in application.properties to pin the Zipkin server to a well-known port so that the UI may later talk to it:
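A minimal application.properties might look like the following, assuming the conventional Zipkin port of 9411 (the same port the UI’s query destination points at later):

```properties
# pin the Zipkin server to the conventional Zipkin port
server.port=9411
```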
My example code uses the Spring Cloud Netflix-powered Eureka service registry in eureka-service for service registration and discovery, so start that up next.
Our microservices (message-client and message-service) are typical Spring Cloud microservices. I only added org.springframework.cloud:spring-cloud-sleuth-stream and the appropriate Spring Cloud Stream binder to have their Sleuth traces published, out-of-band, to Zipkin for analysis.
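In Maven terms, that amounts to two extra dependencies in each microservice’s build; a sketch, assuming versions are managed by the Spring Cloud BOM (the RabbitMQ binder artifact ID shown is an assumption based on the binder named above):

```xml
<!-- publishes Sleuth spans out-of-band over a Spring Cloud Stream binder -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-stream</artifactId>
</dependency>
<!-- the RabbitMQ binder implementation -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-stream-rabbit</artifactId>
</dependency>
```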
The Zipkin web UI makes it easy to analyze and query Zipkin data. You can run the Zipkin Web build in my example or just grab the latest from the Zipkin project’s builds and then run it:
java -jar lib/zipkin-web-all.jar -zipkin.web.port=:9412 -zipkin.web.rootUrl=/ -zipkin.web.query.dest=localhost:9411
When the message-service receives a request it sends a reply message back to the message-client over a Spring Cloud Stream binder which the client then accepts and logs with a Spring Integration messaging endpoint. This is a contrived call sequence to demonstrate some of Spring Cloud Sleuth’s power.
Bring up the UI and then find all the recent traces. You can sort by most recent, longest, etc., for finer-grained control over which results you see.

The OpenTracing Initiative
For Spring-based workloads, distributed tracing couldn’t be easier! However, tracing, by its very nature, is a cross-cutting concern for all services no matter which technology stack they’re implemented in. The OpenTracing initiative is an effort to standardize the vocabulary and concepts of modern tracing for multiple languages and platforms. The OpenTracing API has support from multiple very large organizations and counts among its leads one of the original authors of the Google Dapper paper. The effort defines language bindings; there are already implementations for JavaScript, Python, Go, etc. We will keep Spring Cloud Sleuth conceptually compatible with this effort and will track it. It is expected, but not implied, that the bindings will as often as not have Zipkin as their backend.
Reference: Source https://spring.io/blog/2016/02/15/distributed-tracing-with-spring-cloud-sleuth-and-spring-cloud-zipkin

