Skip to content

Instantly share code, notes, and snippets.

@lmolkova
Last active February 2, 2025 18:03
Show Gist options
  • Save lmolkova/92fc0e69cee6b57bd6e2d4a74d750384 to your computer and use it in GitHub Desktop.
Save lmolkova/92fc0e69cee6b57bd6e2d4a74d750384 to your computer and use it in GitHub Desktop.
ClienCore Observability

Observability in clientcore

Goals:

  • no required dependencies
  • out-of-the-box log correlation
  • structured logging events with documented semantics
  • optional out-of-the-box support for OTel (when user app has it)
  • enough extensibility to support alternative telemetry libs in the future
  • HTTP call duration metric (counts, failure rate, percentiles are deriveable)
  • Convenience API for generic instrumentation

Key changes

API View

  • logging has moved from io.clientcore.core.util to io.clientcore.core.instrumentation.logging
  • HttpLoggingPolicy is now HttpInstrumentationPolicy and does tracing and logs. Will do metrics too
  • HttpLogOptions is now HttpInstrumentationOptions. It has config options to enable/disable/redact traces and HTTP logs.
  • HttpLogDetailLevel changes:
    • NONE is the default, but when HTTP logging is off, we still record HTTP spans (which go to OTel or to logs)
    • BASIC level no longer exists
    • HEADERS is now the lowest level
    • BODY, BODY_AND_HEADERS are still there

Champion scenarios: user app

Default experience

App does not have OTel and may have slf4j configured, user creates a client makes a call

SampleClient client = new SampleClientBuilder().build();
client.clientCall();

I'd result in the following telemetry:

  • logical operation span for clientCall (depends on codegen for tracing code)
  • HTTP spans printed as debug logs
  • HTTP logs (if enabled)
  • Metrics are not supported in fallback instrumentation
See examples (expand me)

Logs are formatted for readability.

2025-01-14 15:15:24.276 [main] [DEBUG] core.tracing -
{
  "library.version": "1.0.0-beta.2",
  "library.name": "core",
  "server.address": "example.com",
  "span.name": "GET",
  "trace.id": "4ba2cc6cb90b3d38745eb4fb87be9e2f",
  "span.duration": 191.9254,
  "http.request.method": "GET",
  "span.kind": "CLIENT",
  "server.port": 443,
  "span.parent.id": "7fccad8e33c3b3ab",
  "span.id": "15f216b2b76aea55",
  "url.full": "https://example.com",
  "http.response.status_code": 200,
  "event.name": "span.ended"
}

2025-01-14 15:15:24.276 [main] [DEBUG] sample.tracing -
{
  "library.version": null,
  "library.name": "sample",
  "span.name": "clientCall",
  "trace.id": "4ba2cc6cb90b3d38745eb4fb87be9e2f",
  "span.duration": 193.4819,
  "span.kind": "CLIENT",
  "span.id": "7fccad8e33c3b3ab",
  "event.name": "span.ended"
}

Default experience: I want to provide correlation-id

We have a notion of InstrumentationContext - users can provide their custom Ids compatible with W3C Trace-Context.

See Next steps for arbitrary correlation support.

// default InstrumentationContext implementation is not exposed - custom impl is needed
InstrumentationContext context = MyInstrumentationContext.createRandomContext();
System.out.println("My correlation id: " + context.getTraceId());

RequestOptions options = new RequestOptions().setInstrumentationContext(context);
client.clientCall(options);

// we can simplify (provide impl) this eventually
class MyInstrumentationContext implements InstrumentationContext {
    private final String traceId;
    private final String spanId;
    private final String traceFlags;
    private final boolean isValid;

    public static MyInstrumentationContext createRandomContext() {
        ...
    }

    public MyInstrumentationContext(String traceId, String spanId, String traceFlags, boolean isValid) {
        ...
    }

    @Override
    public String getTraceId() { return traceId; }

    @Override
    public String getSpanId() { return spanId; }

    @Override
    public String getTraceFlags() { return traceFlags; }

    @Override
    public boolean isValid() { return isValid; }

    @Override
    public Span getSpan() { return Span.noop(); }
}
Outcome (expand me):
My correlation id: a35466d8980645aeaf2cbfecc706375b

2025-01-14 15:33:09.947 [main] [DEBUG] core.tracing -
{
  "library.version": "1.0.0-beta.2",
  "library.name": "core",
  "server.address": "example.com",
  "span.name": "GET",
  "trace.id": "a35466d8980645aeaf2cbfecc706375b",
  "span.duration": 171.1911,
  "http.request.method": "GET",
  "span.kind": "CLIENT",
  "server.port": 443,
  "span.parent.id": "0788f0267309834e",
  "span.id": "57f6c4b1971b07d4",
  "url.full": "https://example.com",
  "http.response.status_code": 200,
  "event.name": "span.ended"
}

2025-01-14 15:33:09.947 [main] [DEBUG] sample.tracing -
{
  "library.version": null,
  "library.name": "sample",
  "span.name": "clientCall",
  "trace.id": "a35466d8980645aeaf2cbfecc706375b",
  "span.duration": 172.7876,
  "span.kind": "CLIENT",
  "span.parent.id": "c909162b7d834f99",
  "span.id": "0788f0267309834e",
  "event.name": "span.ended"
}

Default experience: add HTTP logs

HttpInstrumentationOptions options = new HttpInstrumentationOptions()
        ..setHttpLogLevel(HttpInstrumentationOptions.HttpLogDetailLevel.BODY_AND_HEADERS);
SampleClient client = new SampleClientBuilder()
        .instrumentationOptions(options)
        .build();

client.clientCall();
Outcome (expand me):
2025-01-14 15:39:31.945 [main] [DEBUG] core.tracing -
{
  "library.version": "1.0.0-beta.2",
  "library.name": "core",
  "server.address": "example.com",
  "span.name": "GET",
  "trace.id": "575cb29222f93b8e52882ae38fc3eebe",
  "span.duration": 175.71,
  "http.request.method": "GET",
  "span.kind": "CLIENT",
  "server.port": 443,
  "span.parent.id": "35653b32412b36c4",
  "span.id": "075d6d0d6c9baf13",
  "url.full": "https://example.com",
  "http.response.status_code": 200,
  "event.name": "span.ended"
}

2025-01-14 15:39:31.946 [main] [INFO] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy -
{
  "date": "Tue, 14 Jan 2025 23:39:31 GMT",
  "http.request.time_to_response": 173.0366,
  "http.response.body.size": 1256,
  "content-length": "1256",
  "server": "ECAcc (sed/58D3)",
  "expires": "Tue, 21 Jan 2025 23:39:31 GMT",
  "http.request.duration": 176.3867,
  "span.id": "075d6d0d6c9baf13",
  "last-modified": "Thu, 17 Oct 2019 07:18:26 GMT",
  "trace.id": "575cb29222f93b8e52882ae38fc3eebe",
  "http.request.method": "GET",
  "http.request.body.content": "<!doctype html>\n<html>...</html>\n",
  "http.request.body.size": 0,
  "content-type": "text/html; charset=UTF-8",
  "etag": "\"3147526947+ident\"",
  "http.request.resend_count": 0,
  "cache-control": "max-age=604800",
  "url.full": "https://example.com",
  "http.response.status_code": 200,
  "event.name": "http.response"
}

2025-01-14 15:39:31.946 [main] [DEBUG] sample.tracing -
{
  "library.version": null,
  "library.name": "sample",
  "span.name": "clientCall",
  "trace.id": "575cb29222f93b8e52882ae38fc3eebe",
  "span.duration": 177.7804,
  "span.kind": "CLIENT",
  "span.id": "35653b32412b36c4",
  "event.name": "span.ended"
}

Experience with OTel + logback + HTTP logs

Let's add otel and logback to the classpath and check out how it'd look like:

  • spans are sent to OTel
  • logs are sent to OTel and to configured logback appenders
HttpInstrumentationOptions options = new HttpInstrumentationOptions()
        .setHttpLogLevel(HttpInstrumentationOptions.HttpLogDetailLevel.HEADERS)
        .setRedactedHeaderNamesLoggingEnabled(false);
SampleClient client = new SampleClientBuilder()
        .instrumentationOptions(options)
        .build();

client.clientCall();

I'd result in the following telemetry:

  • logical operation span for clientCall (depends on codegen for tracing code)
  • HTTP spans printed as debug logs
  • HTTP request duration metric (users can derive request rate, failure rate, latency percentiles, etc)
  • HTTP logs (if enabled)
image image

Champion scenarios: client lib

class SampleClient {
    private final static LibraryInstrumentationOptions LIBRARY_OPTIONS = new LibraryInstrumentationOptions("contoso-sample");
    private final static String SAMPLE_CLIENT_DURATION_METRIC = "contoso.sample.client.operation.duration";
    private final HttpPipeline httpPipeline;
    private final OperationInstrumentation downloadContentInstrumentation;
    private final OperationInstrumentation createInstrumentation;
    private final URI endpoint;

    SampleClient(InstrumentationOptions instrumentationOptions, HttpPipeline httpPipeline, URI endpoint) {
        this.httpPipeline = httpPipeline;
        this.endpoint = endpoint;
        Instrumentation instrumentation = Instrumentation.create(instrumentationOptions, LIBRARY_OPTIONS);

        // BEGIN: io.clientcore.core.telemetry.instrumentation.create
        InstrumentedOperationDetails downloadDetails = new InstrumentedOperationDetails(SAMPLE_CLIENT_DURATION_METRIC,
            "downloadContent").endpoint(endpoint);
        this.downloadContentInstrumentation = instrumentation.createOperationInstrumentation(downloadDetails);
        // END: io.clientcore.core.telemetry.instrumentation.create

        this.createInstrumentation = instrumentation.createOperationInstrumentation(
            new InstrumentedOperationDetails(SAMPLE_CLIENT_DURATION_METRIC, "create")
                .endpoint(endpoint));
    }

    public Response<?> downloadContent() {
        return this.downloadContent(null);
    }

    public Response<?> downloadContent(RequestOptions options) {
        // BEGIN: io.clientcore.core.telemetry.instrumentation.shouldinstrument
        if (!downloadContentInstrumentation.shouldInstrument(options)) {
            return downloadImpl(options);
        }
        // END: io.clientcore.core.telemetry.instrumentation.shouldinstrument

        if (options == null || options == RequestOptions.none()) {
            options = new RequestOptions();
        }

        // BEGIN: io.clientcore.core.telemetry.instrumentation.startscope
        OperationInstrumentation.Scope scope = downloadContentInstrumentation.startScope(options);
        try {
            return downloadImpl(options);
        } catch (RuntimeException t) {
            scope.setError(t);
            throw t;
        } finally {
            scope.close();
        }

        // END: io.clientcore.core.telemetry.instrumentation.startscope
    }

    public Response<?> create(RequestOptions options) {
        if (!createInstrumentation.shouldInstrument(options)) {
            return httpPipeline.send(new HttpRequest(HttpMethod.POST, endpoint));
        }

        if (options == null || options == RequestOptions.none()) {
            options = new RequestOptions();
        }

        OperationInstrumentation.Scope scope = createInstrumentation.startScope(options);
        try {
            return httpPipeline.send(new HttpRequest(HttpMethod.POST, endpoint));
        } catch (RuntimeException t) {
            scope.setError(t);
            throw t;
        } finally {
            scope.close();
        }
    }

    private Response<?> downloadImpl(RequestOptions options) {
        return httpPipeline.send(new HttpRequest(HttpMethod.GET, endpoint));
    }
}

Next steps

Short term:

  • Bug: Make it work with OTel javaagent - OTel is relocated to io.opentelemetry.javaagent.shaded.io.opentelemetry.api.*
  • Bug: Suppress HTTP spans from OTel agent
  • [Done] End of Jan: Simplify library code that'd do traces + metrics together
  • [In-progress]: generic instrumentation convenience API
  • Mid Feb: Instrumentation codegen
  • End of Feb: benchmark and optimize further:
    • attribute key cache
    • attribute collection cache (for http metrics)

Longer term:

  • Allow adding custom properties (explicit MDC)

    InstrumentationContext context = InstrumentationContext.fromMap(Map.of("correlation-id", "foo42"));
    client.clientCall(new RequestOptions().setInstrumentationContext(context));
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment