Short version

Any type that may own managed resources must implement Closeable and close the resources it manages.
- For example, a service client might implement Closeable if it creates managed resources such as an ExecutorService. However, it shouldn't close Closeables passed to it, such as an HttpClient.
HttpClient instances, since they are shared, must register with Cleaner and register a shutdown hook.
- This enables an HttpClient to close when it is garbage collected or when the JVM shuts down.
Response instances must register with Cleaner so that they do not leak connections if they aren't consumed before garbage collection.
Allow an ExecutorService to be configured wherever asynchronous or parallelized execution is needed.
- If not set, default to SharedExecutorService, as this is a central configuration point for users.

Long version

Premise

One of the central guiding principles we have used on the Azure SDK team is "integrate cleanly into customer environments." The reasoning behind this principle is that, for the majority of customers, using the Azure SDKs is a means to an end: what we ship is meant to aid customers in what they're trying to do. When we don't integrate cleanly into an environment, it creates friction for the customer, possibly to the point where they circumvent the Azure SDKs, and for us it results in bugs and configuration scenarios we need to troubleshoot (taking time away from core work).

Historically, most friction points have come from the dependencies used by the Azure SDKs. The most common issue is dependency conflicts we cause within a customer environment, followed closely by dependencies we ship having CVEs that block customers from using the SDK. clientcore is shipping with a new paradigm of no / limited external dependencies, and azure-core is slowly migrating away from common dependency pain points such as Jackson (and, in the future, Netty too). This form of customer friction, though, hits hard and fast: when an issue of this kind arises, it's usually obvious from compilation failures or from linkage / type errors that occur early in an application run.

Which brings us to the purpose of this document: another friction point for customers, namely application management and resource management. These friction points arise from the design of the Azure SDKs themselves, and they are a source of issues that are much, much harder to troubleshoot, or even to notice, until everything goes wrong all at once. Historical examples can be seen in our HTTP stacks. We needed to expose ways for customers to configure the EventLoopGroup in Reactor Netty because the threads spawned to run requests didn't have the correct permissions. We also had to give customers a workaround in OkHttp because one of its thread pools didn't use daemon threads, which left JVM applications lingering until those threads were reaped once their time-to-live was exceeded. These forms of integration issues are much harder to troubleshoot because they may take a long time to appear and, when they do fail, they lack rich information: failures in spawned threads don't have fully formed stack traces.

Overview

The purpose of this document is to lay out design guidelines for the Azure SDKs that allow them to better integrate into customer environments where lifecycle management is concerned. Along the way we'll cover a few key areas of an Azure SDK, ranging from the HTTP stack to higher-level conveniences such as polling and the service clients themselves. By the end of this document the aim is to have fully formed plans for how to design and implement Azure SDKs so that they integrate cleanly into customer environments.

Common Configuration

SharedExecutorService

A recent introduction, in both clientcore and azure-core, is SharedExecutorService, an SDK-defined executor service that is sharable across an entire application. It was introduced to eliminate a pattern that emerged during the Sync Stack migration, where each SDK with service client API call timeouts defined its own shared executor service. That resulted in numerous executor services being created and used within an application, which could lead to thread pool exhaustion or other issues with the number of threads being managed. With a single shared executor service, we can ensure that every SDK uses the same thread pool and that the number of threads being managed is consistent across all SDKs.

This concept has limited configurability at this time: it only allows setting the number of threads, their time-to-live, and whether they're virtual threads. It should be extended so that customers can set their own executor service if they so choose. This richer configuration would allow customers to fine-tune it to their use cases or even use executor services they've already defined within their application. For example, a customer may have a thread pool that constructs threads with tracking information that they use to manage their application, or threads with specific permissions that they need to run in their environment.
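A minimal sketch of what a customer-settable shared executor could look like. `SharedExecutorHolder`, `setExecutor`, and `get` are hypothetical names used only for illustration and are not the actual SharedExecutorService API; the daemon-thread default is likewise an assumption.

```java
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for illustration; not the real SharedExecutorService API. */
final class SharedExecutorHolder {
    // Customer-supplied executor, if one has been set.
    private static final AtomicReference<ExecutorService> CUSTOM = new AtomicReference<>();

    // Lazily initialized SDK default. Daemon threads are an assumption made here
    // so that the default pool never keeps the JVM alive on its own.
    private static final class DefaultHolder {
        static final ExecutorService INSTANCE = Executors.newCachedThreadPool(runnable -> {
            Thread thread = new Thread(runnable, "sdk-shared-executor");
            thread.setDaemon(true);
            return thread;
        });
    }

    private SharedExecutorHolder() {
    }

    /** Lets the application plug in an executor it already manages. */
    static void setExecutor(ExecutorService executor) {
        CUSTOM.set(Objects.requireNonNull(executor, "executor"));
    }

    /** Returns the customer executor when set, otherwise the SDK default. */
    static ExecutorService get() {
        ExecutorService custom = CUSTOM.get();
        return custom != null ? custom : DefaultHolder.INSTANCE;
    }
}
```

In this sketch the SDK never shuts down an executor supplied through `setExecutor`; its lifecycle stays with the application that created it.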

HTTP Stack

Closeable

A common question we've had is "does the HttpClient need to be closed when the application is shutting down?". And if it does, at this time we don't have a way to do so, as HttpClient doesn't implement Closeable. Even though we offer service providers for HttpClient, meaning multiple different implementations of HttpClient could be in use, we should know whether or not a given HttpClient needs to be closed. It is a common pattern in Java that a class managing resources that need cleanup implements Closeable so that it can be closed when it's no longer needed. Given that HttpClient could manage such resources, it should follow that pattern.

Closeable does raise concerns, though, as we've designed HttpClient to be sharable across an application, and once it's closed it can't be used again. This can be alleviated through two commonly used mechanisms. The first is to wrap an HttpClient in a utility class that tracks the number of times it has been shared and the number of close calls it has seen, to determine when it should truly be closed. The second is for the SDK to recreate a closed HttpClient when it's needed again. Either approach would allow the SDK to manage the lifecycle of the HttpClient and ensure that it's being used correctly.
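The first mechanism, counting shares against close calls, could be sketched roughly as follows. `RefCountedCloseable`, `acquire`, and `release` are hypothetical names, and the choice to start the count at one (for the creator) is an assumption of this sketch rather than a settled design.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical utility: closes the wrapped resource only when the last holder releases it. */
final class RefCountedCloseable<T extends Closeable> {
    private final T resource;
    // Starts at 1 for the creator's own reference; 0 means the resource is closed.
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedCloseable(T resource) {
        this.resource = Objects.requireNonNull(resource, "resource");
    }

    /** Registers another holder and returns the shared resource. */
    T acquire() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                throw new IllegalStateException("Resource is already closed.");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return resource;
            }
        }
    }

    /** Drops one holder; returns true only if this call closed the resource. */
    boolean release() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                return false; // Extra release calls after the final close are ignored.
            }
            if (refCount.compareAndSet(current, current - 1)) {
                if (current != 1) {
                    return false; // Other holders remain.
                }
                try {
                    resource.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        }
    }
}
```

The compare-and-set loops keep the count consistent when several service clients share and release the same resource concurrently, so the underlying close happens exactly once.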

Resource sharing

Another point of concern in the HTTP stack is resource sharing: knowing whether a specific HttpClient implementation defines its own resources that need to be cleaned up when it's no longer needed. For example, both Netty and OkHttp create thread pools to manage network connections and requests concurrently. How these thread pools are managed can change the behavior of an application. For example, if each new instance in Netty or OkHttp creates a new thread pool, this could lead to thread pool exhaustion or other issues with the number of threads being managed.

To alleviate this concern, we should explicitly define the values in these scenarios (within reason) and provide a way for customers to set their own resources if they so choose. Alternatively, we could tie this all into SharedExecutorService to offer a single point of configuration for all SDKs (though I'd lean towards making that a default that can be overridden).

Changes here could possibly alleviate the Closeable concern as well, but I believe we'd be better off addressing both concerns separately as they're different in nature.

Cleaner / ReferenceManager

Each HttpResponse should be tracked by a Cleaner / ReferenceManager to ensure that when the response is no longer referenced any networking resources still associated with it are explicitly cleaned up and aren't left lingering. This should help prevent connections from being left open or leaked when the response associated with the connection no longer exists.
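As a rough sketch of the Cleaner half of this idea, the example below uses the JDK's `java.lang.ref.Cleaner`. `TrackedResponse` and `ConnectionState` are hypothetical types, and the counter is a stand-in for whatever releasing a real connection would involve.

```java
import java.io.Closeable;
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical response type; only java.lang.ref.Cleaner is a real API here. */
final class TrackedResponse implements Closeable {
    // A single Cleaner (and its daemon thread) shared by all responses.
    private static final Cleaner CLEANER = Cleaner.create();

    /**
     * Cleanup state kept in a static nested class so it holds no reference to the
     * response itself; otherwise the response could never become unreachable.
     */
    private static final class ConnectionState implements Runnable {
        private final AtomicInteger releaseCount;

        ConnectionState(AtomicInteger releaseCount) {
            this.releaseCount = releaseCount;
        }

        @Override
        public void run() {
            // Stand-in for returning the connection to the pool or closing it.
            releaseCount.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    TrackedResponse(AtomicInteger releaseCount) {
        this.cleanable = CLEANER.register(this, new ConnectionState(releaseCount));
    }

    @Override
    public void close() {
        // clean() runs the action at most once, so an explicit close followed by
        // garbage collection (or a second close) never releases the connection twice.
        cleanable.clean();
    }
}
```

Callers that close or fully consume the response get deterministic cleanup; the Cleaner only acts as a safety net for responses that are dropped without being closed, and the timing of that cleanup depends on the garbage collector.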

Service Clients

Closeable

Following a recent pattern change seen in the JDK, I believe it would be best for service clients to implement Closeable, as they may have ties to resources that need cleanup after their usage, for example an HttpClient. As of JDK 19, ExecutorService extends AutoCloseable, since implementations may hold resources such as shared thread pools that should be cleaned up when no longer in use.
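The ownership rule (close what you create, leave alone what you are handed) might look like the sketch below. `ExampleServiceClient` is hypothetical, a plain `Closeable` stands in for an HttpClient, and creating a private executor when none is supplied is done purely to illustrate ownership; under the guidelines in this document the default would be SharedExecutorService instead.

```java
import java.io.Closeable;
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical service client illustrating which resources close() is responsible for. */
final class ExampleServiceClient implements Closeable {
    // Supplied by the caller and possibly shared, so this client never closes it.
    private final Closeable httpClient;
    private final ExecutorService executor;
    // True only when this client created the executor itself.
    private final boolean ownsExecutor;

    ExampleServiceClient(Closeable httpClient, ExecutorService executor) {
        this.httpClient = Objects.requireNonNull(httpClient, "httpClient");
        this.ownsExecutor = executor == null;
        this.executor = ownsExecutor
            ? Executors.newSingleThreadExecutor(runnable -> {
                // Daemon thread so an unclosed client cannot keep the JVM alive.
                Thread thread = new Thread(runnable, "example-service-client");
                thread.setDaemon(true);
                return thread;
            })
            : executor;
    }

    ExecutorService executor() {
        return executor;
    }

    @Override
    public void close() {
        // Only shut down what this client created; httpClient and any
        // caller-supplied executor remain the caller's responsibility.
        if (ownsExecutor) {
            executor.shutdown();
        }
    }
}
```

Because the client implements Closeable, callers can manage it with try-with-resources while continuing to use the same HttpClient with other clients afterwards.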

alzimmermsft/Lifecycle management.md