All of this came to mind while reading through Python code that passed a logger down through the application. I mentally contrasted this pattern with a Go logger I created. That Go logger exposed an exported package-level variable, initialized as a logger without any functions being called, which could be accessed directly via an import anywhere in the application.
As a result, I wrote this document to work through my thoughts on loggers wired through the application versus loggers globally accessible via import.
I am about to add a metrics client to a recurring Python job. This job already passes the logger from main down through all functions. I am considering passing the metrics client from main down through all functions as well, to match the logger pattern.
I would like input on the pros and cons of wired-through loggers/metrics versus loggers/metrics globally accessible from imports.
Generally in applications, we want to avoid global state. Because global state can be accessed from anywhere, modifying it can impact the entire application. And because it can be accessed anywhere, it is not clear which parts of the application require any particular piece of state.
The state does not have to be global to cause this; the same problem arises with a large bundle of state that is passed around to a large portion of the application.
This creates coupling across a large percentage of the application, and cohesion is low.
An example could be a web application which has multiple handlers for different api endpoints. These handlers could require calling other backend services to satisfy an incoming request.
You could have a large state ball, which has access to all clients of services, as well as all backing datastores. This large state ball could then be passed to all handlers.
Note that all code examples in this document are pseudocode. The pseudocode omits some best practices for brevity.
from conf import get_conf
import server

class Clients:
    def __init__(self, conf):
        self.notification = NotificationClient(conf.notification_url)
        self.user = UserClient(conf.user_url)
        self.postgres = PostgresClient(conf.postgres_creds)

class ActionHandler:
    def __init__(self, clients):
        self.clients = clients

    def do_action(self, action, user_id):
        user = self.clients.user.get_user(user_id)
        result = self.clients.notification.send(user_id, action)
        self.clients.postgres.insert_notification_result(result)

if __name__ == "__main__":
    conf = get_conf()
    clients = Clients(conf)
    server.add_handler('/action/', ActionHandler(clients))
    server.start()
Or, arguably worse, clients shared globally via imports.
# --- clients.py ---
from conf import get_conf

clients = None

class Clients:
    def __init__(self, conf):
        self.notification = NotificationClient(conf.notification_url)
        self.user = UserClient(conf.users_url)
        self.postgres = PostgresClient(conf.postgres_creds)

def init_clients():
    global clients
    if not clients:
        clients = Clients(get_conf())
# --- main.py ---
import clients
import server

class ActionHandler:
    def __init__(self):
        # Read the module attribute, not a from-import copy: a
        # `from clients import clients` would snapshot None at import
        # time and never see the rebind done by init_clients().
        self.clients = clients.clients

    def do_action(self, action, user_id):
        user = self.clients.user.get_user(user_id)
        result = self.clients.notification.send(user_id, action)
        self.clients.postgres.insert_notification_result(result)

if __name__ == "__main__":
    clients.init_clients()
    server.add_handler('/action/', ActionHandler())
    server.start()
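The from-import pitfall mentioned in the comment above is easy to trip over, so here is a minimal, self-contained demonstration. The module name `clients_demo` is a hypothetical stand-in for `clients.py`, built in memory so the snippet runs on its own.

```python
import sys
import types

# Build a fake `clients` module in memory (stand-in for clients.py).
mod = types.ModuleType("clients_demo")
mod.clients = None

def init_clients():
    mod.clients = "initialized-clients"  # rebinds the module attribute

mod.init_clients = init_clients
sys.modules["clients_demo"] = mod

# `from ... import name` copies the *current* value of the name.
from clients_demo import clients  # snapshot taken now: None

mod.init_clients()

print(clients)               # still None: the snapshot never updates
import clients_demo
print(clients_demo.clients)  # attribute access sees the rebind
```

This is why the corrected `main.py` above accesses `clients.clients` through the module rather than importing the name directly.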
A better approach might be:
from conf import get_conf
import server

class ActionHandler:
    def __init__(self, user_client, notification_client, postgres_client):
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)

if __name__ == "__main__":
    conf = get_conf()
    notification = NotificationClient(conf.notification_url)
    user = UserClient(conf.users_url)
    postgres = PostgresClient(conf.postgres_creds)
    action_handler = ActionHandler(user, notification, postgres)
    server.add_handler('/action/', action_handler)
    server.start()
The issue above really starts to show when you have many handlers, each of which requires different backing clients. As shown, a big ball of state holding every service client and datastore client can become a mess.
For loggers and metrics, though, it seems to me that in most applications a global logger and metrics client is acceptable, since you expect that you may want to log or emit a metric anywhere in the application. Sure, you may still want to pass loggers/metrics into some functions to retain tags and such, but it seems acceptable to import and call the global logger/metrics with the default tags from anywhere in the application.
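Python's standard library already endorses this for logging: the `logging` module is built around exactly the global-via-import pattern. A short sketch (the logger name `myapp.handlers` is an arbitrary example):

```python
import logging

# Any module can look up a shared logger by name, with no wiring.
log = logging.getLogger("myapp.handlers")

def do_action(action):
    log.info("doing action %s", action)

# main (or whoever starts the app) configures handlers and levels once;
# named loggers created anywhere inherit that configuration.
logging.basicConfig(level=logging.INFO)
do_action("signup")
```

So at least for logging, the "globally accessible" camp has strong precedent in the language's own standard library.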
We could wire the first initialized logger/metrics through to all classes / functions.
from logger import get_logger
from metrics import get_metrics
from conf import get_conf
import server

class ActionHandler:
    def __init__(self, logger, metrics, user_client, notification_client, postgres_client):
        self.logger = logger
        self.metrics = metrics
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        self.logger.log("doing action")
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)
        self.metrics.increment("action.do", ("action", action))

if __name__ == "__main__":
    logger = get_logger()
    metrics = get_metrics()
    conf = get_conf()
    notification = NotificationClient(logger, metrics, conf.notification_url)
    user = UserClient(logger, metrics, conf.users_url)
    postgres = PostgresClient(logger, metrics, conf.postgres_creds)
    action_handler = ActionHandler(logger, metrics, user, notification, postgres)
    server.add_handler('/action/', action_handler)
    server.start()
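One concrete payoff of the wired-through version is testability: a test can inject fakes through the constructor without monkeypatching module globals. A sketch, reusing the `ActionHandler` from the example above; all the fake clients are hypothetical stand-ins for the real ones:

```python
class FakeLogger:
    def __init__(self): self.lines = []
    def log(self, msg): self.lines.append(msg)

class FakeMetrics:
    def __init__(self): self.counts = []
    def increment(self, name, tag): self.counts.append((name, tag))

class FakeUserClient:
    def get_user(self, user_id): return {"id": user_id}

class FakeNotificationClient:
    def send(self, user_id, action): return "sent"

class FakePostgresClient:
    def __init__(self): self.rows = []
    def insert_notification_result(self, result): self.rows.append(result)

# ActionHandler as in the wired-through example above.
class ActionHandler:
    def __init__(self, logger, metrics, user_client, notification_client, postgres_client):
        self.logger = logger
        self.metrics = metrics
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        self.logger.log("doing action")
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)
        self.metrics.increment("action.do", ("action", action))

# The test wires in fakes exactly the way main wires in real clients.
logger, metrics, pg = FakeLogger(), FakeMetrics(), FakePostgresClient()
handler = ActionHandler(logger, metrics, FakeUserClient(), FakeNotificationClient(), pg)
handler.do_action("signup", 42)
```

After the call, `logger.lines`, `metrics.counts`, and `pg.rows` record exactly what the handler did, with no global state to set up or reset between tests.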
Or we could use a global logger / metrics import.
from logger import logger, init_logger
from metrics import metrics, init_metrics
from conf import get_conf
import server

class ActionHandler:
    def __init__(self, user_client, notification_client, postgres_client):
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        logger.log("doing action")
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)
        metrics.increment("action.do", ("action", action))

if __name__ == "__main__":
    # assumes init_logger() / init_metrics() configure the already-imported
    # objects in place rather than rebinding the module attributes
    init_logger()
    init_metrics()
    conf = get_conf()
    notification = NotificationClient(conf.notification_url)
    user = UserClient(conf.users_url)
    postgres = PostgresClient(conf.postgres_creds)
    action_handler = ActionHandler(user, notification, postgres)
    server.add_handler('/action/', action_handler)
    server.start()
Having written this out, wiring the loggers through does not seem that bad: each handler keeps the logger in its own state, so you do not need to wire it into every function.
However, passing the logger and metrics to initialize each handler adds a bit of visual noise / stutter, which distracts from the useful pattern of passing in only the clients that are needed. This only gets more visible as you add handlers.
Wired through

Pros
- Known initialization
- All usages can be traced from main

Cons
- Visual noise / stutter which can obfuscate the unique parameters for a function
Globally accessible via import

Pros
- Accessible anywhere without wiring through.

Cons
- Initialization is questionable (import order and init timing are implicit)
- Cannot trace usages purely through the code path from main
Wired-through metrics and globally imported metrics have the same pros / cons as the logger pros / cons above, except that the visual noise / stutter gets nonlinearly worse.
Again, I wrote this document to receive input on how to add metrics to an application: wired through versus globally accessible via import.
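One way to keep the wiring while taming the stutter (a sketch of my own, not something from the examples above) is to bundle the cross-cutting logger and metrics pair into a single telemetry object, so each handler takes one telemetry parameter and its unique client parameters stay visible:

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical bundling: group logger + metrics into one object so the
# cross-cutting concerns cost a single constructor parameter.
@dataclass
class Telemetry:
    logger: Any
    metrics: Any

class ActionHandler:
    def __init__(self, telemetry, user_client):
        self.tel = telemetry
        self.user = user_client

    def do_action(self, action, user_id):
        self.tel.logger.log("doing action")
        user = self.user.get_user(user_id)
        self.tel.metrics.increment("action.do", ("action", action))
        return user
```

main builds one `Telemetry` and hands the same instance to every handler, so the per-handler signatures grow by one parameter instead of two, and the growth stays linear in the number of handlers.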
We chatted about this, and here's a summary of what I had to say in support of "thread everything through":