Skip to content

Instantly share code, notes, and snippets.

@adamryman
Created February 13, 2023 22:13
Show Gist options
  • Save adamryman/0dcc7e3a401440117bd6889b59b41bdb to your computer and use it in GitHub Desktop.
Save adamryman/0dcc7e3a401440117bd6889b59b41bdb to your computer and use it in GitHub Desktop.

Wire through vs global import

Preface

All of this came to mind reading through python code which passed a logger through the application. I mentally contrasted this pattern with a golang logger I created. This golang logger had an exported package level variable which was initialized as a logger without any functions being called. Said logger could be accessed directly from an import anywhere in the application.

As a result of this consideration, I wrote this document to get out my thoughts loggers wired through vs globally accessible via import.

Purpose

I am about to add a metrics client to a reoccurring python job. This job already passes the logger from main down through all functions. I am considering passing the metrics client from main down through all functions as well. To match the logger pattern.

I would like input on more pros / cons between wired through loggers/metrics and globally accessible loggers/metrics from imports.

Intro

Generally in applications, we want to avoid global state. As global state can be accessed from anywhere, this means that modifying global state can impact the surface area of the entire application. Additionally, because this state can be accessed anywhere, this means that it is not clear what part of the application require any particular state.

The state does not have to be global, it could just be a lot of state which is passed around to a large portion of the application.

This creates coupling over a large percentage of the application, and the cohesion is low.

Service and Datastore Clients

An example could be a web application which has multiple handlers for different api endpoints. These handlers could require calling other backend services to satisfy an incoming request.

You could have a large state ball, which has access to all clients of services, as well as all backing datastores. This large state ball could then be passed to all handlers.

Note that all code examples in this document are psudocode. The psudocode emits some best practices for brevity.

from conf import get_conf
import server

class Clients:
   def __init__(self, conf):
        self.notification = NotificationClient(conf.notification_url)
        self.user = UserClient(conf.user_url)
        self.postgres = PostgresClient(conf.postgres_creds)

class ActionHandler:
    def __init_(self, clients):
        self.clients = clients

    def do_action(self, action, user_id):
        user = self.clients.user.get_user(user_id)
        result = self.clients.notification.send(user_id, action)
        self.clients.postgres.insert_notification_result(result)

if __name__ == "__main__":
    conf = get_conf()
    clients = Clients(conf)
    server.add_handler('/action/' ActionHandler(clients))
    server.start()

Or, arguably worse, clients shard global based on imports.

# --- clients.py ---
from conf import get_conf

clients = None

class Clients:
   def __init__(self, conf):
        self.notification = NotificationClient(conf.notification_url)
        self.user = UserClient(conf.users_url)
        self.postgres = PostgresClient(conf.postgres_creds)

def init_clients():
    global clients
    if not clients:
        clients = Clients(get_conf())

# --- main.py ---
from clients import clients, init_clients
import server

class ActionHandler:
    def __init__(self):
        self.clients = clients

    def do_action(self, action, user_id):
        user = self.clients.user.get_user(user_id)
        result = self.clients.notification.send(user_id, action)
        self.clients.postgres.insert_notification_result(result)

if __name__ == "__main__":
    init_clients()
    server.add_handler('/action/', ActionHandler())
    server.start()

A better approach might be:

from conf import get_conf
import server

class ActionHandler:
    def __init__(self, user_client, notification_client, postgres_client):
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)

if __name__ == "__main__":
    conf = get_conf()
    notification = NotificationClient(conf.notification_url)
    user = UserClient(conf.users_url)
    postgres = PostgresClient(conf.postgres_creds)

    action_handler = ActionHandler(user, notification, postgres)
    server.add_handler('/action/', action_handler)
    server.start()

The issue above really starts to come about when you have many handlers, all of which require different backing clients.

What about loggers and metrics?

While shown above, the big ball of state for service clients and datastore clients can be a mess.

Though, for loggers and metrics, it seems to me that in most applications global loggers and metrics seems acceptable. As you expect that anywhere in the application you may want to log or emit a metric. Sure, you may want to pass loggers/metrics to some functions to retain tags and such. Though it seems acceptable to import and call the global logger/metrics with the default tags from anywhere in the application.

We could wire the first initialized logger/metrics through to all classes / functions.

from logger import get_logger
from metrics import get_metrics
from conf import get_conf
import server

class ActionHandler:
    def __init__(self, logger, metrics, user_client, notification_client, postgres_client):
        self.logger = logger
        self.metrics = metrics
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        self.logger.log("doing action")
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)
        self.metrics.increment("action.do", ("action", action))

if __name__ == "__main__":
    logger = get_logger()
    metrics = get_metrics()
    conf = get_conf()
    notification = NotificationClient(logger, metrics, conf.notification_url)
    user = UserClient(logger, metrics, conf.users_url)
    postgres = PostgresClient(logger, metrics, conf.postgres_creds)

    action_handler = ActionHandler(logger, metrics, user, notification, postgres)
    server.add_handler('/action/', action_handler)
    server.start()

Or we could use a global logger / metrics import.

from logger import logger, init_logger
from metrics import metrics, init_metrics
from conf import get_conf
import server

class ActionHandler:
    def __init__(self, logger, metrics, user_client, notification_client, postgres_client):
        self.user = user_client
        self.notification = notification_client
        self.postgres = postgres_client

    def do_action(self, action, user_id):
        logger.log("doing action")
        user = self.user.get_user(user_id)
        result = self.notification.send(user_id, action)
        self.postgres.insert_notification_result(result)
        metrics.increment("action.do", ("action", action))

if __name__ == "__main__":
    init_logger()
    init_metrics()
    conf = get_conf()
    notification = NotificationClient(conf.notification_url)
    user = UserClient(conf.users_url)
    postgres = PostgresClient(conf.postgres_creds)

    action_handler = ActionHandler(user, notification, postgres)
    server.add_handler('/action/', action_handler)
    server.start()

Upon writing this, wiring these loggers through does not seem that bad, as the Handlers maintain the logger in its state and as such you do not need to wire it to every function.

However, having the logger and metrics to initialize each handlers adds a bit of visual noise / stutter. Which distracts a bit from the useful patter of passing in only needed clients. Again, this will only get more visible the more handlers that you have.

Wire through logger pro and cons

Pros

  • Known initialization
  • All usages can be traced from main

Cons

  • Visual noise / stutter which can obfuscate the unique parameters for a function

Globally accessible from import logger pros and cons

Pros

  • Accessible without wiring through.

Cons

  • Initialization is questionable
  • Cannot trace usages purely through code path

Wire through metrics and global metrics from import have the same pros / cons as the loggers pro/cons above. Except that the visualize noise / stutter gets nonlinearly worse.

Restate purpose

Again, I write this document receive input how to add metrics to an application, wiring through vs a globally accessible via import.

@lelandbatey
Copy link

We chatted about this, and here's a summary of what I had to say in support of "thread everything through":

  • If we thread everything through, we allow for control e.g. during testing. If we don't thread everything through, we are giving up the ability to clearly swap out behaviors.
  • With everything threaded through, all our dependencies are very clear. If we don't thread through, we don't get rid of the dependency, instead we hide those dependencies from other developers. I argue that hiding dependencies is basically never worth it; sure it can make the code look nice, but it means we give up control and comprehension.
  • Effectively, a global dependency is saying "by making this global, I am saying that we will never, ever want to locally change the behavior of this dependency, and I believe this so strongly that I'm making it functionally impossible." For things like loggers and metrics, where I as a developer want to do all kinds of things to them to help them serve me, it drives me CRAZY to take away that control.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment