Software engineering principles, from the book Clean Code by Robert C. Martin, adapted for Python. This is not a style guide. It's a guide to producing readable, reusable, and refactorable software in Python. Not every principle herein has to be strictly followed, and even fewer will be universally agreed upon. These are guidelines and not gospel, but they are codified over many years of collective experience by the authors of Clean Code.
Bad:
import datetime
ymdstr = datetime.date.today().strftime("%y-%m-%d")
Moreover, there is no need to use the type str
for the name.
Good:
import datetime
current_date: str = datetime.date.today().strftime("%y-%m-%d")
Bad: We use multiple names for the same entity here:
def get_user_info(): pass
def get_client_data(): pass
def get_customer_record(): pass
Good: If the entity is the same, you should be consistent in referencing it in your functions:
def get_user_info(): pass
def get_user_data(): pass
def get_user_record(): pass
Even better
Python is (also) an Object-Oriented Programming language. If it makes sense, bundle functions together with the specific implementation of the entity in your code, as instance attributes, property methods, or methods:
from typing import Union, Dict
class Record:
pass
class User:
info : str
@property
def data(self) -> Dict[str, str]:
return {}
def get_record(self) -> Union[Record, None]:
return Record()
We will read more code than we will ever write. It's important that the code we do write is readable and searchable. By not naming variables that end up being meaningful for understanding our program, we hurt our readers. Make your names searchable.
Bad:
import time
# What is the number 86400 for again?
time.sleep(86400)
Good:
import time
# Declare them in the global namespace for the module.
SECONDS_IN_A_DAY = 60 * 60 * 24
time.sleep(SECONDS_IN_A_DAY)
Bad:
import re
address = "One Infinite Loop, Cupertino 95014"
city_zip_code_regex = r"^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$"
matches = re.match(city_zip_code_regex, address)
if matches:
print(f"{matches[1]}: {matches[2]}")
Not bad:
Better, but still heavily dependent on the regex.
import re
address = "One Infinite Loop, Cupertino 95014"
city_zip_code_regex = r"^[^,\\]+[,\\\s]+(.+?)\s*(\d{5})?$"
matches = re.match(city_zip_code_regex, address)
if matches:
city, zip_code = matches.groups()
print(f"{city}: {zip_code}")
Good:
Reduce the dependency on the regex by naming the subgroups.
import re
address = "One Infinite Loop, Cupertino 95014"
city_zip_code_regex = r"^[^,\\]+[,\\\s]+(?P<city>.+?)\s*(?P<zip_code>\d{5})?$"
matches = re.match(city_zip_code_regex, address)
if matches:
print(f"{matches['city']}, {matches['zip_code']}")
Don't force your code readers to translate variable names. Explicit is better than implicit.
Bad:
seq = ("Austin", "New York", "San Francisco")
for item in seq:
#do_stuff()
#do_some_other_stuff()
# Wait, what's `item` again?
print(item)
Good:
locations = ("Austin", "New York", "San Francisco")
for location in locations:
#do_stuff()
#do_some_other_stuff()
# ...
print(location)
If the class/object name tells you something, don't repeat that in your variable name.
Bad:
class Car:
car_make: str
car_model: str
car_color: str
Good:
class Car:
make: str
model: str
color: str
Bad
Why write:
import hashlib
def create_micro_brewery(name):
name = "Hipster Brew Co." if name is None else name
slug = hashlib.sha1(name.encode()).hexdigest()
# etc.
...when you can use a default argument? This also makes it crystal clear to the reader that you expect a string for your argument.
Good:
import hashlib
def create_micro_brewery(name: str = "Hipster Brew Co."):
slug = hashlib.sha1(name.encode()).hexdigest()
# etc.
Limiting the number of function parameters is incredibly important because it makes testing your function easier. Having more than three leads to a combinatorial explosion where you have to test tons of different cases with each separate argument.
Zero arguments is the ideal case. One or two arguments is okay, and three should be avoided. Anything more should be consolidated. Usually, if you have more than two arguments then your function is trying to do too much. In cases where it's not, most of the time a higher-level object will suffice as an argument.
Bad:
def create_menu(title, body, button_text, cancellable):
pass
Java-esque:
class Menu:
def __init__(self, config: dict):
self.title = config["title"]
self.body = config["body"]
# ...
menu = Menu(
{
"title": "My Menu",
"body": "Something about my menu",
"button_text": "OK",
"cancellable": False
}
)
This is fine too
class MenuConfig:
"""A configuration for the Menu.
Attributes:
title: The title of the Menu.
body: The body of the Menu.
button_text: The text for the button label.
cancellable: Can it be cancelled?
"""
title: str
body: str
button_text: str
cancellable: bool = False
def create_menu(config: MenuConfig) -> None:
title = config.title
body = config.body
# ...
config = MenuConfig()
config.title = "My delicious menu"
config.body = "A description of the various items on the menu"
config.button_text = "Order now!"
# The instance attribute overrides the default class attribute.
config.cancellable = True
create_menu(config)
Fancy
from typing import NamedTuple
class MenuConfig(NamedTuple):
"""A configuration for the Menu.
Attributes:
title: The title of the Menu.
body: The body of the Menu.
button_text: The text for the button label.
cancellable: Can it be cancelled?
"""
title: str
body: str
button_text: str
cancellable: bool = False
def create_menu(config: MenuConfig):
title, body, button_text, cancellable = config
# ...
create_menu(
MenuConfig(
title="My delicious menu",
body="A description of the various items on the menu",
button_text="Order now!"
)
)
Fancier
from dataclasses import astuple, dataclass
@dataclass
class MenuConfig:
"""A configuration for the Menu.
Attributes:
title: The title of the Menu.
body: The body of the Menu.
button_text: The text for the button label.
cancellable: Can it be cancelled?
"""
title: str
body: str
button_text: str
cancellable: bool = False
def create_menu(config: MenuConfig):
title, body, button_text, cancellable = astuple(config)
# ...
create_menu(
MenuConfig(
title="My delicious menu",
body="A description of the various items on the menu",
button_text="Order now!"
)
)
Even fancier, Python 3.8+ only
from typing import TypedDict
class MenuConfig(TypedDict):
"""A configuration for the Menu.
Attributes:
title: The title of the Menu.
body: The body of the Menu.
button_text: The text for the button label.
cancellable: Can it be cancelled?
"""
title: str
body: str
button_text: str
cancellable: bool
def create_menu(config: MenuConfig):
title = config["title"]
# ...
create_menu(
# You need to supply all the parameters
MenuConfig(
title="My delicious menu",
body="A description of the various items on the menu",
button_text="Order now!",
cancellable=True
)
)
This is by far the most important rule in software engineering. When functions do more than one thing, they are harder to compose, test, and reason about. When you can isolate a function to just one action, they can be refactored easily and your code will read much cleaner. If you take away anything from this guide other than this, you'll be ahead of many developers.
Bad:
from typing import List
class Client:
active: bool
def email(client: Client) -> None:
pass
def email_clients(clients: List[Client]) -> None:
"""Filter active clients and send them an email.
"""
for client in clients:
if client.active:
email(client)
Good:
from typing import List
class Client:
active: bool
def email(client: Client) -> None:
pass
def get_active_clients(clients: List[Client]) -> List[Client]:
"""Filter active clients.
"""
return [client for client in clients if client.active]
def email_clients(clients: List[Client]) -> None:
"""Send an email to a given list of clients.
"""
for client in get_active_clients(clients):
email(client)
Do you spot an opportunity to use generators now?
Even better
from typing import Generator, Iterator
class Client:
active: bool
def email(client: Client):
pass
def active_clients(clients: Iterator[Client]) -> Generator[Client, None, None]:
"""Only active clients"""
return (client for client in clients if client.active)
def email_client(clients: Iterator[Client]) -> None:
"""Send an email to a given list of clients.
"""
for client in active_clients(clients):
email(client)
Bad:
class Email:
def handle(self) -> None:
pass
message = Email()
# What is this supposed to do again?
message.handle()
Good:
class Email:
def send(self) -> None:
"""Send this message"""
message = Email()
message.send()
When you have more than one level of abstraction your function is usually doing too much. Splitting up functions leads to reusability and easier testing.
Bad:
# type: ignore
def parse_better_js_alternative(code: str) -> None:
regexes = [
# ...
]
statements = code.split('\n')
tokens = []
for regex in regexes:
for statement in statements:
pass
ast = []
for token in tokens:
pass
for node in ast:
pass
Good:
from typing import Tuple, List, Dict
REGEXES: Tuple = (
# ...
)
def parse_better_js_alternative(code: str) -> None:
tokens: List = tokenize(code)
syntax_tree: List = parse(tokens)
for node in syntax_tree:
pass
def tokenize(code: str) -> List:
statements = code.split()
tokens: List[Dict] = []
for regex in REGEXES:
for statement in statements:
pass
return tokens
def parse(tokens: List) -> List:
syntax_tree: List[Dict] = []
for token in tokens:
pass
return syntax_tree
Flags tell your user that this function does more than one thing. Functions should do one thing. Split out your functions if they are following different code paths based on a boolean.
Bad:
from tempfile import gettempdir
from pathlib import Path
def create_file(name: str, temp: bool) -> None:
if temp:
(Path(gettempdir()) / name).touch()
else:
Path(name).touch()
Good:
from tempfile import gettempdir
from pathlib import Path
def create_file(name: str) -> None:
Path(name).touch()
def create_temp_file(name: str) -> None:
(Path(gettempdir()) / name).touch()
A function produces a side effect if it does anything other than take a value in and return another value or values. A side effect could be writing to a file, modifying some global variable, or accidentally wiring all your money to a stranger.
Now, you do need to have side effects in a program on occasion. Like the previous example, you might need to write to a file. What you want to do is to centralize where you are doing this. Don't have several functions and classes that write to a particular file. Have one service that does it. One and only one.
The main point is to avoid common pitfalls like sharing state between objects without any structure, using mutable data types that can be written to by anything, or not centralizing where your side effects occur. If you can do this, you will be happier than the vast majority of other programmers.
Bad:
# type: ignore
# This is a module-level name.
# It's good practice to define these as immutable values, such as a string.
# However...
fullname = "Ryan McDermott"
def split_into_first_and_last_name() -> None:
# The use of the global keyword here is changing the meaning of the
# the following line. This function is now mutating the module-level
# state and introducing a side-effect!
global fullname
fullname = fullname.split()
split_into_first_and_last_name()
# MyPy will spot the problem, complaining about 'Incompatible types in
# assignment: (expression has type "List[str]", variable has type "str")'
print(fullname) # ["Ryan", "McDermott"]
# OK. It worked the first time, but what will happen if we call the
# function again?
Good:
from typing import List, AnyStr
def split_into_first_and_last_name(name: AnyStr) -> List[AnyStr]:
return name.split()
fullname = "Ryan McDermott"
name, surname = split_into_first_and_last_name(fullname)
print(name, surname) # => Ryan McDermott
Also Good :
from dataclasses import dataclass
@dataclass
class Person:
name: str
@property
def name_as_first_and_last(self) -> list:
return self.name.split()
# The reason why we create instances of classes is to manage state!
person = Person("Ryan McDermott")
print(person.name) # => "Ryan McDermott"
print(person.name_as_first_and_last) # => ["Ryan", "McDermott"]
As stated by Robert C. Martin:
A class should have one, and only one, reason to change.
These "reasons to change" are inherently the responsibilities handled by a class or function.
In the following example, we create an HTML element representing a comment with a version number.
Bad :
from importlib import metadata
class VersionCommentElement:
"""An element that renders an HTML comment with the program's version number
"""
def get_version(self) -> str:
"""Get the package version"""
return metadata.version("pip")
def render(self) -> None:
print(f'<!-- Version: {self.get_version()} -->')
VersionCommentElement().render()
This class has two responsibilities:
- Retrieving the version of a Python package.
- Rendering it into an HTML comment element.
Any changes to one of these responsibilities risk affecting the other. We can refactor the class to segregate the responsibilities.
Good :
from importlib import metadata
def get_version(pkg_name:str) -> str:
"""Retrieve the version of a given package"""
return metadata.version(pkg_name)
class VersionCommentElement:
"""An element that renders an HTML comment with the program's version number
"""
def __init__(self, version: str):
self.version = version
def render(self) -> None:
print(f'<!-- Version: {self.version} -->')
VersionCommentElement(get_version("pip")).render()
The result is that the class now only handles the rendering. It receives the version during instantiation, and the text is retrieved by a separate get_version()
function. Changing the class won't affect the other, and vice-versa, as long as the contract between them doesn't change (i.e., the function returns a string, and the class's __init__
method accepts a string).
Also, the get_version()
function is now reusable elsewhere.
“Entities should be open for extension, but closed for modification.” Uncle Bob.
Objects should be open for extension but closed for modification. It should be possible to enhance the functionality provided by an object (e.g., a class) without altering its internal contract. An object achieves this capability when it is written in a way that allows for easy extension.
In the next example, we attempt to implement a simple web framework that handles HTTP requests and returns a response. The View
class has a .get()
method that will be called when the HTTP server receives a GET request.
The View
class is intentionally simple and returns text/plain
responses. We also want HTML responses based on template files, so we subclass it with TemplateView
.
Bad:
from dataclasses import dataclass
@dataclass
class Response:
"""An HTTP response"""
status: int
content_type: str
body: str
class View:
"""A simple view that returns plain text responses"""
def get(self, request) -> Response:
"""Handle a GET request and return a message in the response"""
return Response(
status=200,
content_type='text/plain',
body="Welcome to my web site"
)
class TemplateView(View):
"""A view that returns HTML responses based on a template file."""
def get(self, request) -> Response:
"""Handle a GET request and return an HTML document in the response"""
with open("index.html") as fd:
return Response(
status=200,
content_type='text/html',
body=fd.read()
)
The TemplateView
class modifies the internal behavior of its parent to add more advanced functionality. By doing so, TemplateView
relies on its parent's .get()
implementation not changing, which must now be frozen in time. For example, we cannot introduce some additional checks across all classes derived from View
because this behavior has been overridden in at least one subclass, and we would need to update it too.
Let's refactor our classes to fix this issue and allow the View
class to be cleanly extended (not modified).
Good :
from dataclasses import dataclass
@dataclass
class Response:
"""An HTTP response"""
status: int
content_type: str
body: str
class View:
"""A simple view that returns plain text responses"""
content_type = "text/plain"
def render_body(self) -> str:
"""Render the message body of the response"""
return "Welcome to my web site"
def get(self, request) -> Response:
"""Handle a GET request and return a message in the response"""
return Response(
status=200,
content_type=self.content_type,
body=self.render_body()
)
class TemplateView(View):
"""A view that returns HTML responses based on a template file."""
content_type = "text/html"
template_file = "index.html"
def render_body(self) -> str:
"""Render the message body as HTML"""
with open(self.template_file) as fd:
return fd.read()
Note that we had to override render_body()
to change the source of the body, but this method has a single, well-defined responsibility that explicitly invites subclasses to override it. It's designed to allow extension by subclasses.
Another good way to harness the power of inheritance and object composition is to use Mixins.
Mixins are simple, root classes that are used exclusively with other related classes. They are "mixed" into the target class using multiple inheritance to alter the target's behavior.
A few rules:
- Mixins must inherit from
object
. - Mixins always come before the target class. For example:
class Foo(MixinA, MixinB, TargetClass): ...
Also good :
from dataclasses import dataclass, field
from typing import Protocol
@dataclass
class Response:
"""An HTTP response"""
status: int
content_type: str
body: str
headers: dict = field(default_factory=dict)
class View:
"""A simple view that returns plain text responses"""
content_type = "text/plain"
def render_body(self) -> str:
"""Render the message body of the response"""
return "Welcome to my web site"
def get(self, request) -> Response:
"""Handle a GET request and return a message in the response"""
return Response(
status=200,
content_type=self.content_type,
body=self.render_body()
)
class TemplateRenderMixin:
"""A mixin class for views that render HTML documents using a template file
Not to be used by itself!
"""
template_file: str = ""
def render_body(self) -> str:
"""Render the message body as HTML"""
if not self.template_file:
raise ValueError("The path to a template file must be given.")
with open(self.template_file) as fd:
return fd.read()
class ContentLengthMixin:
"""A mixin class for views that injects a Content-Length header in the
response
Not to be used by itself!
"""
def get(self, request) -> Response:
"""Introspect and amend the response to inject the new header"""
response = super().get(request) # type: ignore
response.headers['Content-Length'] = len(response.body)
return response
class TemplateView(TemplateRenderMixin, ContentLengthMixin, View):
"""A view that returns HTML responses based on a template file."""
content_type = "text/html"
template_file = "index.html"
As you can see, mixins facilitate object composition by packaging similar functionality into a reusable class with a single responsibility and allowing them to be segregated. The popular Django framework makes extensive use of mixins to build its class-based views.
“ Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it. “ Uncle Bob.
This principle is named after Barbara Liskov, who collaborated with computer scientist Jeannette Wing on the paper "A behavioral notion of subtyping (1994)". A central tenet of the paper is that "a subtype must preserve the behavior of its supertype's methods and also keep all of its supertype's invariant and history properties."
This means that a function accepting a supertype should also accept its subtypes without any modification.
Can you spot the problem in the following code?
Bad :
from dataclasses import dataclass
@dataclass
class Response:
"""An HTTP response"""
status: int
content_type: str
body: str
class View:
"""A simple view that returns plain text responses"""
content_type = "text/plain"
def render_body(self) -> str:
"""Render the message body of the response"""
return "Welcome to my web site"
def get(self, request) -> Response:
"""Handle a GET request and return a message in the response"""
return Response(
status=200,
content_type=self.content_type,
body=self.render_body()
)
class TemplateView(View):
"""A view that returns HTML responses based on a template file."""
content_type = "text/html"
def get(self, request, template_file: str) -> Response: # type: ignore
"""Render the message body as HTML"""
with open(template_file) as fd:
return Response(
status=200,
content_type=self.content_type,
body=fd.read()
)
def render(view: View, request) -> Response:
"""Render a View"""
return view.get(request)
The render()
function is expected to work well with View
and its subclass TemplateView
. However, the subclass breaks compatibility by changing the signature of the .get()
method.
The function will raise a TypeError
when used with TemplateView
.
If we want the render()
function to work with any subtype of View
, we must ensure that we do not break its public protocol. But how do we know what constitutes the protocol of a class? Type hinters like mypy will raise warnings like this if they spot such errors:
error: Signature of "get" incompatible with supertype "View"
<string>:36: note: Superclass:
<string>:36: note: def get(self, request: Any) -> Response
<string>:36: note: Subclass:
<string>:36: note: def get(self, request: Any, template_file: str) -> Response
“Keep interfaces small so that users don’t end up depending on things they don’t need.” Uncle Bob.
Several popular object-oriented programming languages like Go and Java have the concept of interfaces. Interfaces define the public methods and attributes of an object without implementing them. They are useful when we don't want to couple a function's signature to a specific object. Python does not have interfaces. It has Abstract Base Classes instead, which are slightly different but can serve the same purpose.
Good
from abc import ABCMeta, abstractmethod
# Define the Abstract Class for a generic Greeter object
class Greeter(metaclass=ABCMeta):
"""An object that can perform a greeting action."""
@staticmethod
@abstractmethod
def greet(name: str) -> None:
"""Display a greeting for the user with the given name"""
class FriendlyActor(Greeter):
"""An actor that greets the user with a friendly salutation"""
@staticmethod
def greet(name: str) -> None:
"""Greet a person by name"""
print(f"Hello {name}!")
def welcome_user(user_name: str, actor: Greeter):
"""Welcome a user with a given name using the provided actor"""
actor.greet(user_name)
welcome_user("Barbara", FriendlyActor())
Now consider this scenario: we have a number of PDF documents that we authored, and we want to serve them to visitors of our website. We use a Python web framework and might be tempted to design a class that handles these documents, so we design a comprehensive abstract base class for our document.
Bad
import abc
class Persistable(metaclass=abc.ABCMeta):
"""Serialize a file to data and back"""
@property
@abc.abstractmethod
def data(self) -> bytes:
"""The raw data of the file"""
@classmethod
@abc.abstractmethod
def load(cls, name: str):
"""Load the file from disk"""
@abc.abstractmethod
def save(self) -> None:
"""Save the file to disk"""
# We just want to serve the documents, so our concrete PDF document
# implementation just needs to implement the `.load()` method and have
# a public attribute named `data`.
class PDFDocument(Persistable):
"""A PDF document"""
@property
def data(self) -> bytes:
"""The raw bytes of the PDF document"""
... # Code goes here - omitted for brevity
@classmethod
def load(cls, name: str):
"""Load the file from the local filesystem"""
... # Code goes here - omitted for brevity
def view(request):
"""A web view that handles a GET request for a document"""
requested_name = request.qs['name'] # We want to validate this!
return PDFDocument.load(requested_name).data
But we can't! An error is raised if we don't implement the .save()
method:
Can't instantiate abstract class PDFDocument with abstract method save.
This is annoying. We don't really need to implement .save()
here. We could write a method that does nothing or raises NotImplementedError
, but that's just extra code we need to maintain.
Meanwhile, if we remove .save()
from the abstract class, we'll need to reimplement it later when we want users to be able to submit their own documents, putting us back at square one.
The problem is that we wrote an interface that includes functionality we don't currently need or use.
The solution is to break the interface into smaller pieces that handle separate concerns.
Good
import abc
class DataCarrier(metaclass=abc.ABCMeta):
"""Carries a data payload"""
@property
def data(self):
...
class Loadable(DataCarrier):
"""Can load data from storage by name"""
@classmethod
@abc.abstractmethod
def load(cls, name: str):
...
class Saveable(DataCarrier):
"""Can save data to storage"""
@abc.abstractmethod
def save(self) -> None:
...
class PDFDocument(Loadable):
"""A PDF document"""
@property
def data(self) -> bytes:
"""The raw bytes of the PDF document"""
... # Code goes here - omitted for brevity
@classmethod
def load(cls, name: str):
"""Load the file from the local filesystem"""
... # Code goes here - omitted for brevity
def view(request):
"""A web view that handles a GET request for a document"""
requested_name = request.qs['name'] # We want to validate this!
return PDFDocument.load(requested_name).data
“Depend on abstractions, not on concretions.”, Uncle Bob.
Imagine we wanted to create a web view that returns an HTTP response containing rows from a CSV file we generated. We want to use the CSV writer provided by Python's standard library.
Bad
import csv
from io import StringIO
class StreamingHttpResponse:
"""A streaming HTTP response"""
... # implementation code goes here
def some_view(request):
rows = (
['First row', 'Foo', 'Bar', 'Baz'],
['Second row', 'A', 'B', 'C', '"Testing"', "Here's a quote"]
)
# Define a generator to stream data directly to the client
def stream():
buffer_ = StringIO()
writer = csv.writer(buffer_, delimiter=';', quotechar='"')
for row in rows:
writer.writerow(row)
buffer_.seek(0)
data = buffer_.read()
buffer_.seek(0)
buffer_.truncate()
yield data
# Create the streaming response object with the appropriate CSV header.
response = StreamingHttpResponse(stream(), content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
return response
Our first implementation works around the CSV writer's interface by manipulating a StringIO
object (which is file-like) and performing several low-level operations to extract rows from the writer.
A better approach is to leverage the fact that the writer requires an object with a .write()
method to do its work.
Why not pass it a dummy object that immediately returns the newly generated row, so the StreamingHttpResponse
class can immediately stream it back to the client?
Good
import csv
class Echo:
"""An object that implements just the write method of the file-like
interface.
"""
def write(self, value):
"""Write the value by returning it, instead of storing in a buffer."""
return value
def some_streaming_csv_view(request):
"""A view that streams a large CSV file."""
rows = (
['First row', 'Foo', 'Bar', 'Baz'],
['Second row', 'A', 'B', 'C', '"Testing"', "Here's a quote"]
)
writer = csv.writer(Echo(), delimiter=';', quotechar='"')
return StreamingHttpResponse(
(writer.writerow(row) for row in rows),
content_type="text/csv",
headers={'Content-Disposition': 'attachment; filename="somefilename.csv"'},
)
Much better, and it works like magic! The reason why this implementation is better than the previous one should be obvious: less code and more efficiency to achieve the same result. We decided to leverage the fact that the writer class depends on the .write()
abstraction of the object it receives, without caring about the low-level details of how the method works.
This example is adapted from a submission to the Django documentation by this author.
Try to understand the DRY principle.
Do your best to avoid writing the same code over and over again. Repetition is bad because it means if you need to make a change, you have to make it in more than one place.
Imagine you run a restaurant and you keep track of your inventory: all your tomatoes, onions, garlic, spices, etc. If you have multiple lists that you keep this track in, then you have to update all of those lists whenever you serve a dish with tomatoes in it. If you only have one list, there's only one place to update!
Often you repeat yourself because you have two or more things that are slightly different, but share a lot in common, but their differences force you to have two or more separate functions that do much of the same things. Removing code duplication means creating an abstraction that can handle this set of different things with just one function/module/class.
Getting the abstraction right is critical, that's why you should follow the SOLID principles discussed in the Classes section. Bad abstractions can be worse than code duplication, so be careful! With that said, if you can make a good abstraction, do it! Don't repeat yourself, otherwise, you'll find yourself updating multiple places anytime you want to change one thing.
Bad:
from typing import List, Dict
from dataclasses import dataclass
@dataclass
class Developer:
def __init__(self, experience: float, github_link: str) -> None:
self._experience = experience
self._github_link = github_link
@property
def experience(self) -> float:
return self._experience
@property
def github_link(self) -> str:
return self._github_link
@dataclass
class Manager:
def __init__(self, experience: float, github_link: str) -> None:
self._experience = experience
self._github_link = github_link
@property
def experience(self) -> float:
return self._experience
@property
def github_link(self) -> str:
return self._github_link
def get_developer_list(developers: List[Developer]) -> List[Dict]:
developers_list = []
for developer in developers:
developers_list.append({
'experience' : developer.experience,
'github_link' : developer.github_link
})
return developers_list
def get_manager_list(managers: List[Manager]) -> List[Dict]:
managers_list = []
for manager in managers:
managers_list.append({
'experience' : manager.experience,
'github_link' : manager.github_link
})
return managers_list
## create list objects of developers
company_developers = [
Developer(experience=2.5, github_link='https://github.com/1'),
Developer(experience=1.5, github_link='https://github.com/2')
]
company_developers_list = get_developer_list(developers=company_developers)
## create list objects of managers
company_managers = [
Manager(experience=4.5, github_link='https://github.com/3'),
Manager(experience=5.7, github_link='https://github.com/4')
]
company_managers_list = get_manager_list(managers=company_managers)
Good:
from typing import List, Dict
from dataclasses import dataclass
@dataclass
class Employee:
def __init__(self, experience: float, github_link: str) -> None:
self._experience = experience
self._github_link = github_link
@property
def experience(self) -> float:
return self._experience
@property
def github_link(self) -> str:
return self._github_link
def get_employee_list(employees: List[Employee]) -> List[Dict]:
employees_list = []
for employee in employees:
employees_list.append({
'experience' : employee.experience,
'github_link' : employee.github_link
})
return employees_list
## create list objects of developers
company_developers = [
Employee(experience=2.5, github_link='https://github.com/1'),
Employee(experience=1.5, github_link='https://github.com/2')
]
company_developers_list = get_employee_list(employees=company_developers)
## create list objects of managers
company_managers = [
Employee(experience=4.5, github_link='https://github.com/3'),
Employee(experience=5.7, github_link='https://github.com/4')
]
company_managers_list = get_employee_list(employees=company_managers)