Branch: Tech debt

Poor quality notes, by Mike

These are some notes I took during our discussion; at the start I did try to attribute names to quotes, but as it went on a) people interjected more and it was harder to give direct attribution and b) I got a few people's names confused early on and that messed the rest up!

We began with the question of "define technical debt" and in fact this was the only question Dave really got to ask, as our discussion progressed naturally from trying to define it to trying to identify causes.

A few early points were made:

Measured decision, understand the interest rate. "Different kinds of debt need different kinds of remedies"
"Can crash the business if it's let to go long enough"
"Articulate business benefit of a need for a rebuild" - it allows more features, but had to block out time

During one person's anecdote (I didn't get to note it because I was busy listening) someone asked the very poignant question "what went wrong?"

"Just an unfortunate decision to use this framework"
"Forever making decisions about stuff we don't have enough information about"
Get the concept [of tech debt] across to people who are drawn to want to use new shiny things
"All the stories of tech debt I see are about bad design leading to coupling" that can increase the 'blast radius' of problems/bugs that occur in the system

The concept of emergent design came up - as somewhat of a critique, where we looked at the fact this had become popular and potentially led to problems with systems that were designed as they were being built. The concept of "data truth", how designed data maps to the business case, was raised as a problem of utilising emergent design:

"Before starting, design the data model"

We then looked at ways we can recover from this kind of failure - is a project "doomed" once we get the initial design wrong, fated for a costly rebuild? The general view was that this is a possibility but not a certainty, and the sooner this is addressed, the better. So we got onto code refactoring:

"Do code refactoring as a pattern of modifying types rather than aesthetics - define the type of coupling. E.g. [a common flaw of] microservices relying on internal private IDs of entities in other microservices"

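As a rough illustration of "modifying types" to remove that particular coupling (my own sketch, not something shown in the session), one option is to give the private database key and the externally shared reference distinct types, so only the shared one can cross the service boundary. All names here (`InternalOrderId`, `PublicOrderRef`, `Order`, `to_api_payload`) are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import NewType

# Hypothetical types: the internal key never leaves the service,
# the public reference is the only identifier other services see.
InternalOrderId = NewType("InternalOrderId", int)
PublicOrderRef = NewType("PublicOrderRef", str)


@dataclass(frozen=True)
class Order:
    internal_id: InternalOrderId  # e.g. a database primary key
    public_ref: PublicOrderRef    # opaque value, safe to share
    total_pence: int


def new_order(internal_id: int, total_pence: int) -> Order:
    """Create an order with a freshly generated opaque public reference."""
    return Order(
        internal_id=InternalOrderId(internal_id),
        public_ref=PublicOrderRef(uuid.uuid4().hex),
        total_pence=total_pence,
    )


def to_api_payload(order: Order) -> dict:
    """Build what other services receive; the internal ID is deliberately left out."""
    return {"ref": order.public_ref, "total_pence": order.total_pence}
```

With this shape, another service can only ever store `ref`, so the owning service stays free to renumber or migrate its internal keys.
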
We returned later to the idea of coupling in code (but I include it here to make a better read), because knowing how to decouple tends to be key if we do want to refactor and reduce the aforementioned 'blast radius'. One resource mentioned was a git repository tool that tracks how often files change together - i.e. if two files always change in the same commit/Pull Request, then it's likely they are closely related, even if neither file mentions the other one explicitly.
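
To make the "files that change together" idea concrete, here is a minimal sketch of how such a count could be computed. It is not the tool that was mentioned, just an illustration. It assumes you already have one list of changed file paths per commit (for example parsed from `git log --name-only`), and the function names and the coupling measure (shared commits divided by the commits of the less frequently changed file) are my own assumptions.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files appears in the same commit."""
    pairs = Counter()
    for files in commits:
        # sorted(set(...)) gives each pair a stable order and ignores duplicates
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


def coupling(commits, min_shared=2):
    """Return (file_a, file_b, shared_commits, degree), strongest coupling first."""
    changes = Counter(f for files in commits for f in set(files))
    result = []
    for (a, b), shared in co_change_counts(commits).items():
        if shared >= min_shared:
            # Assumed measure: share of the less-changed file's commits
            # that also touched the other file.
            degree = shared / min(changes[a], changes[b])
            result.append((a, b, shared, degree))
    return sorted(result, key=lambda r: (-r[3], -r[2], r[0], r[1]))


if __name__ == "__main__":
    # Hypothetical history: each inner list is the files touched by one commit.
    history = [
        ["billing.py", "invoice_template.html"],
        ["billing.py", "invoice_template.html", "README.md"],
        ["README.md"],
        ["billing.py"],
    ]
    for row in coupling(history):
        print(row)
```

Pairs that score highly without importing or referencing each other are the interesting ones, since that is coupling the code itself does not reveal.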

There was also a recommendation at this point for a book, Your Code as a Crime Scene by Adam Tornhill (link at end)

We then revisited the comments from earlier about choosing tools, frameworks or patterns by following "what's shiny": what's new, what's being talked about, what less experienced members of the team are keen to play with, rather than making deliberate choices about what the best tools to use are (and there was a general feeling that tried and tested tools tend to win, all else being equal):

"People don't stick around long enough - [this means] bad decisions don't affect them"
"Tech change gets exacerbated by PR and changes of ecosystems" - meaning that advertising or promotion of a given tool/stack is often seen as more authoritative than the existing knowledge base of the team
"[I] worked with a company that made use of more coding languages than they had developers"
"There's not enough communication between teams"

At this point we started to try and figure out how to explain tech debt (even though we still had a range of quite different definitions) to non-technical management. The idea being that even if we have many concepts under one umbrella, we still need to be able to advocate for technical change that's not just "build a new feature", and whose impact is therefore less clear when viewed by non-technical management. How do we make a case that this is important for business priorities, namely that it can positively affect the bottom line of the business:

"Risk is a nebulous concept - but we can ask how many delays have we had?" (and subsequently, what's the cost of that delay)
"It's possible for tech debt to increase due to pure inactivity" (this was a really valuable point - in this lens, it's more like going overdrawn because you don't manage your bank account well, rather than deliberately taking out a loan)
"Are decisions made out of fear, or opportunity?" and we also discussed who gets to make these decisions - are technical people excluded or included in the final choice
For larger businesses it was also mentioned that multiple units can end up building the same thing - depending on scale there may be good reasons for this, but often this is accidental and due to lack of communication, leading to teams duplicating effort and eventually colliding with each other as they try and branch their tools out across the organisation (this requires a certain company size to really take effect, but isolated team members in smaller teams may also encounter this)

We eventually tried to get out some actual definitions - I am biased as to the formatting because I proposed three options, but whilst we then debated each, it does seem like there's agreement that all three are problematic, even if different people do or don't view certain parts of this in the "tech debt" calculus (I've numbered the list but there's no specific order):

1. Debt incurred as a known "shortcut" to release a feature or bug fix sooner than not using said shortcut. This was then referred to as "negligent debt", i.e. it's caused by active negligence, rather than caused by other external factors (post-meeting observation: our discussions touched on this the least by far, but it's probably the first thing that would come to mind for most developers when asked about tech debt)
2. Debt incurred due to decisions made with inadequate data or for wrong reasons. Also caused as the business evolves and moves away from initial assumptions. (I think this comes out as one of the hardest to solve but potentially also the most impactful to get right, and it has the nice benefit of involving the whole business, rather than just being something "engineering has to do")
3. Debt incurred as a result of inertia or inactivity. We also labelled this business debt, because there's been a choice to pursue other things instead of maintenance. (This could be similar to not keeping track of accounts receivable - you have a good amount of business but if you don't collect, your cashflow can suffer badly enough to derail even a profitable business)

After going through this we still had to get a definition of "debt", as that appears in all three definitions. We arrived at: any technical aspect that causes drag, i.e. it has an effect on all future work, resulting in slower work, and ultimately a direct cost to the business from indirect factors.

"Find the north star and share with the team"

M1ke/tech-debt.md
