2017-11-25

Taking On Technical Debt

Technical debt is a contentious topic. Building product quickly is at first a very desirable goal, but oftentimes that means taking on technical debt which can slow down a team’s ability to build product in the future. Figuring out the right balance between building quickly in the short term vs building quickly in the long term is quite challenging.

The basic principle is a simple ROI calculation. With financial debt, you take it when you believe the money you can make with that debt exceeds the cost you pay in interest. Let’s say I could make $5000 by borrowing $1000 and paying $200 in interest over a year. That’s a great return! I’ll take that opportunity every time I can.
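As a rough sketch of that arithmetic, here is the same decision in a few lines of Python. The function name is made up for illustration, and the numbers are the ones from the example above.

```python
def worth_borrowing(expected_gain: float, interest_cost: float) -> bool:
    """Take the debt only when the expected gain exceeds the interest paid."""
    return expected_gain > interest_cost


# Numbers from the example: make $5000, pay $200 in interest over a year.
expected_gain = 5000
interest_cost = 200

decision = worth_borrowing(expected_gain, interest_cost)
net_gain = expected_gain - interest_cost  # what is left after paying interest
```

With financial debt both inputs are known up front, which is what makes the call easy.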

Unfortunately, technical debt does not come with clear-cut terms. You don’t know how much interest you’ll be paying and you don’t even know what the principal of the debt actually is. Things get more complicated because technical debt can increase the risk of software bugs. Some bugs are relatively harmless. Some bugs can end companies. There’s no way to accurately calculate the risk in these situations.

For this reason, I don’t think anyone can provide a formula that lets you know when you should or should not take on technical debt. It is almost always going to be a judgement call with some level of uncertainty. What we can do is explain how we went about making these calls. I will do my best to explain my reasoning for the technical debt decisions I made on my current project.

Some important context: I’m still trying to find product/market fit so the code is likely to change dramatically because the product is going to change dramatically. It makes no sense to optimize my code for development speed 6 months from now when in reality a lot of the code may not even be around 3 months from now.

On the other hand, finding product/market fit isn’t something that is often achieved in a weekend hackathon. It’s an effort that can take months. There is some level of optimization I can do in order to make sure I’m not throwing away all my code every time the product changes.

So here is the catch-22. Ideally I split up my code in ways that maximize the pieces that can be reused, but I don’t know what pieces will be reused because I don’t know how the product could change yet. It’d be nice if my first build went in the right direction, but the whole point of finding product/market fit is understanding that it takes time to find the right direction for a product.

There’s a classic software design principle where you make sure pieces of the system are separated in logical chunks. An example is that in an e-commerce site, the code for the recommendation engine and the code for the checkout process are completely separate. Even if recommendations are shown on a checkout page, the backend systems are isolated into reusable “services” that the checkout page uses. This allows the results from the recommendation engine to be copied over to any number of pages with a minimal amount of code being written.

Some parts of Maleega are a little less clear-cut. A recommendation engine and a checkout page are very different concepts. I’m building an email product and I have decided to reuse the threading concept made popular by many other products. While I can’t conceive of diverging from that path at this point, I can’t rule it out either. That means I need to keep the concepts of emails and threads separate in my system even though they are very closely related. I can use another piece of code to combine these two concepts since they are related in this version of the product. If the product changes, it is this piece of code that needs to be rewritten. The code for emails and threads can stay the same, ready to be reused or thrown out.
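To make that concrete, here is a minimal sketch of the shape I am describing. It is not Maleega’s actual code. Every class and method name is hypothetical, and the in-memory dictionaries stand in for whatever storage sits underneath.

```python
from dataclasses import dataclass


@dataclass
class Email:
    id: int
    subject: str


@dataclass
class Thread:
    id: int
    title: str


class EmailStore:
    """Knows about emails and nothing else."""

    def __init__(self):
        self._emails = {}

    def add(self, email):
        self._emails[email.id] = email

    def get(self, email_id):
        return self._emails[email_id]


class ThreadStore:
    """Knows about threads and nothing else."""

    def __init__(self):
        self._threads = {}

    def add(self, thread):
        self._threads[thread.id] = thread

    def get(self, thread_id):
        return self._threads[thread_id]


class ThreadedConversations:
    """The one piece that knows emails are grouped into threads.

    If the product drops threading, this class gets rewritten and
    the two stores above stay as they are.
    """

    def __init__(self, emails, threads):
        self._emails = emails
        self._threads = threads
        self._members = {}  # thread id -> list of email ids

    def attach(self, thread_id, email_id):
        self._members.setdefault(thread_id, []).append(email_id)

    def view(self, thread_id):
        thread = self._threads.get(thread_id)
        emails = [self._emails.get(i) for i in self._members.get(thread_id, [])]
        return thread, emails
```

A usage example would add a couple of emails and a thread to their stores, call `attach` for each pairing, and read everything back through `view`.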

This is much easier said than done. The challenge isn’t technical. Writing code to do this is easy. The challenge is that programmers are trained to worry about efficiency and doing this goes against that training.

An example of this is the use of JOINs in MySQL. There’s a lot of efficiency to be gained by using one database query to retrieve a thread and a number of emails in that thread. Think of it as sending a series of gifts to a friend. Sending one large package saves the time spent packing each package and tracking each package’s delivery. Sending lots of little packages is more costly.

The cost of doing everything in one large database query, though, is that it breaks this separation. Using the package analogy, you probably wouldn’t send jewelry in the same package as a bunch of books. But sometimes the efficiency gains are too tempting to ignore. One exception is made.

Then another exception. Then another. Sooner or later you can’t tell that the code was separated at one point. Everything has merged into some giant blob. The code may be “organized” into different files, but it functions as a blob nonetheless.

Keeping code cleanly separated means accepting some inefficiency. Fortunately this is usually ok. Early on, the inefficiency is barely noticeable. Databases can handle hundreds of queries per second. Going from 1 query to 2-5 queries is irrelevant when there are only a few users in the system. I will live without MySQL JOINs.
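Here is a small sketch of what living without the JOIN can look like. To keep it runnable anywhere, it uses Python’s built-in SQLite module in place of MySQL. The table layout and function names are assumptions for illustration, not my real schema.

```python
import sqlite3


def get_thread(conn, thread_id):
    """Fetch a thread on its own. This query never touches the emails table."""
    row = conn.execute(
        "SELECT id, title FROM threads WHERE id = ?", (thread_id,)
    ).fetchone()
    return None if row is None else {"id": row[0], "title": row[1]}


def get_emails(conn, thread_id):
    """Fetch emails on their own. This query never touches the threads table."""
    rows = conn.execute(
        "SELECT id, subject FROM emails WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    ).fetchall()
    return [{"id": r[0], "subject": r[1]} for r in rows]


def load_thread_view(conn, thread_id):
    """Combining step: two small queries instead of one JOIN."""
    return {
        "thread": get_thread(conn, thread_id),
        "emails": get_emails(conn, thread_id),
    }


# Tiny in-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE emails (id INTEGER PRIMARY KEY, thread_id INTEGER, subject TEXT)"
)
conn.execute("INSERT INTO threads VALUES (1, 'Lunch')")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?)",
    [(1, 1, "Lunch?"), (2, 1, "Re: Lunch?")],
)

view = load_thread_view(conn, 1)
```

The extra round trip is the inefficiency being accepted. In exchange, either lookup can later be pointed at a different store without the other one noticing.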

If you are lucky enough to have so many users that this becomes a problem, you will also have to worry about having all your users on a single database anyway. By maintaining this separation in the code, each isolated chunk can be replaced. Maybe it gets turned into a microservice. Maybe it gets converted to a different database. Maybe it gets replaced with a third party. The development cost is isolated to that chunk. The whole system doesn’t need to be rewritten.

Now if you’re thinking “This doesn’t sound like taking on technical debt at all!”, you’re half right. The part where technical debt is not being taken is the design of the system at a high level. While the chunks may be rearranged when the product changes, most of the chunks should still be reusable.

The part where technical debt can be taken is inside these chunks.

For example, I wanted to get an alpha release for Maleega out as soon as I could over the summer. Getting an alpha out in 4-6 weeks means cutting a lot of corners. One was on data storage. MySQL databases aren’t suited for storing email content. MySQL handles metadata really well, but not big blobs of text. There is technically a way to do it, but it doesn’t scale to millions of emails very nicely. Alpha releases don’t need to worry about scale though, and I wanted to get one out as soon as possible.

One way I could have built the alpha release was to put the email content in a column alongside the metadata. This ties the two pieces of data together though. All my code would be written assuming that this data is tied together. This would make swapping out the data store for the content much more complicated, especially if JOINs are used with other concepts like threads.

The alternative was a separate table to store the content, with my email table referencing it by id. This adds what seems like unnecessary overhead, but it also only took about 10 minutes of coding. It let me put off hours of research, planning, and coding for a scalable datastore. And by giving the new table a stupid name like “temporary_data_store”, I had a constant reminder that I should consider replacing it (which I did while preparing for the beta release). I allowed myself to put off hours of work for a few months by spending 10 minutes. Not a bad cost of technical debt.
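A minimal sketch of that shortcut follows, again using Python’s built-in SQLite module as a stand-in for MySQL. Only the table name “temporary_data_store” comes from my project. The columns and function names are assumptions for illustration.

```python
import sqlite3


def create_schema(conn):
    # Content lives in its own table; emails only hold a reference to it.
    conn.execute(
        "CREATE TABLE temporary_data_store ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE emails ("
        "id INTEGER PRIMARY KEY, subject TEXT NOT NULL, content_id INTEGER NOT NULL)"
    )


def save_content(conn, content):
    """The only function that writes to the temporary store."""
    cur = conn.execute(
        "INSERT INTO temporary_data_store (content) VALUES (?)", (content,)
    )
    return cur.lastrowid


def load_content(conn, content_id):
    """The only function that reads from the temporary store."""
    row = conn.execute(
        "SELECT content FROM temporary_data_store WHERE id = ?", (content_id,)
    ).fetchone()
    return row[0]


def store_email(conn, subject, body):
    content_id = save_content(conn, body)
    cur = conn.execute(
        "INSERT INTO emails (subject, content_id) VALUES (?, ?)",
        (subject, content_id),
    )
    return cur.lastrowid


def read_email_body(conn, email_id):
    row = conn.execute(
        "SELECT content_id FROM emails WHERE id = ?", (email_id,)
    ).fetchone()
    return load_content(conn, row[0])
```

Swapping in a real datastore later means rewriting `save_content` and `load_content`. Nothing that works with the emails table has to change.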

Another example is my sanitization code. All of that is separated in an isolated section of my system. Good thing too, because it is EXTREMELY inefficient. So much so that I would bet I would fail a job interview if I used it as my solution to an interview question. Once again though, the inefficiency is barely noticeable to a user. Web pages are expected to load in less than 100 milliseconds, but even Gmail takes a few hundred milliseconds to send an email. Adding 100 ms of processing for sanitization is not a huge deal.

Could it be 10 ms? Probably. Do I care right now? Not at all. By isolating the code I give myself the option to rebuild it at a time when it will matter. The cost will simply be the time I would have spent to build it properly in the first place. The wasted time would have been the time spent building a crappy solution. I spent a couple hours to allow me to put off many more hours of work for months (maybe even years).
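The isolation itself can be as small as one function that everything else calls. The sketch below is not my real sanitizer. The function names are hypothetical, and the body simply escapes all markup with Python’s standard `html.escape` as a placeholder.

```python
import html


def sanitize(raw: str) -> str:
    """Single entry point for sanitization.

    Placeholder implementation: escape everything. A slower or smarter
    version can replace this body without touching any caller.
    """
    return html.escape(raw)


def render_email_body(raw: str) -> str:
    # Callers depend only on sanitize(), never on how it works inside.
    return '<div class="email-body">' + sanitize(raw) + "</div>"
```

As long as callers only go through `sanitize`, rebuilding it later costs the time to write the better version and nothing more.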

It seems silly, but many times this decision is not made. All the code gets mixed together and built on top of for years. The cost of replacement isn’t merely the time it would have taken to rewrite a single piece of the system (in this case sanitization). The cost of replacement is rewriting years of code to extract the sanitization code and making sure nothing else breaks when it is replaced. More often than not, this isn’t a conscious decision. In fact, it happens because no decision was ever considered.

Technical debt is often unavoidable, but sometimes it is desirable to make conscious decisions to take on more of it. The key is making sure the benefits of technical debt outweigh the cost. That means making sure the cost of cleaning it up is either fixed or grows very, very slowly.

The most effective way to do that in my opinion is to spend time thinking things through at a high level. Thinking is a lot faster than writing code and making sure it works. By avoiding shortcuts in this thought process and making good decisions, one is actually capable of taking on more technical debt without severely impacting future development speed. That debt just happens to be planned rather than taken accidentally.


Professor Beekums Blog
