8 min read
Humans have only been grappling with the art and science of computer programming for roughly half a century. Compared to most arts and sciences, computer science is in many ways still just a toddler, walking into walls, tripping over its own feet, and occasionally throwing food across the table. As a consequence of its relative youth, I don’t believe we have a consensus yet on what a proper definition of “good code” is, as that definition continues to evolve. Some will say “good code” is code with 100% test coverage. Others will say it’s super fast and has a killer performance and will run acceptably on 10 year old hardware. While these are all laudable goals for software developers, however I venture to throw another target into the mix: maintainability. Specifically, “good code” is code that is easily and readily maintainable by an organization (not just by its author!) and will live for longer than just the sprint it was written in. The following are some things I’ve discovered in my career as an engineer at big companies and small, in the USA and abroad, that seem to correlate with maintainable, “good” software.
Commandment #1: Treat Your Code the Way You Want Other’s Code to Treat You
I’m far from the first person to write that the primary audience for your code is not the compiler/computer, but whomever next has to read, understand, maintain, and enhance the code (which will not necessarily be you 6 months from now). Any engineer worth their pay can produce code that “works”; what distinguishes a superb engineer is that they can write maintainable code efficiently that supports a business long term, and have the skill to solve problems simply and in a clear and maintainable way.
In any programming language, it is possible to write good code or bad code. Assuming we judge a programming language by how well it facilitates writing good code (it should at least be one of the top criteria, anyway), any programming language can be “good” or “bad” depending on how it is used (or abused).
An example of a language that by many is considered ‘clean’ and readable is Python. The language itself enforces some level of white space discipline and the built in APIs are plentiful and fairly consistent. That said, it’s possible to create unspeakable monsters. For example, one can define a class and define/redefine/undefine any and every method on that class during runtime (often referred to as monkey patching). This technique naturally leads to at best an inconsistent API and at worst an impossible to debug monster. One might naively think,”sure, but nobody does that!” Unfortunately they do, and it doesn’t take long browsing pypi before you run into substantial (and popular!) libraries that (ab)use monkey patching extensively as the core of their APIs. I recently used a networking library whose entire API changes depending on the network state of an object. Imagine, for example, calling
client.connect() and sometimes getting a
MethodDoesNotExist error instead of
Commandment #2: Good Code Is Easily Read and Understood, in Part and in Whole
Good code is easily read and understood, in part and in whole, by others (as well as by the author in the future, trying to avoid the “Did I really write that?” syndrome).
By “in part” I mean that, if I open up some module or function in the code, I should be able to understand what it does without having to also read the entire rest of the codebase. It should be as intuitive and self-documenting as possible.
Code that constantly references minute details that affect behavior from other (seemingly irrelevant) portions of the codebase is like reading a book where you have to reference the footnotes or an appendix at the end of every sentence. You’d never get through the first page!
Some other thoughts on “local” readability:
Names matter. Activate Thinking Fast and Slow’ssystem 2 way in which the brain forms thoughts and put some actual, careful thought into variable and method names. The few extra seconds can pay significant dividends. A well-named variable can make the code much more intuitive, whereas a poorly-named variable can lead to headfakes and confusion.
Cleverness is the enemy. When using fancy techniques, paradigms, or operations (such as list comprehensions or ternary operators), be careful to use them in a way that makes your code more readable, not just shorter.
Consistency is a good thing. Consistency in style, both in terms of how you place braces but also in terms of operations, improves readability greatly.
Separation of concerns. A given project manages an innumerable number of locally important assumptions at various points in the codebase. Expose each part of the codebase to as few of those concerns as possible. Say you had a people management system where a person object may sometimes have a null last name. To somebody writing code in a page that displays person objects, that could be really awkward! And unless you maintain a handbook of “Awkward and non obvious assumptions our codebase has” (I know I don’t) your display page programmer is not going to know last names can be null and is probably going to write code with a null pointer exception in it when the last name-being null case shows up. Instead handle these cases with well thought out APIs and contracts that different pieces of your codebase use to interact with each other.
Commandment #3: Good Code Has a Well Thought-out Layout and Architecture to Make Managing State Obvious
State is the enemy. Why? Because it is the single most complex part of any application and needs to be dealt with very deliberately and thoughtfully. Common problems include database inconsistencies, partial UI updates where new data isn’t reflected everywhere, out of order operations, or just mind numbingly complex code with if statements and branches everywhere leading to difficult to read and even harder to maintain code. Putting state on a pedestal to be treated with great care, and being extremely consistent and deliberate with regard to how state is accessed and modified, dramatically simplifies your codebase. Some languages (Haskell for example) enforce this at a programmatic and syntactic level. You’d be amazed how much the clarity of your codebase can improve if you have libraries of pure functions that access no external state, and then a small surface area of stateful code which references the outside pure functionality.
Commandment #4: Good Code Doesn’t Reinvent the Wheel, it Stands on the Shoulders of Giants
Before potentially reinventing a wheel, think about how common the problem is you’re trying to solve or the function is you’re trying to perform. Somebody may have already implemented a solution you can leverage. Take the time to think about and research any such options, if appropriate and available.
That said, a completely reasonable counter-argument is that dependencies don’t come for “free” without any downside. By using a 3rd party or open source library that adds some interesting functionality, you are making the commitment to, and becoming dependent upon, that library. That’s a big commitment; if it’s a giant library and you only need a small bit of functionality do you really want the burden of updating the whole library if you upgrade, for example, to Python 3.x? And moreover, if you encounter a bug or want to enhance the functionality, you’re either dependent on the author (or vendor) to supply the fix or enhancement, or, if it’s open source, find yourself in the position of exploring a (potentially substantial) codebase you’re completely unfamiliar with trying to fix or modify an obscure bit of functionality.
Certainly the more well used the code you’re dependent upon is, the less likely you’ll have to invest time yourself into maintenance. The bottom line is that it’s worthwhile for you to do your own research and make your own evaluation of whether or not to include outside technology and how much maintenance that particular technology will add to your stack.
Below are some of the more common examples of things you should probably not be reinventing in the modern age in your project (unless these ARE your projects).
Figure out which of CAP you need for your project, then chose the database with the right properties. Database doesn’t just mean MySQL anymore, you can chose from:
- “Traditional” Schema’ed SQL: Postgres / MySQL / MariaDB / MemSQL / Amazon RDS, etc.
- Key Value Stores: Redis / Memcache / Riak
- NoSQL: MongoDB/Cassandra
- Hosted DBs: AWS RDS / DynamoDB / AppEngine Datastore
- Heavy lifting: Amazon MR / Hadoop (Hive/Pig) / Cloudera / Google Big Query
- Crazy stuff: Erlang’s Mnesia, iOS’s Core Data
Data Abstraction Layers
You should, in most circumstances, not be writing raw queries to whatever database you happen to chose to use. More likely than not, there exists a library to sit in between the DB and your application code, separating the concerns of managing concurrent database sessions and details of the schema from your main code. At the very least, you should never have raw queries or SQL inline in the middle of your application code. Rather, wrap it in a function and centralize all the functions in a file called something really obvious (e.g., “queries.py”). A line like
users = load_users() , for example, is infinitely easier to read than
users = db.query(“SELECT username, foo, bar from users LIMIT 10 ORDER BY ID”). This type of centralization also makes it much easier to have consistent style in your queries, and limits the number of places to go to change the queries should the schema change.
Other Common Libraries and Tools to Consider Leveraging
- Queuing or Pub/Sub Services. Take your pick of AMQP providers, ZeroMQ, RabbitMQ, Amazon SQS
- Storage. Amazon S3, Google Cloud Storage
- Monitoring: Graphite/Hosted Graphite, AWS Cloud Watch, New Relic
- Log Collection / Aggregation. Loggly, Splunk
- Auto Scaling. Heroku, AWS Beanstalk, AppEngine, AWS Opsworks, Digital Ocean
Commandment #5: Don’t Cross the Streams!
There are many good models for programming design, pub/sub, actors, MVC etc. Choose whichever you like best, and stick to it. Different kinds of logic dealing with different kinds of data should be physically isolated in the codebase (again, this separation of concerns concept and reducing cognitive load on the future-reader). The code which updates your UI should be physically distinct from the code that calculates what goes into the UI, for example.
Commandment #6: When Possible, Let the Computer Do the Work
If the compiler can catch logical errors in your code and prevent either bad behavior, bugs, or outright crashes, we absolutely should take advantage of that. Of course, some languages have compilers that make this easier than others. Haskell, for example, has a famously strict compiler that results in programmers spending most of their effort just getting code to compile. Once it compiles though, “it just works”. For those of you who’ve either never written in a strongly typed functional language this may seem ridiculous or impossible, but don’t take my word for it. Seriously, click on some of these links, it’s absolutely possible to live in a world without runtime errors. And it really is that magical.
Admittedly, not every language has a compiler or a syntax that lends itself to much (or in some cases any!) compile-time checking. For those that don’t, take a few minutes to research what optional strictness checks you can enable in your project and evaluate if they make sense for you. A short, non-comprehensive list of some common ones I’ve used lately for languages with lenient runtimes include:
This is by no means an exhaustive or the perfect list of commandments for producing “good” (i.e., easily maintainable) code. That said, if every codebase I ever had to pick up in the future followed even half of the concepts in this list, I will have many fewer gray hairs and might even be able to add an extra 5 years on the end of my life. And I’ll certainly find work more enjoyable and less stressful.