Why Code Is Hard to Read

Here's why understanding what computer source code is actually trying to accomplish at a high level is so challenging, even for experienced software developers.

Artists work with imagery and pictures. Singers work with music and songs. Writers and poets work with the arrangement of words.

Software developers work with the arrangement of computer code.

That’s what we software developers deal with in our day-to-day activities: writing code.

Actually, that’s not exactly true. In fact, it’s mostly NOT true, ironically.

Writing brand-new code, which we software developers often call “greenfield software development”, is actually quite rare.

Greenfield development is the scenario where a software programmer has the freedom to design and write software from scratch.

Most software developers pine for greenfield development opportunities.

Why?

Because greenfield software projects mean a software developer doesn’t have to worry about legacy problems.

Most of the time, a software developer doesn’t have the choice of working on brand new software projects. What usually happens is a software developer joins a team where the initial development work has already been done.

So the majority of a software developer’s day-to-day activities revolve around keeping an existing software application up and running, fixing existing bugs, or making additional enhancements.

I can personally count on one hand the number of greenfield opportunities I’ve had through my entire twenty-year software programming career.

So if we accept that most software developers will work on existing codebases, often called “brownfield” projects, it inevitably means we will need to learn how to work with source code that was written by other software developers.

Furthermore, it’s actually very likely those original developers are no longer around.

Since many software developers are responsible for code written by other developers, being able to comprehend the meaning of source code becomes extremely important.

What exactly do I mean when I say “comprehend the meaning of source code”?

Well, for any of you English majors out there, there were two primary things you did for classwork and assignments. One, you read lots and lots of literature. From cover to cover.

But that was only half of the equation. Not only did you have to read literature, you were expected to come up with commentary, conclusions, and theories about the literature you just read.

In other words, once you finished reading a piece of literature, you had to begin analyzing what you just read and attempt to make higher level conclusions and assertions about WHY the author wrote the literature in the particular way it was written down.

As strange as it may sound, this same skill is just as important, perhaps even more so, for a software developer.

Some of you may be confused by this. Why does reading computer source code require the same kinds of comprehension skills that English majors apply when reading literature?

It boils down to the very nature of computer programming languages.

Why Knowing How to Read Code Isn't Enough

It’s necessary to first get a clear understanding of what a computer programming language is and why we use them.

A computer’s native “tongue” doesn’t resemble human languages whatsoever.

The only things any computer on the planet today can natively understand are two numbers: 1 and 0.

It’s referred to as a “binary” language.

If I want a computer to understand my English greeting, “Hello world!”, I need to translate that phrase into the following binary language:

01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100 00100001
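To make that translation concrete, here is a minimal Python sketch that produces the same string of ones and zeros, assuming the greeting is encoded as plain ASCII with one 8-bit byte per character. The function names `to_binary` and `from_binary` are just illustrative names I made up for this example.

```python
def to_binary(text: str) -> str:
    """Return each character of text as an 8-bit binary group."""
    # Assumption: the text is plain ASCII, so every character fits in one byte.
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_binary(bits: str) -> str:
    """Reverse to_binary: turn space-separated 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = to_binary("Hello world!")
print(encoded)               # the twelve 8-bit groups shown above
print(from_binary(encoded))  # Hello world!
```

Twelve characters in, twelve groups of eight digits out. Even with a helper like this, nobody would want to read a whole program written that way.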

And believe it or not, the first-generation computer programmers had no choice but to read and write computer programs in binary language.

I don’t know about you, but I would have been ready for the rubber room if I had to write computer code in binary language for a living.

It quickly became apparent that computer programmers wanted to communicate with computers in languages that were much easier for humans to comprehend. We humans prefer communicating with each other through human languages, not in numbers or binary language.

The earliest programming languages, while still cryptic by today’s standards, were far easier to comprehend than native binary language.

And slowly but surely, computer programming language syntax evolved and improved by moving towards resembling human languages.

It is the job of the compiler to translate human-readable source code down to the native binary language that a computer can understand.

I can attest to the fact that modern day computer programming languages have vastly improved in terms of readability, functionality, and power, compared to the early first generation programming languages.

As a software developer, I’ve appreciated the design that goes into modern-day programming languages. They are very expressive yet very readable. The syntax has improved so much over the years that I’ll wager even non-programmers, if given some source code to examine, could at least make some educated guesses as to what the code is trying to do.

With that said, at the end of the day, computer source code still is not the equivalent of human languages.

As a software developer, I have the skillset and know-how to read computer source code.

But just knowing how to read source code isn’t enough to be a successful software developer.

When I’m reading other people’s source code, which often happens when I work on brownfield software projects, the code itself doesn’t tell me what the INTENT of the original programmer was when they wrote it.

I can’t speak to the experience of other software developers who inherit their own brownfield software projects, but at least for me, I often find myself in the situation where the original software programmer is no longer around to ask questions about how the original code works.

Even worse, there is usually no supporting documentation or wiki that explains at a high level what the source code is supposed to do, and how its general design and architecture work.

I find it absolutely crucial and necessary to understand WHY code was written in the particular way it was written and architected. I also need the 30,000-foot level view of how an application solves the particular business problem it was originally supposed to solve.

I’m actually going through this process right now with one of my current projects. I’ve been tasked with significantly refactoring and enhancing one of the core applications my team currently supports.

None of the original software programmers, architects or original business analysts/product owners are around to ask questions about the application.

Oftentimes, the only source of truth about an application is the source code itself. The source code ultimately serves as the only documentation about the program.

I’ve been slogging through the source code. I know how to READ the source code. But since I was not the original programmer, I’m not privy to any of the original design and architectural decisions that went into the implementation of the code.

So how do I go about reaching my goal of understanding the codebase?

How to Learn an Unfamiliar Codebase

The closest analogy I can think of is the way I used to solve jigsaw puzzles as a kid.

When I first start reading an unfamiliar codebase, it’s much like dumping a giant box of unsorted jigsaw puzzle pieces onto a tabletop. There is no rhyme or reason to any of the pieces. They’re all randomly scattered all over the tabletop.

I always started by identifying and sorting all the corner and edge jigsaw pieces of the overall jigsaw puzzle. Once I identified all the major corner and edge pieces of the puzzle, I would start finding all the internal jigsaw pieces that would fill in the internal region of the puzzle.

Much in the same way I solve a jigsaw puzzle, I’m identifying the “corner and edge” pieces of the codebase.

1. I sift through any code that deals with persisting data to a back-end database or repository.

2. Then I identify and sift through code that deals with any business logic and validation rules.

3. Then finally I go through any code that deals with the visual presentation layer of the application (assuming the application is customer facing).

4. Once I get a general rough idea of where each major layer of the application is located, I start mentally “zooming in” to the specific code files that belong to each of these major layers.

5. And I build a wiki to start capturing what I think the original overall architecture and design of the application was.

6. I start filling in other important details about what particular technologies, libraries, and frameworks are used in each of these layers of the application.

7. And most importantly, I start conversing with other developers and subject matter experts who I think can help me fill in any technical or business-related gaps in knowledge.
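The first three steps above can be roughed out with a small script. This is only a sketch under loud assumptions: the keyword hints in `LAYER_HINTS`, the function names, and the sample file paths are all hypothetical, and every codebase follows its own naming conventions. It guesses a layer from the words in each file path and checks the layers in the same order I do: persistence, then business logic, then presentation.

```python
import re
from collections import defaultdict

# Hypothetical naming hints. Adjust them to the conventions of the
# codebase you are actually reading before trusting the result.
LAYER_HINTS = {
    "persistence": ("repository", "dao", "migration", "db"),
    "business": ("service", "validator", "rule"),
    "presentation": ("view", "template", "controller", "component"),
}

# Splits a path into words, treating CamelCase and separators as boundaries.
WORD = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")


def guess_layer(path: str) -> str:
    """Guess which layer a file belongs to from the words in its path."""
    words = [word.lower() for word in WORD.findall(path)]
    for layer, hints in LAYER_HINTS.items():
        for word in words:
            # Accept the hint itself or its simple plural ("services").
            if word in hints or (word.endswith("s") and word[:-1] in hints):
                return layer
    return "unsorted"


def group_by_layer(paths: list[str]) -> dict[str, list[str]]:
    """Bucket file paths by their guessed layer."""
    grouped = defaultdict(list)
    for path in paths:
        grouped[guess_layer(path)].append(path)
    return dict(grouped)


# Made-up paths, purely for illustration.
sample = [
    "src/db/order_dao.py",
    "src/services/PricingService.java",
    "web/templates/invoice.html",
    "README.md",
]
for layer, files in group_by_layer(sample).items():
    print(layer, files)
```

A script like this only sorts the “corner and edge” pieces into rough piles. Anything that lands in the unsorted pile, and plenty that doesn’t, still has to be read by a human.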

Believe me, it can be very tedious and laborious. But there simply isn’t any “silver bullet” quick and easy way to learn an unfamiliar codebase. Even experienced software developers find it difficult to get up to speed with foreign code.

But hey, I had no illusions that I could instantly understand the dense and cryptic novels of William Faulkner either.

Then again, at least I had CliffsNotes for cryptic novels! Where are my CliffsNotes for cryptic code?

— cue chirping crickets —
