Since the beginning of my professional career, I have consistently used the same commenting style in my code, and it turns out to be quite distinctive. I can trace that style back to Alsys SA. There is a really good reason I still follow it to this day…

Rule 0: Base rules on actual experiments

Alsys SA was a company that provided the best Ada compilers of their day, and the only ones that were themselves written in Ada.

The company had a rather extensive coding style, which I have trimmed down quite a bit in my own projects. But as far as comments are concerned, my style today remains inspired by the Alsys coding guidelines, and for a good reason: it’s the only time I was ever presented with a coding style that was backed by actual scientific evidence.

The way I remember it, someone had actually studied how people read code using eye-tracking cameras. Alsys derived key coding style guidelines from that study. The rules were intended to minimize eye movement, and make it easy for the eye to quickly locate important pieces of the code. In short, the coding standard was designed to make reading easier and faster based on actual scientific data. And that matters, because code is read much more often than it is written.

I will be trying to reconstruct what I learned then based on memory and personal experience. When I quote something as being from Alsys, I may be wrong: it’s been 27 years or so since I was taught these tricks, and the boundary between what I inherited and what I chose to do myself is a bit hazy… Also, I do not have access to eye-tracking software or to the original study, so my explanations below are nothing more than a reconstruction based on personal experience reading code.

Rule 1: Visual “boxes” to scan the code

The first observation was that the eye initially scans for large-scale structures. While most programming languages have “structures”, they leave very little visible evidence on screen for the eye to look for. So it is a good idea to supplement them with visual cues that indicate where the high-level structures are.

Below is a reduced-resolution capture of source code, with a program using the “boxes” approach on the left and a program not using it on the right.

Use boxes to delineate the structure of your code

The boxes on the left make it easier to scan the code quickly for high-level structures, because they provide highly visible cues. As a result, eye movement becomes much easier and faster. It may look something like this:

The image above is not actual eye-tracking data, unfortunately, only a sketch of how I feel my eyes move while I’m looking at both code examples. Notice how the visual cues provided by the boxes make it easier for the eye to follow the code organisation, and avoid back-and-forth exploratory movements.

The code structure highlighted by the boxes does not need to be fixed at all, and indeed should be adapted to each specific project. In particular, such boxes can delineate organisational elements that the source code itself does not provide. For example, you may group the types together, the helper macros together, etc. Then someone scanning the code for a data type can quickly zoom in on the “types” section.

There are a number of arguments against comments in code. These arguments do not, in general, apply to boxes. A high-level structure comment like “Types” never gets outdated and never turns into a bad code smell.

Distinct visual appearance for distinct structure levels

It is easy and necessary to create multiple visual appearances for boxes, for example by using different surrounding characters giving the box its texture, or changing the number of source code lines the box uses.

Distinctive box sizes and appearances can be used to highlight different levels of the structure of your source code. Think of them as the code equivalents of “Heading 1”, “Heading 2” and “Heading 3” in a word processor. Just as you want Heading 1 to stand out visually from Heading 2, and Heading 2 to stand out visually from Heading 3, in your code you want higher-level comments to have a visually distinctive shape.

Boxes in my own code

In my own code, I typically use three levels of boxes: headers, organisation and program structures. I use specific Emacs macros to automatically generate each type of box.
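The Emacs macros themselves are not shown here. As a rough stand-in, the following C sketch generates the two single-line box styles described in the paragraphs below. The names `make_box` and `box_level` and the `width` parameter are hypothetical, chosen only for this illustration; they are not the author's actual tooling, and the 20-line header box is deliberately left out.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical box levels. The header box is not generated here
// because its layout has several sub-boxes.
enum box_level { BOX_ORGANISATION, BOX_STRUCTURE };

// Write a comment box around a one-line description into out (at most
// size bytes; snprintf null-terminates whenever size > 0). width is the
// number of fill characters in each rule. Returns what snprintf returns.
int make_box(char *out, size_t size, enum box_level level,
             const char *text, int width)
{
    char fill = (level == BOX_ORGANISATION) ? '=' : '-';
    char rule[128];

    // Keep the rule within the local buffer
    if (width < 1)
        width = 1;
    if (width > (int) sizeof(rule) - 1)
        width = (int) sizeof(rule) - 1;
    memset(rule, fill, (size_t) width);
    rule[width] = '\0';

    // Organisation box: 5 lines, only the middle one carries text
    if (level == BOX_ORGANISATION)
        return snprintf(out, size, "// %s\n//\n//    %s\n//\n// %s\n",
                        rule, text, rule);

    // Program structure box: 3 lines, only the middle one carries text
    return snprintf(out, size, "// %s\n//    %s\n// %s\n", rule, text, rule);
}
```

Printing the result of `make_box(buf, sizeof buf, BOX_STRUCTURE, "Compute the length of a string", 60)` with `fputs` gives a three-line box of the same shape as the one in the `strlen` example later in this article.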

Headers describe a source file in detail. They are large-ish boxes (about 20 lines), using the asterisk (*) as a delimiting character. Their layout nowadays includes three sub-boxes, although this layout has evolved quite a bit over time and is therefore not very consistent across my various source files:

  1. The top sub-box contains the file name and project name.
  2. The largest sub-box is a high-level description of what the file is about.
  3. The bottom sub-box contains copyright, license and revision information.
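A minimal sketch of such a header, assuming a hypothetical C file named `clamp.c` in a hypothetical “Example project”. The placeholders in the bottom sub-box and the exact widths are illustrative, and the file's single function uses the program structure box described further down:

```c
// **********************************************************************
//  clamp.c                                               Example project
// **********************************************************************
//
//   File description:
//
//     Small helpers that keep integer values inside a given range.
//     All functions are pure and depend only on their arguments.
//     This sub-box says what the file is about; how it works, if that
//     needs explaining, goes in a separate comment below the header.
//
// **********************************************************************
//  (C) <year> <author>
//  License: <license of your choice>
//  Revision: <version control revision>
// **********************************************************************

int clamp(int value, int low, int high)
// ----------------------------------------------------------------------
//    Return value limited to the range [low, high]
// ----------------------------------------------------------------------
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}
```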

The header may optionally be followed by a detailed comment explaining how the file operates. This is only useful if such explanation is not easily derived from what the file contains. Below is an example from the ELFE / XL parser, which has such a long-winded comment because XL parsing is quite unusual compared to the other programming languages. Notice the difference between the what section and the how section.

Header

 

Organisation comment boxes are 5 lines high and use an equal sign (=) as delimiting character. They are used to identify high-level sections in the source code, e.g. to group items of the same kind, or things that belong to the same functional group. Out of the 5 lines, only one contains text, and this one-liner describes what the section is about. It may look something like this:

Category Comment
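Two organisation boxes in a hypothetical C file, one grouping the types and one grouping the helper macros, might look like this; `point` and `ARRAY_SIZE` are names made up for the example:

```c
// ======================================================================
//
//    Types
//
// ======================================================================

typedef struct
{
    int x;
    int y;
} point;



// ======================================================================
//
//    Helper macros
//
// ======================================================================

// Number of elements in a true array (not a pointer)
#define ARRAY_SIZE(a)   (sizeof(a) / sizeof((a)[0]))
```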

If detailed comments are required on how the code operates, they go below the box, just as they do for the header box.

Program structure comment boxes are 3 lines high and use a minus sign (-) as delimiting character. They are used to identify regular top-level program structures, such as functions or classes. Out of the three lines, only one contains text, and as for organisation boxes, this one-liner describes what the section is about. It may look something like this:

Program Structures

As usual, if detailed comments about the program structure are necessary, such as information on how the structure operates, they should go below the box.
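A sketch of program structure boxes applied to a C structure and to a function, with a detailed “how” comment placed below the structure's box. `struct counter` and `counter_add` are hypothetical, and placing the box between `struct counter` and its opening brace assumes the same interface/implementation split used for functions:

```c
struct counter
// ----------------------------------------------------------------------
//    A counter that remembers the largest value it has reached
// ----------------------------------------------------------------------
//    The maximum is updated by counter_add on every change. It starts
//    at 0, so a counter that only ever goes negative reports 0.
{
    int value;
    int maximum;
};


void counter_add(struct counter *c, int delta)
// ----------------------------------------------------------------------
//    Add delta to the counter, updating the recorded maximum
// ----------------------------------------------------------------------
{
    c->value += delta;
    if (c->value > c->maximum)
        c->maximum = c->value;
}
```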

You may notice that the box is at a location in the code that seasoned programmers may find a bit puzzling, namely between the interface of the program structure and its implementation. In other words, the code looks like this:

Comment box between the function interface and its implementation
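A minimal C sketch of this layout, using a hypothetical `count_char` function: the interface comes first, then the one-line box, then any detailed comment, then the body:

```c
#include <stddef.h>

size_t count_char(const char *str, char c)
// ----------------------------------------------------------------------
//    Count how many times c appears in str
// ----------------------------------------------------------------------
//    str must be a valid null-terminated string. The terminating null
//    itself is never counted, even when c is '\0'.
{
    size_t count = 0;

    for (; *str != '\0'; str++)
        if (*str == c)
            count++;
    return count;
}
```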

and specifically not like this (despite this layout being much more common):

Comment block above the function
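For contrast, here is the same hypothetical `count_char` function in the more common layout, with a block comment above the prototype. The kernel-doc-like format is only one familiar example of this style:

```c
#include <stddef.h>

/*
 * count_char() - count the occurrences of a character in a string
 *
 * @str: a valid null-terminated string
 * @c:   the character to look for
 *
 * Walks str until its terminating null and counts how many characters
 * are equal to c. The terminating null itself is never counted.
 *
 * Return: the number of occurrences of c in str.
 */
size_t count_char(const char *str, char c)
{
    size_t count = 0;

    for (; *str != '\0'; str++)
        if (*str == c)
            count++;
    return count;
}
```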

There is a reason for that, and it’s the subject of the next rule.

Rule 2: Layout code to accelerate eye scanning

The whole point of the comment boxes and of the general layout is to make reading code easier and faster. It is therefore useful to add even more visual cues that make reading code faster.

Problems with traditional commenting styles

A typical comment layout uses a large comment above the function, with no real guidelines regarding what’s actually inside the comment.

This makes reading the code somewhat difficult, because the relevant information is dispersed and hard to locate. While you are looking for information, your eye moves around in a roundabout way, going back and forth without much to hang on to. Only after you have actually read and processed the comment do you figure out where your eye should go next.

Scanning code with a regular C-style comment above the function

In the above example, the interface description is in the comment above the function. Based on my personal experience, the eye goes first to the function name, using visual cues such as color. Then it moves up in search of a description, which is often long-winded and hard to grasp. Then it moves back down to find other aspects of the interface, such as the function parameters. Finally, it moves back up again, this time searching for more details in the comment.

Things can get even worse in cases where the closest and largest comment has no relationship with the function you are looking at. In that case, the same strategy fails. You explore the comment at the top, only to realise that it talks about some other function. So you go back down searching for the next interface, to verify whether this is what the comment is talking about. Then you search further down for a matching identifier, and from there for its comment. This gives a long and convoluted search path, where several of the steps require some analytical effort.

Scanning code when the nearest comment belongs to another function

Separate interface and implementation

This is why, in the Alsys commenting style, the program structure comment box lies between the interface and the implementation of a code structure. This facilitates scanning. By putting a visually identifiable one-liner between the interface and the implementation, you can now use different strategies depending on what you are searching for. Simply put, your eye has an easily recognisable visual marker to use as a starting point for the most common search cases.

Searching for interfaces: If you are interested in the interfaces, e.g. because you are looking at function prototypes in order to know what the arguments are, your eye would go like this:

Scanning for interfaces

Searching for descriptions: If you are interested in what each structure does, then you can use the one-liners as follows:

Scanning for descriptions

Looking for implementations: Finally, if you are looking for a specific function in order to modify it, your eye might look at the function’s internal structure like this:

Scanning for implementation details

In each case, it is quite easy for the eye to find what it’s looking for with minimum effort.

Align the code

Another way to make code easier to scan is to align it visually. Emacs has a very handy command for this, called align, and it’s only a matter of binding it to the right key. This is particularly useful when there are many variable declarations:

Aligned variable declarations
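A sketch of aligned declarations in a hypothetical `average` function: the variable names start in the same column, and so do the `=` signs. The alignment here was done by hand; what Emacs `align` produces depends on its rules and configuration.

```c
#include <stddef.h>

double average(const int *values, size_t count)
// ----------------------------------------------------------------------
//    Return the arithmetic mean of count values, or 0 if count is 0
// ----------------------------------------------------------------------
{
    long long   sum    = 0;
    const int  *cursor = values;
    const int  *end    = values + count;
    double      mean   = 0.0;

    for (; cursor < end; cursor++)
        sum += *cursor;
    if (count > 0)
        mean = (double) sum / (double) count;
    return mean;
}
```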

The benefit of doing things that way is that you normally scan variable names more than variable types, and if you align them… they all line up (duh!), which makes scanning easier. For the very same reason, associated comments, if any, should also be aligned:

Aligned comments
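Extending the same idea to trailing comments, a hypothetical `struct buffer` might be laid out like this, with field names and their comments each starting in a single column:

```c
#include <stddef.h>

struct buffer
// ----------------------------------------------------------------------
//    A growable byte buffer
// ----------------------------------------------------------------------
{
    unsigned char  *data;       // Start of the allocated storage
    size_t          length;     // Number of bytes currently in use
    size_t          capacity;   // Number of bytes allocated
};


size_t buffer_room(const struct buffer *b)
// ----------------------------------------------------------------------
//    Return how many bytes can be appended without reallocating
// ----------------------------------------------------------------------
{
    return b->capacity - b->length;
}
```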

Use consistent spacing between structures

The spacing between program structures should also be kept consistent. In my own code, I use two empty lines between similar structures (e.g. between two functions), and three empty lines before a section comment.
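A short sketch of that spacing in a hypothetical C file of unit conversions: two empty lines separate the two functions of a section, and three empty lines precede the next section box. The function names are made up for the example.

```c
// ======================================================================
//
//    Temperature conversions
//
// ======================================================================

double celsius_to_fahrenheit(double celsius)
// ----------------------------------------------------------------------
//    Convert a temperature from degrees Celsius to degrees Fahrenheit
// ----------------------------------------------------------------------
{
    return celsius * 9.0 / 5.0 + 32.0;
}


double fahrenheit_to_celsius(double fahrenheit)
// ----------------------------------------------------------------------
//    Convert a temperature from degrees Fahrenheit to degrees Celsius
// ----------------------------------------------------------------------
{
    return (fahrenheit - 32.0) * 5.0 / 9.0;
}



// ======================================================================
//
//    Length conversions
//
// ======================================================================

double inches_to_cm(double inches)
// ----------------------------------------------------------------------
//    Convert a length from inches to centimetres
// ----------------------------------------------------------------------
{
    return inches * 2.54;
}
```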

Rule 3: Understand the rules

Like all rules, the Alsys commenting guidelines are only here to help us, not to constrain us. If you decide to use them, you need to really understand the rationale so that you follow them appropriately.

One-liners are exactly one line long…

This may sound obvious. Yet, in my experience, keeping descriptions to a single line has been one of the rules most often violated by contributors new to these guidelines. Well-intentioned programmers often think that two lines are better than one. This is not true.

There are several reasons for sticking to one-liners:

  • Visual consistency is greatly enhanced if all boxes have the same shape and size. So it matters that the comment part has the same size. As a matter of fact, you may have noticed that headers have a larger box to accommodate multiple lines of comment, but I still try to keep a consistent size across files.
  • If you can’t tell what a program structure does in one line (about half a tweet), then you may have an issue that needs addressing. Maybe your function does too much, or maybe you are not explaining what it does but something else, like how it does it (see below).
  • Shorter descriptions are less likely to become stale than longer ones.

If you need additional space to express yourself, there is plenty of it below the one-liner.

Do not mix how and what

Often, developers who have trouble sticking with one-liners also have trouble sticking with the what, and instead describe the how. Contrast these two candidate descriptions for the standard C strlen function:

  1. Compute the length of a string
  2. Skip one character forward until you reach a null character, and count the characters that have been skipped.

Clearly, the second description is longer, yet it tells a reader interested in the interface less about what the function does. It may, however, be relevant information to put below the one-liner box:

#include <stddef.h>

size_t strlen(const char *str)
// -----------------------------------------------------------
//    Compute the length of a string
// -----------------------------------------------------------
//    Since C strings are null-terminated, this function
//    skips forward in the string until it reaches a null,
//    and then returns the number of non-null characters
//    skipped.
{
    const char *end = str;

    while (*end != '\0')    // Stop at the terminating null
        end++;
    return (size_t) (end - str);
}

Do not hesitate to be redundant

Another thing that may seem a bit surprising is that with this commenting scheme, it is OK to be redundant if that is necessary to provide the required visual cues. For example, the following is considered better than no comment at all:

Foo::Foo()
// -----------------------------------------------------------
//    Constructor for class Foo
// -----------------------------------------------------------
{
}

And there may not be much more to say about a constructor.

Summary

The commenting style I use, derived from the Alsys coding guidelines, provides a number of visual cues that help the eye quickly locate interesting sections of the code. It features three kinds of comment blocks, used for file headers, code sections and program structures, as well as abundant use of whitespace to align the code and provide additional visual cues.

Following these guidelines requires very little additional effort, in particular with the help of configurable tools such as Emacs. Yet this shows consideration for the readers of your code, including yourself.
