[LLVMdev] Stricter adherence to coding standards in LLVM?

Fri Feb 11 16:17:55 PST 2011

I notice that there's a lot of code in LLVM core libraries that is wildly
inconsistent in formatting. Two particular things come to mind:

-- Capitalization of method names. The LLVM guidelines say one thing, but a
lot of recent code (such as IRBuilder) consistently does the opposite of
what the guidelines say. One has to wonder which source of truth is more
authoritative - after all, the docs could always be out of date.

-- Whitespace around * and & in variable declarations. I see some files that
consistently put the space after, some files put it before, while some files
use one convention for parameters, and a different convention for member
variables and local variables. Within a single file, the usage is pretty
consistent, so you know that the people who wrote the code were really
trying to adhere to a standard.

Now, common wisdom for open source projects is that inconsistent legacy
formatting is left alone, and new code adheres to whatever style is used in
nearby code. Otherwise it creates difficulty for people who want to compare
revisions in the source control. Occasionally there is a need to diff
revisions that are widely spaced in time, and a global reformatting makes
this hard if the reformatting change falls somewhere within the timespan of
interest. The downside of this approach is that you're stuck with the
inconsistent formatting forever.

In my own open source projects, I take the opposite approach, which is to
nip any inconsistencies in the bud as early and aggressively as possible, so
that they won't cause problems later on. A person who needs to compare two
revisions is far more likely to be interested in recent revisions than
really old ones, so it's better to do large changes sooner rather than later
- the farther in the past we can push back the reformatting the better.

There is also the problem of backwards compatibility with existing users of
the code. Obviously it's bad form to break user code for trivial stylistic
reasons. However, if there is some other API change with broad impact that's
going in, it might make sense to fix the names at the same time, given that
folks are going to have to update their code anyway. (Although that's
unlikely with a codebase as mature as LLVM). Another approach is to
deprecate the old API for a given class entirely, and replace it with some
newer, shinier version that just happens to conform to the letter of the
conventions, so that people can continue to use the old class until they get
around to upgrading.

In the case of whitespace around * and &, that is semantically neutral, and
the impact on the diffs are small, so a global fix to these would incur less
pain overall. The main danger is that the discussion of proper style is
likely to turn into a major religious war, making it not worth the effort to
even raise the topic.

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110211/f2e45dda/attachment.html>