[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Thu May 2 09:43:36 PDT 2019

I’m not a fan of C and C++ supporting anything but 8 bits per byte. Realistically, C and C++ on such targets are different languages from 8-bit-per-byte C and C++, and therefore code isn’t portable from one to the other. I intend to propose that C++23 support only 8 bits per byte, ditto C. I’m therefore not a fan of teaching clang about this.

Separately, teaching LLVM about unusual-sized bytes seems fine to me, if the maintenance burden is low enough and the targets are supported in-tree and are maintained. I agree that you can’t just plop in a target without support, so it makes sense to first clean things up and then land a target. However, I don’t think a mock target makes sense. I’d much rather see a real target.

Are we only talking about powers-of-two here, or “anything goes”? What restrictions are you proposing to impose?

I’m really not convinced by this “magic number” argument. 8 really isn’t that bad to see.

> On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> This RFC outlines a proposal regarding non-8-bit-byte support that
> got positive reception at a Round Table at EuroLLVM19. The general
> topic has been brought up several times before, and one good overview
> can be found in a FOSDEM 2017 presentation by Jones and Cook:
> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> 
> In a nutshell, the proposal is for the LLVM community to
> allow/encourage interested parties to gradually remove "magic numbers",
> such as assumptions about the size of bytes, from the codebase. An
> overview, rationale and some example refactorings follow.
> 
> Overview:
> 
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree 
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number 8
> and "/8" and "*8" all over the place and the use of i8 pointers.
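
For concreteness, here is a minimal sketch of the kind of change being
discussed. It is not taken from LLVM or from the linked review; the names
BitsPerByte, storeSizeInBytesMagic and storeSizeInBytes are hypothetical
and only illustrate replacing a literal 8 with a named, overridable byte
width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical named constant standing in for the literal 8.
constexpr unsigned BitsPerByte = 8;

// Before: the byte width is baked in as "+ 7" and "/ 8".
constexpr uint64_t storeSizeInBytesMagic(uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}

// After: the same round-up division, with the byte width as a parameter
// that defaults to the named constant.
constexpr uint64_t storeSizeInBytes(uint64_t SizeInBits,
                                    unsigned ByteWidth = BitsPerByte) {
  return (SizeInBits + ByteWidth - 1) / ByteWidth;
}

// Usage sketch: both forms agree for 8-bit bytes, while a target with
// 16-bit bytes needs fewer bytes for the same number of bits.
inline void checkExamples() {
  assert(storeSizeInBytesMagic(17) == 3);
  assert(storeSizeInBytes(17) == 3);
  assert(storeSizeInBytes(17, 16) == 2);
}
```

With the default argument, existing 8-bit callers would behave exactly as
before; only a target with a different byte width would pass a different
value.
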
> 
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exists to guarantee that all gains are maintained.
> 
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been performing
> nightly integration towards top-of-tree with this backend for years,
> continuously catching and fixing new 8-bit-byte assumptions. Thus we're
> able to commit to doing similar upstream fixes for the long haul in a
> no-drama way.
> 
> Rationale:
> 
> Benefits of moving toward a byte-size agnostic llvm include:
> * Fewer magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit bytes
> as contributors follow the established patterns. (One company has told
> us that they created but eventually gave up on a 16-bit byte target due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using LLVM would be far shorter.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make LLVM fully byte-size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
> 
> As for cons, one could see a burden on the in-tree community to
> maintain whatever gains have been made. However, the onus should be on
> interested parties to mend any bit-rot. If anything, having fewer magic
> numbers should make the code easier to understand. Permission to go
> ahead would be conditional on avoiding significant added complexity.
> Another con would be added compilation time, e.g. in cases where the
> byte size is a run-time variable rather than a constant. However, this
> cost seems negligible in practice.
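
To illustrate the constant-versus-run-time distinction above, a small
sketch under stated assumptions: TargetByteInfo, bitsToBytesConst and
bitsToBytesDynamic are hypothetical names, not LLVM APIs. The cost in
question is paid inside the compiler itself, where a division by a
compile-time constant can typically be folded into a shift, while a
per-target value has to be loaded and divided by at run time.

```cpp
#include <cassert>
#include <cstdint>

// Byte width fixed when the compiler itself is built: the divisor is a
// compile-time constant.
template <unsigned ByteWidth>
constexpr uint64_t bitsToBytesConst(uint64_t Bits) {
  return Bits / ByteWidth;
}

// Hypothetical per-target description holding the byte width as data.
struct TargetByteInfo {
  unsigned ByteWidth;
};

// Byte width queried at run time: the divisor is only known once a
// target has been selected.
inline uint64_t bitsToBytesDynamic(const TargetByteInfo &TI, uint64_t Bits) {
  assert(TI.ByteWidth != 0 && "byte width must be nonzero");
  return Bits / TI.ByteWidth;
}
```

Both forms compute the same result for the same byte width; they differ
only in when the divisor becomes known.
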
> 
> Refactoring examples:
> https://reviews.llvm.org/D61432
> 
> Best Regards,
> Jesper