[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Fri May 3 03:33:00 PDT 2019

Hi Jesper,

We (ASIP Designer team, Synopsys) are also interested in a cleaner approach without those magic constants.

Instead of 'n-bit bytes', we tend to talk about 'word addressing' and 'addressable unit size'.

We support C/C++ for various architectures that our customers create with our tools.
Some of those have multiple memories where the addressable unit size depends on the memory (address space).

NOTE: the addressable unit size can be any value (like 10 or 41 or 2048)

We also keep the bits in 'char' separate from the addressable unit size. As such, an 8 bit char can
have a 'storage size' of 24 bits in one memory and 64 bits in another.
(but a char can also be 24 bits, or 10 or something else).

I think that a cleaner abstraction in datalayout is in general useful.

We use:
   unsigned getAddresssableUnitSize(unsigned AddressSpace=0);

which, for the main llvm, could be implemented as :
   unsigned getAddresssableUnitSize(unsigned AS=0) { return 8; }

Greetings,

Jeroen Dobbelaere

> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Jesper
> Antonsson via llvm-dev
> Sent: Thursday, May 2, 2019 14:21
> To: llvm-dev at lists.llvm.org
> Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
> 
>    A. This RFC outlines a proposal regarding non-8-bit-byte support that
>       got positive reception at a Round Table at EuroLLVM19. The general
>       topic has been brought up several times before and one good overview
>       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
>
> In a nutshell, the proposal is for the llvm community to
> allow/encourage interested parties to gradually remove "magic numbers",
> e.g. assumptions on the size of bytes from the codebase. Overview,
> rationale and some example refactorings follows.
> 
> Overview:
> 
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number 8
> and "/8" and "*8" all over the place and the use of i8 pointers.
> 
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these ones would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exist to guarantee that all gains are maintained.
> 
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been performing
> nightly integration towards top-of-tree with this backend for years,
> catching and fixing new 8-bit-byte continuously. Thus we're able to
> commit to doing similar upstream fixes for the long haul in a no-drama
> way.
> 
> Rationale:
> 
> Benefits of moving toward a byte-size agnostic llvm include:
> * Less magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit bytes
> as contributors follow the established patterns. (One company has told
> us that they created but eventually gave up on a 16-bit byte target due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using llvm would be far quicker.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
> 
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains that have been had. However the onus should be on
> interested parties to mend any bit-rot. The impact of not having as
> much magic numbers and such should if anything make the code more easy
> to understand. The permission to go ahead would be under the condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-time
> variable rather than a constant. However, this cost seems negligible in
> practice.
> 
> Refactoring examples:
> https://reviews.llvm.org/D61432
> 
> Best Regards,
> Jesper