[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Fri May 3 07:51:57 PDT 2019

Hi Jeroen,

Thanks, these are interesting differences. The CHAR_BIT and byte
relation is established in the C standard and I would prefer the byte
terminology. It means the same thing as addressable unit but is a bit
shorter and probably more widely known.

Do you suggest any in-tree changes at some call sites using the
suggested AdressSpace parameter or would we rely on the default value
always? I could definitely include this, but if the parameter is never
used, perhaps we can leave it out-of-tree for now?

Best Regards,
Jesper

On Fri, 2019-05-03 at 10:33 +0000, Jeroen Dobbelaere wrote:
> Hi Jesper,
> 
> We (ASIP Designer team, Synopsys) are also interested in a cleaner
> approach without those magic constants.
> 
> Instead of 'n-bit bytes', we tend to talk about 'word addressing' and
> 'addressable unit size'.
> 
> We support C/C++ for various architectures that our customers create
> with our tools.
> Some of those have multiple memories where the addressable unit size
> depends on the memory (address space).
> 
> NOTE: the addressable unit size can be any value (like 10 or 41 or
> 2048)
>  
> We also keep the bits in 'char' separate from the addressable unit
> size. As such, an 8 bit char can
> have a 'storage size' of 24 bits in one memory and 64 bits in
> another.
> (but a char can also be 24 bits, or 10 or something else).
> 
> I think that a cleaner abstraction in datalayout is in general
> useful.
> 
> We use:
>    unsigned getAddresssableUnitSize(unsigned AddressSpace=0);
> 
> which, for the main llvm, could be implemented as :
>    unsigned getAddresssableUnitSize(unsigned AS=0) { return 8; }
> 
> Greetings,
> 
> Jeroen Dobbelaere
> 
> 
> > -----Original Message-----
> > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
> > Jesper
> > Antonsson via llvm-dev
> > Sent: Thursday, May 2, 2019 14:21
> > To: llvm-dev at lists.llvm.org
> > Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit
> > bytes
> > 
> >    A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> >       got positive reception at a Round Table at EuroLLVM19. The
> > general
> >       topic has been brought up several times before and one good
> > overview
> >       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > 
https://protect2.fireeye.com/url?k=2d1c7421-7195ae31-2d1c34ba-0cc47ad93e2a-4cae3fa5d066a45a&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers",
> > e.g. assumptions on the size of bytes from the codebase. Overview,
> > rationale and some example refactorings follows.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> > tree
> > llvm targets that utilize bytes of other sizes, including our
> > (Ericsson's) proprietary target. The main issues are the magic
> > number 8
> > and "/8" and "*8" all over the place and the use of i8 pointers.
> > 
> > There's considerable agreement that the use of magic numbers is not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no in-
> > tree
> > target with tests exist to guarantee that all gains are maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for
> > us
> > to expect help with reviewing patch sets. Ericsson has been
> > performing
> > nightly integration towards top-of-tree with this backend for
> > years,
> > catching and fixing new 8-bit-byte continuously. Thus we're able to
> > commit to doing similar upstream fixes for the long haul in a no-
> > drama
> > way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Less magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > bytes
> > as contributors follow the established patterns. (One company has
> > told
> > us that they created but eventually gave up on a 16-bit byte target
> > due
> > to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> > would
> > happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for
> > such
> > targets.
> > * Eventually, as the patch set required to make llvm fully byte
> > size
> > agnostic becomes small enough, the effort to provide a mock in-tree
> > target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to
> > maintain
> > whatever gains that have been had. However the onus should be on
> > interested parties to mend any bit-rot. The impact of not having as
> > much magic numbers and such should if anything make the code more
> > easy
> > to understand. The permission to go ahead would be under the
> > condition
> > that significant added complexities are avoided. Another con would
> > be
> > added compilation time e.g. in cases where the byte size is a run-
> > time
> > variable rather than a constant. However, this cost seems
> > negligible in
> > practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> 
>