[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Fri May 3 04:22:44 PDT 2019

On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:

> Hi Jesper,
> 
> thank you for working on this. My company (Codasip) would definitely
> be interested in having this feature upstream. I think that this is
> actually important for a suprisingly large number of people who
> currently have to maintain their changes downstream. I have a couple
> of questions and comments:
> 
> 1. Do you plan on supporting truly arbitrary values as the byte size
> or are there in fact going to be limitations (e.g. the value has to
> be a multiple of 8 and lower or equal to 64)? I recall that we had a
> customer asking about 36-bit bytes.

We plan on supporting arbitrary sizes with a lower limit of 8, not
necessarily power-of-two or multiples of 8. I have to admit that I
haven't thought very much about what the upper limit might be. We might
leave it up to other interested parties to explore that and if we
receive suggestions on how to generalize also in that respect, we'll
certainly consider them.

> 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> "char" is also 16 bits wide? If yes then how to do you define types
> like int8_t from stdint.h?

Yes, char is the same. The int8_t type is optional according to the
standard and we don't define it for our OOT target. The int_least8_t is
required, but we just define it to be byte sized. 

> 3. Have you thought about the possibility to support different byte
> sizes for data and code?

Not really, but I saw that Jeroen Dobbelaere just suggested supporting
memory spaces with different byte sizes.

> 4. I realize that this is a separate issue but fully supporting non-
> 8-bit bytes requires also changes to other parts of a typical
> toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you
> maintain out-of-tree changes in this area as well?

That's true, we do. I've also seen some community interest in those
areas, e.g. from Embecosm:
https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/

and from within Ericsson:
https://www.youtube.com/watch?v=HAqtEZmci70

Thanks,
Jesper

> Thank you,
> Pavel
> 
> On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >    A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> >       got positive reception at a Round Table at EuroLLVM19. The
> > general
> >       topic has been brought up several times before and one good
> > overview
> >       can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers",
> > e.g. assumptions on the size of bytes from the codebase. Overview,
> > rationale and some example refactorings follows.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few out-of-
> > tree 
> > llvm targets that utilize bytes of other sizes, including our
> > (Ericsson's) proprietary target. The main issues are the magic
> > number 8
> > and "/8" and "*8" all over the place and the use of i8 pointers.
> > 
> > There's considerable agreement that the use of magic numbers is not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no in-
> > tree
> > target with tests exist to guarantee that all gains are maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for
> > us
> > to expect help with reviewing patch sets. Ericsson has been
> > performing
> > nightly integration towards top-of-tree with this backend for
> > years,
> > catching and fixing new 8-bit-byte continuously. Thus we're able to
> > commit to doing similar upstream fixes for the long haul in a no-
> > drama
> > way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Less magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > bytes
> > as contributors follow the established patterns. (One company has
> > told
> > us that they created but eventually gave up on a 16-bit byte target
> > due
> > to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> > would
> > happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for
> > such
> > targets.
> > * Eventually, as the patch set required to make llvm fully byte
> > size
> > agnostic becomes small enough, the effort to provide a mock in-tree
> > target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to
> > maintain
> > whatever gains that have been had. However the onus should be on
> > interested parties to mend any bit-rot. The impact of not having as
> > much magic numbers and such should if anything make the code more
> > easy
> > to understand. The permission to go ahead would be under the
> > condition
> > that significant added complexities are avoided. Another con would
> > be
> > added compilation time e.g. in cases where the byte size is a run-
> > time
> > variable rather than a constant. However, this cost seems
> > negligible in
> > practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev