[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Jesper Antonsson via llvm-dev llvm-dev at lists.llvm.org
Wed May 8 00:52:04 PDT 2019


On Tue, 2019-05-07 at 14:23 +0900, Rui Ueyama wrote:
> From: Jesper Antonsson via llvm-dev <llvm-dev at lists.llvm.org>
> Date: Fri, May 3, 2019 at 8:23 PM
> To: snobl at codasip.com
> Cc: llvm-dev at lists.llvm.org
> 
> > On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:
> > 
> > > Hi Jesper,
> > > 
> > > thank you for working on this. My company (Codasip) would definitely
> > > be interested in having this feature upstream. I think that this is
> > > actually important for a surprisingly large number of people who
> > > currently have to maintain their changes downstream. I have a couple
> > > of questions and comments:
> > > 
> > > 1. Do you plan on supporting truly arbitrary values as the byte size
> > > or are there in fact going to be limitations (e.g. the value has to
> > > be a multiple of 8 and less than or equal to 64)? I recall that we
> > > had a customer asking about 36-bit bytes.
> > 
> > We plan on supporting arbitrary sizes with a lower limit of 8, not
> > necessarily power-of-two or multiples of 8. I have to admit that I
> > haven't thought very much about what the upper limit might be. We might
> > leave it up to other interested parties to explore that and if we
> > receive suggestions on how to generalize also in that respect, we'll
> > certainly consider them.
> > 
> > > 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> > > "char" is also 16 bits wide? If yes then how do you define types
> > > like int8_t from stdint.h?
> > 
> > Yes, char is the same. The int8_t type is optional according to the
> > standard and we don't define it for our OOT target. The int_least8_t is
> > required, but we just define it to be byte sized.
> > 
> > > 3. Have you thought about the possibility to support different byte
> > > sizes for data and code?
> > 
> > Not really, but I saw that Jeroen Dobbelaere just suggested supporting
> > memory spaces with different byte sizes.
> > 
> > > 4. I realize that this is a separate issue but fully supporting
> > > non-8-bit bytes also requires changes to other parts of a typical
> > > toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you
> > > maintain out-of-tree changes in this area as well?
> > 
> > That's true, we do. I've also seen some community interest in those
> > areas, e.g. from Embecosm:
> > https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/
> > 
> > and from within Ericsson:
> > https://www.youtube.com/watch?v=HAqtEZmci70
> 
> What are you using for the executable file format for machines whose
> byte size is not 8? Looks like the ELF spec assumes that a byte is 8
> bits long.

We use ELF. An architecture's byte size can differ from that of the
on-disk representation in ELF/DWARF, and the ELF/DWARF specs do not
distinguish clearly between octets and bytes. It is therefore probably
easier to keep ELF/DWARF in the 8-bit byte world and convert from the
machine byte width to 8-bit bytes/octets at some point. This might be
one additional reason to use the "addressable unit" terminology
instead.
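
To make the conversion step concrete, here is a minimal sketch of mapping
sizes in addressable units to octets in the file. The helper names are
hypothetical (they are not LLVM or ELF APIs), and the sketch assumes that
each addressable unit is padded up to a whole number of octets on disk,
which is only one possible convention and not necessarily the one our
toolchain uses.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of LLVM or any ELF
// library. Assumption: every addressable unit is stored padded up to a
// whole number of octets in the on-disk representation.
constexpr uint64_t octetsPerUnit(unsigned BitsPerUnit) {
  return (BitsPerUnit + 7) / 8; // round up to whole octets
}

// Size in octets of `Units` addressable units.
constexpr uint64_t unitsToOctets(uint64_t Units, unsigned BitsPerUnit) {
  return Units * octetsPerUnit(BitsPerUnit);
}

// Inverse mapping; only meaningful for octet counts that lie on a unit
// boundary (any remainder is discarded by the integer division).
constexpr uint64_t octetsToUnits(uint64_t Octets, unsigned BitsPerUnit) {
  return Octets / octetsPerUnit(BitsPerUnit);
}
```

Under that assumption a 16-bit unit takes 2 octets and a 36-bit unit takes
5, so units whose width is not a multiple of 8 carry some padding in the
file.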

> 
> > Thanks,
> > Jesper
> > 
> > 
> > > Thank you,
> > > Pavel
> > > 
> > > On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
> > > llvm-dev at lists.llvm.org> wrote:
> > > > This RFC outlines a proposal regarding non-8-bit-byte support that
> > > > got positive reception at a Round Table at EuroLLVM19. The general
> > > > topic has been brought up several times before and one good overview
> > > > can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > > > https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > > > 
> > > > In a nutshell, the proposal is for the llvm community to
> > > > allow/encourage interested parties to gradually remove "magic
> > > > numbers", e.g. assumptions on the size of bytes, from the codebase.
> > > > Overview, rationale and some example refactorings follow.
> > > > 
> > > > Overview:
> > > > 
> > > > LLVM currently assumes 8-bit bytes, while there exist a few
> > > > out-of-tree llvm targets that utilize bytes of other sizes, including
> > > > our (Ericsson's) proprietary target. The main issues are the magic
> > > > number 8 and "/8" and "*8" all over the place and the use of i8
> > > > pointers.
> > > > 
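
As an illustration of the kind of "/8" cleanup meant here, the following is
a minimal sketch of replacing a hard-coded divisor with a queried byte
width. `TargetInfoSketch`, `sizeInBytesMagic` and `sizeInBytes` are
hypothetical names made up for this example; they are not existing LLVM
interfaces and are not taken from the linked review.

```cpp
#include <cstdint>

// Hypothetical stand-in for target information; a real compiler would get
// this from its target description rather than from a local struct.
struct TargetInfoSketch {
  unsigned BitsPerByte; // width of one addressable unit, e.g. 8 or 16
};

// Before: the byte width is baked in as the magic numbers 7 and 8.
inline uint64_t sizeInBytesMagic(uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}

// After: the same round-up division, with the byte width named and queried.
inline uint64_t sizeInBytes(uint64_t SizeInBits, const TargetInfoSketch &TI) {
  return (SizeInBits + TI.BitsPerByte - 1) / TI.BitsPerByte;
}
```

With `BitsPerByte` set to 8 the two functions agree, while a target with
16-bit bytes gets 2 bytes for a 32-bit value instead of 4.
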
> > > > There's considerable agreement that the use of magic numbers is not
> > > > good coding style, and removing these ones would be of particular
> > > > benefit, even though the effort would not be complete and no in-tree
> > > > target with tests exists to guarantee that all gains are maintained.
> > > > 
> > > > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > > > seemed to be sufficient positive interest from other companies for us
> > > > to expect help with reviewing patch sets. Ericsson has been performing
> > > > nightly integration towards top-of-tree with this backend for years,
> > > > catching and fixing new 8-bit-byte assumptions continuously. Thus
> > > > we're able to commit to doing similar upstream fixes for the long
> > > > haul in a no-drama way.
> > > > 
> > > > Rationale:
> > > > 
> > > > Benefits of moving toward a byte-size agnostic llvm include:
> > > > * Fewer magic numbers in the codebase.
> > > > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > > >   bytes as contributors follow the established patterns. (One company
> > > >   has told us that they created but eventually gave up on a 16-bit
> > > >   byte target due to too-high integration burden.)
> > > > * A reduction in duplicate efforts as some of the adaptation work
> > > >   would happen in-tree rather than in several out-of-tree targets.
> > > > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > > >   market using llvm would be far quicker.
> > > > * A higher probability of LLVM being the compiler of choice for such
> > > >   targets.
> > > > * Eventually, as the patch set required to make llvm fully byte-size
> > > >   agnostic becomes small enough, the effort to provide a mock in-tree
> > > >   target with some other byte size should be surmountable.
> > > > 
> > > > As cons, one could see a burden for the in-tree community to
> > > > maintain whatever gains have been made. However, the onus should be
> > > > on interested parties to mend any bit-rot. The impact of not having
> > > > as many magic numbers and such should if anything make the code
> > > > easier to understand. The permission to go ahead would be under the
> > > > condition that significant added complexities are avoided. Another
> > > > con would be added compilation time, e.g. in cases where the byte
> > > > size is a run-time variable rather than a constant. However, this
> > > > cost seems negligible in practice.
> > > > 
> > > > Refactoring examples:
> > > > https://reviews.llvm.org/D61432
> > > > 
> > > > Best Regards,
> > > > Jesper
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > llvm-dev at lists.llvm.org
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
