[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Mon May 6 22:23:27 PDT 2019


*From: *Jesper Antonsson via llvm-dev <llvm-dev at lists.llvm.org>
*Date: *Fri, May 3, 2019 at 8:23 PM
*To: *snobl at codasip.com
*Cc: *llvm-dev at lists.llvm.org

On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:
>
> > Hi Jesper,
> >
> > thank you for working on this. My company (Codasip) would definitely
> > be interested in having this feature upstream. I think that this is
> > actually important for a surprisingly large number of people who
> > currently have to maintain their changes downstream. I have a couple
> > of questions and comments:
> >
> > 1. Do you plan on supporting truly arbitrary values as the byte size
> > or are there in fact going to be limitations (e.g. the value has to
> > be a multiple of 8 and less than or equal to 64)? I recall that we had
> > a customer asking about 36-bit bytes.
>
> We plan on supporting arbitrary sizes with a lower limit of 8, not
> necessarily power-of-two or multiples of 8. I have to admit that I
> haven't thought very much about what the upper limit might be. We might
> leave it up to other interested parties to explore that and if we
> receive suggestions on how to generalize also in that respect, we'll
> certainly consider them.
>
> > 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> > "char" is also 16 bits wide? If so, how do you define types
> > like int8_t from stdint.h?
>
> Yes, char is the same. The int8_t type is optional according to the
> standard and we don't define it for our OOT target. The int_least8_t is
> required, but we just define it to be byte sized.
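
As a rough illustration of the answer above (not code from this thread), the following C++ sketch shows how portable code can avoid depending on int8_t. CHAR_BIT, std::int_least8_t and the INT8_MAX macro are standard; the names kBitsPerByte, Small and kHasExactInt8 are hypothetical and exist only for this example.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int_least8_t; INT8_MAX only if int8_t exists

// Number of bits in a byte on the target being compiled for.
constexpr int kBitsPerByte = CHAR_BIT;

// sizeof(char) is 1 by definition, so char is always exactly one byte wide,
// whatever the byte width happens to be.
static_assert(sizeof(char) == 1, "char is one byte by definition");

// int_least8_t is always available and provides at least 8 bits. On a
// 16-bit-byte target it can simply be one (16-bit) byte.
using Small = std::int_least8_t;
static_assert(sizeof(Small) * CHAR_BIT >= 8, "at least 8 bits of storage");

// int8_t is optional. The standard headers define INT8_MAX only when the
// exact-width type exists, so the macro doubles as a feature test.
#ifdef INT8_MAX
constexpr bool kHasExactInt8 = true;
#else
constexpr bool kHasExactInt8 = false;
#endif
```

On a typical 8-bit-byte host kHasExactInt8 is true. On a target with wider bytes an exact 8-bit type cannot exist, so code written against Small keeps compiling while code written against int8_t does not.
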
>
> > 3. Have you thought about the possibility to support different byte
> > sizes for data and code?
>
> Not really, but I saw that Jeroen Dobbelaere just suggested supporting
> memory spaces with different byte sizes.
>
> > 4. I realize that this is a separate issue but fully supporting
> > non-8-bit bytes also requires changes to other parts of a typical
> > toolchain, namely the linker (ld/lld) and debugger (gdb/lldb). Do you
> > maintain out-of-tree changes in this area as well?
>
> That's true, we do. I've also seen some community interest in those
> areas, e.g. from Embecosm:
> https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/
>
> and from within Ericsson:
> https://www.youtube.com/watch?v=HAqtEZmci70


What are you using for the executable file format for machines whose byte
size is not 8? Looks like the ELF spec assumes that a byte is 8 bits long.


> Thanks,
> Jesper
>
>
> > Thank you,
> > Pavel
> >
> > On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> > > This RFC outlines a proposal regarding non-8-bit-byte support that
> > > got positive reception at a Round Table at EuroLLVM19. The general
> > > topic has been brought up several times before and one good overview
> > > can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > > https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > >
> > > In a nutshell, the proposal is for the LLVM community to
> > > allow/encourage interested parties to gradually remove "magic
> > > numbers", e.g. assumptions on the size of bytes, from the codebase.
> > > An overview, rationale and some example refactorings follow.
> > >
> > > Overview:
> > >
> > > LLVM currently assumes 8-bit bytes, while there exist a few
> > > out-of-tree LLVM targets that utilize bytes of other sizes, including
> > > our (Ericsson's) proprietary target. The main issues are the magic
> > > number 8 and "/8" and "*8" all over the place and the use of i8
> > > pointers.
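
To make the "/8" and "*8" point concrete, here is a minimal sketch of the kind of refactoring being described. It is not taken from LLVM or from the linked review; TargetBytes, bitsToBytesLegacy, bitsToBytesCeil and bytesToBits are hypothetical names chosen for the example.

```cpp
#include <cstdint>

// Hypothetical stand-in for whatever object describes the target.
struct TargetBytes {
  unsigned BitsPerByte;
};

// Before: the byte width is baked in as a magic number.
constexpr std::uint64_t bitsToBytesLegacy(std::uint64_t Bits) {
  return (Bits + 7) / 8;
}

// After: the byte width is a named property of the target.
// Rounds up so that a partially used byte still counts as a whole byte.
constexpr std::uint64_t bitsToBytesCeil(std::uint64_t Bits, TargetBytes T) {
  return (Bits + T.BitsPerByte - 1) / T.BitsPerByte;
}

constexpr std::uint64_t bytesToBits(std::uint64_t Bytes, TargetBytes T) {
  return Bytes * T.BitsPerByte;
}
```

With TargetBytes{8} the parameterized helper agrees with the legacy one, so 8-bit targets are unaffected. With TargetBytes{16} a 32-bit value occupies 2 bytes and a 36-bit value occupies 3.
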
> > >
> > > There's considerable agreement that the use of magic numbers is not
> > > good coding style, and removing these ones would be of particular
> > > benefit, even though the effort would not be complete and no
> > > in-tree target with tests exists to guarantee that all gains are
> > > maintained.
> > >
> > > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > > seemed to be sufficient positive interest from other companies for us
> > > to expect help with reviewing patch sets. Ericsson has been performing
> > > nightly integration towards top-of-tree with this backend for years,
> > > continuously catching and fixing new 8-bit-byte assumptions. Thus
> > > we're able to commit to doing similar upstream fixes for the long
> > > haul in a no-drama way.
> > >
> > > Rationale:
> > >
> > > Benefits of moving toward a byte-size agnostic LLVM include:
> > > * Fewer magic numbers in the codebase.
> > > * A reduced effort to maintain out-of-tree targets with non-8-bit
> > >   bytes as contributors follow the established patterns. (One company
> > >   has told us that they created but eventually gave up on a 16-bit
> > >   byte target due to too high an integration burden.)
> > > * A reduction in duplicate efforts as some of the adaptation work
> > >   would happen in-tree rather than in several out-of-tree targets.
> > > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> > >   market using LLVM would be far shorter.
> > > * A higher probability of LLVM being the compiler of choice for such
> > >   targets.
> > > * Eventually, as the patch set required to make LLVM fully byte-size
> > >   agnostic becomes small enough, the effort to provide a mock in-tree
> > >   target with some other byte size should be surmountable.
> > >
> > > As cons, one could see a burden for the in-tree community to
> > > maintain whatever gains have been made. However, the onus should be
> > > on interested parties to mend any bit-rot. Having fewer magic numbers
> > > should, if anything, make the code easier to understand. The
> > > permission to go ahead would be under the condition that significant
> > > added complexities are avoided. Another con would be added
> > > compilation time, e.g. in cases where the byte size is a run-time
> > > variable rather than a constant. However, this cost seems negligible
> > > in practice.
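
The constant-versus-variable trade-off mentioned above can be sketched as follows. This is an illustration only, with hypothetical names (sizeInBytesConst, sizeInBytesDyn); it makes no claim about how LLVM or the linked review structures the code.

```cpp
#include <cstdint>

// Byte width fixed at compile time. The divisor is a constant, which
// optimizing compilers typically turn into a shift when it is a power of two.
template <unsigned BitsPerByte>
constexpr std::uint64_t sizeInBytesConst(std::uint64_t Bits) {
  return (Bits + BitsPerByte - 1) / BitsPerByte;
}

// Byte width known only at run time (for example, read from a target
// description). The same arithmetic now needs a general division.
constexpr std::uint64_t sizeInBytesDyn(std::uint64_t Bits,
                                       unsigned BitsPerByte) {
  return (Bits + BitsPerByte - 1) / BitsPerByte;
}
```

Both forms compute the same result; the difference is only in what the compiler building the tool can fold away, which is the cost the paragraph above describes as negligible in practice.
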
> > >
> > > Refactoring examples:
> > > https://reviews.llvm.org/D61432
> > >
> > > Best Regards,
> > > Jesper
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev