[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Thu May 2 10:54:46 PDT 2019

Hi Jesper,

Thank you for working on this. My company (Codasip) would definitely be
interested in having this feature upstream. I think that this is actually
important for a surprisingly large number of people who currently have to
maintain their changes downstream. I have a couple of questions and
comments:

1. Do you plan on supporting truly arbitrary values as the byte size, or are
there in fact going to be limitations (e.g. the value has to be a multiple
of 8 and less than or equal to 64)? I recall that we had a customer asking
about 36-bit bytes.
2. If you define a byte to be e.g. 16 bits wide, does it mean that "char"
is also 16 bits wide? If yes, then how do you define types like int8_t
from stdint.h?
3. Have you thought about the possibility of supporting different byte sizes
for data and code?
4. I realize that this is a separate issue, but fully supporting non-8-bit
bytes also requires changes to other parts of a typical toolchain, namely
the linker (ld/lld) and the debugger (gdb/lldb). Do you maintain out-of-tree
changes in this area as well?
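
To make question 2 concrete: in standard C and C++, sizeof(char) is 1 by
definition, so a 16-bit byte implies a 16-bit char. The exact-width type
int8_t is optional and would simply be absent on such a target, while
int_least8_t is always provided. A minimal sketch, assuming nothing beyond
the standard headers (the helper names bitWidth and hasInt8 are made up
for illustration):

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width of a type in bits, computed from CHAR_BIT instead of a literal 8.
template <typename T>
constexpr std::size_t bitWidth() {
  return sizeof(T) * CHAR_BIT;
}

// True when the implementation provides the optional exact-width int8_t.
// <cstdint> defines INT8_MAX only if int8_t itself exists.
constexpr bool hasInt8() {
#ifdef INT8_MAX
  return true;
#else
  return false;
#endif
}

static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(bitWidth<std::int_least8_t>() >= 8,
              "int_least8_t exists on every conforming implementation");
```

On a target with CHAR_BIT == 16, hasInt8() would be expected to return
false and bitWidth<std::int_least8_t>() to be 16, so portable code would
have to use the least-width types.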

Thank you,
Pavel

On Thu, May 2, 2019 at 2:20 PM Jesper Antonsson via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> This RFC outlines a proposal regarding non-8-bit-byte support that
> got positive reception at a Round Table at EuroLLVM19. The general
> topic has been brought up several times before and one good overview
> can be found in a FOSDEM 2017 presentation by Jones and Cook:
> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
>
> In a nutshell, the proposal is for the llvm community to
> allow/encourage interested parties to gradually remove "magic numbers",
> e.g. assumptions about the size of bytes, from the codebase. An overview,
> rationale and some example refactorings follow.
>
> Overview:
>
> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
> llvm targets that utilize bytes of other sizes, including our
> (Ericsson's) proprietary target. The main issues are the magic number 8
> and "/8" and "*8" all over the place and the use of i8 pointers.
>
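As an illustration of the "/8" and "*8" pattern described above, here is a
minimal standalone sketch in which the byte width is a named parameter
instead of a literal. This is not code from LLVM or from D61432; the names
TargetInfo, BitsPerByte, bitsToBytesCeil and bytesToBits are hypothetical.

```cpp
#include <cstdint>

// Hypothetical target description. In a real compiler this value would come
// from the target's data layout rather than a hand-written struct.
struct TargetInfo {
  unsigned BitsPerByte; // 8 on mainstream targets, e.g. 16 on others
};

// Before: uint64_t Bytes = (Bits + 7) / 8;
// After: number of addressable units needed to hold Bits bits, rounded up.
constexpr std::uint64_t bitsToBytesCeil(std::uint64_t Bits,
                                        const TargetInfo &TI) {
  return (Bits + TI.BitsPerByte - 1) / TI.BitsPerByte;
}

// Before: uint64_t Bits = Bytes * 8;
// After: the multiplier is the target's byte width.
constexpr std::uint64_t bytesToBits(std::uint64_t Bytes,
                                    const TargetInfo &TI) {
  return Bytes * TI.BitsPerByte;
}
```

With TargetInfo{16}, a 32-bit value occupies 2 addressable units rather
than 4, and the same call sites keep working unchanged for TargetInfo{8}.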
> There's considerable agreement that the use of magic numbers is not
> good coding style, and removing these ones would be of particular
> benefit, even though the effort would not be complete and no in-tree
> target with tests exists to guarantee that all gains are maintained.
>
> Ericsson is willing to drive this effort. During EuroLLVM19, there
> seemed to be sufficient positive interest from other companies for us
> to expect help with reviewing patch sets. Ericsson has been performing
> nightly integration towards top-of-tree with this backend for years,
> catching and fixing new 8-bit-byte assumptions continuously. Thus we're able to
> commit to doing similar upstream fixes for the long haul in a no-drama
> way.
>
> Rationale:
>
> Benefits of moving toward a byte-size agnostic llvm include:
> * Fewer magic numbers in the codebase.
> * A reduced effort to maintain out-of-tree targets with non-8-bit bytes
> as contributors follow the established patterns. (One company has told
> us that they created but eventually gave up on a 16-bit byte target due
> to too-high integration burden.)
> * A reduction in duplicate efforts as some of the adaptation work would
> happen in-tree rather than in several out-of-tree targets.
> * For up-and-coming targets that have non-8-bit-byte sizes, time to
> market using llvm would be far quicker.
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
>
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains have been made. However, the onus should be on
> interested parties to mend any bit-rot. Having fewer magic numbers
> should, if anything, make the code easier
> to understand. The permission to go ahead would be under the condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-time
> variable rather than a constant. However, this cost seems negligible in
> practice.
>
> Refactoring examples:
> https://reviews.llvm.org/D61432
>
> Best Regards,
> Jesper
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev