[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Thu May 9 12:10:01 PDT 2019

On 3 May 2019, at 5:18, Jesper Antonsson via llvm-dev wrote:

> On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:
>> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
>> Realistically, C and C++ on such targets are different languages from
>> 8-bit-per-byte C and C++, and therefore code isn’t portable from one
>> to the other. I intend to propose that C++23 support only 8 bits per
>> byte, ditto C. I’m therefore not a fan of teaching clang about this.
>
> On portability, the same is true for byte order and more. Also, the
> standard is what it is currently and the non-8-bit byte targets do
> exist. However, we don't suggest clang changes for now.

Clang already largely does not make assumptions about 8-bit bytes
outside of LLVM IR generation.  I'm sure assumptions continue to
sneak in here and there, but the bulk of this work is already done
for the frontend.

John.

>
>> Separately, teaching LLVM about unusual-sized bytes seems fine to me,
>> if the maintenance burden is low enough and the targets are supported
>> in-tree and are maintained. I agree that you can’t just plop in a
>> target without support, so it makes sense to first clean things up
>> and then land a target. However, I don’t think a mock target makes
>> sense. I’d much rather see a real target.
>
> I'd also much rather see a real target. Hopefully, the cleanup will
> make it more likely to happen.
>
>> Are we only talking about powers-of-two here, or “anything goes”?
>> What restrictions are you proposing to impose?
>
> We're proposing "anything goes" larger than 8, as that's what the
> standards say and as we've talked to people with non-power-of-two
> architectures. Also, we feel there's no major disadvantage in going
> for that. Yes, we can't use masks and shifts the same way, but we feel
> that won't have a big impact. (However, our target has 16-bit bytes,
> so if the community would rather see powers of two, we could live
> with that.)
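
To illustrate the masks-and-shifts point above, here is a minimal sketch of
bit/byte arithmetic written with division and remainder, so that it holds
for any byte width, including non-powers of two. ByteLayout and
wholeBytesPow2 are hypothetical names chosen for this example, not LLVM
APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): describes a target's byte width.
struct ByteLayout {
  unsigned BitsPerByte; // e.g. 8, 16, or a non-power-of-two such as 24

  // Number of complete bytes contained in `Bits` bits (rounds down).
  uint64_t wholeBytes(uint64_t Bits) const {
    assert(BitsPerByte >= 8 && "byte width below 8 is not considered here");
    return Bits / BitsPerByte;
  }

  // Number of bytes needed to hold `Bits` bits (rounds up).
  uint64_t bytesToHold(uint64_t Bits) const {
    assert(BitsPerByte >= 8 && "byte width below 8 is not considered here");
    return (Bits + BitsPerByte - 1) / BitsPerByte;
  }

  // Position of bit `BitIndex` inside its containing byte.
  unsigned bitOffsetInByte(uint64_t BitIndex) const {
    assert(BitsPerByte >= 8 && "byte width below 8 is not considered here");
    return static_cast<unsigned>(BitIndex % BitsPerByte);
  }
};

// Shift-based variant, shown for contrast. It is only usable when the byte
// width is a power of two: Log2BitsPerByte is 3 for 8-bit bytes and 4 for
// 16-bit bytes, but a 24-bit byte has no such value.
inline uint64_t wholeBytesPow2(uint64_t Bits, unsigned Log2BitsPerByte) {
  return Bits >> Log2BitsPerByte;
}
```

For power-of-two widths the two forms agree, so the division form loses
nothing in generality; it simply also covers widths such as 24.
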
>
>> I’m really not convinced by this “magic number” argument. 8 really
>> isn’t that bad to see.
>
> Though the meaning isn't always clear, i.e. whether it's handling bytes
> or octets. And perhaps it doesn't have to be, as long as you have an
> 8-bit byte architecture, but when you start to clean up for another
> architecture, it becomes a pain and is not always obvious, especially
> when you're mucking around with DWARF. Also, there's "& 7" and ">> 3"
> as well, for instance. Not that bad either (as magic numbers often
> aren't in context), but if you grep for them, you often have to look a
> bit closer to see whether it's e.g. a flag, a byte or an octet.
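
A small sketch of the ambiguity described above, assuming a target with
16-bit bytes. Every name here (BitsPerOctet, TargetBitsPerByte, FlagMask
and the three functions) is hypothetical and chosen only for illustration.
The point is that three unrelated uses that could all be spelled with "8",
"& 7" or ">> 3" become distinguishable once each gets its own name.

```cpp
#include <cstdint>

// An octet is 8 bits by definition, independent of the target.
constexpr unsigned BitsPerOctet = 8;

// Assumed example target: the smallest addressable unit is 16 bits wide.
constexpr unsigned TargetBitsPerByte = 16;

// Three low flag bits in some status word; unrelated to any byte size.
constexpr uint32_t FlagMask = 0x7;

// Size in octets, e.g. for a serialized format defined in terms of octets.
constexpr uint64_t bitsToOctets(uint64_t Bits) {
  return (Bits + BitsPerOctet - 1) / BitsPerOctet;
}

// Size in addressable units of the target, e.g. for a memory offset.
constexpr uint64_t bitsToTargetBytes(uint64_t Bits) {
  return (Bits + TargetBitsPerByte - 1) / TargetBitsPerByte;
}

// Extracts the flag bits; the mask happens to equal 7 but is not a
// "bit offset within an 8-bit byte".
constexpr uint32_t extractFlags(uint32_t Word) {
  return Word & FlagMask;
}
```

On an 8-bit-byte target the first two functions return the same value,
which is exactly why the distinction tends to stay invisible until a
different byte width is introduced.
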
>
> Regards,
> Jesper
>
>
>>
>>
>>> On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> This RFC outlines a proposal regarding non-8-bit-byte support that
>>> got positive reception at a Round Table at EuroLLVM19. The general
>>> topic has been brought up several times before and one good overview
>>> can be found in a FOSDEM 2017 presentation by Jones and Cook:
>>>
>>> https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
>>>
>>> In a nutshell, the proposal is for the LLVM community to
>>> allow/encourage interested parties to gradually remove "magic
>>> numbers", e.g. assumptions on the size of bytes, from the codebase.
>>> Overview, rationale and some example refactorings follow.
>>>
>>> Overview:
>>>
>>> LLVM currently assumes 8-bit bytes, while there exist a few
>>> out-of-tree LLVM targets that utilize bytes of other sizes, including
>>> our (Ericsson's) proprietary target. The main issues are the magic
>>> number 8 and "/8" and "*8" all over the place and the use of i8
>>> pointers.
>>>
>>> There's considerable agreement that the use of magic numbers is not
>>> good coding style, and removing these ones would be of particular
>>> benefit, even though the effort would not be complete and no in-tree
>>> target with tests exists to guarantee that all gains are maintained.
>>>
>>> Ericsson is willing to drive this effort. During EuroLLVM19, there
>>> seemed to be sufficient positive interest from other companies for us
>>> to expect help with reviewing patch sets. Ericsson has been performing
>>> nightly integration towards top-of-tree with this backend for years,
>>> catching and fixing new 8-bit-byte assumptions continuously. Thus
>>> we're able to commit to doing similar upstream fixes for the long haul
>>> in a no-drama way.
>>>
>>> Rationale:
>>>
>>> Benefits of moving toward a byte-size agnostic LLVM include:
>>> * Fewer magic numbers in the codebase.
>>> * A reduced effort to maintain out-of-tree targets with non-8-bit
>>>   bytes as contributors follow the established patterns. (One company
>>>   has told us that they created but eventually gave up on a 16-bit
>>>   byte target because the integration burden was too high.)
>>> * A reduction in duplicate efforts as some of the adaptation work
>>>   would happen in-tree rather than in several out-of-tree targets.
>>> * For up-and-coming targets that have non-8-bit-byte sizes, time to
>>>   market using LLVM would be far quicker.
>>> * A higher probability of LLVM being the compiler of choice for such
>>>   targets.
>>> * Eventually, as the patch set required to make LLVM fully byte-size
>>>   agnostic becomes small enough, the effort to provide a mock in-tree
>>>   target with some other byte size should be surmountable.
>>>
>>> As cons, one could see a burden for the in-tree community to maintain
>>> whatever gains have been made. However, the onus should be on
>>> interested parties to mend any bit-rot. Having fewer magic numbers
>>> and such should, if anything, make the code easier to understand.
>>> The permission to go ahead would be under the condition that
>>> significant added complexities are avoided. Another con would be
>>> added compilation time, e.g. in cases where the byte size is a
>>> run-time variable rather than a constant. However, this cost seems
>>> negligible in practice.
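
The constant-versus-run-time distinction mentioned above can be sketched as
follows. Both function names are hypothetical and not taken from LLVM or
from the linked review. With a compile-time width, an optimizing compiler
can typically reduce the division to a shift for power-of-two widths; with
a run-time width it generally has to emit a real division.

```cpp
#include <cassert>
#include <cstdint>

// Byte width known at compile time: the divisor is a constant expression.
template <unsigned BitsPerByte>
constexpr uint64_t bitsToBytesConst(uint64_t Bits) {
  static_assert(BitsPerByte >= 8, "byte width below 8 is not considered");
  return Bits / BitsPerByte;
}

// Byte width only known at run time, e.g. read from a target description.
inline uint64_t bitsToBytesDynamic(uint64_t Bits, unsigned BitsPerByte) {
  assert(BitsPerByte != 0 && "byte width must be non-zero");
  return Bits / BitsPerByte;
}
```

Both forms compute the same result; the difference is only in what the
compiler can fold away, which is the cost the paragraph above refers to.
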
>>>
>>> Refactoring examples:
>>> https://reviews.llvm.org/D61432
>>>
>>> Best Regards,
>>> Jesper
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev