[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
John McCall via llvm-dev
llvm-dev at lists.llvm.org
Thu May 9 12:10:01 PDT 2019
On 3 May 2019, at 5:18, Jesper Antonsson via llvm-dev wrote:
> On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:
>> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
>> Realistically, C and C++ on such targets are different languages from
>> 8-bit-per-byte C and C++, and therefore code isn’t portable from one
>> to the other. I intend to propose that C++23 support only 8 bits per
>> byte, ditto C. I’m therefore not a fan of teaching clang about
> On portability, the same is true for byte order and more. Also, the
> standard is what it is currently and the non-8-bit byte targets do
> exist. However, we don't suggest clang changes for now.
Clang already largely does not make assumptions about 8-bit bytes
outside of LLVM IR generation. I'm sure assumptions continue to
sneak in here and there, but the bulk of this work is already done
for the frontend.
>> Separately, teaching LLVM about unusual-sized bytes seems fine to me,
>> if the maintenance burden is low enough and the targets are supported
>> in-tree and are maintained. I agree that you can’t just plop in a
>> target without support, so it makes sense to first clean things up
>> and then land a target. However, I don’t think a mock target makes
>> sense. I’d much rather see a real target.
> I'd also much rather see a real target. Hopefully, the cleanup will
> make it more likely to happen.
>> Are we only talking about powers-of-two here, or “anything goes”?
>> What restrictions are you proposing to impose?
> We're proposing "anything goes" larger than 8 as that's what the
> standard says, and as we've talked to people having non-power-of-two
> architectures. Also, we feel there's no major disadvantage of going
> that way. Yes, we can't use masks and shifts the same way, but we feel
> that won't have a big impact. (However, our target has 16-bit bytes, so if
> the community would rather see powers-of-two, we could live with that.)
>> I’m really not convinced by this “magic number” argument. 8
>> isn’t that bad to see.
> Though the meaning isn't always clear, i.e. if it's handling bytes or
> octets. And perhaps it doesn't have to be, for as long as you have an
> 8-bit byte architecture, but when you start to clean up for another
> architecture, it becomes a pain and is not always obvious. Especially
> not when you're mucking around with DWARF. There's also "& 7" and ">> 3",
> for instance. Not that bad either (as magic numbers often
> aren't in context), but if you grep for them, you often have to look a
> bit extra to see if it's e.g. a flag, a byte or an octet.
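To illustrate the point about "& 7" and ">> 3", here is a minimal sketch in C++. It is hypothetical code, not taken from LLVM, and the names BitPosition, splitMagic and splitByUnit are invented for this example. It shows how a shift/mask pair hides which unit is meant, and how a named unit width with division and remainder also covers non-power-of-two widths.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; none of these exist in LLVM.
struct BitPosition {
  uint64_t Unit;      // index of the containing byte (or octet)
  unsigned BitInUnit; // bit offset inside that unit
};

// Magic-number version: the reader cannot tell whether 3 and 7 refer to
// target bytes or to octets, and it is only correct for 8-bit units.
inline BitPosition splitMagic(uint64_t BitOffset) {
  return {BitOffset >> 3, static_cast<unsigned>(BitOffset & 7)};
}

// Named version: the caller states which unit width is meant. Division and
// remainder work for any non-zero width, including non-powers-of-two.
inline BitPosition splitByUnit(uint64_t BitOffset, unsigned BitsPerUnit) {
  assert(BitsPerUnit != 0 && "unit width must be non-zero");
  return {BitOffset / BitsPerUnit,
          static_cast<unsigned>(BitOffset % BitsPerUnit)};
}
```

With a unit width of 8 the two functions agree. With a width of 16 or 12 only the named version gives a meaningful answer.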
>>> On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> A. This RFC outlines a proposal regarding non-8-bit-byte support that
>>> got positive reception at a Round Table at EuroLLVM19. The
>>> topic has been brought up several times before and one good
>>> can be found in a FOSDEM 2017 presentation by Jones and Cook:
>>> In a nutshell, the proposal is for the llvm community to
>>> allow/encourage interested parties to gradually remove "magic numbers",
>>> e.g. assumptions on the size of bytes from the codebase. Overview,
>>> rationale and some example refactorings follow.
>>> LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
>>> llvm targets that utilize bytes of other sizes, including our
>>> (Ericsson's) proprietary target. The main issues are the magic
>>> number 8 and "/8" and "*8" all over the place and the use of i8 pointers.
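As a rough sketch of what replacing a bare "/8" or "*8" could look like (hypothetical C++, not an existing LLVM API; the names bitsToBytesCeil and bytesToBits and the BitsPerByte parameter are assumptions made for this example), the byte width becomes an explicit argument instead of a literal:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not an existing LLVM API.

// Number of target bytes needed to hold SizeInBits bits, rounded up.
// Replaces the familiar "(Bits + 7) / 8" idiom.
inline uint64_t bitsToBytesCeil(uint64_t SizeInBits, unsigned BitsPerByte) {
  assert(BitsPerByte != 0 && "byte width must be non-zero");
  return SizeInBits / BitsPerByte + (SizeInBits % BitsPerByte != 0 ? 1 : 0);
}

// Inverse conversion, replacing a bare "* 8".
inline uint64_t bytesToBits(uint64_t SizeInBytes, unsigned BitsPerByte) {
  return SizeInBytes * BitsPerByte;
}
```

With BitsPerByte set to 8 this matches the usual 8-bit arithmetic. On a 16-bit-byte target, a 32-bit value occupies 2 bytes rather than 4.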
>>> There's considerable agreement that the use of magic numbers is not
>>> good coding style, and removing these ones would be of particular
>>> benefit, even though the effort would not be complete and no in-tree
>>> target with tests exists to guarantee that all gains are maintained.
>>> Ericsson is willing to drive this effort. During EuroLLVM19, there
>>> seemed to be sufficient positive interest from other companies for us
>>> to expect help with reviewing patch sets. Ericsson has been doing
>>> nightly integration towards top-of-tree with this backend for
>>> catching and fixing new 8-bit-byte assumptions continuously. Thus we're able to
>>> commit to doing similar upstream fixes for the long haul in a no-
>>> Benefits of moving toward a byte-size agnostic llvm include:
>>> * Fewer magic numbers in the codebase.
>>> * A reduced effort to maintain out-of-tree targets with non-8-bit
>>> bytes, as contributors follow the established patterns. (One company has
>>> told us that they created but eventually gave up on a 16-bit byte target
>>> due to too-high integration burden.)
>>> * A reduction in duplicate efforts as some of the adaptation work would
>>> happen in-tree rather than in several out-of-tree targets.
>>> * For up-and-coming targets that have non-8-bit-byte sizes, time to
>>> market using llvm would be far quicker.
>>> * A higher probability of LLVM being the compiler of choice for
>>> * Eventually, as the patch set required to make llvm fully byte-size
>>> agnostic becomes small enough, the effort to provide a mock in-tree
>>> target with some other byte size should be surmountable.
>>> As cons, one could see a burden for the in-tree community to maintain
>>> whatever gains have been made. However the onus should be on
>>> interested parties to mend any bit-rot. The impact of not having as
>>> many magic numbers and such should if anything make the code easier
>>> to understand. The permission to go ahead would be under the condition
>>> that significant added complexities are avoided. Another con would be
>>> added compilation time e.g. in cases where the byte size is a run-time
>>> variable rather than a constant. However, this cost seems
>>> negligible in
>>> Refactoring examples:
>>> Best Regards,
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org