[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

Jesper Antonsson via llvm-dev llvm-dev at lists.llvm.org
Fri May 3 02:18:15 PDT 2019


On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:
> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
> Realistically, C and C++ on such targets are different languages from
> 8-bit-per-byte C and C++, and therefore code isn’t portable from one
> to the other. I intend to propose that C++23 support only 8 bits per
> byte, ditto C. I’m therefore not a fan of teaching clang about this.

On portability, the same is true for byte order and more. Also, the
standard is what it is currently and the non-8-bit byte targets do
exist. However, we don't suggest clang changes for now.

> Separately, teaching LLVM about unusual-sized bytes seems fine to me,
> if the maintenance burden is low enough and the targets are supported
> in-tree and are maintained. I agree that you can’t just plop in a
> target without support, so it makes sense to first clean things up
> and then land a target. However, I don’t think a mock target makes
> sense. I’d much rather see a real target.

I'd also much rather see a real target. Hopefully, the cleanup will
make it more likely to happen.

> Are we only talking about powers-of-two here, or “anything goes”?
> What restrictions are you proposing to impose?

We're proposing "anything goes" larger than 8, as that's what the
standards say, and as we've talked to people who have non-power-of-two
architectures. Also, we feel there's no major disadvantage to going for
that. Yes, we can't use masks and shifts the same way, but we feel that
won't have a big impact. (However, our target has 16-bit bytes, so if
the community would rather see powers-of-two, we could live with that.)
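
To illustrate the masks-and-shifts point, here is a minimal sketch. The
helper names are hypothetical and are not existing LLVM APIs. With an
arbitrary byte width, the shift and mask become a division and a
remainder, which give the same results as the shift/mask form whenever
the width happens to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of LLVM.

// True if V is a non-zero power of two.
constexpr bool isPowerOfTwo(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Number of bytes needed to hold `Bits` bits, for any byte width.
// With 8-bit bytes this is what "(Bits + 7) >> 3" computes.
constexpr uint64_t bitsToBytesCeil(uint64_t Bits, unsigned BitsPerByte) {
  return (Bits + BitsPerByte - 1) / BitsPerByte;
}

// Bit position inside the containing byte, for any byte width.
// With 8-bit bytes this is what "Bits & 7" computes.
constexpr uint64_t bitOffsetInByte(uint64_t Bits, unsigned BitsPerByte) {
  return Bits % BitsPerByte;
}
```

For a power-of-two width a compiler can typically turn the division and
remainder back into a shift and a mask when the width is a compile-time
constant, so the generic form should cost little there.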

> I’m really not convinced by this “magic number” argument. 8 really
> isn’t that bad to see.

The meaning isn't always clear, though, i.e. whether the code is
handling bytes or octets. Perhaps it doesn't have to be, as long as you
have an 8-bit-byte architecture, but when you start to clean up for
another architecture, it becomes a pain and is not always obvious,
especially when you're mucking around with DWARF. There's also "& 7"
and ">> 3", for instance. Those aren't that bad either (magic numbers
often aren't in context), but if you grep for them, you often have to
look a bit closer to see whether it's e.g. a flag, a byte or an octet.
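
As a rough sketch of the distinction (all names here are hypothetical,
and the 16-bit byte width is just an assumed example target), naming
the unit makes the intent visible where a bare 8 would not:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; none of these exist in LLVM.

// An octet is always 8 bits, regardless of the target.
constexpr unsigned BitsPerOctet = 8;

// Assumed example target whose smallest addressable unit is 16 bits.
constexpr unsigned BitsPerTargetByte = 16;

// Size in octets, e.g. for data laid out in 8-bit units.
constexpr uint64_t bitsToOctets(uint64_t Bits) { return Bits / BitsPerOctet; }

// Size in target bytes, e.g. for address arithmetic on the target.
constexpr uint64_t bitsToTargetBytes(uint64_t Bits) {
  return Bits / BitsPerTargetByte;
}
```

On an 8-bit-byte target both functions return the same value, which is
why the two meanings are easy to conflate in existing code.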

Regards,
Jesper


> 
> 
> > On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> > 
> > This RFC outlines a proposal regarding non-8-bit-byte support that
> > got positive reception at a Round Table at EuroLLVM19. The general
> > topic has been brought up several times before and one good overview
> > can be found in a FOSDEM 2017 presentation by Jones and Cook:
> > https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> > 
> > In a nutshell, the proposal is for the llvm community to
> > allow/encourage interested parties to gradually remove "magic
> > numbers", e.g. assumptions about the size of bytes, from the
> > codebase. Overview, rationale and some example refactorings follow.
> > 
> > Overview:
> > 
> > LLVM currently assumes 8-bit bytes, while there exist a few
> > out-of-tree llvm targets that utilize bytes of other sizes,
> > including our (Ericsson's) proprietary target. The main issues are
> > the magic number 8 and "/8" and "*8" all over the place and the use
> > of i8 pointers.
> > 
> > There's considerable agreement that the use of magic numbers is not
> > good coding style, and removing these ones would be of particular
> > benefit, even though the effort would not be complete and no
> > in-tree target with tests exists to guarantee that all gains are
> > maintained.
> > 
> > Ericsson is willing to drive this effort. During EuroLLVM19, there
> > seemed to be sufficient positive interest from other companies for
> > us to expect help with reviewing patch sets. Ericsson has been
> > performing nightly integration towards top-of-tree with this backend
> > for years, catching and fixing new 8-bit-byte assumptions
> > continuously. Thus we're able to commit to doing similar upstream
> > fixes for the long haul in a no-drama way.
> > 
> > Rationale:
> > 
> > Benefits of moving toward a byte-size agnostic llvm include:
> > * Fewer magic numbers in the codebase.
> > * A reduced effort to maintain out-of-tree targets with non-8-bit
> >   bytes as contributors follow the established patterns. (One
> >   company has told us that they created but eventually gave up on a
> >   16-bit byte target due to too-high integration burden.)
> > * A reduction in duplicate efforts as some of the adaptation work
> >   would happen in-tree rather than in several out-of-tree targets.
> > * For up-and-coming targets that have non-8-bit-byte sizes, time to
> >   market using llvm would be far quicker.
> > * A higher probability of LLVM being the compiler of choice for
> >   such targets.
> > * Eventually, as the patch set required to make llvm fully byte
> >   size agnostic becomes small enough, the effort to provide a mock
> >   in-tree target with some other byte size should be surmountable.
> > 
> > As cons, one could see a burden for the in-tree community to
> > maintain whatever gains have been made. However, the onus should be
> > on interested parties to mend any bit-rot. Having fewer magic
> > numbers and such should, if anything, make the code easier to
> > understand. The permission to go ahead would be under the condition
> > that significant added complexities are avoided. Another con would
> > be added compilation time, e.g. in cases where the byte size is a
> > run-time variable rather than a constant. However, this cost seems
> > negligible in practice.
> > 
> > Refactoring examples:
> > https://reviews.llvm.org/D61432
> > 
> > Best Regards,
> > Jesper
> 
> 

