[llvm-dev] RFC: On non 8-bit bytes and the target for it

Tue Oct 29 12:11:13 PDT 2019

Thanks, Chris, for supporting the idea of having non-8-bit bytes in LLVM.

I want to clarify the scope and then analyze the options we have.

The scope:
1. BitsPerByte or a similar variable should be introduced into the data layout;
include/CodeGen/ValueTypes.h and some other generic headers also need to be
updated and probably become dependent on the data layout.
2. The magic number 8 should be replaced with BitsPerByte. We found that 8 is
used as "size of a byte in bits" in Selection DAG, the asm printer, and
analysis and transformation passes. Some of these passes are currently
independent of any target-specific information. Downstream, we changed
about ten passes before our testing succeeded, but we might have missed
some cases because our tests are incomplete.
3. &255 and other bit manipulations. We didn't catch many of those with our
downstream testing. But again, at the moment, our tests are not
sufficiently good to support any claims here.
4. The concept of a byte should probably be introduced to Type.h. The
assumption that Type::getInt8Ty returns the type for a byte is baked into
the code generator, builtins (notably memcpy and memset) and more than ten
analysis and transformation passes.
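
A minimal sketch of what scope items 1-3 amount to in practice, written as
standalone C++ rather than against the real LLVM API. DataLayoutSketch,
sizeInBytes and byteMask are hypothetical names invented for this
illustration; only the BitsPerByte name comes from the list above. The
sketch assumes the byte width fits in 64 bits, which would not hold for a
257-bit byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the data layout; not the real llvm::DataLayout.
struct DataLayoutSketch {
  unsigned BitsPerByte; // 8 on conventional targets
};

// Replaces the pattern `(Bits + 7) / 8`: number of bytes needed for Bits.
inline uint64_t sizeInBytes(const DataLayoutSketch &DL, uint64_t Bits) {
  return (Bits + DL.BitsPerByte - 1) / DL.BitsPerByte;
}

// Replaces the pattern `Value & 255`: mask selecting the lowest byte.
// Assumes 1 <= BitsPerByte <= 64 so the mask fits in uint64_t.
inline uint64_t byteMask(const DataLayoutSketch &DL) {
  assert(DL.BitsPerByte >= 1 && DL.BitsPerByte <= 64);
  return DL.BitsPerByte == 64 ? ~uint64_t{0}
                              : (uint64_t{1} << DL.BitsPerByte) - 1;
}
```

With BitsPerByte set to 8 both helpers reduce to the familiar constants,
which is why the hard-coded forms go unnoticed on existing targets.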

It is worth noting that these changes should apply to upcoming patches as
well as to existing ones. If we decide to move forward, developers should
no longer assume that a byte is 8 bits wide, except in target-dependent
pieces of code.
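
As an illustration of the kind of assumption at stake, here is a standalone
C++ sketch of the byte-splat computation that a memset expansion typically
needs. Both function names are hypothetical and nothing here is LLVM API;
the generic variant assumes the word width is at most 64 bits and a
multiple of the byte width.

```cpp
#include <cassert>
#include <cstdint>

// Hard-codes 8-bit bytes: replicate the low 8 bits across a 32-bit word.
inline uint32_t splatByteAssuming8(uint32_t Value) {
  return (Value & 0xFF) * 0x01010101u;
}

// Byte-width-agnostic variant: replicate the low BitsPerByte bits of Value
// across a WordBits-wide word.
// Assumes BitsPerByte >= 1, WordBits <= 64, and WordBits % BitsPerByte == 0.
inline uint64_t splatByte(uint64_t Value, unsigned WordBits,
                          unsigned BitsPerByte) {
  assert(BitsPerByte >= 1 && WordBits <= 64 && WordBits % BitsPerByte == 0);
  uint64_t Mask = BitsPerByte >= 64 ? ~uint64_t{0}
                                    : (uint64_t{1} << BitsPerByte) - 1;
  uint64_t Result = 0;
  for (unsigned Shift = 0; Shift < WordBits; Shift += BitsPerByte)
    Result |= (Value & Mask) << Shift;
  return Result;
}
```

The first helper is correct only when a byte is 8 bits wide; the second
gives the same answer in that case and still works for a 16-bit byte.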

The options we have.
1. Perform items 1-4 of the scope without any upstream testing. This seems
a very fragile solution to me. Without any non-8-bit target upstream, it's
unlikely that contributors will differentiate between getInt8Ty() and
getByteTy(). So I guess that after a couple of months we'll get a mix of 8s
and BitsPerBytes in the code, and no test will regress. The remedy is
probably an active downstream contributor who tracks trunk and checks new
patches against their own tests daily.
2. Test with a dummy target. It might work if we have a group of
contributors who are willing to rewrite and upstream some of their
downstream tests as well as to design and implement the target itself. The
issue here might be functional tests: we'd probably need to implement a
dummy virtual machine to run them, because lit tests are unlikely to catch
all issues from items 2 and 3 of the scope described above.
3. TON Labs can provide its crazy target or some lightweight version of it.
From the testing point of view, it works similarly to the second solution,
but it doesn't require inventing anything new. I could create a separate
RFC about the target to find out if the community thinks it's appropriate.
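
The fragility described in option 1 can be sketched in standalone C++: a
helper that hard-codes 8 and one that uses the byte width give identical
results on every 8-bit-byte target, so only a test that runs with a
different byte width can tell them apart. All names below are hypothetical
and are not taken from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Offset in bytes of element Index, with the byte width hard-coded to 8.
inline uint64_t elementOffsetHardcoded(uint64_t ElemBits, uint64_t Index) {
  return Index * ((ElemBits + 7) / 8);
}

// Same computation parameterized on the target's byte width.
inline uint64_t elementOffset(uint64_t ElemBits, uint64_t Index,
                              unsigned BitsPerByte) {
  return Index * ((ElemBits + BitsPerByte - 1) / BitsPerByte);
}

// True when the two helpers cannot be told apart for this input.
inline bool helpersAgree(uint64_t ElemBits, uint64_t Index,
                         unsigned BitsPerByte) {
  return elementOffsetHardcoded(ElemBits, Index) ==
         elementOffset(ElemBits, Index, BitsPerByte);
}
```

A test suite that only ever passes 8 as the byte width exercises the case
where the helpers agree, which is the situation option 1 would leave
upstream in.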

--
Kind regards, Dmitry.

On Sat, Oct 26, 2019 at 4:56 AM Chris Lattner <clattner at nondot.org> wrote:
>
>
>
> > On Oct 24, 2019, at 4:02 PM, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > On 24/10/2019 14:21, JF Bastien via llvm-dev wrote:
> >> I’d like to understand what programming model you see programmers
> >> using. You don’t need 257 bits per byte if you only offer 257 bit
> >> integers. Rather, bytes aren’t really a thing at that point. LLVM kinda
> >> handles iN already, and your backend would legalize everything to
> >> exactly this type and nothing else, right? Would it be sufficient to
> >> expose something like int<unsigned Size> with Size=257 for your
> >> programming environment?
> >
> > To add to what JF says:
> >
> > Typically, a byte means some combination of:
> >
> > 1. The smallest unit that can be indexed in memory (irrelevant for you,
> > you have no memory).
> > 2. The smallest unit that can be stored in a register in such a way
> > that its representation is opaque to software (i.e. you can't tell the
> > bit order of a byte in a multi-byte word).  For you, it's not clear if
> > this is 257 bits or something smaller.
> > 3. The smallest unit that is used to build complex types in software.
> > Since you have no memory, it's not clear that you can build structs or
> > arrays, and therefore this doesn't seem to apply.
> >
> > From your description of your VM, it doesn't sound as if you can
> > translate from any language with a vaguely C-like abstract machine, so
> > I'm not certain why the size of a byte actually matters to you.  LLVM IR
> > has a quite C-like abstract machine, and several of these features seem
> > like they will be problematic for you.  There is quite a limited subset
> > of LLVM IR that can be expressed for your VM and it would be helpful if
> > you could enumerate what you expect to be able to support (and why going
> > via LLVM is useful, given that you are unlikely to be able to take
> > advantage of any existing front ends, many optimisations, or most of the
> > target-agnostic code generator).
>
> Right.  A 257-bit target is a bit crazy, but there are lots of other
> targets that only have 16-bit or 32-bit addressable memory.   I’ve heard
> various people saying that they all have out-of-tree patches to support
> non-8-bit-byte targets, but because there is no in-tree target that uses
> them, it is very difficult to merge these patches up stream.
>
> I for one would love to see some of these patches get upstreamed.  If the
> only problem is one of testing, then maybe we could make a virtual target
> exist, or maybe we could accept the patches without test cases (so long as
> they don’t break 8-bit-byte targets obviously).
>
> -Chris
>