[llvm-dev] RFC: On non 8-bit bytes and the target for it

Thu Oct 31 04:17:37 PDT 2019

David, just to clarify a misconception I might have introduced: our target
does not have linear memory; all data is stored as a trie. We do, however,
support arrays, structures and GEPs, as well as all relevant C features,
by modeling memory.

So regarding the concepts of a byte, all five statements you gave hold for
our target, either because of the specification or because of performance
(gas consumption). But if there are architectures that need less from the
notion of a byte, we should try to find the common denominator. It's
probably OK to be less restrictive about what a byte is.

--
Kind regards, Dmitry

On Wed, Oct 30, 2019 at 5:19 PM David Chisnall via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 30/10/2019 10:07, Jeroen Dobbelaere via llvm-dev wrote:
> > We (Synopsys ASIP Designer team) and our customers tend to disagree: our
> > customers do create plenty of CPU architectures with non-8-bit characters
> > (and non-8-bit addressable memories). We are able to provide them with a
> > working C/C++ compiler solution. Maybe some support libraries are not
> > supported out of the box, but for these kinds of architectures that is
> > acceptable.
> > (Besides that, LLVM is also more than just C/C++.)
>
> My main concern in this discussion is that we're conflating several
> concepts of a 'byte':
>
>   - The smallest unit that can be loaded / stored at a time.
>
>   - The smallest unit that can be addressed with a raw pointer in a
> specific address space.
>
>   - The largest unit whose encoding is opaque to anything above the ISA.
>
>   - The type used to represent `char` in C.
>
>   - The type that has a size that all other types are a multiple of.
>
> In POSIX C (which imposes some extra constraints not found in ISO C),
> when lowered to LLVM IR, all of these are the same type:
>
>   - Loads and stores of values smaller than i8 or not a multiple of i8
> may be widened to a multiple of i8.  Bitfield fields that are smaller
> than i8 must use i8 or wider operations and masking.
>
>   - GEP indexes are not well defined for anything that is not a multiple
> of i8.
>
>   - There is no defined bit order of i8 (or bit order for larger types,
> only an assumption that, for example, i32 is 4 i8s in a specific order
> specified by the data layout).
>
>   - char is lowered to i8.
>
>   - All ABI-visible types have a size that is a multiple of 8 bits.
>
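
A minimal C sketch of how some of the assumptions in the list above can
be written down, for illustration only. The helper store_field3 and the
3-bit field at bit offset 2 are hypothetical; the only external fact
relied on is that POSIX requires CHAR_BIT to be 8, whereas ISO C only
requires at least 8.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* POSIX fixes CHAR_BIT at 8; ISO C alone would allow larger values. */
static_assert(CHAR_BIT == 8, "this sketch assumes an 8-bit char");
/* sizeof counts in units of char, so every object size is a multiple of it. */
static_assert(sizeof(char) == 1, "char is the unit of sizeof");
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "a 32-bit integer is four 8-bit units");

/* Hypothetical helper: update a 3-bit field at bit offset 2 inside one byte.
   Nothing narrower than the byte can be stored, so the write becomes
   load, mask, merge, store. */
static void store_field3(uint8_t *byte, unsigned value) {
    const uint8_t mask = (uint8_t)(0x7u << 2);
    *byte = (uint8_t)((*byte & (uint8_t)~mask) | ((value << 2) & mask));
}
```

Starting from a byte holding 0xFF, store_field3(&b, 2) leaves 0xEB: only
bits 2..4 change, and the rest of the byte is preserved by the mask.
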
> It's not clear to me whether saying 'a byte is 257 bits' means changing all
> of these to 257 or changing only some of them to 257 (which?).  For
> example, when compiling C for 16-bit-addressable historic
> architectures, typically:
>
>   - char is 8 bits.
>
>   - char* and void* are represented as a pointer plus a 1-bit offset
> (sometimes encoded in the low bit, so the load / store sequence is a
> right shift by one, a load, and then a mask or mask and shift depending on
> the low bit).
>
>   - Other pointer types are 16-bit aligned.
>
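
A minimal C sketch of the char-pointer scheme in the list above, for
illustration only. The names word_t, charptr_t, load_char and store_char
are hypothetical, and the choice that a low bit of 0 selects the low half
of the word is an assumption; a real ABI could pick either half.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a memory that is addressable only in 16-bit words. */
typedef uint16_t word_t;

/* A char pointer is a word index shifted left by one; the low bit selects
   which 8-bit half of the word is meant (assumed: 0 = low half). */
typedef size_t charptr_t;

static uint8_t load_char(const word_t *mem, charptr_t p) {
    word_t w = mem[p >> 1];                  /* right shift by one, then load */
    return (p & 1) ? (uint8_t)(w >> 8)       /* odd: shift, then truncate */
                   : (uint8_t)(w & 0xFF);    /* even: mask only */
}

static void store_char(word_t *mem, charptr_t p, uint8_t c) {
    word_t w = mem[p >> 1];                  /* read the containing word */
    if (p & 1)
        w = (word_t)((w & 0x00FF) | ((word_t)c << 8));  /* replace high half */
    else
        w = (word_t)((w & 0xFF00) | c);                 /* replace low half */
    mem[p >> 1] = w;                         /* write the whole word back */
}
```

Note that a char store is a read-modify-write of the whole word, which is
one reason the "smallest loadable unit" and the "type of char" need not be
the same thing on such a machine.
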
> IBM's 36-bit word machines use a broadly similar strategy, though with
> some important differences, and I would imagine that most Synopsys cores
> are going to use some variation on this approach.
>
> This probably involves quite a different design from a model with 257-bit
> registers, but most of these concerns don't arise if you don't have memory
> that can store byte arrays, which leads to very different design decisions.
>
> TL;DR: A proposal for supporting non-8-bit bytes needs to explain what
> their expected lowerings are and what they mean by a byte.
>
> David