[llvm-dev] RFC: On non 8-bit bytes and the target for it

Fri Oct 25 04:44:39 PDT 2019

Just to clarify: the VM indeed has no memory of its own, but we emulate
memory with dictionaries (address -> value), which are native to TVM. Thus
you can work with arrays and structures in TVM.
However, access to a dictionary is very expensive in terms of gas (the fee
you pay for executing a contract on the blockchain). We really don't want to
deal with unaligned memory accesses and things like that. Aside from that,
our "ALU" only supports 257-bit operations, and handling overflows of smaller
types is an additional expense for the user. So we set sizeof(char) == sizeof(short)
== sizeof(int) == sizeof(long) == sizeof(long long) == 1 byte == 257 bits
in C. Luckily, the C spec allows it. Nothing in our specification requires
this, but we found it natural from both the implementation and the
user-experience point of view.
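
A minimal Python sketch of the idea, not the actual TVM implementation:
memory is a dictionary (address -> value), each cell holds one 257-bit
value, and every load or store counts as one dictionary access. The names
DictMemory, BYTE_BITS and wrap_signed are hypothetical, the signed range
-2^256 .. 2^256-1 is an assumption made for illustration, and the access
counter is only a stand-in for gas, not real pricing.

```python
BYTE_BITS = 257  # one addressable cell, as in sizeof(char) == 1 byte == 257 bits

# Assumed signed range of one 257-bit value (illustrative assumption).
INT_MIN = -(1 << (BYTE_BITS - 1))
INT_MAX = (1 << (BYTE_BITS - 1)) - 1


class DictMemory:
    """Hypothetical model of memory emulated as a dictionary (address -> value)."""

    def __init__(self):
        self.cells = {}
        self.accesses = 0  # stand-in for gas spent on dictionary operations

    def store(self, address, value):
        if not INT_MIN <= value <= INT_MAX:
            raise OverflowError("value does not fit in one 257-bit cell")
        self.accesses += 1
        self.cells[address] = value

    def load(self, address):
        self.accesses += 1
        return self.cells.get(address, 0)


def wrap_signed(value, bits):
    """Extra step needed to emulate overflow of a narrower signed type."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


mem = DictMemory()
# An int array of three elements occupies three cells, one access per store.
for i, v in enumerate([10, 20, 30]):
    mem.store(100 + i, v)
total = sum(mem.load(100 + i) for i in range(3))
```

With one value per cell, each array element costs exactly one dictionary
access per load or store, and no wrap_signed step is needed for int.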

Our goal is to allow using general-purpose languages to develop smart
contracts, since we believe it was a shortcoming of Ethereum to focus solely
on Solidity. That's why we decided to use LLVM. As for the LLVM specification
coverage, at the moment we support operations on memory (they are
probably not well tested yet, but there is a bunch of tests on arrays) and
structures, all integer arithmetic and bitwise operations, control-flow
instructions except exception handling and indirectbr, comparisons,
extensions and truncations (we do have values smaller than i257; they are
stored in persistent memory, where the user pays for data storage. But
persistent memory is a different story: it will likely become a
different address space in the future, but for now it is only accessible
through intrinsics). We also support memcpy and memset in non-persistent memory.
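
For readers less familiar with the extension and truncation operations
mentioned above, here is a small Python sketch of their semantics on
arbitrary widths. Python integers stand in for i257; the function names
mirror the LLVM IR instructions trunc, zext and sext but are otherwise
hypothetical helpers, not intrinsics of our backend.

```python
def trunc(value, bits):
    """Keep the low `bits` bits, returned as an unsigned bit pattern."""
    return value & ((1 << bits) - 1)


def zext(pattern, from_bits):
    """Zero-extend: the unsigned pattern is already the wider value."""
    return pattern & ((1 << from_bits) - 1)


def sext(pattern, from_bits):
    """Sign-extend an iN bit pattern to a wider signed value."""
    pattern &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (pattern ^ sign) - sign


# Round trip of a negative value through an 8-bit slot.
stored = trunc(-1, 8)
signed_back = sext(stored, 8)
unsigned_back = zext(stored, 8)
```

The round trip shows why the extension kind matters when a small stored
value is widened back to the native width: the same bit pattern yields a
different result under sext and zext.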

As for Slices, Builders and the rest, we aren't crazy enough to propose
upstreaming them: they are very specific to our VM. They are an
implementation detail at the moment; we introduced these entities as
types basically because of time pressure on the project. We want to switch
to opaque types if that is possible without losing the correctness of our
backend. If it is impossible, well, we will probably start looking for a way
to change the framework so that a target can introduce its own types, but
I really hope it won't come to that.

So the scope of the changes we'd like to introduce:
1. Getting rid of the byte size assumption in LLVM and Clang (adding the byte
size to the data layout, removing the magic number 8 (where it means the size
of a byte) from LLVM and Clang, introducing the notion of a byte for memcpy
and memset). The C spec doesn't have this constraint, so I'm not sure that
LLVM should be more restrictive here.
2. Adding support for stack machines in the backend (generalizing the
algorithms for converting register-based instructions to stack-based ones,
a generic implementation of scheduling appropriate for a stack machine,
and an implementation of stack-aware (i.e. configurable) reassociation). This
was discussed during a BoF at the recent conference. We are going to
summarize the results soon.
3. The backend itself.
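
To illustrate what item 1 means by removing the magic number 8, here is a
tiny Python sketch in which the byte width is a parameter instead of a
literal. store_size_in_bytes and byte_bits are hypothetical names chosen for
this example; they are not existing LLVM or Clang APIs.

```python
def store_size_in_bytes(type_bits, byte_bits=8):
    """Number of bytes needed to store an integer type of `type_bits` bits.

    `byte_bits` plays the role of a byte width taken from the data layout
    rather than a hard-coded 8.
    """
    if type_bits <= 0 or byte_bits <= 0:
        raise ValueError("widths must be positive")
    # Round up to a whole number of bytes.
    return (type_bits + byte_bits - 1) // byte_bits


i32_on_8bit_target = store_size_in_bytes(32)         # conventional target
i257_on_257bit_target = store_size_in_bytes(257, 257)  # byte == 257 bits
```

Code that writes the computation as (bits + 7) / 8 gives the same answer
only on targets where a byte is 8 bits.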

So basically, we believe that (1) is beneficial for Embecosm, Ericsson and
other companies that were actively involved in the previous iterations of
the non-8-bit byte discussion. (3) addresses the main concern of the
community: the testability of these changes. (2) benefits WebAssembly and
any further stack machines implemented in LLVM.
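
As a rough picture of the register-to-stack conversion in (2), here is a
deliberately naive Python sketch: it rebuilds the expression tree from
three-address instructions and emits it in post-order. This is a generic
textbook approach, not the algorithm used in our backend or in the
WebAssembly backend; REG_PROGRAM, to_stack_code, run_stack and the
push_arg/push_const opcodes are all hypothetical.

```python
# Register-style program: dest = op(lhs, rhs).
# Operands are virtual register names, input names, or integer constants.
REG_PROGRAM = [
    ("t0", "add", "a", "b"),
    ("t1", "mul", "t0", 3),
    ("t2", "sub", "t1", "c"),
]


def to_stack_code(program, result, inputs):
    """Emit stack code for `result` by post-order traversal of its definition.

    A register used more than once is simply recomputed here; a real
    backend would use stack manipulation instructions instead.
    """
    defs = {dest: (op, lhs, rhs) for dest, op, lhs, rhs in program}
    code = []

    def emit(operand):
        if operand in defs:
            op, lhs, rhs = defs[operand]
            emit(lhs)
            emit(rhs)
            code.append((op,))
        elif operand in inputs:
            code.append(("push_arg", operand))
        else:
            code.append(("push_const", operand))

    emit(result)
    return code


def run_stack(code, args):
    """Tiny stack interpreter used to check the converted code."""
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    stack = []
    for instr in code:
        if instr[0] == "push_arg":
            stack.append(args[instr[1]])
        elif instr[0] == "push_const":
            stack.append(instr[1])
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(ops[instr[0]](lhs, rhs))
    return stack.pop()


stack_code = to_stack_code(REG_PROGRAM, "t2", {"a", "b", "c"})
result = run_stack(stack_code, {"a": 1, "b": 2, "c": 4})
```

The interesting engineering is in what this sketch leaves out: values with
several uses, scheduling that keeps operands near the top of the stack,
and reassociation that is aware of stack depth.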

--
Kind regards, Dmitry

On Fri, Oct 25, 2019 at 1:02 AM David Chisnall <David.Chisnall at cl.cam.ac.uk>
wrote:

> On 24/10/2019 14:21, JF Bastien via llvm-dev wrote:
> > I’d like to understand what programming model you see programmers using.
> > You don’t need 257 bits per byte if you only offer 257 bit integers.
> > Rather, bytes aren’t really a thing at that point. LLVM kinda handles iN
> > already, and your backend would legalize everything to exactly this type
> > and nothing else, right? Would it be sufficient to expose something like
> > int<unsigned Size> with Size=257 for your programming environment?
>
> To add to what JF says:
>
> Typically, a byte means some combination of:
>
> 1. The smallest unit that can be indexed in memory (irrelevant for you,
> you have no memory).
> 2. The smallest unit that can be stored in a register in such a way that
> its representation is opaque to software (i.e. you can't tell the bit
> order of a byte in a multi-byte word).  For you, it's not clear if this
> is 257 bits or something smaller.
> 3. The smallest unit that is used to build complex types in software.
> Since you have no memory, it's not clear that you can build structs or
> arrays, and therefore this doesn't seem to apply.
>
>  From your description of your VM, it doesn't sound as if you can
> translate from any language with a vaguely C-like abstract machine, so
> I'm not certain why the size of a byte actually matters to you.  LLVM IR
> has a quite C-like abstract machine, and several of these features seem
> like they will be problematic for you.  There is quite a limited subset
> of LLVM IR that can be expressed for your VM and it would be helpful if
> you could enumerate what you expect to be able to support (and why going
> via LLVM is useful, given that you are unlikely to be able to take
> advantage of any existing front ends, many optimisations, or most of the
> target-agnostic code generator).
>
> David
>