[llvm-dev] RFC: On non 8-bit bytes and the target for it

Fri Oct 25 08:46:37 PDT 2019

Hi Dmitriy,

I can confirm that Ericsson remains interested in the byte-size issue.
We would be more than happy to contribute/collaborate on patches,
suggestions and reviews in that area, should your upstreaming effort
win community approval.

Best regards, Jesper

On Fri, 2019-10-25 at 13:44 +0200, Dmitriy Borisenkov via llvm-dev
wrote:
> Just to clarify, the VM doesn't have memory indeed, but we emulate
> the memory with dictionaries (address -> value) which are native to
> TVM. Thus you can work with arrays and structures in TVM.
> However, access to a dictionary is very expensive in terms of gas
> (the fee you pay for a contract execution in the blockchain). We
> really don't want to have unaligned memory accesses and things like
> that. Aside from that, our "ALU" only supports 257-bit operations,
> and handling overflows of smaller types is an additional expense for
> a user. So we set sizeof(char) == sizeof(short) == sizeof(int) ==
> sizeof(long) == sizeof(long long) == 1 byte == 257 bits in C.
> Luckily, the C spec allows it. We do not have a specification
> requirement to do so, but we found it natural from an implementation
> and user-experience point of view.
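
To make the overflow cost mentioned above concrete, here is a minimal Python sketch. It assumes two's-complement wrapping for a narrower C type layered on top of a single native 257-bit addition. The helper names (wrap_signed, add_native, add_narrow) are hypothetical and are not part of TVM or LLVM; Python integers merely stand in for the VM's values.

```python
NATIVE_BITS = 257  # native operation width quoted in the message


def wrap_signed(value, bits):
    """Reduce value to a two's-complement signed integer of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):  # sign bit set, so the value is negative
        value -= 1 << bits
    return value


def add_native(a, b):
    # Hypothetical native add: one operation, assuming the result fits
    # into NATIVE_BITS.
    return a + b


def add_narrow(a, b, bits):
    # A narrower type needs an extra wrapping step after the native add.
    # That extra step is the additional expense for smaller types.
    return wrap_signed(add_native(a, b), bits)
```

If every C integer type is as wide as the native operation, add_native alone is enough and the wrapping step disappears.
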
> 
> Our goal is to allow using general-purpose languages to develop smart
> contracts, since we believe it was a shortcoming of Ethereum to focus
> solely on Solidity. That's why we decided to use LLVM. As for the
> LLVM specification coverage, at the moment we support operations with
> memory (they are probably not well tested yet, but there is a bunch
> of tests on arrays) and structures, all integer arithmetic and
> bitwise operations, control-flow instructions excluding exception
> handling stuff and indirectbr, comparisons, extensions and
> truncations (we do have smaller values than i257 that are stored in
> persistent memory, where a user pays for data storage; but persistent
> memory is a different story: it will likely become a different
> address space in the future, but for now it's only accessible through
> intrinsics). We also support memcpy and memset in non-persistent
> memory.
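
As an illustration of dictionary-emulated memory with memset and memcpy on top of it, here is a minimal Python sketch. It is not the TVM backend's implementation. The class DictMemory, the access counter, and the choice to read unset cells as 0 are all assumptions made for the example; the counter only hints at why each access matters when every dictionary operation costs gas.

```python
class DictMemory:
    """Hypothetical cell-addressed memory: one address holds one value."""

    def __init__(self):
        self.cells = {}    # address -> value
        self.accesses = 0  # number of dictionary operations performed

    def load(self, addr):
        self.accesses += 1
        return self.cells.get(addr, 0)  # assumption: unset cells read as 0

    def store(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


def memset(mem, dst, value, count):
    """Store `value` into `count` consecutive cells starting at `dst`."""
    for i in range(count):
        mem.store(dst + i, value)


def memcpy(mem, dst, src, count):
    """Copy `count` cells from `src` to `dst` (ranges must not overlap)."""
    for i in range(count):
        mem.store(dst + i, mem.load(src + i))
```

Because each address holds a whole value, a scalar never straddles two cells, so there is no unaligned access to handle in this model.
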
> 
> As for Slices, Builders and the rest, we aren't crazy enough to
> propose upstreaming them - they are very specific to our VM. They are
> an implementation detail at the moment - we introduced these
> entities as types basically because of time pressure on the project.
> We want to switch to opaque types if that's possible without losing
> the correctness of our backend. If it's impossible, well, we will
> probably start looking for a way to change the framework so that a
> target could introduce its own type, but I really hope it won't be
> the case.
> 
> So the scope of the changes we'd like to introduce is:
> 1. Getting rid of the byte-size assumption in LLVM and Clang (adding
> byte size to the data layout, removing the magic number 8 (where it
> means the size of a byte) from LLVM and Clang, introducing the notion
> of a byte for memcpy and memset). The C spec doesn't have this
> constraint, so I'm not sure that LLVM should be more restrictive here.
> 2. Adding support for stack machines in the backend (generalizing
> algorithms for converting register-based instructions to stack-based
> ones, a generic implementation of scheduling appropriate for a
> stack machine, and an implementation of stack-aware (i.e.
> configurable) reassociation). It was discussed during the BoF talk at
> the recent conference. We are going to summarize the results soon.
> 3. The backend itself.
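
Item (1) above can be pictured with a small Python sketch in which the byte width is a field of the layout rather than a hard-coded 8. The class and method names are hypothetical stand-ins and do not mirror LLVM's actual DataLayout API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLayout:
    """Hypothetical layout record that carries the target's byte width."""

    bits_per_byte: int = 8  # conventional default

    def store_size_in_bytes(self, type_bits: int) -> int:
        # Round up to whole bytes. With bits_per_byte == 8 this is the
        # familiar (bits + 7) // 8, but no 8 is baked into the formula.
        return (type_bits + self.bits_per_byte - 1) // self.bits_per_byte
```

With DataLayout(bits_per_byte=257), a 257-bit integer occupies exactly one byte, which matches the sizeof relationships described earlier in the thread.
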
> 
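
For item (2), converting register-based instructions to stack-based ones, a deliberately simplified Python sketch follows. It assumes a hypothetical three-address form (dest, op, lhs, rhs) in which every temporary is defined once and used once, so the code forms a tree. Real stackification has to handle multiple uses, scheduling and stack manipulation, none of which is modelled here.

```python
# Operations understood by the toy stack evaluator below.
OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


def to_stack_code(instrs, result):
    """Emit postfix stack code that computes the temporary `result`."""
    defs = {dest: (op, lhs, rhs) for dest, op, lhs, rhs in instrs}
    out = []

    def emit(name):
        if name in defs:
            op, lhs, rhs = defs[name]
            emit(lhs)  # operands first, left to right
            emit(rhs)
            out.append((op,))
        else:
            out.append(("push", name))  # an input value, not a temporary

    emit(result)
    return out


def run_stack(code, env):
    """Evaluate stack code against named input values in `env`."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(env[instr[1]])
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[instr[0]](lhs, rhs))
    return stack.pop()
```

For example, the register-style sequence t1 = add a, b; t2 = mul t1, c becomes push a, push b, add, push c, mul.
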
> So basically, we believe that (1) is beneficial for Embecosm,
> Ericsson and other companies that were actively involved in
> previous iterations of the non-8-bit byte discussion. (3) addresses
> the main concern of the community: the testability of these
> changes. (2) benefits WebAssembly and any further stack machines
> implemented in LLVM.
> 
> --
> Kind regards, Dmitry
> 
> On Fri, Oct 25, 2019 at 1:02 AM David Chisnall <
> David.Chisnall at cl.cam.ac.uk> wrote:
> > On 24/10/2019 14:21, JF Bastien via llvm-dev wrote:
> > > I’d like to understand what programming model you see programmers
> > > using. You don’t need 257 bits per byte if you only offer 257 bit
> > > integers. Rather, bytes aren’t really a thing at that point. LLVM
> > > kinda handles iN already, and your backend would legalize
> > > everything to exactly this type and nothing else, right? Would it
> > > be sufficient to expose something like int<unsigned Size> with
> > > Size=257 for your programming environment?
> > 
> > To add to what JF says:
> > 
> > Typically, a byte means some combination of:
> > 
> > 1. The smallest unit that can be indexed in memory (irrelevant for
> > you, you have no memory).
> > 2. The smallest unit that can be stored in a register in such a way
> > that its representation is opaque to software (i.e. you can't tell
> > the bit order of a byte in a multi-byte word).  For you, it's not
> > clear if this is 257 bits or something smaller.
> > 3. The smallest unit that is used to build complex types in
> > software. Since you have no memory, it's not clear that you can
> > build structs or arrays, and therefore this doesn't seem to apply.
> > 
> > From your description of your VM, it doesn't sound as if you can
> > translate from any language with a vaguely C-like abstract machine,
> > so I'm not certain why the size of a byte actually matters to you.
> > LLVM IR has a quite C-like abstract machine, and several of these
> > features seem like they will be problematic for you.  There is quite
> > a limited subset of LLVM IR that can be expressed for your VM and it
> > would be helpful if you could enumerate what you expect to be able
> > to support (and why going via LLVM is useful, given that you are
> > unlikely to be able to take advantage of any existing front ends,
> > many optimisations, or most of the target-agnostic code generator).
> > 
> > David
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev