[llvm-dev] RFC: On non 8-bit bytes and the target for it

Dmitriy Borisenkov via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 23 02:16:41 PDT 2019

This RFC asks whether the community is interested in further discussing
support for iN bytes (bytes that are not 8 bits wide). The issue was last on
the agenda in May, when the discussion was triggered by Jesper Antonsson's
patches (see

It seems that, while some downstream projects benefit from non-8-bit byte
support, the feature is barely maintainable given the lack of upstream
targets that use it. The reason I would like to raise the matter again is
that we, the TON Labs team, would like to upstream our backend.

The backend generates code for the TON Virtual Machine (TVM), which is
designed to run smart contracts on the TON blockchain (see the original
specifications for TVM and TON at https://test.ton.org/tvm.pdf and
https://test.ton.org/tblkch.pdf, respectively).

The target has the following key particularities:

   - stack-based virtual machine
   - 257-bit wide signed integers
   - no floating-point arithmetic support
   - persistent storage
   - no "native" memory; modeling it is possible but costly
   - presence of custom types (this is precisely the reason for upstreaming)
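
Why 257 bits rather than 256 can be shown with a small sketch. Python is
used here only because it has arbitrary-precision integers; the helper
`signed_range` is hypothetical and not part of the TVM toolchain. One extra
sign bit lets a 257-bit signed integer hold every unsigned 256-bit value
(and its negation), whether the sign is encoded as two's complement or as
sign-magnitude; only the minimum value differs between the two encodings.

```python
UINT256_MAX = (1 << 256) - 1  # largest unsigned 256-bit value


def signed_range(bits: int, representation: str = "twos_complement") -> tuple:
    """Return (min, max) for a signed integer that is `bits` wide."""
    if bits < 2:
        raise ValueError("need at least one sign bit and one magnitude bit")
    magnitude_bits = bits - 1                # one bit is spent on the sign
    max_value = (1 << magnitude_bits) - 1
    if representation == "twos_complement":
        min_value = -(1 << magnitude_bits)   # asymmetric range
    elif representation == "sign_magnitude":
        min_value = -max_value               # symmetric range, two zeros
    else:
        raise ValueError(f"unknown representation: {representation}")
    return min_value, max_value


if __name__ == "__main__":
    for rep in ("twos_complement", "sign_magnitude"):
        lo, hi = signed_range(257, rep)
        # Both encodings cover all of [-UINT256_MAX, UINT256_MAX].
        print(rep, hi >= UINT256_MAX, lo <= -UINT256_MAX)  # prints "... True True"
```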

Since TVM operates only on 257-bit numbers, we changed LLVM downstream to
use a 257-bit byte. At the moment, we have a hacky implementation with the
new byte size hardcoded. For reference, this required changes to
approximately 20 files in LLVM and about a dozen in Clang. Later on, we plan
to integrate the new byte size with the data layout, following
https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/. If the
community decides to move forward, we will upstream and maintain it.
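
As an illustration of what "hardcoded byte size" means: LLVM's
`DataLayout::getTypeStoreSize` rounds a type's bit width up to whole 8-bit
bytes, and similar arithmetic with a literal 8 appears elsewhere. The sketch
below parameterizes that rounding with a hypothetical `byte_bits` argument;
it is not LLVM code, just the arithmetic a configurable byte width implies.

```python
def store_size(type_bits: int, byte_bits: int = 8) -> int:
    """Bytes needed to store `type_bits` bits: ceil(type_bits / byte_bits)."""
    if type_bits <= 0 or byte_bits <= 0:
        raise ValueError("sizes must be positive")
    return (type_bits + byte_bits - 1) // byte_bits


def size_in_bits(num_bytes: int, byte_bits: int = 8) -> int:
    """Convert a byte count (e.g. a memcpy length) back to bits."""
    return num_bytes * byte_bits


if __name__ == "__main__":
    for bits in (1, 64, 256, 257, 514):
        # e.g. i257 -> 33 bytes with 8-bit bytes, 1 byte with 257-bit bytes
        print(f"i{bits}: {store_size(bits)} x 8-bit, {store_size(bits, 257)} x 257-bit")
```

With a 257-bit byte, every TVM integer occupies exactly one byte, which is
why the byte width rather than the integer types is the natural thing to
change.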

We realize that a 257-bit byte is quite unusual, but for smart contracts it
is normal to have numbers at least 256 bits wide. The leading VM for smart
contracts, the Ethereum VM, introduced this practice, and other blockchain
VMs followed. Thus, while TVM might be the first LLVM-based blockchain
target that needs the feature, it is not necessarily the last one. We also
found mentions of 12-, 16- and 24-bit bytes in past discussions of non-8-bit
bytes (in reverse chronological order:

Our toolchain is going to be based only on open-source software, so the
backend can be used without obtaining any proprietary software. We also hope
that implementing a target like TVM will help generalize some concepts in
LLVM and make the whole framework better suited to non-mainstream
architectures.

Apart from non-8-bit bytes, we would also like to bring stack machine
support to the target-independent code generator. This will be discussed at
the developers' meeting, see
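
To illustrate what targeting a stack machine means for code generation,
here is a toy sketch: it is not LLVM's code generator and not the real TVM
instruction set (the opcode names `PUSHINT`, `PUSHVAR`, `ADD`, `SUB`, `MUL`
are illustrative only). Without registers, an expression tree is lowered by
post-order traversal, so operands are pushed before the operator that
consumes them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # illustrative opcodes: "ADD", "SUB", "MUL"
    lhs: Expr
    rhs: Expr


Expr = Union[Const, Var, BinOp]


def lower(expr: Expr) -> list:
    """Emit stack code by post-order traversal: operands first, then operator."""
    if isinstance(expr, Const):
        return [("PUSHINT", expr.value)]
    if isinstance(expr, Var):
        return [("PUSHVAR", expr.name)]
    return lower(expr.lhs) + lower(expr.rhs) + [(expr.op, None)]


def run(program: list, env: dict) -> int:
    """Interpret the toy stack code; binary ops pop two values and push one."""
    binops = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "MUL": lambda a, b: a * b,
    }
    stack = []
    for opcode, operand in program:
        if opcode == "PUSHINT":
            stack.append(operand)
        elif opcode == "PUSHVAR":
            stack.append(env[operand])
        else:
            rhs, lhs = stack.pop(), stack.pop()  # top of stack is the right operand
            stack.append(binops[opcode](lhs, rhs))
    if len(stack) != 1:
        raise ValueError("malformed program")
    return stack[0]


if __name__ == "__main__":
    prog = lower(BinOp("MUL", BinOp("ADD", Var("a"), Var("b")), Var("c")))
    print(prog)
    print(run(prog, {"a": 2, "b": 3, "c": 4}))  # (2 + 3) * 4 = 20
```

A real stack-machine backend must also handle values used more than once,
which requires stack shuffling that this sketch ignores.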

LLVM and Clang for TVM are available at
https://github.com/tonlabs/TON-Compiler. The port is currently based on
LLVM 7 and can only produce assembly (we have not specified our object file
format yet). Moreover, we have introduced custom IR types to model the
Tuples, Slices, Builders and Cells from the specification. We are going to
update to a newer LLVM and consider using opaque types before starting to
upstream.
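
For readers unfamiliar with these TVM types, a rough model may show why
they do not map onto ordinary integer or pointer types. It follows the TVM
specification's description of a cell as holding up to 1023 data bits and up
to four references to other cells; the Python class and method names are
hypothetical, and this is not how our backend represents them in IR.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_CELL_BITS = 1023  # data bits per cell, per the TVM specification
MAX_CELL_REFS = 4     # references to other cells, per the TVM specification


@dataclass(frozen=True)
class Cell:
    """Immutable container of up to 1023 bits and up to 4 child cells."""
    bits: tuple
    refs: tuple = ()


class Builder:
    """Mutable, partially built cell; end_cell() freezes it into a Cell."""

    def __init__(self) -> None:
        self._bits = []
        self._refs = []

    def store_uint(self, value: int, width: int) -> Builder:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} unsigned bits")
        if len(self._bits) + width > MAX_CELL_BITS:
            raise OverflowError("cell data overflow")
        # Append bits most-significant first.
        self._bits.extend((value >> s) & 1 for s in range(width - 1, -1, -1))
        return self

    def store_ref(self, cell: Cell) -> Builder:
        if len(self._refs) >= MAX_CELL_REFS:
            raise OverflowError("too many cell references")
        self._refs.append(cell)
        return self

    def end_cell(self) -> Cell:
        return Cell(tuple(self._bits), tuple(self._refs))


class Slice:
    """Read cursor over an existing cell."""

    def __init__(self, cell: Cell) -> None:
        self._cell = cell
        self._bit_pos = 0
        self._ref_pos = 0

    def load_uint(self, width: int) -> int:
        end = self._bit_pos + width
        if end > len(self._cell.bits):
            raise ValueError("cell data underflow")
        value = 0
        for bit in self._cell.bits[self._bit_pos:end]:
            value = (value << 1) | bit
        self._bit_pos = end
        return value

    def load_ref(self) -> Cell:
        if self._ref_pos >= len(self._cell.refs):
            raise ValueError("no more references")
        ref = self._cell.refs[self._ref_pos]
        self._ref_pos += 1
        return ref


if __name__ == "__main__":
    child = Builder().store_uint(7, 3).end_cell()
    root = Builder().store_uint(0xAB, 8).store_ref(child).end_cell()
    s = Slice(root)
    print(s.load_uint(8), Slice(s.load_ref()).load_uint(3))  # 171 7
```

Because a cell is a bounded tree of bits and references rather than a flat
region of memory, it cannot be expressed as an ordinary integer or pointer
type, which is what motivates custom (or opaque) IR types.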

Kind regards, Dmitry Borisenkov