[cfe-dev] An unlimited-width integer type

Fri Nov 8 00:25:32 PST 2019

Hi,

This is roughly the model that Smalltalk provides for integers, and I 
did implement a Smalltalk compiler using LLVM some time ago...

A few thoughts come to mind:

1. C and related languages encode fixed-width types in their abstract 
machine (in C, the width of types is baked by the time the preprocessor 
has run for a lot of code, which is why the EFI bytecode design was such 
a bad idea).  Your VM wouldn't be useable from any existing LLVM front 
ends without some significant changes, which is not necessarily a reason 
*not* to do this, but may limit the utility.

2. LLVM has fixed-width types and intrinsics for checking overflow.  If 
you expect that *most* of the operations in your source language(s) will 
fit into an LLVM fixed-width integer type that corresponds to a machine 
register, then you may find that LLVM does quite a good job of 
optimising your fast paths (though leaves your slow paths untouched) if 
you implement a compiler for your abstract machine *to* LLVM.  Note that 
deferring overflow checks for as long as possible and failing back to 
the fast path to redo calculations will generate better code for this.

3. If you're willing to do some extra work, you could try introducing an 
iUnlimited type and see how invasive the change is.  There are a number 
of places where optimisations query bit widths and other places where 
APInt and friends encode a fixed width for a particular compile-time 
evaluation.  There are probably other places where known-bits analyses 
could allow you to transform iUnlimited to a fixed-width type.  You 
could then provide a default lowering that mapped them to pointers and 
calls to a bignum library, so existing back ends could use them.  I can 
imagine that this would be useful to some front ends that would prefer 
to defer lowering of variable-width integers until after some 
optimisations.  I can't guarantee that this would be something that 
upstream would accept, but I'd love to see someone try the experiment.

4. You may find that doing this in MLIR works better.  MLIR has an open 
type system, so you can add a new type for your integers quite easily, 
though then you won't benefit from any LLVM optimisations.  It's not 
clear that you could necessarily use the generic LLVM backend 
infrastructure though, so a direct translation from MLIR to your 
bytecode may preferable.

David

On 07/11/2019 17:41, Levi Aul via cfe-dev wrote:
> Hi, I'm considering writing an LLVM backend to target an abstract 
> machine, encoding to a bytecode ISA.
> 
> This abstract machine has integers, but not in the sense of having 
> fixed-size machine words; rather, each abstract-machine register and 
> memory location can losslessly hold onto a distinct (non-aliasing) 
> unlimited bit-width value, and math/bitwise ops will operate losslessly 
> on these values to produce new values of unlimited bit-width, without 
> any need for overflow into neighbouring registers/memory locations.
> 
> A VM implementing this abstract machine would, of course, use some kind 
> of bignum library to implement these semantics, with each such integer 
> value actually being a pointer to a VM-heap-allocated bignum; or, 
> perhaps, the VM would do load-time optimization to strength-reduce some 
> of the bignums to regular host-CPU integer types.
> 
> But, from the perspective of a compiler targeting the abstract machine 
> itself, the ISA just "has" registers and ops of an 'iUNLIMITED' type.
> 
> Does LLVM, and specifically the LLVM IR type system, support anything 
> like this? If not, would it be the work of days, or of months, to make 
> it happen?
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>