[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems

Fri Aug 7 14:35:34 PDT 2015

Hello.

>> It is stack VM, and one designed to utilize all the advantages of the assembly language implementation.
> This sounds very, very familiar.  Are you willing to share which 
> VM/language you're working on?
I would like to because I think the additional context would be helpful...let me ask for permission first.

>> doing this using using C (by C function calls, CPS) led to significant performance loss.
> I assume the continuation is a tail call?  If so, have you examined 
> where the performance loss originated?  An obvious tail call in C code 
> being compiled by modern Clang should be code generated as a tail call.  
> You might be stumbling across an implementation limit and adjusting your 
> input slightly might bypass that.
Yes, both executor and it's continuation are tail call. And we tried gcc 4.8, 4.9, 5.1 
and clang 3.6 C compilers. All are good to infer a tail call if the function call is explicit. But if 
a continuation function pointer is popped up from VM stack, neither of these were able to produce a jump,
leading to machine stack overflow. Not sure why clang wasn't able to do this, because while using 
LLVM IR it works (using a test code, of course).

>> We are considering extending LLVM by creating a special calling convention which 
>> forces a function (using this convention) to pass args in registers and to
>> be force tail-call optimized.
> You absolutely will need a custom calling convention for the register 
> assignments and such.  If your source IR uses musttail, in principal you 
> shouldn't need to do anything special for the tail calls provided you're 
> running on one of the architectures where musttail has been implemented.
musttail has some limitations (ex., the caller and callee prototypes must match), but tail 
with -tailcallopt work just fine.
Looks like we were on the right track with that question. Thanks!

> Creating a new calling convention is easy.  It will require a custom 
> LLVM build to get started since you have to change td and cpp files in 
> the target.  For examples, see the existing ones in 
> Target/X86/X86CallingConv.td
Yes, we already have a custom build. Thanks!

>> 2. Because the existing VM runtime is written in x86 assembly, and doesn't do function calls, it uses ESP register for VM stack purposes
>> (again, it is not in use for low-level calls). We want to do the same.
>This will be tricky... Do you absolutely absolutely need this?
We still have to support x86 32-bit and this arch has a lack of GP registers, and 
a) esp register becomes almost unused, 
b) we will have to do stack operations using other regs, which may lead to more spilling. 
So, it's good to have the esp reg doing what is has to...but for VM stack.
We don't want to keep original x86 assembly version along with our new one llvm-based (hopefully we will make it).

>> We think that it could be implemented as intrinsics as well? Or perhaps we should create intrinsics for arbitrary machine stack access?
>> We tried, for example, stacksave-sub-store-stackrestore sequence, but it never folds into a single push operation.
> I would suggest just implementing your virtual machine stack as a normal 
> bit of memory.  There's no reason that the compiler needs to know that 
> this is the VM stack versus some other buffer.  You will need to provide 
> aliasing facts, but that's a much smaller extension*.  Trying to much 
> with the frames at runtime using the intrinsics is going to end very badly.
> * Don't under estimate this point.  Providing aliasing metadata will be 
> *really* important for this scheme to work reasonably well.  You may 
> need to add custom extensions locally or propose extensions upstream to 
> encode the information you need.
Could you please be more specific? Pointing to some docs or examples will work just fine. :)

> Another option: If you know the size of your vm frame statically, you 
> can emit loads from the vm stack locations into SSA values (or allocas, 
> which will become SSA values) and spill as needed to ensure the VM stack 
> is up to date as required by your language requirements.  This will 
> likely a) decrease your dependence of the pass ordering above, and b) 
> give slightly better results since LLVM is going to have to be 
> conservative about calls into your runtime and your custom lowering gets 
> to use language specific knowledge.
Hmm.. this sounds interesting. thanks.
So, if I understand correctly, if we need to allocate a VM stack frame, the idea is to create 
enough allocas, then store there values which need to be in the VM frame? 
But how can it survive optimizing passes?
(I assume that we did stackload before and esp points to VM stack)
Could you please explain more? 

>> 3. Since the machine stack is a VM stack, we are not allowed to use alloca. It's not a problem, but the machine register allocator/spiller
>> can still use the machine stack for register spilling purposes.
>> How could this be solved? Should we provide our own register allocator? Or could it be solved by providing a machine function pass,
>> which will run on a function marked with our calling conv and substitute machine instructions responsible for spilling to stack
>> with load/store instructions to our heap memory?
> I don't understand what you're trying to ask here.  If you can spill to 
> the machine frame (instead of the VM stack frame), what's the problem?
I mean that if we do stackload and the machine stack points to the VM stack (and we 
somehow solved the problem above), LLVM still wants to spill regs to stack. 
It would be good to have the spill slots in the VM context, but not in the stack 
(neither machine nor VM).

>> Thank you for your time.
> At a meta level, let me give my standard warning: implementing a 
> functional compiler for a language of your choice on LLVM is relatively 
> easy; your looking at around 1-3 man years of effort depending on the 
> language.  implementing a *performant* compiler is far, far harder.  
> Unless you're willing to budget for upwards of 10 man years of skilled 
> compiler engineering time, you may need to adjust your expectations.  
> How hard the problem will be also depends on how good your current 
> implementation is of course.  :)
I appreciate the warning. 
Strictly speaking, we don't implement the compiler itself. It's only a runtime 
for interpreting bytecodes compiled earlier. JIT is upcoming, but not for now.
It's also not including a memory manager - it works good enough written in C.
Our task is really not easy, but not so hard as you think. :) We think. :)

> To give a flavour for the tuning involved, you might find this document 
> helpful:
> http://llvm.org/docs/Frontend/PerformanceTips.html
> If you're serious about the project, I highly recommend that you make an 
> effort to attend the developers conference in Oct.  You'll want to have 
> a bunch of high bandwidth conversations with people who've been down 
> this road before and email just doesn't quite work for that.
Thanks! We will keep this in mind.

_____________________________________________________________
Are you a Techie? Get Your Free Tech Email Address Now! Visit http://www.TechEmail.com