[LLVMdev] Hash Table Virtual Calls with Conflict Resolution Stubs

Tue Jun 2 02:18:39 PDT 2015

Hi Stephen,

Have you looked at how Objective-C implements this?  There are at least four Objective-C runtimes that are open source, which all solve this problem in slightly different ways.  The GCC, GNUstep, Apple Legacy, Apple Modern, and ObjFW runtimes are all supported by clang, so you can take a look at the IR generation easily as well.

I’d be happy to answer questions about the GNUstep runtime off list,

David

> On 1 Jun 2015, at 00:01, Stephen Cross <scross at scross.co.uk> wrote:
> 
> Hi everyone,
> 
> I'm the developer of the Loci programming language (
> http://loci-lang.org ), for which the compiler is a front-end for
> LLVM. I would like to say that LLVM has been extremely useful for the
> development of the compiler and so thank you everyone for building
> this amazing system.
> 
> ---- Virtual Method Calls ----
> 
> While most aspects of the language map well onto LLVM IR, it seems
> non-trivial to generate the virtual method calls in LLVM. Loci uses a
> concept called 'structural typing' [1], which basically means classes
> don't have to explicitly implement interfaces; if a class has the
> methods required by the interface then the class -> interface cast is
> valid. A hash table is therefore used to represent the virtual method
> table of an object, in which a virtual method call indexes the hash
> table with a hash of the method name and then calls the function
> pointer it finds in the corresponding slot.
> 
> Clearly there can be clashes in the hash table where two (or more)
> method names map to the same hash table slot. There are various
> approaches for conflict resolution but the cheapest seems to be
> setting a hidden value (such as a register) on the caller side to a
> larger integer hash value (a large enough space in which collisions
> are assumed not to occur) and then generating a 'conflict resolution
> stub', which does something like:
> 
>   conflict_resolution_stub:
>          cmp eax, <METHOD 1 HASH>
>          jne .method_2
>   .method_1:
>          jmp method_1
>   .method_2:
>          jmp method_2
> 
> So in this case 'eax' is being used for conflict resolution and a
> simple comparison can distinguish the two methods. The nice aspect of
> this approach is that collisions aren't expected to occur in most
> cases so the method function can be directly placed in the vtable and
> the hidden value will simply be ignored in those cases. Hence in
> general the only overhead versus a C++ virtual call is setting the
> hidden value, which is essentially a negligible cost. For more detail,
> the basic elements of the virtual call mechanism are described in
> 'Invokeinterface Considered Harmless' [2] (albeit for Java).
> 
> ---- Encoding calls in LLVM IR ----
> 
> Encoding this information in LLVM IR seems to be very difficult,
> albeit I've noticed some recent developments (such as musttail) that
> could be useful for this. There was a previous discussion of this
> issue in 2011 [3] and the conclusion seemed to be that the mechanism
> above couldn't be encoded in LLVM IR.
> 
> It is possible to generate the necessary inline assembly, but then
> LLVM has no knowledge of the function calls and hence no inlining
> occurs and all the optimisations hit this boundary. Ideally it would
> be possible to generate safe LLVM IR that could be understood by
> existing optimisation passes.
> 
> So I'm wondering:
> 
> * Is there anything in LLVM IR that could help?
> * If this currently can't be encoded in LLVM IR, what changes do you
> think would be necessary to support this?
> 
> I understand that LLVM may not have complete support for this use case
> right now; I'd be very happy to do the necessary work and submit any
> relevant patches for review. I am very keen *not* to fork LLVM but
> instead to try to make upstream changes that are as broadly useful as
> possible. I've added some notes below about what approach might be
> taken and I'd be interested to know what you think.
> 
> Kind regards,
> Stephen Cross
> 
> [1] http://loci-lang.org/StructuralTyping.html
> [2] https://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf
> [3] http://marc.info/?l=llvm-dev&m=129594650017675
> 
> ---- Notes about potential approaches ----
> 
> In terms of the approach, there are at least two parts to this problem:
> 
> * Setting/retrieving the hidden value
> * Forwarding function arguments
> 
> For the former, two vague ideas I had were adding a new calling
> convention (it might be useful if external projects could use hooks to
> add in their own calling conventions) or modifying inreg to allow
> specifying a particular register (albeit encoding target-specific
> information like this isn't ideal). Another approach could be adding a
> parameter attribute (e.g. 'hidden' or 'outofband') that puts the
> parameter in some target-specific location. The 'llvm.read_register'
> and 'llvm.write_register' intrinsics would only be useful if they
> could guarantee the relevant register wasn't going to be overwritten
> in the meantime (and they'd also need to support more registers than
> currently).
> 
> As for forwarding arguments, it seems like the combination of
> 'musttail' and varargs come close to satisfying this. The only issue
> is that it's not possible to forward a return value.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev