[llvm-dev] Proposal: virtual constant propagation

Thu Jan 28 11:43:37 PST 2016

Hi Mehdi,

On Thu, Jan 28, 2016 at 10:49:01AM -0800, Mehdi Amini wrote:
> Hi Peter,
> 
> Thanks for your answer!
> 
> > On Jan 28, 2016, at 10:17 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> > 
> > Hans wrote:
> >> (and start-up time if we can drop the vtables and
> >> void the dynamic relocations).
> > 
> > On Thu, Jan 28, 2016 at 09:15:05AM -0800, Mehdi Amini wrote:
> >> Hi,
> >> 
> >> I just thought about another use case: VTable compression.
> >> If you know that an entry in the Vtable is never used, just remove it!
> >> I’d hope we could even eliminate some unused virtual functions from the final binary.
> > 
> > We could most likely extend the design to do both of these things, but we'd
> > need to be a little careful as these would both be ABI breaks, so we'd probably
> > neeed some separate way of saying "I promise not to call virtual functions
> > from these classes" as opposed to "I promise not to extend these classes”.
> 
> Mmmm, good point, that’s not at the same level as the other optimization :-/
> 
> > 
> >> 
> >> — 
> >> Mehdi
> >> 
> >> 
> >>> On Jan 27, 2016, at 10:29 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>> 
> >>> Hi Peter,
> >>> 
> >>> Pete (Cooper, CC'ed) had a similar idea a few months ago about devirtualization and LTO: we can know the set of possible overload based on the knowledge of a closed type hierarchy. He worked on a prototype, and I continued to evolve it. I planned to submit something on the mailing list but couldn’t fit it in my planning before April :)
> >>> Our plan was about removing virtual call in general, or maybe in some cases turning the indirect call into a switch table.
> >>> 
> >>> In both use case (your constant propagation and devirtualization), I think the interesting part is “how to figure the set of possible callees for a call site” and how we encode this in the IR.
> >>> 
> >>> I haven’t worked on this for a few weeks, but I’ll try to give a rough description of where I stopped on my prototype: 
> >>> I was storing the full inheritance tree in metadata. There was one metadata per entry of the VTable, and any method that override one of the base class will have a metadata attached pointing to the slot in the base class.
> >>> The front-end is modified to emit any load of a VTable through a new intrinsic llvm.read.vtable that takes a pointer to the VTable, the index of the slot in the VTable, and a pointer to the metadata that describe the slot for the current type. Using the metadata you access directly to any override and construct naturally and very easily the set of possible overloads. The pass that transform the IR is quite easy to implement since you don’t need to walk the IR but access directly all the uses of the llvm.read.vtable intrinsic and have all the needed information in the operands.
> >>> I haven’t give much thought about your representation yet to know how it compares (and I’m not very familiar with how CFI works), but I’m interested in your feedback!
> >>> 
> >>> I was limited during my experiment by the lack of black/white list to define what hierarchy is/isn’t closed, I’m glad if such flag/control would be added!
> > 
> > Thanks for sharing that design, but I think it would be best to have a single
> > representation of the type information in the tree. The bitset representation
> > has the advantage of being proven with other features like CFI, while I'm
> > not sure that a design that worked at the virtual function level would work
> > well with CFI.
> 
> I agree with all of what you say, and I shared the design FYI only. No NIH syndrome here, reinventing the wheel is just driven by ignorance :)
> (I’m just curious about the pros/cons of both approach for my personal information)
> 
> 
> > (It also includes the insight that you don't need to store
> > an inheritance graph or anything like that, just the valid address points
> > for each vtable).
> 
> I’m not sure what you mean in this sentence.

Sorry about that. Hopefully all will become clear once I update the docs.

> > I did think about having something like your llvm.load.vtable intrinsic to
> > load function pointers, but I thought it would be more orthogonal to extend
> > existing mechanisms to load the vtable than to have a special type of load
> > that only works with vtables, to avoid having to teach passes about this
> > new type of load. Did you find this to be an issue in practice?
> 
> Initially Pete had to thread special handling for this intrinsic everywhere till instruction selection. 
> I changed the semantic of the intrinsic to be completely opaque, the codegen I have looks like:
> 
>   %vtable = load void (%struct.Derived2*)**, void (%struct.Derived2*)*** %3, align 8
>   %vfn_ptr = getelementptr inbounds void (%struct.Derived2*)*, void (%struct.Derived2*)** %vtable, i64 1
>   %vfn = load void (%struct.Derived2*)*, void (%struct.Derived2*)** %vfn_ptr, align 8
>   call void %vfn(%struct.Derived2* %2)
> 
> to:
> 
>   %vtable = load void (%struct.Derived2*)**, void (%struct.Derived2*)*** %3, align 8
>   %vfn_ptr = getelementptr inbounds void (%struct.Derived2*)*, void (%struct.Derived2*)** %vtable, i64 1
>   %vfn = load void (%struct.Derived2*)*, void (%struct.Derived2*)** %vfn_ptr, align 8
> 
>   %vfn_i8ptr = bitcast %struct.DerivedDerivedNonVirtual* %vfn to i8*
>   %vtable_actual = bitcast [3 x i8*]* @_ZTV8Derived2 to i8*
>   %vfn.devirt_i8ptr = call i8* @llvm.read.vtable(i8* %vfn_i8ptr, i32 0, metadata !5, i8* %vtable_actual)
>   %vfn.devirt = bitcast i8* %vfn.devirt_i8ptr to void (%struct.Derived2*)*
> 
>   call void %vfn.devirt(%struct.DerivedDerivedNonVirtual* %2)
> 
> The call to the llvm.read.vtable() is just here to bundle the %vfn we just read out of the vtable with the slot number in the vtable, the metadata describing the entry for this slot, and a reference to the VTable for the actual type (top of the current hierarchy).
> 
> It is always valid to RAUW the intrinsic with its first argument, but the others carry the information about the possible overload for this VTable slot.

Ah, so the llvm.read.vtable doesn't encode the load? Interesting. I'll have to
think about what the best representation would be in order to allow for ABI
breaks -- now that I've given it a little more thought I think that placing
the devirtualization pass early in the LTO pipeline and having it re-encode
a vtable load intrinsic using regular instructions if devirtualization is
unsuccessful may be sufficient to avoid teaching most of the compiler about
the intrinsic, but it would certainly be worth running experiments to see
how much it matters in practice.

That all can wait until we start making ABI-breaking changes though.

> > Now that I think about it, I believe that this sort of intrinsic could make
> > ABI-breaking vtable transformations like the ones above easier to implement
> > (there could be some IR module flag that means "I promise not to load from
> > vtables except via this intrinsic" that could be used to turn on those
> > transformations).
> > 
> > I had another look at http://llvm.org/docs/BitSets.html and it occurred
> > to me that it doesn't clearly explain how to use bitsets to encode type
> > information. I can certainly take another pass over that page to see if
> > I can improve the documentation there as we start to use bitsets in more
> > places in the compiler.
> 
> I’d love that if you could improve the docs! :)
> The CFI implementation has been on my reading list for too long now…

Great, I'll try to do that soon then.

Thanks,
-- 
Peter