[LLVMdev] Missed devirtualization opportunities

Mon Oct 11 09:12:03 PDT 2010

I took the output of clang, simplified it, and used it as a test case.
Essentially, there is one class with one virtual function; I create an
instance and call the virtual method, all in one function:

; The TestVirtual class vtbl
@classvtbl.TestVirtual = constant %classvtbltype.TestVirtual {
  ; Pointers to the virtual methods for the TestVirtual class
  ...
}
; ...

define i32 @main() nounwind {
; create the instance
%pinstance = alloca %class.TestVirtual

; %ppVtbl becomes a pointer to the instance's vtbl pointer.
%ppVtbl = getelementptr %class.TestVirtual* %pinstance, i64 0, i32 0

; Populate the instance's vtbl pointer.  After this, the instance is
; constructed.
store %classvtbltype.TestVirtual* @classvtbl.TestVirtual,
%classvtbltype.TestVirtual** %ppVtbl

; If this next call is commented out, the virtual method call is
; devirtualized by -std-compile-opts
%puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i32
0, i32 0))

; Call the virtual function.

; load the instance's vtbl pointer.
%pVtbl1 = load %classvtbltype.TestVirtual** %ppVtbl

; load the function pointer from the vtbl
%ppVfn1 = getelementptr %classvtbltype.TestVirtual* %pVtbl1, i64 0, i32 0
%pVfn1 = load void (%class.TestVirtual*)** %ppVfn1

; call the virtual method.
call void %pVfn1(%class.TestVirtual* %pinstance)

; ...
}

(clang put in a bunch of bitcasts and stuck the vtbl itself into an
array; ripping all that out made no difference)
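For readers who would rather see source than IR, here is a rough guess at
C++ that would give IR of this shape.  It is a reconstruction, not the
original input: the method name, the g_calls counter, the run() wrapper and
the string passed to puts are all made up for illustration.

```cpp
#include <cstdio>

// Hypothetical reconstruction; only the class name comes from the IR above.
struct TestVirtual {
    virtual void method();    // the single virtual function
};

int g_calls = 0;              // illustrative counter so the call is observable

void TestVirtual::method() { ++g_calls; }

int run() {
    TestVirtual instance;     // alloca, then the store of the vtbl pointer
    std::puts("x");           // opaque call between the store and the reload
    TestVirtual* p = &instance;
    p->method();              // load vtbl pointer, load slot 0, indirect call
    return g_calls;
}
```

The call goes through a pointer because a call made directly on the object
is normally bound statically by the front end and never reaches the
optimizer as an indirect call.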

Tracing through MemoryDependenceAnalysis and BasicAA, it turns out
that the store into %ppVtbl (a pointer to the instance's vtbl pointer)
is clobbered by the call to @puts.  When the @puts call is encountered,
PointerMayBeCaptured gets called on the underlying object (%pinstance)
and returns true, because %pinstance is passed as an argument to the
indirect call through %pVfn1.  That call (unlike the actual function
that %pVfn1 must point to) does not declare the 'this' parameter
nocapture.
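To make the diagnosis concrete, here is a toy model of the query as I
understand it.  None of these types or functions are LLVM's; the names
(Use, pointerMayBeCapturedToy, callMayClobberToy) are invented, and the
real code does much more.  The point is only that the check looks at every
use of the pointer and ignores where that use sits relative to the @puts
call.

```cpp
#include <string>
#include <vector>

// Toy stand-in for one use of a pointer; not LLVM's real data structures.
struct Use {
    std::string user;   // instruction that uses the pointer
    bool mayCapture;    // true if this use could let the pointer escape
};

// Flow-insensitive query: a capturing use anywhere in the function counts,
// no matter where it sits relative to the instruction being asked about.
bool pointerMayBeCapturedToy(const std::vector<Use>& uses) {
    for (const Use& u : uses)
        if (u.mayCapture)
            return true;
    return false;
}

// An unrelated call can be ruled out as a writer to a local object only if
// the object's address has not been captured.
bool callMayClobberToy(const std::vector<Use>& usesOfLocal) {
    return pointerMayBeCapturedToy(usesOfLocal);
}

// The two uses of %pinstance in @main.
std::vector<Use> usesOfPinstance() {
    return {
        {"getelementptr (%ppVtbl)", false},
        {"call void %pVfn1(...)", true},  // argument is not marked nocapture
    };
}
```

In this model callMayClobberToy(usesOfPinstance()) is true, so the @puts
call is assumed to possibly write the vtbl pointer and the later load
cannot be forwarded from the store.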

The devirtualization could have happened anyway if one of the
following had happened:

1. PointerMayBeCaptured was able to determine that the possible
capture happened *after* the call to @puts.  PointerMayBeCaptured does
not expose a parameter for the instruction we're trying to
getModRefInfo on (the @puts call), and even if it did, I don't know of
a fast way to determine in the general case whether one instruction
always executes before another (unless they're in the same block, and
even then you have to scan the block).  But this would be a starting
point for a more complete escape analysis, which could also be used to
help eliminate gc allocations in many cases.

2. Some analysis existed that could tell us the complete set of
functions that %pVfn1 could point to, and we could then determine that
all possible targets declared that instance pointer nocapture, and
thus the instance is never captured.  Adding this bit of analysis to
the CallGraph pass would also allow several other passes to handle
indirect calls in cases where the function pointer in question could
be guaranteed to point to one of a finite set of targets.  However,
this would involve some overhead for the CallGraph construction.

3. The front-end, recognizing that scribbling on an instance's vtbl
pointer has undefined results, eliminated the loads of the vtbl
pointer and replaced them with @classvtbl.TestVirtual.  This would
allow devirtualization within a function at least, but (I think) would
do less to allow analysis to spot devirtualization opportunities
across functions.  (Although ArgPromotion could arrange to have the
vtbl pointer passed in separately, and then inlining and/or partial
specialization could be made to see that it's a pointer to a constant
and thus add in the IndirectCallBonus)
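Suggestions (1) and (2) can be sketched on a toy model.  This is not LLVM
code and not a proposed patch: the Use struct, the function names and the
instruction positions are all invented, and everything is assumed to sit in
one straight-line basic block, which is the easy case mentioned in (1).

```cpp
#include <cstddef>
#include <vector>

// Toy model of one use of the tracked pointer. Not LLVM's real classes.
struct Use {
    std::size_t position;               // index of the user within one basic block
    bool mayCapture;                    // conservative answer for this use alone
    std::vector<bool> targetNoCapture;  // one flag per known callee; empty if unknown
};

// Suggestion 1: only uses that execute before the query instruction count.
// Valid here only because a single straight-line block is assumed.
bool capturedBefore(const std::vector<Use>& uses, std::size_t queryPosition) {
    for (const Use& u : uses)
        if (u.position < queryPosition && u.mayCapture)
            return true;
    return false;
}

// Suggestion 2: an indirect call does not capture if every possible target
// declares the parameter nocapture.
bool capturesGivenTargets(const Use& u) {
    if (!u.mayCapture) return false;
    if (u.targetNoCapture.empty()) return true;  // unknown targets: stay conservative
    for (bool noCapture : u.targetNoCapture)
        if (!noCapture) return true;
    return false;
}

bool capturedAnywhere(const std::vector<Use>& uses) {
    for (const Use& u : uses)
        if (capturesGivenTargets(u))
            return true;
    return false;
}

// Uses of %pinstance in @main; the positions are illustrative only.
std::vector<Use> exampleUses() {
    return {
        {0, false, {}},      // getelementptr producing %ppVtbl
        {4, true, {true}},   // call through %pVfn1; sole target assumed nocapture
    };
}
```

With the @puts call at position 2, capturedBefore(exampleUses(), 2) is
false, and capturedAnywhere(exampleUses()) is false as well, so either
suggestion on its own would let the store be forwarded to the load in this
example.  The expensive part in practice is what the sketch takes for
granted: knowing the execution order across blocks for (1), and knowing the
complete target set for (2).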

At this point I'm looking for suggestions and feedback.  I think
implementing (1) and (2) would go a long way toward making several
other transformations safely more aggressive, but would involve
noticeable (unacceptable?) overhead.  Does what I'm looking for
already exist?  Have these approaches already been considered and
rejected due to the overhead involved?