[PATCH] D39356: [ThinLTO] Fix missing call graph edges for calls with bitcasts.

Fri Nov 3 18:02:51 PDT 2017

mehdi_amini added inline comments.

================
Comment at: llvm/lib/Analysis/ModuleSummaryAnalysis.cpp:251
+          CalledFunction = dyn_cast<Function>(CalledValue);
+      }
       // Check if this is an alias to a function. If so, get the
----------------
vsapsai wrote:
> mehdi_amini wrote:
> > tejohnson wrote:
> > > efriedma wrote:
> > > > steven_wu wrote:
> > > > > dexonsmith wrote:
> > > > > > vsapsai wrote:
> > > > > > > dexonsmith wrote:
> > > > > > > > tejohnson wrote:
> > > > > > > > > efriedma wrote:
> > > > > > > > > > vsapsai wrote:
> > > > > > > > > > > efriedma wrote:
> > > > > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > > > > efriedma wrote:
> > > > > > > > > > > > > > vsapsai wrote:
> > > > > > > > > > > > > > > tejohnson wrote:
> > > > > > > > > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > > > > > > > > vsapsai wrote:
> > > > > > > > > > > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > > > > > > > > > > Isn't it enough to just replace ` auto *CalledValue = CS.getCalledValue();` with ` auto *CalledValue = CS.getCalledValue()->stripPointerCastsNoFollowAliases();` ?
> > > > > > > > > > > > > > > > > > Not enough. `CalledValue` will be what we want (and hopefully it won't be NULL) but `ImmutableCallSite CS` will still use callee with pointer cast and `CalledFunction` will be NULL.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Another approach can be adding `stripPointerCastsNoFollowAliases` to `ImmutableCallSite` or `CallSiteBase`. Or we can introduce another type of call site and add it there. I don't know if stripping casts will be useful or harmful in other places, for this I rely on people who are more familiar with this code.
> > > > > > > > > > > > > > > > > It is a good point that there is likely some general goodness in looking through pointer cast in `ImmutableCallSite` and/or `CallSiteBase`.
> > > > > > > > > > > > > > > > > I'd be interested to see if `ninja check-all` is passing after adding `stripPointerCastsNoFollowAliases` to `CallSiteBase`, and if not what the failures are.
> > > > > > > > > > > > > > > > Agree, this would be a good thing to try.
> > > > > > > > > > > > > > > I've tried naive approach to add `stripPointerCastsNoFollowAliases` to `CallSiteBase` and got 544 test failures. Looks like it interferes with type checking, for instance Verifier/speculatable-callsite.ll (and many others) failed with
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > llvm/clang-build-release/bin/llvm-as: assembly parsed, but does not verify as correct!
> > > > > > > > > > > > > > > Called function is not the same type as the call!
> > > > > > > > > > > > > > >   %ret = call float bitcast (i32 ()* @speculatable to float ()*)() #0
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Another popular problem is verifier complaining "Called function is not pointer to function type!" for code like
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > >   %fptrptr = getelementptr [1 x i8*], [1 x i8*]* %vtable, i32 0, i32 0
> > > > > > > > > > > > > > >   %fptr = load i8*, i8** %fptrptr
> > > > > > > > > > > > > > >   %fptr_casted = bitcast i8* %fptr to void (i8*)*
> > > > > > > > > > > > > > >   call void %fptr_casted(i8* %obj)
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > After adding `stripPointerCastsNoFollowAliases` only to `ImmutableCallSite` I got 7 failing tests, among them 3
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file llvm-project/llvm/include/llvm/Support/Casting.h, line 255.
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > >     LLVM :: Transforms/Coroutines/ex5.ll
> > > > > > > > > > > > > > >     LLVM :: Transforms/Inline/cgscc-cycle.ll
> > > > > > > > > > > > > > >     LLVM :: Transforms/InstCombine/2007-05-18-CastFoldBug.ll
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > and 2
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > Assertion failed: (i < getNumArgOperands() && "Out of bounds!"), function getArgOperand, file llvm-project/llvm/include/llvm/IR/Instructions.h, line 1572.
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > >     LLVM :: Transforms/Inline/inline-musttail-varargs.ll
> > > > > > > > > > > > > > >     LLVM :: Transforms/Inline/noinline-recursive-fn.ll
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > My conclusion is that more aggressive `stripPointerCastsNoFollowAliases` usage gives bad results and isn't worth pursuing as there is code expecting casts to be present. What do you think?
> > > > > > > > > > > > > > LLVM IR optimizations generally treat a call whose callee is a bitcast as an indirect call.  This is precisely because they want to assume the called function has the same signature as the call.  I don't think this is worth changing.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > The one place that does have special handling for calls of bitcasts is instcombine: it has some code to try to turn them into direct calls in simple cases.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Also, it's suspicious that you're doing this to try to fix a bug: unless you're trying to match some other code which is also calling stripPointerCastsNoFollowAliases() on the callees of calls, you're just hiding a real problem with function pointer handling.
> > > > > > > > > > > > > Is the inliner having a special handling for calls of bitcasts as well? If yes we should try to register this as a call, otherwise we could go with a reference.
> > > > > > > > > > > > No, the inliner has no special handling; it treats calls where the callee is a bitcast as indirect calls.
> > > > > > > > > > > So how do you recommend to address this problem? I've tried to leave CalledFunction NULL and handle it in the branch for indirect calls. But IndirectCallPromotionAnalysis doesn't return any promotion candidates because call instruction has no MD_prof metadata. As the result, nothing is added to the call graph.
> > > > > > > > > > Well, in terms of fixing the PR34966 miscompile, this patch is essentially orthogonal.  ThinLTO importing should generate correct code whether or not a function pointer is called directly.
> > > > > > > > > > 
> > > > > > > > > > In terms of trying to improve performance for pre-ANSI C code, I guess this patch is fine?  It looks like adding the edge won't do any harm even if the rest of the optimizer continues to treat it as an indirect call.
> > > > > > > > > > Well, in terms of fixing the PR34966 miscompile, this patch is essentially orthogonal. ThinLTO importing should generate correct code whether or not a function pointer is called directly.
> > > > > > > > > 
> > > > > > > > > I'm not sure why you say this patch is orthogonal to the miscompile. We need these edges in the summary not only for importing, but also for things like liveness analysis and knowing what to promote as a result of other importing decisions. The fact that no edge got added here meant that we didn't realize the callee function was exported when it's caller was imported. As Mehdi points out, we could also add the callee to the reference edges for the caller, but by adding a call edge we are able to import it and if some other optimization can remove the bitcast or otherwise convert to a "direct" call we would be able to inline it.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > I don't understand Eli's argument that this fix is unrelated to the miscompile.  As Teresa said, ThinLTO needs an accurate call graph in order to correctly promote static functions.  This does exactly that.
> > > > > > > > 
> > > > > > > > It's irrelevant (to the miscompile) whether the static function is inlinable.
> > > > > > > After more thinking and investigation I think there is more to this problem. It is still not clear why duplicate function name caused miscompile. If you change static function name - everything is fine, if you disable LTO - linker is perfectly happy with duplicate function names and doesn't mix them. This fix avoids the problem of duplicate names by allowing to promote static function and to rename it. But I haven't found yet why duplicate names aren't handled correctly with ThinLTO.
> > > > > > > 
> > > > > > > Another test I've done is renaming static function and checking module index. Imports/exports are still empty but the final binary is correct. That makes me think that there is another bug not related to module index.
> > > > > > > If you change static function name - everything is fine
> > > > > > Okay; I agree that's strange.
> > > > > > 
> > > > > > > if you disable LTO - linker is perfectly happy with duplicate function names and doesn't mix them
> > > > > > Sure, of course not.  The linker has been getting this right for decades.
> > > > > If linking the object works, I don't think LTO mess up the visibility.
> > > > > 
> > > > > There is a stage in ld64 called AtomSyncer which maps the information from llvm-ir to generated object file. It is likely that the undef is sync to the static version in error.
> > > > > 
> > > > In general, if the callee is a ConstantExpr, you can't determine the function it will call.  Consider, e.g.:
> > > > 
> > > > ```
> > > > define void @h() {
> > > > entry:
> > > >   tail call void select (i1 icmp eq (i32* @a, i32* null), void ()* @f, void ()* @g)()
> > > >   ret void
> > > > }
> > > > ```
> > > > 
> > > > I'm not that familiar with ThinLTO, so I'm not sure how a "reference edge" is different from a "call edge".  But in general you need to walk the ConstantExpr to find which globals it refers to.
> > > I also agree that is odd that changing the static name would allow things to work. 
> > > 
> > > Normally the there is no issue as the compiler/linker know to resolve to the static function in that file. In the case in the PR, with ext() imported and without the static f() being promoted/renamed, it presumably links with the wrong version since that is the only version available for the use in the imported copy of ext(). So the initial analysis and fix make sense to me.
> > > 
> > > But it would be good to understand why it works if the static function name is changed.
> > > , if you disable LTO - linker is perfectly happy with duplicate function names and doesn't mix them.
> > 
> > There is no real "duplicate name": the naming is scoped and a static function isn't in the global scope, its name is meaningless.
> > This is also why I mentioned earlier that I'd like to see the importer fixed: it shouldn't allow to import a function with a reference to an internal one and have the reference being "remapped" to a global function in the current file (I assume that the issue is at the LLVM level, not the linker...). There should be a fatal error when the client of the importer tries to do that.
> > 
> > I don't understand how renaming the function solves the problem though: there is something fishy. I'd be interested to get the intermediate files (pass -save-temps to the linker) to understand what's going on.
> OK, I believe I've found a place in ld64 where we adjust fixups and associate them with other atoms based on name only, without checking linkage visibility. It is in AtomSyncer as Steven has helpfully mentioned. That issue is investigated separately.
> 
> With -save-temps I tried to link intermediate files manually, like `ld -o a-manual.out a.out.0.thinlto.o a.out.1.thinlto.o a.out.2.thinlto.o` and resulting binary was correct. So I make the conclusion that ThinLTO object file generation is working correctly, the problem is in linker integration with ThinLTO.
> 
> Back to the current patch. Is there value in it in order to improve performance for pre-ANSI C code?
@efriedma : usually a reference edge is what we have when we load from a global variable, or take the address of a symbol (variable or function). IIRC it is always OK to add extra edges (calls and references) in the graph.

Call edges are used to compute function importing, while reference edges are used for things like dead-stripping.

As Teresa mentioned, speculatively inserting a call edge means we are likely to import the function and get it inlined in case the indirect call becomes direct (or if indirect call speculation happens).

By the way with PGO and value-profiling we speculate some call edges as well I think.

@vsapsai thanks for checking ld64! I think we should still move on with this patch, but I'd like to hear from Teresa.

https://reviews.llvm.org/D39356