[PATCH] D42453: Use branch funnels for virtual calls when retpoline mitigation is enabled.

Wed Jan 24 19:00:12 PST 2018

chandlerc added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/LowerTypeTests.cpp:1106-1108
+    AsmOS << "leaq ${" << Args.size() << ":c}+" << T.Offset
+          << "(%rip), %r11\n";
+    Args.push_back(CombinedGlobalAddr);
----------------
pcc wrote:
> chandlerc wrote:
> > pcc wrote:
> > > chandlerc wrote:
> > > > Emitting all of this with (I assume) inline assembly seems like a really messy design. It makes all of this entirely x86-specific, but *this* part is completely independent of architecture.
> > > > 
> > > > Why not emit LLVM IR? Is there no pattern of IR that actually lowers to reasonable branches? I find that a little bit surprising, but maybe we should just fix the lowering in that case?
> > > The main reason is that we need to make sure that whatever code we generate conforms with the calling convention used for the virtual call. At this point there is no guarantee that we have correct information about the calling convention to be used here because the "source of truth" for the calling convention is the call site itself, and with ThinLTO there may not be a call site in the current module. We cannot even rely on the function prototype with ThinLTO because we may have discarded the prototype by this point. Furthermore there doesn't seem to be a point in preserving the calling convention because no matter what it is, we will always want the same code. That is what led me to the conclusion that what we need here is an IR construct that represents the notion of a branch funnel call to a function with an unknown prototype -- that is what the `llvm.icall.jumptable` intrinsic is.
> > > 
> > > I considered putting the implementation of this intrinsic in the backend like a regular intrinsic, but one of the problems with that is that one of the things that we need to be able to do is stitch together multiple global variables into a single global in order to implement the branch funnel as a binary search, and it is too late to do that in the backend. The LowerTypeTests pass was a convenient place to do it (since it already needs to know how to do it in order to implement `llvm.type.test`) but admittedly it may not be the best place. It occurred to me that it may be possible to implement the stitching together of globals in a pre-isel pass and the rest of `llvm.icall.jumptable` in the backend, but I think that's something that's best considered for a followup patch since it may have implications for how we implement `llvm.type.test` as well.
> > The more I think about this the more I think that implementing this with inline asm is just the wrong design.
> > 
> > I understand the challenges you're hitting here, but I think we should solve them by lowering in the backend, not in the IR. The IR really can't model the kinds of things you're doing (and it shouldn't!).
> > 
> > My suggestion would be to change the `llvm.icall.<mumble>` intrinsic or add a during-lowering intrinsic that lets you prepare the global variable stuff as necessary in the IR pass, and hand that prepared information cleanly (and abstractly) to the backend where you can effectively lower it to branch instructions. Since you are (notionally) lowering a call, you may even be able to use the intrinsic all the way into the code generator and just lower this manually with an MI pass.
> > 
> > Another aspect that this will improve is that you shouldn't need to embed things into naked functions at all. This should be something that you can just expand wherever it is needed into the code.
> I'm not sure that I understand your proposal. Are you proposing that in the WholeProgramDevirt pass we would transform each virtual call site into an intrinsic call with the list of possible targets, and then in the backend we would lower the intrinsic call into a call to a branch funnel "function" together with its definition?
I'm imagining in the backend, we would lower the intrinsic call into an *inline* branch funnel across the destinations.

(I can also imagine using a pseudo instead of call to an intrinsic, but as you want to capture a call with some specific calling convention / signature / etc, using a call to an intrinsic seems a reasonable representation... if needed, you could even put the potential destinations into an operand bundle so that the formal argument list *is* actually the argument list which should be used for calling the target function.)

https://reviews.llvm.org/D42453