[PATCH] D76811: [X86] Refactor X86IndirectThunks.cpp to Accomodate Mitigations other than Retpoline [2/3]

Tue Mar 31 15:27:38 PDT 2020

sconstab added a comment.

In D76811#1953287 <https://reviews.llvm.org/D76811#1953287>, @zbrid wrote:

> In D76811#1952580 <https://reviews.llvm.org/D76811#1952580>, @sconstab wrote:
>
> > By the way, I had initially implemented this patch with a pure virtual base class and a retpoline thunk (and later LVI thunk) class that implements the interface. However, I could not for the life of me structure the classes in a manner that would allow the compiler to devirtualize. Using CRTP admittedly sacrifices some readability, but it does not prevent the compiler from inlining `RetpolineThunkInserter`'s methods.
>
>
> Asking to learn here. I've not heard of CRTP and don't quite understand your explanation.

Recommend the article here: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern

> - What do you mean by allowing the compiler to devirtualize?

Suppose you have

  struct Base { virtual void foo() = 0; };
  struct D1 : Base { void foo() { … }; };
  struct D2 final : Base { void foo() { … }; };

  void barB(B *bptr) { bptr->foo(); } // cannot devirtualize
  void barD1(D1 *d1ptr) { d1ptr->foo(); } // cannot devirtualize
  void barD2(D2 *d2ptr) { d2ptr->foo(); } // can devirtualize

Devirtualizing a call means that you don't need to look up the method in the object's vtable. In short, a compiler can devirtualize a call if the target callee is //unambiguous//. In `BarB()`, the call to `foo()` cannot be devirtualized, because the pointee may be of type `D1`, `D2`, or something else that derives from `D1`. In `BarD1()`, the call also cannot be devirtualized because the compiler has no way of knowing that something else, perhaps in some other translation unit, may derive from `D1` and have its own `foo()` implementation. The call to `foo()` in `barD2()` CAN be devirtualized because `D2` is `final` and thus nothing can derive from it.

> - Why is it preferable to allow the compiler to devirtualize?

1. You save yourself two loads: loading the object's vtable, and loading the address of the target method.
2. You allow the compiler to inline the callee.

> - Why is it desirable to allow the compiler to inline RetpolineThunkInserter's methods?

This is my opinion and someone else may disagree. I suspect that most code compiled for X86 will not actually use any of these special thunks. But this pass is always run anyways, and the thunk inserter(s) need to look at every function and determine whether it is a thunk, or a function that needs a thunk. If these checks are being made as virtual calls, then IMO those cycles are being wasted. Virtual calls are most useful for big, scalable software that may change frequently. I see the thunk inserter as something fairly static that may only need to add a new thunk, say, every two years or so.

> - Why is that preferable to readability in this case? Is the implication that there is a large perf impact to not use CRTP?

I don't know whether there will be a large performance impact, but there will be an impact. If the impact only applied to code that was using thunks, I think this would be ok. But some of the logic is being applied to **all** code, and therefore IMO should be as fast as possible.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76811/new/

https://reviews.llvm.org/D76811