[clang] [compiler-rt] [llvm] [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (PR #81442)

Mingming Liu via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 26 15:11:08 PDT 2024


================
@@ -103,27 +112,226 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));
 
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt<float> ICPVTablePercentageThreshold(
+    "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+    cl::desc("The percentage threshold of vtable-count / function-count for "
+             "cost-benefit analysis. "));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// vtable pointer with multiple vtable address points due to class inheritance.
+// Comparing with multiple vtables inserts additional instructions on hot code
+// path; and doing so for earlier candidate of one icall can affect later
+// function candidate in an undesired way. We allow multiple vtable comparison
----------------
minglotus-6 wrote:

> I think what you mean is that doing so for an earlier candidate delays the comparisons for later candidates, but that for the last candidate, only the fallback path is affected?

Yes. I updated the comment.

> Do we expect to set this parameter above 1?

Yes. Setting it to 1 makes the default conservative. Based on my tests on `-pie` or `pie` binaries, setting it to 2 gives a measurable performance win compared with 1, while setting it to 3 doesn't give stable performance wins across different binaries or across runs.
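
As a rough illustration of how the two knobs discussed here could interact, the sketch below models the decision as a plain predicate. It is not the code from the patch: the function name, the parameter names, and the way the cap on the number of vtables is applied are all assumptions made for illustration. Only the 0.99 ratio threshold and the conservative default of 1 come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): decide whether a promotion
// candidate should be guarded by vtable comparison or fall back to
// function-based comparison.
//
// vtableCount / functionCount: profile counts for the candidate.
// numVTables: how many vtable address points would have to be compared.
// percentageThreshold: mirrors the 0.99 default shown in the diff.
// maxNumVTables: assumed cap on vtable comparisons; the default of 1 matches
//                the conservative default described in the reply.
bool shouldCompareVTables(uint64_t vtableCount, uint64_t functionCount,
                          unsigned numVTables,
                          double percentageThreshold = 0.99,
                          unsigned maxNumVTables = 1) {
  assert(percentageThreshold >= 0.0 && "threshold is a ratio");

  // Without profile data or vtable candidates there is nothing to compare.
  if (functionCount == 0 || numVTables == 0)
    return false;

  // Fall back to function comparison if the vtable profile covers too little
  // of the function profile.
  double ratio = static_cast<double>(vtableCount) /
                 static_cast<double>(functionCount);
  if (ratio < percentageThreshold)
    return false;

  // Each extra vtable adds a compare on the hot path, so cap the number.
  return numVTables <= maxNumVTables;
}
```

With the defaults, a candidate whose vtable profile fully covers the function profile but needs two vtable comparisons is rejected; raising the assumed cap to 2 accepts it.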
 
One interesting thing is that the actual cost of materializing one vtable address point depends on the compile option `fpic/fpie`, and the costs of materializing a vtable address point and a function are comparable if the `fpie/fpic` option is the same.
  * For non-pie binaries, `@vtable + address-point-offset` is lowered to an immediate representing the vtable address point. It could be folded into the `icmp` IR after lowering, something like `icmp #imm, <reg>`.
  * For pie (but non-pic) binaries, `@vtable + address-point-offset` is lowered to a pc-relative address, so it takes one instruction to materialize the pc-relative address itself (something like `leaq	2890849(%rip), %rdx     # 0x30fe50 <_ZTV8Derived1>` for x86).
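
To make the two comparison sequences concrete, here is a source-level sketch that uses a hand-rolled vtable. It only models the shape of the promoted code; the pass itself works on LLVM IR, and every name below (`ManualVTable`, `ManualObject`, `derived1Func`, and so on) is hypothetical.

```cpp
#include <cassert>

// Hand-rolled stand-ins for a compiler-generated vtable and an object
// holding a vtable pointer.
struct ManualVTable {
  int (*func)(int);
};
struct ManualObject {
  const ManualVTable *vptr;
};

int derived1Func(int x) { return x + 1; }
int derived2Func(int x) { return x * 2; }

const ManualVTable kDerived1VTable{&derived1Func};
const ManualVTable kDerived2VTable{&derived2Func};

// Function-based comparison: the function pointer must be loaded from the
// vtable before it can be compared against the hot target.
int callWithFunctionCompare(const ManualObject &o, int x) {
  assert(o.vptr != nullptr);
  int (*fn)(int) = o.vptr->func;  // vtable load happens before the compare
  if (fn == &derived1Func)
    return derived1Func(x);       // promoted direct call
  return fn(x);                   // fallback indirect call
}

// VTable-based comparison: the vtable pointer itself is compared, so the
// hot path needs no load from the vtable.
int callWithVTableCompare(const ManualObject &o, int x) {
  assert(o.vptr != nullptr);
  if (o.vptr == &kDerived1VTable)
    return derived1Func(x);       // promoted direct call, no vtable load
  return o.vptr->func(x);         // fallback loads the function pointer
}
```

In both variants the address being compared against (`&derived1Func` or `&kDerived1VTable`) has to be materialized, which is the cost the bullets above describe for non-pie and pie binaries.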
 



https://github.com/llvm/llvm-project/pull/81442

