[PATCH] D64642: [AMDGPU] Tune inlining parameters for AMDGPU target

Mon Jul 15 08:59:06 PDT 2019

dfukalov marked 2 inline comments as done.
dfukalov added inline comments.

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:883
   //
-  // Vector bonuses: We want to more aggressively inline vector-dense kernels
-  // and apply this bonus based on the percentage of vector instructions. A
----------------
arsenm wrote:
> How does it decide what "vector dense" means? We already report costs that approximately say scalarize everything, and scalarization is free
The existing code estimates "density" as the percentage of LLVM IR instructions that have vector operands. If more than 50% of a function's instructions are vector instructions, the full bonus is added to the threshold. If between 10% and 50% are vector instructions, half of the bonus is added.
I guess this bonus logic was designed with x86 vector extensions such as MMX in mind.
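
To make the tiers concrete, here is a minimal sketch of the rule as described above. It is not the code in InlineCost.cpp: the function name vectorBonusFor and its parameters are hypothetical, and the handling of the exact 10% and 50% boundaries is an assumption.

```cpp
// Hypothetical helper, not an LLVM API: returns the share of the vector
// bonus that would be added to the inline threshold under the rule
// described in this reply.
constexpr int vectorBonusFor(unsigned NumVectorInsts, unsigned NumInsts,
                             int VectorBonus) {
  // An empty function has no vector density to speak of.
  if (NumInsts == 0)
    return 0;
  // More than 50% vector instructions: full bonus.
  // Multiplying instead of dividing avoids integer rounding.
  if (NumVectorInsts * 2 > NumInsts)
    return VectorBonus;
  // More than 10% (assumed exclusive) and up to 50%: half of the bonus.
  if (NumVectorInsts * 10 > NumInsts)
    return VectorBonus / 2;
  // 10% or less: no bonus.
  return 0;
}
```

For example, with a bonus of 100, a function with 6 vector instructions out of 10 would get the full 100, one with 3 out of 10 would get 50, and one with 1 out of 10 would get nothing.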

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-inline.ll:28-41
 define coldcc void @foo_private_ptr2(float addrspace(5)* nocapture %p1, float addrspace(5)* nocapture %p2) {
 entry:
   %tmp1 = load float, float addrspace(5)* %p1, align 4
-  %cmp = fcmp ogt float %tmp1, 1.000000e+00
-  br i1 %cmp, label %if.then, label %if.end
-
-if.then:                                          ; preds = %entry
----------------
arsenm wrote:
> Why this test change? I would expect a separate version without the control flow?
Without this modification, the test @test_inliner_multi_pvt_ptr_cutoff starts to fail, since I decreased the threshold multiplier and the cost of the function became slightly higher.
The test is not about control flow; it checks the amdgpu-inline-arg-alloca-cutoff value: foo_private_ptr2 should be inlined in test_inliner_multi_pvt_ptr and should not be inlined in test_inliner_multi_pvt_ptr_cutoff.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64642/new/

https://reviews.llvm.org/D64642