[PATCH] D64642: [AMDGPU] Tune inlining parameters for AMDGPU target
Daniil Fukalov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 15 08:59:06 PDT 2019
dfukalov marked 2 inline comments as done.
dfukalov added inline comments.
================
Comment at: llvm/lib/Analysis/InlineCost.cpp:883
//
- // Vector bonuses: We want to more aggressively inline vector-dense kernels
- // and apply this bonus based on the percentage of vector instructions. A
----------------
arsenm wrote:
> How does it decide what "vector dense" means? We already report costs that approximately say scalarize everything, and scalarization is free
The inliner estimates "vector density" as the percentage of LLVM IR instructions that have vector arguments. If more than 50% of a function's instructions are vector instructions, the full bonus is added to the threshold. For functions with 10%-50% vector instructions, half of the bonus is added.
I guess this bonus logic was designed with x86 extensions such as MMX and others in mind.
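To make the rule above concrete, here is a minimal sketch of the bonus tiers as I described them. The function name applyVectorBonus and its parameters are hypothetical and are not the actual code in InlineCost.cpp, and the handling of the exact 10% and 50% boundaries is my assumption.

```cpp
// Hypothetical helper (not the LLVM API): returns the inline threshold
// after applying the vector bonus described in the comment above.
// Assumption: "more than 50%" and "more than 10%" are strict comparisons
// evaluated with integer division.
int applyVectorBonus(int threshold, int vectorBonus,
                     unsigned numVectorInsts, unsigned numInsts) {
  // An empty function has no vector density to speak of.
  if (numInsts == 0)
    return threshold;
  // More than 50% vector instructions: add the full bonus.
  if (numVectorInsts > numInsts / 2)
    return threshold + vectorBonus;
  // Between 10% and 50% vector instructions: add half of the bonus.
  if (numVectorInsts > numInsts / 10)
    return threshold + vectorBonus / 2;
  // Below 10%: no bonus.
  return threshold;
}
```

With a made-up threshold of 225 and a bonus of 100, a function with 60 vector instructions out of 100 would get a threshold of 325, one with 30 out of 100 would get 275, and one with 5 out of 100 would stay at 225.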
================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu-inline.ll:28-41
define coldcc void @foo_private_ptr2(float addrspace(5)* nocapture %p1, float addrspace(5)* nocapture %p2) {
entry:
%tmp1 = load float, float addrspace(5)* %p1, align 4
- %cmp = fcmp ogt float %tmp1, 1.000000e+00
- br i1 %cmp, label %if.then, label %if.end
-
-if.then: ; preds = %entry
----------------
arsenm wrote:
> Why this test change? I would expect a separate version without the control flow?
Without this modification, the test @test_inliner_multi_pvt_ptr_cutoff starts to fail, because I decreased the threshold multiplier and the cost of the function became slightly higher.
The test is not about control flow. It checks the amdgpu-inline-arg-alloca-cutoff value: foo_private_ptr2 should be inlined in test_inliner_multi_pvt_ptr and should not be inlined in test_inliner_multi_pvt_ptr_cutoff.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64642/new/
https://reviews.llvm.org/D64642