[PATCH] D143624: Inlining: Run the legacy AlwaysInliner before the regular inliner.
Amara Emerson via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon May 1 17:16:40 PDT 2023
aemerson added a comment.
Herald added a subscriber: jplehr.
@dmgreen I've been looking at this test again, trying to see what's missing. The problem now is that only a VF of 4 is chosen. In the good case, instcombine/simplifyCFG run so that the code simplifies down to an `smin` intrinsic. After this change, `__SSAT()` is inlined first. We then have:
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-linux-gnu"

define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 %blockSize) {
entry:
  br label %while.cond

while.cond:                                       ; preds = %while.body, %entry
  %pSrcB.addr.0 = phi ptr [ %pSrcB, %entry ], [ %incdec.ptr1, %while.body ]
  %pDst.addr.0 = phi ptr [ %pDst, %entry ], [ %incdec.ptr4, %while.body ]
  %pSrcA.addr.0 = phi ptr [ %pSrcA, %entry ], [ %incdec.ptr, %while.body ]
  %blkCnt.0 = phi i32 [ %blockSize, %entry ], [ %dec, %while.body ]
  %cmp.not = icmp eq i32 %blkCnt.0, 0
  br i1 %cmp.not, label %while.end, label %while.body

while.body:                                       ; preds = %while.cond
  %incdec.ptr = getelementptr inbounds i16, ptr %pSrcA.addr.0, i32 1
  %0 = load i16, ptr %pSrcA.addr.0, align 2
  %conv = sext i16 %0 to i32
  %incdec.ptr1 = getelementptr inbounds i16, ptr %pSrcB.addr.0, i32 1
  %1 = load i16, ptr %pSrcB.addr.0, align 2
  %conv2 = sext i16 %1 to i32
  %mul = mul nsw i32 %conv, %conv2
  %shr = ashr i32 %mul, 15
  %cmp4.i = icmp sgt i32 %shr, 32767
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767
  %conv3 = trunc i32 %spec.select.i to i16
  %incdec.ptr4 = getelementptr inbounds i16, ptr %pDst.addr.0, i32 1
  store i16 %conv3, ptr %pDst.addr.0, align 2
  %dec = add i32 %blkCnt.0, -1
  br label %while.cond

while.end:                                        ; preds = %while.cond
  ret void
}
These instructions from the callee should now be combined into `smin`:

  %cmp4.i = icmp sgt i32 %shr, 32767
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767
... except that, due to the surrounding instructions, the first icmp is instead optimized into `icmp sgt i32 %mul, 1073741823` by `InstCombinerImpl::foldICmpInstWithConstant()`.
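To illustrate (hand-written IR, not output from an actual run): in the good case the clamp should eventually fold to something like

  %spec.select.i = call i32 @llvm.smin.i32(i32 %shr, i32 32767)

whereas after the constant fold we're left with a compare on `%mul` feeding a select that still chooses between `%shr` and the constant:

  %cmp4.i = icmp sgt i32 %mul, 1073741823
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767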
This breaks the smin recognition, and I'm not sure what the best approach is to fix it. InstCombine already has this chunk of code to try to avoid messing with compares that might form min/max patterns, but it expects further simplification to fire:
  // Test if the ICmpInst instruction is used exclusively by a select as
  // part of a minimum or maximum operation. If so, refrain from doing
  // any other folding. This helps out other analyses which understand
  // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
  // and CodeGen. And in this case, at least one of the comparison
  // operands has at least one user besides the compare (the select),
  // which would often largely negate the benefit of folding anyway.
  //
  // Do the same for the other patterns recognized by matchSelectPattern.
  if (I.hasOneUse())
    if (SelectInst *SI = dyn_cast<SelectInst>(I.user_back())) {
      Value *A, *B;
      SelectPatternResult SPR = matchSelectPattern(SI, A, B);
      if (SPR.Flavor != SPF_UNKNOWN)
        return nullptr;
    }
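(As far as I can tell, that guard doesn't even apply here: in the IR above, the compare's only user is the intermediate inverted compare, not the select itself, so `dyn_cast<SelectInst>(I.user_back())` fails before `matchSelectPattern` is ever consulted:

  %cmp4.i = icmp sgt i32 %shr, 32767                    ; sole user is %switch.i
  %switch.i = icmp ult i1 %cmp4.i, true                 ; effectively a "not"
  %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767

so the guard only kicks in once that intermediate "not" has already been cleaned up.)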
Any ideas? I'd really like to get this inliner change in because it's fundamentally a good change to have.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D143624/new/
https://reviews.llvm.org/D143624