[PATCH] D116804: [x86] use SETCC_CARRY instead of SBB node for select lowering

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 2 08:12:13 PST 2022


craig.topper added a comment.

In D116804#3290586 <https://reviews.llvm.org/D116804#3290586>, @spatel wrote:

> In D116804#3289352 <https://reviews.llvm.org/D116804#3289352>, @craig.topper wrote:
>
>> In D116804#3289136 <https://reviews.llvm.org/D116804#3289136>, @apostolakis wrote:
>>
>>> In D116804#3287531 <https://reviews.llvm.org/D116804#3287531>, @spatel wrote:
>>>
>>>> In D116804#3286618 <https://reviews.llvm.org/D116804#3286618>, @apostolakis wrote:
>>>>
>>>>> We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
>>>>> Looking at the assembly, noticed a relatively long dependence chain including the following sequence:
>>>>>
>>>>>   mov   (%rbx),%rax 
>>>>>   sbb   %rax,%rax
>>>>>   or    %rdx,%rax 
>>>>
>>>> Do you have the IR for the function where that appears?
>>>
>>> Here is a source code example with clang-trunk generated IR and assembly (https://godbolt.org/z/v66TM8W7e) that resembles the affected code and reproduces the aforementioned sequence of instructions.
>>> Notice the non-broken (for Intel targets) dependence chain including the following instructions: callq foo1(long*, long*, y_s*) -> movq 8(%rbx), %rax -> sbbq %rax, %rax -> orq %rdx, %rax -> callq foo2(int*, int, int, int, int, int, y_s*, long, long)
>>>
>>>> If we're going to need a tuning flag similar to `TuningPOPCNTFalseDep`, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).
>>>
>>> This sounds okay to me.
>>
>> Rather than modifying the code touched by this patch with a new tuning flag, should we do it in the X86ISelDAGToDAG.cpp where SETCC_CARRY is selected?
>
> Yes, good point. I suspect it would be hard to come up with a test where there's a difference, but the later we make the conversion, the better.
> I did look at BreakFalseDeps a bit, and I don't think it can work for this case because we convert the X86ISD::SETCC_CARRY into a X86::SETB_C32r pseudo op, and that has no general register operands for BreakFalseDeps to detect.

Won't SETB_C32r be turned into SBB32rr by expandPostRAPseudo before we get to BreakFalseDeps? But I don't think BreakFalseDeps is prepared to deal with a tied dest/src that's also used by another source.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116804/new/

https://reviews.llvm.org/D116804



More information about the llvm-commits mailing list