[PATCH] D116804: [x86] use SETCC_CARRY instead of SBB node for select lowering

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 2 08:00:44 PST 2022


spatel added a comment.

In D116804#3289352 <https://reviews.llvm.org/D116804#3289352>, @craig.topper wrote:

> In D116804#3289136 <https://reviews.llvm.org/D116804#3289136>, @apostolakis wrote:
>
>> In D116804#3287531 <https://reviews.llvm.org/D116804#3287531>, @spatel wrote:
>>
>>> In D116804#3286618 <https://reviews.llvm.org/D116804#3286618>, @apostolakis wrote:
>>>
>>>> We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
>>>> Looking at the assembly, noticed a relatively long dependence chain including the following sequence:
>>>>
>>>>   mov   (%rbx),%rax 
>>>>   sbb   %rax,%rax
>>>>   or    %rdx,%rax 
>>>
>>> Do you have the IR for the function where that appears?
>>
>> Here is a source code example with clang-trunk generated IR and assembly (https://godbolt.org/z/v66TM8W7e) that resembles the affected code and reproduces the aforementioned sequence of instructions.
>> Notice the non-broken (for Intel targets) dependence chain including the following instructions: callq foo1(long*, long*, y_s*) -> movq 8(%rbx), %rax -> sbbq %rax, %rax -> orq %rdx, %rax -> callq foo2(int*, int, int, int, int, int, y_s*, long, long)
>>
>>> If we're going to need a tuning flag similar to `TuningPOPCNTFalseDep`, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).
>>
>> This sounds okay to me.
>
> Rather than modifying the code touched by this patch with a new tuning flag, should we do it in the X86ISelDAGToDAG.cpp where SETCC_CARRY is selected?

Yes, good point. I suspect it would be hard to come up with a test where there's a difference, but the later we make the conversion, the better.
I did look at BreakFalseDeps a bit, and I don't think it can work for this case because we convert the X86ISD::SETCC_CARRY into a X86::SETB_C32r pseudo op, and that has no general register operands for BreakFalseDeps to detect.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116804/new/

https://reviews.llvm.org/D116804



More information about the llvm-commits mailing list