[llvm-commits] [llvm] r123621 - in /llvm/trunk: lib/CodeGen/SelectionDAG/TargetLowering.cpp test/CodeGen/X86/ctpop-combine.ll

Chris Lattner clattner at apple.com
Tue Jan 18 09:29:26 PST 2011


On Jan 18, 2011, at 7:38 AM, Benjamin Kramer wrote:
>> This seems like an important loop!  One simple thing that I see is that we have:
>> 
>> ...
>>       imulq   %r9, %r13
>>       shrq    $56, %r13
>> ...
>>       imulq   %r9, %rbp
>>       shrq    $56, %rbp
>>       subl    %r13d, %ebp
>> ...
>>       cmpl    $2, %ebp
>>       jne     LBB4_14
>> 
>> Is there some cheaper way to do ((x*0x101010101010101)>>56)-((y*0x101010101010101)>>56) != 2?
> 
> Applying distributive laws yields ((x-y)*0x101010101010101)>>56 != 2; however, adding a
> "machine distribution" pass just to catch this one case feels like overkill.

Interesting! If that were safe, this could recursively simplify even further.  Is distribution safe across shifts, though?  Instcombine doesn't simplify this, and it doesn't simplify the similar code with add instead of sub either:

define i32 @test2(i64 %x, i64 %y) nounwind readnone ssp {
entry:
  %shr = lshr i64 %x, 56
  %shr2 = lshr i64 %y, 56
  %sub = sub i64 %shr, %shr2
  %conv = trunc i64 %sub to i32
  ret i32 %conv
}

I think we'd have to "and" out the low bits to keep them from percolating up into the 56th bit.
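For example (a quick made-up check, not from the benchmark), x = 0, y = 1 already shows a borrow from the low bits percolating all the way up through bit 56 of the subtraction:

#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint64_t x = 0, y = 1;
  uint64_t original    = (x >> 56) - (y >> 56);  // 0 - 0 = 0
  uint64_t distributed = (x - y) >> 56;          // 0xffffffffffffffff >> 56 = 255
  printf("original=%llu distributed=%llu\n",
         (unsigned long long)original, (unsigned long long)distributed);
  return 0;
}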

> Is there an easier way to catch such simplifications at codegen time?

Unfortunately, there isn't an easier way to catch things like this than adding them to dagcombine.

>> In Evaluate, things are more difficult to analyze.  One commonly repeated pattern is a "conditional negate", which looks like this in IR:
>> 
>> DrawScore.exit:                                   ; preds = %186, %180, %171
>> %188 = phi i32 [ %172, %171 ], [ %storemerge1.i, %180 ], [ %.392, %186 ]
>> %189 = sub nsw i32 0, %188
>> %190 = select i1 %170, i32 %189, i32 %188
>> 
>> and codegens to:
>> 
>> LBB3_62:                                ## %DrawScore.exit
>>       movl    %edx, %eax
>>       negl    %eax
>>       testl   %ecx, %ecx
>>       cmovnel %edx, %eax
>> 
>> I don't know if there is a trickier way to do that with flags.
> 
> One possibility would be
> cmp $1, %ecx
> sbbl %eax, %eax
> (notl %eax)
> xorl %edx, %eax
> 
> Some processors can even ignore the fake sbb dependency on %eax, so this should be slightly
> faster than the cmov code. The existing conditional-increment X86 DAGcombine can probably
> be extended to handle this.

Nice. cmov often has relatively high latency, so I bet this would be a win.
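For reference, the branch-free idea in C looks roughly like this (my own sketch with made-up names, not the exact sequence above; the sbb materializes the mask and the xor applies it):

#include <stdint.h>

// Branch-free conditional negate: returns c ? -x : x for c in {0, 1}.
static int32_t cond_negate(int32_t x, int32_t c) {
  int32_t mask = -c;        // all-ones when c == 1, zero when c == 0
  return (x ^ mask) + c;    // xor flips the bits; adding c completes the negate
}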

Thanks Benjamin!

-Chris
