[llvm-commits] [llvm] r123547 - /llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Mon Jan 17 04:24:34 PST 2011

On 16.01.2011, at 02:02, Chris Lattner wrote:

> One interesting thing that I see is that the calls are often of the form:
> 
>  icmp_ugt(ctpop(a), ctpop(b))
> 
> I wonder if there is some clever optimization for that case.
> 
> Another case I see in a few places is:
> 
>  %331 = tail call i64 @llvm.ctpop.i64(i64 %330) nounwind
>  %cast.i137 = trunc i64 %331 to i32
>  %332 = icmp ugt i32 %cast.i137, 1
> 
> Where both the trunc and the ctpop have one use.

I added a DAGCombine to turn this pattern into (x & x-1) != 0 in r123621.
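
For reference, the identity behind that combine can be checked at the source level. This is only a sketch of the arithmetic, not the DAGCombine code from the commit; the names popcount64 and hasMoreThanOneBit are made up for illustration. The trunc in the pattern is harmless because a 64-bit popcount is at most 64.

```cpp
#include <cstdint>

// Reference population count: clears the lowest set bit until none remain.
// (Illustrative helper, not an LLVM API.)
constexpr unsigned popcount64(std::uint64_t x) {
  unsigned n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// popcount(x) > 1 without computing the count: x & (x - 1) clears the lowest
// set bit, so the result is nonzero exactly when at least two bits were set.
// For x == 0 the subtraction wraps to all ones and the AND is still 0.
constexpr bool hasMoreThanOneBit(std::uint64_t x) {
  return (x & (x - 1)) != 0;
}
```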

> There are a few other interesting patterns that seem simplifiable:
> 
> ; <label>:178                                     ; preds = %176
> ...
>  %183 = tail call i64 @llvm.ctpop.i64(i64 %182) nounwind
>  %cast.i107 = trunc i64 %183 to i32
>  %184 = getelementptr inbounds [64 x i64]* @w_pawn_attacks, i64 0, i64 %179
>  %185 = load i64* %184, align 8, !tbaa !2
>  %186 = and i64 %68, %185
>  %187 = tail call i64 @llvm.ctpop.i64(i64 %186) nounwind
>  %cast.i108 = trunc i64 %187 to i32
>  %188 = getelementptr inbounds [65 x i64]* @set_mask, i64 0, i64 %179
>  %189 = load i64* %188, align 8, !tbaa !2
>  %190 = and i64 %120, %189
>  %191 = icmp eq i64 %190, 0
>  br i1 %191, label %192, label %.thread207
> 
> ; <label>:192                                     ; preds = %178
>  %193 = icmp ugt i32 %cast.i108, %cast.i107
>  br i1 %193, label %197, label %194
> 
> ; <label>:194                                     ; preds = %192
>  %195 = icmp eq i32 %cast.i107, 0
>  %196 = icmp ult i32 %cast.i107, %cast.i108
>  %or.cond246 = or i1 %195, %196
>  %indvar.next433 = add i32 %indvar432, 1
>  br i1 %or.cond246, label %176, label %.thread218
> 
> ; <label>:197                                     ; preds = %192
>  %198 = sub nsw i32 %cast.i108, %cast.i107
>  %199 = icmp eq i32 %cast.i108, %cast.i107
>  br i1 %199, label %.thread218, label %.thread207
> 
> 
> "%195" seems like it is just "icmp eq i64 %182, 0"
> 
> %196 seems like it is the same thing as %193.  I wonder if we should always canonicalize icmps to "lt" comparisons when both operands are non-constant.  It seems that this would expose more CSEs.

Either that or teach our CSE machinery to eliminate commuted instructions.
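
Either approach comes down to treating "a > b" and "b < a" as the same expression. A toy sketch of the canonicalizing variant follows; Pred, Cmp, canonicalize and sameCompare are hypothetical names, operands are plain integer ids, and none of this is LLVM's actual CSE code.

```cpp
// Toy model of an unsigned compare: a predicate plus two operand ids.
enum class Pred { ULT, UGT, ULE, UGE };

struct Cmp {
  Pred pred;
  int lhs;
  int rhs;
};

// Predicate that gives the same result once the operands are swapped.
constexpr Pred swappedPred(Pred p) {
  switch (p) {
    case Pred::ULT: return Pred::UGT;
    case Pred::UGT: return Pred::ULT;
    case Pred::ULE: return Pred::UGE;
    case Pred::UGE: return Pred::ULE;
  }
  return p;
}

// Rewrite "greater" forms into "less" forms by swapping the operands.
constexpr Cmp canonicalize(Cmp c) {
  return (c.pred == Pred::UGT || c.pred == Pred::UGE)
             ? Cmp{swappedPred(c.pred), c.rhs, c.lhs}
             : c;
}

// Two compares are CSE candidates if their canonical forms are identical.
constexpr bool sameCompare(Cmp a, Cmp b) {
  return canonicalize(a).pred == canonicalize(b).pred &&
         canonicalize(a).lhs == canonicalize(b).lhs &&
         canonicalize(a).rhs == canonicalize(b).rhs;
}
```

With operand ids 107 and 108 standing in for %cast.i107 and %cast.i108, the compares behind %193 (ugt 108, 107) and %196 (ult 107, 108) canonicalize to the same form.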

> Here's another case that might allow cleverness:
> 
>  %437 = tail call i64 @llvm.ctpop.i64(i64 %436) nounwind
>  %cast.i138 = trunc i64 %437 to i32
> ...
>  %440 = tail call i64 @llvm.ctpop.i64(i64 %439) nounwind
>  %cast.i139 = trunc i64 %440 to i32
>  %441 = sub nsw i32 %cast.i139, %cast.i138
>  %442 = icmp eq i32 %441, 2
> 
> Here's another obvious case:
> 
>  %592 = tail call i64 @llvm.ctpop.i64(i64 %591) nounwind
>  %cast.i176 = trunc i64 %592 to i32
>  %593 = icmp eq i32 %cast.i176, 0
> 
> In this case, 592 has multiple uses.  It seems that we should be able to eliminate the trunc though since we know the top bits are zero.

Actually %cast.i176 has multiple uses, so folding the trunc into the icmp isn't profitable here.

> Here's another interesting pattern:
> 
>  %778 = tail call i64 @llvm.ctpop.i64(i64 %777) nounwind
>  %cast.i195 = trunc i64 %778 to i32
> ...
>  %781 = tail call i64 @llvm.ctpop.i64(i64 %780) nounwind
>  %cast.i196 = trunc i64 %781 to i32
>  %782 = sub nsw i32 %cast.i195, %cast.i196
>  %783 = icmp eq i32 %782, 2
>  br i1 %783, label %.loopexit.thread, label %784
> 
> ; <label>:784                                     ; preds = %775
>  switch i32 %cast.i195, label %.thread245 [
>    i32 1, label %785
>    i32 0, label %786
>  ]
> 
> ; <label>:785                                     ; preds = %784
>  %.not9 = icmp ne i32 %cast.i196, 0
>  ...
> 
> 
> And:
>  %1037 = tail call i64 @llvm.ctpop.i64(i64 %1036) nounwind
>  %1038 = icmp eq i64 %1037, 3
> 
> These "popcount = 3" and "popcount < 2" sorts of cases seem like they could use a couple of iterations of the unrolled "a &= a-1" checks or something, instead of doing the full computation.

popcount < 2 is caught by the new DAGCombine.

If we want to expand popcount = 3 we would have to emit 3 branches:
(a &= a-1) != 0 && (a &= a-1) != 0 && (a & a-1) == 0
I don't think that's worth it.
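
To make the cost concrete, here is one way to write a three-test expansion at the source level. It is a sketch under my own assumptions (the name popcountIs3 is made up, and this is not proposed DAGCombine code): the first two tests each clear one set bit and require something to be left, and the last test requires that at most one bit remains.

```cpp
#include <cstdint>

// popcount(a) == 3 using three short-circuiting tests instead of a full
// population count. && guarantees left-to-right evaluation, so each test
// sees the value left behind by the previous one.
constexpr bool popcountIs3(std::uint64_t a) {
  return (a &= a - 1) != 0    // one bit cleared, nonzero: at least two bits were set
      && (a &= a - 1) != 0    // second bit cleared, nonzero: at least three bits were set
      && (a & (a - 1)) == 0;  // at most one bit remains: exactly three bits were set
}
```

Each && would normally turn into a conditional branch, which is where the three branches come from.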

> For example, the top of GenerateCheckEvasions has "popcount(x) == 1", which seems like it could be something like "x != 0 && (x & (x-1)) == 0", cheaper than expanding the popcount.  This sort of thing is a bad idea if ctpop expands to a single cycle instruction though, so this is probably best to do in dag combine instead of instcombine.

Agreed, I planted a TODO in the DAGCombiner. I don't know how to lower this to get optimal code though.
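
For the record, the source-level form of that check looks like the sketch below. The name popcountIs1 is made up, and this says nothing about how the DAG lowering should look, which is the open part of the TODO.

```cpp
#include <cstdint>

// popcount(x) == 1: x must be nonzero, and clearing its lowest set bit
// must leave nothing behind.
constexpr bool popcountIs1(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

Note the parentheses around the AND: in C and C++, == binds tighter than &, so "x & (x-1) == 0" would compare first and mask second.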