[llvm-commits] [llvm] r123547 - /llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
Benjamin Kramer
benny.kra at googlemail.com
Mon Jan 17 04:24:34 PST 2011
On 16.01.2011, at 02:02, Chris Lattner wrote:
> One interesting thing that I see is that the calls are often of the form:
>
> icmp_ugt(ctpop(a), ctpop(b))
>
> I wonder if there is some clever optimization for that case.
>
> Another case I see in a few places is:
>
> %331 = tail call i64 @llvm.ctpop.i64(i64 %330) nounwind
> %cast.i137 = trunc i64 %331 to i32
> %332 = icmp ugt i32 %cast.i137, 1
>
> Where both the trunc and the ctpop have one use.
I added a DAGCombine to turn this pattern into (x & x-1) != 0 in r123621.
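For reference, the identity behind that combine can be sketched in C++. This is only an illustration of the bit trick, not the DAGCombiner code; the function names are made up for the example.

```cpp
#include <cstdint>

// Hypothetical reference implementation: count set bits one at a time.
inline unsigned popcount_loop(uint64_t x) {
  unsigned n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// ctpop(x) > 1 without computing the count: clearing the lowest set bit
// leaves a nonzero value only if at least two bits were set.
inline bool popcount_gt1(uint64_t x) {
  return (x & (x - 1)) != 0;
}
```

For x == 0 the subtraction wraps to all ones, the AND yields 0, and the result is false, which matches ctpop(0) = 0.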
> There are a few other interesting patterns that seem simplifiable:
>
> ; <label>:178 ; preds = %176
> ...
> %183 = tail call i64 @llvm.ctpop.i64(i64 %182) nounwind
> %cast.i107 = trunc i64 %183 to i32
> %184 = getelementptr inbounds [64 x i64]* @w_pawn_attacks, i64 0, i64 %179
> %185 = load i64* %184, align 8, !tbaa !2
> %186 = and i64 %68, %185
> %187 = tail call i64 @llvm.ctpop.i64(i64 %186) nounwind
> %cast.i108 = trunc i64 %187 to i32
> %188 = getelementptr inbounds [65 x i64]* @set_mask, i64 0, i64 %179
> %189 = load i64* %188, align 8, !tbaa !2
> %190 = and i64 %120, %189
> %191 = icmp eq i64 %190, 0
> br i1 %191, label %192, label %.thread207
>
> ; <label>:192 ; preds = %178
> %193 = icmp ugt i32 %cast.i108, %cast.i107
> br i1 %193, label %197, label %194
>
> ; <label>:194 ; preds = %192
> %195 = icmp eq i32 %cast.i107, 0
> %196 = icmp ult i32 %cast.i107, %cast.i108
> %or.cond246 = or i1 %195, %196
> %indvar.next433 = add i32 %indvar432, 1
> br i1 %or.cond246, label %176, label %.thread218
>
> ; <label>:197 ; preds = %192
> %198 = sub nsw i32 %cast.i108, %cast.i107
> %199 = icmp eq i32 %cast.i108, %cast.i107
> br i1 %199, label %.thread218, label %.thread207
>
>
> "%195" seems like it is just "icmp eq i64 %182, 0"
>
> %196 seems like it is the same thing as %193. I wonder if we should always canonicalize icmps to "lt" comparisons when both operands are non-constant. It seems that this would expose more CSEs.
Either that or teach our CSE machinery to eliminate commuted instructions.
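A toy sketch of the canonicalization idea, assuming a made-up miniature compare representation (the Pred and Cmp types below are hypothetical and are not LLVM classes): "greater" predicates are rewritten as "less" predicates with swapped operands, so commuted compares become structurally identical and an ordinary CSE lookup would find them.

```cpp
// Hypothetical miniature of an integer compare: a predicate plus two
// operand ids (stand-ins for SSA value numbers).
enum class Pred { ULT, UGT, ULE, UGE };

struct Cmp {
  Pred pred;
  unsigned lhs;
  unsigned rhs;
};

inline bool operator==(const Cmp &a, const Cmp &b) {
  return a.pred == b.pred && a.lhs == b.lhs && a.rhs == b.rhs;
}

// Rewrite "a > b" as "b < a" and "a >= b" as "b <= a"; leave the
// "less" forms untouched.
inline Cmp canonicalize(Cmp c) {
  switch (c.pred) {
  case Pred::UGT:
    return Cmp{Pred::ULT, c.rhs, c.lhs};
  case Pred::UGE:
    return Cmp{Pred::ULE, c.rhs, c.lhs};
  default:
    return c;
  }
}
```

With operand ids 107 and 108 standing in for %cast.i107 and %cast.i108, the ugt compare in %193 and the ult compare in %196 canonicalize to the same Cmp value.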
> Here's another case that might allow cleverness:
>
> %437 = tail call i64 @llvm.ctpop.i64(i64 %436) nounwind
> %cast.i138 = trunc i64 %437 to i32
> ...
> %440 = tail call i64 @llvm.ctpop.i64(i64 %439) nounwind
> %cast.i139 = trunc i64 %440 to i32
> %441 = sub nsw i32 %cast.i139, %cast.i138
> %442 = icmp eq i32 %441, 2
>
> Here's another obvious case:
>
> %592 = tail call i64 @llvm.ctpop.i64(i64 %591) nounwind
> %cast.i176 = trunc i64 %592 to i32
> %593 = icmp eq i32 %cast.i176, 0
>
> In this case, 592 has multiple uses. It seems that we should be able to eliminate the trunc though since we know the top bits are zero.
Actually %cast.i176 has multiple uses, so folding the trunc into the icmp isn't profitable here.
> Here's another interesting pattern:
>
> %778 = tail call i64 @llvm.ctpop.i64(i64 %777) nounwind
> %cast.i195 = trunc i64 %778 to i32
> ...
> %781 = tail call i64 @llvm.ctpop.i64(i64 %780) nounwind
> %cast.i196 = trunc i64 %781 to i32
> %782 = sub nsw i32 %cast.i195, %cast.i196
> %783 = icmp eq i32 %782, 2
> br i1 %783, label %.loopexit.thread, label %784
>
> ; <label>:784 ; preds = %775
> switch i32 %cast.i195, label %.thread245 [
> i32 1, label %785
> i32 0, label %786
> ]
>
> ; <label>:785 ; preds = %784
> %.not9 = icmp ne i32 %cast.i196, 0
> ...
>
>
> And:
> %1037 = tail call i64 @llvm.ctpop.i64(i64 %1036) nounwind
> %1038 = icmp eq i64 %1037, 3
>
> These "popcount = 3" and "popcount < 2" sorts of cases seem like they could use a couple of iterations of the unrolled "a &= a-1" check or something, instead of computing the full popcount.
popcount < 2 is caught by the new DAGCombine.
If we want to expand popcount = 3 we would have to emit 3 branches:
(a &= a-1) != 0 && (a &= a-1) != 0 && (a & a-1) == 0
I don't think that's worth it.
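To make the cost concrete, here is a minimal C++ sketch of such an unrolled popcount == 3 test. The function name is made up for the example, and this is not what any LLVM pass emits: it clears the lowest set bit twice, requiring a nonzero result each time, then requires that clearing it a third time leaves zero.

```cpp
#include <cstdint>

// Hypothetical unrolled test for "exactly three bits set".
inline bool popcount_eq3(uint64_t a) {
  if ((a &= a - 1) == 0)
    return false;  // zero or one bit was set
  if ((a &= a - 1) == 0)
    return false;  // exactly two bits were set
  // At least one bit remains; exactly three were set iff it is the only one.
  return (a & (a - 1)) == 0;
}
```

Each early return corresponds to one of the three branches, compared with a single instruction on targets that have a native popcount.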
> For example, the top of GenerateCheckEvasions has "popcount(x) == 1", which seems like it could be something like "x != 0 && (x & (x-1)) == 0", cheaper than expanding the popcount. This sort of thing is a bad idea if ctpop expands to a single-cycle instruction though, so this is probably best done in dag combine instead of instcombine.
Agreed, I planted a TODO in the DAGCombiner. I don't know how to lower this to get optimal code though.
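The straightforward form of the popcount(x) == 1 check looks like this in C++. It is a sketch of the expression discussed above with a made-up function name, not a claim about the optimal lowering.

```cpp
#include <cstdint>

// Hypothetical helper: true iff exactly one bit of x is set.
// The explicit parentheses around the AND matter, because == binds
// tighter than & in C and C++.
inline bool popcount_eq1(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

The x != 0 test is needed because (x & (x-1)) == 0 also holds for x == 0, so the naive lowering costs two compares.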