[llvm-commits] [llvm] r123547 - /llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Sun Jan 16 12:13:28 PST 2011

Hi Chris,

> One major advantage of recognizing popcount though is that the optimizer has a chance of hacking on it, and there are a lot of instcombine xforms that we could do.  Here are some that I noted while looking at the bc file for crafty (when hacked to use __builtin_popcount).  I attached the bc file below if you're interested.
>
> One interesting thing that I see is that the calls are often of the form:
>
>    icmp_ugt(ctpop(a), ctpop(b))
>
> I wonder if there is some clever optimization for that case.

I didn't add intrinsics to my auto-simplifier yet, but I plan to.  It only finds
simplifications in the InstructionSimplify style for the moment (i.e. when some
existing subset of the IR generates the final result) but that might still be
interesting for ctpop and friends.

Ciao, Duncan.

>
> Another case I see in a few places is:
>
>    %331 = tail call i64 @llvm.ctpop.i64(i64 %330) nounwind
>    %cast.i137 = trunc i64 %331 to i32
>    %332 = icmp ugt i32 %cast.i137, 1
>
> Where both the trunc and the ctpop have one use.
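
One identity that might apply to the "ctpop(x) > 1" shape above: clearing the lowest set bit leaves a nonzero value exactly when a second bit was set. The sketch below is plain C++ and not LLVM code; popcount64 and hasAtLeastTwoBits are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Reference popcount (stand-in for llvm.ctpop.i64): clears the lowest
// set bit until nothing is left, counting the iterations.
inline unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  // drop the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical helper: true when at least two bits of x are set.
// For x == 0 the expression is 0 & ~0 == 0, so the result is false.
inline bool hasAtLeastTwoBits(std::uint64_t x) {
    return (x & (x - 1)) != 0;
}
```

Whether this beats a hardware popcount depends on the target, so it is only a candidate rewrite.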
>
>
> There are a few other interesting patterns that seem simplifiable:
>
> ;<label>:178                                     ; preds = %176
> ...
>    %183 = tail call i64 @llvm.ctpop.i64(i64 %182) nounwind
>    %cast.i107 = trunc i64 %183 to i32
>    %184 = getelementptr inbounds [64 x i64]* @w_pawn_attacks, i64 0, i64 %179
>    %185 = load i64* %184, align 8, !tbaa !2
>    %186 = and i64 %68, %185
>    %187 = tail call i64 @llvm.ctpop.i64(i64 %186) nounwind
>    %cast.i108 = trunc i64 %187 to i32
>    %188 = getelementptr inbounds [65 x i64]* @set_mask, i64 0, i64 %179
>    %189 = load i64* %188, align 8, !tbaa !2
>    %190 = and i64 %120, %189
>    %191 = icmp eq i64 %190, 0
>    br i1 %191, label %192, label %.thread207
>
> ;<label>:192                                     ; preds = %178
>    %193 = icmp ugt i32 %cast.i108, %cast.i107
>    br i1 %193, label %197, label %194
>
> ;<label>:194                                     ; preds = %192
>    %195 = icmp eq i32 %cast.i107, 0
>    %196 = icmp ult i32 %cast.i107, %cast.i108
>    %or.cond246 = or i1 %195, %196
>    %indvar.next433 = add i32 %indvar432, 1
>    br i1 %or.cond246, label %176, label %.thread218
>
> ;<label>:197                                     ; preds = %192
>    %198 = sub nsw i32 %cast.i108, %cast.i107
>    %199 = icmp eq i32 %cast.i108, %cast.i107
>    br i1 %199, label %.thread218, label %.thread207
>
>
> "%195" seems like it is just "icmp eq i64 %182, 0"
>
> %196 seems like it is the same thing as %193.  I wonder if we should always canonicalize icmps to "lt" comparisons when both operands are non-constant.  It seems that this would expose more CSEs.
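
A toy model of that canonicalization idea, assuming comparisons are represented as a predicate plus two operand names. Pred, Cmp and canonicalize are hypothetical and not part of any LLVM API; the point is only that rewriting "ugt a, b" as "ult b, a" makes the two comparisons compare equal, which is what CSE needs.

```cpp
#include <string>
#include <utility>

// Hypothetical, minimal model of an unsigned integer comparison.
enum class Pred { ULT, UGT };

struct Cmp {
    Pred pred;
    std::string lhs, rhs;  // SSA value names, e.g. "%cast.i107"

    bool operator==(const Cmp& o) const {
        return pred == o.pred && lhs == o.lhs && rhs == o.rhs;
    }
};

// Rewrite every "ugt a, b" as "ult b, a"; "ult" is left untouched.
inline Cmp canonicalize(Cmp c) {
    if (c.pred == Pred::UGT) {
        std::swap(c.lhs, c.rhs);
        c.pred = Pred::ULT;
    }
    return c;
}
```

With this, the %193 and %196 comparisons from the excerpt map to the same canonical form.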
>
>
> Here's another case that might allow cleverness:
>
>    %437 = tail call i64 @llvm.ctpop.i64(i64 %436) nounwind
>    %cast.i138 = trunc i64 %437 to i32
> ...
>    %440 = tail call i64 @llvm.ctpop.i64(i64 %439) nounwind
>    %cast.i139 = trunc i64 %440 to i32
>    %441 = sub nsw i32 %cast.i139, %cast.i138
>    %442 = icmp eq i32 %441, 2
>
> Here's another obvious case:
>
>    %592 = tail call i64 @llvm.ctpop.i64(i64 %591) nounwind
>    %cast.i176 = trunc i64 %592 to i32
>    %593 = icmp eq i32 %cast.i176, 0
>
> In this case, %592 has multiple uses.  It seems that we should be able to eliminate the trunc though since we know the top bits are zero.
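
The reasoning can be checked with a small C++ sketch: a 64-bit popcount is at most 64, so truncating it to 32 bits loses nothing, and comparing it with zero is the same as comparing the input with zero. ctpop64 is a stand-in for the intrinsic and the three predicates are hypothetical names.

```cpp
#include <cstdint>

// Stand-in for llvm.ctpop.i64; the result is always in the range [0, 64].
inline std::uint64_t ctpop64(std::uint64_t x) {
    std::uint64_t n = 0;
    while (x != 0) {
        x &= x - 1;  // drop the lowest set bit
        ++n;
    }
    return n;
}

// Shape found in the IR: truncate the count to 32 bits, then compare.
inline bool isZeroViaTrunc(std::uint64_t x) {
    return static_cast<std::uint32_t>(ctpop64(x)) == 0;
}

// Same test without the trunc: valid because the top 32 bits are zero.
inline bool isZeroWide(std::uint64_t x) {
    return ctpop64(x) == 0;
}

// Same test without the ctpop: the count is zero only when x is zero.
inline bool isZeroDirect(std::uint64_t x) {
    return x == 0;
}
```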
>
>
>
> Here's another interesting pattern:
>
>    %778 = tail call i64 @llvm.ctpop.i64(i64 %777) nounwind
>    %cast.i195 = trunc i64 %778 to i32
> ...
>    %781 = tail call i64 @llvm.ctpop.i64(i64 %780) nounwind
>    %cast.i196 = trunc i64 %781 to i32
>    %782 = sub nsw i32 %cast.i195, %cast.i196
>    %783 = icmp eq i32 %782, 2
>    br i1 %783, label %.loopexit.thread, label %784
>
> ;<label>:784                                     ; preds = %775
>    switch i32 %cast.i195, label %.thread245 [
>      i32 1, label %785
>      i32 0, label %786
>    ]
>
> ;<label>:785                                     ; preds = %784
>    %.not9 = icmp ne i32 %cast.i196, 0
>    ...
>
>
> And:
>    %1037 = tail call i64 @llvm.ctpop.i64(i64 %1036) nounwind
>    %1038 = icmp eq i64 %1037, 3
>
> These "popcount == 3" and "popcount < 2" sorts of cases seem like they could use a couple of iterations of the unrolled "a &= a-1" check or something, instead of doing the full popcount computation.
>
> For example, the top of GenerateCheckEvasions has "popcount(x) == 1", which seems like it could be something like "x != 0 && (x & (x-1)) == 0", cheaper than expanding the popcount.  This sort of thing is a bad idea if ctpop expands to a single-cycle instruction though, so this is probably best done in dag combine instead of instcombine.
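
A sketch of what those unrolled checks could look like, written as plain C++ and not as a DAG combine. The helper names are hypothetical; each predicate applies the "clear the lowest set bit" step a fixed number of times instead of computing the full count.

```cpp
#include <cstdint>

// One step of the "a &= a-1" trick: clears the lowest set bit (0 stays 0).
inline std::uint64_t clearLowestBit(std::uint64_t x) {
    return x & (x - 1);
}

// popcount(x) == 1: nonzero, and nothing is left after one clear.
inline bool popcountIsOne(std::uint64_t x) {
    return x != 0 && clearLowestBit(x) == 0;
}

// popcount(x) < 2: nothing is left after one clear (covers x == 0 too).
inline bool popcountLessThanTwo(std::uint64_t x) {
    return clearLowestBit(x) == 0;
}

// popcount(x) == 3: exactly one bit survives two clears.
inline bool popcountIsThree(std::uint64_t x) {
    std::uint64_t b = clearLowestBit(clearLowestBit(x));
    return b != 0 && clearLowestBit(b) == 0;
}
```

Note the parentheses around the "&" expression: in C and C++, "==" binds tighter than "&".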