[PATCH] D19087: [x86, ppc] prefer comparisons against zero for and+cmp sequences

Thu Apr 14 11:51:37 PDT 2016

hfinkel added a comment.

In http://reviews.llvm.org/D19087#401302, @spatel wrote:

> In http://reviews.llvm.org/D19087#401155, @hfinkel wrote:
>
> > In http://reviews.llvm.org/D19087#401119, @spatel wrote:
> >
> > > In http://reviews.llvm.org/D19087#400599, @hfinkel wrote:
> > >
> > > > This certainly a good pattern to catch, but I'm not sure that CGP is the right place for this. We generally have things in CGP to work-around the fact that SDAG/ISel is basic-block local. This kind of thing seems much more natural as a something that should be in DAGCombine.
> > >
> > >
> > > Yes, I initially thought I'd do this as a DAGCombine, but then I realized that we'd need "isKnownToBeAPowerOfTwo()", and I don't see an SDNode equivalent for the IR version. I'm not sure where we draw the line between CGP and DAGCombine, but duplicating isKnownToBeAPowerOfTwo() scared me away.
> >
> >
> > Why? How do we select the instructions anyway? For x86 or ppc, would we not want to limit the applicability to constants? It needs to be a constant for rlwinm I'd imagine, and bt too?. For constants, we can check this with existing functionality.
>
>
> isKnownToBeAPowerOfTwo() is ~100 lines and recursive. We can approximate it using computeKnownBits(), but I think getting the full power of the IR version would mean duplicating the code.
>
> One of the x86 regression tests shows the case where a simple constant check won't do:
>
>   define i1 @and_cmp_const_power_of_two(i32 %x, i32 %y) {
>     %shl = shl i32 1, %y
>     %and = and i32 %x, %shl
>     %cmp = icmp ne i32 %and, %shl
>     ret i1 %cmp
>   }
>   
>
> X86TargetLowering::LowerToBT() uses computeKnownBits() to know that's a power-of-2 mask.

Okay, but in the end, the check used has to match the capabilities of the check used in the backend for lowering. In this case, it needs to use computeKnownBits(). Regardless, in this case, as I understand it, it is not enough to know that the number is some power of 2, we need to know which power of 2.

In http://reviews.llvm.org/D19087#401505, @spatel wrote:

> In http://reviews.llvm.org/D19087#401346, @hfinkel wrote:
>
> >
>
>
> ...
>
> > > X86TargetLowering::LowerToBT() uses computeKnownBits() to know that's a power-of-2 mask.
>
> > 
>
> > 
>
> > Okay, but in the end, the check used has to match the capabilities of the check used in the backend for lowering. In this case, it needs to use computeKnownBits(). Regardless, in this case, as I understand it, it is not enough to know that the number is some power of 2, we need to know which power of 2.
>
>
> I don't understand the last statement. For x86, the test case demonstrates that "some power-of-2" is the requirement to avoid impeding lowering to 'bt'. For PPC, I think the requirement is the same - we could use a variable shift instruction (eg, 'rlwnm' rather than 'rlwinm').
>
> If the PPC lowering is not doing that, wouldn't it be considered an optimization bug? I suppose this leads us to the conclusion that we really do need to duplicate isKnownToBeAPowerOfTwo() for SDNodes so everyone can use it.

You're right, it does not need to be a constant. However, we still need to know how to determine which power of two it will be (i.e. the shift amount), even if not a constant. That is stronger than just knowing it is some power of two. In any case, we should move this to DAGCombine and sync the associated logic with that used for instruction selection.

> Alternatively, we could wait even longer to attempt this transform - MachineCombiner? - so we avoid stepping on the other optimization. That looks simple enough for the x86 tests, but PPC would take a lot of work.

Not sure why PPC would take more work than x86 (both backends currently use the MachineCombiner for other purposes). Regardless, I'd not jump here unless necessary.

http://reviews.llvm.org/D19087