[PATCH] D33076: [PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC.

Thu May 11 15:54:54 PDT 2017

iteratee accepted this revision.
iteratee added a comment.
This revision is now accepted and ready to land.

Looks fine with the comments I gave addressed.

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrAltivec.td:1099
+
+def : Pat<(v2i64 (shl v2i64:$vA, v2i64:$vB)),
+          (v2i64 (VSLD $vA, $vB))>;
----------------
Minor point, the patterns above are grouped by type and then by PPCxxx vs xxx
Can you group these the same?

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrAltivec.td:990
           (v4i32 (VSLW $vA, $vB))>;
+def : Pat<(v16i8 (PPCshl v16i8:$vA, v16i8:$vB)),
+          (v16i8 (VSLB $vA, $vB))>;
----------------
timshen wrote:
> iteratee wrote:
> > timshen wrote:
> > > iteratee wrote:
> > > > timshen wrote:
> > > > > iteratee wrote:
> > > > > > timshen wrote:
> > > > > > > iteratee wrote:
> > > > > > > > Can these patterns go in a separate patch? They only seem partially related.
> > > > > > > They can, the problem is that there is no way to test that patch, since no one generates PPCshl on vector until this patch.
> > > > > > > 
> > > > > > > Do you think it'd be ok to have a separate patch being unable to test?
> > > > > > If you add the patterns, they should be generated. That was what happened when I added the vector shift patterns for v1i128
> > > > > The patterns will be generated, but they will match nothing at the time (therefore not easy to test).
> > > > > 
> > > > > This is because no one generate the SDNodes "PPCshl", aka PPCISD::SHL, for vector operations, without the changes in stripModuloOnShift().
> > > > OK, why do you need a separate node for shift?
> > > 1) It already exists for scalar types. I simply extend it for vector types as well.
> > > 2) They are different because shl (ISD::SHL) has UB, but PPCshl (PPCISD::SHL) doesn't.
> > > 
> > > For example, (shl (i32 a), b) is UB when b >= 32. (PPCshl (i32 a), b), however, performs a << (b % 32) on vector types, as the instructions do too.
> > > 
> > > See discussion here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170508/452111.html
> > OK, I get why you're doing it in dagcombiner, but why can't you just produce the instructions directly? Why do you need the intermediate nodes?
> Because I don't want to manually lower it to 8 * 2 = 16 instructions in C++ code. I prefer to write .td patterns for that.
> 
> Without PPCISD::SHL I either write C++, or write pattern to match ISD::SHL. But ISD::SHL doesn't have the semantic we want.
OK. I get the reasoning behind delaying the output.

https://reviews.llvm.org/D33076