[PATCH] D46536: [Power9]Legalize and emit code for W vector extract and convert to Quad-Precision

Fri May 11 04:30:37 PDT 2018

nemanjai requested changes to this revision.
nemanjai added a comment.
This revision now requires changes to proceed.

I think we should review the code sequences for signed conversions and make more consistent use of loops.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3165
                       (EXTRACT_SUBREG (XXPERMDI $src, $src, 3), sub_64)))>;
-  }
+
+    // (Un)Signed Word vector extract -> QP
----------------
This probably needs `let Predicates = [IsBigEndian, HasP9Vector]`, right?

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3168
+    def : Pat<(f128 (sint_to_fp (i32 (extractelt v4i32:$src, 0)))),
+              (f128 (XSCVSDQP (EXTRACT_SUBREG (XVCVSXWDP $src), sub_64)))>;
+    foreach Idx = 1-3 in {
----------------
Is this sequence actually correct? We convert a vector of four 4-byte integers into a vector of two 8-byte double-precision floating-point values. Then we treat the result as a signed 8-byte integer and convert it to a 16-byte floating-point value. Shouldn't the outer instruction be `xscvdpqp`?

In any case, `vextsw2d -> xscvsdqp` is a much lower-latency sequence than this. Why not use that?
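
To make the concern concrete, here is a rough Python model of the two data flows, assuming the reading of the sequence given above. It is not a model of the actual instruction semantics. The function names are made up for illustration, and since Python has no f128 type the sketch stops at the 64-bit signed integer that would be fed to the final integer-to-QP conversion.

```python
import struct


def double_bits_as_int64(word: int) -> int:
    """Convert an integer to a double, then reread the double's bits as a signed 64-bit integer.

    Hypothetical model of "convert to DP, then treat the result as a signed doubleword".
    """
    as_double = float(word)
    return struct.unpack("<q", struct.pack("<d", as_double))[0]


def sign_extend_word(raw: int) -> int:
    """Sign-extend a raw 32-bit word to a signed 64-bit integer.

    Hypothetical model of the "sign-extend word to doubleword" step.
    """
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


for value in (5, -3):
    raw = value & 0xFFFFFFFF
    # First column: original value; second: integer seen after the DP detour;
    # third: integer seen after plain sign extension.
    print(value, double_bits_as_int64(value), sign_extend_word(raw))
```

Under these assumptions, the integer seen after the DP detour is the IEEE-754 bit pattern of the double rather than the original value, whereas plain sign extension preserves the value.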

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3176
+      def : Pat<(f128 (uint_to_fp (i32 (extractelt v4i32:$src, Idx)))),
+                (f128 (XSCVUDQP (XXEXTRACTUW $src, !add(Idx, Idx, Idx, Idx))))>;
+    }
----------------
Is there no `mul` operator in TableGen? That is, can't we just write `!mul(Idx, 4)`?
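
As a quick sanity check of the arithmetic only (this says nothing about whether TableGen provides `!mul`), the sketch below confirms in Python that the four-way add and the multiply give the same byte offsets for the loop range `1-3`. The helper names are hypothetical.

```python
WORD_BYTES = 4  # a v4i32 element is 4 bytes wide


def offset_by_add(idx: int) -> int:
    # Mirrors the patch's !add(Idx, Idx, Idx, Idx)
    return idx + idx + idx + idx


def offset_by_mul(idx: int) -> int:
    # Mirrors the suggested !mul(Idx, 4)
    return idx * WORD_BYTES


for idx in range(1, 4):  # corresponds to: foreach Idx = 1-3
    assert offset_by_add(idx) == offset_by_mul(idx)
    print(idx, offset_by_mul(idx))
```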

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3194
               (f128 (XSCVUDQP (COPY_TO_REGCLASS $src, VFRC)))>;
-  }
+
+    // (Un)Signed Word vector extract -> QP
----------------
Same note regarding the predicate.

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3196
+    // (Un)Signed Word vector extract -> QP
+    def : Pat<(f128 (sint_to_fp (i32 (extractelt v4i32:$src, 0)))),
+              (f128 (XSCVSDQP (EXTRACT_SUBREG
----------------
To be consistent, I think you should write these as a neat for-loop as you did above. The element would be `Idx` and the splat index would be `!sub(3, Idx)`. Wouldn't that work?
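
For reference, the index mapping that such a loop would rely on can be sketched in Python as follows. This only illustrates the `!sub(3, Idx)` arithmetic suggested above. The function name is hypothetical, and whether the resulting splat index is the right one for every element is exactly the question being asked.

```python
NUM_WORDS = 4  # a v4i32 vector holds four word elements


def splat_index(idx: int) -> int:
    # Mirrors the suggested !sub(3, Idx)
    return (NUM_WORDS - 1) - idx


# Element index -> splat index for every element of the vector.
mapping = {idx: splat_index(idx) for idx in range(NUM_WORDS)}
print(mapping)
```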

================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3207
+              (f128 (XSCVSDQP (EXTRACT_SUBREG (XVCVSXWDP $src), sub_64)))>;
+    def : Pat<(f128 (uint_to_fp (i32 (extractelt v4i32:$src, 0)))),
+              (f128 (XSCVUDQP (XXEXTRACTUW $src, 12)))>;
----------------
Same thing here, a for-loop would be nicer and more consistent.

https://reviews.llvm.org/D46536