[PATCH] D46536: [Power9]Legalize and emit code for W vector extract and convert to Quad-Precision
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 11 04:30:37 PDT 2018
nemanjai requested changes to this revision.
nemanjai added a comment.
This revision now requires changes to proceed.
I think we should review the code sequences for signed conversions and make more consistent use of loops.
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3165
(EXTRACT_SUBREG (XXPERMDI $src, $src, 3), sub_64)))>;
- }
+
+ // (Un)Signed Word vector extract -> QP
----------------
This probably needs `let Predicates = [IsBigEndian, HasP9Vector]` right?
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3168
+ def : Pat<(f128 (sint_to_fp (i32 (extractelt v4i32:$src, 0)))),
+ (f128 (XSCVSDQP (EXTRACT_SUBREG (XVCVSXWDP $src), sub_64)))>;
+ foreach Idx = 1-3 in {
----------------
Is this sequence actually correct? We convert a vector of 4 4-byte integers into a vector of 2 8-byte double precision floating point values. Then we treat it as a signed 8-byte integer and convert it to a 16-byte floating point value. Shouldn't the outer instruction be `xscvdpqp`?
In any case, `vextsw2d -> xscvsdqp` is a much lower latency sequence than this. Why not use that?
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3176
+ def : Pat<(f128 (uint_to_fp (i32 (extractelt v4i32:$src, Idx)))),
+ (f128 (XSCVUDQP (XXEXTRACTUW $src, !add(Idx, Idx, Idx, Idx))))>;
+ }
----------------
Is there no `mul` function in TableGen? i.e. can't we just write `!mul(Idx, 4)`?
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3194
(f128 (XSCVUDQP (COPY_TO_REGCLASS $src, VFRC)))>;
- }
+
+ // (Un)Signed Word vector extract -> QP
----------------
Same note regarding the predicate.
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3196
+ // (Un)Signed Word vector extract -> QP
+ def : Pat<(f128 (sint_to_fp (i32 (extractelt v4i32:$src, 0)))),
+ (f128 (XSCVSDQP (EXTRACT_SUBREG
----------------
To be consistent, I think you should write these as a neat for-loop as you did above. The element would be `Idx` and the splat index would be `!sub(3, Idx)`. Wouldn't that work?
================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:3207
+ (f128 (XSCVSDQP (EXTRACT_SUBREG (XVCVSXWDP $src), sub_64)))>;
+ def : Pat<(f128 (uint_to_fp (i32 (extractelt v4i32:$src, 0)))),
+ (f128 (XSCVUDQP (XXEXTRACTUW $src, 12)))>;
----------------
Same thing here, a for-loop would be nicer and more consistent.
https://reviews.llvm.org/D46536
More information about the llvm-commits
mailing list