[PATCH] D18189: [X86][XOP] Support for VPPERM byte shuffle instruction

Wed Mar 23 17:59:47 PDT 2016

spatel accepted this revision.
spatel added a comment.
This revision is now accepted and ready to land.

LGTM. See inline comments for a couple of small changes.


================
Comment at: lib/Target/X86/X86InstrXOP.td:225-244
@@ -224,1 +224,22 @@
 
+multiclass xop4op<bits<8> opc, string OpcodeStr, SDNode OpNode,
+                  ValueType vt128> {
+  def rr : IXOPi8<opc, MRMSrcReg, (outs VR128:$dst),
+           (ins VR128:$src1, VR128:$src2, VR128:$src3),
+           !strconcat(OpcodeStr,
+           "\t{$src3, $src2, $src1, $dst|$dst, $src1, $src2, $src3}"),
+           [(set VR128:$dst,
+             (vt128 (OpNode (vt128 VR128:$src1), (vt128 VR128:$src2),
+                            (vt128 VR128:$src3))))]>,
+           XOP_4V, VEX_I8IMM;
+  def rm : IXOPi8<opc, MRMSrcMem, (outs VR128:$dst),
+           (ins VR128:$src1, VR128:$src2, i128mem:$src3),
+           !strconcat(OpcodeStr,
+           "\t{$src3, $src2, $src1, $dst|$dst, $src1, $src2, $src3}"),
+           [(set VR128:$dst,
+             (vt128 (OpNode (vt128 VR128:$src1), (vt128 VR128:$src2),
+                            (vt128 (bitconvert (loadv2i64 addr:$src3))))))]>,
+           XOP_4V, VEX_I8IMM, VEX_W, MemOp4;
+  def mr : IXOPi8<opc, MRMSrcMem, (outs VR128:$dst),
+           (ins VR128:$src1, i128mem:$src2, VR128:$src3),
+           !strconcat(OpcodeStr,
----------------
The reg/mem suffixes here confused me at first. I realize this copies existing code, but I'd prefer suffixes that accurately describe all 3 input operands: "rrr", "rrm", "rmr". It's fine to make that NFC change in a separate commit.

================
Comment at: test/CodeGen/X86/vector-shuffle-combining-xop.ll:33-40
@@ +32,9 @@
+
+define <16 x i8> @combine_vpperm_as_unpckhwd(<16 x i8> %a0, <16 x i8> %a1) {
+; CHECK-LABEL: combine_vpperm_as_unpckhwd:
+; CHECK:       # BB#0:
+; CHECK-NEXT:    vpperm {{.*}}(%rip), %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    retq
+  %res0 = call <16 x i8> @llvm.x86.xop.vpperm(<16 x i8> %a0, <16 x i8> %a1, <16 x i8> <i8 8, i8 24, i8 9, i8 25, i8 10, i8 26, i8 11, i8 27, i8 12, i8 28, i8 13, i8 29, i8 14, i8 30, i8 15, i8 31>)
+  ret <16 x i8> %res0
+}
----------------
Add 2 tests for the load-folding variants (the "rm" and "mr" forms)?
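
For anyone writing further tests against this intrinsic, the byte selection encoded by the mask in the test above can be modeled with a small script. This is only a sketch under stated assumptions: it assumes the low 5 bits of each selector byte index the 32-byte concatenation of the two sources, with the first vector operand as bytes 0-15 and the second as bytes 16-31, and it does not model the per-byte operation encoded in selector bits 7:5. `vpperm_select` and `interleave_high_bytes` are hypothetical helper names, not LLVM code.

```python
def vpperm_select(src1, src2, selectors):
    """Model only the byte-select part of VPPERM (hypothetical helper)."""
    if not (len(src1) == len(src2) == len(selectors) == 16):
        raise ValueError("expected three 16-byte vectors")
    # Assumption: src1 supplies bytes 0-15 of the pool, src2 bytes 16-31.
    pool = list(src1) + list(src2)
    out = []
    for sel in selectors:
        if sel >> 5:
            # Selector bits 7:5 choose a per-byte operation; not modeled here.
            raise NotImplementedError("operation bits 7:5 are not modeled")
        out.append(pool[sel & 0x1F])
    return out


def interleave_high_bytes(a, b):
    """Interleave the upper 8 bytes of a and b: a[8], b[8], a[9], b[9], ..."""
    out = []
    for x, y in zip(a[8:], b[8:]):
        out.extend((x, y))
    return out


# Mask copied from the combine_vpperm_as_unpckhwd test in the quoted diff.
MASK = [8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31]

a0 = list(range(0x00, 0x10))
a1 = list(range(0x10, 0x20))
assert vpperm_select(a0, a1, MASK) == interleave_high_bytes(a0, a1)
```

Under those assumptions, the mask reads as an interleave of the high 8 bytes of the two inputs. The same helper can be used to sanity-check the masks chosen for any new load-folding tests before regenerating the CHECK lines.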


Repository:
  rL LLVM

http://reviews.llvm.org/D18189