[PATCH] D14151: [X86][AVX] Fix lowering of X86ISD::VZEXT_MOVL for 128-bit -> 256-bit extension

Thu Nov 5 08:25:31 PST 2015

andreadb added a comment.

Hi Simon,


================
Comment at: lib/Target/X86/X86InstrSSE.td:7207-7227
@@ -7221,2 +7206,23 @@
 
+  def : Pat<(v8i32 (X86vzmovl (insert_subvector undef,
+                   (v4i32 VR128:$src), (iPTR 0)))),
+            (SUBREG_TO_REG (i32 0),
+                           (VPBLENDWrri (v4i32 (V_SET0)), VR128:$src, (i8 3)),
+                           sub_xmm)>;
+  def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
+                   (v2i64 VR128:$src), (iPTR 0)))),
+            (SUBREG_TO_REG (i32 0),
+                           (VPBLENDWrri (v4i32 (V_SET0)), VR128:$src, (i8 15)),
+                           sub_xmm)>;
+  def : Pat<(v8f32 (X86vzmovl (insert_subvector undef,
+                   (v4f32 VR128:$src), (iPTR 0)))),
+            (SUBREG_TO_REG (i32 0),
+                           (VBLENDPSrri (v4f32 (V_SET0)), VR128:$src, (i8 1)),
+                           sub_xmm)>;
+  def : Pat<(v4f64 (X86vzmovl (insert_subvector undef,
+                   (v2f64 VR128:$src), (iPTR 0)))),
+            (SUBREG_TO_REG (i32 0),
+                           (VBLENDPDrri (v2f64 (V_SET0)), VR128:$src, (i8 1)),
+                           sub_xmm)>;
+
   // These will incur an FP/int domain crossing penalty, but it may be the only
----------------
I don't think these new patterns are needed. We already have SSE4.1/AVX patterns that select a blend from a vzmovl node.
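
To make the equivalence concrete, here is a rough, self-contained model of an immediate-controlled blend (not LLVM code). The helper names `blend` and `vzmovl` are hypothetical and exist only for this illustration. It assumes the usual blend semantics, where bit i of the immediate picks lane i from the second source; under that assumption, blending a zero vector with $src using immediate 1 gives the same result as vzmovl.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of an immediate-controlled blend:
// lane i is taken from b when bit i of imm is set, otherwise from a.
template <typename T, std::size_t N>
std::array<T, N> blend(const std::array<T, N>& a,
                       const std::array<T, N>& b, unsigned imm) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ((imm >> i) & 1u) ? b[i] : a[i];
  return r;
}

// Hypothetical model of vzmovl: keep element 0, zero every other element.
template <typename T, std::size_t N>
std::array<T, N> vzmovl(const std::array<T, N>& v) {
  std::array<T, N> r{};  // value-initialized, so all lanes start at zero
  r[0] = v[0];
  return r;
}
```

With 16-bit lanes, as in a word blend, immediate 3 keeps the low two words (the low 32 bits) and immediate 15 keeps the low four words (the low 64 bits), which matches the immediates used in the patterns quoted above.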

If your goal is just to fix the miscompile, then the minimal fix is to remove the offending patterns between lines 939 and 952.

The poor codegen reported by Jeroen is caused by the lack of smart x86 combine rules for 256-bit shuffles in the function 'PerformShuffleCombine256'. That function implements only a very simple rule for a shuffle between two concat_vectors nodes. Ideally we should extend it and add rules for the case where the second operand is a build_vector of all zeroes.

Currently we check whether a shuffle takes two concat_vectors nodes as input, and we try to fold it to a zero-extending load or to an insert of a 128-bit vector into a zero vector.
I think that we are just missing rules for the case where we are inserting a 64/32-bit quantity into a zero vector.
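
As a rough sketch of the kind of check I have in mind (standalone C++, not the actual 'PerformShuffleCombine256' code; the function name is hypothetical): assume the second shuffle operand is already known to be an all-zero build_vector, and that the mask uses the usual convention where indices 0..N-1 select from the first operand, N..2N-1 from the second, and negative values mean undef. The shuffle then behaves like a vzmovl of the first operand when lane 0 reads element 0 of the first operand and no other lane reads from it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns true when a two-operand shuffle mask, whose
// second operand is assumed to be all zeroes, keeps only the lowest element
// of the first operand and fills every other lane with zero (or undef).
bool isLowElementIntoZeroMask(const std::vector<int>& mask) {
  const int n = static_cast<int>(mask.size());
  if (n == 0 || mask[0] != 0)
    return false;  // lane 0 must read element 0 of the first operand
  for (int i = 1; i < n; ++i) {
    const int m = mask[i];
    if (m < 0)
      continue;  // undef lane: any value is acceptable
    if (m < n || m >= 2 * n)
      return false;  // reads the first operand, or is out of range
  }
  return true;
}
```

For the 64-bit case the same check applies once the mask is expressed in 64-bit elements, e.g. {0, 4, 5, 6} for a v4i64 shuffle.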


Repository:
  rL LLVM

http://reviews.llvm.org/D14151