[PATCH] D16067: Redundant vmov instruction generated with vcvtph2ps

Mon Jan 11 07:50:01 PST 2016

rob.lougher created this revision.
rob.lougher added reviewers: RKSimon, qcolombet.
rob.lougher added a subscriber: llvm-commits.

Since r248784, the following code generates a redundant vmovq:

#include <immintrin.h>

__m128 test1(__m128i const *src) {
    return _mm_cvtph_ps(_mm_loadl_epi64(src));
}

Output (the load is not folded into vcvtph2ps, leaving a separate vmovq):

vmovq   (%rdi), %xmm0           # xmm0 = mem[0],zero
vcvtph2ps       %xmm0, %xmm0
retq

The regression was caused by a change made in r247504, which teaches the instruction combiner that only the lower 64 bits of the 128-bit source operand of vcvtph2ps are used.  The IR for the above code is:

define <4 x float> @_Z4testPKDv2_x(<2 x i64>* nocapture readonly %src) #0 {
entry:
  %__u.i = getelementptr inbounds <2 x i64>, <2 x i64>* %src, i64 0, i64 0
  %0 = load i64, i64* %__u.i, align 1, !tbaa !1
  %vecinit.i = insertelement <2 x i64> undef, i64 %0, i32 0
  %vecinit1.i = insertelement <2 x i64> %vecinit.i, i64 0, i32 1
  %1 = bitcast <2 x i64> %vecinit1.i to <8 x i16>
  %2 = call <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16> %1) #2
  ret <4 x float> %2
}

After r247504, the instruction combiner can see that the second insertelement is redundant and deletes it.  While this is correct, it interferes with the isel patterns that select the register-memory form of VCVTPH2PS.  Previously, the load followed by the two insertelements was recognized by the pattern fragment 'vzmovl_v2i64', which then selected VCVTPH2PSrm via the following pattern:

  // Pattern match vcvtph2ps of a scalar i64 load.
  def : Pat<(int_x86_vcvtph2ps_128 (vzmovl_v2i64 addr:$src)),
            (VCVTPH2PSrm addr:$src)>;

However, after r247504 only a single insertelement remains, so vzmovl_v2i64 no longer matches.  The problem only appears from r248784 onwards because, before that revision, the combiner was unable to look through the bitcast.  This patch adds an isel pattern that matches the bitcast of a scalar_to_vector of an i64 load directly and selects VCVTPH2PSrm.
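
As a sanity check on the claim that only the lower 64 bits of the source are used, here is a minimal C sketch of a scalar software model of the 128-bit vcvtph2ps.  The names half_to_float, model_vcvtph2ps_128 and demo_upper_lanes_ignored are hypothetical and exist only for this illustration; this is not LLVM code, and it models only lane selection and value conversion (no MXCSR handling or NaN quieting).  It shows that the result is the same whether the upper four i16 lanes are zero (what the second insertelement produced) or arbitrary (what undef permits):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Software conversion of one IEEE 754 binary16 value to binary32.
 * Simplified model: NaN payloads are passed through unchanged. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                          /* signed zero */
        } else {
            /* Subnormal half: normalize the mantissa. */
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) {
                man <<= 1;
                shift++;
            }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical model of the 128-bit form: src is the <8 x i16> source,
 * but only lanes 0..3 (the lower 64 bits) are ever read. */
void model_vcvtph2ps_128(const uint16_t src[8], float dst[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = half_to_float(src[i]);
}

/* Same low lanes (1.0, 2.0, 0.5, -2.0 as binary16), different upper lanes. */
void demo_upper_lanes_ignored(void) {
    const uint16_t zeroed[8]  = {0x3C00, 0x4000, 0x3800, 0xC000,
                                 0, 0, 0, 0};
    const uint16_t garbage[8] = {0x3C00, 0x4000, 0x3800, 0xC000,
                                 0xFFFF, 0x1234, 0x7C00, 0xABCD};
    float a[4], b[4];

    model_vcvtph2ps_128(zeroed, a);
    model_vcvtph2ps_128(garbage, b);

    /* The upper 64 bits of the source do not affect the result. */
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Calling demo_upper_lanes_ignored() from a main() should return without tripping the assertion.  This is why dropping the zeroing insertelement is a legal transformation, and why the fix belongs in instruction selection rather than in the combiner.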


http://reviews.llvm.org/D16067

Files:
  lib/Target/X86/X86InstrSSE.td
  test/CodeGen/X86/f16c-intrinsics.ll

Index: test/CodeGen/X86/f16c-intrinsics.ll
===================================================================
--- test/CodeGen/X86/f16c-intrinsics.ll
+++ test/CodeGen/X86/f16c-intrinsics.ll
@@ -61,6 +61,18 @@
   ret <4 x float> %res
 }
 
+define <4 x float> @test_x86_vcvtps2ph_128_scalar2(i64* %ptr) {
+; CHECK-LABEL: test_x86_vcvtps2ph_128_scalar2:
+; CHECK-NOT: vmov
+; CHECK: vcvtph2ps (%
+
+  %load = load i64, i64* %ptr
+  %ins = insertelement <2 x i64> undef, i64 %load, i32 0
+  %bc = bitcast <2 x i64> %ins to <8 x i16>
+  %res = tail call <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16> %bc)
+  ret <4 x float> %res
+}
+
 define void @test_x86_vcvtps2ph_256_m(<8 x i16>* nocapture %d, <8 x float> %a) nounwind {
 entry:
   ; CHECK-LABEL: test_x86_vcvtps2ph_256_m:
Index: lib/Target/X86/X86InstrSSE.td
===================================================================
--- lib/Target/X86/X86InstrSSE.td
+++ lib/Target/X86/X86InstrSSE.td
@@ -8257,6 +8257,9 @@
             (VCVTPH2PSrm addr:$src)>;
   def : Pat<(int_x86_vcvtph2ps_128 (vzload_v2i64 addr:$src)),
             (VCVTPH2PSrm addr:$src)>;
+  def : Pat<(int_x86_vcvtph2ps_128 (bitconvert
+              (v2i64 (scalar_to_vector (loadi64 addr:$src))))),
+            (VCVTPH2PSrm addr:$src)>;
 
   def : Pat<(store (f64 (extractelt (bc_v2f64 (v8i16
                   (int_x86_vcvtps2ph_128 VR128:$src1, i32:$src2))), (iPTR 0))),

