[PATCH] D28744: [X86][AVX] Remove "OptForSize" condition from some memory foldings.

Fri Jan 20 20:54:34 PST 2017

craig.topper added inline comments.

================
Comment at: lib/Target/X86/X86InstrInfo.cpp:1928
+    { X86::VRCP14SSrr,        X86::VRCP14SSrm,          TB_NO_REVERSE },
+    { X86::VRSQRT14SSrr,      X86::VRSQRT14SSrm,        TB_NO_REVERSE },
     { X86::VSHUFPDZrri,       X86::VSHUFPDZrmi,         0 },
----------------
aymanmus wrote:
> RKSimon wrote:
> > aymanmus wrote:
> > > RKSimon wrote:
> > > > Isn't it only the _Int variants that need TB_NO_REVERSE ?
> > > TB_NO_REVERSE is used when the load size of the memory form is smaller than the register's size of the register form. In that case the unfolding is not legal since it will produce a load with the registers size.
> > I maybe reading the avx512_fp14_s definitions incorrectly but it appears to be using the scalar f32x_info type not the packed equivalents. @craig.topper / @igorb please can you confirm?
> That's right, but if you track the definitions more deeply you can see that the "rr" version is defined to have an xmm register operand while the "rm" version is defined to have a 32-bit memory operand.
> Unfolding from the memory form to the register form will generate a 128-bit load (because of the xmm register), which we want to avoid.
Can leave a TODO here. We really should fix that so the we have a separate VRCP14SSrr_Int that uses the XMM type like we do for similar instructions.

================
Comment at: lib/Target/X86/X86InstrSSE.td:1789
                       [], IIC_SSE_CVT_Scalar_RM>,
-                      XD, Requires<[HasAVX, OptForSize]>, VEX_4V, VEX_LIG,
+                      XD, Requires<[HasAVX]>, VEX_4V, VEX_LIG,
                       Sched<[WriteCvtF2FLd, ReadAfterLd]>;
----------------
This doesn't need a Requires. There's no pattern.

================
Comment at: lib/Target/X86/X86InstrSSE.td:1855
                     [], IIC_SSE_CVT_Scalar_RM>,
-                    XS, VEX_4V, VEX_LIG, Requires<[HasAVX, OptForSize]>,
+                    XS, VEX_4V, VEX_LIG, Requires<[HasAVX]>,
                     Sched<[WriteCvtF2FLd, ReadAfterLd]>;
----------------
Not sure this even needs a Requires. There's no pattern defined.

================
Comment at: lib/Target/X86/X86InstrSSE.td:1866
     (VCVTSS2SDrm (f32 (IMPLICIT_DEF)), addr:$src)>,
-    Requires<[UseAVX, OptForSize]>;
-def : Pat<(extloadf32 addr:$src),
-    (VCVTSS2SDrr (f32 (IMPLICIT_DEF)), (VMOVSSrm addr:$src))>,
-    Requires<[UseAVX, OptForSpeed]>;
+    Requires<[UseAVX]>;

----------------
Will this fit on the previous line now?

================
Comment at: test/CodeGen/X86/avx-arith.ll:353
 ; CHECK:       ## BB#0:
-; CHECK-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
-; CHECK-NEXT:    vsqrtss %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vsqrtss (%rax), %xmm0, %xmm0
 ; CHECK-NEXT:    retq
----------------
This now has a register read dependency on xmm0 which I believe is what the OptForSize was originally protecting against.  I know work has gone into ExeDepFix for UndefRegClearance. Do we believe that is sufficient to allow this folding now?

https://reviews.llvm.org/D28744