[PATCH] D11438: Part 2 to fix x86_64 fp128 calling convention.

Mon Dec 7 12:14:00 PST 2015

chh added inline comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14628
@@ -14598,2 +14627,3 @@
   if ((isOneConstant(Op1) || isNullConstant(Op1)) &&
+      Op1.getValueType() != MVT::i128 &&  // getZExtValue() works up to i64 only.
       (CC == ISD::SETEQ || CC == ISD::SETNE)) {
----------------
davidxl wrote:
> This needs some explanation. Why can the Op1's value type be i128?
Removed it. It was an condition only triggered by my older hacks.
Not it should not happen.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:27776
@@ -27743,2 +27775,3 @@
         return std::make_pair(0U, &X86::FR64RegClass);
+      // TODO: handle f128 and i128 in FR128RegClass.
       // Vector types.
----------------
davidxl wrote:
> Why TODO here? 'x' constraint should work.
f128 and i128 were not in SSE_REG before.
So I think this is a new feature that can be added later.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:27889
@@ -27855,2 +27888,3 @@
 
+    // TODO: handle f128 and i128 in FR128RegClass.
     if (VT == MVT::f32 || VT == MVT::i32)
----------------
davidxl wrote:
> Explain TODO here.
Same as above. f128 and i128 not in SSE_REG before this change.


================
Comment at: lib/Target/X86/X86InstrInfo.td:957
@@ -956,6 +956,3 @@
 
-def loadi8  : PatFrag<(ops node:$ptr), (i8  (load node:$ptr))>;
-def loadi64 : PatFrag<(ops node:$ptr), (i64 (load node:$ptr))>;
-def loadf32 : PatFrag<(ops node:$ptr), (f32 (load node:$ptr))>;
-def loadf64 : PatFrag<(ops node:$ptr), (f64 (load node:$ptr))>;
-def loadf80 : PatFrag<(ops node:$ptr), (f80 (load node:$ptr))>;
+def loadi8   : PatFrag<(ops node:$ptr), (i8  (load node:$ptr))>;
+def loadi64  : PatFrag<(ops node:$ptr), (i64 (load node:$ptr))>;
----------------
davidxl wrote:
> Unrelated format change here.
They are changed to align up with new line 962.


================
Comment at: lib/Target/X86/X86InstrSSE.td:8862
@@ +8861,3 @@
+          (MOVAPSmr addr:$dst, (COPY_TO_REGCLASS (f128 FR128:$src), VR128))>;
+// When the data is used as floating point, "movaps" should be faster and shorter
+// than "movdqa". "movaps" is in SSE and movdqa is in SSE2.
----------------
davidxl wrote:
> Move the comment above the pattern def. 
> 
> 1) movaps is shorter, not 'should be'
> 2) regarding 'faster' part -- put a reference there. In fact, f128 operations should be considered in integer domain so movdqa should be used to avoid domain bypass penalty.
I updated comment to keep only the shorter and SSE reasons.
I was not sure about 'faster with movaps',
which seems to be used more in clang than gcc.
My main reasons are shorter and available in SSE for Android's applications.


================
Comment at: lib/Target/X86/X86InstrSSE.td:8868
@@ +8867,3 @@
+
+// andps is faster and shorter than andpd, andps is SSE and andpd is SSE2
+def : Pat<(X86fand FR128:$src1, (loadf128 addr:$src2)),
----------------
davidxl wrote:
> pand is for SIMD integer. andps is shorter though.
I also choose andps over pand for shorter and availability in SSE,
not sure about performance difference when combined with other instructions.
I think by not splitting f128 into two registers, we already saved more code and execution time.




http://reviews.llvm.org/D11438