[PATCH] D27392: Vectorcall Calling Convention - Adding CodeGen Complete Support

Tue Dec 13 09:47:13 PST 2016

rnk added inline comments.

================
Comment at: include/llvm/CodeGen/CallingConvLower.h:326

+  /// A shadow allocated register is a register that was allcoated
+  /// but wasn't added to the location list (Locs).
----------------
typo on "allocated"

================
Comment at: lib/CodeGen/CallingConvLower.cpp:74
+
+  for (unsigned i = 0, e = Locs.size(); i != e; ++i) {
+    for (MCRegAliasIterator AI(Locs[i].getLocReg(), &TRI, true); AI.isValid();
----------------
majnemer wrote:
> Range-based for loop?
This creates a copy of a CCValAssign, which is unnecessary.

================
Comment at: lib/Target/X86/X86CallingConv.cpp:130
+  // "A vector type is either a floating-point type庸or example,
+  //  a float or double熔r an SIMD vector type庸or example, __m128 or __m256."
+  if (!ValVT.isFloatingPoint() &&
----------------
rnk wrote:
> mojibake
I still see "type庸or example" with an Asian character in there.

================
Comment at: lib/Target/X86/X86CallingConv.td:631
-
-  // 256-bit vectors use YMM registers.
-  CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
----------------
oren_ben_simhon wrote:
> rnk wrote:
> > Ouch. I locally confirmed this is correct, but why design a new calling convention that doesn't handle the latest vector types... =P
> If you refer to 512 bit vector types, Microsoft probably forgot to mention that it should be assigned to ZMM [[ https://msdn.microsoft.com/en-us/library/dn375768.aspx | here ]].
Actually, I think I just misunderstood. It looks like you allocate things to AVX registers elsewhere. I thought with this change you made it so that AVX registers would never be used with vectorcall.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:2799
+    SmallVector<ISD::InputArg, 16> InsSec;
+    for (auto InArg : Ins) {
+      ISD::InputArg InArgSec(InArg);
----------------
Unnecessary copy

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:2810
+  // input arguments.
+  std::stable_sort(ArgLocs.begin(), ArgLocs.end(),
+                   [](const CCValAssign &A, const CCValAssign &B) -> bool {
----------------
The stable_sort is only needed if the CC was vectorcall.

I think the code would actually be clearer if we used std::merge. After the first pass, ArgLocs should be all the non-HVA argument locations sorted by argument number. The second pass appends additional sorted HVA argument locations. Then you merge those two sorted lists. You'll need a temporary ArgLocs vector to make this work.

Please factor out this two pass vectorcall code into a template function that operates on a SmallVectorImpl<T>, where T can be InputArg or OutputArg.

================
Comment at: test/CodeGen/X86/vectorcall.ll:71
-; tablegen any other way.
-define x86_vectorcallcc {double, double, double, double, double} @test_fp_4() {
-  ret {double, double, double, double, double}
----------------
oren_ben_simhon wrote:
> aaboud wrote:
> > Do you really want to remove this test? Cannot you just fix it?
> AFAIK, This test checks something that cannot happen.
> There is no case in which a function will return {double, double, double, double, double}.
> If it wanted to return such a structure it will be returned by pointer as the first argument to the function:
> 
> > define x86_vectorcallcc void @test_fp_4(%struct.five_doubles* sret %a)
> 
> 
This was really just a whitebox test to show that LLVM does not crash when you return 5 doubles. We should keep the test if we don't fatal error. Any behavior is fine. For example, it might trigger LLVM's logic to demote return by value to sret.

Repository:
  rL LLVM

https://reviews.llvm.org/D27392