[PATCH] D90219: [SVE] Deal with SVE tuple call arguments correctly when running out of registers
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 4 13:37:13 PST 2020
sdesmalen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:50
+
+ ArgFlags.setInConsecutiveRegs(false);
+ ArgFlags.setInConsecutiveRegsLast(false);
----------------
Can you add some comments to explain what this code is doing?
From what I understand, it saves the state (i.e. which registers were used/allocated), then marks all Z registers as allocated so that when CCAssignFn is called again the aggregate is forced to be passed indirectly. When control flow returns, it restores the state, so that the remaining unused Z registers can still be used.
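To make that concrete, the pattern I'm reading out of the patch is roughly the following (a sketch only, mirroring the snippet above; `ZRegList`, `It`, `AssignFn` are the patch's names as I read them, and I'm assuming a `CCState::DeallocateReg`-style helper to undo the temporary allocations):

  // Remember which Z registers were already in use, then mark them all as
  // allocated so the tuple can no longer be split across registers.
  bool RegsAllocated[8];
  for (int I = 0; I < 8; ++I) {
    RegsAllocated[I] = State.isAllocated(ZRegList[I]);
    State.AllocateReg(ZRegList[I]);
  }

  // With every Z register "used", re-running the assignment must give the
  // aggregate an indirect (memory) location.
  if (AssignFn(It.getValNo(), It.getValVT(), It.getValVT(),
               CCValAssign::Full, ArgFlags, State))
    llvm_unreachable("Call operand has unhandled type");

  // Restore the original state so later arguments can still use the Z
  // registers that were genuinely free.
  for (int I = 0; I < 8; ++I)
    if (!RegsAllocated[I])
      State.DeallocateReg(ZRegList[I]);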
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:54
+ bool RegsAllocated[8];
+ for (int i = 0; i < 8; i++) {
+ RegsAllocated[i] = State.isAllocated(ZRegList[i]);
----------------
Please address the clang-tidy suggestion here and below.
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:62
+ TLI->CCAssignFnForCall(State.getCallingConv(), /*IsVarArg=*/false);
+ bool Res = AssignFn(It.getValNo(), It.getValVT(), It.getValVT(),
+ CCValAssign::Full, ArgFlags, State);
----------------
nit: To avoid `(void) Res;`, you can write:

  if (AssignFn(....))
    llvm_unreachable("Call operand has unhandled type");
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4296
+ if (Ins[j].Flags.isInConsecutiveRegs()) {
+ unsigned k = j;
+ assert(!Ins[k].Flags.isInConsecutiveRegsLast());
----------------
You can drop `k`, and write:

  while (!Ins[j + NumParts - 1].Flags.isInConsecutiveRegsLast())
    ++NumParts;

That makes it clearer that this code is only calculating NumParts.
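In full, the computation would then read something like this (a sketch based on my reading of the patch context above):

  unsigned NumParts = 1;
  if (Ins[j].Flags.isInConsecutiveRegs()) {
    assert(!Ins[j].Flags.isInConsecutiveRegsLast());
    // Count parts until we hit the one flagged as last in the sequence.
    while (!Ins[j + NumParts - 1].Flags.isInConsecutiveRegsLast())
      ++NumParts;
  }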
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4307
+
+ ArgValue = DAG.getLoad(PartLoad, DL, Chain, Ptr, MachinePointerInfo());
+ InVals.push_back(ArgValue);
----------------
Can you rewrite this as a single loop, so that it only adds the increment by VScale when NumParts > 1 (and likewise for the increment of `ExtraArgLocs`)? That makes the loop a bit more readable; the sketch below shows the shape I have in mind.
Also, a one-line comment describing what this code does would be useful.
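A rough sketch of that single-loop shape (illustrative only; `PartLoad`, `PartSize`, `Ptr` and `ExtraArgLocs` are my guesses at the patch's names, not verified):

  // Emit one load per tuple part, advancing the pointer between parts by
  // VScale * the part's minimum store size.
  while (NumParts > 0) {
    ArgValue = DAG.getLoad(PartLoad, DL, Chain, Ptr, MachinePointerInfo());
    InVals.push_back(ArgValue);
    if (--NumParts > 0) {
      // Only executed between parts, i.e. when the tuple has NumParts > 1.
      SDValue BytesIncrement = DAG.getVScale(
          DL, Ptr.getValueType(),
          APInt(Ptr.getValueSizeInBits().getFixedSize(), PartSize));
      Ptr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, BytesIncrement);
      ExtraArgLocs++;
    }
  }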
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4983
+ while (!Outs[k].Flags.isInConsecutiveRegsLast()) {
+ k++;
+ NumParts++;
----------------
Like above, you can rewrite this to not need `k`.
================
Comment at: llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll:136
+entry:
+ %0 = tail call <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv8f64(<vscale x 8 x double> %x1, i32 2)
+ %1 = tail call <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv8f64(<vscale x 8 x double> %x2, i32 1)
----------------
For all the callee tests, I think the tests would be simpler if they just stored to some pointer, e.g.
define double @foo4(double %x0, double * %ptr1, double * %ptr2, double * %ptr3, <vscale x 8 x double> %x1, <vscale x 8 x double> %x2, <vscale x 2 x double> %x3) {
entry:
%ptr1.bc = bitcast double * %ptr1 to <vscale x 8 x double> *
store volatile <vscale x 8 x double> %x1, <vscale x 8 x double>* %ptr1.bc
%ptr2.bc = bitcast double * %ptr2 to <vscale x 8 x double> *
store volatile <vscale x 8 x double> %x2, <vscale x 8 x double>* %ptr2.bc
%ptr3.bc = bitcast double * %ptr3 to <vscale x 2 x double> *
store volatile <vscale x 2 x double> %x3, <vscale x 2 x double>* %ptr3.bc
ret double undef
}
Then you can also easily verify that all four parts of the tuple are loaded from the indirect pointer (rather than just testing one of the four).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D90219/new/
https://reviews.llvm.org/D90219