[PATCH] D90219: [SVE] Deal with SVE tuple call arguments correctly when running out of registers
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 4 13:37:13 PST 2020
sdesmalen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:50
+
+ ArgFlags.setInConsecutiveRegs(false);
+ ArgFlags.setInConsecutiveRegsLast(false);
----------------
Can you add some comments to explain what this code is doing?
From what I understand, it saves the state (i.e. which registers were used/allocated), then marks all Z registers as allocated so that when CCAssignFn is called again the aggregate is forced to be passed indirectly. When control flow returns, it restores the state, so that the remaining unused Z registers can still be used.
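To make that concrete, the pattern I'm reading out of the patch is roughly the following (a sketch only, mirroring the snippet above; `ZRegList`, `It`, `AssignFn` are the patch's names as I read them, and I'm assuming a `CCState::DeallocateReg`-style helper to undo the temporary allocations):

  // Remember which Z registers were already in use, then mark them all as
  // allocated so the tuple can no longer be split across registers.
  bool RegsAllocated[8];
  for (int I = 0; I < 8; ++I) {
    RegsAllocated[I] = State.isAllocated(ZRegList[I]);
    State.AllocateReg(ZRegList[I]);
  }

  // With every Z register "used", re-running the assignment must give the
  // aggregate an indirect (memory) location.
  if (AssignFn(It.getValNo(), It.getValVT(), It.getValVT(),
               CCValAssign::Full, ArgFlags, State))
    llvm_unreachable("Call operand has unhandled type");

  // Restore the original state so later arguments can still use the Z
  // registers that were genuinely free.
  for (int I = 0; I < 8; ++I)
    if (!RegsAllocated[I])
      State.DeallocateReg(ZRegList[I]);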
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:54
+ bool RegsAllocated[8];
+ for (int i = 0; i < 8; i++) {
+ RegsAllocated[i] = State.isAllocated(ZRegList[i]);
----------------
Please address the clang-tidy suggestion here and below.
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:62
+ TLI->CCAssignFnForCall(State.getCallingConv(), /*IsVarArg=*/false);
+ bool Res = AssignFn(It.getValNo(), It.getValVT(), It.getValVT(),
+ CCValAssign::Full, ArgFlags, State);
----------------
nit: To avoid `(void) Res;`, you can write:

  if (AssignFn(....))
    llvm_unreachable("Call operand has unhandled type");
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4296
+ if (Ins[j].Flags.isInConsecutiveRegs()) {
+ unsigned k = j;
+ assert(!Ins[k].Flags.isInConsecutiveRegsLast());
----------------
You can drop `k`, and write:

  while (!Ins[j + NumParts - 1].Flags.isInConsecutiveRegsLast())
    ++NumParts;

That makes it clearer that this code is only calculating NumParts.
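In full, the computation would then read something like this (a sketch based on my reading of the patch context above):

  unsigned NumParts = 1;
  if (Ins[j].Flags.isInConsecutiveRegs()) {
    assert(!Ins[j].Flags.isInConsecutiveRegsLast());
    // Count parts until we hit the one flagged as last in the sequence.
    while (!Ins[j + NumParts - 1].Flags.isInConsecutiveRegsLast())
      ++NumParts;
  }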
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4307
+
+ ArgValue = DAG.getLoad(PartLoad, DL, Chain, Ptr, MachinePointerInfo());
+ InVals.push_back(ArgValue);
----------------
Can you rewrite this as a single loop, so that it only adds the increment by VScale when NumParts > 1 (and likewise for the increment of `ExtraArgLocs`)? That makes the loop a bit more readable; the sketch below shows the shape I have in mind.
Also, a one-line comment describing what this code does would be useful.
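A rough sketch of that single-loop shape (illustrative only; `PartLoad`, `PartSize`, `Ptr` and `ExtraArgLocs` are my guesses at the patch's names, not verified):

  // Emit one load per tuple part, advancing the pointer between parts by
  // VScale * the part's minimum store size.
  while (NumParts > 0) {
    ArgValue = DAG.getLoad(PartLoad, DL, Chain, Ptr, MachinePointerInfo());
    InVals.push_back(ArgValue);
    if (--NumParts > 0) {
      // Only executed between parts, i.e. when the tuple has NumParts > 1.
      SDValue BytesIncrement = DAG.getVScale(
          DL, Ptr.getValueType(),
          APInt(Ptr.getValueSizeInBits().getFixedSize(), PartSize));
      Ptr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, BytesIncrement);
      ExtraArgLocs++;
    }
  }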
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:4983
+ while (!Outs[k].Flags.isInConsecutiveRegsLast()) {
+ k++;
+ NumParts++;
----------------
Like above, you can rewrite this to not need `k`.
================
Comment at: llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll:136
+entry:
+ %0 = tail call <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv8f64(<vscale x 8 x double> %x1, i32 2)
+ %1 = tail call <vscale x 2 x double> @llvm.aarch64.sve.tuple.get.nxv2f64.nxv8f64(<vscale x 8 x double> %x2, i32 1)
----------------
For all the callee tests, I think the tests would be simpler if they just stored to some pointer, e.g.
define double @foo4(double %x0, double * %ptr1, double * %ptr2, double * %ptr3, <vscale x 8 x double> %x1, <vscale x 8 x double> %x2, <vscale x 2 x double> %x3) {
entry:
%ptr1.bc = bitcast double * %ptr1 to <vscale x 8 x double> *
store volatile <vscale x 8 x double> %x1, <vscale x 8 x double>* %ptr1.bc
%ptr2.bc = bitcast double * %ptr2 to <vscale x 8 x double> *
store volatile <vscale x 8 x double> %x2, <vscale x 8 x double>* %ptr2.bc
%ptr3.bc = bitcast double * %ptr3 to <vscale x 2 x double> *
store volatile <vscale x 2 x double> %x3, <vscale x 2 x double>* %ptr3.bc
ret double undef
}
Then you can also easily verify that all four parts of the tuple are loaded from the indirect pointer (rather than just testing one of the four).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D90219/new/
https://reviews.llvm.org/D90219