[PATCH] D70401: [WIP][RISCV] Implement ilp32e ABI

Thu Jan 14 16:22:06 PST 2021

lenary added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:1525
   unsigned TwoXLenInBytes = (2 * XLen) / 8;
   if (!IsFixed && ArgFlags.getOrigAlign() == TwoXLenInBytes &&
       DL.getTypeAllocSize(OrigTy) == TwoXLenInBytes) {
----------------
lenary wrote:
> shiva0217 wrote:
> > lenary wrote:
> > > shiva0217 wrote:
> > > > Variadic arguments for ilp32e don't need to be aligned to an even register. We could also add a test line in vararg.ll.
> > > I'm not sure I agree with this interpretation of the psABI. The [[ https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#ilp32e-calling-convention | ILP32E Section ]] makes no exception for variadic arguments, and the base calling convention is only defined in relation to `XLEN`, not in terms of stack alignment.
> > > 
> > > I will add a test to `vararg.ll` so the behaviour is at least tested. 
> > This seems to be the current GCC behavior; the following case shows that the double is not aligned to an even register pair.
> >   #include <stdarg.h>
> >   #include <stdlib.h>  /* for abort() */
> >   void va_double (int n, ...) {
> >     va_list args;
> >     va_start (args, n);
> >     if (va_arg (args, double) != 2.0)
> >       abort ();
> >     va_end (args);
> >   }
> >   int main (int a) {
> >     va_double (1, 2.0);
> >     return a;
> >   }
> > 
> > On second thought, it seems that non-fixed double arguments may generate incorrect code even when aligned to an even register pair.
> > For the ilp32 or lp64 ABIs with feature D, the stack alignment is 16, so an even pair ensures that a non-fixed double pushed to or popped from the stack is 8-byte aligned. For ilp32e, with 4-byte stack alignment, an even pair cannot guarantee that the double will be pushed to the stack with 8-byte alignment.
> Ah, I see the issue.
> 
> It's not clear that choosing to spill to a register pair where the first register is a multiple of 4 would solve the problem either, right? The problem is that we actually need to realign the spill slots for these register pairs.
> 
> I'm not sure how we achieve this. I will investigate further.
I missed that I need to cover this case. 

I'm going to upload a testcase based on your example, but I'm not quite convinced it's correct. It does seem to align the stack correctly for the fp64, but that's maybe not the right thing to be doing here?

I haven't managed to execute the assembly in the testcase, but I thought adding the testcase was important.
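
To make the alignment concern above concrete, here is a minimal sketch in C. It is a simplified model, not LLVM code: it assumes the vararg save area base is only guaranteed to be aligned to the ABI stack alignment, and that the slot for argument register `i` sits at `base + 4 * i`. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: address of the save-area slot for argument
 * register `reg_index`, assuming 4-byte (XLEN = 32) slots. */
static uint32_t vararg_slot_address(uint32_t save_area_base, unsigned reg_index) {
    return save_area_base + 4u * reg_index;
}

/* True if `addr` is a multiple of `align` (a power of two). */
static bool is_aligned(uint32_t addr, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Checks every save-area base (below 64) that `stack_align` permits and
 * reports whether a double starting at `reg_index` always lands on an
 * 8-byte boundary. Assumption: the base is aligned to `stack_align`
 * and nothing stronger. */
static bool even_pair_always_8_aligned(uint32_t stack_align, unsigned reg_index) {
    assert(stack_align != 0 && (stack_align & (stack_align - 1)) == 0);
    for (uint32_t base = 0; base < 64; base += stack_align) {
        if (!is_aligned(vararg_slot_address(base, reg_index), 8))
            return false;
    }
    return true;
}
```

Under this model, an even register index is sufficient when the stack alignment is 16 (or 8), but not when it is 4, because the base itself may be only 4-byte aligned. That matches the concern that the spill slots themselves would need realigning.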

================
Comment at: llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll:11
+; This test currently fails because the machine code does not validate:
+; RUN: not llc -mtriple=riscv32 -mattr=+d -target-abi ilp32e -verify-machineinstrs < %s
+; It will need FileCheck %s -check-prefix=ILP32-LP64-NO-D
----------------
shiva0217 wrote:
> lenary wrote:
> > lenary wrote:
> > > shiva0217 wrote:
> > > > shiva0217 wrote:
> > > > > Jim wrote:
> > > > > > shiva0217 wrote:
> > > > > > > lenary wrote:
> > > > > > > > @shiva0217 I think this test is failing because of the base pointer patch, but I'm not sure. Can you look at the issue? It thinks that x8 gets killed by a store (which I don't think should be using x8), and therefore x8 is not live when we come to the epilog. It's a super confusing issue.
> > > > > > > Hi @lenary, it seems that hasBP() returns false in this case; the issue is triggered by register allocation allocating x8, which should be preserved. I'm not sure why this happens. I tried to write simple C code to reproduce the case but failed. Could you provide the C code for the test case?
> > > > > > It seems that RISCVRegisterInfo::getReservedRegs doesn't add x8 (fp) to the reserved registers (TFI->hasFP(MF) returns false), so x8 is a candidate for register allocation. After register allocation, some of the fpr64 registers are spilled to the stack, which means the stack needs to be realigned (MaxAlignment(8) > StackAlignment(4)), and therefore x8 should be used as the frame pointer (TFI->hasFP(MF) returns true). When emitting the epilogue, instructions for fp adjustment are inserted.
> > > > > Following the investigation from @Jim, here is simple C code that reproduces the case.
> > > > >   extern double var;
> > > > >   extern void callee();
> > > > >   void test(){
> > > > >     double val = var;
> > > > >     callee();
> > > > >     var = val;
> > > > >   }
> > > > > Thanks, @Jim 
> > > > There might be a few ways to fix the issue:
> > > > 1. hasFP() returns true for the ilp32e ABI with feature D.
> > > > 2. hasFP() returns true for the ilp32e ABI with feature D when there is a virtual register of f64 type.
> > > > 3. Disallow the ilp32e ABI with feature D.
> > > > Given that most targets supporting double-precision float instructions have a stack alignment of at least eight bytes to avoid frequent realignment, would it be more reasonable to have a new embedded ABI with a stack alignment of at least eight bytes to support feature D?
> > > @Jim, @shiva0217, thank you very much for tracking down this bug, and providing a small testcase, that's very helpful.
> > > 
> > > We talked about this on the call this week, and I indicated I was going to go with a solution as close to 2 as I could.
> > > 
> > > I have since started an investigation (which I hoped would be quicker than it is) of what happens if we implement `canRealignStackFrame` to check if FP is unused, and this also seems to solve the problem. I'm doing some deeper checks (which require implementing parts of the backend around MIR that I haven't looked at before), but I think this might be a better solution? I'll keep this patch updated on when I upload the fix for stack realignment to cover this case. In the case that this fix isn't enough, I'll look to implement solution 2.
> > > 
> > > In any case, it's evident that allocating a spill slot for a register that has higher spill alignment than the stack slot, is the kernel of the problem, and this may arise again depending on how we choose to implement other extensions.
> > > 
> > > 
> > I couldn't find a reasonable way to check for a virtual (or physical) register of type fp64, without iterating over all the instructions in a function, which I'd prefer not to do.
> > 
> > So instead I have implemented option 1 in `hasFP`.
> I think option 1 could be a reasonable way to fix the issue.
I went back and thought about this: we just need to make sure `fp` is reserved for later, rather than overriding `hasFP`, so we don't need to reserve FP unnecessarily.

Iterating over used registers to find FP64 registers didn't fill me with joy, and if you override `canRealignStackFrame`, it seems you just get very incorrect stack management (where the code just… doesn't bother to realign the stack before saving/restoring).
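
The ordering problem discussed in this thread can be sketched as a small C model. This is an illustration only, not the patch's implementation: the struct, the function names, and the "conservative" reservation condition (ilp32e together with feature D) are assumptions made for this example. It models the decision about reserving x8 being taken before register allocation, while the need for a frame pointer only appears afterwards, once an 8-byte-aligned fpr64 spill slot exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical frame model; field names are illustrative only. */
struct frame_model {
    unsigned stack_align; /* ABI stack alignment in bytes */
    unsigned max_align;   /* largest alignment of any stack object so far */
    bool is_ilp32e;       /* target ABI is ilp32e */
    bool has_d;           /* feature D is enabled */
};

/* The stack must be realigned when an object needs more than the ABI gives. */
static bool needs_realign(const struct frame_model *f) {
    return f->max_align > f->stack_align;
}

/* In this model, realignment is the only reason a frame pointer is needed. */
static bool has_fp(const struct frame_model *f) {
    return needs_realign(f);
}

/* Record a new spill slot with the given power-of-two alignment. */
static void add_spill_slot(struct frame_model *f, unsigned align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align > f->max_align)
        f->max_align = align;
}

/* Returns true when x8 was available to the register allocator but is
 * needed as the frame pointer after an fpr64 spill. With `conservative`
 * set, x8 is also reserved up front for ilp32e with feature D
 * (an assumed condition, chosen for this sketch). */
static bool x8_conflict(struct frame_model f, bool conservative) {
    bool reserved_before_ra =
        has_fp(&f) || (conservative && f.is_ilp32e && f.has_d);
    add_spill_slot(&f, 8); /* fpr64 spilled during register allocation */
    return !reserved_before_ra && has_fp(&f);
}
```

In this model, a frame with 4-byte stack alignment conflicts unless x8 is reserved ahead of time, whereas a frame with 16-byte stack alignment never needs realignment for an 8-byte slot, so no conflict arises either way.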

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70401/new/

https://reviews.llvm.org/D70401