[PATCH] D74225: [AIX] Implement formal arguments passed in stack memory

Thu Feb 27 17:41:26 PST 2020

ZarkoCA marked 21 inline comments as done.
ZarkoCA added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:7010
+      int CurArgOffset = VA.getLocMemOffset();
+      // Objects in AIX are right justified.
+      if (ArgSize < ObjSize)
----------------
cebowleratibm wrote:
> cebowleratibm wrote:
> > cebowleratibm wrote:
> > > suggest:
> > > 
> > > // AIX objects are right justified because they are word written to stack memory from their register image.
> > > 
> > updated suggestion:
> > 
> > // Objects are right-justified because AIX is big-endian.
> > 
> > I tend to avoid the use of the term word because in Power ISA it means 4 bytes.  To others it means PtrByteSize.
> I don't see the updated comment reflected in the current patch.  I still see:
> 
>     // AIX objects are right justified because they are word written to stack
>     // memory from their register image.
> 
> and I would prefer this to say 
> 
>    // Objects are right-justified because AIX is big-endian.
I missed this, sorry about that. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1320
+; 32BIT-DAG:   - { id: 1, type: default, offset: 88, size: 4
+; 32BIT-DAG:   - { id: 0, type: default, offset: 92, size: 4
+
----------------
cebowleratibm wrote:
> When i looked at Linux PPC32 I saw that they used 8 byte frame objects for the long long parameters.  I think we should be consistent with the precedent.
I spent a while investigating this and it looks like the PPC32 and 32BIT AIX do the same thing here.

I see that both create two 4yte sized fixed frame objects for one i64 int: 
AIX:
```fixedStack:
  - { id: 0, type: default, offset: 60, size: 4, alignment: 4, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
  - { id: 1, type: default, offset: 56, size: 4, alignment: 8, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
```
PPC32
```
fixedStack:
  - { id: 0, type: default, offset: 12, size: 4, alignment: 4, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
  - { id: 1, type: default, offset: 8, size: 4, alignment: 8, stack-id: default,
      isImmutable: true, isAliased: false, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

```

However, if we don't run the IR under opt I see we add it adds allocas and stores for each of the parameter and this is what I think creates local stack objects for all the parameters, including a size 8 for the i64 parameter in both AIX and PPC32.  

On PPC32 and 32BIT AIX all of these local stack objects are at offset 0 and a negative local-offset, their size is : 

```  - { id: 8, name: ll9.addr, type: default, offset: 0, size: 8, alignment: 8,
      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
      local-offset: -40, debug-info-variable: '', debug-info-expression: '',
      debug-info-location: '' }
```

I'm not sure that what the use of these stack objects is since they are eliminated if I run the IR under opt which is what I've done to generate the IR for all test cases. I think they are related to lowering of the stack frame but I'm not entirely sure yet.  

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1352
+; ASM32PWR4-DAG:   add 3, 3, 4
+; ASM32PWR4-DAG:   lwz [[REG1:[0-9]+]], 60(1)
+; ASM32PWR4-DAG:   add 3, 3, 5
----------------
cebowleratibm wrote:
> I got confused by why there's no load at 56 to account for the msw of the 9th parameter (i64) then realized the result is stored in i32, so the optimizer is probably getting clever.  To keep the test straight forward and avoid confusion, I'd suggest accumulating the result in an i64 so that we'll see (and can validate) the load at 56.
My initial thought in doing it this was to show that on AIX the upper word of the i64 holds the most significant values.  But I agree, it's confusing and it's better to accumulate in an i64. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1356
+; ASM32PWR4-DAG:   add 3, 3, 7
+; ASM32PWR4-DAG:   lwz [[REG2:[0-9]+]], 64(1)
+; ASM32PWR4-DAG:   add 3, 3, 8
----------------
cebowleratibm wrote:
> My preference is to see the:
> add 3, 3, 5
> add 3, 3, 6
> add 3,3, 7
> ...
> as a block and coalesce the stack loads next to where they're used.
I think it's best to remove these as per your comment and my reply below. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1382
+; ASM64PWR4-DAG:   lwz [[REG7:[0-9]+]], 164(1)
+; ASM64PWR4-DAG:   lwa [[REG8:[0-9]+]], 172(1)
+
----------------
cebowleratibm wrote:
> The ASM32PWR4-DAG expects the add operations but they're omitted in the ASM64PWR4-DAG expected output.
> 
> I don't strongly prefer either though I strongly prefer consistency.
This is inconsistent, your right.  I'm leaning toward only showing the loads and stores in these test cases to more clear about what we are testing. It's also the fastest change to make. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1409
+
+; 32BIT-DAG:   renamable $r[[REGLL1:[0-9]+]] = LWZtoc @ll1, $r2 :: (load 4 from got)
+; 32BIT-DAG:   renamable $r[[REGLL1ADDR1:[0-9]+]] = LWZ 0, renamable $r[[REGLL1]] :: (dereferenceable load 4 from @ll1, align 8)
----------------
cebowleratibm wrote:
> I usually name the toc load appending ADDR to indicate the register holds the address of the variable, then name the loads after to indicate they hold the value of the variable.  Seems the naming you have here his backwards.
Thanks, fixed now. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1438
+; 32BIT-DAG:   ADJCALLSTACKDOWN 96, 0, implicit-def dead $r1, implicit $r1
+; 32BIT-DAG:   $r3 = LI 1
+; 32BIT-DAG:   $r4 = LI 2
----------------
cebowleratibm wrote:
> Logically I like to see the parameter initializations in order.  I find it easier to understand in the approach.
Yes, that's better, I agree. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1693
+; 32BIT-DAG:   STW killed renamable $r[[SCRATCHREG:[0-9]+]], 80, $r1 :: (store 4, align 8)
+; 32BIT-DAG:   STW renamable $r[[SCRATCHREG:[0-9]+]], 84, $r1 :: (store 4)
+; 32BIT-DAG:   STW killed renamable $r[[SCRATCHREG:[0-9]+]], 88, $r1 :: (store 4, align 8)
----------------
cebowleratibm wrote:
> why are all the other STW flagged "killed" and this one not?
I'm not sure...I found that if I change the value that's held in that register (the double 1.000000e-01) to anything else, the register is then flagged "killed" like the rest.  

I will need to follow up with some more investigation to find what determines when a register will not be killed in a case like this. 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1713
+; 32BIT-DAG:   renamable $r[[SCRATCHREG:[0-9]+]] = LWZ 0, killed renamable $r[[REGF2]] :: (load 4 from @f16)
+; 32BIT-DAG:   renamable $r[[SCRATCHREG:[0-9]+]] = LWZtoc %const.0, $r2 :: (load 4 from got)
+; 32BIT-DAG:   renamable $r[[SCRATCHREG:[0-9]+]] = LWZtoc %const.1, $r2 :: (load 4 from got)
----------------
cebowleratibm wrote:
> I don't think that it's useful to expect the constant pool loads given that the STW matches on SCRATCHREG.  There's no validation of where each constant is written and we're not intending to test the constant loads anyways.
In this case here I'm using SCRATCHREG only to not hard code the registers in which we do the constant pool loads.  Hypothetically, they can be any register and I don't want the test to break if any of those scratch registers change at any point in time. I could take them out altogether maybe? 

================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-abi.ll:1861
+; 32BIT-LABEL: fixedStack:
+; 32BIT-DAG:   - { id: 6, type: default, offset: 56, size: 4
+; 32BIT-DAG:   - { id: 5, type: default, offset: 60, size: 4
----------------
cebowleratibm wrote:
> Should be size 2.  Though the arg passes in 4 bytes, the object is 2 bytes at offset 58.
To me it looks like CC_AIX promotes i16 to i32s, and the ValVT passed to getLoad() is i32, so I think this is why it does a 4 byte load at offset 56.  

```  // Promote integers if needed.
      if (ValVT.getSizeInBits() < RegVT.getSizeInBits())
        LocInfo = ArgFlags.isSExt() ? CCValAssign::LocInfo::SExt
                                    : CCValAssign::LocInfo::ZExt;```

Is this not correct in 32BIT mode? I see PPC32 do the same thing and XLC on AIX also using a 4byte load (lwz) at offset 56 for an i16.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74225/new/

https://reviews.llvm.org/D74225