[PATCH] D76401: [PowerPC][AIX] ByVal formal argument support: single register.

Chris Bowler via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 23 20:09:26 PDT 2020


cebowleratibm added a comment.

Summary:
-Test tidy ups
-FYI that I noted a 32-bit 8 byte aligned by-val (struct S { double d; }) that will need special handling in future
-Requested some comments in the code / tests to explain the suboptimal codegen, which otherwise looks odd.

The patch looks very close to me.



================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:7094
+      SDValue Store =
+          DAG.getStore(CopyFrom.getValue(1), dl, CopyFrom, FrameIndex,
+                       MachinePointerInfo::getFixedStack(MF, FI, 0));
----------------
sfertile wrote:
> cebowleratibm wrote:
> > Does this store get elided by optimization when possible?
> > 
> > For example:
> > struct S {int i;};
> > int foo(S s) { return s.i; }
> > 
> > I assume we don't want to manifest the store to the stack.  It's odd to me because we don't manifest the store for the equivalent:
> > 
> > int foo(int i) { return i; }
> No. Several of the lit test cases show where we DAG combine to extract the value from the register (see the 4-byte 32-bit test or the 8-byte 64-bit test cases) but we don't remove the dead store. We will need work that can generalize the optimization the DAG combines is doing for extracting the values from the register, and a pass to clean up the stores.
> 
> >  It's odd to me because we don't manifest the store for the equivalent:
> > int foo(int i) { return i; }
> 
> These are seemingly equivalent in C/C++ source, but consider the IR we have to generate for the 2 cases and I think the difference become apparent: 
> 
> ```
> ; Function Attrs: norecurse nounwind readonly
> define i32 @foo(%struct.S* nocapture readonly byval(%struct.S) align 4 %s) local_unnamed_addr #0 {
> entry:
>   %i = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
>   %0 = load i32, i32* %i, align 4, !tbaa !3
>   ret i32 %0
> }
> 
> ; Function Attrs: norecurse nounwind readnone
> define i32 @Bar(i32 returned %i) local_unnamed_addr #1 {
> entry:
>   ret i32 %i
> }
> ```
> 
> The GEP and the load force us to store to the stack for the ByVal arg making the 2 functions that happen to be 'equivalent' in source to be non-equivalent in their IR representations. 
Thanks for the explanation.  I think a brief comment in the source to describe why we always need the frame object would be useful.  I also made comments in the test changes where I think test comments should explain the suboptimal expected codegen.  I am in favour of the patch as you've presented it in order to get AIX functional and follow up with optimization improvement later.

There is a case in 32-bit AIX:

struct S { double d; }

has 8 byte alignment (only for first member is a double) and the normal frame object location with PtrByteSize (4) alignment will not suffice.  Currently that case emits an error so it doesn't need to be handled in this patch, but I mention it to you because I expect your code will need to be modified when we remove the error.  You may need special handling in this case to remap the frame object location.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:6867
+    if (ByValSize == 0) {
+      State.addLoc(CCValAssign::getMem(ValNo, ValVT, State.getNextStackOffset(),
+                                       RegVT, LocInfo));
----------------
Minor concern here that ValVT is a lie.  I assume the consumer will ignore it when it sees the ByValSize is 0, but I'd rather it hold an invalid value than something that could be conceived as meaningful.


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:86
+; 32BIT:         STW killed renamable $r3, 0, %fixed-stack.0 :: (store 4 into %fixed-stack.0, align 8)
+; 32BIT-NEXT:    renamable $r3 = LBZ 0,  %fixed-stack.0 :: (dereferenceable load 1
+; 32BIT-NEXT:    BLR
----------------
missing ')'. Still passes but worth tidying.


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:100
+; 64BIT:        STD killed renamable $x3, 0, %fixed-stack.0 :: (store 8 into %fixed-stack.0, align 16)
+; 64BIT-NEXT:   renamable $x3 = LBZ8 0, %fixed-stack.0 :: (dereferenceable load 1
+
----------------
Also missing ')'


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:105
+; ASM32:        stw 3, 24(1)
+; ASM32-NEXT:   lbz 3, 24(1)
+; ASM32-NEXT:   blr
----------------
Curious to me that the optimizer didn't tidy up the stw/lbz.  It's ok for now but we need to ensure the param stack writes are being elided for performance.


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:172
+; 32BIT:        STW killed renamable $r3, 0, %fixed-stack.0 :: (store 4 into %fixed-stack.0, align 8)
+; 32BIT-NEXT:   renamable $r3 = LBZ 1, %fixed-stack.0 :: (dereferenceable load 1
+
----------------
')'


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:175
+; 64BIT:      fixedStack:
+; 64BIT-NEXT:   - { id: 0, type: default, offset: 48, size: 8, alignment: 16,
+
----------------
missing trailing text?


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:180
+; 64BIT:        STD killed renamable $x3, 0, %fixed-stack.0 :: (store 8 into %fixed-stack.0, align 16)
+; 64BIT-NEXT:   renamable $x3 = LBZ8 1, %fixed-stack.0 :: (dereferenceable load 1
+
----------------
')'


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:257
+; 32BIT:      fixedStack:
+; 32BIT-NEXT:   - { id: 0, type: default, offset: 24, size: 4, alignment: 8,
+
----------------
trailing text?


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:262
+; 32BIT:        STW killed renamable $r3, 0, %fixed-stack.0 :: (store 4 into %fixed-stack.0, align 8)
+; 32BIT-NEXT:   renamable $r3 = LHZ 1, %fixed-stack.0 :: (dereferenceable load 2
+
----------------
)


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:265
+; 64BIT:      fixedStack:
+; 64BIT-NEXT:   - { id: 0, type: default, offset: 48, size: 8, alignment: 16,
+
----------------
trailing text


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:362
+; 32BIT-NEXT:   liveins: $r3
+; 32BIT:        STW renamable $r3, 0, %fixed-stack.2 :: (store 4 into %fixed-stack.2, align 8)
+; 32BIT-DAG:    STW killed renamable $r4, 0, %fixed-stack.0 :: (store 4 into %fixed-stack.0)
----------------
dead stores at opt?  If this is temporary it probably warrants a comment in the expected output.


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:383
+; 64BIT-NEXT:   STD killed renamable $x4, 0, %fixed-stack.0 :: (store 8 into %fixed-stack.0)
+; 64BIT-DAG:    renamable $r[[SCRATCH1:[0-9]+]] = LBZ 3, %fixed-stack.2 :: (dereferenceable load 1
+; 64BIT-DAG:    renamable $r[[SCRATCH2:[0-9]+]] = LWZ 0, %fixed-stack.0 :: (dereferenceable load 4
----------------
) 


================
Comment at: llvm/test/CodeGen/PowerPC/aix-cc-byval.ll:384
+; 64BIT-DAG:    renamable $r[[SCRATCH1:[0-9]+]] = LBZ 3, %fixed-stack.2 :: (dereferenceable load 1
+; 64BIT-DAG:    renamable $r[[SCRATCH2:[0-9]+]] = LWZ 0, %fixed-stack.0 :: (dereferenceable load 4
+; 64BIT-NEXT:   renamable $r[[SCRATCH3:[0-9]+]] = nsw ADD4 killed renamable $r[[SCRATCH2]], killed renamable $r[[SCRATCH1]]
----------------
)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76401/new/

https://reviews.llvm.org/D76401





More information about the llvm-commits mailing list