[PATCH] D61259: AArch64: support arm64_32, an ILP32 slice for watchOS.
Amara Emerson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 22 00:48:25 PDT 2019
aemerson added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64CallingConvention.cpp:115
- unsigned RegResult = State.AllocateRegBlock(RegList, PendingMembers.size());
- if (RegResult) {
+ unsigned EltsPerReg = (IsDarwinILP32 && LocVT.SimpleTy == MVT::i32) ? 2 : 1;
+ unsigned RegResult = State.AllocateRegBlock(
----------------
Could you add a comment here explaining why we need this? I'm guessing it's for passing arrays of i32s and compatibility with armv7k?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3736
- SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPass;
+ std::map<unsigned, SDValue> RegsToPass;
SmallVector<SDValue, 8> MemOpChains;
----------------
t.p.northover wrote:
> efriedma wrote:
> > What's changing here? Does it make sense to add any comments?
> We use the ability to lookup whether a register has already been added (use at line 3801 in this diff). If it has then we're trying to combine two parts of (say) a `[2 x i32]` into a single register for compatibility with armv7k IR and need to do the bitwise arithmetic to make that happen.
>
> I don't know that a comment here would work (it would either be a historic note, or pre-empting what comes later). I'll try to do something to call it out unobtrusively at the use-point.
Looks like we still need some form of documentation at the RegUsed use point?
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3428
+ // Avoid copying a physreg twice since RegAllocFast is incompetent and only
+ // allows one use of a physreg per block.
+ SDValue Val = CopiedRegs.lookup(VA.getLocReg());
----------------
Why does this happen at all?
================
Comment at: llvm/test/CodeGen/AArch64/arm64-collect-loh.ll:163
define i64 @getSExtInternalCPlus4() {
- %addr = getelementptr i32, i32* @InternalC, i32 4
+ %addr = getelementptr inbounds i32, i32* @InternalC, i32 4
%res = load i32, i32* %addr, align 4
----------------
Can you add a comment here explaining that `inbounds` is needed for arm64_32 to produce the same code?
================
Comment at: llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll:22
; NONEFP: ldr h0,{{.*}}
-; NONEFP: fmov s1, wzr
-; NONEFP: fmov d2, xzr
-; NONEFP: movi{{(.16b)?}} v3{{(.2d)?}}, #0
-; NONE16: fmov h0, wzr
-; NONE16: fmov s1, wzr
-; NONE16: fmov d2, xzr
-; NONE16: movi{{(.16b)?}} v3{{(.2d)?}}, #0
+; NONEFP-DAG: fmov s1, wzr
+; NONEFP-DAG: fmov d2, xzr
----------------
It seems odd that this test changed. Do you know why?
================
Comment at: llvm/test/CodeGen/AArch64/arm64_32-neon.ll:6
+; CHECK: mov.d v0[0], v1[0]
+ %res = insertelement <2 x double> %vec, double %val, i32 0
+ ret <2 x double> %res
----------------
Is anything really expected to change with NEON & arm64_32?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61259/new/
https://reviews.llvm.org/D61259
More information about the llvm-commits mailing list