Support for HiPE-compatible code emission

Eli Bendersky eliben at google.com
Mon Feb 11 11:25:45 PST 2013


On Mon, Feb 4, 2013 at 9:08 AM, Yiannis Tsiouris
<gtsiour at softlab.ntua.gr> wrote:
> Hi Nadav (and list),
>
> The purpose of this email is to get a review for the attached patch that
> adds support for generating Erlang Run-Time System (ERTS) compatible
> native code in the assembly prologue.
>
> We've been trying to push this commit for quite a long time now without
> getting an answer/review; I suspect this was due to 'bad timing', as we
> sent it while the whole developer community was preparing for a release.
> Let me add that a patch adding a new calling convention for Erlang has
> already been accepted (cc11), and another patch adding a compatible GC
> plugin will follow (the last one!).
>
> In general, our work on ErLLVM [1] is not a toy project but an important
> project for the Erlang/OTP community; we intend to push our changes to
> the HiPE compiler as soon as we push these modifications to LLVM.
>

Hello Yiannis,

Some comments & questions below:

diff --git a/include/llvm/Target/TargetFrameLowering.h b/include/llvm/Target/TargetFrameLowering.h
index 1958f90..32d72e7 100644
--- a/include/llvm/Target/TargetFrameLowering.h
+++ b/include/llvm/Target/TargetFrameLowering.h
@@ -120,6 +120,11 @@ public:
   /// by adding a check even before the "normal" function prologue.
   virtual void adjustForSegmentedStacks(MachineFunction &MF) const { }

+  /// Adjust the prologue to add HiPE-specific code that explicitly handles
+  /// the stack. This works by adding a check and a call to the "inc_stack_0"
+  /// Erlang BIF before the normal function prologue.
+  virtual void adjustForHiPEPrologue(MachineFunction &MF) const { }
+

I don't think the implementation details belong in the comment for the generic
target-independent method. Leave them for the actual implementation.

Also, I'm concerned about this growing "adjustFor<XXX>" trend. I won't pin
this on your patch, but any ideas about generalizing it are appreciated.
Perhaps some way to register option -> adjust-method mappings.
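
To illustrate (a purely hypothetical sketch; addPrologueAdjustment does not
exist today and all names here are invented):

  static bool needsSegmentedStacks(const MachineFunction &MF) {
    return MF.getTarget().Options.EnableSegmentedStacks;
  }
  static bool needsHiPEPrologue(const MachineFunction &MF) {
    return MF.getFunction()->getCallingConv() == CallingConv::HiPE;
  }

  // Registered once, e.g. in the TargetFrameLowering constructor:
  //   addPrologueAdjustment(needsSegmentedStacks,
  //                         &TargetFrameLowering::adjustForSegmentedStacks);
  //   addPrologueAdjustment(needsHiPEPrologue,
  //                         &TargetFrameLowering::adjustForHiPEPrologue);

PEI::insertPrologEpilogCode would then iterate the registered (predicate,
adjuster) pairs instead of growing a new hard-coded branch per feature.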

   /// spillCalleeSavedRegisters - Issues instruction(s) to spill all callee
   /// saved registers and returns true if it isn't possible / profitable to do
   /// so by issuing a series of store instructions via
diff --git a/lib/CodeGen/PrologEpilogInserter.cpp b/lib/CodeGen/PrologEpilogInserter.cpp
index 954613d..45e04a9 100644
--- a/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/lib/CodeGen/PrologEpilogInserter.cpp
@@ -693,6 +693,14 @@ void PEI::insertPrologEpilogCode(MachineFunction &Fn) {
   // space in small chunks instead of one large contiguous block.
   if (Fn.getTarget().Options.EnableSegmentedStacks)
     TFI.adjustForSegmentedStacks(Fn);
+
+  // Emit additional code that is required to explicitly handle the stack in
+  // HiPE native code (if needed) when loaded in the Erlang/OTP runtime. The
+  // approach is rather similar to that of Segmented Stacks, but it uses a
+  // different conditional check and another BIF for allocating more stack
+  // space.
+  if (Fn.getFunction()->getCallingConv() == CallingConv::HiPE)
+    TFI.adjustForHiPEPrologue(Fn);
 }

 /// replaceFrameIndices - Replace all MO_FrameIndex operands with physical
diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index 420aeb8..f1d244a 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -1387,11 +1387,16 @@ HasNestArgument(const MachineFunction *MF) {
 /// either one or two registers will be needed. Set primary to true for
 /// the first register, false for the second.
 static unsigned
-GetScratchRegister(bool Is64Bit, const MachineFunction &MF, bool Primary) {
+GetScratchRegister(bool Is64Bit, const MachineFunction &MF, bool Primary=true) {

The comment above the function mentions that it's needed for segmented
stacks. You now use it for a different purpose; please update the comment
to reflect that.
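
Something along these lines would do (wording is only a suggestion):

  /// GetScratchRegister - Get a temp register for performing work in the
  /// segmented-stack and HiPE prologues. Depending on platform and the
  /// properties of the function, either one or two registers will be
  /// needed. Set Primary to true for the first register, false for the
  /// second.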

+  CallingConv::ID CallingConvention = MF.getFunction()->getCallingConv();
+
+  // R14/EBX is a temp register, not used in HiPE Calling Convention.
+  if (CallingConvention == CallingConv::HiPE)
+    return (Is64Bit ? X86::R14 : X86::EBX);
+

Can this be added into the existing if (CallingConvention) condition chain?
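
To be concrete, I mean grouping the declaration with all the
convention-specific checks, roughly (untested sketch, behavior unchanged):

  CallingConv::ID CallingConvention = MF.getFunction()->getCallingConv();

  // R14/EBX is a temp register, not used in the HiPE calling convention.
  // Checked before the 64-bit early return, which would otherwise hand
  // out R11/R12.
  if (CallingConvention == CallingConv::HiPE)
    return Is64Bit ? X86::R14 : X86::EBX;

  if (Is64Bit)
    return Primary ? X86::R11 : X86::R12;

  bool IsNested = HasNestArgument(&MF);
  if (CallingConvention == CallingConv::X86_FastCall ||
      ...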

   if (Is64Bit)
     return Primary ? X86::R11 : X86::R12;

-  CallingConv::ID CallingConvention = MF.getFunction()->getCallingConv();
   bool IsNested = HasNestArgument(&MF);

   if (CallingConvention == CallingConv::X86_FastCall ||
@@ -1598,3 +1603,131 @@ X86FrameLowering::adjustForSegmentedStacks(MachineFunction &MF) const {
   MF.verify();
 #endif
 }
+
+// Erlang programs may need a special prologue to handle the stack size they
+// might need at runtime. That is because Erlang/OTP does not implement a C
+// stack but uses a custom implementation of a hybrid stack/heap
+// architecture. (For more information see Erik Stenman's Ph.D. thesis:
+// http://publications.uu.se/uu/fulltext/nbn_se_uu_diva-2688.pdf)
+//
+//
+// CheckStack:
+// temp0 = sp - MaxStack
+// if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
+// OldStart:
+// ...
+// IncStack:
+// call inc_stack   # doubles the stack space
+// temp0 = sp - MaxStack
+// if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
+void X86FrameLowering::adjustForHiPEPrologue(MachineFunction &MF) const {
+  const X86InstrInfo &TII = *TM.getInstrInfo();
+  const X86Subtarget *ST = &MF.getTarget().getSubtarget<X86Subtarget>();
+  MachineFrameInfo *MFI = MF.getFrameInfo();
+  const uint64_t SlotSize = TM.getDataLayout()->getPointerSize(0);
+  const bool Is64Bit = STI.is64Bit();

Tests indicate that you care about the x32 ABI, where sometimes you want to
know the data model and not the platform bitness. This applies to stack slot
size and to registers used for addresses. How does this affect your code?
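
For instance (a sketch of the distinction I mean; the patch already derives
SlotSize from the data layout, which is the right half of this):

  const uint64_t SlotSize = TM.getDataLayout()->getPointerSize(0); // 4 on x32
  const bool Is64Bit = STI.is64Bit();  // ISA: 64-bit registers/opcodes

  // Pick RSP/LEA64r/CMP64rm etc. based on Is64Bit, but size stack slots
  // and pointer-typed offsets from SlotSize, so that x32 (64-bit ISA,
  // 32-bit pointers) stays correct.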

+  DebugLoc DL;
+  // HiPE-specific values
+  const unsigned HipeLeafWords = 24;
+  const unsigned CCRegisteredArgs = Is64Bit ? 6 : 5;
+  const unsigned Guaranteed = HipeLeafWords * SlotSize;
+  const unsigned CallerStkArity =
+    std::max<int>(0, MF.getFunction()->arg_size() - CCRegisteredArgs);
+  unsigned MaxStack =
+    MFI->getStackSize() + CallerStkArity * SlotSize + SlotSize;
+
+  if (!ST->isTargetLinux())
+    report_fatal_error("HiPE prologue not supported on this platform.");

Don't you have an earlier place for this runtime test? I would assume it
should not get this far if HiPE codegen is attempted for non-Linux
platforms. At this point I'd expect an assertion.
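
I.e., keep the user-facing report_fatal_error wherever HiPE codegen is
first requested, and here just write something like:

  assert(ST->isTargetLinux() &&
         "HiPE prologue is only supported on Linux targets");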

+
+  // Make sure 'Guaranteed'-area of callee fits in caller's frame
+  if (MFI->hasCalls()) {
+    unsigned MoreStackForCalls = 0;
+    for(MachineFunction::iterator MBBI = MF.begin(), MBBE = MF.end();

for (MachineFunction

+        MBBI != MBBE; ++MBBI)
+      for (MachineBasicBlock::iterator MI = MBBI->begin(), ME = MBBI->end();
+          MI != ME; ++MI)
+            if(MI->isCall()) {

if (MI->isCall()) {
Similarly, fix this in the other instances in the patch.

+                // Get callee operand

Indentation?

+                const MachineOperand &MO = MI->getOperand(0);
+                const Function *F;
+
+                // Only take account of global function calls (no closures etc)
+                if (!MO.isGlobal()) continue;
+                if (!(F = dyn_cast<Function>(MO.getGlobal()))) continue;
+
+                // Do not update MaxStack for primitive and built-in functions
+                if ((F->getName().find("erlang.") != std::string::npos) ||
+                    (F->getName().find("bif_") != std::string::npos)) continue;
+                if (F->getName().find_first_of("._") == std::string::npos)
+                  continue;
+
+                const uint64_t CalleeStkArity =
+                  std::max<int64_t>(0, F->arg_size()-CCRegisteredArgs);
+                MoreStackForCalls = std::max<int64_t>(
+                  MoreStackForCalls, (HipeLeafWords-1-CalleeStkArity)*SlotSize);
+            }
+    MaxStack += MoreStackForCalls;
+  }
+

Some high-level comment explaining what's going on would be nice here.
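
For example, something along these lines:

  // If the frame this function may need at runtime exceeds the space the
  // HiPE runtime guarantees to a leaf function (HipeLeafWords slots), emit
  // a stack check ahead of the original prologue: compute SP - MaxStack,
  // compare it against the stack limit stored in the process structure
  // (reached via BP), and loop calling the inc_stack_0 BIF until enough
  // stack is available.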

+  if (MaxStack > Guaranteed) {
+    MachineBasicBlock &prologueMBB = MF.front();
+    MachineBasicBlock *stackCheckMBB = MF.CreateMachineBasicBlock();
+    MachineBasicBlock *incStackMBB = MF.CreateMachineBasicBlock();
+
+    for (MachineBasicBlock::livein_iterator i = prologueMBB.livein_begin(),
+           e = prologueMBB.livein_end(); i != e; i++) {

s/i/I/ stick to coding conventions (especially within a single function).

+      stackCheckMBB->addLiveIn(*i);
+      incStackMBB->addLiveIn(*i);
+    }
+
+    MF.push_front(incStackMBB);
+    MF.push_front(stackCheckMBB);
+
+    unsigned ScratchReg, SPReg, PReg, SPLimOffset;
+    unsigned LEAop, CMPop, CALLop;
+    if(Is64Bit) {
+      SPReg = X86::RSP;
+      PReg  = X86::RBP;
+      LEAop = X86::LEA64r;
+      CMPop = X86::CMP64rm;
+      CALLop = X86::CALL64pcrel32;
+      SPLimOffset = 0x48;
+    } else {
+      SPReg = X86::ESP;
+      PReg  = X86::EBP;
+      LEAop = X86::LEA32r;
+      CMPop = X86::CMP32rm;
+      CALLop = X86::CALLpcrel32;
+      SPLimOffset = 0x24;
+    }
+
+    ScratchReg = GetScratchRegister(Is64Bit, MF);
+    assert(!MF.getRegInfo().isLiveIn(ScratchReg) &&
+           "HiPE prologue scratch register is live-in");
+
+    // Create new MBB for StackCheck:
+    addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(LEAop), ScratchReg),
+                 SPReg, false, -MaxStack);
+    // SPLimOffset is in a fixed heap location (pointed by BP).
+    addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(CMPop))
+                 .addReg(ScratchReg), PReg, false, SPLimOffset);
+    BuildMI(stackCheckMBB, DL, TII.get(X86::JAE_4)).addMBB(&prologueMBB);
+
+    // Create new MBB for IncStack:
+    BuildMI(incStackMBB, DL, TII.get(CALLop)).
+      addExternalSymbol("inc_stack_0");
+    addRegOffset(BuildMI(incStackMBB, DL, TII.get(LEAop), ScratchReg),
+                 SPReg, false, -MaxStack);
+    addRegOffset(BuildMI(incStackMBB, DL, TII.get(CMPop))
+                 .addReg(ScratchReg), PReg, false, SPLimOffset);
+    BuildMI(incStackMBB, DL, TII.get(X86::JLE_4)).addMBB(incStackMBB);
+
+    stackCheckMBB->addSuccessor(&prologueMBB, 99);
+    stackCheckMBB->addSuccessor(incStackMBB, 1);
+    incStackMBB->addSuccessor(&prologueMBB, 99);
+    incStackMBB->addSuccessor(incStackMBB, 1);
+  }
+#ifdef XDEBUG
+  MF.verify();
+#endif
+}
diff --git a/lib/Target/X86/X86FrameLowering.h b/lib/Target/X86/X86FrameLowering.h
index dc515dc..c35d952 100644
--- a/lib/Target/X86/X86FrameLowering.h
+++ b/lib/Target/X86/X86FrameLowering.h
@@ -43,6 +43,8 @@ public:

   void adjustForSegmentedStacks(MachineFunction &MF) const;

+  void adjustForHiPEPrologue(MachineFunction &MF) const;
+
   void processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
                                             RegScavenger *RS = NULL) const;

diff --git a/test/CodeGen/X86/hipe-prologue.ll b/test/CodeGen/X86/hipe-prologue.ll
new file mode 100644
index 0000000..d78d087
--- /dev/null
+++ b/test/CodeGen/X86/hipe-prologue.ll
@@ -0,0 +1,63 @@
+; RUN: llc < %s -mcpu=generic -mtriple=i686-linux -verify-machineinstrs | FileCheck %s -check-prefix=X32-Linux
+; RUN: llc < %s -mtriple=x86_64-linux-gnu -verify-machineinstrs | FileCheck %s -check-prefix=X64-Linux
+
+; Just to prevent the alloca from being optimized away
+declare void @dummy_use(i32*, i32)
+
+define {i32, i32} @test_basic(i32 %hp, i32 %p) {
+  ; X32-Linux:       test_basic:
+  ; X32-Linux-NOT:   calll inc_stack_0
+
+  ; X64-Linux:       test_basic:
+  ; X64-Linux-NOT:   callq inc_stack_0
+
+  %mem = alloca i32, i32 10
+  call void @dummy_use (i32* %mem, i32 10)
+  %1 = insertvalue {i32, i32} undef, i32 %hp, 0
+  %2 = insertvalue {i32, i32} %1, i32 %p, 1
+  ret {i32, i32} %1
+}
+
+define cc 11 {i32, i32} @test_basic_hipecc(i32 %hp, i32 %p) {
+  ; X32-Linux:       test_basic_hipecc:
+  ; X32-Linux:       leal -156(%esp), %ebx
+  ; X32-Linux-NEXT:  cmpl 36(%ebp), %ebx
+  ; X32-Linux-NEXT:  jb .LBB1_1
+
+  ; X32-Linux:       ret
+
+  ; X32-Linux:       .LBB1_1:
+  ; X32-Linux-NEXT:  calll inc_stack_0
+
+  ; X64-Linux:       test_basic_hipecc:
+  ; X64-Linux:       leaq -232(%rsp), %r14
+  ; X64-Linux-NEXT:  cmpq 72(%rbp), %r14
+  ; X64-Linux-NEXT:  jb .LBB1_1
+
+  ; X64-Linux:       ret
+
+  ; X64-Linux:       .LBB1_1:
+  ; X64-Linux-NEXT:  callq inc_stack_0
+
+  %mem = alloca i32, i32 10
+  call void @dummy_use (i32* %mem, i32 10)
+  %1 = insertvalue {i32, i32} undef, i32 %hp, 0
+  %2 = insertvalue {i32, i32} %1, i32 %p, 1
+  ret {i32, i32} %2
+}
+
+define cc 11 {i32,i32,i32} @test_nocall_hipecc(i32 %hp,i32 %p,i32 %x,i32 %y) {
+  ; X32-Linux:       test_nocall_hipecc:
+  ; X32-Linux-NOT:   calll inc_stack_0
+
+  ; X64-Linux:       test_nocall_hipecc:
+  ; X64-Linux-NOT:   callq inc_stack_0
+
+  %1 = add i32 %x, %y
+  %2 = mul i32 42, %1
+  %3 = sub i32 24, %2
+  %4 = insertvalue {i32, i32, i32} undef, i32 %hp, 0
+  %5 = insertvalue {i32, i32, i32} %4, i32 %p, 1
+  %6 = insertvalue {i32, i32, i32} %5, i32 %p, 2
+  ret {i32, i32, i32} %6
+}


