[LLVMdev] ARMv4T Copy Lowering

Jonathan Roelofs jonathan at codesourcery.com
Wed Aug 20 11:01:56 PDT 2014


Jim/Tim/Renato,

A few days ago (has it been weeks now?) we discussed a codegen problem on
armv4t having to do with lo->lo register copies.  I'd like to start that
discussion again, this time with a patch.

A brief summary of the problem for folks who didn't catch the discussion
earlier, and those like me who forget what they ate for breakfast: ;]
The mov instruction on armv4t (specifically, thumb1) is limited in that it
cannot copy from lo register to lo register, but the register allocator assumes
it can do whatever it wants. This means that copies get emitted which don't
work on that architecture, leading to broken code even for trivial cases like:
"void foo(int a, int b) { bar(b, a); }".

There are several different options which we discussed, and various trade-offs
for each (roughly ordered in amount of work to implement):
    1)  push {$src} ; pop {$dst}
         The idea here is that since push & pop are relatively unrestricted in
         which registers they can operate on, we can use the stack to do the
         lo -> lo copy. The big drawback here is performance (both code size and
         speed).  On the other hand, it looks 'obviously' correct.  The attached
         patch implements this.
    2)  mov hi, $src ; mov $dst, hi
         Here, we copy through some high register, but the options on which one
         are a bit limited:
            r8  - Callee save (in ARM mode, unused in Thumb mode)
            r9  - IIRC, this one is a platform defined special register?
            r10 - Caller save (in ARM mode, unused in Thumb mode)
            r11 - Caller save (in ARM mode, unused in Thumb mode)
            r12 - Linker Scratch
         Callee save regs are basically a non-starter, and r9 is not available
         on all platforms so that leaves r10-12.  I tired using r12, but I think
         this clashes with assumptions made by the register scavenger that it can
         use that register.  I haven't tried using r10 & r11.
           At the time when we discussed this, it was suggested that I just let
         the register scavenger take care of it. I tried that unsuccessfully,
         and I don't remember what the hang-up was.
    3)  movs $dsdt, $src
         This one clobbers the cpsr. This one works for a lot of cases, but
         fails for phi's that end up between, say a cmp and a branch at the
         bottom of a loop.  To use this one, we'd need liveness information at
         the location of the copy in order to ensure that we don't clobber cpsr
         when it's actually going to be used.  I went looking for liveness
         information, and found a function that looked like it calculated it for
         a small window around a MachineBasicBlock::iterator... that seems like
         it would work if I told it that the window was the whole basic block. Am
         I missing a better way to get post-RA liveness?  I haven't tried
         implementing this.
    4)  Do some combination of the others, and tell the register allocator
        that non-cpsr-clobbering lo-lo copies are slower, and that it should
        avoid them.

I've got the patch ready that implements (1). Does it make sense to commit this
(or something close to it) in the near term to fix the bug, and let someone
else worry about performance later? (Possibly me, but it will probably be a
a while before I come back to this. Hopefully this thread serves as a nice
bread-crumb for anyone who is interested)


Cheers,

Jon

-- 
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded
-------------- next part --------------
Index: lib/Target/ARM/Thumb1InstrInfo.cpp
===================================================================
--- lib/Target/ARM/Thumb1InstrInfo.cpp	(revision 436473)
+++ lib/Target/ARM/Thumb1InstrInfo.cpp	(revision 437773)
@@ -1,67 +1,82 @@
 //===-- Thumb1InstrInfo.cpp - Thumb-1 Instruction Information -------------===//
 //
 //                     The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // This file contains the Thumb-1 implementation of the TargetInstrInfo class.
 //
 //===----------------------------------------------------------------------===//
 
+#include "ARMSubtarget.h"
 #include "Thumb1InstrInfo.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineMemOperand.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/MC/MCInst.h"
 
 using namespace llvm;
 
 Thumb1InstrInfo::Thumb1InstrInfo(const ARMSubtarget &STI)
   : ARMBaseInstrInfo(STI), RI(STI) {
 }
 
 /// getNoopForMachoTarget - Return the noop instruction to use for a noop.
 void Thumb1InstrInfo::getNoopForMachoTarget(MCInst &NopInst) const {
   NopInst.setOpcode(ARM::tMOVr);
   NopInst.addOperand(MCOperand::CreateReg(ARM::R8));
   NopInst.addOperand(MCOperand::CreateReg(ARM::R8));
   NopInst.addOperand(MCOperand::CreateImm(ARMCC::AL));
   NopInst.addOperand(MCOperand::CreateReg(0));
 }
 
 unsigned Thumb1InstrInfo::getUnindexedOpcode(unsigned Opc) const {
   return 0;
 }
 
 void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
                                   MachineBasicBlock::iterator I, DebugLoc DL,
                                   unsigned DestReg, unsigned SrcReg,
                                   bool KillSrc) const {
-  AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
-    .addReg(SrcReg, getKillRegState(KillSrc)));
+  // Need to check the arch.
+  MachineFunction &MF = *MBB.getParent();
+  const ARMSubtarget &st = MF.getTarget().getSubtarget<ARMSubtarget>();
+
   assert(ARM::GPRRegClass.contains(DestReg, SrcReg) &&
          "Thumb1 can only copy GPR registers");
+
+  if (st.hasV6Ops() || ARM::hGPRRegClass.contains(SrcReg)
+      || ! ARM::tGPRRegClass.contains(DestReg)) {
+    AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
+      .addReg(SrcReg, getKillRegState(KillSrc)));
+  } else {
+    // This is the only way we can move low -> low for < v6.
+    AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tPUSH)))
+      .addReg(SrcReg, getKillRegState(KillSrc));
+    AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tPOP)))
+      .addReg(DestReg, getDefRegState(true));
+  }
 }
 
 void Thumb1InstrInfo::
 storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                     unsigned SrcReg, bool isKill, int FI,
                     const TargetRegisterClass *RC,
                     const TargetRegisterInfo *TRI) const {
   assert((RC == &ARM::tGPRRegClass ||
           (TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
            isARMLowRegister(SrcReg))) && "Unknown regclass!");
 
   if (RC == &ARM::tGPRRegClass ||
       (TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
        isARMLowRegister(SrcReg))) {
     DebugLoc DL;
     if (I != MBB.end()) DL = I->getDebugLoc();
 
     MachineFunction &MF = *MBB.getParent();
     MachineFrameInfo &MFI = *MF.getFrameInfo();
     MachineMemOperand *MMO =



More information about the llvm-dev mailing list