[llvm] r174696 - When Mips16 frames grow large, the immediate field may exceed the maximum

Fri Feb 8 04:43:33 PST 2013

On 02/07/2013 10:46 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Reed Kotler" <rkotler at mips.com>
>> To: llvm-commits at cs.uiuc.edu
>> Sent: Thursday, February 7, 2013 9:57:41 PM
>> Subject: [llvm] r174696 - When Mips16 frames grow large, the immediate field may exceed the maximum
>>
>> Author: rkotler
>> Date: Thu Feb  7 21:57:41 2013
>> New Revision: 174696
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=174696&view=rev
>> Log:
>> When Mips16 frames grow large, the immediate field may exceed the
>> maximum
>> allowed size for the instruction. This code uses RegScavenger to fix
>> this.
>> We sometimes need 2 registers for Mips16 so we must handle things
>> differently than how register scavenger is normally used.
> Normally the reason that you can't get more than one register from the scavenger is that there is only one emergency spill slot. In your case, you're getting around this problem by not using the spill slot at all and simply "spilling" to other reserved registers?
>
>   -Hal
Exactly. Mips16 is a processor mode that uses a different instruction
decoder in order to save space.
The full set of Mips32 (or Mips64) registers is available, but only in a
limited way, when in Mips16 mode.
Only 8 registers can be accessed as general registers, so there are many
additional registers that would be free if the processor were in Mips32
mode.

At the present time, I use these additional registers in an ad hoc way 
as in this situation. In the future I hope to use them to avoid spills 
to memory in a more general way.
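
To make the fallback concrete, here is a minimal sketch of the idea as a toy
model. It is not LLVM code: ScratchChoice, pickScratch and the inUse array are
hypothetical names invented for this illustration, and registers are plain
MIPS register numbers. It only shows the decision "use a free Mips16 register
if there is one, otherwise park v0 in t0" that the patch makes for its first
temporary.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: these names are illustrative and are not LLVM APIs.
struct ScratchChoice {
  int reg;      // number of the Mips16 register to use as a temporary
  int savedTo;  // register holding the old value of `reg`, or -1 if none
};

// The eight registers usable as general registers in Mips16 mode:
// v0, v1, a0-a3, s0, s1 (that is, $2-$7, $16 and $17).
static const int kMips16Regs[8] = {2, 3, 4, 5, 6, 7, 16, 17};
static const int kV0 = 2;  // $2
static const int kT0 = 8;  // $8, not a general register in Mips16 mode

// inUse[i] says whether kMips16Regs[i] is live at the insertion point.
ScratchChoice pickScratch(const bool (&inUse)[8]) {
  for (std::size_t i = 0; i < 8; ++i)
    if (!inUse[i])
      return ScratchChoice{kMips16Regs[i], -1};  // free register, nothing to save
  // Everything is live: borrow v0 and keep its old value in t0 rather than
  // in an emergency spill slot in memory.
  return ScratchChoice{kV0, kT0};
}

// Usage example for the two cases.
void example() {
  bool allBusy[8] = {true, true, true, true, true, true, true, true};
  ScratchChoice c = pickScratch(allBusy);
  assert(c.reg == kV0 && c.savedTo == kT0);

  bool a0Free[8] = {true, true, false, true, true, true, true, true};
  ScratchChoice d = pickScratch(a0Free);
  assert(d.reg == 4 && d.savedTo == -1);
}
```

Because the saved value stays in a register, no stack slot is needed, which is
why a second temporary can be obtained the same way (with v1 and t1).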

Interestingly enough, SP is not a Mips16 register, but there is a set of
instructions that implicitly use it. I cannot, for example, directly add
SP and another Mips16 register together. This makes SP complicated to
use when you are referencing elements on the stack and have to deal with
the more general case in which the local stack offsets can be large. Gcc
decides not to deal with any of that complexity and just copies SP to a
Mips16 register whenever it needs to do anything with the stack, such as
referencing local stack data. This seems far too wasteful to me since
there are so few registers, so I have gone the extra mile to deal with
all the extra SP-induced complexities.
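
As a worked example of the large-offset case, the sketch below repeats the
lo/hi arithmetic from the loadImmediate comment in the quoted patch and checks
that (hi << 16) plus the sign-extended lo gives back the original offset. The
names SplitOffset, splitOffset, signExtend16 and recombine are hypothetical
helpers for this illustration, not functions in the patch. The value 400016 is
one I picked because it produces the 6 and 6800 that show up in the test
expectations; that it is the actual frame offset there is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper types and names; not part of the patch.
struct SplitOffset {
  int32_t hi;  // loaded with li, shifted left by 16, added to the frame register
  int32_t lo;  // low 16 bits, left in the instruction's immediate field
};

// Same arithmetic as the comment in the patch:
//   lo = offset & 0xFFFF
//   hi = ((offset >> 16) + (lo >> 15)) & 0xFFFF
// The (lo >> 15) term adds 1 to hi when lo will be sign-extended to a
// negative value, so the two parts still sum to the original offset.
SplitOffset splitOffset(int64_t imm) {
  int32_t lo = static_cast<int32_t>(imm & 0xFFFF);
  int32_t hi = static_cast<int32_t>(((imm >> 16) + (lo >> 15)) & 0xFFFF);
  return SplitOffset{hi, lo};
}

// Sign-extend a 16-bit field held in the range 0..0xFFFF.
int64_t signExtend16(int32_t lo) {
  return static_cast<int64_t>((lo ^ 0x8000) - 0x8000);
}

// (hi << 16) + sext16(lo), wrapped to 32 bits like the address computation.
uint32_t recombine(const SplitOffset &s) {
  uint32_t base = static_cast<uint32_t>(s.hi) << 16;
  return base + static_cast<uint32_t>(signExtend16(s.lo));
}

// Usage example: one offset with a small lo, one where lo has bit 15 set.
void example() {
  SplitOffset a = splitOffset(400016);
  assert(a.hi == 6 && a.lo == 6800);
  assert(recombine(a) == 400016u);

  SplitOffset b = splitOffset(0x18000);  // lo = 0x8000 sign-extends to -32768
  assert(b.hi == 2 && b.lo == 0x8000);
  assert(recombine(b) == 0x18000u);
}
```

The second case shows why the carry term matters: without it hi would be 1,
and 0x10000 - 0x8000 would land 0x10000 short of the intended address.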

T8 is another non-Mips16 register that is used implicitly in many
conditional test and branch instructions. I use it mostly as if it were
a condition code register and try to avoid copying it back to a Mips16
register. I had to write a lot of long patterns for various things, like
conditional moves, so that sets of instructions that implicitly use T8
can be strung together.

With good optimization, you should be able to save 40% on the memory 
footprint by using Mips16 mode as opposed to Mips32 mode.

You can more or less freely switch back and forth between Mips16 and 
Mips32 mode.

Micromips is another, newer encoding that also offers space savings, but
without any of these Mips16 complexities, because it is essentially
identical to Mips32 mode at the .s level. With micromips, however, if
you generate code without worrying about the differences, you will
probably get little compression, because it will be using the 32-bit
variants much of the time.

Both Mips16 and micromips have 16-bit and 32-bit instructions.

>>
>> Added:
>>      llvm/trunk/test/CodeGen/Mips/largefr1.ll
>> Modified:
>>      llvm/trunk/lib/Target/Mips/Mips16InstrInfo.cpp
>>      llvm/trunk/lib/Target/Mips/Mips16InstrInfo.h
>>      llvm/trunk/lib/Target/Mips/Mips16RegisterInfo.cpp
>>
>> Modified: llvm/trunk/lib/Target/Mips/Mips16InstrInfo.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/Mips16InstrInfo.cpp?rev=174696&r1=174695&r2=174696&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/Mips/Mips16InstrInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/Mips/Mips16InstrInfo.cpp Thu Feb  7
>> 21:57:41 2013
>> @@ -19,6 +19,7 @@
>>   #include "llvm/ADT/StringRef.h"
>>   #include "llvm/CodeGen/MachineInstrBuilder.h"
>>   #include "llvm/CodeGen/MachineRegisterInfo.h"
>> +#include "llvm/CodeGen/RegisterScavenging.h"
>>   #include "llvm/Support/CommandLine.h"
>>   #include "llvm/Support/ErrorHandling.h"
>>   #include "llvm/Support/TargetRegistry.h"
>> @@ -306,11 +307,79 @@ void Mips16InstrInfo::adjustStackPtr(uns
>>   /// This function generates the sequence of instructions needed to
>>   get the
>>   /// result of adding register REG and immediate IMM.
>>   unsigned
>> -Mips16InstrInfo::loadImmediate(int64_t Imm, MachineBasicBlock &MBB,
>> +Mips16InstrInfo::loadImmediate(unsigned FrameReg,
>> +                               int64_t Imm, MachineBasicBlock &MBB,
>>                                  MachineBasicBlock::iterator II,
>>                                  DebugLoc DL,
>> -                               unsigned *NewImm) const {
>> +                               unsigned &NewImm) const {
>> +  //
>> +  // given original instruction is:
>> +  // Instr rx, T[offset] where offset is too big.
>> +  //
>> +  // lo = offset & 0xFFFF
>> +  // hi = ((offset >> 16) + (lo >> 15)) & 0xFFFF;
>> +  //
>> +  // let T = temporary register
>> +  // li T, hi
>> +  // shl T, 16
>> +  // add T, Rx, T
>> +  //
>> +  RegScavenger rs;
>> +  int32_t lo = Imm & 0xFFFF;
>> +  int32_t hi = ((Imm >> 16) + (lo >> 15)) & 0xFFFF;
>> +  NewImm = lo;
>> +  unsigned Reg =0;
>> +  unsigned SpReg = 0;
>> +  rs.enterBasicBlock(&MBB);
>> +  rs.forward(II);
>> +  //
>> +  // we use T0 for the first register, if we need to save something
>> away.
>> +  // we use T1 for the second register, if we need to save something
>> away.
>> +  //
>> +  unsigned FirstRegSaved =0, SecondRegSaved=0;
>> +  unsigned FirstRegSavedTo = 0, SecondRegSavedTo = 0;
>>   
>> -  return 0;
>> +  Reg = rs.FindUnusedReg(&Mips::CPU16RegsRegClass);
>> +  if (Reg == 0) {
>> +    FirstRegSaved = Reg = Mips::V0;
>> +    FirstRegSavedTo = Mips::T0;
>> +    copyPhysReg(MBB, II, DL, FirstRegSavedTo, FirstRegSaved, true);
>> +  }
>> +  else
>> +    rs.setUsed(Reg);
>> +  BuildMI(MBB, II, DL, get(Mips::LiRxImmX16), Reg).addImm(hi);
>> +  BuildMI(MBB, II, DL, get(Mips::SllX16), Reg).addReg(Reg).
>> +    addImm(16);
>> +  if (FrameReg == Mips::SP) {
>> +    SpReg = rs.FindUnusedReg(&Mips::CPU16RegsRegClass);
>> +    if (SpReg == 0) {
>> +      if (Reg != Mips::V1) {
>> +        SecondRegSaved = SpReg = Mips::V1;
>> +        SecondRegSavedTo = Mips::T1;
>> +      }
>> +      else {
>> +        SecondRegSaved = SpReg = Mips::V0;
>> +        SecondRegSavedTo = Mips::T0;
>> +      }
>> +      copyPhysReg(MBB, II, DL, SecondRegSavedTo, SecondRegSaved,
>> true);
>> +    }
>> +    else
>> +      rs.setUsed(SpReg);
>> +
>> +    copyPhysReg(MBB, II, DL, SpReg, Mips::SP, false);
>> +    BuildMI(MBB, II, DL, get(Mips::  AdduRxRyRz16),
>> Reg).addReg(SpReg)
>> +      .addReg(Reg);
>> +  }
>> +  else
>> +    BuildMI(MBB, II, DL, get(Mips::  AdduRxRyRz16),
>> Reg).addReg(FrameReg)
>> +      .addReg(Reg, RegState::Kill);
>> +  if (FirstRegSaved || SecondRegSaved) {
>> +    II = llvm::next(II);
>> +    if (FirstRegSaved)
>> +      copyPhysReg(MBB, II, DL, FirstRegSaved, FirstRegSavedTo,
>> true);
>> +    if (SecondRegSaved)
>> +      copyPhysReg(MBB, II, DL, SecondRegSaved, SecondRegSavedTo,
>> true);
>> +  }
>> +  return Reg;
>>   }
>>   
>>   unsigned Mips16InstrInfo::GetAnalyzableBrOpc(unsigned Opc) const {
>>
>> Modified: llvm/trunk/lib/Target/Mips/Mips16InstrInfo.h
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/Mips16InstrInfo.h?rev=174696&r1=174695&r2=174696&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/Mips/Mips16InstrInfo.h (original)
>> +++ llvm/trunk/lib/Target/Mips/Mips16InstrInfo.h Thu Feb  7 21:57:41
>> 2013
>> @@ -77,12 +77,14 @@ public:
>>     void adjustStackPtr(unsigned SP, int64_t Amount, MachineBasicBlock
>>     &MBB,
>>                         MachineBasicBlock::iterator I) const;
>>   
>> -  /// Emit a series of instructions to load an immediate. If NewImm
>> is a
>> -  /// non-NULL parameter, the last instruction is not emitted, but
>> instead
>> -  /// its immediate operand is returned in NewImm.
>> -  unsigned loadImmediate(int64_t Imm, MachineBasicBlock &MBB,
>> +  /// Emit a series of instructions to load an immediate.
>> +  // This is to adjust some FrameReg. We return the new register to
>> be used
>> +  // in place of FrameReg and the adjusted immediate field (&NewImm)
>> +  //
>> +  unsigned loadImmediate(unsigned FrameReg,
>> +                         int64_t Imm, MachineBasicBlock &MBB,
>>                            MachineBasicBlock::iterator II, DebugLoc
>>                            DL,
>> -                         unsigned *NewImm) const;
>> +                         unsigned &NewImm) const;
>>   
>>   private:
>>     virtual unsigned GetAnalyzableBrOpc(unsigned Opc) const;
>>
>> Modified: llvm/trunk/lib/Target/Mips/Mips16RegisterInfo.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Mips/Mips16RegisterInfo.cpp?rev=174696&r1=174695&r2=174696&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/Mips/Mips16RegisterInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/Mips/Mips16RegisterInfo.cpp Thu Feb  7
>> 21:57:41 2013
>> @@ -1,3 +1,4 @@
>> +
>>   //===-- Mips16RegisterInfo.cpp - MIPS16 Register Information -==
>>   ----------===//
>>   //
>>   //                     The LLVM Compiler Infrastructure
>> @@ -12,6 +13,7 @@
>>   //===----------------------------------------------------------------------===//
>>   
>>   #include "Mips16RegisterInfo.h"
>> +#include "Mips16InstrInfo.h"
>>   #include "Mips.h"
>>   #include "Mips16InstrInfo.h"
>>   #include "MipsAnalyzeImmediate.h"
>> @@ -23,6 +25,7 @@
>>   #include "llvm/CodeGen/MachineFrameInfo.h"
>>   #include "llvm/CodeGen/MachineFunction.h"
>>   #include "llvm/CodeGen/MachineInstrBuilder.h"
>> +#include "llvm/CodeGen/MachineRegisterInfo.h"
>>   #include "llvm/CodeGen/ValueTypes.h"
>>   #include "llvm/DebugInfo.h"
>>   #include "llvm/IR/Constants.h"
>> @@ -140,6 +143,7 @@ void Mips16RegisterInfo::eliminateFI(Mac
>>     //   by adding the size of the stack:
>>     //   incoming argument, callee-saved register location or local
>>     variable.
>>     int64_t Offset;
>> +  bool IsKill = false;
>>     Offset = SPOffset + (int64_t)StackSize;
>>     Offset += MI.getOperand(OpNo + 1).getImm();
>>   
>> @@ -148,9 +152,14 @@ void Mips16RegisterInfo::eliminateFI(Mac
>>   
>>     if (!MI.isDebugValue() && ( ((FrameReg != Mips::SP) &&
>>     !isInt<16>(Offset)) ||
>>         ((FrameReg == Mips::SP) && !isInt<15>(Offset)) )) {
>> -    llvm_unreachable("frame offset does not fit in instruction");
>> +    MachineBasicBlock &MBB = *MI.getParent();
>> +    DebugLoc DL = II->getDebugLoc();
>> +    unsigned NewImm;
>> +    FrameReg = TII.loadImmediate(FrameReg, Offset, MBB, II, DL,
>> NewImm);
>> +    Offset = SignExtend64<16>(NewImm);
>> +    IsKill = true;
>>     }
>> -  MI.getOperand(OpNo).ChangeToRegister(FrameReg, false);
>> +  MI.getOperand(OpNo).ChangeToRegister(FrameReg, false, false,
>> IsKill);
>>     MI.getOperand(OpNo + 1).ChangeToImmediate(Offset);
>>   
>>   
>>
>> Added: llvm/trunk/test/CodeGen/Mips/largefr1.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/Mips/largefr1.ll?rev=174696&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/Mips/largefr1.ll (added)
>> +++ llvm/trunk/test/CodeGen/Mips/largefr1.ll Thu Feb  7 21:57:41 2013
>> @@ -0,0 +1,61 @@
>> +; RUN: llc -march=mipsel -mcpu=mips16 -mips16-hard-float -soft-float
>> -relocation-model=static < %s | FileCheck %s -check-prefix=1
>> +
>> +@i = common global i32 0, align 4
>> +@j = common global i32 0, align 4
>> +@.str = private unnamed_addr constant [8 x i8] c"%i %i \0A\00",
>> align 1
>> +
>> +define void @foo(i32* %p, i32 %i, i32 %j) nounwind {
>> +entry:
>> +  %p.addr = alloca i32*, align 4
>> +  %i.addr = alloca i32, align 4
>> +  %j.addr = alloca i32, align 4
>> +  store i32* %p, i32** %p.addr, align 4
>> +  store i32 %i, i32* %i.addr, align 4
>> +  store i32 %j, i32* %j.addr, align 4
>> +  %0 = load i32* %j.addr, align 4
>> +  %1 = load i32** %p.addr, align 4
>> +  %2 = load i32* %i.addr, align 4
>> +  %add.ptr = getelementptr inbounds i32* %1, i32 %2
>> +  store i32 %0, i32* %add.ptr, align 4
>> +  ret void
>> +}
>> +
>> +define i32 @main() nounwind {
>> +entry:
>> +; 1: main:
>> +; 1: 1: 	.word	-797992
>> +; 1:            li ${{[0-9]+}}, 12
>> +; 1:            sll ${{[0-9]+}}, ${{[0-9]+}}, 16
>> +; 1:            addu ${{[0-9]+}}, ${{[0-9]+}}, ${{[0-9]+}}
>> +; 2:            move $sp, ${{[0-9]+}}
>> +; 2:            addu ${{[0-9]+}}, ${{[0-9]+}}, ${{[0-9]+}}
>> +; 1:            li ${{[0-9]+}}, 6
>> +; 1:            sll ${{[0-9]+}}, ${{[0-9]+}}, 16
>> +; 1:            addu ${{[0-9]+}}, ${{[0-9]+}}, ${{[0-9]+}}
>> +; 2:            move $sp, ${{[0-9]+}}
>> +; 2:            addu ${{[0-9]+}}, ${{[0-9]+}}, ${{[0-9]+}}
>> +; 1:          	addiu	${{[0-9]+}}, ${{[0-9]+}}, 6800
>> +; 1: 	        li	${{[0-9]+}}, 1
>> +; 1:	        sll	${{[0-9]+}}, ${{[0-9]+}}, 16
>> +; 2: 	        li	${{[0-9]+}}, 34463
>> +  %retval = alloca i32, align 4
>> +  %one = alloca [100000 x i32], align 4
>> +  %two = alloca [100000 x i32], align 4
>> +  store i32 0, i32* %retval
>> +  %arrayidx = getelementptr inbounds [100000 x i32]* %one, i32 0,
>> i32 0
>> +  call void @foo(i32* %arrayidx, i32 50, i32 9999)
>> +  %arrayidx1 = getelementptr inbounds [100000 x i32]* %two, i32 0,
>> i32 0
>> +  call void @foo(i32* %arrayidx1, i32 99999, i32 5555)
>> +  %arrayidx2 = getelementptr inbounds [100000 x i32]* %one, i32 0,
>> i32 50
>> +  %0 = load i32* %arrayidx2, align 4
>> +  store i32 %0, i32* @i, align 4
>> +  %arrayidx3 = getelementptr inbounds [100000 x i32]* %two, i32 0,
>> i32 99999
>> +  %1 = load i32* %arrayidx3, align 4
>> +  store i32 %1, i32* @j, align 4
>> +  %2 = load i32* @i, align 4
>> +  %3 = load i32* @j, align 4
>> +  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
>> ([8 x i8]* @.str, i32 0, i32 0), i32 %2, i32 %3)
>> +  ret i32 0
>> +}
>> +
>> +declare i32 @printf(i8*, ...)
>>
>>