[llvm-commits] CVS: llvm/lib/Target/X86/InstSelectSimple.cpp

Chris Lattner lattner at cs.uiuc.edu
Wed Feb 25 01:02:03 PST 2004


Changes in directory llvm/lib/Target/X86:

InstSelectSimple.cpp updated: 1.177 -> 1.178

---
Log message:

Teach the instruction selector how to transform 'array' GEP computations into X86 
scaled indexes.  This allows us to compile GEP's like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret 

This is useful for things like int/float/double arrays, as the indexing can be folded into 
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2 for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:
9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)



---
Diffs of the changes:  (+23 -24)

Index: llvm/lib/Target/X86/InstSelectSimple.cpp
diff -u llvm/lib/Target/X86/InstSelectSimple.cpp:1.177 llvm/lib/Target/X86/InstSelectSimple.cpp:1.178
--- llvm/lib/Target/X86/InstSelectSimple.cpp:1.177	Wed Feb 25 00:13:04 2004
+++ llvm/lib/Target/X86/InstSelectSimple.cpp	Wed Feb 25 01:00:55 2004
@@ -2438,11 +2438,30 @@
       assert(idx->getType() == Type::LongTy && "Bad GEP array index!");
 
       // If idx is a constant, fold it into the offset.
+      unsigned TypeSize = TD.getTypeSize(SqTy->getElementType());
       if (ConstantSInt *CSI = dyn_cast<ConstantSInt>(idx)) {
-        Disp += TD.getTypeSize(SqTy->getElementType())*CSI->getValue();
+        Disp += TypeSize*CSI->getValue();
       } else {
-        // If we can't handle it, return.
-        return;
+        // If the index reg is already taken, we can't handle this index.
+        if (IndexReg) return;
+
+        // If this is a size that we can handle, then add the index as 
+        switch (TypeSize) {
+        case 1: case 2: case 4: case 8:
+          // These are all acceptable scales on X86.
+          Scale = TypeSize;
+          break;
+        default:
+          // Otherwise, we can't handle this scale
+          return;
+        }
+
+        if (CastInst *CI = dyn_cast<CastInst>(idx))
+          if (CI->getOperand(0)->getType() == Type::IntTy ||
+              CI->getOperand(0)->getType() == Type::UIntTy)
+            idx = CI->getOperand(0);
+
+        IndexReg = MBB ? getReg(idx, MBB, IP) : 1;
       }
 
       GEPOps.pop_back();        // Consume a GEP operand
@@ -2456,7 +2475,7 @@
   // FIXME: When addressing modes are more powerful/correct, we could load
   // global addresses directly as 32-bit immediates.
   assert(BaseReg == 0);
-  BaseReg = MBB ? getReg(GEPOps[0], MBB, IP) : 0;
+  BaseReg = MBB ? getReg(GEPOps[0], MBB, IP) : 1;
   GEPOps.pop_back();        // Consume the last GEP operand
 }
 
@@ -2538,26 +2557,6 @@
       }
       break;                // we are now done
 
-    } else if (const StructType *StTy = dyn_cast<StructType>(GEPTypes.back())) {
-      // It's a struct access.  CUI is the index into the structure,
-      // which names the field. This index must have unsigned type.
-      const ConstantUInt *CUI = cast<ConstantUInt>(GEPOps.back());
-      GEPOps.pop_back();        // Consume a GEP operand
-      GEPTypes.pop_back();
-
-      // Use the TargetData structure to pick out what the layout of the
-      // structure is in memory.  Since the structure index must be constant, we
-      // can get its value and use it to find the right byte offset from the
-      // StructLayout class's list of structure member offsets.
-      unsigned idxValue = CUI->getValue();
-      unsigned FieldOff = TD.getStructLayout(StTy)->MemberOffsets[idxValue];
-      if (FieldOff) {
-        unsigned Reg = makeAnotherReg(Type::UIntTy);
-        // Emit an ADD to add FieldOff to the basePtr.
-        BMI(MBB, IP, X86::ADDri32, 2, TargetReg).addReg(Reg).addZImm(FieldOff);
-        --IP;            // Insert the next instruction before this one.
-        TargetReg = Reg; // Codegen the rest of the GEP into this
-      }
     } else {
       // It's an array or pointer access: [ArraySize x ElementType].
       const SequentialType *SqTy = cast<SequentialType>(GEPTypes.back());





More information about the llvm-commits mailing list