[LLVMbugs] [Bug 11951] New: tblgen generation of Intrinsics.gen produces bloated code and slow compiles of LLVM

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Wed Feb 8 16:21:35 PST 2012


http://llvm.org/bugs/show_bug.cgi?id=11951

             Bug #: 11951
           Summary: tblgen generation of Intrinsics.gen produces bloated
                    code and slow compiles of LLVM
           Product: libraries
           Version: 1.0
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Core LLVM classes
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: clattner at apple.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified


Currently, tblgen emits the global intrinsics tables to
include/llvm/Intrinsics.gen.  This file is included by Intrinsics.h to get the
list of intrinsic enums, and by Function.cpp to implement various
Intrinsic:: methods.

There are two problems here.  First, Intrinsics.gen is very large (44K lines
and growing with the number of intrinsics), which slows down compilation of
anything that includes Intrinsics.h.  We should split it into an
include/llvm/Intrinsics.gen file that contains only the enum list used by
Intrinsics.h, and a file in lib/VMCore that is included by Function.cpp to
implement the Intrinsic:: methods.
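
A rough sketch of the split (the guard names follow the existing #ifdef
convention in the generated file; treat the exact file and guard names as
illustrative):

  // include/llvm/Intrinsics.gen: just the enum values, for Intrinsics.h.
  #ifdef GET_INTRINSIC_ENUM_VALUES
    arm_get_fpscr,              // llvm.arm.get.fpscr
    arm_neon_vabds,             // llvm.arm.neon.vabds
    ...
  #endif

  // lib/VMCore/IntrinsicsImpl.gen: the implementation tables, included
  // only by Function.cpp.
  #ifdef GET_INTRINSIC_ATTRIBUTES
    ... attribute tables ...
  #endif

That way only Function.cpp pays for parsing the big tables; everything else
that includes Intrinsics.h sees just the enum.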

The second problem is that the code generated for use by Function.cpp is huge,
resulting in Function.o being > 500K in Release builds (optimized, no
assertions).  This is largely bloat that could be fixed with some
straightforward changes.

The first big source of bloat is that there are very large switch statements
over the entire enum table, causing large jump tables to be emitted.  These
should be changed into array lookups.  For example, Intrinsic::getAttributes
contains a huge switch over the entire intrinsics set with only 8 uniqued
case bodies, i.e.:

  switch (id) {
  default: break;
  case Intrinsic::arm_get_fpscr:
  case Intrinsic::arm_neon_vabds:
  case Intrinsic::arm_neon_vabdu:
  ... many many more ...
    AWI[0] = AttributeWithIndex::get(~0, Attribute::NoUnwind|Attribute::ReadNone);
    NumAttrs = 1;
    break;
  case Intrinsic::adjust_trampoline:
  ... many many more ...
    AWI[0] = AttributeWithIndex::get(~0, Attribute::NoUnwind|Attribute::ReadOnly);
    NumAttrs = 1;
    break;
  ... etc ...

This could be better implemented by changing the switch into an array lookup
that maps each intrinsic to one of the few unique case bodies, something like
this:

  static const uint8_t IntrinsicToAttributesMap[] = { 0, 0, 0, 1, 0, 1, ... };

  switch (IntrinsicToAttributesMap[id]) {
  case 0:
    AWI[0] = AttributeWithIndex::get(~0, Attribute::NoUnwind|Attribute::ReadNone);
    NumAttrs = 1;
    break;
  case 1:
    AWI[0] = AttributeWithIndex::get(~0, Attribute::NoUnwind|Attribute::ReadOnly);
    NumAttrs = 1;
    break;
  ... 6 more cases ...
  }

The same transformation should be applied to
Verifier::visitIntrinsicFunctionCall, GET_INTRINSIC_GENERATOR, and
GET_INTRINSIC_MODREF_BEHAVIOR.
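
As a concrete illustration, GET_INTRINSIC_MODREF_BEHAVIOR could shrink to
something like this (a sketch only; the map contents and the set of behaviors
shown are made up):

  // One byte per intrinsic, naming one of the few distinct behaviors.
  static const uint8_t IntrinsicModRefMap[] = { 0, 0, 2, 1, 0, ... };

  switch (IntrinsicModRefMap[id]) {
  case 0: return DoesNotAccessMemory;
  case 1: return OnlyReadsMemory;
  default: return UnknownModRefBehavior;
  }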

This is a win because we're going from a jump table entry in a switch table
(4/8 bytes) to a single byte in an array.  The next step is to take all of
these parallel arrays (plus GET_INTRINSIC_OVERLOAD_TABLE, which is a single
bit per intrinsic) and merge them into one big array of structs, where the
struct fields are bitfields that pack the entries together more tightly.  At
that point, the huge table for GET_INTRINSIC_OVERLOAD_TABLE would only use
one bit per intrinsic, a substantial saving.
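
A sketch of the merged table (field widths are illustrative; each would be
sized to the number of uniqued entries in the corresponding source table):

  struct IntrinsicDescriptor {
    unsigned AttributeSet   : 3;  // one of the ~8 uniqued attribute lists
    unsigned ModRefBehavior : 3;  // one of the uniqued mod/ref behaviors
    unsigned IsOverloaded   : 1;  // the GET_INTRINSIC_OVERLOAD_TABLE bit
  };

  static const IntrinsicDescriptor IntrinsicDescriptors[] = {
    { 0, 0, 0 }, { 1, 2, 0 }, { 0, 0, 1 }, ...
  };

With that layout, e.g. Intrinsic::isOverloaded reduces to a single load and
mask of IntrinsicDescriptors[id].IsOverloaded.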

-Chris
