[llvm-commits] [pr1225] Fix using instcombine instead of instsimplify

Rafael Espíndola rafael.espindola at gmail.com
Wed Mar 28 18:43:01 PDT 2012


I was able to reproduce the slowdown that Chad found. The problem was
simply that ComputeMaskedBits is too slow to call from instsimplify.
While looking for a place to add the transformation to instcombine, I
was surprised to find that it already had code that should handle it:

-----------------------------------------------------------------------------------------
  case Instruction::And:
    // If either the LHS or the RHS are Zero, the result is zero.
    if (SimplifyDemandedBits(I->getOperandUse(1), DemandedMask,
                             RHSKnownZero, RHSKnownOne, Depth+1) ||
        SimplifyDemandedBits(I->getOperandUse(0), DemandedMask & ~RHSKnownZero,
                             LHSKnownZero, LHSKnownOne, Depth+1))
      return I;
    ...
    // If all of the demanded bits are known 1 on one side, return the other.
    // These bits cannot contribute to the result of the 'and'.
    ...
---------------------------------------------------------------------------------------

It was not working because of the '& ~RHSKnownZero'. The RHS was 1,
so we would only ask for information about the lsb and would fail to
see that all the other bits of the LHS were known to be zero.

It is very important that we keep passing the reduced mask to
SimplifyDemandedBits, since it enables optimizations that are not
valid for the other bits, so we cannot just drop the '& ~RHSKnownZero'.

I tried adding a second call to ComputeMaskedBits with the original
mask just after the calls to SimplifyDemandedBits. This worked, but
still had a noticeable performance impact.

What worked in the end was changing ComputeMaskedBits to always
compute information for all the bits. That way we can keep passing the
more restrictive mask to SimplifyDemandedBits, and even if it fails to
simplify the LHS, it still provides us with all the known bit values.

To my surprise, the new binary is actually faster on md5.c, which was
the main regression before. I timed the cc1 command with

$ perf stat -r 20  clang -cc1 .....  md5.c

With r153587 (without this patch) I got

        161.765055 task-clock                #    0.995 CPUs utilized            ( +-  0.42% )
                17 context-switches          #    0.000 M/sec                    ( +-  1.25% )
                 0 CPU-migrations            #    0.000 M/sec
             4,102 page-faults               #    0.025 M/sec                    ( +-  0.01% )
       544,351,774 cycles                    #    3.365 GHz                      ( +-  0.32% ) [83.17%]
       195,122,197 stalled-cycles-frontend   #   35.84% frontend cycles idle     ( +-  0.97% ) [83.24%]
       188,725,755 stalled-cycles-backend    #   34.67% backend cycles idle      ( +-  1.29% ) [66.87%]
       789,656,027 instructions              #    1.45  insns per cycle
                                             #    0.25  stalled cycles per insn  ( +-  0.17% ) [83.46%]
       192,960,948 branches                  # 1192.847 M/sec                    ( +-  0.18% ) [83.67%]
         3,615,660 branch-misses             #    1.87% of all branches          ( +-  0.15% ) [83.37%]

       0.162502223 seconds time elapsed                                          ( +-  0.43% )

With this patch I got


        154.122258 task-clock                #    0.994 CPUs utilized            ( +-  0.30% )
                16 context-switches          #    0.000 M/sec                    ( +-  1.01% )
                 0 CPU-migrations            #    0.000 M/sec                    ( +-100.00% )
             4,106 page-faults               #    0.027 M/sec                    ( +-  0.01% )
       520,205,829 cycles                    #    3.375 GHz                      ( +-  0.21% ) [83.17%]
       186,374,126 stalled-cycles-frontend   #   35.83% frontend cycles idle     ( +-  0.66% ) [83.19%]
       180,994,552 stalled-cycles-backend    #   34.79% backend cycles idle      ( +-  0.98% ) [66.75%]
       746,848,373 instructions              #    1.44  insns per cycle
                                             #    0.25  stalled cycles per insn  ( +-  0.19% ) [83.45%]
       183,100,102 branches                  # 1188.019 M/sec                    ( +-  0.19% ) [83.63%]
         3,557,222 branch-misses             #    1.94% of all branches          ( +-  0.19% ) [83.52%]

       0.155021208 seconds time elapsed                                          ( +-  0.30% )

I have also tested the patch with a 3-stage bootstrap and by running
the test-suite.

Cheers,
Rafael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t.patch
Type: application/octet-stream
Size: 54314 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120328/642cd827/attachment.obj>

