[PATCH][AArch64] Enable sign check optimization by CSE
Jiangning Liu
liujiangning1 at gmail.com
Thu Jul 31 19:27:19 PDT 2014
Hi Sergey,
I got the following performance result for running time on cortex-A57,
spec.cpu2000.ref.176_gcc 1.11%
spec.cpu2000.ref.177_mesa 2.09%
spec.cpu2000.ref.179_art 2.10%
spec.cpu2000.ref.254_gap 1.27%
spec.cpu2000.ref.256_bzip2 2.80%
spec.cpu2006.ref.401_bzip2 1.50%
spec.cpu2006.ref.403_gcc 2.54%
spec.cpu2006.ref.453_povray 2.60%
spec.cpu2006.ref.456_hmmer 1.58%
spec.cpu2000.ref.186_crafty -2.65%
spec.cpu2000.ref.197_parser -1.32%
spec.cpu2006.ref.400_perlbench -1.73%
spec.cpu2006.ref.433_milc -2.49%
spec.cpu2006.ref.464_h264ref -1.34%
So it looks we have larger number of regressions than improvements on spec
benchmark.
For 256.bzip2, I find a lot of cases of comparing with constant 1, so
finally the code like below in function hbMakeCodeLengths,
subs w12, w2, #0x1
b.lt 988 <hbMakeCodeLengths+0xd0>
your patch will transform it into
cmp w2, #0x0
b.le 98c <hbMakeCodeLengths+0xd4>
sub w12, w2, #0x1
I'm not sure this is the root cause of regression, but obviously this
change is not a good one.
Could you please have a look?
Thanks,
-Jiangning
2014-07-31 9:55 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:
> Hi Sergey,
>
> I noticed a lot of code changes for spec2000/256.bzip2 after applying your
> patch, so I'd like to see the performance impact before committing the code.
>
> Thanks,
> -Jiangning
>
>
> 2014-07-30 15:25 GMT+08:00 Sergey Dmitrouk <sdmitrouk at accesssoftek.com>:
>
> Hi Jiangning,
>>
>> Great! Could you please commit the patch? I don't have write access.
>>
>> Thanks,
>> Sergey
>>
>> On Tue, Jul 29, 2014 at 06:37:09PM -0700, Jiangning Liu wrote:
>> > Hi Sergey,
>> > Looks good to me now!
>> > Thanks,
>> > -Jiangning
>> >
>> > 2014-07-29 17:51 GMT+08:00 Sergey Dmitrouk <
>> sdmitrouk at accesssoftek.com>:
>> >
>> > Hi Jiangning,
>> > > I think it would be better if you can simply check sequential
>> b.xx
>> > > instructions like,
>> >
>> > Thanks, updated that, I though it won't be changed as
>> "lor.lhs.false" is
>> > in the .ll file.
>> > > I found your patch would trigger two failures for llvm regression
>> > test,
>> > > test/CodeGen/AArch64/arm64-ccmp.ll
>> > > test/CodeGen/AArch64/arm64-cse.ll
>> > > Have you tried that?
>> >
>> > Sorry, I did run all test before, but maybe failed to do this with
>> > correct
>> > version of llc binary.
>> >
>> > "test/CodeGen/AArch64/arm64-ccmp.ll" required a small update in
>> > conditional check, which doesn't seem to affect anything.
>> >
>> > "test/CodeGen/AArch64/arm64-cse.ll" explicitly checks for old
>> behaviour
>> > and
>> > assumes comparison with 1. A Modified to perform two separate
>> checks:
>> > A 1. No CSE for that case when comparing with 1.
>> > A 2. CSE for all other cases.
>> > It seems to be fine as comparison with zero should be more
>> frequent. A I
>> > guess we can't support both kinds of optimizations at the same
>> time.
>> >
>> > By the way, thanks for you patience.
>> >
>> > Cheers,
>> > Sergey
>> > On Mon, Jul 28, 2014 at 09:32:40PM -0700, Jiangning Liu wrote:
>> > > A A Hi Sergey,
>> > > A A I found your patch would trigger two failures for llvm
>> regression
>> > test,
>> > > A A test/CodeGen/AArch64/arm64-ccmp.ll
>> > > A A test/CodeGen/AArch64/arm64-cse.ll
>> > > A A Have you tried that?
>> > > A A Thanks,
>> > > A A -Jiangning
>> > >
>> > > A A 2014-07-29 10:02 GMT+08:00 Jiangning Liu
>> > <liujiangning1 at gmail.com>:
>> > >
>> > > A A A Hi Sergey,
>> > > A A A I just noticed your test case is checking the "comment"
>> in
>> > assembly
>> > > A A A code. I think it would be better if you can simply check
>> > sequential b.xx
>> > > A A A instructions like,
>> > > A A A ; CHECK: cmp
>> > > A A A ; CHECK: b.lt
>> > > A A A ; CHECK-NOT: cmp
>> > > A A A ; CHECK: b.le
>> > > A A A This way the test can be more stable, because the
>> internal
>> > symbol like
>> > > A A A %lor.lhs.false can be easily changed from time to time.
>> > > A A A Thanks,
>> > > A A A -Jiangning
>> > >
>> > > A A A 2014-07-28 16:40 GMT+08:00 Sergey Dmitrouk
>> > <sdmitrouk at accesssoftek.com>:
>> > >
>> > > A A A A Hi Jiangning,
>> > >
>> > > A A A A You're right, I just thought it might make sense to
>> leave a
>> > chance of
>> > > A A A A catching illegal immediate values after optimizations,
>> > which is
>> > > A A A A actually
>> > > A A A A not something to consider in this case with one and
>> zero.
>> > >
>> > > A A A A Also, maybe
>> > >
>> > > A A A A A A A A C = (RHS.getValueType() == MVT::i32) ?
>> (uint32_t)(C
>> > - 1) : (C -
>> > > A A A A 1);
>> > >
>> > > A A A A worth changing to
>> > >
>> > > A A A A A A A A C = 0ULL;
>> > >
>> > > A A A A It's left from my tries to generalize optimization,
>> but it
>> > might be
>> > > A A A A useful if someone will add additional condition.
>> > >
>> > > A A A A Thanks,
>> > > A A A A Sergey
>> > > A A A A On Sun, Jul 27, 2014 at 10:30:45PM -0700, Jiangning
>> Liu
>> > wrote:
>> > > A A A A > A A A Hi Sergey,
>> > > A A A A > A A A Would it be more clear on logic if you move
>> that
>> > piece of code
>> > > A A A A to be
>> > > A A A A > A A A inside the else branch of "if
>> > (!isLegalArithImmed(C)) { ... }
>> > > A A A A else {...}"?
>> > > A A A A > A A A I think it would be clear that (C==1) is
>> legal, but
>> > we still
>> > > A A A A want to
>> > > A A A A > A A A optimize it, and any more optimization cases
>> > falling into
>> > > A A A A "legal" scenario
>> > > A A A A > A A A would be easily added in future.
>> > > A A A A > A A A I will be OK if you can simply make this
>> change.
>> > > A A A A > A A A Thanks,
>> > > A A A A > A A A -Jiangning
>> > > A A A A >
>> > > A A A A > A A A 2014-07-25 15:21 GMT+08:00 Sergey Dmitrouk
>> > > A A A A <sdmitrouk at accesssoftek.com>:
>> > > A A A A >
>> > > A A A A > A A A A A Hi Jiangning,
>> > > A A A A > A A A A A > Did you forget to attach your new patch?
>> > > A A A A >
>> > > A A A A > A A A A A Yes, sorry for that. A It's attached now.
>> > > A A A A > A A A A A > Hopefully we can understand why this
>> could
>> > happen, but
>> > > A A A A maybe this is
>> > > A A A A > A A A A A just
>> > > A A A A > A A A A A > a heuristic result depending on the real
>> > control flow and
>> > > A A A A workload.
>> > > A A A A >
>> > > A A A A > A A A A A As that code is common for all supported
>> > architectures,
>> > > A A A A maybe it is
>> > > A A A A > A A A A A important for only some of them and
>> doesn't
>> > cause
>> > > A A A A significant
>> > > A A A A > A A A A A performance
>> > > A A A A > A A A A A changes for others?
>> > > A A A A >
>> > > A A A A > A A A A A Best regards,
>> > > A A A A > A A A A A Sergey
>> > > A A A A > A A A A A On Thu, Jul 24, 2014 at 07:05:43PM -0700,
>> > Jiangning Liu
>> > > A A A A wrote:
>> > > A A A A > A A A A A > A A A Hi Sergey,
>> > > A A A A > A A A A A > A A A Did you forget to attach your new
>> > patch?
>> > > A A A A > A A A A A > A A A I tried the spec benchmarks by
>> > disabling that code
>> > > A A A A of inverting
>> > > A A A A > A A A A A the
>> > > A A A A > A A A A A > A A A condition, and only see the
>> following
>> > performance
>> > > A A A A changes and no
>> > > A A A A > A A A A A change
>> > > A A A A > A A A A A > A A A for all others.
>> > > A A A A > A A A A A > A A A 164.gzip (ref) A +1.76%
>> > > A A A A > A A A A A > A A A 458.sjeng (train) + 2.19%
>> > > A A A A > A A A A A > A A A 471.omnetpp (train) -1.43%
>> > > A A A A > A A A A A > A A A 473.astar (train) -1.51%
>> > > A A A A > A A A A A > A A A Hopefully we can understand why
>> this
>> > could happen,
>> > > A A A A but maybe this
>> > > A A A A > A A A A A is just
>> > > A A A A > A A A A A > A A A a heuristic result depending on
>> the
>> > real control
>> > > A A A A flow and
>> > > A A A A > A A A A A workload.
>> > > A A A A > A A A A A > A A A Thanks,
>> > > A A A A > A A A A A > A A A -Jiangning
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A 2014-07-24 16:17 GMT+08:00 Sergey
>> > Dmitrouk
>> > > A A A A > A A A A A <sdmitrouk at accesssoftek.com>:
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A Hello Jiangning,
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A Thanks for your comments.
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > 1) I expect the fix should
>> be
>> > insideA
>> > > A A A A getAArch64Cmp.
>> > > A A A A > A A A A A > A A A A A >...
>> > > A A A A > A A A A A > A A A A A > but essentially
>> getAArch64Cmp
>> > missed the case
>> > > A A A A of (x < 1) ->
>> > > A A A A > A A A A A (x <= 0).
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A It was initial placement of
>> the
>> > fix, but the
>> > > A A A A function doesn't
>> > > A A A A > A A A A A seem to
>> > > A A A A > A A A A A > A A A A A perform transformations like
>> that.
>> > A It updates
>> > > A A A A conditions
>> > > A A A A > A A A A A only when
>> > > A A A A > A A A A A > A A A A A immediate values are not
>> legal. A
>> > There is no
>> > > A A A A comment for the
>> > > A A A A > A A A A A function,
>> > > A A A A > A A A A A > A A A A A so I'm not sure whether such
>> checks
>> > fit there,
>> > > A A A A but I moved the
>> > > A A A A > A A A A A change.
>> > > A A A A > A A A A A > A A A A A > 2) Your comment is
>> inconsistent
>> > with your
>> > > A A A A code.
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A Thanks, it's probably because
>> of
>> > inverted
>> > > A A A A conditions in DAGs.
>> > > A A A A > A A A A A > A A A A A > So now I'm wondering how to
>> > justify this is
>> > > A A A A always
>> > > A A A A > A A A A A meaningful for
>> > > A A A A > A A A A A > A A A A A AArch64?
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A I wasn't sure whether it's
>> worth
>> > such change,
>> > > A A A A but as an option
>> > > A A A A > A A A A A something
>> > > A A A A > A A A A A > A A A A A like
>> > > A A A A TargetLowering::isInversionBeneficial(SDValue Cond)
>> can
>> > > A A A A > A A A A A be added,
>> > > A A A A > A A A A A > A A A A A but I don't know whether it's
>> > possible to check
>> > > A A A A for conditions
>> > > A A A A > A A A A A like
>> > > A A A A > A A A A A > A A A A A "(a < 0 && b == c || a > 0 &&
>> b ==
>> > d)" to do not
>> > > A A A A block
>> > > A A A A > A A A A A inversion for all
>> > > A A A A > A A A A A > A A A A A cases.
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A Attached updated patch at
>> least to
>> > see whether
>> > > A A A A the fix fits
>> > > A A A A > A A A A A well in
>> > > A A A A > A A A A A > A A A A A getAArch64Cmp().
>> > > A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A Regards,
>> > > A A A A > A A A A A > A A A A A Sergey
>> > > A A A A > A A A A A > A A A A A On Wed, Jul 23, 2014 at
>> 10:05:37PM
>> > -0700,
>> > > A A A A Jiangning Liu wrote:
>> > > A A A A > A A A A A > A A A A A > A A A Hi Sergey,
>> > > A A A A > A A A A A > A A A A A > A A A 1) I expect the fix
>> should
>> > be insideA
>> > > A A A A getAArch64Cmp.
>> > > A A A A > A A A A A > A A A A A > A A A 2) Your comment is
>> > inconsistent with
>> > > A A A A your code. Your
>> > > A A A A > A A A A A code is to
>> > > A A A A > A A A A A > A A A A A transform
>> > > A A A A > A A A A A > A A A A A > A A A (x < 1) to be (x<=0),
>> > rather than "Turn
>> > > A A A A "x > 1"
>> > > A A A A > A A A A A condition into "x
>> > > A A A A > A A A A A > A A A A A >= 0"".
>> > > A A A A > A A A A A > A A A A A > A A A I also noticed we
>> have the
>> > following
>> > > A A A A transformation
>> > > A A A A > A A A A A for if
>> > > A A A A > A A A A A > A A A A A condition (x <
>> > > A A A A > A A A A A > A A A A A > A A A 0) in back-end,
>> > > A A A A > A A A A A > A A A A A > A A A stage 1: (x < 0) ->
>> (x >=
>> > 0), i.e. (x<0)
>> > > A A A A and invert
>> > > A A A A > A A A A A the targets.
>> > > A A A A > A A A A A > A A A A A > A A A stage 2: (x >= 0) ->
>> (x >
>> > -1). This
>> > > A A A A happens in
>> > > A A A A > A A A A A combine1.
>> > > A A A A > A A A A A > A A A A A > A A A stage 3: (x > -1) ->
>> (x >=
>> > 0) in
>> > > A A A A getAArch64Cmp.
>> > > A A A A > A A A A A > A A A A A > A A A For if condition (x >
>> 0),
>> > the
>> > > A A A A transformation is
>> > > A A A A > A A A A A similar. Your
>> > > A A A A > A A A A A > A A A A A patch is
>> > > A A A A > A A A A A > A A A A A > A A A trying to cover this
>> case,
>> > but
>> > > A A A A essentially
>> > > A A A A > A A A A A getAArch64Cmp missed
>> > > A A A A > A A A A A > A A A A A the case
>> > > A A A A > A A A A A > A A A A A > A A A of (x < 1) -> (x <=
>> 0).
>> > > A A A A > A A A A A > A A A A A > A A A However, as you can
>> see the
>> > root cause
>> > > A A A A of generating
>> > > A A A A > A A A A A the
>> > > A A A A > A A A A A > A A A A A comparison with
>> > > A A A A > A A A A A > A A A A A > A A A constant 1 is stage
>> 1. This
>> > happens
>> > > A A A A > A A A A A > A A A A A > A A A insideA
>> > > A A A A SelectionDAGBuilder::visitSwitchCase
>> > > A A A A > A A A A A > A A A A A > A A A A A // If the lhs
>> block is
>> > the next
>> > > A A A A block, invert the
>> > > A A A A > A A A A A condition
>> > > A A A A > A A A A A > A A A A A so that we
>> > > A A A A > A A A A A > A A A A A > A A A can
>> > > A A A A > A A A A A > A A A A A > A A A A A // fall through
>> to the
>> > lhs instead
>> > > A A A A of the rhs
>> > > A A A A > A A A A A block.
>> > > A A A A > A A A A A > A A A A A > A A A A A if (CB.TrueBB ==
>> > NextBlock) {
>> > > A A A A > A A A A A > A A A A A > A A A A A A A
>> > std::swap(CB.TrueBB,
>> > > A A A A CB.FalseBB);
>> > > A A A A > A A A A A > A A A A A > A A A A A A A SDValue True =
>> > > A A A A DAG.getConstant(1,
>> > > A A A A > A A A A A Cond.getValueType());
>> > > A A A A > A A A A A > A A A A A > A A A A A A A Cond =
>> > DAG.getNode(ISD::XOR, dl,
>> > > A A A A > A A A A A Cond.getValueType(),
>> > > A A A A > A A A A A > A A A A A Cond, True);
>> > > A A A A > A A A A A > A A A A A > A A A A A }
>> > > A A A A > A A A A A > A A A A A > A A A So now I'm wondering
>> how to
>> > justify this
>> > > A A A A is always
>> > > A A A A > A A A A A meaningful for
>> > > A A A A > A A A A A > A A A A A AArch64?
>> > > A A A A > A A A A A > A A A A A > A A A Thanks,
>> > > A A A A > A A A A A > A A A A A > A A A -Jiangning
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A 2014-07-23 23:54
>> GMT+08:00
>> > Sergey
>> > > A A A A Dmitrouk
>> > > A A A A > A A A A A > A A A A A <sdmitrouk at accesssoftek.com>:
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A Hi,
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A Basing on the
>> following
>> > information
>> > > A A A A from [this
>> > > A A A A > A A A A A post][0] by
>> > > A A A A > A A A A A > A A A A A James Molloy:
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A A A 2. "if (a < 0
>> && b
>> > == c || a > 0
>> > > A A A A && b == d)" -
>> > > A A A A > A A A A A the first
>> > > A A A A > A A A A A > A A A A A comparison
>> > > A A A A > A A A A A > A A A A A > A A A A A of
>> > > A A A A > A A A A A > A A A A A > A A A A A A A 'a' against
>> zero is
>> > done twice,
>> > > A A A A when the flag
>> > > A A A A > A A A A A results of
>> > > A A A A > A A A A A > A A A A A the first
>> > > A A A A > A A A A A > A A A A A > A A A A A A A comparison
>> could be
>> > used for the
>> > > A A A A second
>> > > A A A A > A A A A A comparison.
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A I've made a patch
>> > (attached) that
>> > > A A A A removes this
>> > > A A A A > A A A A A extra
>> > > A A A A > A A A A A > A A A A A comparison. A More
>> > > A A A A > A A A A A > A A A A A > A A A A A complex cases like
>> > comparisons with
>> > > A A A A non-zero
>> > > A A A A > A A A A A immediate values
>> > > A A A A > A A A A A > A A A A A or with
>> > > A A A A > A A A A A > A A A A A > A A A A A registers doesn't
>> seem
>> > to be task
>> > > A A A A for a code
>> > > A A A A > A A A A A generator. A
>> > > A A A A > A A A A A > A A A A A Comparing with
>> > > A A A A > A A A A A > A A A A A > A A A A A zero is quite
>> common,
>> > so I seems to
>> > > A A A A be worth
>> > > A A A A > A A A A A adding.
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A Please review the
>> > patch. A Couldn't
>> > > A A A A find a better
>> > > A A A A > A A A A A place to
>> > > A A A A > A A A A A > A A A A A make the
>> > > A A A A > A A A A A > A A A A A > A A A A A change, but I'll
>> be
>> > happy to adjust
>> > > A A A A the patch if
>> > > A A A A > A A A A A anyone has
>> > > A A A A > A A A A A > A A A A A better
>> > > A A A A > A A A A A > A A A A A > A A A A A ideas.
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A Best regards,
>> > > A A A A > A A A A A > A A A A A > A A A A A Sergey
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A 0:
>> > > A A A A > A A A A A > A A A A A
>> > > A A A
>> > A http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
>> > > A A A A > A A A A A > A A A A A >
>> > > A A A A > A A A A A > A A A A A > A A A A A
>> > > A A A A _______________________________________________
>> > > A A A A > A A A A A > A A A A A > A A A A A llvm-commits
>> mailing
>> > list
>> > > A A A A > A A A A A > A A A A A > A A A A A
>> > llvm-commits at cs.uiuc.edu
>> > > A A A A > A A A A A > A A A A A > A A A A A
>> > > A A A A > A A A A A
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140801/ed7f7344/attachment.html>
More information about the llvm-commits
mailing list