[PATCH] D46662: [X86] condition branches folding for three-way conditional codes

Sun Mar 31 07:09:31 PDT 2019

lebedev.ri added a comment.

In D46662#1442345 <https://reviews.llvm.org/D46662#1442345>, @andreadb wrote:

> In D46662#1293043 <https://reviews.llvm.org/D46662#1293043>, @lebedev.ri wrote:
>
> > In D46662#1246810 <https://reviews.llvm.org/D46662#1246810>, @andreadb wrote:
> >
> > > In D46662#1246781 <https://reviews.llvm.org/D46662#1246781>, @xur wrote:
> > >
> > > > Hi Andrea,
> > > >
> > > > Thanks for running this test, and the explanation. Can you run the tests
> > > >  on Bulldozer/Ryzen? I don't have access to these platforms. If I need to do
> > > >  this in subtarget way, it would be good to know the performance there.
> > >
> > >
> > > CC'ing @lebedev.ri  and @GGanesh.
> > >  They should be able to help you with running those tests on Bulldozer/Ryzen. Unfortunately, I don't have access to those machines.
> >
> >
> > I *think* this should be fine on bdver2, as per https://www.agner.org/optimize/microarchitecture.pdf:
> >
> >   19.15 Branches and loops
> >   The branch prediction mechanism is described on page 34. There is no longer any
> >   restriction on the number of branches per 16 bytes of code that can be predicted efficiently.
> >   The misprediction penalty is quite high because of a long pipeline.
> >
> >
> >
> >
> > In D46662#1246550 <https://reviews.llvm.org/D46662#1246550>, @andreadb wrote:
> >
> > > ...
> > >  Bench: 4evencases.cc
> > >  ...
> > >  Bench: 15evencases.cc
> > >  ...
> > >  I wouldn't be surprised if instead this patch improves the performance of code on other big AMD cores like Bulldozer/ryzen.
> >
> >
> > Are these benchmarks available from somewhere? Can i run them somehow?
>
>
> Sorry Roman,
>  I completely missed that comment.
>
> Those two benchmarks were attached by Xur to this code review.
>  You should be able to see the attachments if you expand the “Show Older Changes” section (there is a link at the top of this review).
>  One of his posts has got 3 attachments. Two of these files are the benchmarks to run.

Aha! Not sure how I did not find those. Thank you!

> I hope it helps.
> 
>> 
>> 
>>> -Andrea
>> 
>> Roman
> 
> 
> 
> In D46662#1293043 <https://reviews.llvm.org/D46662#1293043>, @lebedev.ri wrote:
> 
>> In D46662#1246810 <https://reviews.llvm.org/D46662#1246810>, @andreadb wrote:
>>
>> > In D46662#1246781 <https://reviews.llvm.org/D46662#1246781>, @xur wrote:
>> >
>> > > Hi Andrea,
>> > >
>> > > Thanks for running this test, and the explanation. Can you run the tests
>> > >  on Bulldozer/Ryzen? I don't have access to these platforms. If I need to do
>> > >  this in subtarget way, it would be good to know the performance there.
>> >
>> >
>> > CC'ing @lebedev.ri  and @GGanesh.
>> >  They should be able to help you with running those tests on Bulldozer/Ryzen. Unfortunately, I don't have access to those machines.
>>
>>
>> I *think* this should be fine on bdver2, as per https://www.agner.org/optimize/microarchitecture.pdf:
>>
>>   19.15 Branches and loops
>>   The branch prediction mechanism is described on page 34. There is no longer any
>>   restriction on the number of branches per 16 bytes of code that can be predicted efficiently.
>>   The misprediction penalty is quite high because of a long pipeline.
>>

Measurements (n=25) show that `15 cases` improves (avg: -0.18%, median: -0.35%),
and `4 cases` appears to improve (avg: -0.03%, median: **+**0.07%).
I will submit a patch.
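
For readers who want to reproduce this kind of summary, here is a minimal sketch of how an average and a median relative change could be computed from paired runs. It assumes the percentages above are relative run-time changes between a baseline build and a patched build, which the thread does not state explicitly. The helper names (`relative_change_percent`, `summarize`) and the sample timings are hypothetical placeholders, not the measured data.

```python
from statistics import mean, median


def relative_change_percent(baseline, patched):
    """Per-run relative change in percent; negative means the patched run took less time."""
    if len(baseline) != len(patched):
        raise ValueError("baseline and patched must contain the same number of runs")
    return [(p - b) / b * 100.0 for b, p in zip(baseline, patched)]


def summarize(baseline, patched):
    """Return (average, median) of the per-run relative changes."""
    changes = relative_change_percent(baseline, patched)
    return mean(changes), median(changes)


# Placeholder timings in seconds, invented purely for illustration.
baseline = [10.0, 10.0, 10.0, 10.0, 10.0]
patched = [9.9, 9.9, 10.1, 9.8, 10.0]

avg, med = summarize(baseline, patched)
print(f"avg: {avg:+.2f}%, median: {med:+.2f}%")
```

The average and the median need not agree in sign when the differences are small and a few runs are outliers, which may be why the `4 cases` result is only described as appearing to improve.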

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D46662/new/

https://reviews.llvm.org/D46662