[PATCH] Avoid generating SHLD/SHRD for architectures that are known to have poor latency for these instructions.

Nadav Rotem nrotem at apple.com
Tue Nov 19 08:36:56 PST 2013


I checked Agner’s instruction table and it looks like SHLD is *very* efficient on Sandy Bridge. So, let’s commit the patch as is.

Thanks,
Nadav 

On Nov 18, 2013, at 10:37 AM, Katya Romanova <Katya_Romanova at playstation.sony.com> wrote:

> 
>  Hi Nadav,
> 
>  Thanks for looking into this!
> 
>  There were several reasons for adding FeatureSlowSHLD:
> 
>  (1) I don't really know which Intel architectures have very poor latency for shld/shrd. Based on my friend's performance measurements, the Ivy Bridge microarchitecture seems to be a good candidate, but that still needs to be confirmed (that's why I haven't even changed the code for Ivy Bridge). I have a feeling that all other modern Intel processors will fall into this category as well. However, I don't want to change the code purely based on my "feelings". So far, I haven't heard a recommendation from a person who is intimately familiar with Intel's architecture. I'd rather make the change for Intel when I'm 100% sure, or let someone else who cares about the performance of shld/shrd on any of Intel's processors (and who knows what he is doing :)) make this change. After this patch, changing the code to disable this folding for any particular processor will be very easy (just a couple of lines of code). I've put a FIXME comment in the code, mentioning that it might make sense to disable this folding for Intel, so there is a clue in the code.
> 
> 
>  (2) Consistency. There are similar features (e.g. FeatureSlowBTMem) that are enabled for all modern Intel and AMD processors, but these features still exist (I suspect for a reason).
> 
>  (3) Having FeatureSlowSHLD is a more flexible approach. Even assuming that shld/shrd instructions indeed have very high latency on all modern Intel processors, we should still respect "older" processors and make support for new ones easier (what if AMD fixes the shld issue in their next-gen processor?).
> 
>  (4) Someone wrote this folding in the past... I suspect that before writing this code, that person made sure that this folding is beneficial. Of course, that might have happened a while ago and may have been applicable only to the "older" processors.
> 
>  Katya.
> 
> http://llvm-reviews.chandlerc.com/D2177
