[llvm] r219022 - [x86] Adjust the patterns for lowering X86vzmovl nodes which don't

Thu Apr 2 16:44:04 PDT 2015

On Thu, Apr 2, 2015 at 4:21 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> On Thu, Apr 2, 2015 at 10:15 AM Sanjay Patel <spatel at rotateright.com>
> wrote:
>
>> Hi Chandler,
>>
>> This change adds code size (1-2 bytes per blend instruction) and doesn't
>> improve performance for chips other than Sandy Bridge and Haswell AFAICT,
>> but it was enabled for all conditions and targets.
>>
> I'm really suspicious about how much sense it makes to micro-optimize for
> code size in this way. It will be a lot of complexity for very little gain
> IMO.
>

Turn that around to see it from a non-Haswell perspective: your patch added a
lot of complexity for a perf micro-optimization that applies to exactly 2
micro-architectures while costing the rest of the world a couple of bytes
[1] and providing no gain in throughput. I view SB and Haswell as the
anomalies here. I don't have any special knowledge, but I wonder if Intel's
follow-ons will have the movs* handicap...because there's really no excuse
from the hardware side to have that limitation. FWIW, icc 15 doesn't appear
to do this blend optimization when targeting SB or Haswell.

[1] Please also consider that the target systems that you care about may
have more *cache* than I have available system memory to play with. That
kind of limit will change your world view. Space is time after all.