[PATCH] D13757: [x86] promote 'add nsw' to a wider type to allow more combines

Wed Oct 14 16:51:40 PDT 2015

spatel created this revision.
spatel added reviewers: hfinkel, chandlerc, qcolombet, zansari.
spatel added a subscriber: llvm-commits.
Herald added a subscriber: aemerson.

The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

  void foo(int *a, int i) {
    a[i] = a[i+1] + a[i+2];
  }

It seems better to produce this (14 bytes):
  movslq	%esi, %rsi
  movl	0x4(%rdi,%rsi,4), %eax
  addl	0x8(%rdi,%rsi,4), %eax
  movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):
  leal	0x1(%rsi), %eax
  cltq             
  leal	0x2(%rsi), %ecx      
  movslq	%ecx, %rcx     
  movl	(%rdi,%rcx,4), %ecx
  addl	(%rdi,%rax,4), %ecx
  movslq	%esi, %rax       
  movl	%ecx, (%rdi,%rax,4)

But it wasn't clear to me where the fix(es) should go, so I tried several things: CodeGenPrepare, DAGCombiner, X86IselLowering, X86ISelDAGToDAG...and finally back to X86ISelLowering because that had the most effect for the least amount of patch. :)

I think the most basic problem (the first test case in the patch combines constants) could also be fixed in InstCombine, but it gets more complicated after that because we need to consider architecture and micro-architecture. For example, I don't think AArch64 sees any benefit from the more general transform because the ISA solves the sexting in hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

FWIW, I see no perf differences on test-suite with this change running on AMD Jaguar, and I see only very small code size improvements when building clang and the LLVM tools with the patched compiler. It would be great if someone could try this patch on a recent Intel model to see if it makes any difference. We may want to limit this to optimizing for size and/or modify FeatureSlowLEA if this is a bad change for Intel big cores.

http://reviews.llvm.org/D13757

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/add-nsw-sext.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D13757.37416.patch
Type: text/x-patch
Size: 6434 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151014/5d64b0c1/attachment.bin>