[llvm] r195496 - X86: Perform integer comparisons at i32 or larger.
silvas at purdue.edu
Mon Nov 25 22:54:56 PST 2013
On Tue, Nov 26, 2013 at 1:26 AM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Nov 25, 2013 at 10:22 PM, Sean Silva <silvas at purdue.edu> wrote:
>> On Mon, Nov 25, 2013 at 4:48 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>> A few select examples I’m seeing are: 256.bzip2 improves by 7%.
>>> 401.bzip2 improves by 4.5%. 300.twolf improves by 3%. 186.crafty improves
>>> by 4%. The details vary, but this is true for both Ivy Bridge and Haswell
>>> in particular.
>> Hmm... on second thought, do these programs use lots of i16's? Agner
>> reports that on Ivy Bridge and Haswell there is no partial register access
>> cost for the i8 low subregs. He doesn't seem to mention anything about
>> 16-bit, so I assume that the partial register stall is still there for the
>> i16 subregs??? I don't have an Ivy Bridge or Haswell to test on
>> unfortunately :(
> The primary performance problem Jim and I both looked at here was with i8
Strange. Do you have an example?
> However, regardless of partial register stalls, the existence of false
> dependencies seems just as problematic...
I don't think we even emit code that tries to independently use subregs, so
I'm not seeing where a subreg false dependency would arise. Using the
subregs in a way that would result in a false dependency (e.g. `mov ah,
[rdx]; add al, [rdi]; mov [rsi], ax`) seems like a thoroughly out-of-date
practice. Maybe if we are in a loop with a bazillion i8 live variables such
that the "high" i8 subregs are needed to avoid spilling, but that doesn't
seem likely to happen in practice.
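
For readers less familiar with the idiom, the inline snippet above can be
modeled in C. This is only a sketch of the architectural semantics, not of
any particular microarchitecture: writing an 8-bit subregister leaves the
remaining bits of the full register unchanged, so the new register value is a
function of the old one, and that is where a dependency on the previous
contents can come from. All function names here are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit register with x86-style 8-bit subregisters. */

/* Models `mov ah, v`: replaces bits 8..15 and preserves every other bit,
   so the result depends on the previous value of rax. */
uint64_t write_ah(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF00)) | ((uint64_t)v << 8);
}

/* Models `add al, v`: an 8-bit add that wraps within bits 0..7.
   No carry propagates into ah; all other bits are preserved. */
uint64_t add_al(uint64_t rax, uint8_t v) {
    uint8_t al = (uint8_t)(rax & 0xFF);
    al = (uint8_t)(al + v);
    return (rax & ~UINT64_C(0xFF)) | al;
}

/* Models reading ax: the low 16 bits, i.e. the merged result of the
   two subregister writes above. */
uint16_t read_ax(uint64_t rax) {
    return (uint16_t)(rax & 0xFFFF);
}

/* Models the whole sequence: mov ah, [rdx]; add al, [rdi]; mov [rsi], ax.
   mem_rdx and mem_rdi stand in for the bytes loaded from memory. */
uint16_t snippet(uint64_t rax, uint8_t mem_rdx, uint8_t mem_rdi) {
    rax = write_ah(rax, mem_rdx);
    rax = add_al(rax, mem_rdi);
    return read_ax(rax);
}

/* Sanity check of the model: the high byte comes from the first load,
   the low byte from the wrapped 8-bit add. */
void check_model(void) {
    assert(snippet(0x1234, 0xAB, 0x01) == 0xAB35);
}
```

Note that `write_ah` and `add_al` both take the old `rax` as an input. Whether
that turns into a stall or a merge uop depends on how a given core renames
partial registers, which this model deliberately does not try to capture.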
-- Sean Silva