[llvm] r195496 - X86: Perform integer comparisons at i32 or larger.
chandlerc at google.com
Mon Nov 25 23:29:06 PST 2013
On Mon, Nov 25, 2013 at 10:54 PM, Sean Silva <silvas at purdue.edu> wrote:
>> However, regardless of partial register stalls, the existence of false
>> dependencies seems just as problematic...
> I don't think we even emit code that tries to independently use subregs,
> so I'm not seeing where a subreg false dependency would arise. Using the
> subregs in a way that would result in a false dependency (e.g. `mov ah,
> [rdx]; add al, [rdi]; mov [rsi], ax`) seems like a thoroughly out-of-date
> practice. Maybe if we are in a loop with a bazillion i8 live variables such
> that the "high" i8 subregs are needed to avoid spilling, but that doesn't
> seem likely to happen in practice.
I think you're imagining a more subtle problem than the one in evidence.
When you manipulate a register feeding into 'cmpb ..., %al' you're creating
a false (as in, it won't be used) dependence on the high 24 bits. Masking
in some way (typically by using zero-extended registers) makes it clearer
that the processor can skip those high bits: no downstream instruction
can end up using them, and if the 'cmp' gets microcoded down to something
like a 'sub', things like overflow needn't be handled, etc.
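
To make the "high 24 bits" point concrete, here is a minimal C sketch. The
function names are made up for illustration, and it only models the
arithmetic of the two forms; it says nothing about what any particular CPU
does internally or which form is faster. It shows that a byte compare and a
zero-extend-then-compare-at-32-bits give the same answer no matter what sits
in the upper bits of the register:

```c
#include <stdint.h>

/* Hypothetical model of an i8 compare such as 'cmpb': only the low 8 bits
   of each 32-bit "register" take part; the high 24 bits are never used. */
static int cmp_eq_i8(uint32_t reg_a, uint32_t reg_b) {
    return (uint8_t)reg_a == (uint8_t)reg_b;
}

/* Hypothetical model of zero extension (in the spirit of 'movzbl'):
   the high 24 bits are explicitly cleared. */
static uint32_t zext_i8(uint32_t reg) {
    return reg & 0xFFu;
}

/* Hypothetical model of the widened form: zero-extend both operands,
   then compare all 32 bits (in the spirit of 'cmpl'). */
static int cmp_eq_i32_zext(uint32_t reg_a, uint32_t reg_b) {
    return zext_i8(reg_a) == zext_i8(reg_b);
}
```

In the second form the inputs to the compare are fully defined 32-bit
values, so nothing downstream can depend on stale upper bits.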
But unless you can directly speak to the micro-architectural features which
we should be optimizing for here, I would either trust the benchmarks
cited, or perform your own... The two of us speculating back and forth
about what may or may not be the underlying reason why one instruction
stream is faster than another seems unlikely to make progress. However,
careful benchmarks or timings of specific patterns that are introduced or
removed which explain the performance swings with more precision than what
Jim or I have done thus far *would* make progress.