[llvm] r195496 - X86: Perform integer comparisons at i32 or larger.
silvas at purdue.edu
Tue Nov 26 09:01:58 PST 2013
On Tue, Nov 26, 2013 at 2:29 AM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Nov 25, 2013 at 10:54 PM, Sean Silva <silvas at purdue.edu> wrote:
>>> However, regardless of partial register stalls, the existence of false
>>> dependencies seems just as problematic...
>> I don't think we even emit code that tries to independently use subregs,
>> so I'm not seeing where a subreg false dependency would arise. Using the
>> subregs in a way that would result in a false dependency (e.g. `mov ah,
>> [rdx]; add al, [rdi]; mov [rsi], ax`) seems like a thoroughly out-of-date
>> practice. Maybe if we are in a loop with a bazillion i8 live variables such
>> that the "high" i8 subregs are needed to avoid spilling, but that doesn't
>> seem likely to happen in practice.
> I think you're imagining a more subtle problem than the one in evidence.
> When you manipulate a register feeding into 'cmpb ..., %al' you're creating
> a false (as in, it won't be used) dependence on the high 24 bits. Masking
> in some way (typically by using zero-extended registers) makes it more clear
> that the processor can skip those high bits: no downstream instruction
> can end up using them, and if the 'cmp' gets microcoded down to something like
> a 'sub', things like overflow needn't be handled, etc.
Ah, I see. So would it be accurate to say that this patch is more about
inserting the proper dependency clearing than about doing comparisons at
i32 width? (i.e. once the proper dependency clearing has been done, the
comparison might as well be done at i32 width)
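To make the question concrete, here is a minimal C sketch of the two source-level shapes being discussed. It is illustrative only and not taken from the patch: the function names are hypothetical, and which instructions a compiler actually emits (e.g. `cmpb` versus `movzbl` followed by `cmpl`) depends on the compiler version and flags. The only thing it demonstrates is that, for unsigned 8-bit values, zero-extending first and comparing at 32 bits gives the same answer as comparing the bytes directly.

```c
#include <stdint.h>

/* Byte-typed comparison. C's integer promotions already widen the operands
 * to int, but a backend is free to narrow this back to an 8-bit compare. */
int eq_narrow(uint8_t a, uint8_t b) { return a == b; }
int lt_narrow(uint8_t a, uint8_t b) { return a < b; }

/* Explicit zero-extension to 32 bits before comparing. The high 24 bits of
 * wa and wb are known to be zero, so nothing depends on stale upper bits. */
int eq_widened(uint8_t a, uint8_t b) {
    uint32_t wa = a; /* zero-extend */
    uint32_t wb = b; /* zero-extend */
    return wa == wb;
}

int lt_widened(uint8_t a, uint8_t b) {
    uint32_t wa = a; /* zero-extend */
    uint32_t wb = b; /* zero-extend */
    return wa < wb;  /* unsigned compare, same ordering as the byte compare */
}
```

For signed bytes the widening would have to be a sign-extension instead, which this sketch does not cover.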
> But unless you can directly speak to the micro-architectural features
> which we should be optimizing for here, I would either trust the benchmarks
> cited, or perform your own... The two of us speculating back and forth
> about what may or may not be the underlying reason why one instruction
> stream is faster than another seems unlikely to make progress. However,
> careful benchmarks or timings of specific patterns that are introduced or
> removed which explain the performance swings with more precision than what
> Jim or I have done thus far *would* make progress.
I trust the benchmarks. At this point I'm just trying to understand the
-- Sean Silva