[PATCH] PR 23155 - Improvement to X86 16 bit operation promotion for better performance.

Tue May 12 14:55:57 PDT 2015

I'm not trying to insist on a separate pass vs. embedding this somewhere
else. Don't we already have stall and hazard fixing passes?

Other than the specific place, I agree with the overall approach.

I also want to push for, wherever possible, choosing an encoding which
will be good across a wide range of processors. While it is nice when we
can build binaries for the *exact* right microarchitecture, we should
minimize the degree to which this changes code generation. If for no other
reason, per-microarchitecture variation makes things really brittle and
hard to predict: for example, we see random and inexplicable performance
swings due to irrelevant variations in which loops are sitting on which
cache lines. Consistency of encoding and emission is really useful for
making the performance of programs more predictable over time.
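
To make the rewrite under discussion concrete (the movb/movw to
movzbl/movzwl replacement quoted further down), here is a minimal sketch
of the legality check such a pass might perform. It is a toy model only:
Opcode, ToyInst, and widenIfSafe are hypothetical names invented for
illustration, not LLVM's MachineInstr API, and the liveness of the upper
bits is assumed to be supplied by some earlier analysis.

```cpp
#include <cassert>

// Hypothetical, simplified opcodes; these are NOT LLVM's real X86 opcodes.
enum class Opcode { MovByte, MovWord, MovzxByte, MovzxWord, Other };

// Hypothetical toy instruction with just the facts the check needs.
struct ToyInst {
  Opcode Op;
  bool DestIsRegister;     // destination is a register, not memory
  bool UpperBitsLiveAfter; // a later instruction reads the destination's
                           // upper bits, which a narrow move leaves intact
};

// A narrow move writes only the low 8 or 16 bits of its destination.
bool isNarrowMove(Opcode Op) {
  return Op == Opcode::MovByte || Op == Opcode::MovWord;
}

// Map a narrow move to its zero-extending 32 bit counterpart.
Opcode widenedOpcode(Opcode Op) {
  assert(isNarrowMove(Op) && "only narrow moves have a widened form");
  return Op == Opcode::MovByte ? Opcode::MovzxByte : Opcode::MovzxWord;
}

// Rewrite the instruction in place when that cannot change behavior:
// the destination must be a register and nothing may read the upper bits
// afterwards, because the widened form zeroes them instead of merging.
// Returns true if the instruction was changed.
bool widenIfSafe(ToyInst &I) {
  if (!isNarrowMove(I.Op) || !I.DestIsRegister || I.UpperBitsLiveAfter)
    return false;
  I.Op = widenedOpcode(I.Op);
  return true;
}
```

In this sketch a byte move into a register whose upper bits are dead
becomes the zero-extending form, while a move whose upper bits are still
read later, or whose destination is memory, is left alone. A real pass
would also have to weigh profitability, which this toy model ignores.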

On Tue, May 12, 2015 at 12:16 PM Smith, Kevin B <kevin.b.smith at intel.com>
wrote:

> Thanks for the support, Chandler.  I am starting to work on this.
>
> My initial thoughts are:
>
> 1 - A very late pass over the MachineInstrs that would be inserted as
> part of X86PassConfig::addPreEmitPass.
>
> 2 - Initially look for 8 bit and 16 bit operations that would be better
> expanded into 32 bit operations.
>       - There could be several different reasons to do this:
>          a - Specifically for the case in PR23155, where a false
>              dependence potentially slows execution.
>          b - Just in general, for cases where partial registers may cost
>              something (Intel X86 prior to Haswell).
>          c - Cases where code size could be saved by using an equivalent
>              32 bit instruction, such as 16 bit instructions that would
>              encode shorter as 32 bit. We want to do this very late to
>              allow for folding memory operations into the 16 and 8 bit
>              operations, rather than relying on heuristics to predict
>              that folding.
>
> If you have any comments or disagreements with that direction please let
> me know.
>
> Kevin B. Smith
>
> -----Original Message-----
> From: Chandler Carruth [mailto:chandlerc at gmail.com]
> Sent: Tuesday, May 12, 2015 10:49 AM
> To: Smith, Kevin B; qcolombet at apple.com; chisophugis at gmail.com;
> llvm-dev at redking.me.uk; Demikhovsky, Elena; spatel at rotateright.com
> Cc: Kuperstein, Michael M; ahmed.bougacha at gmail.com;
> llvm-commits at cs.uiuc.edu
> Subject: Re: [PATCH] PR 23155 - Improvement to X86 16 bit operation
> promotion for better performance.
>
> In http://reviews.llvm.org/D9209#162887, @kbsmith1 wrote:
>
> > From Chandler's comments in 22473:
> >  We need to add a pass that replaces movb (and movw) with movzbl (and
> > movzwl) when the destination is a register and the high bytes aren't
> > used. Then we need to benchmark bzip2 to ensure that this recovers all
> > of the performance that forcing the use of cmpl did, and probably do
> > some other sanity benchmarking. Then we can swap the cmpl formation for
> > the movzbl formation.
> >
> > I am in agreement that this would be a good solution.  If you,
> > Chandler, and Eric all like that direction, I am willing to work on
> > that.  I also have access to the SPEC benchmarks, both 2000 and 2006,
> > so I can benchmark bzip2 specifically, since that is something the
> > community considers important.
>
>
> I would be *very* interested in this, and would love it if you could work
> on it. I suspect you're in a much better position to implement, document,
> and evaluate the results. We really need to kill the 'cmpl' hack that is
> currently used.
>
>
> REPOSITORY
>   rL LLVM
>
> http://reviews.llvm.org/D9209