[PATCH] PR 23155 - Improvement to X86 16 bit operation promotion for better performance.

Tue May 12 11:12:14 PDT 2015

Thanks for the support, Chandler.  I am starting to work on this.

My initial thoughts are:

1 - A very late pass through the MachineInstrs that would be inserted as part of X86PassConfig::addPreEmitPass.

2 - Initially, look for 8 bit and 16 bit operations that would be better expanded into 32 bit operations.
      - There are several different reasons to do this:
         a - Specifically for the case in PR23155, where a false dependence potentially slows execution.
         b - More generally, for cases where partial registers may cost something (Intel X86 prior to Haswell).
         c - Cases where code size could be saved by using an equivalent 32 bit instruction, such as 16 bit instructions
               that would encode shorter as 32 bit.
      - We want to do this very late so that memory operations can still be folded into the 16 and 8 bit
        operations, rather than relying on heuristics to predict that folding.
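
To make the direction above concrete, here is a rough sketch of the per-instruction decision such a pass might make. It is only a toy model: ToyInst, Action, classify and regRegAluBytes are hypothetical names invented for illustration, not LLVM's MachineInstr API, and the liveness fields stand in for whatever analysis the real pass would use. The widening case also assumes an opcode such as add, whose low result bits do not depend on the upper bits of its inputs (this is not true of right shifts, for example).

```cpp
#include <cassert>
#include <string>

// Hypothetical toy model of one instruction; this is not LLVM's MachineInstr.
struct ToyInst {
    std::string opcode;   // "mov", "add", ...
    unsigned widthBits;   // operand width: 8, 16 or 32
    bool destIsReg;       // destination is a register rather than memory
    bool hasMemSource;    // a load has been folded into the instruction
    bool upperBitsLive;   // upper bits of the full 32 bit register are read later
    bool flagsLive;       // EFLAGS produced by this instruction are read later
};

enum class Action { Keep, ZeroExtendMove, WidenTo32 };

// Decide what a late widening pass could do with one instruction.
Action classify(const ToyInst &I) {
    // Only 8 and 16 bit operations are candidates.
    if (I.widthBits != 8 && I.widthBits != 16)
        return Action::Keep;
    // A narrow store must stay narrow, and a widened write would clobber
    // upper register bits that somebody still reads.
    if (!I.destIsReg || I.upperBitsLive)
        return Action::Keep;
    // movb/movw into a register can become movzbl/movzwl, even from memory,
    // because the zero-extending load still reads the same number of bytes.
    if (I.opcode == "mov")
        return Action::ZeroExtendMove;
    // Assumption: widening arithmetic with a folded load would change the
    // size of the memory access, and a 32 bit op sets EFLAGS differently,
    // so stay conservative in both cases.
    if (I.hasMemSource || I.flagsLive)
        return Action::Keep;
    return Action::WidenTo32;
}

// Encoded size of a register-register ALU op such as add, assuming 32 or
// 64 bit mode, legacy registers, and no REX prefix: one opcode byte plus
// one ModRM byte, plus the 0x66 operand-size prefix for 16 bit operands.
unsigned regRegAluBytes(unsigned widthBits) {
    assert(widthBits == 8 || widthBits == 16 || widthBits == 32);
    const unsigned opcodeAndModRM = 2;
    return widthBits == 16 ? opcodeAndModRM + 1 : opcodeAndModRM;
}
```

For example, under these assumptions a 16 bit register-register add whose upper bits and flags are dead would be classified as WidenTo32 and would shrink from 3 bytes to 2, while the same add with a folded load would be left alone.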

If you have any comments or disagreements with that direction please let me know.

Kevin B. Smith

-----Original Message-----
From: Chandler Carruth [mailto:chandlerc at gmail.com] 
Sent: Tuesday, May 12, 2015 10:49 AM
To: Smith, Kevin B; qcolombet at apple.com; chisophugis at gmail.com; llvm-dev at redking.me.uk; Demikhovsky, Elena; spatel at rotateright.com
Cc: Kuperstein, Michael M; ahmed.bougacha at gmail.com; llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] PR 23155 - Improvement to X86 16 bit operation promotion for better performance.

In http://reviews.llvm.org/D9209#162887, @kbsmith1 wrote:

> From Chandler's comments in 22473:
>  We need to add a pass that replaces movb (and movw) with movzbl (and movzwl) when the destination is a register and the high bytes aren't used. Then we need to benchmark bzip2 to ensure that this recovers all of the performance that forcing the use of cmpl did, and probably some other sanity benchmarking. Then we can swap the cmpl formation for the movzbl formation.
>
> I am in agreement that this would be a good solution.  If you, Chandler, and Eric all like that direction, I will be willing to work on that.  I also have access to SPEC benchmarks, both 2000 and 2006 to be able to benchmark as well for bzip2 specifically since that is something the community considers important.

I would be *very* interested in this, and would love it if you could work on it. I suspect you're in a much better position to implement, document, and evaluate the results. We really need to kill the 'cmpl' hack that is currently used.

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D9209
