[PATCH] D141180: [X86] Only match BMI (BLSR, BLSI, BLSMSK) if the add/sub op is single use

Sat Jan 7 09:38:40 PST 2023

goldstein.w.n added inline comments.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.td:1272

+// An 'ineg' node with a single use.
+def ineg_su : PatFrag<(ops node:$v), (ineg node:$v), [{
----------------
pengfei wrote:
> goldstein.w.n wrote:
> > craig.topper wrote:
> > > Maybe we should have
> > > 
> > > ```
> > > `class binop_oneuse<SDPatternOperator operator>
> > >     : PatFrag<(ops node:$A, node:$B),
> > >               (operator node:$A, node:$B), [{
> > >   return N->hasOneUse();
> > > }]>;`
> > > 
> > > def add_su : binop_oneuse<add>;
> > > def and_su : binop_oneuse<and>;
> > > def srl_su : binop_oneuse<srl>;
> > > 
> > > class unop_oneuse<SDPatternOperator operator>
> > >     : PatFrag<(ops node:$A),
> > >               (operator node:$A), [{
> > >   return N->hasOneUse();
> > > }]>;
> > > 
> > > def ineg_su : unop_oneuse<ineg>;
> > > def trunc_su : unop_oneuse<trunc>;
> > > ```
> > > Maybe we should have
> > > 
> > > ```
> > > `class binop_oneuse<SDPatternOperator operator>
> > >     : PatFrag<(ops node:$A, node:$B),
> > >               (operator node:$A, node:$B), [{
> > >   return N->hasOneUse();
> > > }]>;`
> > > 
> > > def add_su : binop_oneuse<add>;
> > > def and_su : binop_oneuse<and>;
> > > def srl_su : binop_oneuse<srl>;
> > > 
> > > class unop_oneuse<SDPatternOperator operator>
> > >     : PatFrag<(ops node:$A),
> > >               (operator node:$A), [{
> > >   return N->hasOneUse();
> > > }]>;
> > > 
> > > def ineg_su : unop_oneuse<ineg>;
> > > def trunc_su : unop_oneuse<trunc>;
> > > ```
> > 
> > Makes sense, will do for V2.
> > 
> > Any thoughts on:
> > """
> > Note, this may cause more mov instructions to be emitted sometimes
> > because BMI instructions only have 1 src and write-only to dst. A
> > better approach may be to only avoid BMI for (and/xor X, (add/sub
> > 0/-1, X)) if this is the last use of X but NOT the last use of
> > (add/sub 0/-1, X).
> > """
> > ?
> Microarchitecture usually can eliminate mov instructions by register renaming, so we can consider no performance lost except for code size.
> Microarchitecture usually can eliminate mov instructions by register renaming, so we can consider no performance lost except for code size.

IIRC like 80% success rate and some targets like ICX have to disabled. Also still decode and presumably RR overhead. So unless its backend bound work think it will show up.

Note its 1 extra byte for 64-bit, 1 less byte for 32-bit.

I'll leave it as a todo for now I guess.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141180/new/

https://reviews.llvm.org/D141180