[PATCH] D141653: [X86] Improve instruction ordering of constant `srl/shl` with `and` to get better and-masks

Sun Jan 15 19:05:22 PST 2023

goldstein.w.n added inline comments.

================
Comment at: llvm/test/CodeGen/X86/btc_bts_btr.ll:988
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    andb $7, %cl
 ; X86-NEXT:    shlb $2, %cl
----------------
goldstein.w.n wrote:
> goldstein.w.n wrote:
> > pengfei wrote:
> > > One more `andb`
> > > One more `andb`
> > 
> > Sorry, didn't see earlier. Think this breaks a few combine patterns that probably relied on semi-canonicalizarion of (and (shift x, y), z) instead of the other way around.
> > 
> > Fixing the cases caught here is doable, but is it possible to maybe move this to later in the process? Currently it's guarded behind AfterLegalize (for the exact reason of not breaking other patterns), but is there a higher level or different pass this might be better placed in?
> Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?
> Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?

================
Comment at: llvm/test/CodeGen/X86/btc_bts_btr.ll:988
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    andb $7, %cl
 ; X86-NEXT:    shlb $2, %cl
----------------
craig.topper wrote:
> pengfei wrote:
> > goldstein.w.n wrote:
> > > goldstein.w.n wrote:
> > > > goldstein.w.n wrote:
> > > > > pengfei wrote:
> > > > > > One more `andb`
> > > > > > One more `andb`
> > > > > 
> > > > > Sorry, didn't see earlier. Think this breaks a few combine patterns that probably relied on semi-canonicalizarion of (and (shift x, y), z) instead of the other way around.
> > > > > 
> > > > > Fixing the cases caught here is doable, but is it possible to maybe move this to later in the process? Currently it's guarded behind AfterLegalize (for the exact reason of not breaking other patterns), but is there a higher level or different pass this might be better placed in?
> > > > Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?
> > > > Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?
> > > 
> > > 
> > No, I don't. Maybe do it in a peephole pass after ISel?
> We have something similarish implemented during isel in `X86DAGToDAGISel::tryShrinkShlLogicImm`.
> We have something similarish implemented during isel in `X86DAGToDAGISel::tryShrinkShlLogicImm`.

I did in fact try moving it there and it did clean up some of the missed optimizations but added some (more severe) other missed optimizations.

================
Comment at: llvm/test/CodeGen/X86/btc_bts_btr.ll:988
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    andb $7, %cl
 ; X86-NEXT:    shlb $2, %cl
----------------
goldstein.w.n wrote:
> craig.topper wrote:
> > pengfei wrote:
> > > goldstein.w.n wrote:
> > > > goldstein.w.n wrote:
> > > > > goldstein.w.n wrote:
> > > > > > pengfei wrote:
> > > > > > > One more `andb`
> > > > > > > One more `andb`
> > > > > > 
> > > > > > Sorry, didn't see earlier. Think this breaks a few combine patterns that probably relied on semi-canonicalizarion of (and (shift x, y), z) instead of the other way around.
> > > > > > 
> > > > > > Fixing the cases caught here is doable, but is it possible to maybe move this to later in the process? Currently it's guarded behind AfterLegalize (for the exact reason of not breaking other patterns), but is there a higher level or different pass this might be better placed in?
> > > > > Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?
> > > > > Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?
> > > > 
> > > > 
> > > No, I don't. Maybe do it in a peephole pass after ISel?
> > We have something similarish implemented during isel in `X86DAGToDAGISel::tryShrinkShlLogicImm`.
> > We have something similarish implemented during isel in `X86DAGToDAGISel::tryShrinkShlLogicImm`.
> 
> I did in fact try moving it there and it did clean up some of the missed optimizations but added some (more severe) other missed optimizations.
> No, I don't. Maybe do it in a peephole pass after ISel?

Is there an example file / pass you could point to for me to emulate?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141653/new/

https://reviews.llvm.org/D141653