[PATCH] D141653: [X86] Improve instruction ordering of constant `srl/shl` with `and` to get better and-masks

Sat Jan 14 23:00:25 PST 2023

pengfei added inline comments.

================
Comment at: llvm/test/CodeGen/X86/bitreverse.ll:538
-; X64-NEXT:    andb $8, %al
-; X64-NEXT:    leal (%rdi,%rdi), %ecx
-; X64-NEXT:    andb $4, %cl
----------------
craig.topper wrote:
> pengfei wrote:
> > craig.topper wrote:
> > > pengfei wrote:
> > > > IIRC, LEA is expensive, so this looks like a good deal.
> > > I thought LEA was only expensive if it uses 3 sources.
> > You are right. The SOM says only the 3-operand LEA is slow, but I found that MCA does not reflect the difference.
> > So this is yet another regression.
> > old: https://godbolt.org/z/dPW93v8zT
> > new: https://godbolt.org/z/hrEovKW4f
> Looks like the scheduler models make the distinction for AMD CPUs but not for Intel.
> 
> Small snippet of the predicate check:
> 
> ```
> X86SchedPredicates.td
> 61:def IsThreeOperandsLEAFn :
> 
> X86ScheduleBdVer2.td
> 560:    IsThreeOperandsLEAFn,
> 
> X86ScheduleBtVer2.td
> 945:    IsThreeOperandsLEAFn,
> 
> X86ScheduleZnver3.td
> 590:    IsThreeOperandsLEAFn,
> ```
Thanks for the information! We may need to do the same thing for Intel. Filed #60043 for it.
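
For readers following along, here is a minimal sketch of what "3-operand LEA" means in this thread: an address that uses a base register, an index register, and a nonzero displacement all at once. This is only an illustration, not the TableGen predicate itself. The `LeaAddress` class and `is_three_operand_lea` function are hypothetical names, the second example instruction is made up, and the sketch ignores the case where the displacement is a symbol rather than an immediate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaAddress:
    """Hypothetical model of an x86 LEA address: disp(base, index, scale)."""
    base: Optional[str] = None   # base register, e.g. "rdi"
    index: Optional[str] = None  # index register, e.g. "rsi"
    scale: int = 1               # 1, 2, 4, or 8
    disp: int = 0                # immediate displacement (assumed numeric here)


def is_three_operand_lea(addr: LeaAddress) -> bool:
    """True when base, index, and a nonzero displacement are all present."""
    return addr.base is not None and addr.index is not None and addr.disp != 0


# leal (%rdi,%rdi), %ecx from the bitreverse.ll hunk: base + index only.
two_source = LeaAddress(base="rdi", index="rdi")

# leal 4(%rdi,%rsi,2), %ecx: a made-up example that uses all three sources.
three_source = LeaAddress(base="rdi", index="rsi", scale=2, disp=4)

print(is_three_operand_lea(two_source))    # False
print(is_three_operand_lea(three_source))  # True
```

Under this reading, the `leal (%rdi,%rdi), %ecx` in the old output is the cheap form, which is why removing it is not the win it first looked like.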

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141653/new/

https://reviews.llvm.org/D141653