[PATCH] D142084: [RFC][X86][MemFold] Upgrade the mechanism of auto-generated Memory Folding Table

Wed Feb 15 20:18:00 PST 2023

yubing added inline comments.

================
Comment at: llvm/lib/Target/X86/X86MemFoldTables.inc:4009
   {X86::VMINSSZrr_Intkz, X86::VMINSSZrm_Intkz, TB_NO_REVERSE},
-  {X86::VMOVAPDZ128rrk, X86::VMOVAPDZ128rmk, TB_NO_REVERSE|TB_ALIGN_16},
-  {X86::VMOVAPDZ256rrk, X86::VMOVAPDZ256rmk, TB_NO_REVERSE|TB_ALIGN_32},
-  {X86::VMOVAPDZrrk, X86::VMOVAPDZrmk, TB_NO_REVERSE|TB_ALIGN_64},
-  {X86::VMOVAPSZ128rrk, X86::VMOVAPSZ128rmk, TB_NO_REVERSE|TB_ALIGN_16},
-  {X86::VMOVAPSZ256rrk, X86::VMOVAPSZ256rmk, TB_NO_REVERSE|TB_ALIGN_32},
-  {X86::VMOVAPSZrrk, X86::VMOVAPSZrmk, TB_NO_REVERSE|TB_ALIGN_64},
+  {X86::VMOVAPDZ128rrk, X86::VMOVAPDZ128rmk, TB_ALIGN_16},
+  {X86::VMOVAPDZ256rrk, X86::VMOVAPDZ256rmk, TB_ALIGN_32},
----------------
craig.topper wrote:
> yubing wrote:
> > craig.topper wrote:
> > > Would this allow unfolding a VMOVAPDZ128rmk to VMOVAPDZ128rrk + VMOVAPDZ128rm? That would be a bug.
> > @craig.topper 
> > https://discourse.llvm.org/t/auto-generate-the-memory-folding-tables/61100/19
> > i remember we always fold whole load intrinsics into masked arithmetic. does it happen to masked movereg instruction as well?(only wholeload+ movereg => mask.load)
> > 
> > besides, do we need to add TB_NO_REVERSE for {X86::VMOVAPDZ128rr, X86::VMOVAPDZ128rm, TB_ALIGN_16}?
> What I said previously applied to arithmetic that has loads folded into them.
> 
> VMOVAPDZ128rmk is created from a masked load with an isel pattern.
> 
> 
> VMOVAPDZ128rm will never be unfolded. We unfold to separate a loop invariant load from arithmetic. There’s no arithmetic folded in to it.. the only operands are address operands. They would all need to be loop invariant to hoist so the whole instruction is hoistable.
> 
> For VMOVAPDZ128rmk there is a mask operand and address operands. Without TB_NO_REVERSE we would try to unfold it if the address was loop invariant but the mask wasn’t. This would be incorrect. The mask may be masking off elements that will fault.
thanks for explanation. i also check the code in llvm/lib/CodeGen/MachineLICM.cpp

```
  // First check whether we should hoist this instruction.
  if (!IsLoopInvariantInst(*MI) || !IsProfitableToHoist(*MI)) {
    // If not, try unfolding a hoistable load.
    MI = ExtractHoistableLoad(MI);
    if (!MI) return false;
  }
```
since VMOVAPDZ128rmk((the address isloop invariant but the mask is not)) is not IsLoopInvariantInst, we can extract a wholeload from it if it is Without TB_NO_REVERSE. so what you said about VMOVAPDZ128rmk and VMOVAPDZ128rm  makes sense to me.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142084/new/

https://reviews.llvm.org/D142084