[PATCH] D142084: [RFC][X86][MemFold] Upgrade the mechanism of auto-generated Memory Folding Table
Bing Yu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 15 20:18:00 PST 2023
yubing added inline comments.
================
Comment at: llvm/lib/Target/X86/X86MemFoldTables.inc:4009
{X86::VMINSSZrr_Intkz, X86::VMINSSZrm_Intkz, TB_NO_REVERSE},
- {X86::VMOVAPDZ128rrk, X86::VMOVAPDZ128rmk, TB_NO_REVERSE|TB_ALIGN_16},
- {X86::VMOVAPDZ256rrk, X86::VMOVAPDZ256rmk, TB_NO_REVERSE|TB_ALIGN_32},
- {X86::VMOVAPDZrrk, X86::VMOVAPDZrmk, TB_NO_REVERSE|TB_ALIGN_64},
- {X86::VMOVAPSZ128rrk, X86::VMOVAPSZ128rmk, TB_NO_REVERSE|TB_ALIGN_16},
- {X86::VMOVAPSZ256rrk, X86::VMOVAPSZ256rmk, TB_NO_REVERSE|TB_ALIGN_32},
- {X86::VMOVAPSZrrk, X86::VMOVAPSZrmk, TB_NO_REVERSE|TB_ALIGN_64},
+ {X86::VMOVAPDZ128rrk, X86::VMOVAPDZ128rmk, TB_ALIGN_16},
+ {X86::VMOVAPDZ256rrk, X86::VMOVAPDZ256rmk, TB_ALIGN_32},
----------------
craig.topper wrote:
> yubing wrote:
> > craig.topper wrote:
> > > Would this allow unfolding a VMOVAPDZ128rmk to VMOVAPDZ128rrk + VMOVAPDZ128rm? That would be a bug.
> > @craig.topper
> > https://discourse.llvm.org/t/auto-generate-the-memory-folding-tables/61100/19
> > i remember we always fold whole load intrinsics into masked arithmetic. does it happen to masked movereg instruction as well?(only wholeload+ movereg => mask.load)
> >
> > besides, do we need to add TB_NO_REVERSE for {X86::VMOVAPDZ128rr, X86::VMOVAPDZ128rm, TB_ALIGN_16}?
> What I said previously applied to arithmetic that has loads folded into them.
>
> VMOVAPDZ128rmk is created from a masked load with an isel pattern.
>
>
> VMOVAPDZ128rm will never be unfolded. We unfold to separate a loop invariant load from arithmetic. There’s no arithmetic folded in to it.. the only operands are address operands. They would all need to be loop invariant to hoist so the whole instruction is hoistable.
>
> For VMOVAPDZ128rmk there is a mask operand and address operands. Without TB_NO_REVERSE we would try to unfold it if the address was loop invariant but the mask wasn’t. This would be incorrect. The mask may be masking off elements that will fault.
thanks for explanation. i also check the code in llvm/lib/CodeGen/MachineLICM.cpp
```
// First check whether we should hoist this instruction.
if (!IsLoopInvariantInst(*MI) || !IsProfitableToHoist(*MI)) {
// If not, try unfolding a hoistable load.
MI = ExtractHoistableLoad(MI);
if (!MI) return false;
}
```
since VMOVAPDZ128rmk((the address isloop invariant but the mask is not)) is not IsLoopInvariantInst, we can extract a wholeload from it if it is Without TB_NO_REVERSE. so what you said about VMOVAPDZ128rmk and VMOVAPDZ128rm makes sense to me.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142084/new/
https://reviews.llvm.org/D142084
More information about the llvm-commits
mailing list