[PATCH] D150526: [X86] Add X86FixupVectorConstantsPass to re-fold AVX512 vector load folds as broadcast folds

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun May 14 09:19:46 PDT 2023


RKSimon added inline comments.


================
Comment at: llvm/lib/Target/X86/X86FixupVectorConstants.cpp:38
+  StringRef getPassName() const override {
+    return "X86 Fixup Vector Constants";
+  }
----------------
goldstein.w.n wrote:
> Since this is AVX512 only, maybe it should be "X86 Fixup Vector AVX512 Constants"? Likewise for the filename.
No, as I said in the summary, the intention is to expand this to handle broadcast/extensions in future patches.


================
Comment at: llvm/lib/Target/X86/X86FixupVectorConstants.cpp:265
+  switch (Opc) {
+  // Foldable AVX512 FP Logic Ops
+  // Attempt to use smaller width broadcast-folds.
----------------
goldstein.w.n wrote:
> Is it only for logic ops? Why can't we do something like `add`, `sub`, etc... iff we can `rebuildSplatableConstant` with the correct `BitWidth`?
The broadcast width matches the element width, we can't replace a VPADDD with a VPADDQ - but bitwise ops don't care (as long as we're not using the predicate mask which we aren't for these particular cases).


================
Comment at: llvm/lib/Target/X86/X86FixupVectorConstants.cpp:267
+  // Attempt to use smaller width broadcast-folds.
+  case X86::VANDPSZ128rm:
+  case X86::VANDPDZ128rm:
----------------
goldstein.w.n wrote:
> Can we convert `rm` ops that aren't in `evex` encoding? Or since is that dangerous in some way?
This pass is before the X86EvexToVex pass so we shouldn't encounter such cases.


================
Comment at: llvm/lib/Target/X86/X86FixupVectorConstants.cpp:370
+  case X86::VPTERNLOGQZrmi:
+    return ConvertToBroadcast(0, 0, X86::VPTERNLOGQZrmbi, X86::VPTERNLOGDZrmbi,
+                              0, 0, 3);
----------------
goldstein.w.n wrote:
> Why no attempt for `OpBcst256/128/16/8` on any of these? If they are truly unused can we remove this to kill the deadcode?
There are no OpBcst256/128/16/8 equivalent VPTERNLOG ops - we'd have to pull out the broadcast into a separate op, which isn't what we're after.

I can remove the deadcode if you think it important, but it will be readded in a follow up patch very soon.


================
Comment at: llvm/lib/Target/X86/X86FixupVectorConstants.cpp:372
+                              0, 0, 3);
+  }
+
----------------
goldstein.w.n wrote:
> do all the `OpBcst\d+` opcodes not always match? Seems to be the same for all of them. If so I think it would be alot simpler to try all `OpBcst` sizes < Native_BitWidth. You could have an overload if you have want to support only checking a specific width later.
I had wondered about trying to handle this inside lookupBroadcastFoldTable but I wasn't sure if it'd get confusing to have a memop for one element width returning a bcstop for another element width.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150526/new/

https://reviews.llvm.org/D150526



More information about the llvm-commits mailing list