[llvm] [SLP] Sort PHIs by ExtractElements when relevant (PR #131229)

Mon Mar 17 10:41:52 PDT 2025

================
@@ -22690,8 +22690,40 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
           if (NodeI1 != NodeI2)
             return NodeI1->getDFSNumIn() < NodeI2->getDFSNumIn();
           InstructionsState S = getSameOpcode({I1, I2}, *TLI);
-          if (S && !S.isAltShuffle())
+          if (S && !S.isAltShuffle()) {
+            const auto *E1 = dyn_cast<ExtractElementInst>(I1);
+            const auto *E2 = dyn_cast<ExtractElementInst>(I2);
+            if (!E1 || !E2)
+              continue;
+
+            // Sort on ExtractElementInsts primarily by vector operands. Prefer
+            // program order of the vector operands.
+            if (E1->getVectorOperand() != E2->getVectorOperand()) {
+              const Instruction *V1 =
+                  dyn_cast<Instruction>(E1->getVectorOperand());
+              const Instruction *V2 =
+                  dyn_cast<Instruction>(E2->getVectorOperand());
+              if (!V1 || !V2)
+                continue;
+              if (V1->getParent() != V2->getParent())
+                continue;
+              return V1->comesBefore(V2);
----------------
jrbyrnes wrote:

This handling is important for my test case, and it does result in better code for that example.

In AMDGPU/extract-ordering.ll, %a_thread_buf0 occurs late in the PHI sequence. If we cannot sort by the vector operand, we do not get the good vectorization, because that PHI will not be vectorized with %a_thread_buf1.
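
As a rough illustration of the grouping effect described above, here is a standalone sketch that does not use LLVM and is not the patch's code. The `Extract` struct, its `VecPos` field, and the PHI names are hypothetical stand-ins: `VecPos` models the program-order position of the vector operand within one block, which the patch obtains via `comesBefore`. The sketch assumes every extract's vector operand is an instruction in the same block, so the cases where the patch bails out with `continue` are not modelled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a PHI whose incoming value is an extractelement.
struct Extract {
  std::string Phi; // name of the PHI fed by this extract (illustrative only)
  int VecPos;      // assumed program-order position of the vector operand
};

// Order extracts by the position of their vector operand, mirroring the
// "prefer program order of the vector operands" idea from the hunk.
inline bool extractLess(const Extract &A, const Extract &B) {
  return A.VecPos < B.VecPos;
}

// Returns PHI names after sorting. stable_sort keeps the original relative
// order of PHIs that read from the same vector.
inline std::vector<std::string> sortedPhiNames(std::vector<Extract> Es) {
  std::stable_sort(Es.begin(), Es.end(), extractLess);
  std::vector<std::string> Names;
  for (const Extract &E : Es)
    Names.push_back(E.Phi);
  return Names;
}
```

With an input where one PHI reading from the first vector appears last in the sequence, the sort moves it next to the other PHI that reads from the same vector, which is the adjacency the vectorizer needs in order to consider them together.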

Why are you concerned that it is not safe? I see similar handling in getReorderingData::PHICompare. Maybe we can find a good alternative solution?

https://github.com/llvm/llvm-project/pull/131229