[PATCH] D146737: [AMDGPU] Default component broadcast store

Fri Mar 24 07:56:29 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:354
+// zeros at the end of the passed vector.
+static APInt findDemandedElts(Value *V, unsigned DMaskVal = 15) {
+  unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();
----------------
Can you use possiblyDemandedEltsInMask or something else from vector utils? This doesn't feel like something you would need to invent yourself

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:356
+  unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();
+  Instruction *CurInst = dyn_cast<Instruction>(V);
+  Value *FirstArg = CurInst;
----------------
This is an unchecked dyn_cast. You already did a dyn_cast at the call site, so just pass in Instruction to begin with?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:359
+
+  std::vector<int> Components(VWidth, -1);
+  std::map<Value *, int> MapValueInt;
----------------
SmallVector

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:360
+  std::vector<int> Components(VWidth, -1);
+  std::map<Value *, int> MapValueInt;
+  int LastElem = 0;
----------------
Don't use std::map. Also, can this just be a SmallVector in the reverse direction per component?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:380
+    }
+    if (CurInst->getOpcode() == Instruction::InsertElement) {
+      InsertElementInst *IEI = dyn_cast<InsertElementInst>(CurInst);
----------------
Replace the opcode check with the dyn_cast

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:395-396
+      FirstArg = CurInst->getOperand(0);
+    } else if (CurInst->getOpcode() == Instruction::ShuffleVector) {
+      ShuffleVectorInst *Shuffle = dyn_cast<ShuffleVectorInst>(CurInst);
+
----------------
Replace the opcode check with the dyn_cast

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:1177
+    // that means that the intrinsic has already been optimized.
+    if (4 - APInt(4, DMaskVal).countLeadingZeros() > VWidth)
+      break;
----------------
Don't bother using APInt here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146737/new/

https://reviews.llvm.org/D146737