[PATCH] D146737: [AMDGPU] Default component broadcast store

Fri Mar 24 08:43:39 PDT 2023

matejam added a comment.

Thanks for the review!

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:354
+// zeros at the end of the passed vector.
+static APInt findDemandedElts(Value *V, unsigned DMaskVal = 15) {
+  unsigned VWidth = cast<FixedVectorType>(V->getType())->getNumElements();
----------------
arsenm wrote:
> Can you use possiblyDemandedEltsInMask or something else from vector utils? This doesn't feel like something you would need to invent yourself
I thought about using that, but this is a bit more than that.
It doesn't just find which elements were set, but also takes into consideration the DMask for image intrinsics and also ignores zeros that were inserted at the end of the vector, which is the primary goal of this patch.
There is also findDemandedEltsByAllUsers, but that works in a different direction, meaning that it will find the uses of the given Value * and update DemandedElts that way.

In my case I look at the definitions (insertelement, shufflevector or ConstantVector) of the given Value * and "recursively" go up and update DemandedElts accordingly.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:359
+
+  std::vector<int> Components(VWidth, -1);
+  std::map<Value *, int> MapValueInt;
----------------
arsenm wrote:
> SmallVector
Will be done.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:360
+  std::vector<int> Components(VWidth, -1);
+  std::map<Value *, int> MapValueInt;
+  int LastElem = 0;
----------------
arsenm wrote:
> Don't use std::map. Also, can this just be a SmallVector in the reverse direction per component?
I will look into that.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:380
+    }
+    if (CurInst->getOpcode() == Instruction::InsertElement) {
+      InsertElementInst *IEI = dyn_cast<InsertElementInst>(CurInst);
----------------
arsenm wrote:
> Replace the opcode check with the dyn_cast
Will be done.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:395-396
+      FirstArg = CurInst->getOperand(0);
+    } else if (CurInst->getOpcode() == Instruction::ShuffleVector) {
+      ShuffleVectorInst *Shuffle = dyn_cast<ShuffleVectorInst>(CurInst);
+
----------------
arsenm wrote:
> Replace the opcode check with the dyn_cast
Will be done.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp:1177
+    // that means that the intrinsic has already been optimized.
+    if (4 - APInt(4, DMaskVal).countLeadingZeros() > VWidth)
+      break;
----------------
arsenm wrote:
> Don't bother using APInt here?
I see I can do the same thing a lot more easier.
Like "if (DMaskVal > (1 << VWidth))"

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146737/new/

https://reviews.llvm.org/D146737