[llvm] [LLVM] Add `llvm.masked.compress` intrinsic (PR #92289)

Fri Jun 7 02:41:35 PDT 2024

================
@@ -7502,6 +7504,48 @@ LegalizerHelper::lowerShuffleVector(MachineInstr &MI) {
   return Legalized;
 }
 
+LegalizerHelper::LegalizeResult
+LegalizerHelper::lowerMASKED_COMPRESS(llvm::MachineInstr &MI) {
+  auto [Dst, DstTy, Vec, VecTy, Mask, MaskTy] = MI.getFirst3RegLLTs();
+
+  MachinePointerInfo PtrInfo;
+  Register StackPtr =
+      createStackTemporary(TypeSize::getFixed(VecTy.getSizeInBytes()),
+                           getStackTemporaryAlignment(VecTy), PtrInfo)
+          .getReg(0);
+
+  LLT IdxTy = LLT::scalar(32);
+  LLT ValTy = VecTy.getElementType();
+  Align ValAlign = getStackTemporaryAlignment(ValTy);
+
+  Register OutPos = MIRBuilder.buildConstant(IdxTy, 0).getReg(0);
+
+  unsigned NumElmts = VecTy.getNumElements();
+  for (unsigned I = 0; I < NumElmts; ++I) {
+    auto Idx = MIRBuilder.buildConstant(IdxTy, I);
+    auto Val = MIRBuilder.buildExtractVectorElement(ValTy, Vec, Idx);
+    Register ElmtPtr = getVectorElementPointer(StackPtr, VecTy, OutPos);
+    MIRBuilder.buildStore(Val, ElmtPtr, PtrInfo, ValAlign);
+
+    if (I < NumElmts - 1) {
+      LLT MaskITy = MaskTy.getElementType();
+      auto MaskI = MIRBuilder.buildExtractVectorElement(MaskITy, Mask, Idx);
+      if (MaskITy.getSizeInBits() > 1)
+        MaskI = MIRBuilder.buildTrunc(LLT::scalar(1), MaskI);
+
+      MaskI = MIRBuilder.buildZExt(IdxTy, MaskI);
+      OutPos = MIRBuilder.buildAdd(IdxTy, OutPos, MaskI).getReg(0);
+    }
+  }
+
+  MachineFrameInfo &MFI = MI.getMF()->getFrameInfo();
+  MIRBuilder.buildLoad(Dst, StackPtr, PtrInfo,
+                       MFI.getObjectAlign(PtrInfo.StackID));
----------------
lawben wrote:

Just to validate my understanding: `CreateStackObject` may change the alignment, so we technically have no guarantee that the alignment we pass in is the alignment the stack object actually gets?

I've seen two other approaches to this in `LegalizerHelper`:

1) Pass the alignment to `createStackTemporary()` and then reuse that same `Align` value later on (similar to my first approach, but without the redundant call to get the alignment). If my understanding is correct, this could technically yield the wrong alignment when the requested alignment is larger than `MachineFrameInfo::StackAlignment`.

2) Recover the frame index through a somewhat awkward `FIDef->getOperand(1).getIndex()`, as in `lowerMemset()`, which I think could easily break if something changes:
https://github.com/llvm/llvm-project/blob/d5fca0f83971c8a3b9aa4a34cb8301cd4e0bf363/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp#L8615

I think changing the API to make it easier to get the real alignment makes sense. But that probably belongs in a separate PR, since every caller would then have to consider what happens when the input alignment differs from the frame's alignment. In light of this, I'd suggest using `VecAlign` here, as in approach 1), with a TODO to fix this once the API changes.
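
To make the concern with approach 1) concrete, here is a toy model of the clamping as I understand it. This is only a sketch: `ToyFrameInfo`, `createObjectAlign`, and `reusedAlignIsSafe` are hypothetical names for illustration, not LLVM APIs, and the clamping rule is my assumption about what `CreateStackObject` does when the stack cannot be realigned.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a frame-info object. NOT the LLVM class; hypothetical.
struct ToyFrameInfo {
  uint64_t StackAlignment; // largest alignment the frame guarantees
  bool StackRealignable;   // whether the target may realign the stack

  // Returns the alignment actually recorded for a new stack object.
  // Assumption: a request above StackAlignment is clamped down when the
  // stack is not realignable, and kept as-is otherwise.
  uint64_t createObjectAlign(uint64_t Requested) const {
    assert(Requested != 0 && (Requested & (Requested - 1)) == 0 &&
           "alignment must be a power of two");
    if (StackRealignable || Requested <= StackAlignment)
      return Requested;
    return StackAlignment; // request was clamped
  }
};

// Approach 1) reuses the requested alignment for later loads and stores.
// That is only sound if the object really got at least that alignment.
bool reusedAlignIsSafe(const ToyFrameInfo &FI, uint64_t Requested) {
  return FI.createObjectAlign(Requested) >= Requested;
}
```

In this model, a frame with `StackAlignment = 16` that cannot be realigned turns a request for 32 into 16, so reusing the requested 32 would overstate the alignment; with a realignable stack the request is kept and reuse is fine.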

https://github.com/llvm/llvm-project/pull/92289