[llvm] [LLVM] Add `llvm.masked.compress` intrinsic (PR #92289)

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 7 01:26:29 PDT 2024


================
@@ -7502,6 +7504,48 @@ LegalizerHelper::lowerShuffleVector(MachineInstr &MI) {
   return Legalized;
 }
 
+LegalizerHelper::LegalizeResult
+LegalizerHelper::lowerMASKED_COMPRESS(llvm::MachineInstr &MI) {
+  auto [Dst, DstTy, Vec, VecTy, Mask, MaskTy] = MI.getFirst3RegLLTs();
+
+  MachinePointerInfo PtrInfo;
+  Register StackPtr =
+      createStackTemporary(TypeSize::getFixed(VecTy.getSizeInBytes()),
+                           getStackTemporaryAlignment(VecTy), PtrInfo)
+          .getReg(0);
+
+  LLT IdxTy = LLT::scalar(32);
+  LLT ValTy = VecTy.getElementType();
+  Align ValAlign = getStackTemporaryAlignment(ValTy);
+
+  Register OutPos = MIRBuilder.buildConstant(IdxTy, 0).getReg(0);
+
+  unsigned NumElmts = VecTy.getNumElements();
+  for (unsigned I = 0; I < NumElmts; ++I) {
+    auto Idx = MIRBuilder.buildConstant(IdxTy, I);
+    auto Val = MIRBuilder.buildExtractVectorElement(ValTy, Vec, Idx);
+    Register ElmtPtr = getVectorElementPointer(StackPtr, VecTy, OutPos);
+    MIRBuilder.buildStore(Val, ElmtPtr, PtrInfo, ValAlign);
+
+    if (I < NumElmts - 1) {
+      LLT MaskITy = MaskTy.getElementType();
+      auto MaskI = MIRBuilder.buildExtractVectorElement(MaskITy, Mask, Idx);
+      if (MaskITy.getSizeInBits() > 1)
+        MaskI = MIRBuilder.buildTrunc(LLT::scalar(1), MaskI);
+
+      MaskI = MIRBuilder.buildZExt(IdxTy, MaskI);
+      OutPos = MIRBuilder.buildAdd(IdxTy, OutPos, MaskI).getReg(0);
+    }
+  }
+
+  MachineFrameInfo &MFI = MI.getMF()->getFrameInfo();
+  MIRBuilder.buildLoad(Dst, StackPtr, PtrInfo,
+                       MFI.getObjectAlign(PtrInfo.StackID));
----------------
arsenm wrote:

The StackID is not the frame index, This just happens to work because the default stack ID is 0 and the one object you have is also a 0. The ABI for this is unfortunately awkward.

The API here could probably use work. createStackTemporary returns the G_FRAME_INDEX which contains the index you really want. We could maybe return the instruction and the actual index as a pair from createStackTemporary? 


https://github.com/llvm/llvm-project/pull/92289


More information about the llvm-commits mailing list