[clang] [llvm] [AMDGPU] Restrict `@llvm.amdgcn.image.*` intrinsic's dmask argument (PR #179511)

Thu Feb 12 07:04:28 PST 2026

================
@@ -7147,6 +7147,40 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
           &Call, Op);
     break;
   }
+  case Intrinsic::amdgcn_image_load_1d:
+  case Intrinsic::amdgcn_image_load_1darray:
+  case Intrinsic::amdgcn_image_load_2d:
+  case Intrinsic::amdgcn_image_load_2darray:
+  case Intrinsic::amdgcn_image_load_2darraymsaa:
+  case Intrinsic::amdgcn_image_load_2dmsaa:
+  case Intrinsic::amdgcn_image_load_3d:
+  case Intrinsic::amdgcn_image_load_cube:
+  case Intrinsic::amdgcn_image_load_mip_1d:
+  case Intrinsic::amdgcn_image_load_mip_1darray:
+  case Intrinsic::amdgcn_image_load_mip_2d:
+  case Intrinsic::amdgcn_image_load_mip_2darray:
+  case Intrinsic::amdgcn_image_load_mip_3d:
+  case Intrinsic::amdgcn_image_load_mip_cube: {
+    // LLVM IR definition of those intrinsics says that they can return any
+    // type. The logic below is based on what is covered by the
+    // llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll test.
+    Type *T = Call.getType();
+    if (auto *ST = dyn_cast<StructType>(Call.getType()))
+      T = ST->getElementType(0);
+
+    unsigned VWidth = 1;
+    if (auto *FVT = dyn_cast<FixedVectorType>(T))
+      VWidth = FVT->getNumElements();
+
+    Value *V = Call.getArgOperand(0);
+    unsigned DMask = cast<ConstantInt>(V)->getZExtValue();
+    unsigned NumActiveBits = popcount(DMask);
+    Check(NumActiveBits <= VWidth,
----------------
jayfoad wrote:

> > Should probably check that it is ==, not just <=.
> 
> Why would we mandate that the mask is always all-ones, or why would we even have it as an explicit intrinsic argument then?

dmask lets you select which components you want returned (for a load), e.g. you could set it to 0x9 to select components 0 and 3 (e.g. R and A from an RGBA format). Then the load instruction will return those two values to two consecutive VGPRs, so _normal_ intrinsic usage would be to have a `<2 x float>` return type. So here I am suggesting we should check `popcount(9) == 2`. However...

> But my main counter-argument would be that we actually use those intrinsics with a dmask that is not all ones [in `OCLK` at `ROCm/llvm-project`](https://github.com/ROCm/llvm-project/blob/d21b12e24eb12832200f484c996f25f7ee6986c0/amd/device-libs/ockl/src/extended-image-intrinsics.ll#L413).

... yeah, I guess the intrinsics have always been more lenient, we probably can't just break that code.

> However, I do think that just checking number of active bits is not exactly correct: considering that the dmask is a 4 bit value, we should probably limit the mask to those 4 bits (or smaller depending on the return type) - similar to what I'd done in the clang patch.

Well, setting any bits other than the low 4 bits could be a separate verifier error.

https://github.com/llvm/llvm-project/pull/179511