[clang] [llvm] [AMDGPU] Restrict `@llvm.amdgcn.image.*` intrinsic's dmask argument (PR #179511)
Jay Foad via cfe-commits
cfe-commits at lists.llvm.org
Thu Feb 12 07:04:28 PST 2026
================
@@ -7147,6 +7147,40 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
&Call, Op);
break;
}
+ case Intrinsic::amdgcn_image_load_1d:
+ case Intrinsic::amdgcn_image_load_1darray:
+ case Intrinsic::amdgcn_image_load_2d:
+ case Intrinsic::amdgcn_image_load_2darray:
+ case Intrinsic::amdgcn_image_load_2darraymsaa:
+ case Intrinsic::amdgcn_image_load_2dmsaa:
+ case Intrinsic::amdgcn_image_load_3d:
+ case Intrinsic::amdgcn_image_load_cube:
+ case Intrinsic::amdgcn_image_load_mip_1d:
+ case Intrinsic::amdgcn_image_load_mip_1darray:
+ case Intrinsic::amdgcn_image_load_mip_2d:
+ case Intrinsic::amdgcn_image_load_mip_2darray:
+ case Intrinsic::amdgcn_image_load_mip_3d:
+ case Intrinsic::amdgcn_image_load_mip_cube: {
+ // LLVM IR definition of those intrinsics says that they can return any
+ // type. The logic below is based on what is covered by the
+ // llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll test.
+ Type *T = Call.getType();
+ if (auto *ST = dyn_cast<StructType>(Call.getType()))
+ T = ST->getElementType(0);
+
+ unsigned VWidth = 1;
+ if (auto *FVT = dyn_cast<FixedVectorType>(T))
+ VWidth = FVT->getNumElements();
+
+ Value *V = Call.getArgOperand(0);
+ unsigned DMask = cast<ConstantInt>(V)->getZExtValue();
+ unsigned NumActiveBits = popcount(DMask);
+ Check(NumActiveBits <= VWidth,
----------------
jayfoad wrote:
> > Should probably check that it is ==, not just <=.
>
> Why would we mandate that the mask is always all-ones, or why would we even have it as an explicit intrinsic argument then?
dmask lets you select which components you want returned (for a load), e.g. you could set it to 0x9 to select components 0 and 3 (e.g. R and A from an RGBA format). Then the load instruction will return those two values to two consecutive VGPRs, so _normal_ intrinsic usage would be to have a `<2 x float>` return type. So here I am suggesting we should check `popcount(9) == 2`. However...
> But my main counter-argument would be that we actually use those intrinsics with a dmask that is not all ones [in `OCLK` at `ROCm/llvm-project`](https://github.com/ROCm/llvm-project/blob/d21b12e24eb12832200f484c996f25f7ee6986c0/amd/device-libs/ockl/src/extended-image-intrinsics.ll#L413).
... yeah, I guess the intrinsics have always been more lenient, we probably can't just break that code.
> However, I do think that just checking number of active bits is not exactly correct: considering that the dmask is a 4 bit value, we should probably limit the mask to those 4 bits (or smaller depending on the return type) - similar to what I'd done in the clang patch.
Well, setting any bits other than the low 4 bits could be a separate verifier error.
https://github.com/llvm/llvm-project/pull/179511
More information about the cfe-commits
mailing list