[PATCH] D135656: [IR] Add nocapture to pointer parameters of masked stores/loads

Fri Oct 14 07:42:01 PDT 2022

MattDevereau added inline comments.

================
Comment at: llvm/include/llvm/IR/Intrinsics.td:1800
              LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
-            [IntrReadMem, IntrArgMemOnly, IntrWillReturn, ImmArg<ArgIndex<1>>]>;
+            [IntrReadMem, IntrArgMemOnly, IntrWillReturn, ImmArg<ArgIndex<1>>, NoCapture<ArgIndex<0>>]>;

----------------
Trim this line to fewer than 80 characters. The 80-character line limit isn't enforced in .td files, but similar defs near here seem to follow it.
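One possible wrapping, as a sketch only: it assumes the continuation line is aligned one column inside the opening bracket, as in the diff context above, and other layouts that fit in 80 columns would do equally well.

```
            [IntrReadMem, IntrArgMemOnly, IntrWillReturn, ImmArg<ArgIndex<1>>,
             NoCapture<ArgIndex<0>>]>;
```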

================
Comment at: llvm/include/llvm/IR/Intrinsics.td:1831
              LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
-            [IntrWriteMem, IntrArgMemOnly, IntrWillReturn]>;
+            [IntrWriteMem, IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<1>>]>;

----------------
Put `NoCapture<ArgIndex<1>>]>;` on a new line
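Roughly like the following, assuming the same bracket alignment as the surrounding defs (a sketch, not a required layout):

```
            [IntrWriteMem, IntrArgMemOnly, IntrWillReturn,
             NoCapture<ArgIndex<1>>]>;
```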

================
Comment at: llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll:21
 }

 declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
----------------
Do we not need an equivalent test for `expandload` and `compressstore`, as we've added `nocapture` to those intrinsics too? For example:
```
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = alloca [10 x i64]
  call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
  %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
  ret void
}
```
Can you verify that this test does not optimize away the `alloca` and `memcpy` when `nocapture` is missing from the intrinsic definitions, and that it does optimize them away when `nocapture` is present in the intrinsic definitions? Can you please also verify that optimizing the `alloca` and `memcpy` away is correct for this test?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135656/new/

https://reviews.llvm.org/D135656