[PATCH] D135656: [IR] Add nocapture to pointer parameters of masked stores/loads
Matt Devereau via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 14 07:42:01 PDT 2022
MattDevereau added inline comments.
================
Comment at: llvm/include/llvm/IR/Intrinsics.td:1800
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
- [IntrReadMem, IntrArgMemOnly, IntrWillReturn, ImmArg<ArgIndex<1>>]>;
+ [IntrReadMem, IntrArgMemOnly, IntrWillReturn, ImmArg<ArgIndex<1>>, NoCapture<ArgIndex<0>>]>;
----------------
Please wrap this line so it stays under 80 characters. The 80-column limit isn't enforced in .td files, but similar defs near here seem to follow it.
================
Comment at: llvm/include/llvm/IR/Intrinsics.td:1831
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
- [IntrWriteMem, IntrArgMemOnly, IntrWillReturn]>;
+ [IntrWriteMem, IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<1>>]>;
----------------
Please put `NoCapture<ArgIndex<1>>]>;` on a new line.
================
Comment at: llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll:21
}
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
----------------
Do we not need an equivalent test for `expandload` and `compressstore`, since we've added `nocapture` to those intrinsics too? For example:
```
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = alloca [10 x i64]
  call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
  %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
  ret void
}
```
Can you verify that this test does not optimize away the `alloca` and `memcpy` when `nocapture` is missing from the intrinsic definitions, and that it does optimize them away when `nocapture` is present in the intrinsic definitions? Can you please also verify that optimizing the `alloca` and `memcpy` away is correct for this test?
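One possible way to encode that check is sketched below. It assumes the test file is driven by the usual `opt -passes=instcombine` plus FileCheck RUN line, which is not quoted in this comment. The check lines are hypothetical and have not been run; they describe the outcome expected with `nocapture` present, and should fail when it is missing if the attribute is what enables the fold.

```
; Assumed RUN line; the real one in this test file may differ.
; RUN: opt < %s -passes=instcombine -S | FileCheck %s

; Hypothetical checks for the function suggested above: once the pointer
; arguments are nocapture, neither the alloca nor the memcpy should remain.
; CHECK-LABEL: @combine_masked_expandload_compressstore_from_constant_array_2(
; CHECK-NOT:     alloca
; CHECK-NOT:     call void @llvm.memcpy
; CHECK:         ret void
```

In practice the check lines would more likely be regenerated with `utils/update_test_checks.py`, which would also show exactly what the `expandload` and `compressstore` calls turn into.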
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135656/new/
https://reviews.llvm.org/D135656