[PATCH] D135656: [IR] Add nocapture to pointer parameters of masked stores/loads
Benjamin Maxwell via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 17 04:09:15 PDT 2022
benmxwl-arm added inline comments.
================
Comment at: llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll:21
}
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
----------------
MattDevereau wrote:
> Do we not need an equivalent test for `expandload` and `compressstore`, since we've added `nocapture` to those intrinsics too? For example:
> ```
> define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
>   %1 = alloca [10 x i64]
>   call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
>   %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
>   call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
>   ret void
> }
> ```
> Can you verify that this test does not optimize away the `alloca` and `memcpy` when `nocapture` is missing from the intrinsic definitions, and that it does optimize them away when `nocapture` is present in the intrinsic definitions? Can you please also verify that optimizing the `alloca` and `memcpy` away is correct for this test?
I get the same results for that test case: the `alloca` and `memcpy` are not optimized away before this patch, and they are optimized away after it.
How would you like me to verify the correctness? The resulting code looks correct to me:
```
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %1, ptr %ptr, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
  ret void
}
```
Assuming that adding `nocapture` to these intrinsics is valid and that the existing optimization is correct, I see no reason this change could generate invalid code.
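For reference, the before/after check could be written as a standalone lit test along the following lines. This is only a sketch: the contents of `@contant_int_array` and the exact CHECK lines are assumptions (the review does not show the array definition), and the RUN line assumes the usual `opt -passes=instcombine` plus `FileCheck` setup. The CHECK lines encode the result reported above with the patch applied; without `nocapture` on the intrinsics, the `CHECK-NOT` lines would be expected to fail.
```
; RUN: opt < %s -passes=instcombine -S | FileCheck %s

; Assumed definition: ten i64 values (80 bytes, matching the memcpy length).
@contant_int_array = private unnamed_addr constant [10 x i64] [i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9]

; With nocapture on the pointer parameter, the copy into the alloca should be
; removed and the expandload should read the constant array directly.
; CHECK-LABEL: @combine_masked_expandload_compressstore_from_constant_array_2(
; CHECK-NOT:     alloca
; CHECK-NOT:     call void @llvm.memcpy
; CHECK:         call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array
; CHECK:         call void @llvm.masked.compressstore.v10i64(
; CHECK:         ret void
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = alloca [10 x i64]
  call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
  %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
  ret void
}

declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
declare <10 x i64> @llvm.masked.expandload.v10i64(ptr, <10 x i1>, <10 x i64>)
declare void @llvm.masked.compressstore.v10i64(<10 x i64>, ptr, <10 x i1>)
```
The all-true mask keeps the example simple: every lane is loaded and stored, so the output should not depend on whether the data is read from the stack copy or from the constant array.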
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135656/new/
https://reviews.llvm.org/D135656