[PATCH] D135656: [IR] Add nocapture to pointer parameters of masked stores/loads

Benjamin Maxwell via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 17 04:09:15 PDT 2022


benmxwl-arm added inline comments.


================
Comment at: llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll:21
 }
 
 declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
----------------
MattDevereau wrote:
> Do we not need an equivalent test for `expandload` and `compressstore` as we've added `nocapture` to those intrinsics too? For example:
> ```
> define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
>   %1 = alloca [10 x i64]
>   call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
>   %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
>   call void @llvm.masked.compressstore.nxv10i64.p0(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
>   ret void
> }
> ```
> Can you verify this test does not optimize away `alloca` and `memcpy` when `nocapture` is missing from the intrinsic definitions, and that it does optimize them away when `nocapture` is present in the intrinsic definitions? Can you please also verify the correctness of optimizing `alloca` and `memcpy` away for this test?
I get the same results for that test case: without `nocapture` the `alloca` and `memcpy` are not optimized away, and with `nocapture` they are.

How would you like me to verify the correctness? The resulting code looks correct to me:

```
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %1, ptr %ptr, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
  ret void
}
```

Assuming adding nocapture to these intrinsics is valid and the existing optimization is correct, I see no reason this change could generate invalid code.
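
For illustration only, here is a rough sketch of how the before/after behaviour could be pinned down in a test. It is not taken from the patch: the RUN line, the CHECK lines and the contents of `@contant_int_array` are assumptions based on the usual InstCombine test layout, and the `declare` lines just show where `nocapture` would sit on the pointer parameters. With an all-true mask the expandload reads 10 consecutive `i64` values (80 bytes), which matches the size of the `memcpy`, so reading directly from the constant array should observe the same data.

```
; RUN: opt -passes=instcombine -S < %s | FileCheck %s
; (assumed RUN line; the real test file may use different options)

; Hypothetical initializer, only the type and size matter for this sketch.
@contant_int_array = private unnamed_addr constant [10 x i64] [i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9]

; With nocapture on the intrinsics, the local copy is expected to disappear
; and the load to read from the constant array directly.
; CHECK-LABEL: @combine_masked_expandload_compressstore_from_constant_array_2(
; CHECK-NOT:     alloca
; CHECK-NOT:     call void @llvm.memcpy
; CHECK:         call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array
; CHECK:         call void @llvm.masked.compressstore.v10i64(
; CHECK:         ret void
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
  %1 = alloca [10 x i64]
  call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
  %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
  ret void
}

declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
; Pointer parameters marked nocapture, as this patch proposes.
declare <10 x i64> @llvm.masked.expandload.v10i64(ptr nocapture, <10 x i1>, <10 x i64>)
declare void @llvm.masked.compressstore.v10i64(<10 x i64>, ptr nocapture, <10 x i1>)
```

Running the same file against a build without the attribute should make the `CHECK-NOT` lines fail, which would cover the "not optimized before" half of the question.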


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135656/new/

https://reviews.llvm.org/D135656


