[PATCH] D135656: [IR] Add nocapture to pointer parameters of masked stores/loads

Matt Devereau via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 17 05:03:27 PDT 2022


MattDevereau added inline comments.


================
Comment at: llvm/test/Transforms/InstCombine/load-store-masked-constant-array.ll:21
 }
 
 declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
----------------
benmxwl-arm wrote:
> MattDevereau wrote:
> > Do we not need an equivalent test for `expandload` and `compressstore`, as we've added `nocapture` to those intrinsics too? For example:
> > ```
> > define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
> >   %1 = alloca [10 x i64]
> >   call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
> >   %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>, <10 x i64> zeroinitializer)
> >   call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
> >   ret void
> > }
> > ```
> > Can you verify that this test does not optimize away the `alloca` and `memcpy` when `nocapture` is missing from the intrinsic definitions, and that it does optimize them away when `nocapture` is present in the intrinsic definitions? Can you please also verify that optimizing the `alloca` and `memcpy` away is correct for this test?
> I get the same results for that test case (not optimized before, optimized after).
> 
> How would you like me to verify the correctness? The resulting code looks correct to me:
> 
> ```
> define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
>   %1 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <10 x i64> zeroinitializer)
>   call void @llvm.masked.compressstore.v10i64(<10 x i64> %1, ptr %ptr, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
>   ret void
> }
> ```
> 
> Assuming that adding `nocapture` to these intrinsics is valid and that the existing optimization is correct, I see no reason this change could generate invalid code.
If you agree the test is correct, then feel free to go ahead and add it.
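For reference, a self-contained form of the proposed test might look like the sketch below. It assumes the `nocapture` patch is applied. The RUN line, the `@contant_int_array` initializer, and the intrinsic declarations are illustrative assumptions, not the contents of the committed test file. The CHECK lines are transcribed from the optimized output quoted above, so they may need regenerating with `llvm/utils/update_test_checks.py` on other LLVM versions.

```llvm
; RUN: opt < %s -passes=instcombine -S | FileCheck %s

; ASSUMPTION: illustrative initializer for the 10 x i64 (80-byte) constant
; array that the memcpy below copies from.
@contant_int_array = private unnamed_addr constant [10 x i64] [i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9]

; With nocapture on the pointer operands of the masked intrinsics, InstCombine
; can read directly from the constant global and drop the alloca + memcpy.
define void @combine_masked_expandload_compressstore_from_constant_array_2(ptr %ptr) {
; CHECK-LABEL: @combine_masked_expandload_compressstore_from_constant_array_2(
; CHECK-NEXT:    [[TMP1:%.*]] = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull @contant_int_array,
; CHECK-NEXT:    call void @llvm.masked.compressstore.v10i64(<10 x i64> [[TMP1]], ptr %ptr,
; CHECK-NEXT:    ret void
;
  %1 = alloca [10 x i64]
  call void @llvm.memcpy.p0.p0.i64(ptr %1, ptr @contant_int_array, i64 80, i1 false)
  %2 = call <10 x i64> @llvm.masked.expandload.v10i64(ptr nonnull %1, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <10 x i64> zeroinitializer)
  call void @llvm.masked.compressstore.v10i64(<10 x i64> %2, ptr %ptr, <10 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
  ret void
}

; ASSUMPTION: declarations written out here so the file parses on its own.
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
declare <10 x i64> @llvm.masked.expandload.v10i64(ptr, <10 x i1>, <10 x i64>)
declare void @llvm.masked.compressstore.v10i64(<10 x i64>, ptr, <10 x i1>)
```

The `CHECK-NEXT` chain directly after `CHECK-LABEL` is what asserts that the `alloca` and `memcpy` are gone: any instruction left in front of the `expandload` would break the match. Running the same file against a build without the patch should then fail, which matches the "not optimized before, optimized after" observation.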


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135656/new/

https://reviews.llvm.org/D135656


