[llvm] [Intrinsics][AArch64] Add intrinsic to mask off aliasing vector lanes (PR #117007)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 10 00:21:37 PST 2025
================
@@ -23624,6 +23624,92 @@ Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
+.. _int_experimental_get_noalias_lane_mask:
+
+'``llvm.experimental.get.noalias.lane.mask.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <4 x i1> @llvm.experimental.get.noalias.lane.mask.v4i1(ptr %ptrA, ptr %ptrB, i64 immarg %elementSize, i1 immarg %writeAfterRead)
----------------
davemgreen wrote:
One of the places whilewr might be most useful is in checking that there is no aliasing, in the standard way that we check for aliasing in vector loops, i.e. what GCC uses it for in https://godbolt.org/z/b5o3KEaod: it replaces a sub+cmp, so long as the whilewr is quick enough. If we could just pattern-recognise those, that would be ideal. Unfortunately it doesn't mathematically work for arbitrary i64 values because of the overflow issue (the instruction uses an i65 internal subtract).
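To make the overflow point concrete, here is a rough Python model of my own, not the architectural definition of whilewr. It compares a sub+cmp done in wrapping 64-bit arithmetic with the same check done on the exact difference of two unsigned 64-bit values, which needs 65 bits. The function names, the direction of the subtraction and the boundary conditions are all assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # pointers are modelled as unsigned 64-bit integers


def conflict_wrapping_i64(read_ptr, write_ptr, span_bytes):
    """sub + unsigned cmp, with the subtraction wrapping at 64 bits."""
    diff = (write_ptr - read_ptr) & MASK64
    return diff < span_bytes


def conflict_wide(read_ptr, write_ptr, span_bytes):
    """Same comparison on the exact difference (a signed 65-bit value)."""
    diff = write_ptr - read_ptr  # Python ints do not wrap
    return 0 <= diff < span_bytes


# Hypothetical span: 4 lanes of 16 bytes each.
span = 64

# Ordinary addresses: both forms agree.
print(conflict_wrapping_i64(0x1000, 0x1010, span), conflict_wide(0x1000, 0x1010, span))

# Addresses on opposite sides of the 2^64 wrap: the forms disagree.
hi, lo = (1 << 64) - 8, 8
print(conflict_wrapping_i64(hi, lo, span), conflict_wide(hi, lo, span))
```

In this model the two forms only disagree when the pointers are almost 2^64 apart, which real pointers are unlikely to be, but a pattern match on arbitrary i64 operands cannot assume that.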
I'm not sure about the general idea of using these whiles to update a mask that is used inside the loop, if we try to do that for every loop where we can't prove that the pointers don't alias. If it adds extra instructions to the loop body and the actual chance of the accesses overlapping is very low, it might not be worth it compared to just having alias checks outside the loop. I may be mistaken about how useful it is, but perhaps it is worth investigating further before we decide anything here.
For the intrinsic used for the simpler alias checks, whether that uses these instructions or something else, it is generally useful to have fewer instructions, as that makes cost-modelling simpler, and it would be good if, after they are introduced (during LTO for example), we can constant-fold "noalias" pointers. i.e. if we later find that the two pointers are noalias, that should imply that the mask is all-true or all-false.
https://github.com/llvm/llvm-project/pull/117007