[llvm] [LLVM] Add `llvm.masked.compress` intrinsic (PR #92289)
Lawrence Benson via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 25 01:11:59 PDT 2024
================
@@ -19234,6 +19234,78 @@ the follow sequence of operations:
The ``mask`` operand will apply to at least the gather and scatter operations.
+
+.. _int_masked_compress:
+
+'``llvm.masked.compress.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+LLVM provides an intrinsic for compressing data within a vector based on a selection mask.
+Semantically, this is similar to :ref:`llvm.masked.compressstore <int_compressstore>` but with weaker assumptions
+and without storing the results to memory, i.e., the data remains in the vector.
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. A number of scalar values of integer, floating point or pointer data type are collected
+from an input vector and placed adjacently within the result vector. A mask defines which elements to collect from the vector.
+The remaining lanes are filled with values from ``passthru``.
+
+.. code-block:: llvm
+
+ declare <8 x i32> @llvm.masked.compress.v8i32(<8 x i32> <value>, <8 x i1> <mask>, <8 x i32> <passthru>)
+ declare <16 x float> @llvm.masked.compress.v16f32(<16 x float> <value>, <16 x i1> <mask>, <16 x float> undef)
+
+Overview:
+"""""""""
+
+Selects elements from input vector '``value``' according to the '``mask``'.
+All selected elements are written into adjacent lanes in the result vector, from lower to higher.
+The mask holds an entry for each vector lane, and is used to select elements to be kept.
+If ``passthru`` is undefined, the number of valid lanes is equal to the number of ``true`` entries in the mask, i.e., all lanes >= number-of-selected-values are undefined.
+If a ``passthru`` vector is given, all remaining lanes are filled with the corresponding lane's value from ``passthru``.
+The main difference from :ref:`llvm.masked.compressstore <int_compressstore>` is that we do not need to guard against memory accesses for unselected lanes.
----------------
lawben wrote:
Ah, okay. I think my understanding of when a freeze is needed was wrong. In this case, I'm not sure we can specify one or the other behavior if a mask element is undef/poison. Depending on the code path, we may or may not perform a vec_reduce to get a passthru value. So our result depends on how vec_reduce behaves with undef/poison, in addition to whether our extracted mask element is frozen, at which point this probably gets weird to handle consistently.
Before changing the docs again: Is it valid for us to say "If the mask is undef or contains undef elements, the entire result is undef"? That would give us a bit of wiggle room in the implementation.
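
For reference, the lane semantics described in the quoted Overview can be modeled roughly as below. This is only a minimal Python sketch under stated assumptions, not the lowering in the PR: the function name `masked_compress` is hypothetical, undefined lanes are represented by `None`, and the mask is assumed to be fully defined (the undef/poison mask case discussed above is deliberately not modeled).

```python
from typing import List, Optional, Sequence


def masked_compress(
    value: Sequence[int],
    mask: Sequence[bool],
    passthru: Optional[Sequence[int]] = None,
) -> List[Optional[int]]:
    """Hypothetical reference model of the documented semantics.

    Assumption: None stands in for an undefined lane; the mask has no
    undef/poison elements.
    """
    if len(value) != len(mask):
        raise ValueError("value and mask must have the same number of lanes")
    if passthru is not None and len(passthru) != len(value):
        raise ValueError("passthru must have the same number of lanes as value")

    # Selected elements are packed into adjacent lanes, from lower to higher.
    selected = [v for v, m in zip(value, mask) if m]

    if passthru is None:
        # Without passthru, all lanes >= number-of-selected-values are undefined.
        tail = [None] * (len(value) - len(selected))
    else:
        # With passthru, each remaining lane takes the corresponding lane's value.
        tail = list(passthru[len(selected):])

    return selected + tail
```

With `value = [1, 2, 3, 4]`, `mask = [True, False, True, False]` and `passthru = [10, 11, 12, 13]`, the model packs `1` and `3` into lanes 0 and 1 and takes lanes 2 and 3 from `passthru`.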
https://github.com/llvm/llvm-project/pull/92289