[llvm] [IR] Split vector.splice into vector.splice.down and vector.splice.up (PR #170796)

Mon Dec 8 02:00:04 PST 2025

================
@@ -20729,30 +20729,79 @@ Arguments:
 All arguments must be vectors of the same type whereby their logical
 concatenation matches the result type.
 
-'``llvm.vector.splice``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.vector.splice.down``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+      declare <2 x double> @llvm.vector.splice.down.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
+      declare <vscale x 4 x i32> @llvm.vector.splice.down.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
+
+Overview:
+"""""""""
+
+The '``llvm.vector.splice.down.*``' intrinsics construct a vector by
+concatenating two vectors together, shifting the elements down by ``imm``, and
+extracting the lower half.
+
+This is equivalent to :ref:`llvm.fshr.* <int_fshr>`, but operating on elements
+instead of bits.
+
+These intrinsics work for both fixed and scalable vectors. While this intrinsic
+supports all vector types the recommended way to express this operation for
+fixed-width vectors is still to use a shufflevector, as that may allow for more
+optimization opportunities.
+
+For example:
+
+.. code-block:: text
+
+ llvm.vector.splice.down(<A,B,C,D>, <E,F,G,H>, 1);
+		     ==> <A,B,C,D,E,F,G,H>
+		     ==> <B,C,D,E,F,G,H,_>
+		     ==> <B,C,D,E>
+
+
+Arguments:
+""""""""""
+
+The first two operands are vectors with the same type. The start index is imm
+modulo the runtime number of elements in the source vector. For a fixed-width
----------------
lukel97 wrote:

Good point, I'll drop the "runtime number of elements" allowance for vector.splice.left from this PR. vector.splice.right still needs to permit it, though, since we previously allowed -N as a valid immediate.

> I do think we should be more cautious before introducing this flexibility, especially if the goal is to support non-immediate indices because the original definition means an and is sufficient to mask the index but with this change we'll need a max operation. 

It looks like we never need to mask off the index, because the verifier doesn't allow any immediates that would need masking to begin with. This includes scalable vectors, because we enforce that `-vscale_range_min <= Imm < vscale_range_min`.
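
For reference, here is a rough Python sketch of that range check as I read it. The helper name and its parameters are hypothetical, and scaling the known minimum element count by the `vscale_range` minimum is my understanding of the verifier, not a quote from it:

```python
def splice_index_in_range(imm: int, known_min_elts: int,
                          vscale_range_min: int = 1) -> bool:
    """Hypothetical model of the immediate check (not LLVM code)."""
    # Assumption: VL is the number of elements guaranteed to exist at
    # runtime, i.e. the known minimum element count scaled by the lower
    # bound of vscale_range (1 for fixed-width vectors).
    vl = known_min_elts * vscale_range_min
    # Accept only immediates in [-VL, VL - 1].
    return -vl <= imm < vl
```

Under that assumption, any immediate that passes already indexes inside the concatenated vectors, so nothing is left to mask.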

My plan was that in the PR to allow variable offsets we would relax this to "immediates that exceed the runtime length of the vector are poison", which would include removing the vscale_range_min check. 

> Not sure, but perhaps allowing 0 for the shift.right case might also be problematic?

I don't think we can disallow 0 if we allow variable offsets. We allow it for splice.left and optimize it to just the left operand in `SelectionDAG::getNode`, so this PR handles it for splice.right too by optimizing it to the right operand.
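
To illustrate the zero-offset behaviour, here is a minimal Python reference model. The function names mirror the intrinsics but are only illustrative, Python lists stand in for vectors, and the splice.right semantics are assumed to match the old negative-immediate form of vector.splice:

```python
def splice_left(vec1, vec2, imm):
    """Model: concatenate, drop the first `imm` elements, keep len(vec1)."""
    n = len(vec1)
    assert len(vec2) == n and 0 <= imm <= n
    return (vec1 + vec2)[imm:imm + n]


def splice_right(vec1, vec2, imm):
    """Model (assumed): last `imm` elements of vec1, then the start of vec2."""
    n = len(vec1)
    assert len(vec2) == n and 0 <= imm <= n
    return (vec1 + vec2)[n - imm:2 * n - imm]


a, b = list("ABCD"), list("EFGH")
print(splice_left(a, b, 0))   # the left operand
print(splice_right(a, b, 0))  # the right operand
```

In this model an offset of 0 is well defined for both directions, so folding it to one of the operands does not need a special case in the semantics.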

https://github.com/llvm/llvm-project/pull/170796