[llvm] [IA][RISCV] Add support for vp.load/vp.store with shufflevector (PR #135445)
Min-Yih Hsu via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 16 12:10:51 PDT 2025
mshockwave wrote:
I've been looking into a potential correctness issue with this patch that @topperc raised earlier, and I thought it would be a good idea to write down some of the discussion here.
First, the motivation of this patch is that SLP is planning to use vp.load. Currently it will generate something like
``` llvm
%interleaved.vec = tail call <9 x i32> @llvm.vp.load.v9i32.p0(ptr %ptr, <9 x i1> <i1 1, i1 1, i1 0, i1 1, i1 1, i1 0, i1 1, i1 1, i1 0>, i32 9)
%v0 = shufflevector <9 x i32> %interleaved.vec, <9 x i32> poison, <3 x i32> <i32 0, i32 3, i32 6>
%v1 = shufflevector <9 x i32> %interleaved.vec, <9 x i32> poison, <3 x i32> <i32 1, i32 4, i32 7>
```
That is, we only extract the first two segments. All elements in the last segment are masked off in the load.
My current lowering would turn this into a segmented load with an all-ones mask, followed by extractions of only the first two segments. The problem is that if we do a (segmented) load with an all-ones mask on all 9 elements, we might trigger a page fault on an address that would have been masked off. Most likely it would be a trailing address, like element 8 in the example above, which was masked off in the original vp.load.
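To make the hazard concrete, here is a minimal Python sketch (a model of the lane arithmetic only, not code from the patch) that lists the lanes an all-ones load would touch even though the original vp.load mask disables them. The helper name `extra_lanes` is hypothetical.

```python
def extra_lanes(mask, evl):
    """Return the lanes below `evl` that an all-ones load would access
    although the original mask disables them."""
    return [i for i in range(evl) if not mask[i]]


# Mask and EVL taken from the vp.load in the first example.
mask = [1, 1, 0, 1, 1, 0, 1, 1, 0]
print(extra_lanes(mask, 9))  # [2, 5, 8]
```

Lane 8 is the trailing one: it is the last address of the access, so it is the lane most likely to land on a different page from the rest.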
A similar situation arises with this:
``` llvm
%interleaved.vec = tail call <9 x i32> @llvm.vp.load.v9i32.p0(ptr %ptr, <9 x i1> <i1 0, i1 0, i1 0, i1 1, i1 1, i1 1, i1 0, i1 0, i1 0>, i32 9)
%v0 = shufflevector <9 x i32> %interleaved.vec, <9 x i32> poison, <3 x i32> <i32 0, i32 3, i32 6>
%v1 = shufflevector <9 x i32> %interleaved.vec, <9 x i32> poison, <3 x i32> <i32 1, i32 4, i32 7>
%v2 = shufflevector <9 x i32> %interleaved.vec, <9 x i32> poison, <3 x i32> <i32 2, i32 5, i32 8>
```
Here each individual segment has a mask of `010`. With the lowering in this patch, however, this code would be lowered to a vlseg3 with an all-ones mask, and the mismatched mask might trigger unwanted page faults.
So I think the conclusion is that we should propagate the mask. I already have a rough fix in my local tree.
That being said, I think propagating the mask only fixes the second example, because in the first example the first two segments have masks of `111` while the third, unextracted one has a mask of `000`. @alexey-bataev do you have any thoughts on how to use a segmented load to lower the first example?
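The difference between the two examples can be sketched in Python. This is only a model of the reasoning above, not the actual fix: it de-interleaves the wide mask into one mask per segment (in the sense used in this comment, i.e. the lanes that feed `%v0`, `%v1`, `%v2`) and reports a single mask to propagate only when every segment agrees. The names `segment_masks` and `common_segment_mask` are hypothetical.

```python
def segment_masks(mask, factor):
    """De-interleave a wide mask: segment f gets lanes f, f+factor, f+2*factor, ..."""
    return [mask[f::factor] for f in range(factor)]


def common_segment_mask(mask, factor):
    """Return the mask shared by all segments, or None if they differ.

    Assumption: the segmented load reads every segment, extracted or not,
    so one propagated mask is only safe when all segments agree.
    """
    segments = segment_masks(mask, factor)
    if all(seg == segments[0] for seg in segments):
        return segments[0]
    return None


first = [1, 1, 0, 1, 1, 0, 1, 1, 0]   # first example
second = [0, 0, 0, 1, 1, 1, 0, 0, 0]  # second example

print(segment_masks(first, 3))         # [[1, 1, 1], [1, 1, 1], [0, 0, 0]]
print(common_segment_mask(first, 3))   # None
print(common_segment_mask(second, 3))  # [0, 1, 0]
```

The second example yields one shared mask that can be handed to the segmented load, while the first does not, which is why it needs a different lowering strategy.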
https://github.com/llvm/llvm-project/pull/135445