[llvm-dev] masked-load endpoints optimization

Nema, Ashutosh via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 10 21:22:09 PST 2016


This looks interesting. The main motivation appears to be replacing a masked vector load with a regular vector load followed by a select.

I have observed that masked vector loads are, in general, more expensive than regular vector loads.

But if the first and last elements of a masked vector load are guaranteed to be accessed, then it can be transformed into a regular vector load.
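The condition above can be expressed as a check on a constant mask. The following is a minimal C sketch, not LLVM code: `mask_covers_endpoints` is a hypothetical helper, and the mask is modeled as a plain `bool` array with one entry per lane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an LLVM API): returns true when a constant mask
   enables both the first and the last lane, i.e. the masked load is
   guaranteed to access both endpoints of the vector. */
static bool mask_covers_endpoints(const bool *mask, size_t num_lanes) {
    if (num_lanes == 0)
        return false;            /* empty vector: nothing to transform */
    assert(mask != NULL);
    return mask[0] && mask[num_lanes - 1];
}
```

For the mask <i1 1, i1 0, i1 0, i1 1> this returns true. For <i1 0, i1 1, i1 1, i1 1> it returns false. This is only the legality side. Whether the rewrite is profitable remains a separate, target-specific question.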

In opt, this can be driven by TTI, which should check whether the transformation is actually beneficial for the target.

Regards,
Ashutosh

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Sanjay Patel via llvm-dev
Sent: Friday, March 11, 2016 3:37 AM
To: llvm-dev
Subject: [llvm-dev] masked-load endpoints optimization

If we're loading the first and last elements of a vector using a masked load [1], can we replace the masked load with a full vector load?

"The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to masked-off lanes."
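The equivalence quoted from the LangRef can be modeled lane by lane. The following C sketch assumes a <4 x i32> vector represented as an `int32_t` array. The function names are hypothetical and only mirror the two IR forms in this thread. The results match whenever all four lanes are readable. The two forms differ only in which memory they touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* models <4 x i32> */

/* Lane-wise model of llvm.masked.load: enabled lanes are read from memory,
   disabled lanes take the passthru value and their memory is never read. */
static void model_masked_load(const int32_t *addr, const bool mask[LANES],
                              const int32_t passthru[LANES], int32_t out[LANES]) {
    assert(addr != NULL);
    for (size_t i = 0; i < LANES; ++i)
        out[i] = mask[i] ? addr[i] : passthru[i];
}

/* Model of the rewritten form: a full load of every lane (including the
   masked-off ones), then a lane-wise select on the same mask. Only valid
   when every lane at addr is safe to read. */
static void model_load_then_select(const int32_t *addr, const bool mask[LANES],
                                   const int32_t passthru[LANES], int32_t out[LANES]) {
    assert(addr != NULL);
    int32_t loaded[LANES];
    for (size_t i = 0; i < LANES; ++i)
        loaded[i] = addr[i];
    for (size_t i = 0; i < LANES; ++i)
        out[i] = mask[i] ? loaded[i] : passthru[i];
}
```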

I think the fact that we're loading the endpoints of the vector guarantees that a full vector load can't have any different faulting/exception behavior on x86 and most (?) other targets. We would, however, be reading memory that the program has not explicitly requested.
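One way to make the faulting argument concrete is to assume that memory protection is applied per page (4 KiB here, as on x86 by default) and that a vector is never larger than a page. If the first and last bytes are readable, the whole vector spans at most two adjacent pages. Both of those pages are already touched by the endpoint lanes, so no interior byte can land on a page that faults. The C sketch below checks exactly that page-span condition. `ASSUMED_PAGE_SIZE` and `interior_on_endpoint_pages` are hypothetical names. The sketch says nothing about the second concern, reading memory the program did not request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: protection is applied per 4 KiB page (the x86 default).
   This constant is illustrative and not taken from any LLVM API. */
#define ASSUMED_PAGE_SIZE 4096u

/* Hypothetical helper: returns true if every byte of [addr, addr + vec_bytes)
   lies on the page holding the first byte or the page holding the last byte.
   Pages are contiguous, so this holds exactly when the two endpoint pages
   are the same page or adjacent pages. */
static bool interior_on_endpoint_pages(uint64_t addr, uint64_t vec_bytes) {
    assert(vec_bytes > 0);
    uint64_t first_page = addr / ASSUMED_PAGE_SIZE;
    uint64_t last_page = (addr + vec_bytes - 1) / ASSUMED_PAGE_SIZE;
    return last_page - first_page <= 1;
}
```

For any vector no larger than a page (16, 32, or 64 bytes for typical SIMD registers), this always returns true. For example, a 16-byte load at address 4088 straddles pages 0 and 1, and both pages are touched by the first and last lanes.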

IR example:

define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
  ; load the first and last elements pointed to by %addr and shuffle those into %v
  %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)
  ret <4 x i32> %res
}

declare <4 x i32> @llvm.masked.load.v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

would become something like:

define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
  %vecload = load <4 x i32>, <4 x i32>* %addr, align 4
  %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v
  ret <4 x i32> %sel
}

If this isn't valid as an IR optimization, would it be acceptable as a DAG combine with a target hook to opt in?

[1] http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics