[llvm-dev] masked-load endpoints optimization

Sanjay Patel via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 10 14:06:52 PST 2016


If we're loading the first and last elements of a vector using a masked
load [1], can we replace the masked load with a full vector load?

"The result of this operation is equivalent to a regular vector load
instruction followed by a ‘select’ between the loaded and the passthru
values, predicated on the same mask. However, using this intrinsic prevents
exceptions on memory access to masked-off lanes."

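I think the fact that we're loading the endpoints of the vector guarantees
that a full vector load has the same faulting/exception behavior on x86 and
most (?) other targets: the masked load already touches the first and last
elements, so any page that the full load would touch is already being
accessed (assuming the vector is no larger than a page). We would, however,
be reading memory that the program has not explicitly requested.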
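
The page argument can be made concrete with a small C++ sketch. This is a simplified model, not LLVM code: `endpointsCoverFullLoad` is a hypothetical helper, and it assumes faults are raised only at page granularity, which ignores things like sanitizers or hardware memory tagging.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API). Returns true if every page touched
// by a full load of numElts * eltBytes bytes starting at 'base' is also
// touched by the first or the last element. Assumes page-granular faults.
bool endpointsCoverFullLoad(std::uint64_t base, std::uint64_t numElts,
                            std::uint64_t eltBytes, std::uint64_t pageSize) {
  const std::uint64_t vecBytes = numElts * eltBytes;
  const std::uint64_t lastEltStart = base + vecBytes - eltBytes;
  for (std::uint64_t off = 0; off < vecBytes; ++off) {
    const std::uint64_t page = (base + off) / pageSize;
    bool covered = false;
    // Check each byte of the first and last elements for a matching page.
    for (std::uint64_t b = 0; b < eltBytes && !covered; ++b) {
      covered = (base + b) / pageSize == page ||
                (lastEltStart + b) / pageSize == page;
    }
    if (!covered)
      return false;
  }
  return true;
}
```

For a 16-byte vector and 4096-byte pages the check holds even when the vector straddles a page boundary (for example, base address 4088). It fails only in the artificial case where the vector is larger than a page, so that interior pages exist which neither endpoint touches.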

IR example:

define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
  ; load the first and last elements pointed to by %addr and shuffle those into %v
  %res = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %addr, i32 4, <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %v)
  ret <4 x i32> %res
}

would become something like:

define <4 x i32> @maskedload_endpoints(<4 x i32>* %addr, <4 x i32> %v) {
  %vecload = load <4 x i32>, <4 x i32>* %addr, align 4
  %sel = select <4 x i1> <i1 1, i1 0, i1 0, i1 1>, <4 x i32> %vecload, <4 x i32> %v
  ret <4 x i32> %sel
}
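
As a sanity check on the value semantics only (not the faulting behavior), here is a scalar C++ model of the two forms. The names `maskedLoadModel` and `loadThenSelectModel` are made up for illustration and do not correspond to anything in LLVM.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::int32_t, 4>;
using Mask4 = std::array<bool, 4>;

// Scalar model of the masked load: only lanes whose mask bit is set are read
// from memory; the other lanes keep the passthru value.
Vec4 maskedLoadModel(const std::int32_t *addr, const Mask4 &mask,
                     const Vec4 &passthru) {
  Vec4 res = passthru;
  for (std::size_t i = 0; i < res.size(); ++i)
    if (mask[i])
      res[i] = addr[i];
  return res;
}

// Scalar model of the rewritten form: read every lane unconditionally, then
// select per lane between the loaded value and the passthru value.
Vec4 loadThenSelectModel(const std::int32_t *addr, const Mask4 &mask,
                         const Vec4 &passthru) {
  Vec4 full;
  for (std::size_t i = 0; i < full.size(); ++i)
    full[i] = addr[i];
  Vec4 res;
  for (std::size_t i = 0; i < res.size(); ++i)
    res[i] = mask[i] ? full[i] : passthru[i];
  return res;
}
```

Both functions return the same vector for any mask, as long as all four lanes are readable. The only observable difference is that the second form reads lanes 1 and 2 as well, which is exactly the concern about touching memory the program did not request.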

If this isn't valid as an IR optimization, would it be acceptable as a DAG
combine with a target hook to opt in?
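
Roughly, the gating logic I have in mind would look like the C++ sketch below. Everything here is hypothetical: `TargetOptInModel` stands in for whatever target hook we might add and is not an existing LLVM interface, and the mask is modeled as a plain vector of bools rather than a constant vector node.

```cpp
#include <vector>

// Hypothetical stand-in for a target hook; not an existing LLVM interface.
struct TargetOptInModel {
  bool allowsWideningMaskedLoad;
};

// Returns true if a constant mask enables both the first and the last lane.
bool maskLoadsEndpoints(const std::vector<bool> &mask) {
  return !mask.empty() && mask.front() && mask.back();
}

// The fold would fire only if the target opts in and both endpoints are loaded.
bool shouldWidenMaskedLoad(const std::vector<bool> &mask,
                           const TargetOptInModel &target) {
  return target.allowsWideningMaskedLoad && maskLoadsEndpoints(mask);
}
```

With the mask from the example above (<1, 0, 0, 1>), the fold would apply on a target that opts in and would be skipped everywhere else.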

[1] http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics