[PATCH] [X86] transform insertps to blendps when possible for better performance
Sanjay Patel
spatel at rotateright.com
Sun Mar 1 09:24:50 PST 2015
In http://reviews.llvm.org/D7866#131618, @chandlerc wrote:
> I'm not really sold on doing this as a target combine. Is there some reason we can't just produce the desired insertps or blendps when lowering? This doesn't seem likely to come up only after doing some other shuffle lowering, but maybe I'm not seeing why.
Let me answer this question first before going to wrangle up some data on the other question:
I made this a target combine because I don't know how else to handle this case given our current intrinsic lowering:
define <4 x float> @blendps(<4 x float> %x, <4 x float> %y) {
%0 = tail call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %x, <4 x float> %y, i8 0)
ret <4 x float> %0
}
This is the IR produced when a programmer uses SSE intrinsics in C source. It directly becomes an INSERTPS node via:
X86_INTRINSIC_DATA(sse41_insertps, INTR_TYPE_3OP, X86ISD::INSERTPS, 0)
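To make the equivalence concrete, here is a minimal scalar sketch in plain C of the register forms of the two instructions, following their documented immediate encodings. In C source, the IR above would typically come from a call such as _mm_insert_ps(x, y, 0). The names vec4, insertps_model, blendps_model, insertps_as_blendps and vec4_equal are hypothetical helpers for illustration only; they are not LLVM APIs, and the condition checked here is an assumption about when the rewrite is safe, not necessarily the exact test the patch performs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4 x float vector type, standing in for <4 x float>. */
typedef struct { float v[4]; } vec4;

/* Scalar model of INSERTPS (register form).
 * imm[7:6] = source element, imm[5:4] = destination element,
 * imm[3:0] = zero mask applied after the insertion. */
static vec4 insertps_model(vec4 dst, vec4 src, uint8_t imm) {
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    unsigned zmask = imm & 0xFu;
    dst.v[count_d] = src.v[count_s];
    for (unsigned i = 0; i < 4; ++i)
        if (zmask & (1u << i))
            dst.v[i] = 0.0f;
    return dst;
}

/* Scalar model of BLENDPS: bit i of imm[3:0] selects element i from b. */
static vec4 blendps_model(vec4 a, vec4 b, uint8_t imm) {
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i))
            a.v[i] = b.v[i];
    return a;
}

/* Assumed sufficient condition for the rewrite: no elements are zeroed and
 * the source and destination element indices match. On success, stores the
 * equivalent BLENDPS mask and returns 1; otherwise returns 0. */
static int insertps_as_blendps(uint8_t imm, uint8_t *blend_mask) {
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    unsigned zmask = imm & 0xFu;
    if (zmask != 0 || count_s != count_d)
        return 0;
    *blend_mask = (uint8_t)(1u << count_d);
    return 1;
}

/* Bitwise comparison; adequate for the simple values used in this sketch. */
static int vec4_equal(vec4 a, vec4 b) {
    return memcmp(a.v, b.v, sizeof a.v) == 0;
}

/* Self-check for the immediate used in the IR above (i8 0). */
static void check_imm_zero(void) {
    vec4 x = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 y = {{5.0f, 6.0f, 7.0f, 8.0f}};
    uint8_t mask = 0;
    assert(insertps_as_blendps(0x00, &mask) && mask == 1);
    assert(vec4_equal(insertps_model(x, y, 0x00), blendps_model(x, y, mask)));
}
```

With an immediate of 0, as in the intrinsic call above, the model copies element 0 of %y into element 0 of %x and zeroes nothing, which is what a blendps with mask 1 does.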
http://reviews.llvm.org/D7866