[PATCH] D17490: [InstCombine][SSE] Demanded vector elements for scalar intrinsics
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 18 10:39:05 PDT 2016
spatel added a comment.
> 4 - addss gets simplified to a fadd call if we aren't interested in the pass through elements (can we do this for div/sqrt?)
I think so - please add a TODO comment. But this change really should be a separate patch review because it has FP exception behavior implications, doesn't it?
> I'm happy to split this into multiple commits if anyone thinks its necessary.
Yes, this should definitely be split up for committing for easier tracking and reversion if needed. I'd argue that it should be done for review purposes too as there are so many test cases to consider.
================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1243-1258
@@ -1236,1 +1242,18 @@
+ case Intrinsic::x86_sse_add_ss:
+ case Intrinsic::x86_sse_sub_ss:
+ case Intrinsic::x86_sse_mul_ss:
+ case Intrinsic::x86_sse_div_ss:
+ case Intrinsic::x86_sse_min_ss:
+ case Intrinsic::x86_sse_max_ss:
+ case Intrinsic::x86_sse_cmp_ss:
+ case Intrinsic::x86_sse2_add_sd:
+ case Intrinsic::x86_sse2_sub_sd:
+ case Intrinsic::x86_sse2_mul_sd:
+ case Intrinsic::x86_sse2_div_sd:
+ case Intrinsic::x86_sse2_min_sd:
+ case Intrinsic::x86_sse2_max_sd:
+ case Intrinsic::x86_sse2_cmp_sd: {
+ // These intrinsics only demand the lowest element of the second input
+ // vector.
+ Value *Arg1 = II->getArgOperand(1);
----------------
Don't these also demand the lowest element of the first input vector?
================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1275-1278
@@ +1274,6 @@
+ unsigned VWidth = Arg0->getType()->getVectorNumElements();
+ if (Value *V = SimplifyDemandedVectorEltsHigh(Arg0, VWidth, VWidth - 1)) {
+ II->setArgOperand(0, V);
+ return II;
+ }
+ if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
----------------
Instead of returning immediately, isn't it more efficient to fall-through to the check of the 2nd arg? Ie, replace both args in one shot if possible. That's what the code in SimplifyDemandedVectorElts() is doing if I'm reading it correctly.
Repository:
rL LLVM
http://reviews.llvm.org/D17490
More information about the llvm-commits
mailing list