[PATCH] D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.

Dinar Temirbulatov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 30 07:32:02 PST 2018


dtemirbulatov added inline comments.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3657
+  for (Value *V : VL)
+    if (!extendSchedulingRegion(V, S.OpValue))
       return false;
----------------
ABataev wrote:
> dtemirbulatov wrote:
> > ABataev wrote:
> > > Seems to me, we should extend scheduling region only for instructions with the same or alternate opcode. Maybe, it will resolve all your problems.
> > No, in that case, we would not be able to extend scheduling region with non-alternative opcodes.
> Did you try that? Actually, we're not going to vectorize instructions with the different opcodes. So, maybe, we should exclude them from the scheduling region?
oh, I see what you mean now.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4079
+          LastScheduledInst = pickedInst;
+          std::set<Instruction *> Reschedule;
+          Reschedule.insert(pickedInst);
----------------
ABataev wrote:
> dtemirbulatov wrote:
> > ABataev wrote:
> > > 1. Use `SmallSet` instead
> > > 2. Why you can't use `SmallVector` instead?
> > Replaced to SmallSet, We could not use Vector here since it is not possible to have multiple entities to a single Instruction.
> The main problem with this code that you're checking only one level of dependency. What if you have dependency deeper, in 2, 3 or more level? Will it work? The code itself is very complex and, if we're going to keep this solution, must be outlined in a separate function.
yes, I thought the same, I have another heavier version that utilizes Dependancy Analyst, but I have not seen yet such an example where current implementation is not enough. I have 2 or 3 testcases for the issue so far. I will think again about the problem. 


================
Comment at: test/Transforms/SLPVectorizer/X86/internal-dep.ll:10
+; We are tring to arrange this bundle with an internal dependancy:
+; [  %sub.i = add nsw i32 undef, -1;  %shr.i.i = lshr i32 %conv31.i, 1;  %shr.1.i.i = lshr i32 %conv31.i, 2;  %shr.2.i.i = lshr i32 %conv31.i, 3;  %shr.3.i.i = lshr i32 %conv31.i, 4;  %shr.4.i.i = lshr i32 %conv31.i, 5;  %shr.5.i.i = lshr i32 %conv31.i, 6;  %shr.6.i.i = lshr i32 %conv31.i, 7;  %shr.7.i.i = lshr i32 %conv31.i, 8;  %shr.8.i.i = lshr i32 %conv31.i, 9;  %shr.9.i.i = lshr i32 %conv31.i, 10;  %shr.10.i.i = lshr i32 %conv31.i, 11;  %shr.11.i.i = lshr i32 %conv31.i, 12;  %shr.12.i.i = lshr i32 %conv31.i, 13;  %shr.13.i.i = lshr i32 %conv31.i, 14;  %shr.14.i.i = lshr i32 %conv31.i, 15]
+; There is a data dependancy for conv31.i that uses sub.i and other operations that use conv31.i.
----------------
ABataev wrote:
> That's why I suggest to try to exclude this instruction from the scheduling. Actually, we should not schedule `%sub.i`, we should schedule pseudo instruction `lshr i32 %sub.i, 0`. Shall we include it into scheduling?
ok, Thanks


https://reviews.llvm.org/D28907





More information about the llvm-commits mailing list