[PATCH] D27919: [Loop Vectorizer] Interleave vs Gather - in some cases Gather is better.

Tue Jan 17 01:59:34 PST 2017

delena marked an inline comment as done.
delena added a comment.

In https://reviews.llvm.org/D27919#644255, @mssimpso wrote:

> Hi Elena,
>
> This patch causes a crash in spec2006/povray on AArch64. I've pasted a test case over at https://reviews.llvm.org/P7951. The problem has to do with the analysis in collectLoopUniforms and the new decision to scalarize. collectLoopUniforms is very conservative about what instructions remain uniform after vectorization. If a memory access has the possibility of being scalarized (even though it may not be), it's pointer operand is not marked uniform. What's happening here is that you've introduced a new scalarization decision based on the cost model that collectLoopUniforms doesn't know about (and  likely can't know about). In the test case, collectLoopUniforms marks the pointer operands of the loads uniform even though the loads are now scalarized, which causes the pointer operands to be non-uniform.
>
> I'm not really sure what the best way to fix this would be. Any thoughts?  One way would be to remove scalarization from your widening decision.

I'm trying to revisit collections of Uniforms and Scalars after cost estimation and remove GEPs and induction variables if we decided to scalarize. Not finished yet..

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1702
+  /// Decision that was taken during cost calculation for memory instruction.
+  enum InstWidening {
+    LV_NONE,
----------------
mkuper wrote:
> I'm not a fan of the name of this enum, but don't really have any good ideas. Anyone else?
CM_DECISION_NONE,
CM_DECISION_WIDEN .. ?

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:1725
+    for (unsigned i = 0; i < Grp->getFactor(); ++i) {
+      if (auto *I = Grp->getMember(i))
+        WideningDecisions[std::make_pair(I, VF)] = W;
----------------
mkuper wrote:
> This looks like a suboptimal way to iterate over InterleaveGroup, because it requires a lookup for every index, instead of just iterating above the Members collection. (we don't care about the order here, right?)
> Perhaps add a better interface?
I agree, but iteration inside Factor is not a big overhead and it used in one more place, at least. This patch is complex enough, I'd postpone unrelated changes to another patch.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:2891
+
   Type *ScalarDataTy = LI ? LI->getType() : SI->getValueOperand()->getType();
   Type *DataTy = VectorType::get(ScalarDataTy, VF);
----------------
mkuper wrote:
> assert(!Legal->memoryInstructionMustBeScalarized(Instr, VF)), maybe?
> 
> Because now we leave the door open for the cost model decision being to widen, even though memoryInstructionMustBeScalarized(). 
> Another option is to replace the if with:
> 
> ```
> if (Decision == LoopVectorizationLegality::LV_SCALARIZE ||     
>     Legal->memoryInstructionMustBeScalarized(Instr, VF))
> ```
> 
> But I'm not sure it makes sense, because there's no good reason to always call memoryInstructionMustBeScalarized() in a non-asserts build - in theory, the cost model should have already checked this.
The form I used just prevents a redundant call to memoryInstructionMustBeScalarized(). Because we may have a decision (INTERLEAVE, WIDEN or GATHER_SCATTER) at this stage.

================
Comment at: ../lib/Transforms/Vectorize/LoopVectorize.cpp:7168
+        // Only calculate the cost once for the group.
+        InterleaveGrpCost = getInterleaveGroupCost(I, VF);
+
----------------
mkuper wrote:
> Wouldn't you calculate it several times per group if it's unprofitable?
LV_NONE means "not calculated yet". If we are here, each memory instruction will have " a decision".

================
Comment at: ../test/Analysis/CostModel/X86/interleave-load-i32.ll:13
 ;CHECK-LABEL: load_i32_interleave4
-;CHECK: Found an estimated cost of 1 for VF 1 For instruction:   %0 = load
-;CHECK: Found an estimated cost of 5 for VF 2 For instruction:   %0 = load
----------------
mkuper wrote:
> What happened to the VF = 1 cost?
There is no group for VF 1. Instruction cost is still there.

Repository:
  rL LLVM

https://reviews.llvm.org/D27919