[PATCH] D49491: [RFC][VPlan, SLP] Add simple SLP analysis on top of VPlan.

Mon Aug 6 22:18:04 PDT 2018

dcaballe added a comment.

Thanks for working on this, Florian, and sorry for my delayed response. I added some initial comments. I'll come back soon.

In https://reviews.llvm.org/D49491#1184938, @fhahn wrote:

> Ping. Thanks for all the comments so far and the initial code review.
>
> What do you think is the best way going forward overall: 1) evolving loop aware SLP initially in VPlan (independent of SLPVectorizer) and start using VPlan infrastructure for SLPVectorizer as it becomes available or 2) trying to share as much code between both from the start? Personally I think 1) would be better, as it allows us to evolve the VPlan parts quicker and I would like to avoid unnecessary code churn in SLPVectorizer and start using VPlan infrastructure there as it becomes clearly beneficial.

I agree with you. I think it's better to have an initial end-to-end loop aware SLP PoC, properly define the representation of SLP operations, define how SLP transformation comes into play and interacts with the rest of the VPlan infrastructure, etc.,  before we move forward on SLPVectorizer. In the VPlan native path we have some room for experimentation. We can take advantage of that and start with SLPVectorizer once we have a solid model for SLP in VPlan.

> In https://reviews.llvm.org/D49491#1184938, @fhahn wrote:
> 
>> In the long term, I think there is potential to share infrastructure with the SLPVectorizer on different levels: for example, we could potentially re-use VPlan based code generation, if the SLPVectorizer would emit VPInstructions; share the infrastructure to discover consecutive loads/stores; share different combination algorithms between VPlan and SLPVectorizer; potentially re-use part of the VPlan based cost model. One thing I want to try out is how easy/hard it would be to make the analysis work on VPInstruction and Instruction based basic blocks.
> 
> 
> ! In https://reviews.llvm.org/D49491#1166978, @rengolin wrote:
>  This is why we haven't gone too deep in the analysis, yet. Sharing code between SLPVec, which operates in IR, and VPlanSLP, which operates in VInstructions, can be confusing, limiting or even not possible. The analysis part mostly works, because VPInstruction is similar enough to Inst, but implementation and, worse, heuristics, can go wrong very quickly.

In terms of re-use, Value/VPValue templatization could work for some analyses. Templatizing IRBuilder/VPlanBuilder might also work for some parts of the implementation. However, we may want to think carefully. If for templatization to work we needed the whole VPValue hierarchy and internal details to be a carbon copy of Value hierarchy, we would be losing all the flexibility that VPValue/VPInstruction are bringing to VPlan. Of course, we would have to investigate this a bit more. Another alternative that I had in mind is introducing some kind of InstructionTraits. That would give us an unified interface while allowing more flexibility at VPValue/Value implementation level. Unfortunately, we haven't had have time to investigate this approach further.

================
Comment at: lib/Transforms/Vectorize/VPlan.h:592
+    CombinedLoad,
+    CombinedStore,
+  };
----------------
To be more specific, what about SLPLoad/SLPStore instead?

================
Comment at: lib/Transforms/Vectorize/VPlan.h:627
   unsigned getOpcode() const { return Opcode; }
+  void setOpcode(unsigned Op) { Opcode = Op; }

----------------
I wonder if having this interface is safe. Wouldn't it be problematic if we change the opcode of a VPInstruction object to an opcode of an operation which is represented with a VPInstruction subclass? Maybe it's better to not allow this and just create a VPInstruction using VPlanBuilder?

================
Comment at: lib/Transforms/Vectorize/VPlan.h:640
+
+  Instruction *getUnderlyingInstr() {
+    return cast_or_null<Instruction>(getUnderlyingValue());
----------------
Make it protected per design principle described in getUnderlyingValue in VPlanValue.h? Same for setUnderlyingInstr.

================
Comment at: lib/Transforms/Vectorize/VPlan.h:1382

+/// Class that maps (parts) of an existing VPlan to trees of combined
+/// VPInstructions.
----------------
(parts) of -> (parts of)?

================
Comment at: lib/Transforms/Vectorize/VPlan.h:1387
+  /// A DenseMapInfo implementation for holding DenseMaps and DenseSets of
+  /// sorted SmallVectors of const SCEV*.
+  struct BundleDenseMapInfo {
----------------
SCEV -> VPValue?

I guess we couldn't have a single templatized version for sorted SmallVectors of any pointer, right?

================
Comment at: lib/Transforms/Vectorize/VPlan.h:1439
+public:
+  VPSlp(VPInterleavedAccessInfo &IAI) : IAI(IAI) {}
+
----------------
formatting

================
Comment at: lib/Transforms/Vectorize/VPlan.h:1450
+  bool isCompletelySLP() const { return CompletelySLP; }
+};
 } // end namespace llvm
----------------
Something to think about is the long term of this class. I wonder if it would be better to split it multiple ones, matching what we have for loop vectorization. For example, we may want to have a special VPlan (VPlanSLP) to represent the SLP planning information, a VPlanSLPBuilder, a VPlanSLPContext (if necessary)... We don't have to do it as part of this patch but it's something to keep in mind.

================
Comment at: lib/Transforms/Vectorize/VPlanSLP.cpp:62
+
+static unsigned getNumUniqueUsers(VPValue *V) {
+  SmallPtrSet<VPUser *, 4> Users(V->user_begin(), V->user_end());
----------------
Would it make sense to move this to VPValue?

================
Comment at: lib/Transforms/Vectorize/VPlanSLP.cpp:66
+}
+
+VPInstruction *VPSlp::addCombined(ArrayRef<VPValue *> Operands,
----------------
Some doc would be great. Same for some of the methods below.

================
Comment at: lib/Transforms/Vectorize/VPlanSLP.cpp:210
+      B->getOpcode() != Instruction::Load &&
+      B->getOpcode() != Instruction::Store)
+    return true;
----------------
If opcodes are the same at this point, shouldn't we keep only A or B checks but not both?
You could add `assert(A->getOpcode() == B->getOpcode())` to be safe.

================
Comment at: lib/Transforms/Vectorize/VPlanSLP.cpp:219
+/// Implements getLAScore from Listing 7 in the paper.
+/// Traverses and compares operands of V1 and V2 to MaxLevel.
+static unsigned getLAScore(VPValue *V1, VPValue *V2, unsigned MaxLevel,
----------------
I would be great if you could elaborate a bit what 'score' is exactly.

https://reviews.llvm.org/D49491