[PATCH] D34473: [LV] Changing the interface of ValueMap

Sat Jun 24 15:39:43 PDT 2017

Ayal added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:685-687
-    friend const VectorParts &InnerLoopVectorizer::getVectorValue(Value *V);
-    friend Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,
-                                                      unsigned Lane);
----------------
mssimpso wrote:
> Don't we still need to declare this functions, and the others, as friends?
Exposing this interface that manipulates individual Values only, keeps their storage internal to ValueMap, w/o providing **any** external befriended method access to it. This follows the thought of eventually moving this interface out of ILV and into a standalone storage class.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:699
+        for (unsigned Part = 0; Part < UF; ++Part)
+          Entry[Part].resize(VF);
+        ScalarMapStorage[Key] = Entry;
----------------
mssimpso wrote:
> `Entry[Part].resize(VF)` null initializes the values, right?
Ah, sure, if we remember to do `Entry[Part].resize(VF, nullptr)` ... Good catch, Thanks!

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:711
+    /// setVectorValue().
+    void unsetVectorValue(Value *Key, unsigned Part) {
+      assert(hasVectorValue(Key, Part) && "Vector value not set for part");
----------------
mkuper wrote:
> Since unset is only ever used before set, I think a nicer interface would be to have "reset" that combines an unset with a set.
> But if you prefer this, I'm ok with it.
Having "reset" was our original thought too, will return to it. Having "unset" may also undermine hasAnyVectorValue() btw.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2449-2450
+    VectorLoopValueMap.setVectorValue(EntryVal, Part, LastInduction);
+    if (isa<TruncInst>(EntryVal))
+      addMetadata(LastInduction, EntryVal);
     LastInduction = cast<Instruction>(addFastMathFlag(
----------------
mssimpso wrote:
> It might be more efficient in the common case if we pulled this out of the loop. Though, I don't really have a strong preference.
> 
> ```
> if (isa<TruncInst>(EntryVal))
>   for (unsigned Part = 0; Part < UF; ++Part)
>     addMetadata(getOrCreateVectorValue(EntryVal, Part);
> ```
> 
> Hopefully, I've understood the interface correctly...
We could indeed unswitch this invariant "if" manually; but its also good to reuse `LastInduction` rather than retrieve it. The trip count is probably small for this to have noticeable impact. It may also be conceptually better to keep `addMetadata()` right next to `setVectorValue()`.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2563-2564
+      VectorLoopValueMap.setVectorValue(EntryVal, Part, EntryPart);
+      if (Trunc)
+        addMetadata(EntryPart, Trunc);
+    }
----------------
mssimpso wrote:
> Same as above.
ditto

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2706
   // vector form, we will construct the vector values on demand.
-  if (VectorLoopValueMap.hasScalar(V)) {
+  if (VectorLoopValueMap.hasAnyScalarValue(V)) {

----------------
mssimpso wrote:
> Do we need to check/assert that the map contains a non-null entry for each Lane? Is it possible that some of the entries would be null?
> 
> Maybe uniform-after-vectorization values have null entries for `VF > 1`? If this is still the case, it might be useful to re-think that choice. It might be simpler to just broadcast the `VF == 1` value to the other lanes instead of leaving them null.
As originally documented, this check is essentially asking if the value has been scalarized, to any lane/part. For that we can (continue to) check if the map contains the value, w/o looking at any specific lane/parts. Perhaps it would be better instead to ask directly if the decision is to scalarize this value (going forward, by consulting the associated VPlan recipe), instead of asking ValueMap for this information.

The uniform-after-vectorization values are indeed entered per lane 0 only, keeping other lanes null. Another way to re-think this choice is to allocate only UF entries for such values, and separate the type (scalar/vector) of the values being stored from how many values are being stored (per part, or per part-and-lane). Can add a TODO comment.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3005
+    Ptr = getOrCreateScalarValue(Ptr, 0, 0);
+  // Else we should vector version of GEP for Gather or Scatter
+  assert((ConsecutiveStride || CreateGatherScatter) &&
----------------
mkuper wrote:
> This comment doesn't quite parse.
Right; will fix the comment as we move it.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4038
+  Builder.SetInsertPoint(
+      cast<Instruction>(VectorLoopValueMap.getVectorValue(Phi, 0)));

----------------
mssimpso wrote:
> Why do we use `VectorLoopValueMap.getVectorValue()` here instead of `getOrCreateVectorValue()`? Is it because we know the Phi must exist in the map already? This makes sense to me.
> 
> I think in the original code, we had to use `VectorLoopValueMap.getVector()` to get a non-const reference.
Yes, we use `VectorLoopValueMap.getVectorValue(Phi, 0)` here because we know the Phi must already exist in the map; the original code asserts so, and should have also asserted that PhiParts[0] is not null, as it is passed to SetInsertPoint().

Agreed, the original code uses `VectorLoopValueMap.getVector(Phi)` mainly to get a non-const reference for (re)setting `PhiPart[Part] = Shuffle` further below.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4219

-  VectorParts &RdxParts = VectorLoopValueMap.getVector(LoopExitInst);
+  VectorParts RdxParts(UF);
   setDebugLocFromInst(Builder, LoopExitInst);
----------------
mssimpso wrote:
> Can't we just use `VectorLoopValueMap.getVectorValue(LoopExitInst, Part)`, etc.?
We could just use `VectorLoopValueMap.getVectorValue(LoopExitInst, Part)` and hold a single `Value *RdxPart` instead of holding the temporary `VectorParts RdxParts(UF)`, if the for-part loop creating the SExt/Zext's is fused with the for-part loop creating the Trunc's; this requires moving the insertion point back and forth...

We should indeed do this in the for-part loop below which creates a BinOp or MinMaxOp. Will do so in a revised version to be uploaded.

https://reviews.llvm.org/D34473