[PATCH] D30710: [LV] Vectorize GEPs
Elena Demikhovsky via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 21 06:54:38 PDT 2017
delena added inline comments.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4710
+ if (Legal->isUniform(GEP)) {
+ auto *Clone = cast<GetElementPtrInst>(GEP->clone());
+ for (unsigned I = 0; I < GEP->getNumOperands(); ++I)
----------------
mssimpso wrote:
> mssimpso wrote:
> > delena wrote:
> > > mkuper wrote:
> > > > delena wrote:
> > > > > mssimpso wrote:
> > > > > > mssimpso wrote:
> > > > > > > delena wrote:
> > > > > > > > mssimpso wrote:
> > > > > > > > > delena wrote:
> > > > > > > > > > mssimpso wrote:
> > > > > > > > > > > mssimpso wrote:
> > > > > > > > > > > > delena wrote:
> > > > > > > > > > > > > GEP is uniform when the memory instruction (User) is uniform, right?
> > > > > > > > > > > > > Why do you need to broadcast it?
> > > > > > > > > > > > This is "loop-invariant" in the LoopAccessInfo::isUniform sense not in the LoopVectorizationCostModel::isUniformAfterVectorization sense (we really should come up with some better names for these concepts). Sometimes we end up with GEPs contained in the loop body that have loop-invariant operands. I'm not sure why these GEPs aren't hoisted out of the loop before the vectorizer runs.
> > > > > > > > > > > >
> > > > > > > > > > > > In theory, we should be able to hoist these GEPs out of the loop ourselves, but we have assumptions elsewhere that if an instruction existed in the original loop body, it will map to something inside the vectorized loop body. So I just clone and broadcast the original GEP inside the loop here. The change to the first-order-recurrence.ll test case is reflective of this.
> > > > > > > > > > > I just thought about a different way to implement this. Instead of the uniform check, we could check if the value returned by the IRBuilder has a vector type, and if not, do the broadcast. If all the operands are loop-invariant in the code below, the IRBuilder will return a scalar GEP. This will probably be fewer lines of code anyway. Let me give it a shot.
> > > > > > > > > > Do you have any real case with loop invariant GEP? Do you mean the last test case from first-order-recurrence.ll:
> > > > > > > > > > define void @PR29559() {
> > > > > > > > > > entry:
> > > > > > > > > > br label %scalar.body
> > > > > > > > > >
> > > > > > > > > > scalar.body:
> > > > > > > > > > %i = phi i64 [ 0, %entry ], [ %i.next, %scalar.body ]
> > > > > > > > > > %tmp2 = phi float* [ undef, %entry ], [ %tmp3, %scalar.body ]
> > > > > > > > > > %tmp3 = getelementptr inbounds [3 x float], [3 x float]* undef, i64 0, i64 0
> > > > > > > > > > %i.next = add nuw nsw i64 %i, 1
> > > > > > > > > > %cond = icmp eq i64 %i.next, undef
> > > > > > > > > > br i1 %cond, label %for.end, label %scalar.body
> > > > > > > > > >
> > > > > > > > > > for.end:
> > > > > > > > > > ret void
> > > > > > > > > > }
> > > > > > > > > > The %tmp2 and %tmp3 should be scalar.
> > > > > > > > > > In my understanding, if all operands of the GEP are loop invariant, the Load/Store is uniform. I just do not understand in which cases we'll need to broadcast the GEP.
> > > > > > > > > I should add a test for the broadcast case to make this clear, shouldn't I? If we have a GEP like %tmp3 in the example you pasted that was stored to memory with a vector store, we would need a vector version of it, and it wouldn't be "uniform-after-vectorization". It's still "uniform" in the LAA sense because it has loop-invariant operands. Because if this, IRBuilder will give us a scalar GEP that we will then broadcast. I'll add the test and update.
> > > > > > > > If %tmp3 should be stored as a vector, the user (store inst) will broadcast it. You can keep it scalar.
> > > > > > > I see your point, in that getVectorValue() will do the broadcast upon the first use of a loop-invariant value, and initialize the mapping. But I think your comment applies to the vectorization of all instructions, right? Not just GEPs? For example, if for some reason we found an "add i32 0, 0" inside the body of the loop, we currently vectorize it like any instruction. But we could instead skip vectorizing it and allow getVectorValue() to do the broadcast on-demand.
> > > > > > >
> > > > > > > Would it make sense to have something like:
> > > > > > >
> > > > > > > ```
> > > > > > > if (Legal->isUniform(&I))
> > > > > > > continue;
> > > > > > > ```
> > > > > > >
> > > > > > > Before the switch statement in the main loop of vectorizeBlockInLoop()?
> > > > > > To answer my previous question, no, we can't rely on getVectorValue() to perform the broadcast for "uniform" values. The function that performs the broadcast (getBroadcastInstrs) doesn't clone the original instruction from the old loop body into the new loop body. So we would end up with a broken module where the broadcast in the vector loop preheader uses a "uniform" value defined within the body of the scalar loop.
> > > > > >
> > > > > > So going back to Elena's comment about %tmp3 being broadcast by the store that uses it - this doesn't work. We have to either vectorize or scalarize the GEP. Otherwise, getBroadcastInsts would literally broadcasts %tmp3 from the body of the original loop into the vector loop preheader, violating dominance.
> > > > > >
> > > > > > Again, I'll add a test case to make this more concrete. Sorry for the noise.
> > > > > I agree that your code works. But I'd organize the code in another way:
> > > > >
> > > > > if (OrigLoop->hasLoopInvariantOperands(GEP) || VF ==1 || isUniformAfterVectorization(GEP, VF)) {
> > > > > NewGEP = GEP->clone();
> > > > > initScalar(GEP, NewGEP);
> > > > > } else {
> > > > > // build vector GEP exactly as you do
> > > > > ...
> > > > > VectorLoopValueMap.initVector(&I, Entry);
> > > > > }
> > > > >
> > > > > Michael, what do you think?
> > > > I think the current version is slightly less brittle, but I understand why you'd want to be explicit. I'm fine with either option, but if we go with the one you're suggesting, we'd still need an assert that the resulting GEP isn't scalar.
> > > >
> > > > Also, this is different from the current version, right? The current version looks equivalent to checking that "OrigLoop->hasLoopInvariantOperands(GEP) || VF ==1".
> > > isUniformAfterVectorization is redundant here, I agree.
> > > vectorizeMemoryInstruction() creates another GEP for consecutive loads and stores.
> > >
> > > > we'd still need an assert that the resulting GEP isn't scalar.
> > > Yes.
> > >
> > > Something like this?
> > > if (OrigLoop->hasLoopInvariantOperands(GEP)) {
> > >
> > > NewGEP = GEP->clone();
> > > initScalar(GEP, NewGEP);
> > > } else if (VF ==1) {
> > > NewGEP = GEP->clone();
> > > VectorLoopValueMap.initVector(GEP, NewGEP);
> > > } else {
> > >
> > > // build vector GEP exactly as you do
> > > . ..
> > > assert(NewGEP->getType()->isVectorTy());
> > > Entry[Part] = NewGep;
> > >
> > > VectorLoopValueMap.initVector(&I, Entry);
> > > }
> > >
> > I agree that we should be explicit, but I'm not a huge fan of this approach. This is not how initScalar and initVector work (unless you've intentionally omitted some code). They are called with a (Value *, EntryTy), so before using them, we have to create all the VF x UF values.
> >
> > For your call to initScalar, we would be better off just calling scalarizeInstruction (which does this). But if we really should scalarize the GEP, we should have already done so with the check at the top of the loop. Note, though, that scalarizing the GEP will change what it's vector version will look like. Instead of having the broadcast, we would build the vector with a bunch of inserts. But again, we already know we need a vector version of the GEP, so why scalarize?
> >
> > For the VF == 1 case, once you create the Entry and initialize all the elements, the code will look pretty much the same as the "else" case here. So why separate it?
> If we want to be more explicit here, I would suggest something like this (which is very similar to the first version of the patch), although I haven't tested this yet:
>
> ```
> if (VF > 1 && OrigLoop->hasLoopInvariantOperands(GEP)) {
> // clone GEP and broadcast clone
> for (unsigned Part = 0; Part < UF; ++Part)
> Entry[Part] = /* broadcast clone */
> } else {
> for (unsigned Part = 0; Part < UF; ++Part) {
> // build GEP as we do now
> // add an assert that the GEP we build has a vector type
> Entry[Part] = /* vector gep */
> }
> VectorLoopValueMap.initVector(&I, Entry);
> ```
>
> So the loop-invariant case is separate and more explicit, but we still vectorize the GEP instead of trying to scalarize it.
> Note, though, that scalarizing the GEP will change what it's vector version will look like. Instead of having the broadcast, we would build the vector with a bunch of inserts.
The scalarizeInstruction() works fine and I agree, we can use it instead of clone(). The "bunch of inserts" is inserted by getVectorvalue().
It happens because inside getVectorValue() we does not check for loop-invariant operands:
We only check that the instruction is in the list of uniforms:
if (Cost->isUniformAfterVectorization(I, VF)) {
VectorValue = getBroadcastInstrs(getScalarValue(V, Part, 0));
}
> But again, we already know we need a vector version of the GEP, so why scalarize?
How do you know this? You do not check the users. If you build a splat vector and the GEP
is used in a scalar form, the scalar value will be extracted from the broadcast. Hopefully, another pass will resolve this redundancy.
Can you try to change getvectorValue() to
if (Cost->isUniformAfterVectorization(I, VF) || OrigLoop->hasLoopInvariantOperands()) {
VectorValue = getBroadcastInstrs(getScalarValue(V, Part, 0));
}
https://reviews.llvm.org/D30710
More information about the llvm-commits
mailing list