[llvm-commits] [PATCH] LoopVectorizer
Hal Finkel
hfinkel at anl.gov
Wed Oct 17 18:24:01 PDT 2012
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Ralf Karrenberg" <karrenberg at cdl.uni-saarland.de>
> Cc: "llvm-commits at cs.uiuc.edu LLVM" <llvm-commits at cs.uiuc.edu>
> Sent: Wednesday, October 17, 2012 11:21:49 AM
> Subject: Re: [llvm-commits] [PATCH] LoopVectorizer
>
>
> On Oct 17, 2012, at 1:59 AM, Ralf Karrenberg
> <karrenberg at cdl.uni-saarland.de> wrote:
>
> > Hi,
> >
> > Thanks Nadav, this is a great start for the loop vectorization
> > implementation!
> > I haven't done a code review for LLVM so far, so I'll be happy to
> > get some feedback for the feedback as well :).
> >
> > On 10/17/12 8:30 AM, Hal Finkel wrote:
> >>> + //Check that all of the users of the loop are inside the BB.
> >>> + for (Value::use_iterator it = I->use_begin(), e =
> >>> I->use_end();
> >>> + it != e; ++it) {
> >>> + Instruction *U = cast<Instruction>(*it);
> >>> + BasicBlock *Parent = U->getParent();
> >>> + if (Parent != &BB) {
> >>> + DEBUG(dbgs() << "LV: Found an outside user for : "<< *U
> >>> << "\n");
> >>> + return false;
> >>> + }
> >>> + }
> >> It seems like handling this case requires just as much code as
> >> this check. If there is an outside user, you just need to extract
> >> the last vector element and call replaceAllUsesWith(), right?
> > I may be mistaken, but I think that would require introducing
> > additional blocks:
> > You cannot put an extractelement operation behind the loop since if
> > the loop body is not executed, you have a value that is not a
> > vector. You also don't want to put it inside the loop because then
> > the extract would be executed in every iteration although it is
> > only required once. That would leave you the option of creating a
> > branch around the loop and an additional block between the loop
> > exit and the join block (which holds a phi for each of those
> > values).
> >
>
> I think that this is incorrect. The in-loop value is a vector while
> the outside user is a scalar. Think of variable "sum" which adds
> values from the loop. We can't just pick the last element. In the
> case of reduction variables (which we need to address at some point)
> we can just redo the ops on all of the vector elements.
The current pass handles only fixed trip-count (single-exit) loops without inter-iteration dependencies, correct? For this class of loops, I believe that what I proposed is correct. Maybe I'm thinking about this incorrectly, can you please provide a counterexample?
Thanks again,
Hal
>
> > Other comments:
> >
> > + /// Create a broadcast instruction. This method generates a
> > broadcast
> > + /// instruction (shuffle) for loop invariant values and for the
> > induction
> > + /// value. If this is the induction variable then we extend it
> > to N, N+2, ...
> > + /// this is needed because each iteration in the loop
> > corresponds to a SIMD
> > + /// element.
> > + Value *getBroadcastInstrs(Value *V);
> >
> > Shouldn't it be <N, N+1, ... N+W-1> where W is the vectorization
> > factor?
>
> Yes. Fixed the comment.
>
> > + /// Returns the maximum vectorization factor that we *can* use
> > to vectorize
> > + /// this loop. This does not mean that it is profitable to
> > vectorize this
> > + /// loop, only that it is legal to do so. This may be a large
> > number. We
> > + /// can vectorize to any SIMD width below this number.
> > + unsigned getLoopMaxVF();
> >
> > Denote that the number can only be a power of two.
> >
> > + DataLayout *TD;
>
> Renamed.
>
> >
> > Should probably be "DL" now since it is not called "TargetData"
> > anymore ;).
> >
> > getBroadcastInstrs()
> > Do you handle constants with this method as well? Is LLVM's
> > constant propagation pass powerful enough to create vector
> > constants from the shuffles? Otherwise, I would have some code
> > that creates vector constants from scalars directly.
>
> This handles constants. In any case, this is hoisted outside of the
> loop.
>
> >
> > +Value *SingleBlockLoopVectorizer::getConsecutiveVector(Value* Val)
> > {
> > + assert(Val->getType()->isVectorTy() && "Must be a vector");
> > + assert(Val->getType()->getScalarType()->isIntegerTy() &&
> > + "Elem must be an integer");
> > + Type *ITy = Val->getType()->getScalarType();
> > + VectorType *Ty = cast<VectorType>(Val->getType());
> > + unsigned VF = Ty->getNumElements();
> >
> > VF is already a member variable. Remove the last line or convert it
> > into an assertion.
>
> Done.
>
> > Minor remark: I think getConsecutiveVector() is only required to
> > run once, so the resulting constant could be created in the
> > constructor already unless you don't want more member variables.
> > Also, the widening of the induction variable should probably only
> > be done once, so you may want to add another member variable for
> > the widened one or add it to the WidenMap.
> >
>
> I agree. I will fix this later.
>
> > + for (unsigned op = 0, e = Instr->getNumOperands(); op != e;
> > ++op) {
> >
> > Is there a reason not to use op_begin()/op_end()?
>
> I think that it is easier to access the last index this way.
>
> >
> > ScalarizeInstruction()
> > You create new extract operations every time an instruction is used
> > as an operand to a non-vectorizable instruction. Can we rely on
> > subsequent optimization phases to remove this redundancy again?
> >
>
> This is okay. Because of two optimizations:
>
> 1. dead code removes unused instructions.
>
> 2. extract from a vector which is build from a sequence of
> insert-elements is optimized to using the original elements.
>
> So, even if we have a sequence of scalar instructions, we won't see
> the accordion effect where we open-and-close vectors.
>
>
> > More importantly however, you create extract/insert operations for
> > every non-vectorizable instruction independently. At some point,
> > we probably need some heuristic that analyzes chains of
> > instructions: in cases like alternating vectorizable and
> > non-vectorizable instructions it may be beneficial to scalarize
> > instructions that are vectorizable to reduce pack/unpack costs
> > [1].
>
> We need a cost model, but this is independent of the optimizations i
> mentioned above.
>
> >
> > + // Get the trip count from the BE count by adding 1.
> > + ExitCount = SE->getAddExpr(ExitCount,
> > + SE->getConstant(ExitCount->getType(), 1));
> >
> > Bad alignment on the second argument. "BE" is a typo I think, it
> > does not appear anywhere else.
> >
> > case Instruction::Select: {
> > + // Widen selects.
> > + Value *A = getVectorValue(Inst->getOperand(0));
> > + Value *B = getVectorValue(Inst->getOperand(1));
> > + Value *C = getVectorValue(Inst->getOperand(2));
> > + WidenMap[Inst] = Builder->CreateSelect(A, B, C);
> > + break;
> > + }
>
>
>
> >
> > If the condition of a select is uniform we don't want to widen it,
> > so I would suggest a special case here instead of calling
> > getBroadcastInstrs() via getVectorValue() if there is no mapping
> > in WidenMap.
> >
>
> Yes. We need to check if the selector is uniform. This is a good
> optimization.
>
>
> > + case Instruction::Store: {
> > + case Instruction::Load: {
> >
> > What about loads/stores with entirely uniform indices?
>
> Also a good optimization. Will do.
>
> > We could simply insert a single scalar operation there instead of
> > cloning them to do W times the same thing.
> >
> > + // Check that the underlying objects of the reads are writes are
> > either
> >
> > Typo, should be "reads and writes".
>
> Fixed.
>
> >
> >
> > Otherwise, the code looks pretty good to me. :)
> > I did not take a close look at the tests, though.
> >
>
> Thanks for the review ralf!
>
>
> > Cheers,
> > Ralf
> >
> > [1] "Efficient SIMD Code Generation for Irregular Kernels", Kim and
> > Han, PPoPP'12
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-commits
mailing list