[llvm-commits] [PATCH] LoopVectorizer
Ralf Karrenberg
karrenberg at cdl.uni-saarland.de
Wed Oct 17 01:59:11 PDT 2012
Hi,
Thanks Nadav, this is a great start for the loop vectorization
implementation!
I haven't done a code review for LLVM so far, so I'll be happy to get
some feedback for the feedback as well :).
On 10/17/12 8:30 AM, Hal Finkel wrote:
>> + //Check that all of the users of the loop are inside the BB.
>> + for (Value::use_iterator it = I->use_begin(), e = I->use_end();
>> + it != e; ++it) {
>> + Instruction *U = cast<Instruction>(*it);
>> + BasicBlock *Parent = U->getParent();
>> + if (Parent != &BB) {
>> + DEBUG(dbgs() << "LV: Found an outside user for : "<< *U << "\n");
>> + return false;
>> + }
>> + }
> It seems like handling this case requires just as much code as this check. If there is an outside user, you just need to extract the last vector element and call replaceAllUsesWith(), right?
I may be mistaken, but I think that would require introducing additional
blocks:
You cannot put an extractelement operation behind the loop since if the
loop body is not executed, you have a value that is not a vector. You
also don't want to put it inside the loop because then the extract would
be executed in every iteration although it is only required once. That
would leave you the option of creating a branch around the loop and an
additional block between the loop exit and the join block (which holds a
phi for each of those values).
Other comments:
+ /// Create a broadcast instruction. This method generates a broadcast
+ /// instruction (shuffle) for loop invariant values and for the induction
+ /// value. If this is the induction variable then we extend it to N,
N+2, ...
+ /// this is needed because each iteration in the loop corresponds to
a SIMD
+ /// element.
+ Value *getBroadcastInstrs(Value *V);
Shouldn't it be <N, N+1, ... N+W-1> where W is the vectorization factor?
+ /// Returns the maximum vectorization factor that we *can* use to
vectorize
+ /// this loop. This does not mean that it is profitable to vectorize this
+ /// loop, only that it is legal to do so. This may be a large number. We
+ /// can vectorize to any SIMD width below this number.
+ unsigned getLoopMaxVF();
Denote that the number can only be a power of two.
+ DataLayout *TD;
Should probably be "DL" now since it is not called "TargetData" anymore ;).
getBroadcastInstrs()
Do you handle constants with this method as well? Is LLVM's constant
propagation pass powerful enough to create vector constants from the
shuffles? Otherwise, I would have some code that creates vector
constants from scalars directly.
+Value *SingleBlockLoopVectorizer::getConsecutiveVector(Value* Val) {
+ assert(Val->getType()->isVectorTy() && "Must be a vector");
+ assert(Val->getType()->getScalarType()->isIntegerTy() &&
+ "Elem must be an integer");
+ Type *ITy = Val->getType()->getScalarType();
+ VectorType *Ty = cast<VectorType>(Val->getType());
+ unsigned VF = Ty->getNumElements();
VF is already a member variable. Remove the last line or convert it into
an assertion.
Minor remark: I think getConsecutiveVector() is only required to run
once, so the resulting constant could be created in the constructor
already unless you don't want more member variables.
Also, the widening of the induction variable should probably only be
done once, so you may want to add another member variable for the
widened one or add it to the WidenMap.
+ for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
Is there a reason not to use op_begin()/op_end()?
ScalarizeInstruction()
You create new extract operations every time an instruction is used as
an operand to a non-vectorizable instruction. Can we rely on subsequent
optimization phases to remove this redundancy again?
More importantly however, you create extract/insert operations for every
non-vectorizable instruction independently. At some point, we probably
need some heuristic that analyzes chains of instructions: in cases like
alternating vectorizable and non-vectorizable instructions it may be
beneficial to scalarize instructions that are vectorizable to reduce
pack/unpack costs [1].
+ // Get the trip count from the BE count by adding 1.
+ ExitCount = SE->getAddExpr(ExitCount,
+ SE->getConstant(ExitCount->getType(), 1));
Bad alignment on the second argument. "BE" is a typo I think, it does
not appear anywhere else.
case Instruction::Select: {
+ // Widen selects.
+ Value *A = getVectorValue(Inst->getOperand(0));
+ Value *B = getVectorValue(Inst->getOperand(1));
+ Value *C = getVectorValue(Inst->getOperand(2));
+ WidenMap[Inst] = Builder->CreateSelect(A, B, C);
+ break;
+ }
If the condition of a select is uniform we don't want to widen it, so I
would suggest a special case here instead of calling
getBroadcastInstrs() via getVectorValue() if there is no mapping in
WidenMap.
+ case Instruction::Store: {
+ case Instruction::Load: {
What about loads/stores with entirely uniform indices? We could simply
insert a single scalar operation there instead of cloning them to do W
times the same thing.
+ // Check that the underlying objects of the reads are writes are either
Typo, should be "reads and writes".
Otherwise, the code looks pretty good to me. :)
I did not take a close look at the tests, though.
Cheers,
Ralf
[1] "Efficient SIMD Code Generation for Irregular Kernels", Kim and Han,
PPoPP'12
More information about the llvm-commits
mailing list