[llvm-commits] [PATCH] LoopVectorizer

Wed Oct 17 01:59:11 PDT 2012

Hi,

Thanks Nadav, this is a great start for the loop vectorization 
implementation!
I haven't done a code review for LLVM so far, so I'll be happy to get 
some feedback for the feedback as well :).

On 10/17/12 8:30 AM, Hal Finkel wrote:
>> +    //Check that all of the users of the loop are inside the BB.
>> +    for (Value::use_iterator it = I->use_begin(), e = I->use_end();
>> +         it != e; ++it) {
>> +      Instruction *U = cast<Instruction>(*it);
>> +      BasicBlock *Parent = U->getParent();
>> +      if (Parent != &BB) {
>> +        DEBUG(dbgs() << "LV: Found an outside user for : "<< *U << "\n");
>> +        return false;
>> +      }
>> +    }
> It seems like handling this case requires just as much code as this check. If there is an outside user, you just need to extract the last vector element and call replaceAllUsesWith(), right?
I may be mistaken, but I think that would require introducing additional 
blocks:
You cannot put an extractelement operation behind the loop since if the 
loop body is not executed, you have a value that is not a vector. You 
also don't want to put it inside the loop because then the extract would 
be executed in every iteration although it is only required once. That 
would leave you the option of creating a branch around the loop and an 
additional block between the loop exit and the join block (which holds a 
phi for each of those values).

Other comments:

+  /// Create a broadcast instruction. This method generates a broadcast
+  /// instruction (shuffle) for loop invariant values and for the induction
+  /// value. If this is the induction variable then we extend it to N, 
N+2, ...
+  /// this is needed because each iteration in the loop corresponds to 
a SIMD
+  /// element.
+  Value *getBroadcastInstrs(Value *V);

Shouldn't it be <N, N+1, ... N+W-1> where W is the vectorization factor?

+  /// Returns the maximum vectorization factor that we *can* use to 
vectorize
+  /// this loop. This does not mean that it is profitable to vectorize this
+  /// loop, only that it is legal to do so. This may be a large number. We
+  /// can vectorize to any SIMD width below this number.
+  unsigned getLoopMaxVF();

Denote that the number can only be a power of two.

+  DataLayout *TD;

Should probably be "DL" now since it is not called "TargetData" anymore ;).

getBroadcastInstrs()
Do you handle constants with this method as well? Is LLVM's constant 
propagation pass powerful enough to create vector constants from the 
shuffles? Otherwise, I would have some code that creates vector 
constants from scalars directly.

+Value *SingleBlockLoopVectorizer::getConsecutiveVector(Value* Val) {
+  assert(Val->getType()->isVectorTy() && "Must be a vector");
+  assert(Val->getType()->getScalarType()->isIntegerTy() &&
+         "Elem must be an integer");
+  Type *ITy = Val->getType()->getScalarType();
+  VectorType *Ty = cast<VectorType>(Val->getType());
+  unsigned VF = Ty->getNumElements();

VF is already a member variable. Remove the last line or convert it into 
an assertion.
Minor remark: I think getConsecutiveVector() is only required to run 
once, so the resulting constant could be created in the constructor 
already unless you don't want more member variables.
Also, the widening of the induction variable should probably only be 
done once, so you may want to add another member variable for the 
widened one or add it to the WidenMap.

+  for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {

Is there a reason not to use op_begin()/op_end()?

ScalarizeInstruction()
You create new extract operations every time an instruction is used as 
an operand to a non-vectorizable instruction. Can we rely on subsequent 
optimization phases to remove this redundancy again?

More importantly however, you create extract/insert operations for every 
non-vectorizable instruction independently. At some point, we probably 
need some heuristic that analyzes chains of instructions: in cases like 
alternating vectorizable and non-vectorizable instructions it may be 
beneficial to scalarize instructions that are vectorizable to reduce 
pack/unpack costs [1].

+  // Get the trip count from the BE count by adding 1.
+  ExitCount = SE->getAddExpr(ExitCount,
+ SE->getConstant(ExitCount->getType(), 1));

Bad alignment on the second argument. "BE" is a typo I think, it does 
not appear anywhere else.

case Instruction::Select: {
+        // Widen selects.
+        Value *A = getVectorValue(Inst->getOperand(0));
+        Value *B = getVectorValue(Inst->getOperand(1));
+        Value *C = getVectorValue(Inst->getOperand(2));
+        WidenMap[Inst] = Builder->CreateSelect(A, B, C);
+        break;
+      }

If the condition of a select is uniform we don't want to widen it, so I 
would suggest a special case here instead of calling 
getBroadcastInstrs() via getVectorValue() if there is no mapping in 
WidenMap.

+      case Instruction::Store: {
+      case Instruction::Load: {

What about loads/stores with entirely uniform indices? We could simply 
insert a single scalar operation there instead of cloning them to do W 
times the same thing.

+  // Check that the underlying objects of the reads are writes are either

Typo, should be "reads and writes".

Otherwise, the code looks pretty good to me. :)
I did not take a close look at the tests, though.

Cheers,
Ralf

[1] "Efficient SIMD Code Generation for Irregular Kernels", Kim and Han, 
PPoPP'12