[llvm] r250032 - [LoopVectorize] Shrink integer operations into the smallest type possible

Fri Nov 6 00:13:47 PST 2015

Hi James,

Thanks for working on this! Should we have the same issue in the SLP vectorizer as well?

Also, some remarks inline:

> @@ -434,3 +437,130 @@ llvm::Value *llvm::getSplatValue(Value *
> 
>   return InsertEltInst->getOperand(1);
> }
> +
> +DenseMap<Instruction*, uint64_t> llvm::computeMinimumValueSizes(
> +  ArrayRef<BasicBlock*> Blocks, DemandedBits &DB,
> +  const TargetTransformInfo *TTI) {
Would it be better to return the map by reference to avoid unnecessary copying?

> +
> +  // DemandedBits will give us every value's live-out bits. But we want
> +  // to ensure no extra casts would need to be inserted, so every DAG
> +  // of connected values must have the same minimum bitwidth.
> +  EquivalenceClasses<Value*> ECs;
> +  SmallVector<Value*,16> Worklist;
> +  SmallPtrSet<Value*,4> Roots;
> +  SmallPtrSet<Value*,16> Visited;
> +  DenseMap<Value*,uint64_t> DBits;
Why can’t we just use Instruction* instead of Value* in all these sets? It looks like it could save some casts in the code.

> @@ -2009,6 +2034,7 @@ InnerLoopVectorizer::getVectorValue(Valu
>   // If this scalar is unknown, assume that it is a constant or that it is
>   // loop invariant. Broadcast V and save the value for future uses.
>   Value *B = getBroadcastInstrs(V);
> +
Unneeded whitespace. There are also some other spots formatted badly - maybe reformat it while the code is still relatively fresh?

> @@ -5168,6 +5314,8 @@ LoopVectorizationCostModel::getInstructi
>   case Instruction::ICmp:
>   case Instruction::FCmp: {
>     Type *ValTy = I->getOperand(0)->getType();
> +    if (VF > 1 && MinBWs.count(dyn_cast<Instruction>(I->getOperand(0))))
> +      ValTy = IntegerType::get(ValTy->getContext(), MinBWs[I]);
We look for I->getOperand(0) in MinBWs, and then use MinBWs[I] (not operand 0) - is it intentional? If not, we can also avoid double querying MinBWs by using MinBWs.find.

> --- llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll (added)
> +++ llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll Mon Oct 12 07:34:45 2015
> @@ -0,0 +1,243 @@
> +; RUN: opt -S < %s -basicaa -loop-vectorize -simplifycfg -instsimplify -instcombine -licm -force-vector-interleave=1 2>&1 | FileCheck %s
Do we really need to run all these passes to check loop-vectorizer?

Thanks,
Michael