[PATCH] Reducing the costs of cast instructions to enable more vectorization of smaller types in LoopVectorize

Fri Jun 19 05:35:59 PDT 2015

Hi Sam,

Some comments from me inline.

Cheers,
Silviu


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1212
@@ -1168,1 +1211,3 @@
+  std::map<unsigned, std::map<Instruction*, Type*>> NarrowInstrs;
+  std::map<unsigned, std::map<Instruction*, Type*>> TBCNarrowInstrs;
 };
----------------
I think the name "TBCNarrowInstrs" is potentially confusing. Maybe TBC -> Candidate?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5236
@@ +5235,3 @@
+
+inline Type *getLargestType(Type *T0, Type *T1) {
+  if (T0->isIntegerTy()) {
----------------
Why use inline here? I would normally trust the compiler to do the right thing.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5298
@@ +5297,3 @@
+  // To get this value, both of the operands are i8 which they can only produce
+  // results that touch 16 bits.
+  if (I->getOpcode() == Instruction::Sub) {
----------------
The check for sub could potentially be removed, but it depends on exactly what we're checking for.
It would be interesting to have an example where sub is an issue that would prevent the narrowing.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5309
@@ +5308,3 @@
+  if (I->getOpcode() == Instruction::LShr ||
+      I->getOpcode() == Instruction::AShr) {
+    unsigned ShiftVal = cast<ConstantInt>(I->getOperand(1))->getZExtValue();
----------------
The comment says "logical shifts" but the code also allows ashr? Is this correct?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5386
@@ +5385,3 @@
+    return;
+
+  unsigned Opc = I->getOpcode();
----------------
It would be a good idea to have a comment that explains how these were chosen, and why some of them need extra checking.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5420
@@ +5419,3 @@
+  const APInt i8MaxValue = APInt::getMaxValue(8);
+  const APInt i16MaxValue = APInt::getMaxValue(16);
+  SmallVector<Type*, 2> NarrowOpTys;
----------------
Why consider here only narrowing to i8 or i16? Would narrowing from i64 to i32 also make sense?

http://reviews.llvm.org/D9822

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/