[PATCH] D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size

Sun Aug 12 19:32:19 PDT 2018

dcaballe added a comment.

Thanks, Ayal! Some comments below.
Do you see any potential issue that could make modeling this in the VPlan native path complicated once we have predication?

Thanks,
Diego

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4965
+    LLVM_DEBUG(dbgs() << "LV: Aborting - trip count below given threshold for "
+                      << "loop with scalar iterations.\n");
     return None;
----------------
I'm trying to understand the purpose of thsi check. Prevent masked vectorization if TC is lower than `TinyTripCountInterleaveThreshold` (i.e., 128)?. Should we use an independent threshold for this?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5218
+  // variable of the vector loop wraps to zero, when tail is folded by masking;
+  // this currently happens when OptForSize, inwhich case IC is set to 1 above.
   unsigned IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /
----------------
inwhich -> in which?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6355
+  if (Legal->blockNeedsPredication(OrigLoop->getHeader()))
+    CM.InterleaveInfo.reset();
+
----------------
Just curious. Could we prevent the computation or interleave groups for these cases instead of doing a reset?

================
Comment at: lib/Transforms/Vectorize/VPlan.h:609
   /// VPlan opcodes, extending LLVM IR with idiomatics instructions.
-  enum { Not = Instruction::OtherOpsEnd + 1 };
+  enum { Not = Instruction::OtherOpsEnd + 1, ICmpULE };

----------------
I'm worried that this new opcode could be problematic since now we can have compare instructions represented as VPInstructions with Instruction::ICmp and Instruction::FCmp opcodes and VPInstructions with VPInstruction::ICmpULE. Internally, we have a VPCmpInst subclass to model I/FCmp opcodes and their predicates. Do you think it would be better to upstream that subclass first? 

================
Comment at: lib/Transforms/Vectorize/VPlan.h:1126
+  VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {
+    BackedgeTakenCount = new VPValue();
+  }
----------------
Instead of using an "empty" VPValue to model the BTC, would it be possible to model the actual operations to compute the BTC? We would only need a sub, right?

Repository:
  rL LLVM

https://reviews.llvm.org/D50480