[PATCH] Reducing the costs of cast instructions to enable more vectorization of smaller types in LoopVectorize

Thu Jun 11 02:56:55 PDT 2015

Hi Sam,

Thanks for making this update. I have lots of coding style nits, but the main issue is that two algorithms need to change from purely recursive to iterative. They'll still do the same thing, just without overflowing the stack on large programs.

Cheers,

James

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2639
@@ +2638,3 @@
+void adjustForClampedInstrs(LoopVectorizationCostModel *CM, unsigned VF) {
+  auto &ClampedVecTys = CM->getClampedInstrTys(VF);
+  for (auto CV : ClampedVecTys) {
----------------
This is recursive, and unboundedly so. This means that with a large function, we may blow the stack.

Instead of recursing, you could change this to be a simple iterative algorithm. Just iterate as you are currently doing, but call visitClampedInstr() for ALL instructions that are clamped, not just TruncInsts.

Store the newly created clamped instructions in a map<Value*,Value*>, and when you're visiting a clamped instruction, check the map for each of its operands. If an entry exists, we've already visited that operand, so use the mapped value. Otherwise insert a cast.

Then, when we hit a trunc instruction, do the replacement (using OldTrunc->replaceAllUsesWith(NewTrunc)) and that'll make the entire tree live.

This will only really work if you visit all instructions in dominator order, so we visit defs before uses. Luckily the caller of this function already finds such an ordering (LoopBlocksDFS), so you can just use that.
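
To make the shape of this concrete, here is a rough sketch on a toy model rather than real LLVM IR. ToyValue, RewriteResult and rewriteClamped() are all made-up names for illustration, values are identified by their index in a vector that is assumed to already be in defs-before-uses order, and "creating" an instruction is modelled as just recording a name:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an IR value (hypothetical, not an LLVM type).
struct ToyValue {
  std::string Name;
  bool Clamped;                      // should be recreated at the narrow type
  bool IsTrunc;                      // root of a clamped tree
  std::vector<std::size_t> Operands; // indices of earlier values
};

struct RewriteResult {
  std::map<std::size_t, std::string> NarrowFor; // old index -> narrow value
  std::vector<std::string> InsertedCasts;       // casts for unmapped operands
  std::vector<std::size_t> ReplacedTruncs;      // truncs whose uses get rewired
};

// Single forward pass: no recursion, so stack depth is constant.
RewriteResult rewriteClamped(const std::vector<ToyValue> &Vals) {
  RewriteResult R;
  for (std::size_t I = 0; I < Vals.size(); ++I) {
    const ToyValue &V = Vals[I];
    if (!V.Clamped)
      continue;
    for (std::size_t Op : V.Operands) {
      assert(Op < I && "operands must be visited before their users");
      // Already visited: reuse the narrow value. Otherwise insert a cast.
      if (R.NarrowFor.count(Op) == 0)
        R.InsertedCasts.push_back("cast(" + Vals[Op].Name + ")");
    }
    R.NarrowFor[I] = V.Name + ".narrow";
    // In real code this is where replaceAllUsesWith() would go.
    if (V.IsTrunc)
      R.ReplacedTruncs.push_back(I);
  }
  return R;
}
```

The toy inserts a fresh cast each time it sees an unmapped operand; whether to cache those as well is up to you.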

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4581
@@ +4580,3 @@
+bool
+LoopVectorizationCostModel::isNarrowInstruction(Instruction *I,
+                                                unsigned VF,
----------------
This is also unboundedly recursive. But it can be changed into an iterative algorithm.

Instead of searching bottom-up, search top-down. For every instruction (visiting defs before uses, so in LoopBlocksDFS order!), check if it is narrow. That's easy because you just need to check its immediate operands. If so, store it.

Because we visit all instructions, and we visit all defs before uses, any single isNarrowInstruction() call only needs to check its immediate operands - it doesn't need to go crawling through a tree. This removes the recursion and makes for a more elegant algorithm.
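
Roughly what I have in mind, again on a toy model and not the real cost model. ToyInstr, NarrowSource and computeNarrow() are hypothetical names; "narrow" here is simplified to "is itself a narrow source (say, an extend from a small type) or has only narrow operands", which is an assumption and not the exact rule in your patch. Loop-header phis, whose back-edge operands are defined later, are outside what this sketch handles:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an instruction (hypothetical, not an LLVM type).
struct ToyInstr {
  bool NarrowSource;                 // produces a narrow value by itself
  std::vector<std::size_t> Operands; // indices of earlier instructions
};

// One top-down pass over instructions listed defs before uses.
// Each query looks only at immediate operands whose answers are
// already stored, so nothing recurses.
std::vector<bool> computeNarrow(const std::vector<ToyInstr> &Instrs) {
  std::vector<bool> IsNarrow(Instrs.size(), false);
  for (std::size_t I = 0; I < Instrs.size(); ++I) {
    const ToyInstr &In = Instrs[I];
    if (In.NarrowSource) {
      IsNarrow[I] = true;
      continue;
    }
    // A leaf that is not a narrow source stays wide.
    bool AllNarrow = !In.Operands.empty();
    for (std::size_t Op : In.Operands) {
      assert(Op < I && "operands must be visited before their users");
      AllNarrow = AllNarrow && IsNarrow[Op];
    }
    IsNarrow[I] = AllNarrow;
  }
  return IsNarrow;
}
```

The stored results double as the set that the later rewrite consults, so isNarrowInstruction() becomes a lookup.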

http://reviews.llvm.org/D9822
