[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

hfinkel at anl.gov
Thu Feb 19 17:10:28 PST 2015

Alright, there are several separable changes here:

1. The TTI change. I don't think this is the right way to solve the problem. Even if it were, I wouldn't change the backend to see IR types; that's not necessary. Just convert the IR types into backend types (see, for example, TLI->getTypeLegalizationCost(Ty) in the default implementation of getArithmeticInstrCost).

  It seems like, in general, you want a way to measure the latency of some chain of instructions (other than just counting them). This is a general problem, and I recommend going after that issue as follow-up work.

2. As noted below, I don't think you're counting the right thing (or at least, you don't seem to be counting what I'd expect). Can you please elaborate?

Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4564
@@ +4563,3 @@
+          U = *I++;
+          assert((I.atEnd()) &&
+                 "Expected exactly one use of reduction variable.");
I don't understand what this is doing. Don't you want to count the number of instructions needed to compute the value being 'added' to the reduction?

Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4598
@@ +4597,3 @@
+      Type *FloatTy = Type::getFloatTy(Context);
+      if (TTI.enableAggressiveFMAFusion(FloatTy) && FPDistance > 1)
+        FPDistance--;
This seems like an odd one-off to have here. FMAs are important, granted, but you don't even check if there are FMAs (or things likely to form FMAs) in the loop.



More information about the llvm-commits mailing list