[PATCH] D37737: [SLPVectorizer] Merge subsequent gather loads.

Mon Sep 18 06:59:03 PDT 2017

rengolin added a comment.

Hi Florian,

There are a lot of assumptions in there that I'm not sure hold on all cases (so they should be guarded) as well as flag-passing that is at least unnecessary (a static isGatherLoad or something could have done the job), plus a few other comments here and there.

I don't think it's worth splitting this into a separate pass (you're really just collecting the loads and vectorising them), but you do need the standard contract for performance changes:

- Appropriate guard to limit not only the (sub)architecture this will run
- Logic that account for which cases for them it is beneficial (TargetInfo hook)
- Benchmark results on all affected targets (whatever SGEMM benchmark you used)
- Show that no noticeable change happened in the test-suite (benchmark mode) because of it
- If possible, also run SPEC or some other, to show similar lack of negative impact

cheers,
--renato

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:558
   /// A negative number means that this is profitable.
-  int getTreeCost();
+  int getTreeCost(bool GatherLoad=false);

----------------
This looks like a very specific change to such a generic function

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2354
+    // instructions.
+    if (GatherLoads)
+      continue;
----------------
why is an argument flag inside the loop, and after two other non-constant checks? this doesn't make sense

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3509

+  if (!GatherLoads.empty()) {
+    SmallVector<Instruction *, 4> LoadV(GatherLoads.begin(), GatherLoads.end());
----------------
is this *always* correct? Can we merge the loads and will be this profitable for *all* architectures?

https://reviews.llvm.org/D37737