[PATCH] [CodeGenPrepare] Move extractelement close to store if they can be combined.

Thu Oct 23 15:02:05 PDT 2014

Hi Hal,

Thanks for the feedback, in particular for catching the corner case with division.

I’ll update the patch shortly in the meantime a couple of inline comments.

Cheers,
-Quentin

================
Comment at: include/llvm/Target/TargetLowering.h:268
@@ +267,3 @@
+  /// in one instruction.
+  bool canCombineStoreAndExtract() const { return CanCombineStoreAndExtract; }
+
----------------
hfinkel wrote:
> I would really like this to be a callback that passes the type and the extraction index. For PowerPC, for example, I can make this free for some types, but only for index 0. For other indices, I can do it with a shuffle (which obviously has some extra cost, although not much as the load/store-based domain crossing).
> 
> IMHO, a great interface for this could be:
> int getStoreExtractCombineCost(Type *VectorTy, Value *Idx);
> 
Although I like the direction of this, how do we get the cost of the original sequence?
In other words, what value do you propose we use to decide whether or not it is worth kicking this optimization given those two parameters?

Indeed, I expect that we will replace the condition that guards the optimization with something like:
TLI->getStoreExtractCombineCost(ExtractInst->getType(), ExtractInst->getOperand(1)) < CostOfIndividualStoreAndExtract

I assume you were thinking "== 0” since you talked about free. Is it what you had in mind?

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:3264
@@ +3263,3 @@
+    // scalar to vector.
+    // The vector chain does not.
+    unsigned ScalarCost =
----------------
hfinkel wrote:
> You should add: because the target has asserted that it can combine the extract with the store (for free).
Agreed.

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:3314
@@ +3313,3 @@
+      if (!isa<ConstantInt>(Val) && !isa<UndefValue>(Val)) {
+        if (isa<ConstantFP>(Val))
+          DEBUG(dbgs() << "Constant FP: could have handle that.\n");
----------------
hfinkel wrote:
> What is the complication to handling this case?
> 
> If you really can't test this on ARM, please explicitly note this as FIXME.
None, I just have not seen any motivating example.
I’ll see if I can come up with something, otherwise I will put a FIXME.

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:3324
@@ +3323,3 @@
+    return TLI.isOperationLegalOrCustom(ISDOpcode,
+                                        EVT::getEVT(getTransitionType(), true));
+  }
----------------
hfinkel wrote:
> You also need to exclude division (because introducing a division by undef is a bad idea), or make sure that if you do this for division, then the other fields are set to some non-zero quantity.
Good point, the problem may arise if the non-constant vector is used on the right hand side.

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:3387
@@ +3386,3 @@
+      NewVal =
+          ConstantInt::get(TransitionTy, cast<ConstantInt>(Val)->getValue());
+    else
----------------
hfinkel wrote:
> Not withstanding my previous comment on division, we could really put more undefs in this vector instead of always using a splat.
Although it would give more freedom to the lowering to use undef, I was afraid the backends would be able to simplify the sequence to a scalar operation. A quick test proved me I was wrong.
So I will update with more undef (unless we are on the right hand side of a division :)).
Are you seeing other cases where we shouldn’t use undef?

http://reviews.llvm.org/D5921