[PATCH] Fix PR19657 : SLP vectorization doesn't combine scalar load to vector loads

Wed May 21 16:51:37 PDT 2014

It is the relative depth of common uses.

> Without looking too deeply, would it be feasible to defer the insertion of the extractions after the whole tree has been vectorized? If that sounds potentially fruitful I can dig a bit more on that direction.

No, because we would give up vectorizing the tree at the scheduling conflict (which is not really a conflict, but the algorithm we use can't tell because it relies on program order).

But I think that, for this scheduling problem (where the order we pick matters for the algorithm we use), we should be able to get away with picking the subtree traversal order based on basic block numbering: the earlier user has to come between the common use and the later user, so we look at the later one first, which will have the greater common depth if there is one. At least according to my back-of-the-napkin reasoning ...

// Descend first into the operand bundle that ends later in the basic block.
if (getLatestBBNumber(Left) > getLatestBBNumber(Right)) {
  buildTree_rec(Left, depth+1);
  buildTree_rec(Right, depth+1);
} else {
  buildTree_rec(Right, depth+1);
  buildTree_rec(Left, depth+1);
}
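
To make the ordering concrete, here is a minimal self-contained sketch of the same idea on a toy data structure. It is not the SLP vectorizer's code: ToyInst, visitLaterFirst and exampleOrder are hypothetical names made up for illustration, and BBNumber simply stands in for whatever getLatestBBNumber would return.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction (not an LLVM type): a name, its
// position in the basic block, and up to two operands.
struct ToyInst {
  std::string Name;
  int BBNumber; // position in program order within the block
  const ToyInst *Left;
  const ToyInst *Right;
};

// Records the visit order, descending first into the operand that appears
// later in the block. On a tie the right operand goes first, mirroring the
// else branch of the snippet above.
void visitLaterFirst(const ToyInst *I, std::vector<std::string> &Order) {
  if (!I)
    return;
  assert(I->BBNumber >= 0 && "block positions are assumed non-negative");
  Order.push_back(I->Name);
  bool LeftFirst =
      I->Left && (!I->Right || I->Left->BBNumber > I->Right->BBNumber);
  visitLaterFirst(LeftFirst ? I->Left : I->Right, Order);
  visitLaterFirst(LeftFirst ? I->Right : I->Left, Order);
}

// Builds a three-instruction toy block and returns the traversal order.
std::vector<std::string> exampleOrder() {
  ToyInst Load0{"load0", 0, nullptr, nullptr};
  ToyInst Load1{"load1", 1, nullptr, nullptr};
  ToyInst Add{"add", 2, &Load0, &Load1};
  std::vector<std::string> Order;
  visitLaterFirst(&Add, Order);
  return Order;
}
```

In this toy block the right operand (load1) sits later in the block than the left one (load0), so it is visited first.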

http://reviews.llvm.org/D3800