[PATCH] D78486: [SystemZ] Expand vector zero-extend into a shuffle.

Thu May 14 02:05:46 PDT 2020

jonpa updated this revision to Diff 263936.
jonpa added a comment.

Patch updated per suggestions.

> Ah, I see.  So this permute is one that had just been generated via getGeneralPermuteNode earlier in the loop?   It's a bit unfortunate to first create that ISD node and then right away delete it again.

Yes, that's the permute that was replaced.

> Maybe one option to try would be to do the check whether this permutation looks like an unpack (possibly requiring a permutation) at the top level *before* the loop. If this is true, then apply the permutation (in reverse) to the Bytes array, remove the zerovec from the Ops list, and run through the loop. At the end, remember to actually generate the unpack node. This would also remove the need to special-case handing the zerovec through the loop.

Ah, yes... :-) Now that it seems clear that we want to use just a single unpack, it is simpler to do it in this more integrated way in getNode().

> The whole operation would be worthwhile only if by performing the unpack we can remove the zerovec (i.e. it isn't still used elsewhere), and if by removing the zerovec the depth of the resulting tree (i.e. the critical path length) is also reduced. The depth is the log2 of the number of operands (rounded up), so if reducing the zerovec takes us from 3 to 2 ops, it is worthwhile, but if it takes us from 2 to 1 ops (or from 4 to 3 ops), then it probably is not, since adding the unpack would then just add one more instruction to the critical path.

That's a very good point - the idea behind building the tree of operations in the loop must be to avoid serialization so we shouldn't increase the depth with a trailing unpack. I added a check for this in case of >2 operands. For the case of a single source node shuffled with a zero vector (2 ops), there is instead a check that the unpack is enough by itself.

In rewriting this now I saw a (previous patch version) case where moving the zero-vector to last in Ops resulted in a vpkg (PACK) instead of a vperm. I am not sure if that was just a rare case, or if there are other operations than unpack that could also replace vperm if zero vector is put last. That is a separate issue from this, though, I think. Going to this version of the patch there were some minor changes which I think are the variations that arise from rearranging the Bytes vector when preparing for the unpack.

I still see two nice improvements over night: imagick:13% and x264: 4%.  :-)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78486/new/

https://reviews.llvm.org/D78486

Files:
  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
  llvm/lib/Target/SystemZ/SystemZISelLowering.h
  llvm/test/CodeGen/SystemZ/vec-move-16.ll
  llvm/test/CodeGen/SystemZ/vec-move-23.ll
  llvm/test/CodeGen/SystemZ/vec-move-24.ll
  llvm/test/CodeGen/SystemZ/vec-zext.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D78486.263936.patch
Type: text/x-patch
Size: 17089 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200514/af5315c5/attachment.bin>