[PATCH] D29489: Optimize SETCC + VSEL of incompatible or illegal types

Fri Feb 3 05:13:06 PST 2017

jonpa created this revision.

I noticed that SETCC+VSELECT resulted in unnecessary sign-extensions and truncations on SystemZ. On SystemZ, TypeWidenVec is returned for any legal vector type, and the boolean contents is zero or negative one (all one's or zero's).

The SETCC vector mask type is originally with element type i1. The type legalizer then promotes this vector to a legal vector type. For two elements, this always becomes v2i64, four elements always becomes v4i32, etc (same number of elements promoted to a legal vector type). The type legalizer then in turn fixes VSELECT to work with this promoted mask

This works well when the widths of the elements compared by SETCC matches those that are selected by VSELECT. However, in all the cases where those widths do not match, for example SETCC(v2i32) + VSELECT(v2i64), the promotion of the SETCC result remains unoptimized, as well as the truncation / extension of it to suit VSELECT.

For example:

  define <16 x i8> @fun(<16 x i8> %val1, <16 x i8> %val2,
                                        <16 x i8> %val3, <16 x i8> %val4) {
    %cmp = icmp eq <16 x i8> %val1, %val2
    %ret = select <16 x i1> %cmp, <16 x i8> %val3, <16 x i8> %val4
    ret <16 x i8> %ret
  }

  ->

  # BB#0:
      vceqb    %v0, %v24, %v26
      vsel    %v24, %v28, %v30, %v0
      br    %r14

while having just two elements:

  define <2 x i8> @fun(<2 x i8> %val1, <2 x i8> %val2,
                                      <2 x i8> %val3, <2 x i8> %val4) {
    %cmp = icmp eq <2 x i8> %val1, %val2
    %ret = select <2 x i1> %cmp, <2 x i8> %val3, <2 x i8> %val4
    ret <2 x i8> %ret
  }

  ->

  # BB#0:
      vceqb    %v0, %v24, %v26
      vuphb    %v0, %v0
      vuphh    %v0, %v0
      vuphf    %v0, %v0
      vrepih    %v1, 1807
      vperm    %v0, %v0, %v0, %v1
      veslb    %v0, %v0, 7
      vesrab    %v0, %v0, 7
      vsel    %v24, %v28, %v30, %v0
      br    %r14

This should really have been the same code as in the case of full vectors. In the longer version, there is first the vector compare element, then three unpacks to make the SETCC result legal, then a truncation (which gets the original vector back). Then, the second problem I have found is also highlighted here: after the truncation (vperm) there is an inreg sign extension of i1 (vector element shift left + artim. shift right). This is also completely unnecessary, due to the defined boolean contents of the target.

I have tried to tackle these two problems, and while not being sure of exactly how this should be done, the first attempt here at least achieves the right optimizations in simple tests for all these cases.

For the extensions / truncations back and forth produced by the type legalizer, I saw three alternatives.

1. Try to improve the legalizer to not do this. I thought this seemed bad because we probably don't want any optimizations happening in a legalize phase, right? That would however be simpler than trying to detect the cases during DAGCombiner.
2. Add code to handle this in DAGCombiner.
3. Do a custom lowering for the SystemZ target. It seems other targets do this, and perhaps this is what is expected?

I tried first to experiment with the DAGCombiner::visitVSELECT() method. It was not as simple as one would have hoped to detect these cases. I am not sure at all if this code is acceptable. For instance, can it be assumed that a BUILD_VECTOR used by VSELECT is always using the "normal" elements order from the vector(s) produced by SETCC(s)? Or should this (and other things) be verified?

For the second problem, of the unneeded sign extension, I added a handling for this in DAGCombiner::visitSIGN_EXTEND_INREG(). I am with this code saying that it is always true that a vector of i1's is always correct if the boolean contents is ZeroOrNegativeOneBooleanContent. So there is never a need to fix this up with a sign extension from i1. I am however neither sure here if i1 really can be treated like this. It makes sense in the examples I have worked with.

This is a first attempt, and I am not sure what the best way to improve on this is. Would it for instance be worth trying SELECT_CC instead for this target? I even tried to return TypeWidenVector in getPreferredVectorAction() for vectors of i1, but that didn't work at all in this case.

Feedback on this much appreciated.

https://reviews.llvm.org/D29489

Files:
  lib/CodeGen/SelectionDAG/DAGCombiner.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D29489.86951.patch
Type: text/x-patch
Size: 4295 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170203/3ce11914/attachment.bin>