[PATCH, PowerPC] Fix and improve vector comparisons

Ulrich Weigand Ulrich.Weigand at de.ibm.com
Fri Aug 1 09:06:18 PDT 2014



Hello,

I've run into a code-gen bug related to vector comparisons in the JIT code
generated by mesa/llvmpipe on PowerPC.   The problem happens when expanding
a "fcmp uge <4 x float>" instruction (which you don't usually get from C/C+
+ source code).  This is currently passed to common code to expand, which
tests for unordered operands and then falls back to creating a "fcmp ge <4
x float>".  This is processed by the back-end, but incorrectly.

In getVCmpInst, we run into this case:

    case ISD::SETGE:
      if (VecVT == MVT::v16i8)
        return PPC::VCMPGTSB;
      else if (VecVT == MVT::v8i16)
        return PPC::VCMPGTSH;
      else if (VecVT == MVT::v4i32)
        return PPC::VCMPGTSW;
      else if (VecVT == MVT::v4f32)
        return HasVSX ? PPC::XVCMPGTSP : PPC::VCMPGTFP;

so we return VCMPGTFP.  But then in SelectSETCC, we have:

      case ISD::SETGE:
      case ISD::SETOGE:
      case ISD::SETUGE: {
        // Small optimization: Altivec provides a 'Vector Compare Greater
Than
        // or Equal To' instruction (vcmpgefp), so in this case there is no
        // need for extra logic for the equal compare.
        if (VecVT.getSimpleVT().isFloatingPoint()) {
          return CurDAG->SelectNodeTo(N, VCmpInst, VecVT, LHS, RHS);

So even though the comment says we want to emit a Compare Greater Than or
Equal To, we simply emit a Compare Greate Than.

Note that for the SETOGE case, getVCmpInst does indeed return PPC::VCMPGTFP
-- but not for SETGE.

Also, falling back to common code to expand SETUGE is inefficient anyway:
we could simply implement SETUGE by negating the result of a SETOLT, which
in turn is simply a SETOGT with swapped inputs.  This results in just two
instructions instead of 6 or 7 the common code expander produces.  (Similar
optimizations apply to SETULE, SETUGT, SETULT of course.)

When looking at this code, I noticed a number of other cases where code
quality could be improved as well: SETOLE can be implemented as SETOGE with
swapped inputs, and for all the integer comparisons, we can do SET(U)GE by
negating the result of a SET(U)LT instead of performing two separate
comparisons for EQ and GT.

The current structure of SelectCC / getVCmpInst makes it somewhat hard to
fix this bug and implement those enhancements, however.  The two issues
that make the code more complex than it ought to be in my opinion are:
getVCmpInst tries to handle floating-point and integer comparisons using
the same logic, even though they are quite different; and, both getVCmpInst
and its caller make some decisions independently, while implicitly relying
on the other side to make the same decision.

The attached patch fixes all this by moving all decision logic to
getVCmpInst, which gets two extra boolean outputs indicating to its caller
whether it needs to swap the input operands and/or negate the result of the
comparion.  In no case do we need to create two separate comparisons any
more.

(See attached file: diff-llvm-vcmp)

Does this look OK?


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU/Linux compilers and toolchain
  IBM Deutschland Research & Development GmbH
  Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-vcmp
Type: application/octet-stream
Size: 18760 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140801/9141d986/attachment.obj>


More information about the llvm-commits mailing list