[llvm-commits] [PATCH] UINT_TO_FP of vectors

Wed Mar 16 14:00:49 PDT 2011

Hi Nadav,

On 03/16/2011 04:37 PM, Rotem, Nadav wrote:
> Hi,
>
> I attached a patch for legalizing UINT_TO_FP of vectors on platforms
> which do not have this operation (such as X86). The legalized code uses
> the vector INT_TO_FP operations and is faster than scalarizing.
>
[snip]
> SDValue VectorLegalizer::ExpandUINT_TO_FLOAT(SDValue Op) {
> +
> +
> +  EVT VT = Op.getOperand(0).getValueType();
> +  DebugLoc DL = Op.getDebugLoc();
> +
> +  // Make sure that the SINT_TO_FP and SRL instructions are available.
> +  if (!TLI.isOperationLegalOrCustom(ISD::SINT_TO_FP, VT) ||
> +      !TLI.isOperationLegalOrCustom(ISD::SRL, VT))
> +    return DAG.UnrollVectorOp(Op.getNode());
> +
> +  EVT SVT = VT.getScalarType();
> +  assert((SVT.getSizeInBits() == 64 || SVT.getSizeInBits() == 32) &&
> +      "Elements in vector-UINT_TO_FP must be 32 or 64 bits wide");
> +
> +  unsigned BW = SVT.getSizeInBits();
> +  SDValue HalfWord = DAG.getConstant(BW/2, VT);
> +
> +  // Constants to clear the upper part of the word.
> +  // Notice that we can also use SHL+SHR, but using a constant is slightly
> +  // faster on x86.
> +  uint64_t HWMask = (SVT.getSizeInBits()==64)?0x00000000FFFFFFFF:0x0000FFFF;
> +  SDValue HalfWordMask = DAG.getConstant(HWMask, VT);
> +
> +  // Two to the power of half-word-size.
> +  SDValue TWOHW = DAG.getConstantFP((1ULL<<(BW/2)), Op.getValueType());
> +
> +  // Clear upper part of LO, lower HI
> +  SDValue HI = DAG.getNode(ISD::SRL, DL, VT, Op.getOperand(0), HalfWord);
> +  SDValue LO = DAG.getNode(ISD::AND, DL, VT, Op.getOperand(0), HalfWordMask);
> +
> +  // Convert hi and lo to floats
> +  // Convert the hi part back to the upper values
> +  SDValue fHI = DAG.getNode(ISD::SINT_TO_FP, DL, Op.getValueType(), HI);
> +          fHI = DAG.getNode(ISD::FMUL, DL, Op.getValueType(), fHI, TWOHW);
> +  SDValue fLO = DAG.getNode(ISD::SINT_TO_FP, DL, Op.getValueType(), LO);
> +
> +  // Add the two halves
> +  return DAG.getNode(ISD::FADD, DL, Op.getValueType(), fHI, fLO);
> +}
> +

Thanks for working on this, but your code seems suboptimal to me. If I'm
not mistaken, you should be able to turn
    UINT_TO_FP(a) into SINT_TO_FP(a & ~SIGNBIT) - SINT_TO_FP(a & SIGNBIT)
which gets rid of the floating-point multiplication and replaces the
shift by an AND, at the cost of one extra vector constant. (a & SIGNBIT
is either 0 or the most negative integer, so subtracting its conversion
adds 2^(BW-1) back exactly when the top bit was set.) In theory, using
PANDN on x86, one memory load should be enough, but well... What do you
think?
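
To make the comparison concrete, here is a minimal scalar sketch of both
expansions for 32-bit elements. It is not the patch itself: the function
names uintToFpHalves and uintToFpSignBit are made up for illustration, and
the destination type is assumed to be double so that every intermediate
value is exact.

```cpp
#include <cstdint>

// Scalar model of the split-in-halves expansion from the patch,
// assuming 32-bit elements and a double result (hypothetical helper).
double uintToFpHalves(uint32_t a) {
  const unsigned half = 16;                            // BW/2
  int32_t hi = static_cast<int32_t>(a >> half);        // SRL
  int32_t lo = static_cast<int32_t>(a & 0x0000FFFFu);  // AND with HalfWordMask
  double fhi = static_cast<double>(hi) * 65536.0;      // SINT_TO_FP, FMUL by 2^16
  double flo = static_cast<double>(lo);                // SINT_TO_FP
  return fhi + flo;                                    // FADD
}

// Scalar model of the sign-bit variant suggested above
// (hypothetical helper, same assumptions).
double uintToFpSignBit(uint32_t a) {
  const uint32_t signBit = 0x80000000u;
  // Low 31 bits: always non-negative when read as a signed value.
  int32_t low = static_cast<int32_t>(a & ~signBit);
  // Sign bit alone: reads as 0 or INT32_MIN as a signed value.
  int32_t top = (a & signBit) ? INT32_MIN : 0;
  // Subtracting -2^31 adds 2^31 back when the top bit was set.
  return static_cast<double>(low) - static_cast<double>(top);
}
```

Both functions should agree with static_cast<double>(a) for every 32-bit
input, since double can represent all of these values exactly. With a
single-precision result the sign-bit variant may round twice (once when
converting the low 31 bits, once in the subtraction), so it is probably
worth checking the results against the halves approach in that case.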

Dirk