r177819 - Make clang to mark static stack allocations with lifetime markers to enable a more aggressive stack coloring.

Mon Mar 25 12:58:31 PDT 2013

Ulrich Weigand/Germany/IBM wrote on 25.03.2013 20:31:20:

> However, after the next "Combine redundant instructions" pass, we get:
>
> for.end9:                                         ; preds = %for.body4
>   %bitcast = extractelement <8 x double> %add, i32 7
>   %2 = extractelement <8 x double> %add, i32 6
>   %3 = extractelement <8 x double> %add, i32 5
>   %4 = extractelement <8 x double> %add, i32 4
>   %5 = extractelement <8 x double> %add, i32 3
>   %6 = extractelement <8 x double> %add, i32 2
>   %7 = extractelement <8 x double> %add, i32 1
>   %8 = extractelement <8 x double> %add, i32 0
>   %call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([25 x i8]* @.str, i64 0, i64 0), double %bitcast, double %2, double %3, double %4, double %5, double %6, double %7, double %8) #1
>
> which looks incorrect to me; those extractelement operations would
> have been equivalent to the above shifts and truncations on a
> little-endian machine, but not a big-endian one.

Right, it seems the bug is in OptimizeIntToFloatBitCast in
InstCombineCasts.cpp:

  // bitcast(trunc(lshr(bitcast(somevector), cst))
  ConstantInt *ShAmt = 0;
  if (match(Src, m_Trunc(m_LShr(m_BitCast(m_Value(VecInput)),
                                m_ConstantInt(ShAmt)))) &&
      isa<VectorType>(VecInput->getType())) {
    VectorType *VecTy = cast<VectorType>(VecInput->getType());
    unsigned DestWidth = DestTy->getPrimitiveSizeInBits();
    if (VecTy->getPrimitiveSizeInBits() % DestWidth == 0 &&
        ShAmt->getZExtValue() % DestWidth == 0) {
      // If the element type of the vector doesn't match the result type,
      // bitcast it to be a vector type we can extract from.
      if (VecTy->getElementType() != DestTy) {
        VecTy = VectorType::get(DestTy,
                                VecTy->getPrimitiveSizeInBits() / DestWidth);
        VecInput = IC.Builder->CreateBitCast(VecInput, VecTy);
      }

      unsigned Elt = ShAmt->getZExtValue() / DestWidth;
      return ExtractElementInst::Create(VecInput,
                                        IC.Builder->getInt32(Elt));
    }
  }

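The computation of "Elt" above looks to be correct only for little-endian
platforms: there, a right shift by ShAmt bits selects element
ShAmt / DestWidth, but on a big-endian platform element 0 sits in the most
significant bits of the integer, so the element order is reversed.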
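
To illustrate the index mapping I have in mind, here is a minimal
standalone sketch. The helper name eltIndexForShift and its parameters are
hypothetical and not part of LLVM; in the real code the endianness would
have to come from the target data layout. It only shows the arithmetic,
not a proposed patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): given a right-shift amount in bits
// applied to the integer view of a vector, return the index of the vector
// element that trunc(lshr(bitcast(vec), ShAmt)) selects.
unsigned eltIndexForShift(uint64_t ShAmt, unsigned DestWidth,
                          unsigned NumElts, bool IsBigEndian) {
  // The shift must land exactly on an element boundary.
  assert(DestWidth != 0 && ShAmt % DestWidth == 0);

  // Position of the selected chunk, counted from the least significant bits.
  unsigned Elt = static_cast<unsigned>(ShAmt / DestWidth);
  assert(Elt < NumElts);

  // Little-endian: element 0 occupies the least significant bits, so the
  // chunk position is the element index.
  // Big-endian: element 0 occupies the most significant bits, so the
  // order is reversed.
  return IsBigEndian ? NumElts - 1 - Elt : Elt;
}
```

For the <8 x double> case quoted above, a shift of 0 would then select
element 0 on little-endian but element 7 on big-endian.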

If I disable the whole OptimizeIntToFloatBitCast optimization, I get the
correct result again.

Chris, this optimization came in via your commits:

------------------------------------------------------------------------
r112232 | lattner | 2010-08-26 17:14:59 -0500 (Thu, 26 Aug 2010) | 5 lines

optimize "integer extraction out of the middle of a vector" as produced
by SRoA.  This is part of rdar://7892780, but needs another xform to
expose this.

------------------------------------------------------------------------
r112227 | lattner | 2010-08-26 16:55:42 -0500 (Thu, 26 Aug 2010) | 43 lines

optimize bitcast(trunc(bitcast(x))) where the result is a float and 'x'
is a vector to be a vector element extraction.

Would you agree that this is incorrect on big-endian machines, or am I
missing something here?

Bye,
Ulrich