r177819 - Make clang to mark static stack allocations with lifetime markers to enable a more aggressive stack coloring.
Ulrich Weigand
Ulrich.Weigand at de.ibm.com
Mon Mar 25 12:58:31 PDT 2013
Ulrich Weigand/Germany/IBM wrote on 25.03.2013 20:31:20:
> However, after the next "Combine redundant instructions" pass, we get:
>
> for.end9: ; preds = %for.body4
> %bitcast = extractelement <8 x double> %add, i32 7
> %2 = extractelement <8 x double> %add, i32 6
> %3 = extractelement <8 x double> %add, i32 5
> %4 = extractelement <8 x double> %add, i32 4
> %5 = extractelement <8 x double> %add, i32 3
> %6 = extractelement <8 x double> %add, i32 2
> %7 = extractelement <8 x double> %add, i32 1
> %8 = extractelement <8 x double> %add, i32 0
> %call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr
> inbounds ([25 x i8]* @.str, i64 0, i64 0), double %bitcast, double %2,
> double %3, double %4, double %5, double %6, double %7, double %8) #1
>
> which looks incorrect to me; those extractelement operations would
> have been equivalent to the above shifts and truncations on a
> little-endian machine, but not a big-endian one.
Right, it seems the bug is in InstCombineCasts.cpp, in the function
OptimizeIntToFloatBitCast:
  // bitcast(trunc(lshr(bitcast(somevector), cst))
  ConstantInt *ShAmt = 0;
  if (match(Src, m_Trunc(m_LShr(m_BitCast(m_Value(VecInput)),
                                m_ConstantInt(ShAmt)))) &&
      isa<VectorType>(VecInput->getType())) {
    VectorType *VecTy = cast<VectorType>(VecInput->getType());
    unsigned DestWidth = DestTy->getPrimitiveSizeInBits();
    if (VecTy->getPrimitiveSizeInBits() % DestWidth == 0 &&
        ShAmt->getZExtValue() % DestWidth == 0) {
      // If the element type of the vector doesn't match the result type,
      // bitcast it to be a vector type we can extract from.
      if (VecTy->getElementType() != DestTy) {
        VecTy = VectorType::get(DestTy,
                                VecTy->getPrimitiveSizeInBits() / DestWidth);
        VecInput = IC.Builder->CreateBitCast(VecInput, VecTy);
      }
      unsigned Elt = ShAmt->getZExtValue() / DestWidth;
      return ExtractElementInst::Create(VecInput, IC.Builder->getInt32(Elt));
    }
  }
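The computation of "Elt" above looks to be correct only for little-endian
platforms: it assumes that element 0 of the vector occupies the least
significant bits of the bitcast integer, which is not the case on a
big-endian platform.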
If I disable the whole OptimizeIntToFloatBitCast optimization, I get the
correct result again.
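To illustrate the index computation I would expect, here is a minimal standalone sketch. The helper elementIndexForShift and its parameters are hypothetical and not part of LLVM; it only assumes that a vector bitcast to an integer places element 0 in the least significant bits on little-endian targets and in the most significant bits on big-endian targets. It is not a proposed patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): given the right-shift amount in bits
// applied to the integer view of a vector, return the index of the vector
// element that ends up in the low destWidth bits after the shift.
unsigned elementIndexForShift(uint64_t shiftAmt, unsigned destWidth,
                              unsigned numElts, bool isBigEndian) {
  // Same precondition as the quoted code: the shift must land exactly on an
  // element boundary.
  assert(destWidth != 0 && shiftAmt % destWidth == 0);
  unsigned elt = static_cast<unsigned>(shiftAmt / destWidth);
  assert(elt < numElts);
  // Little-endian: element 0 sits in the least significant bits, so the
  // quotient is already the element index.
  // Big-endian: element 0 sits in the most significant bits, so the index
  // has to be mirrored.
  return isBigEndian ? numElts - 1 - elt : elt;
}
```

For the <8 x double> case above, a shift of 0 would then select element 0 on a little-endian target but element 7 on a big-endian one.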
Chris, this optimization came in via your commits:
------------------------------------------------------------------------
r112232 | lattner | 2010-08-26 17:14:59 -0500 (Thu, 26 Aug 2010) | 5 lines
optimize "integer extraction out of the middle of a vector" as produced
by SRoA. This is part of rdar://7892780, but needs another xform to
expose this.
------------------------------------------------------------------------
r112227 | lattner | 2010-08-26 16:55:42 -0500 (Thu, 26 Aug 2010) | 43 lines
optimize bitcast(trunc(bitcast(x))) where the result is a float and 'x'
is a vector to be a vector element extraction.
Would you agree that this is incorrect on big-endian machines, or am I
missing something here?
Bye,
Ulrich