r177819 - Make clang to mark static stack allocations with lifetime markers to enable a more aggressive stack coloring.
Ulrich Weigand
Ulrich.Weigand at de.ibm.com
Mon Mar 25 12:31:20 PDT 2013
Nadav Rotem <nrotem at apple.com> wrote on 25.03.2013 20:08:34:
> -mllvm -print-after-all should work with debug builds of clang.
OK, thanks. Using that, I see after function inlining:
%V = getelementptr inbounds %union.D8V* %sumV, i64 0, i32 0
store <8 x double> %sum.0, <8 x double>* %V, align 64, !tbaa !1
%A.i = bitcast %union.D8V* %sumV to [8 x double]*
%arrayidx.i = getelementptr inbounds %union.D8V* %sumV, i64 0, i32 0, i64 0
%2 = load double* %arrayidx.i, align 8, !tbaa !0
%arrayidx2.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 1
%3 = load double* %arrayidx2.i, align 8, !tbaa !0
%arrayidx4.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 2
%4 = load double* %arrayidx4.i, align 8, !tbaa !0
%arrayidx6.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 3
%5 = load double* %arrayidx6.i, align 8, !tbaa !0
%arrayidx8.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 4
%6 = load double* %arrayidx8.i, align 8, !tbaa !0
%arrayidx10.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 5
%7 = load double* %arrayidx10.i, align 8, !tbaa !0
%arrayidx12.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 6
%8 = load double* %arrayidx12.i, align 8, !tbaa !0
%arrayidx14.i = getelementptr inbounds [8 x double]* %A.i, i64 0, i64 7
%9 = load double* %arrayidx14.i, align 8, !tbaa !0
%call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([25 x i8]* @.str, i64 0, i64 0), double %2, double %3, double %4, double
%5, double %6, double %7, double %8, double %9) #1
call void @llvm.lifetime.end(i64 64, i8* %0) #1
which looks correct to me.
This stays unchanged until SROA, after which I get:
store <8 x double> %sum.0, <8 x double>* %sumV.sroa.0
%sumV.sroa.0.0.idx35 = getelementptr inbounds <8 x double>* %sumV.sroa.0,
i64 0, i64 0
%sumV.sroa.0.0.load36 = load double* %sumV.sroa.0.0.idx35, align 64
%sumV.sroa.0.8.idx21 = getelementptr inbounds <8 x double>* %sumV.sroa.0,
i64 0, i64 1
%sumV.sroa.0.8.load22 = load double* %sumV.sroa.0.8.idx21
%sumV.sroa.0.16.idx23 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 2
%sumV.sroa.0.16.load24 = load double* %sumV.sroa.0.16.idx23, align 16
%sumV.sroa.0.24.idx25 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 3
%sumV.sroa.0.24.load26 = load double* %sumV.sroa.0.24.idx25
%sumV.sroa.0.32.idx27 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 4
%sumV.sroa.0.32.load28 = load double* %sumV.sroa.0.32.idx27, align 32
%sumV.sroa.0.40.idx29 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 5
%sumV.sroa.0.40.load30 = load double* %sumV.sroa.0.40.idx29
%sumV.sroa.0.48.idx31 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 6
%sumV.sroa.0.48.load32 = load double* %sumV.sroa.0.48.idx31, align 16
%sumV.sroa.0.56.idx33 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 7
%sumV.sroa.0.56.load34 = load double* %sumV.sroa.0.56.idx33
%call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([25 x i8]* @.str, i64 0, i64 0), double %sumV.sroa.0.0.load36, double
%sumV.sroa.0.8.load22, double %sumV.sroa.0.16.load24, double
%sumV.sroa.0.24.load26, double %sumV.sroa.0.32.load28, double
%sumV.sroa.0.40.load30, double %sumV.sroa.0.48.load32, double
%sumV.sroa.0.56.load34) #1
%sumV.sroa.0.0.cast20 = bitcast <8 x double>* %sumV.sroa.0 to i8*
call void @llvm.lifetime.end(i64 64, i8* %sumV.sroa.0.0.cast20)
which still looks correct.
The next significant change happens after Global Value Numbering:
store <8 x double> %add, <8 x double>* %sumV.sroa.0, align 64
%sumV.sroa.0.0.idx35 = getelementptr inbounds <8 x double>* %sumV.sroa.0,
i64 0, i64 0
%2 = bitcast <8 x double> %add to i512
%tmp = lshr i512 %2, 448
%trunc = trunc i512 %tmp to i64
%bitcast = bitcast i64 %trunc to double
%sumV.sroa.0.8.idx21 = getelementptr inbounds <8 x double>* %sumV.sroa.0,
i64 0, i64 1
%3 = lshr i512 %2, 384
%4 = trunc i512 %3 to i64
%5 = bitcast i64 %4 to double
%sumV.sroa.0.16.idx23 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 2
%6 = lshr i512 %2, 320
%7 = trunc i512 %6 to i64
%8 = bitcast i64 %7 to double
%sumV.sroa.0.24.idx25 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 3
%9 = lshr i512 %2, 256
%10 = trunc i512 %9 to i64
%11 = bitcast i64 %10 to double
%sumV.sroa.0.32.idx27 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 4
%12 = lshr i512 %2, 192
%13 = trunc i512 %12 to i64
%14 = bitcast i64 %13 to double
%sumV.sroa.0.40.idx29 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 5
%15 = lshr i512 %2, 128
%16 = trunc i512 %15 to i64
%17 = bitcast i64 %16 to double
%sumV.sroa.0.48.idx31 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 6
%18 = lshr i512 %2, 64
%19 = trunc i512 %18 to i64
%20 = bitcast i64 %19 to double
%sumV.sroa.0.56.idx33 = getelementptr inbounds <8 x double>*
%sumV.sroa.0, i64 0, i64 7
%21 = trunc i512 %2 to i64
%22 = bitcast i64 %21 to double
%call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([25 x i8]* @.str, i64 0, i64 0), double %bitcast, double %5, double %8,
double %11, double %14, double %17, double %20, double %22) #1
call void @llvm.lifetime.end(i64 64, i8* %sumV.sroa.0.0.cast19)
which still looks good (treating the stored <8 x double> value as an i512,
and computing the various shift counts as appropriate for a big-endian
platform).
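The shift counts can be checked with a small model. The following is only a sketch in Python, not anything taken from LLVM: the helper names are made up for illustration, and it assumes that the bitcast to i512 behaves like storing the vector and reloading the same 64 bytes as one integer, with element i at byte offset 8*i.

```python
import struct

LANES = 8                       # <8 x double>
LANE_BITS = 64                  # one double
TOTAL_BITS = LANES * LANE_BITS  # i512

def shift_for_index(index, big_endian):
    """Right shift that moves element `index` into the low 64 bits of the i512."""
    byte_offset = index * (LANE_BITS // 8)
    if big_endian:
        # Lowest address holds the most significant bits.
        return TOTAL_BITS - LANE_BITS - 8 * byte_offset
    # Lowest address holds the least significant bits.
    return 8 * byte_offset

def extract_via_shift(values, index, big_endian):
    """Model of 'lshr + trunc + bitcast' on the in-memory image of the vector."""
    fmt = ">" if big_endian else "<"
    byteorder = "big" if big_endian else "little"
    raw = struct.pack(fmt + "8d", *values)   # vector as laid out in memory
    wide = int.from_bytes(raw, byteorder)    # same bytes read as an i512
    shift = shift_for_index(index, big_endian)
    bits = (wide >> shift) & ((1 << LANE_BITS) - 1)
    return struct.unpack(fmt + "d", bits.to_bytes(8, byteorder))[0]

big_endian_shifts = [shift_for_index(i, True) for i in range(LANES)]
```

Under these assumptions, big_endian_shifts comes out as 448, 384, ..., 64, 0 for elements 0 through 7, which is the sequence of lshr amounts in the GVN output.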
However, after the next "Combine redundant instructions" pass, we get:
for.end9: ; preds = %for.body4
%bitcast = extractelement <8 x double> %add, i32 7
%2 = extractelement <8 x double> %add, i32 6
%3 = extractelement <8 x double> %add, i32 5
%4 = extractelement <8 x double> %add, i32 4
%5 = extractelement <8 x double> %add, i32 3
%6 = extractelement <8 x double> %add, i32 2
%7 = extractelement <8 x double> %add, i32 1
%8 = extractelement <8 x double> %add, i32 0
%call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([25 x i8]* @.str, i64 0, i64 0), double %bitcast, double %2, double %3,
double %4, double %5, double %6, double %7, double %8) #1
which looks incorrect to me; those extractelement operations would have
been equivalent to the above shifts and truncations on a little-endian
machine, but not a big-endian one.
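To make the mismatch concrete, here is a hedged sketch in Python of the mapping from shift amount to vector element. It is a model of the reasoning in this mail, not the actual InstCombine code; lane_for_shift is a hypothetical helper, and the rule for big-endian assumes element 0 occupies the most significant 64 bits of the i512.

```python
LANES = 8       # <8 x double>
LANE_BITS = 64  # one double

def lane_for_shift(shift, big_endian):
    """Vector element selected by 'lshr i512 %v, shift' followed by a trunc to i64."""
    lane_from_low_end = shift // LANE_BITS
    if big_endian:
        # Element 0 sits in the most significant bits, so count from the top.
        return LANES - 1 - lane_from_low_end
    # Element 0 sits in the least significant bits.
    return lane_from_low_end

# Shift amounts in printf argument order, as seen after GVN.
gvn_shifts = [448, 384, 320, 256, 192, 128, 64, 0]

little_endian_lanes = [lane_for_shift(s, False) for s in gvn_shifts]
big_endian_lanes = [lane_for_shift(s, True) for s in gvn_shifts]
```

With the little-endian rule the printf arguments become elements 7, 6, ..., 0, which is what the extractelement sequence above uses; with the big-endian rule they would be elements 0, 1, ..., 7, matching the original source order.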
So there seems to be an endian bug somewhere; it's still unclear to me what
this has to do with the lifetime markers, however ...
Bye,
Ulrich