<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>It sounds like one of the IR-level optimizations breaks when lifetime markers are added. Do you know which optimization it might be? I see the name "SROA" in the _correct_ version of the IR. Is there another optimization that might interfere with vector loads?</div><br><div><div>On Mar 25, 2013, at 11:27 AM, Ulrich Weigand <<a href="mailto:Ulrich.Weigand@de.ibm.com">Ulrich.Weigand@de.ibm.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">Nadav Rotem <<a href="mailto:nrotem@apple.com">nrotem@apple.com</a>> wrote on 25.03.2013 19:22:00:<br><br><blockquote type="cite">Thanks for looking at this failure. Does the IR look correct?<br></blockquote><br>No ... 
the arguments to printf are reversed already in the IR,<br>as output via clang -S -emit-llvm.<br><br>Before the change we had:<br><br><blockquote type="cite">- %sumV.sroa.0.0.vec.extract = extractelement <8 x double> %add, i32 0<br>- %sumV.sroa.0.8.vec.extract = extractelement <8 x double> %add, i32 1<br>- %sumV.sroa.0.16.vec.extract = extractelement <8 x double> %add, i32 2<br>- %sumV.sroa.0.24.vec.extract = extractelement <8 x double> %add, i32 3<br>- %sumV.sroa.0.32.vec.extract = extractelement <8 x double> %add, i32 4<br>- %sumV.sroa.0.40.vec.extract = extractelement <8 x double> %add, i32 5<br>- %sumV.sroa.0.48.vec.extract = extractelement <8 x double> %add, i32 6<br>- %sumV.sroa.0.56.vec.extract = extractelement <8 x double> %add, i32 7<br>- %call.i = tail call signext i32 (i8*, ...)* @printf(i8* getelementptr<br>inbounds ([25 x i8]* @.str, i64 0, i64 0), double<br>%sumV.sroa.0.0.vec.extract, double %sumV.sroa.0.8.vec.extract,<br>double %sumV.sroa.0.16.vec.extract, double<br>%sumV.sroa.0.24.vec.extract, double %sumV.sroa.0.32.vec.extract, double<br>%sumV.sroa.0.40.vec.extract,<br>double %sumV.sroa.0.48.vec.extract, double %sumV.sroa.0.56.vec.extract)<br></blockquote>#1<br><br>so the first printf element is extractelement <8 x double> %add, i32 0<br><br><br>After the change we have:<br><br><blockquote type="cite">+ %bitcast = extractelement <8 x double> %add, i32 7<br>+ %11 = extractelement <8 x double> %add, i32 6<br>+ %12 = extractelement <8 x double> %add, i32 5<br>+ %13 = extractelement <8 x double> %add, i32 4<br>+ %14 = extractelement <8 x double> %add, i32 3<br>+ %15 = extractelement <8 x double> %add, i32 2<br>+ %16 = extractelement <8 x double> %add, i32 1<br>+ %17 = extractelement <8 x double> %add, i32 0<br>+ %call.i = call signext i32 (i8*, ...)* @printf(i8* getelementptr<br>inbounds ([25 x i8]* @.str, i64 0, i64 0), double %bitcast, double %11,<br>double %12, double %13, double %14,<br>double %15, double %16, double %17) 
#1<br></blockquote><br>so the first printf element is extractelement <8 x double> %add, i32 7<br><br><br>The PowerPC assembler output in both cases correctly reflects the IR.<br><br>Bye,<br>Ulrich</div></blockquote></div><br></body></html>