<div dir="ltr"><div>Tried a bunch of them there (x86-64, haswell, znver2) and they all defaulted to 4-wide - haswell additionally caused some extra loop unrolling but still with 4-wide pows.</div><div><br></div><div>Cheers,</div><div>-Neil.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 16, 2020 at 2:39 PM Roman Lebedev &lt;<a href="mailto:lebedev.ri@gmail.com">lebedev.ri@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>Did you specify the target CPU the code should be optimized for?<br></div><div>For clang that is -march=native/znver2/... / -mtune=&lt;same&gt;</div><div>For opt/llc that is <span style="color:rgb(0,0,0);font-family:monospace">--mcpu=&lt;same&gt;</span></div></div>I would expect that by default, some generic baseline is picked.<div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 16, 2020 at 4:25 PM Neil Henning via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hey list,</div><div><br></div><div>I've recently done the first test run of bumping our Burst compiler from LLVM 10 -> 11 now that the branch has been cut, and have noticed an apparent loop vectorization codegen regression for X86 with AVX or AVX2 enabled. 
The following IR example is vectorized 4 wide with LLVM 11 and trunk, whereas LLVM 10 (correctly, as per what we want) vectorized it 8 wide, matching the ymm registers.</div><div><br></div><div><span style="font-family:monospace"><font size="1">; ModuleID = '../test.ll'<br>source_filename = "main"<br>target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"<br>target triple = "x86_64-pc-windows-msvc-coff"<br><br>%"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0" = type { float*, i32, [4 x i8] }<br><br>; Function Attrs: nofree<br>define dllexport void @func(float* noalias nocapture %output, %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0"* nocapture nonnull readonly dereferenceable(16) %a, %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0"* nocapture nonnull readonly dereferenceable(16) %b) local_unnamed_addr #0 !ubaa. !1 {<br>entry:<br> %0 = getelementptr %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0", %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0"* %a, i64 0, i32 1<br> %1 = load i32, i32* %0, align 1<br> %.not = icmp eq i32 %1, 0<br> br i1 %.not, label %BL.0042, label %BL.0005.lr.ph<br><br>BL.0005.lr.ph: ; preds = %entry<br> %2 = bitcast %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0"* %a to i8**<br> %3 = load i8*, i8** %2, align 1<br> %4 = bitcast %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0"* %b to i8**<br> %5 = load i8*, i8** %4, align 1<br> %wide.trip.count = zext i32 %1 to i64<br> br label %BL.0005<br><br>BL.0005: ; preds = %BL.0005, %BL.0005.lr.ph<br> %indvars.iv = phi i64 [ 0, %BL.0005.lr.ph ], [ %indvars.iv.next, %BL.0005 ]<br> %6 = shl nuw nsw i64 %indvars.iv, 2<br> %7 = getelementptr float, float* %output, i64 %indvars.iv<br> %8 = getelementptr i8, i8* %3, i64 %6<br> 
%9 = bitcast i8* %8 to float*<br> %10 = load float, float* %9, align 4<br> %11 = getelementptr i8, i8* %5, i64 %6<br> %12 = bitcast i8* %11 to float*<br> %13 = load float, float* %12, align 4<br> %14 = tail call float @llvm.pow.f32(float %10, float %13)<br> store float %14, float* %7, align 4<br> %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br> %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count<br> br i1 %exitcond.not, label %BL.0042, label %BL.0005<br><br>BL.0042: ; preds = %BL.0005, %entry<br> ret void<br>}<br><br>; Function Attrs: norecurse readnone<br>define dllexport void @burst.initialize(i8* (i8*)* nocapture readnone %callback) local_unnamed_addr #1 !ubaa. !0 {<br>entry:<br> ret void<br>}<br><br>; Function Attrs: nounwind readnone speculatable willreturn<br>declare float @llvm.pow.f32(float, float) #2<br><br>attributes #0 = { nofree }<br>attributes #1 = { norecurse readnone }<br>attributes #2 = { nounwind readnone speculatable willreturn }<br><br>!ubaa.Burst.Compiler.IL.Tests.VectorsMaths\2FFloatPointer.0 = !{!0, !0, !0, !0}<br><br>!0 = !{i1 false}<br>!1 = !{i1 true, i1 false, i1 false}</font></span></div><div><span style="font-family:monospace"><font size="1"><br></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif">If I run this with ../llvm-project/llvm/build/bin/opt.exe -o - -S -O3 ../avx_sad_4.ll -mattr=avx -debug, I can see that the loop vectorizer correctly considers using 8-wide ymm registers for this, but has decided that the 4-wide variant is cheaper based on some cost modelling I don't understand.</font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif"><br></font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif">So is this expected behaviour? 
I know there were some cost model changes in the 10->11 timeframe.<br></font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif"><br></font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif">Thanks for any help,</font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif"><br></font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif">Cheers,</font></font></font></span></div><div><span style="font-family:monospace"><font size="1"><font size="2"><font face="arial,sans-serif">-Neil.</font></font></font></span></div></div></blockquote><div>Roman</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>-- <br><div dir="ltr"><div dir="ltr"><table style="border-collapse:collapse;border-spacing:0px;color:rgb(90,90,91);font-size:13px;margin:0px 0px 20px;padding:0px" width="100%" cellspacing="0" cellpadding="0" border="0"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td style="border-collapse:collapse;font-size:0px;line-height:1.5em;padding:0px 0px 20px;vertical-align:top" align="left"><table style="border-collapse:collapse;border-spacing:0px;margin:0px;padding:0px" cellspacing="0" cellpadding="0" border="0" align="left"><tbody style="margin:0px;padding:0px"><tr style="margin:0px;padding:0px"><td style="border-collapse:collapse;font-size:1.12em;line-height:1.5em;padding:0px;vertical-align:top;width:64px"><img style="border: medium none; border-radius: 0px; display: block; font-size: 13px; height: auto; line-height: 100%; margin: 0px; max-width: 100%; outline-style: none; outline-width: medium; padding: 20px 0px 0px; width: 100%;" alt="" 
src="https://unity3d.com/profiles/unity3d/themes/unity/images/ui/other/unity-logo-dark-email.png" width="64" height="auto"></td></tr></tbody></table></td></tr><tr style="margin:0px;padding:0px"><td style="border-collapse:collapse;font-size:0px;line-height:1.5em;padding:0px;vertical-align:top" align="left"><div style="color:rgb(0,0,0);font-family:Roboto,Arial;font-size:14px;font-weight:600;line-height:15px;margin:0px;padding:0px">Neil Henning</div></td></tr><tr style="margin:0px;padding:0px"><td style="border-collapse:collapse;font-size:0px;line-height:1.5em;padding:0px;vertical-align:top" align="left"><div style="color:rgb(0,0,0);font-family:Roboto,Arial;font-size:14px;line-height:15px;margin:0px;padding:0px 0px 10px">Senior Software Engineer Compiler</div></td></tr><tr style="margin:0px;padding:0px"><td style="border-collapse:collapse;font-size:0px;line-height:1.5em;padding:0px;vertical-align:top" align="left"><div style="color:rgb(0,0,0);font-family:Roboto,Arial;font-size:12px;line-height:15px;margin:0px;padding:0px"><a href="http://unity.com" target="_blank">unity.com</a></div></td></tr></tbody></table></div></div></div></div>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div></div></div>
</blockquote></div>
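<div>A minimal sketch of how a per-lane cost comparison can end up preferring 4-wide over 8-wide, assuming the loop vectorizer picks the width whose estimated loop cost divided by the width is lowest. The function name <code>pick_vector_width</code> and every cost number below are hypothetical; they are not taken from the -debug output or from LLVM's source.</div>

```python
def pick_vector_width(loop_cost_by_width):
    """Return the width with the lowest estimated cost per scalar iteration.

    loop_cost_by_width maps a candidate width (1 = scalar) to the estimated
    cost of one iteration of the loop at that width.
    """
    best_width = 1
    best_cost = float(loop_cost_by_width[1])
    for width in sorted(w for w in loop_cost_by_width if w > 1):
        per_lane = loop_cost_by_width[width] / width
        # Strict comparison: on a tie the narrower width is kept.
        if per_lane < best_cost:
            best_width, best_cost = width, per_lane
    return best_width


# Hypothetical costs: the 8-wide loop body is estimated at more than twice
# the 4-wide one (for example because a call is costed as scalarized),
# so 4-wide wins per lane: 13/1, 24/2 = 12, 44/4 = 11, 96/8 = 12.
hypothetical_costs = {1: 13, 2: 24, 4: 44, 8: 96}
print(pick_vector_width(hypothetical_costs))
```

<div>With numbers like these, any change that raises the estimated 8-wide cost above twice the 4-wide cost is enough to flip the choice, even though ymm registers are available.</div>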