<div dir="ltr"><div>Thank you!</div><div>Adding `-aarch64-sve-vector-bits-min=512` does solve the problem:<br></div><div><a href="https://godbolt.org/z/hYv4dePx6">https://godbolt.org/z/hYv4dePx6</a></div><div>E.g., instead of four fmla instructions on NEON `v` registers:</div><div><div style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas"><div><span style="color:rgb(0,0,0)">  </span><span style="color:rgb(0,0,255)">fmla</span><span style="color:rgb(0,0,0)"> </span><span style="color:rgb(0,128,128)">v1.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v19.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v7.2d</span></div><div><span style="color:rgb(0,0,0)">  </span><span style="color:rgb(0,0,255)">fmla</span><span style="color:rgb(0,0,0)"> </span><span style="color:rgb(0,128,128)">v0.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v18.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v6.2d</span></div><div><span style="color:rgb(0,0,0)">  </span><span style="color:rgb(0,0,255)">fmla</span><span style="color:rgb(0,0,0)"> </span><span style="color:rgb(0,128,128)">v2.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v17.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v5.2d</span></div><div><span style="color:rgb(0,0,0)">  </span><span style="color:rgb(0,0,255)">fmla</span><span style="color:rgb(0,0,0)"> </span><span style="color:rgb(0,128,128)">v3.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v16.2d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">v4.2d</span></div><div><span style="color:rgb(0,128,128)"><br></span></div><div><span style="color:rgb(0,128,128)"></span>there is now just a single fmla on an SVE `z` register:</div></div></div><div><div><div 
style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas"><div><span style="color:rgb(0,0,0)">  </span><span style="color:rgb(0,0,255)">fmla</span><span style="color:rgb(0,0,0)"> </span><span style="color:rgb(0,128,128)">z0.d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">p0</span><span style="color:rgb(0,0,0)">/</span><span style="color:rgb(0,128,128)">m</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">z1.d</span><span style="color:rgb(0,0,0)">, </span><span style="color:rgb(0,128,128)">z2.d</span></div></div></div></div><div><br></div><div>Would it be possible to get a `-aarch64-sve-vector-bits=native` flag?</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 5, 2021 at 1:55 AM Craig Topper <<a href="mailto:craig.topper@gmail.com">craig.topper@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Chris,<div><br></div><div>I'm not an expert on SVE at all, but you might try the "aarch64-sve-vector-bits-min" option that was introduced in this patch: <a href="https://reviews.llvm.org/D80384" target="_blank">https://reviews.llvm.org/D80384</a>. Others and I have been basing a similar feature for RISC-V off of this.</div><div><br clear="all"><div><div dir="ltr">~Craig</div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Apr 4, 2021 at 9:56 PM Chris Elrod via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Would it be possible to support generating CPU-specific SVE code?</div><div>This could be useful for JIT, e.g. 
Julia.</div><div><br></div><div>Currently, when using `-mcpu=a64fx`, `<8 x double>` gets split into 4 NEON instructions:</div><div><a href="https://godbolt.org/z/cEf1Pfvx8" target="_blank">https://godbolt.org/z/cEf1Pfvx8</a></div><div>If I understand correctly, I'd need to use `<vscale x 2 x double>` to actually generate SVE code. However, Julia currently has no way of representing such variable-sized types for writing intrinsics without allocating them on the heap, which is awkward for a variable that's supposed to live in registers! Some libraries make extensive use of intrinsics operating on fixed-width vector types like `<8 x double>` for defining compute kernels, and as is, they are incompatible with SVE.</div><div><br></div><div><br></div></div>
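<div>A rough sketch of the arithmetic behind that split: an 8 x double vector is 8 × 64 = 512 bits, so it needs four 128-bit NEON registers but fits in a single SVE register when the SVE width is assumed to be at least 512 bits (the width A64FX implements). The helper name `instructions_needed` is hypothetical and only does this arithmetic; it does not model how LLVM legalizes vector types.</div>
<pre>
```python
def instructions_needed(lanes: int, elem_bits: int, reg_bits: int) -> int:
    """Number of register-wide operations needed to cover a fixed-width vector."""
    total_bits = lanes * elem_bits
    # Ceiling division: a partially filled register still costs one instruction.
    return -(-total_bits // reg_bits)


# An 8 x double vector on 128-bit NEON `v` registers (two doubles each).
neon_ops = instructions_needed(lanes=8, elem_bits=64, reg_bits=128)
# The same vector when SVE registers are assumed to be at least 512 bits wide.
sve_ops = instructions_needed(lanes=8, elem_bits=64, reg_bits=512)
print(neon_ops, sve_ops)
```
</pre>
<div>With these inputs the sketch gives 4 for NEON and 1 for 512-bit SVE, consistent with the four NEON instructions mentioned above.</div>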


</blockquote></div>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><a href="https://github.com/chriselrod?tab=repositories" target="_blank">https://github.com/chriselrod?tab=repositories</a></div><div><a href="https://www.linkedin.com/in/chris-elrod-9720391a/" target="_blank">https://www.linkedin.com/in/chris-elrod-9720391a/</a><br></div></div></div>