<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none"><!--P{margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>Folks, we encountered a problem: the PMULL2 instruction is not generated for the vmull_high_p64 intrinsic.<br>
This happens because vmull_high_p64 is implemented in terms of vmull_p64:<br>
<br>
<span style="font-family: 'Courier New', monospace;">arm_neon.h:</span><br>
<span style="font-family: 'Courier New', monospace;">__ai poly128_t vmull_high_p64(poly64x2_t __p0, poly64x2_t __p1) {</span><br>
<span style="font-family: 'Courier New', monospace;">&nbsp;&nbsp;poly128_t __ret;</span><br>
<span style="font-family: 'Courier New', monospace;">&nbsp;&nbsp;__ret = vmull_p64((poly64_t)(vget_high_p64(__p0)), (poly64_t)(vget_high_p64(__p1)));</span><br>
<span style="font-family: 'Courier New', monospace;">&nbsp;&nbsp;return __ret;</span><br>
<span style="font-family: 'Courier New', monospace;">}</span></p>
<p><span style="font-family: 'Courier New', monospace;">__ai poly128_t vmull_p64(poly64_t __p0, poly64_t __p1) {</span><br>
<span style="font-family: 'Courier New', monospace;">&nbsp;&nbsp;poly128_t __ret;</span><br>
<span style="font-family: 'Courier New', monospace;">&nbsp;&nbsp;__ret = (poly128_t) __builtin_neon_vmull_p64(__p0, __p1);</span><br>
<span style="font-family: 'Courier New', monospace;">&nbsp;&nbsp;return __ret;</span><br>
<span style="font-family: 'Courier New', monospace;">}</span></p>
<p><br>
There is also a pattern that converts this into PMULL2:<br>
<br>
<span style="font-family: 'Courier New', monospace;">def : Pat&lt;(int_aarch64_neon_pmull64 (extractelt (v2i64 V128:$Rn), (i64 1)),</span><br>
<span style="font-family: 'Courier New', monospace;">(extractelt (v2i64 V128:$Rm), (i64 1))),</span><br>
<span style="font-family: 'Courier New', monospace;">(PMULLv2i64 V128:$Rn, V128:$Rm)&gt;;</span><br>
<br>
The problem is that ISel applies that pattern only when the corresponding IR (the extracts and the pmull64 call) is inside a single basic block.<br>
Some optimizations, such as loop-invariant code motion, can hoist the extraction operations out of the current basic block.<br>
As a result, PMULL2 is not used.<br>
<br>
GlobalISel could resolve that problem, but it does not handle this pattern yet and is enabled by default only at -O0.<br>
An alternative way to get PMULL2 is to create a dedicated builtin for the vmull_high_p64 intrinsic.</p>
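<p>To illustrate the hoisting case described above, here is a minimal sketch. It is not taken from our actual code: the names u128, clmul64 and mul_high_accumulate are hypothetical, and the portable fallback is there only so the snippet builds on targets without PMULL. The first operand is loop-invariant, so the extraction of its high lane is a candidate for loop-invariant code motion, and once it is moved out of the loop body the PMULL2 pattern may no longer match.</p>

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_AES)
#include <arm_neon.h>
#define HAVE_PMULL 1
#endif

/* Hypothetical 128-bit result type, used only in this sketch. */
typedef struct { uint64_t lo, hi; } u128;

#ifndef HAVE_PMULL
/* Portable carry-less 64x64 -> 128 multiply (what PMULL computes),
 * used as a fallback on targets without the AES/PMULL extension. */
static u128 clmul64(uint64_t a, uint64_t b) {
    u128 r = {0, 0};
    for (unsigned i = 0; i < 64; ++i) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}
#endif

/* XOR-accumulates the polynomial product of the high lanes of key and
 * each data block. key does not change inside the loop, so the compiler
 * may hoist the extraction of its high lane out of the loop body. */
u128 mul_high_accumulate(const uint64_t key[2],
                         const uint64_t (*data)[2], size_t n) {
    u128 acc = {0, 0};
#ifdef HAVE_PMULL
    poly64x2_t k = vreinterpretq_p64_u64(vld1q_u64(key));
    for (size_t i = 0; i < n; ++i) {
        poly64x2_t d = vreinterpretq_p64_u64(vld1q_u64(data[i]));
        /* Ideally a single PMULL2; see the discussion above. */
        uint64x2_t p = vreinterpretq_u64_p128(vmull_high_p64(k, d));
        acc.lo ^= vgetq_lane_u64(p, 0);
        acc.hi ^= vgetq_lane_u64(p, 1);
    }
#else
    for (size_t i = 0; i < n; ++i) {
        u128 p = clmul64(key[1], data[i][1]);
        acc.lo ^= p.lo;
        acc.hi ^= p.hi;
    }
#endif
    return acc;
}
```

<p>Building this for AArch64 with the AES extension enabled and optimizations on, then checking the generated assembly for the loop, should show whether PMULL2 or a lane move followed by PMULL is emitted. The exact result depends on the compiler version and flags.</p>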
<p><br>
Would it be OK to add an extra builtin for the vmull_high_p64 intrinsic to resolve this problem
(<span style="font-family: 'Courier New', monospace;">__builtin_neon_vmull_high_p64</span> / llvm.aarch64.neon.pmull_high_64)?</p>
<p><br>
</p>
<p>Thank you, Alexey.<br>
</p>
<p><br>
</p>
</body>
</html>