<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/160886>160886</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
aarch64 NEON vector constants are not hoisted out of loops when there's a lot of non-NEON code around them
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
hsivonen
</td>
</tr>
</table>
<pre>
# Summary
An `ldr` corresponding to materializing `const uint8x16_t ZERO_LT_AMP_CR = {0, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 1, 1, 1};` is obviously always loop-invariant. Furthermore, when the whole function has fewer vector values than there are vector registers, there cannot be too much register pressure on the _vector registers_, and such a constant should be hoisted to the top of the function.
When the function is small, LICM happens properly for such constants.
When the function is large, the `ldr` for the constant moves to immediately before use, and the load is from static memory.
In between these cases, each constant is loaded at the top of the function and immediately spilled to the stack. Then the constant is reloaded from the stack immediately before use.
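For illustration, the pattern looks roughly like the following sketch. This is a hypothetical reduction, not code from the attached file: the function name `classify`, the low-nibble indexing, the assumption that `len` is a multiple of 16, and the scalar fallback for non-aarch64 hosts are all made up for this example. A function this small is expected to get the hoisting; the problem only shows up once a lot of non-NEON code surrounds the loop.

```cpp
// Hypothetical reduction; not taken from nsHtml5TokenizerSIMD-expanded.cpp.
#if defined(__aarch64__)
#include "arm_neon.h"
#endif

// Maps each input byte to a table entry selected by its low nibble.
// Assumption: len is a multiple of 16.
void classify(const unsigned char* in, unsigned char* out, unsigned len) {
#if defined(__aarch64__)
  // Loop-invariant vector constant: ideally a single `ldr q` before the loop.
  const uint8x16_t ZERO_LT_AMP_CR = {0, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 1, 1, 1};
  const uint8x16_t low_nibble = vdupq_n_u8(0x0F);
  for (unsigned i = 0; i != len; i += 16) {
    // Keep only the low nibble so every index is in range for the table.
    uint8x16_t idx = vandq_u8(vld1q_u8(in + i), low_nibble);
    // This is the `tbl.16b` that consumes the constant.
    vst1q_u8(out + i, vqtbl1q_u8(ZERO_LT_AMP_CR, idx));
  }
#else
  // Scalar fallback with the same result, so the sketch builds elsewhere.
  static const unsigned char TABLE[16] = {0, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 4, 1, 1, 1};
  for (unsigned i = 0; i != len; ++i) {
    out[i] = TABLE[in[i] % 16];
  }
#endif
}
```

In the good case, the load of `ZERO_LT_AMP_CR` appears once above the loop; in the bad cases described above, an `ldr` (from static memory or from a stack slot) sits immediately before the `tbl.16b` inside the loop.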
# Steps to reproduce
1. Download [nsHtml5TokenizerSIMD-expanded.cpp](https://github.com/user-attachments/files/22561274/nsHtml5TokenizerSIMD-expanded.cpp)
2. On Apple Silicon Mac using clang trunk, compile it with `clang++ -Wno-everything -isysroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.2.sdk" nsHtml5TokenizerSIMD-expanded.cpp -O3 -S`
3. Inspect `nsHtml5TokenizerSIMD-expanded.s`.
4. Comment out the line following the comment `// Commenting out the following line allows vector constant LICM!`
5. Recompile and inspect the assembly again.
# Actual results
On first compilation, the only `tbl.16b` instruction is preceded by `ldr q3, [sp, #32] ; 16-byte Folded Reload`.
On the second compilation, the `ldr` has moved upwards out of the loops.
# Expected results
I expected vector constants always to be hoisted out of loops when there aren't more live vector values than there are vector registers, regardless of how much ALU code exists in the overall function.
# Additional info
I tried flipping the various boolean return values in `MachineLICM.cpp` to see if one of them was at fault, but I didn't find the cause.
Compiling on Mac differs from Compiler Explorer's armv8-a target. LLVM says it used the following on Mac:
Features:+aes,+altnzcv,+ccdp,+ccidx,+ccpp,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8a
CPU:apple-m1
TuneCPU:apple-m1
</pre>