<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/150263>150263</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Improve handling of misaligned loads on RISC-V by introducing branching
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
newpavlov
</td>
</tr>
</table>
<pre>
Because of the (arguably) stupid handling of misaligned loads by the RISC-V spec, LLVM generates extremely bad code for misaligned loads (see #110454 for the previous discussion). This can be seen in the following Rust snippet ([godbolt link](https://rust.godbolt.org/z/jeET5EKar)):
```rust
pub fn load_block(block: &[u8; 128]) -> u64 {
    let buf: [u64; 16] = core::array::from_fn(|i| {
        let chunk = block[8 * i..][..8].try_into().unwrap();
        u64::from_ne_bytes(chunk)
    });
    buf.iter().sum()
}
```
LLVM IR:
```llvm
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
define noundef i64 @load_block(ptr noalias nocapture noundef readonly align 1 dereferenceable(128) %block) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
%0 = load <16 x i64>, ptr %block, align 1, !alias.scope !3
%1 = tail call i64 @llvm.vector.reduce.add.v16i64(<16 x i64> %0)
ret i64 %1
}
```
LLVM emulates unaligned word loads with aligned byte loads, which results in **extremely** inefficient codegen. Such loads are very common in cryptographic code (e.g. the `[u64; 16]` load is used in SHA-512).
Right now we have to work around it by introducing hacks like [this](https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/riscv_zknh_utils.rs) (use an aligned load if the buffer is aligned; if not, perform 17 aligned `u64` loads and stitch the results into a `[u64; 16]`). Doing this manually for every algorithm implementation is obviously not practical (the same goes for telling users to manually pass `-mno-strict-align`), and, frankly, I intend to remove the hack in the near future, performance on RISC-V be damned.
I suggest implementing the hack as an LLVM optimization. Yes, it introduces a branch, but it's the best option, assuming that handling of the `Zicclsm` extension stays the same.
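To make the "17 aligned loads and stitch" idea concrete, here is a minimal sketch of just the stitching step. It is not the linked RustCrypto code: `stitch_words` is a hypothetical name, and the 17 aligned words are passed in as an array instead of being loaded from memory. In real code the first and last aligned loads touch bytes outside `block`, which plain Rust pointer reads are not allowed to do. The sketch also assumes little-endian word layout, as on RISC-V.

```rust
/// Hypothetical helper (not taken from the linked code): given 17 aligned
/// little-endian words, return the 16 words that start `offset` bytes in.
pub fn stitch_words(aligned: &[u64; 17], offset: usize) -> [u64; 16] {
    assert!(offset < 8, "offset must be a byte offset within one word");
    if offset == 0 {
        // Aligned case: the first 16 words are already the result.
        let mut out = [0u64; 16];
        out.copy_from_slice(&aligned[..16]);
        return out;
    }
    // Misaligned case: each result word takes its low bytes from the top
    // of word `i` and its high bytes from the bottom of word `i + 1`.
    let lo_shift = 8 * offset as u32; // 8..=56, so both shifts are in range
    let hi_shift = 64 - lo_shift;
    core::array::from_fn(|i| (aligned[i] >> lo_shift) | (aligned[i + 1] << hi_shift))
}
```

The `offset == 0` check is the branch in question: the aligned path costs 16 plain loads, and the misaligned path costs 17 loads plus two shifts and an OR per word, which is still far cheaper than assembling every word from individual byte loads.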
cc @asb
</pre>