[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)
Artem Belevich via cfe-commits
cfe-commits at lists.llvm.org
Fri Oct 6 16:07:51 PDT 2023
Artem-B wrote:
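Found another issue. We merge four independent byte loads with `align 1` into a single 32-bit load, which fails at runtime when the pointer is not 4-byte aligned. In the reduced example below, the `align 1` load at byte offset 9 is lowered to `ld.u32`: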
```
%t0 = type { [17 x i8] }
@shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1
define <4 x i8> @in_v4i8(<4 x i8> %x, <4 x i8> %y) nounwind {
  %v = load <4 x i8>, ptr getelementptr inbounds (i8, ptr addrspacecast (ptr addrspace(3) @shared_storage to ptr), i64 9), align 1
  ret <4 x i8> %v
}
```
```
mov.u64 %rd1, shared_storage;
cvta.shared.u64 %rd2, %rd1;
ld.u32 %r1, [%rd2+9];
st.param.b32 [func_retval0+0], %r1;
ret;
```
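For illustration, here is a minimal host-side C++ sketch of why the `ld.u32` at `+9` is a problem and what an alignment-safe equivalent computes. PTX requires a load address to be naturally aligned to the access size, and `align 1` on the IR load promises only byte alignment. The helper names `is_naturally_aligned` and `load_u32_bytewise` are hypothetical and exist only for this example; they are not part of LLVM or of the patch, and the little-endian byte order is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if an access of `size` bytes at address `addr`
// is naturally aligned, i.e. the address is a multiple of the access size.
constexpr bool is_naturally_aligned(std::uint64_t addr, std::size_t size) {
    return addr % size == 0;
}

// Hypothetical helper: assemble a 32-bit value from four single-byte loads.
// Byte loads have no alignment requirement, so this is valid for any pointer.
// Little-endian byte order is assumed here.
inline std::uint32_t load_u32_bytewise(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Even if the base of shared_storage happened to be 4-byte aligned,
// base + 9 would not be.
static_assert(!is_naturally_aligned(9, 4), "offset 9 is not 4-byte aligned");
```

Since `@shared_storage` is itself only `align 1`, the base address gives no guarantee either, so the 32-bit load cannot be assumed safe for any offset.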
https://github.com/llvm/llvm-project/pull/67866