[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

Artem Belevich via cfe-commits cfe-commits at lists.llvm.org
Fri Oct 6 16:07:51 PDT 2023


Artem-B wrote:

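Found another issue. We merge four independent byte loads with `align 1` into a single 32-bit load, which fails at runtime when the pointer is not 4-byte aligned. In the example below, the load is at byte offset 9 from an `align 1` global, yet it is lowered to `ld.u32`: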

```
%t0 = type { [17 x i8] }

@shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1

define <4 x i8> @in_v4i8(<4 x i8> %x, <4 x i8> %y) nounwind {
  %v = load <4 x i8>, ptr getelementptr inbounds (i8, ptr addrspacecast (ptr addrspace(3) @shared_storage to ptr), i64 9), align 1
  ret <4 x i8> %v
}
```

```
        mov.u64         %rd1, shared_storage;
        cvta.shared.u64         %rd2, %rd1;
        ld.u32  %r1, [%rd2+9];
        st.param.b32    [func_retval0+0], %r1;
        ret;
```
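To spell out why `[%rd2+9]` is a problem, here is a minimal host-side C++ sketch. It is not code from the PR; `is_provably_aligned` and `load_v4i8_bytewise` are hypothetical helper names, and the little-endian byte order (element 0 in the low byte) is an assumption about how the `<4 x i8>` is packed into the 32-bit register. The first helper checks whether a wide load can be proven naturally aligned from the declared alignment and the constant offset; the second shows the byte-wise fallback that stays valid at any alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a `width`-byte load at `base + offset` is provably
// naturally aligned only if both the declared base alignment and the
// constant offset are multiples of `width` (a sufficient condition).
constexpr bool is_provably_aligned(std::size_t base_align,
                                   std::size_t offset,
                                   std::size_t width) {
    return base_align % width == 0 && offset % width == 0;
}

// Hypothetical fallback: assemble the 32-bit value from four single-byte
// loads, which are valid at any address. Assumes little-endian packing,
// i.e. element 0 of the vector ends up in the low byte.
inline std::uint32_t load_v4i8_bytewise(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// The case from the IR above: align 1 global, offset 9, 4-byte access.
static_assert(!is_provably_aligned(1, 9, 4),
              "a 32-bit load cannot be proven aligned here");
```

With `align 1` and offset 9 the check fails, so under these assumptions a correct lowering would have to use narrower loads (or otherwise prove the alignment) rather than emit a single `ld.u32`.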

https://github.com/llvm/llvm-project/pull/67866

