[flang-commits] [flang] [NVPTX] Improve lowering of v4i8 (PR #67866)
Artem Belevich via flang-commits (flang-commits at lists.llvm.org)
Fri Oct  6 16:07:48 PDT 2023

Artem-B wrote:
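Found another issue. We merge four independent byte loads with `align 1` into a single 32-bit load, which fails at runtime on misaligned pointers. In the reproducer below, the `<4 x i8>` load reads from `shared_storage + 9`, an address with no 4-byte alignment guarantee (the global is only `align 1`), yet it is lowered to a single `ld.u32`: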
```
%t0 = type { [17 x i8] }
@shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1
define <4 x i8> @in_v4i8(<4 x i8> %x, <4 x i8> %y) nounwind {
  %v = load <4 x i8>, ptr getelementptr inbounds (i8, ptr addrspacecast (ptr addrspace(3) @shared_storage to ptr), i64 9), align 1
  ret <4 x i8> %v
}
```
```
        mov.u64         %rd1, shared_storage;
        cvta.shared.u64         %rd2, %rd1;
        ld.u32  %r1, [%rd2+9];
        st.param.b32    [func_retval0+0], %r1;
        ret;
```
https://github.com/llvm/llvm-project/pull/67866