[PATCH] D127982: [X86][FP16] Enable vector support for FP16 emulation

Mon Jul 18 18:57:40 PDT 2022

pengfei added a comment.

In D127982#3660741 <https://reviews.llvm.org/D127982#3660741>, @aeubanks wrote:

> hi, we're seeing flaky crashes after this patch and I'm having trouble figuring out what's going wrong
>
> F23838652: xla.ll <https://reviews.llvm.org/F23838652>
>
> `llc -O3 /tmp/xla.ll`
>
> diff before/after this patch
>
>   3,4c3,4
>   <       .section        .rodata,"a",@progbits
>   <       .p2align        1                               # -- Begin function main.34
>   ---
>   >       .section        .rodata.cst16,"aM",@progbits,16
>   >       .p2align        4                               # -- Begin function main.34
>   6a7,17
>   >       .short  0x6056                          # half 555
>   >       .short  0x6056                          # half 555
>   >       .short  0x6056                          # half 555
>   >       .short  0x6056                          # half 555
>   >       .short  0x6056                          # half 555
>   >       .short  0x6056                          # half 555
>   >       .short  0x6056                          # half 555
>   >       .section        .rodata,"a",@progbits
>   >       .p2align        1
>   > .LCPI0_1:
>   >       .short  0x6056                          # half 555
>   15a27,28
>   >       movdqa  .LCPI0_0(%rip), %xmm0           # xmm0 = [5.55E+2,5.55E+2,5.55E+2,5.55E+2,5.55E+2,5.55E+2,5.55E+2,5.55E+2]
>   >       movdqa  %xmm0, (%rax)
>   17,18d29
>   <       movq    %rsi, (%rax)
>   <       movq    %rsi, 8(%rax)
>   21c32
>   <       pinsrw  $0, .LCPI0_0(%rip), %xmm0
>   ---
>   >       pinsrw  $0, .LCPI0_1(%rip), %xmm0
>
> full assembly
> F23838693: good.s <https://reviews.llvm.org/F23838693>
> F23838695: bad.s <https://reviews.llvm.org/F23838695>
>
> we had theories that there was something to do with alignment with the `movdqa`?

Hi @aeubanks, I think this is an inherent problem in the application that this patch merely exposes. The diff in the assembly is as expected. The problem is the `align 16` in the IR below:

  %fusion = load ptr, ptr %buffer_table, align 8, !invariant.load !0, !dereferenceable !2, !align !1
  store half 0xH6056, ptr %fusion, align 16, !alias.scope !3, !noalias !6

This lets codegen select `movdqa`, which faults when its memory operand is not 16-byte aligned. The flaky crashes indicate that `%fusion` is not always aligned to 16.
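
To make the alignment point concrete, here is a minimal C++ sketch of how the application side could check the promise before emitting `align 16`. It is not taken from XLA or from this patch; `is_aligned`, `require_aligned`, `splat_half_bits` and `Buffer` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if `p` is a multiple of `alignment` bytes.
// `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical debug-time guard: fail loudly here instead of faulting
// later inside an aligned vector store such as movdqa.
inline void* require_aligned(void* p, std::size_t alignment) {
    assert(is_aligned(p, alignment) && "pointer breaks its alignment promise");
    return p;
}

// Hypothetical helper: writes eight copies of a 16-bit pattern
// (for example 0x6056, i.e. half 555) without assuming any alignment.
// memcpy leaves the compiler free to pick an unaligned store.
inline void splat_half_bits(void* dst, std::uint16_t bits) {
    std::uint16_t lanes[8];
    for (std::uint16_t& lane : lanes) lane = bits;
    std::memcpy(dst, lanes, sizeof lanes);
}

// Hypothetical stand-in for one buffer: 16-byte-aligned storage that is
// large enough to also form an address that is only 8-byte aligned.
struct alignas(16) Buffer {
    unsigned char bytes[32];
};
```

With a `Buffer buf`, `is_aligned(buf.bytes, 16)` holds, while `buf.bytes + 8` is aligned to 8 but not to 16. An address like the latter is what breaks the `align 16` on the store above, and `splat_half_bits` still writes to it safely.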

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127982/new/

https://reviews.llvm.org/D127982