[llvm-dev] avx512 JIT backend generates wrong code on <4 x float>

Wed Jun 29 12:48:25 PDT 2016

Hi Frank,

I recommend trying trunk LLVM. AVX-512 development has been very active recently.

 -Hal

----- Original Message -----
> From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "LLVM Dev" <llvm-dev at lists.llvm.org>
> Sent: Wednesday, June 29, 2016 2:41:39 PM
> Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4 x float>
> 
> Hi!
> 
> When compiling the attached module with the JIT engine on an Intel KNL,
> I see wrong code being emitted. I attach a complete exploit program
> which shows the bug in LLVM 3.8. It loads and JIT compiles the module
> and prints the assembly. I stumbled on this because the result of an
> actual calculation was wrong. So it is not only the printed assembly
> that is wrong; the generated machine code is wrong as well.
> 
> When I execute the exploit program on an Intel KNL, the following
> output is produced:
> 
> CPU name = knl
> -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq,
> Assembly:
>      .text
>      .file    "module_KFxOBX_i4_after.ll"
>      .globl    adjmul
>      .align    16, 0x90
>      .type    adjmul, @function
> adjmul:
>      .cfi_startproc
>      leaq    (%rdi,%r8), %rdx
>      addq    %rsi, %r8
>      testb    $1, %cl
>      cmoveq    %rdi, %rdx
>      cmoveq    %rsi, %r8
>      movq    %rdx, %rax
>      sarq    $63, %rax
>      shrq    $62, %rax
>      addq    %rdx, %rax
>      sarq    $2, %rax
>      movq    %r8, %rcx
>      sarq    $63, %rcx
>      shrq    $62, %rcx
>      addq    %r8, %rcx
>      sarq    $2, %rcx
>      movq    %rax, %rdx
>      shlq    $5, %rdx
>      leaq    16(%r9,%rdx), %rsi
>      orq    $16, %rdx
>      movq    16(%rsp), %rdi
>      addq    %rdx, %rdi
>      addq    8(%rsp), %rdx
>      .align    16, 0x90
> .LBB0_1:
>      vmovaps    -16(%rdx), %xmm0
>      vmovaps    (%rdx), %xmm1
>      vmovaps    -16(%rdi), %xmm2
>      vmovaps    (%rdi), %xmm3
>      vmulps    %xmm3, %xmm1, %xmm4
>      vmulps    %xmm2, %xmm1, %xmm1
>      vfmadd213ss    %xmm4, %xmm0, %xmm2
>      vfmsub213ss    %xmm1, %xmm0, %xmm3
>      vmovaps    %xmm2, -16(%rsi)
>      vmovaps    %xmm3, (%rsi)
>      addq    $1, %rax
>      addq    $32, %rsi
>      addq    $32, %rdi
>      addq    $32, %rdx
>      cmpq    %rcx, %rax
>      jl    .LBB0_1
>      retq
> .Lfunc_end0:
>      .size    adjmul, .Lfunc_end0-adjmul
>      .cfi_endproc
> 
> 
>      .section    ".note.GNU-stack","", @progbits
> 
> end assembly!
> 
> 
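> The 'vfmadd213ss' and 'vfmsub213ss' instructions are scalar forms
> ('Fused Multiply-Add of Scalar Single-Precision Floating-Point'), so
> they compute only the lowest element of the vector. The operation is
> on <4 x float>, so these should be packed SIMD instructions
> (the 'ps' forms), like the 'vmulps' next to them. Note that the KNL
> has 16-wide float SIMD, while the exploit module uses only 4 lanes.
> However, the backend should be able to handle this.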
> 
> Unless I receive further ideas I will file an official bug report.
> 
> Frank
> 
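
To make the scalar-versus-packed distinction in the quoted report concrete, here is a minimal sketch in plain C++ (no intrinsics) of what each form computes on a 4-element float vector. The names Vec4, fma_packed, fma_scalar_lane0 and self_check are hypothetical and exist only for this illustration; they are not part of LLVM or of the attached program. The sketch assumes the documented behaviour of the 213 scalar form, where lanes 1..3 of the destination are left unchanged.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in for one <4 x float> value held in an XMM register (illustrative).
using Vec4 = std::array<float, 4>;

// Packed form (the 'ps' flavour): every lane gets a[i] * b[i] + c[i].
Vec4 fma_packed(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = std::fma(a[i], b[i], c[i]);
    }
    return r;
}

// Scalar form (the 'ss' flavour): only lane 0 is computed. Lanes 1..3 keep
// whatever the destination register held before, which is b in the 213 form.
Vec4 fma_scalar_lane0(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r = b;
    r[0] = std::fma(a[0], b[0], c[0]);
    return r;
}

// Small usage example: the two forms agree in lane 0 and differ elsewhere.
void self_check() {
    const Vec4 a{1.0f, 2.0f, 3.0f, 4.0f};
    const Vec4 b{10.0f, 20.0f, 30.0f, 40.0f};
    const Vec4 c{100.0f, 200.0f, 300.0f, 400.0f};
    const Vec4 packed = fma_packed(a, b, c);
    const Vec4 scalar = fma_scalar_lane0(a, b, c);
    assert(packed[0] == scalar[0]);
    assert(packed[1] != scalar[1]);
}
```

If a 4-wide multiply-add is lowered to the scalar form, three of the four results per vector are stale, which would be consistent with the wrong numerical results described in the report.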

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory