[llvm-dev] avx512 JIT backend generates wrong code on <4 x float>

Thu Jun 30 09:49:34 PDT 2016

Hi Hal!

Thanks, but unfortunately it didn't help. The exact same assembler 
instructions are generated for both 3.8 (yesterday) and trunk (from today).

So, this really looks like a bug.
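
To illustrate why the scalar forms in the loop body of the quoted listing below matter, here is a minimal Python sketch of how I understand the two variants. It models an xmm register as a list of four floats. The function names are made up for illustration, and plain Python arithmetic rounds twice where a real FMA rounds once, so this models only the lane behaviour, not the exact rounding.

```python
# Hypothetical lane-level model of the 213 form of FMA on a 128-bit register.
# Operand naming follows the Intel order: dst = src2 * dst + src3.

def fmadd213_packed(dst, src2, src3):
    """Packed form (what <4 x float> needs): every lane is updated."""
    return [s2 * d + s3 for d, s2, s3 in zip(dst, src2, src3)]

def fmadd213_scalar(dst, src2, src3):
    """Scalar form (the ...ss variant): only lane 0 is computed,
    lanes 1..3 keep the destination's previous contents."""
    return [src2[0] * dst[0] + src3[0]] + list(dst[1:])

if __name__ == "__main__":
    dst = [1.0, 2.0, 3.0, 4.0]       # illustrative values only
    src2 = [10.0, 10.0, 10.0, 10.0]
    src3 = [0.5, 0.5, 0.5, 0.5]
    print("packed:", fmadd213_packed(dst, src2, src3))
    print("scalar:", fmadd213_scalar(dst, src2, src3))
```

With the scalar form, lanes 1 to 3 are never multiplied or accumulated, which would be consistent with the wrong numerical result I saw.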

Best,
Frank

On 06/29/2016 03:48 PM, Hal Finkel wrote:
> Hi Frank,
>
> I recommend trying trunk LLVM. AVX-512 development has been very active recently.
>
>   -Hal
>
> ----- Original Message -----
>> From: "Frank Winter via llvm-dev" <llvm-dev at lists.llvm.org>
>> To: "LLVM Dev" <llvm-dev at lists.llvm.org>
>> Sent: Wednesday, June 29, 2016 2:41:39 PM
>> Subject: [llvm-dev] avx512 JIT backend generates wrong code on <4 x float>
>>
>> Hi!
>>
>> When compiling the attached module with the JIT engine on an Intel KNL,
>> I see wrong code being emitted. I attach a complete exploit program
>> that shows the bug in LLVM 3.8. It loads and JIT-compiles the module
>> and prints the assembly. I stumbled on this because the result of an
>> actual calculation was wrong, so it is not only the textual assembly
>> that is wrong; the emitted machine code is wrong as well.
>>
>> When I execute the exploit program on an Intel KNL, the following
>> output is produced:
>>
>> CPU name = knl
>> -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq,
>> Assembly:
>>       .text
>>       .file    "module_KFxOBX_i4_after.ll"
>>       .globl    adjmul
>>       .align    16, 0x90
>>       .type    adjmul,@function
>> adjmul:
>>       .cfi_startproc
>>       leaq    (%rdi,%r8), %rdx
>>       addq    %rsi, %r8
>>       testb    $1, %cl
>>       cmoveq    %rdi, %rdx
>>       cmoveq    %rsi, %r8
>>       movq    %rdx, %rax
>>       sarq    $63, %rax
>>       shrq    $62, %rax
>>       addq    %rdx, %rax
>>       sarq    $2, %rax
>>       movq    %r8, %rcx
>>       sarq    $63, %rcx
>>       shrq    $62, %rcx
>>       addq    %r8, %rcx
>>       sarq    $2, %rcx
>>       movq    %rax, %rdx
>>       shlq    $5, %rdx
>>       leaq    16(%r9,%rdx), %rsi
>>       orq    $16, %rdx
>>       movq    16(%rsp), %rdi
>>       addq    %rdx, %rdi
>>       addq    8(%rsp), %rdx
>>       .align    16, 0x90
>> .LBB0_1:
>>       vmovaps    -16(%rdx), %xmm0
>>       vmovaps    (%rdx), %xmm1
>>       vmovaps    -16(%rdi), %xmm2
>>       vmovaps    (%rdi), %xmm3
>>       vmulps    %xmm3, %xmm1, %xmm4
>>       vmulps    %xmm2, %xmm1, %xmm1
>>       vfmadd213ss    %xmm4, %xmm0, %xmm2
>>       vfmsub213ss    %xmm1, %xmm0, %xmm3
>>       vmovaps    %xmm2, -16(%rsi)
>>       vmovaps    %xmm3, (%rsi)
>>       addq    $1, %rax
>>       addq    $32, %rsi
>>       addq    $32, %rdi
>>       addq    $32, %rdx
>>       cmpq    %rcx, %rax
>>       jl    .LBB0_1
>>       retq
>> .Lfunc_end0:
>>       .size    adjmul, .Lfunc_end0-adjmul
>>       .cfi_endproc
>>
>>
>>       .section    ".note.GNU-stack","",@progbits
>>
>> end assembly!
>>
>>
>> The 'vfmadd213ss' instruction is 'Fused Multiply-Add of Scalar
>> Single-Precision Floating-Point', and 'vfmsub213ss' is its subtract
>> counterpart. Both operate on the lowest element only, but the module
>> works on <4 x float>, so these should be packed SIMD instructions.
>> Note that the KNL has 16-wide float SIMD, while the exploit module
>> uses only 4 lanes. However, the backend should be able to handle this.
>>
>> Unless I receive further ideas I will file an official bug report.
>>
>> Frank
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>