[llvm-dev] AVX512 instruction generated when JIT compiling for an avx2 architecture
Frank Winter via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 23 10:39:41 PDT 2016
Thank you, guys! I got it to work by setting the CPU features like you
suggested.
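
A minimal sketch of that approach, for anyone finding this thread later. It is not the exact code used here. To keep it compilable without LLVM, a std::map<std::string, bool> stands in for the llvm::StringMap<bool> that llvm::sys::getHostCPUFeatures fills in, and the helper names buildFeatureAttrs and joinAttrs are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Turn a feature-name -> enabled map into "+name" / "-name" entries.
// Assumption: with LLVM the map would be an llvm::StringMap<bool> filled by
// llvm::sys::getHostCPUFeatures(); std::map is only a stand-in here.
std::vector<std::string> buildFeatureAttrs(
    const std::map<std::string, bool> &features) {
    std::vector<std::string> attrs;
    attrs.reserve(features.size());
    for (const auto &entry : features) {
        // Enabled features get a '+' prefix, disabled ones a '-' prefix.
        attrs.push_back((entry.second ? "+" : "-") + entry.first);
    }
    return attrs;
}

// Join the entries into one comma-separated "+feature,-feature" string,
// the form described in the quoted reply below.
std::string joinAttrs(const std::vector<std::string> &attrs) {
    std::string out;
    for (const auto &attr : attrs) {
        if (!out.empty()) {
            out += ',';
        }
        out += attr;
    }
    return out;
}
```

The resulting list (or joined string) would then be handed to engineBuilder.setMAttrs alongside the existing setMCPU call. The exact signature of getHostCPUFeatures differs between LLVM releases, so check the headers of the version in use.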
Frank
On 06/23/2016 01:09 PM, Craig Topper wrote:
> I think there's a bug in 3.8 where skylake as the CPU name implies
> AVX512. I think a skylake-avx512 (or something similar) CPU name was
> added later and the avx512 feature was removed from the skylake CPU name.
>
> The right way to fix this is to call getHostCPUFeatures as well. This
> will return a StringMap of true and false values for each feature.
> Iterate through that and build a string of "+feature,-feature" from
> each feature name and its true/false value. Then pass that string to
> engineBuilder.setMAttrs.
>
> This will protect against other issues such as low end versions of
> SandyBridge, Haswell, and SkyLake processors not supporting AVX at all.
>
> On Thu, Jun 23, 2016 at 10:07 AM, Keno Fischer
> <kfischer at college.harvard.edu> wrote:
>
> You likely haven't set the cpu features correctly. See
> llvm::sys::getHostCPUFeatures. E.g. this is what we're doing in julia:
> https://github.com/JuliaLang/julia/blob/59b253031af87f62e7d70a7d8848cdfd4a84288b/src/codegen.cpp#L5627
>
> On Thu, Jun 23, 2016 at 1:00 PM, Frank Winter via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> >
> >
> > On 06/23/2016 12:56 PM, Craig Topper wrote:
> >
> > Can you check what value "getHostCPUName" returned?
> >
> > getHostCPUName() = skylake
> >
> >
> > On Thu, Jun 23, 2016 at 9:53 AM, Frank Winter via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> With LLVM 3.8 the JIT compiler engine generates an AVX512
> >> instruction although I target an 'avx2' CPU (Intel Core i7).
> >> I just downloaded the most recent 3.8 and it still happens.
> >>
> >> It happens with this input module:
> >>
> >>
> >> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> >>
> >> define void @module_cFFEMJ(i64 %lo, i64 %hi, i64 %myId, i1 %ordered, i64 %start, i32* noalias align 32 %arg0, i32* noalias align 32 %arg1) {
> >> entrypoint:
> >> %0 = add nsw i64 %lo, %start
> >> %1 = add nsw i64 %hi, %start
> >> %2 = select i1 %ordered, i64 %0, i64 %lo
> >> %3 = select i1 %ordered, i64 %1, i64 %hi
> >> %4 = sdiv i64 %2, 4
> >> %5 = sdiv i64 %3, 4
> >> %6 = bitcast i32* %arg1 to i64*
> >> %7 = load i64, i64* %6, align 32
> >> %8 = trunc i64 %7 to i32
> >> %9 = getelementptr i32, i32* %arg1, i64 1
> >> %10 = lshr i64 %7, 32
> >> %11 = trunc i64 %10 to i32
> >> %12 = getelementptr i32, i32* %arg1, i64 2
> >> %13 = bitcast i32* %12 to i64*
> >> %14 = load i64, i64* %13, align 8
> >> %15 = trunc i64 %14 to i32
> >> %16 = getelementptr i32, i32* %arg1, i64 3
> >> %17 = lshr i64 %14, 32
> >> %18 = trunc i64 %17 to i32
> >> br label %L5
> >>
> >> L5: ; preds = %L5, %entrypoint
> >> %19 = phi i64 [ %32, %L5 ], [ %4, %entrypoint ]
> >> %20 = shl i64 %19, 4
> >> %21 = or i64 %20, 4
> >> %22 = or i64 %20, 8
> >> %23 = or i64 %20, 12
> >> %broadcast.splatinsert9 = insertelement <4 x i32> undef, i32 %8, i32 0
> >> %broadcast.splat10 = shufflevector <4 x i32> %broadcast.splatinsert9, <4 x i32> undef, <4 x i32> zeroinitializer
> >> %broadcast.splatinsert11 = insertelement <4 x i32> undef, i32 %11, i32 0
> >> %broadcast.splat12 = shufflevector <4 x i32> %broadcast.splatinsert11, <4 x i32> undef, <4 x i32> zeroinitializer
> >> %broadcast.splatinsert13 = insertelement <4 x i32> undef, i32 %15, i32 0
> >> %broadcast.splat14 = shufflevector <4 x i32> %broadcast.splatinsert13, <4 x i32> undef, <4 x i32> zeroinitializer
> >> %broadcast.splatinsert15 = insertelement <4 x i32> undef, i32 %18, i32 0
> >> %broadcast.splat16 = shufflevector <4 x i32> %broadcast.splatinsert15, <4 x i32> undef, <4 x i32> zeroinitializer
> >> %24 = getelementptr i32, i32* %arg0, i64 %20
> >> %25 = bitcast i32* %24 to <4 x i32>*
> >> store <4 x i32> %broadcast.splat10, <4 x i32>* %25, align 16
> >> %26 = getelementptr i32, i32* %arg0, i64 %21
> >> %27 = bitcast i32* %26 to <4 x i32>*
> >> store <4 x i32> %broadcast.splat12, <4 x i32>* %27, align 16
> >> %28 = getelementptr i32, i32* %arg0, i64 %22
> >> %29 = bitcast i32* %28 to <4 x i32>*
> >> store <4 x i32> %broadcast.splat14, <4 x i32>* %29, align 16
> >> %30 = getelementptr i32, i32* %arg0, i64 %23
> >> %31 = bitcast i32* %30 to <4 x i32>*
> >> store <4 x i32> %broadcast.splat16, <4 x i32>* %31, align 16
> >> %32 = add nsw i64 %19, 1
> >> %33 = icmp slt i64 %32, %5
> >> br i1 %33, label %L5, label %L6
> >>
> >> L6: ; preds = %L5
> >> ret void
> >> }
> >>
> >>
> >> The following code lines show how I call the JIT compiler
> >> ('Mod' points to the module).
> >>
> >> llvm::EngineBuilder engineBuilder(std::move(std::unique_ptr<llvm::Module>(Mod)));
> >> engineBuilder.setMCPU(llvm::sys::getHostCPUName());
> >> engineBuilder.setEngineKind(llvm::EngineKind::JIT);
> >> engineBuilder.setOptLevel(llvm::CodeGenOpt::Aggressive);
> >> engineBuilder.setErrorStr(&mcjit_error);
> >>
> >> llvm::TargetOptions targetOptions;
> >> targetOptions.AllowFPOpFusion = llvm::FPOpFusion::Fast;
> >> engineBuilder.setTargetOptions( targetOptions );
> >>
> >> TheExecutionEngine = engineBuilder.create();
> >>
> >> targetMachine = engineBuilder.selectTarget();
> >> Mod->setDataLayout( targetMachine->createDataLayout() );
> >>
> >> TheExecutionEngine->finalizeObject(); // MCJIT
> >> fptr_mainFunc_extern = TheExecutionEngine->getPointerToFunction( mainFunc_extern );
> >>
> >>
> >> When calling the function, an 'illegal instruction' error is raised.
> >> Looking at the assembly reveals an AVX512 instruction which
> >> shouldn't be there.
> >>
> >> Assembly:
> >> .text
> >> .file "module"
> >> .globl main
> >> .align 16, 0x90
> >> .type main,@function
> >> main:
> >> .cfi_startproc
> >> movq 8(%rsp), %r10
> >> leaq (%rdi,%r8), %rdx
> >> addq %rsi, %r8
> >> testb $1, %cl
> >> cmoveq %rdi, %rdx
> >> cmoveq %rsi, %r8
> >> movq %rdx, %rax
> >> sarq $63, %rax
> >> shrq $62, %rax
> >> addq %rdx, %rax
> >> sarq $2, %rax
> >> movq %r8, %rcx
> >> sarq $63, %rcx
> >> shrq $62, %rcx
> >> addq %r8, %rcx
> >> sarq $2, %rcx
> >> movq (%r10), %r8
> >> movq 8(%r10), %r10
> >> movq %r8, %rdi
> >> shrq $32, %rdi
> >> movq %r10, %rsi
> >> shrq $32, %rsi
> >> movq %rax, %rdx
> >> shlq $6, %rdx
> >> leaq 48(%rdx,%r9), %rdx
> >> .align 16, 0x90
> >> .LBB0_1:
> >> vmovd %r8d, %xmm0
> >> vpbroadcastd %xmm0, %xmm0
> >> vmovd %edi, %xmm1
> >> vpbroadcastd %xmm1, %xmm1
> >> vmovd %r10d, %xmm2
> >> vpbroadcastd %xmm2, %xmm2
> >> vmovd %esi, %xmm3
> >> vpbroadcastd %xmm3, %xmm3
> >> vmovdqa32 %xmm0, -48(%rdx)
> >> vmovdqa32 %xmm1, -32(%rdx)
> >> vmovdqa32 %xmm2, -16(%rdx)
> >> vmovdqa32 %xmm3, (%rdx)
> >> addq $1, %rax
> >> addq $64, %rdx
> >> cmpq %rcx, %rax
> >> jl .LBB0_1
> >> retq
> >> .Lfunc_end0:
> >> .size main, .Lfunc_end0-main
> >> .cfi_endproc
> >>
> >>
> >> .section ".note.GNU-stack","",@progbits
> >>
> >> end assembly!
> >>
> >> I am not sure which instruction is the offending one, but
> >> 'vmovdqa32' looks like AVX512.
> >>
> >> I wasn't able to reproduce this with 'opt'; it generates AVX2
> >> instructions. And when I force it to use e.g. avx512f, it rejects
> >> the CPU type.
> >>
> >> Any ideas?
> >>
> >>
> >> Frank
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
> >
> > --
> > ~Craig
> >
> >
> >
> >
>
>
>
>
> --
> ~Craig