[llvm-dev] [EXTERNAL] Re: ORC JIT error when using AVX2 vector instructions

Tue Aug 31 03:48:01 PDT 2021

> Should I switch to LLVM 13 release or is avx2 in JIT a trusted feature
> to be present in version 12?
I am not certain, but I'd assume yes. I used AVX in a JIT many years ago
in an experimental project. I see no reason why AVX2 wouldn't be
available in ORC.

Did you check with the results of sys::getHostCPUFeatures()? There's
lots of AVX variants:
https://github.com/llvm/llvm-project/blob/76a1a415302d06ceb4a3358493e897e98dd75f77/llvm/lib/Support/Host.cpp#L1499

And maybe have a look how it works in JITTargetMachineBuilder:
https://github.com/llvm/llvm-project/blob/76a1a415302d06ceb4a3358493e897e98dd75f77/llvm/lib/ExecutionEngine/Orc/JITTargetMachineBuilder.cpp#L24

On 31/08/2021 00:57, Frank Winter wrote:
> Hi Stefan.
>
> Thanks for the tip. But, it didn't do the trick - still only SSE. (the
> /proc/cpuinfo contains flag 'avx2')
>
> I instrumented a bit:
>
>     JITTargetMachineBuilder JTMB((*TPC)->getTargetTriple());
>
>     llvm::outs() << "feature string: " <<
> JTMB.getFeatures().getString() << "\n";
>     llvm::outs() << "adding features...\n";
>     JTMB.addFeatures({"+avx2"});
>    
>     llvm::outs() << "feature string: " <<
> JTMB.getFeatures().getString() << "\n";
>
> Output:
>
> Creating JIT
> feature string:
> adding features...
> feature string: +avx2
> Creating JIT successfu
>
> But still only SSE:
>
> .Leval0_intern:
> .cfi_startproc
> addl %esi, %edi
> shll $3, %edi
> movslq %edi, %rax
> shlq $5, %rax
> movaps (%r8,%rax), %xmm0
> movaps 16(%r8,%rax), %xmm1
> mulps 16(%rcx,%rax), %xmm1
> mulps (%rcx,%rax), %xmm0
> movaps %xmm0, (%rdx,%rax)
> movaps %xmm1, 16(%rdx,%rax)
> retq
>
> Should I switch to LLVM 13 release or is avx2 in JIT a trusted feature
> to be present in version 12?
>
> Best,
> Frank
>
>
>
> ------------------------------------------------------------------------
> *From:* Stefan Gränitz <stefan.graenitz at gmail.com>
> *Sent:* Monday, August 30, 2021 5:00 PM
> *To:* Frank Winter <fwinter at jlab.org>; Craig Topper
> <craig.topper at gmail.com>
> *Cc:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] [EXTERNAL] Re: ORC JIT error when using AVX2
> vector instructions
>  
>
> Hi Frank
>
>> That makes me think that the ORC JIT Kaleidoscope doesn't use the
>> '+avx2' attribute.
>>
>> How can ORC JIT Kaleidoscope generate jitted code with AVX2 instructions?
> Did you try adding something like:
> JTMB.addFeatures({"+avx2"});
>
> Here?
> https://github.com/llvm/llvm-project/blob/7a2a765745973ebeb041276d2d9489a000ba9371/llvm/examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h#L71
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_llvm_llvm-2Dproject_blob_7a2a765745973ebeb041276d2d9489a000ba9371_llvm_examples_Kaleidoscope_BuildingAJIT_Chapter1_KaleidoscopeJIT.h-23L71&d=DwMD-g&c=CJqEzB1piLOyyvZjb8YUQw&r=tFpAzszScTWMAFcrGFW5xg&m=d7g1B6MAn9hV6ijrMKBgXHMYCQjKMfXQcxFZUzXsIIE&s=1xpxLmfuaVlN3_P_Uw-FHKkrxSRG2uXXxT44bMGkOWA&e=>
>
> Hope it helps.
> Best, Stefan
>
> On 30/08/2021 21:46, Frank Winter via llvm-dev wrote:
>> Thanks! Yeah, that was my silliness. Fixed and the module compiles
>> now with ORC JIT Kaleidoscope.
>>
>> However, looking at the assembler I only see SSE (128 bit vectors)
>> being generated:
>>
>> .Leval0_intern:
>> .cfi_startproc
>> addl %esi, %edi
>> shll $3, %edi
>> movslq %edi, %rax
>> shlq $5, %rax
>> movaps (%r8,%rax), %xmm0
>> movaps 16(%r8,%rax), %xmm1
>> mulps 16(%rcx,%rax), %xmm1
>> mulps (%rcx,%rax), %xmm0
>> movaps %xmm0, (%rdx,%rax)
>> movaps %xmm1, 16(%rdx,%rax)
>> retq
>>
>> I cross checked what LLC gives:
>>
>> Calling llc with no optional flags gives matching assembler, but when
>> adding '-mattr=+avx2' I get AVX2 (256 bit vectors)
>>
>> .Leval0_intern:                         # @eval0_intern
>>         .cfi_startproc
>> # %bb.0:                                # %stack
>>         addl    %esi, %edi
>>         shll    $3, %edi
>>         movslq  %edi, %rax
>>         shlq    $5, %rax
>>         vmovaps (%r8,%rax), %ymm0
>>         vmulps  (%rcx,%rax), %ymm0, %ymm0
>>         vmovaps %ymm0, (%rdx,%rax)
>>         vzeroupper
>>         retq
>>
>> That makes me think that the ORC JIT Kaleidoscope doesn't use the
>> '+avx2' attribute.
>>
>> How can ORC JIT Kaleidoscope generate jitted code with AVX2 instructions?
>>
>> Thanks again & Best wishes,
>> Frank
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Craig Topper <craig.topper at gmail.com>
>> <mailto:craig.topper at gmail.com>
>> *Sent:* Monday, August 30, 2021 3:20 PM
>> *To:* Frank Winter <fwinter at jlab.org> <mailto:fwinter at jlab.org>
>> *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> <llvm-dev at lists.llvm.org> <mailto:llvm-dev at lists.llvm.org>
>> *Subject:* [EXTERNAL] Re: [llvm-dev] ORC JIT error when using AVX2
>> vector instructions
>>  
>> This is an illegal instruction. mul is an integer operation, but that
>> has floating point types. The correct operation would be fmul.
>>
>> %21 = mul <8 x float> %20, %10
>>
>> ~Craig
>>
>>
>> On Mon, Aug 30, 2021 at 12:08 PM Frank Winter via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>     Hi.
>>
>>     As soon as the module contains instructions operating on < 8 x
>>     float > the ORC JIT refuses to work.
>>
>>     Here's the module that provokes the error given further below:
>>
>>     ; ModuleID = 'module'
>>     source_filename = "module"
>>     target datalayout =
>>     "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
>>
>>     define private void @eval0_intern(i32 %arg0, i32 %arg1, <8 x
>>     float>* %arg2, <8 x float>* %arg3, <8 x float>* %arg4) {
>>     stack:
>>       br label %afterstack
>>
>>     afterstack:                                       ; preds = %stack
>>       %0 = add nsw i32 %arg0, %arg1
>>       %1 = add nsw i32 0, %0
>>       %2 = mul i32 %1, 1
>>       %3 = add nsw i32 %2, 0
>>       %4 = mul i32 %3, 1
>>       %5 = add nsw i32 %4, 0
>>       %6 = mul i32 %5, 1
>>       %7 = add nsw i32 %6, 0
>>       %8 = mul i32 %7, 8
>>       %9 = getelementptr <8 x float>, <8 x float>* %arg3, i32 %8
>>       %10 = load <8 x float>, <8 x float>* %9, align 32
>>       %11 = add nsw i32 0, %0
>>       %12 = mul i32 %11, 1
>>       %13 = add nsw i32 %12, 0
>>       %14 = mul i32 %13, 1
>>       %15 = add nsw i32 %14, 0
>>       %16 = mul i32 %15, 1
>>       %17 = add nsw i32 %16, 0
>>       %18 = mul i32 %17, 8
>>       %19 = getelementptr <8 x float>, <8 x float>* %arg4, i32 %18
>>       %20 = load <8 x float>, <8 x float>* %19, align 32
>>       %21 = mul <8 x float> %20, %10
>>       %22 = add nsw i32 0, %0
>>       %23 = mul i32 %22, 1
>>       %24 = add nsw i32 %23, 0
>>       %25 = mul i32 %24, 1
>>       %26 = add nsw i32 %25, 0
>>       %27 = mul i32 %26, 1
>>       %28 = add nsw i32 %27, 0
>>       %29 = mul i32 %28, 8
>>       %30 = getelementptr <8 x float>, <8 x float>* %arg2, i32 %29
>>       store <8 x float> %21, <8 x float>* %30, align 32
>>       ret void
>>     }
>>
>>     define void @eval0(i32 %idx, [8 x i8]* %arg_ptr) {
>>     entrypoint:
>>       %0 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 0
>>       %1 = bitcast [8 x i8]* %0 to i32*
>>       %2 = load i32, i32* %1, align 4
>>       %3 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 1
>>       %4 = bitcast [8 x i8]* %3 to <8 x float>**
>>       %5 = load <8 x float>*, <8 x float>** %4, align 8
>>       %6 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 2
>>       %7 = bitcast [8 x i8]* %6 to <8 x float>**
>>       %8 = load <8 x float>*, <8 x float>** %7, align 8
>>       %9 = getelementptr [8 x i8], [8 x i8]* %arg_ptr, i32 3
>>       %10 = bitcast [8 x i8]* %9 to <8 x float>**
>>       %11 = load <8 x float>*, <8 x float>** %10, align 8
>>       call void @eval0_intern(i32 %idx, i32 %2, <8 x float>* %5, <8 x
>>     float>* %8, <8 x float>* %11)
>>       ret void
>>     }
>>     --------------------------
>>
>>
>>     For the JIT part I'm using the Kaleidoscope ORC JIT as given in
>>     the LLVM examples. However, when it comes to the symbol lookup
>>     the program stops with output like this:
>>
>>     Lookup
>>     LLVM ERROR: Cannot select: 0x562e8bb6c268: v4f32 = mul
>>     0x562e8bb6bab0, 0x562e8bb6b6a0
>>       0x562e8bb6bab0: v4f32,ch = load<(load 16 from %ir.19 + 16,
>>     basealign 32)> 0x562e8baf8ca8, 0x562e8bb6c130, undef:i64
>>         0x562e8bb6c130: i64 = add nuw 0x562e8bb6bcb8, Constant:i64<16>
>>           0x562e8bb6bcb8: i64 = add 0x562e8bb6bc50, 0x562e8bb6b9e0
>>             0x562e8bb6bc50: i64,ch = CopyFromReg 0x562e8baf8ca8,
>>     Register:i64 %4
>>               0x562e8bb6bbe8: i64 = Register %4
>>             0x562e8bb6b9e0: i64 = shl 0x562e8bb6b910, Constant:i8<5>
>>               0x562e8bb6b910: i64 = sign_extend 0x562e8bb6b770
>>                 0x562e8bb6b770: i32 = shl 0x562e8bb6b500, Constant:i8<3>
>>                   0x562e8bb6b500: i32 = add nsw 0x562e8bb6b3c8,
>>     0x562e8bb6b498
>>                     0x562e8bb6b3c8: i32,ch = CopyFromReg
>>     0x562e8baf8ca8, Register:i32 %0
>>                       0x562e8bb6b360: i32 = Register %0
>>                     0x562e8bb6b498: i32,ch = CopyFromReg
>>     0x562e8baf8ca8, Register:i32 %1
>>                       0x562e8bb6b430: i32 = Register %1
>>                   0x562e8bb6ea28: i8 = Constant<3>
>>               0x562e8bb6c2d0: i8 = Constant<5>
>>           0x562e8bb6b638: i64 = Constant<16>
>>         0x562e8bb6bb18: i64 = undef
>>       0x562e8bb6b6a0: v4f32,ch = load<(load 16 from %ir.9 + 16,
>>     basealign 32)> 0x562e8baf8ca8, 0x562e8bb6c198, undef:i64
>>         0x562e8bb6c198: i64 = add nuw 0x562e8bb6ba48, Constant:i64<16>
>>           0x562e8bb6ba48: i64 = add 0x562e8bb6b8a8, 0x562e8bb6b9e0
>>             0x562e8bb6b8a8: i64,ch = CopyFromReg 0x562e8baf8ca8,
>>     Register:i64 %3
>>               0x562e8bb6b840: i64 = Register %3
>>             0x562e8bb6b9e0: i64 = shl 0x562e8bb6b910, Constant:i8<5>
>>               0x562e8bb6b910: i64 = sign_extend 0x562e8bb6b770
>>                 0x562e8bb6b770: i32 = shl 0x562e8bb6b500, Constant:i8<3>
>>                   0x562e8bb6b500: i32 = add nsw 0x562e8bb6b3c8,
>>     0x562e8bb6b498
>>                     0x562e8bb6b3c8: i32,ch = CopyFromReg
>>     0x562e8baf8ca8, Register:i32 %0
>>                       0x562e8bb6b360: i32 = Register %0
>>                     0x562e8bb6b498: i32,ch = CopyFromReg
>>     0x562e8baf8ca8, Register:i32 %1
>>                       0x562e8bb6b430: i32 = Register %1
>>                   0x562e8bb6ea28: i8 = Constant<3>
>>               0x562e8bb6c2d0: i8 = Constant<5>
>>           0x562e8bb6b638: i64 = Constant<16>
>>         0x562e8bb6bb18: i64 = undef
>>
>>
>>     The module compiles fine with LLC. So, I assume that's not the
>>     problem.
>>
>>     What might go wrong? Is there a way to initialize the ORC JIT
>>     with the AVX2 option somehow?
>>
>>     This is using LLVM release 12.
>>
>>     Best,
>>     Frank
>>
>>     _______________________________________________
>>     LLVM Developers mailing list
>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>     <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=CJqEzB1piLOyyvZjb8YUQw&r=tFpAzszScTWMAFcrGFW5xg&m=iIRT39rMHzg60BQQu6bv5Nzez97Rjf-90P-EHloWvtk&s=BpFT2lRfi7rhmDGQWAkBbHAKDe_9xPQKggZyX5VciuY&e=>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMD-g&c=CJqEzB1piLOyyvZjb8YUQw&r=tFpAzszScTWMAFcrGFW5xg&m=d7g1B6MAn9hV6ijrMKBgXHMYCQjKMfXQcxFZUzXsIIE&s=DFuYQvPLbEIViF8vTqwviU6LTKF7Qqjl8lhzJPVqUKY&e=>
> -- 
> https://weliveindetail.github.io/blog/about/ <https://urldefense.proofpoint.com/v2/url?u=https-3A__weliveindetail.github.io_blog_about_&d=DwMD-g&c=CJqEzB1piLOyyvZjb8YUQw&r=tFpAzszScTWMAFcrGFW5xg&m=d7g1B6MAn9hV6ijrMKBgXHMYCQjKMfXQcxFZUzXsIIE&s=bLfQtGMxBwWMWB3Eu_i92l-Fz8PNxn3xDCGrg-JxRNQ&e=>

-- 
https://weliveindetail.github.io/blog/about/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210831/b50aed2e/attachment-0001.html>