[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Quentin Colombet via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 17 09:27:31 PST 2017


Hi Oliver,

Thanks for trying this.
Could you file a different PR for each of the problem you found and reference the umbrella PR: http://llvm.org/PR35347? <http://llvm.org/PR35347?>

Thanks,
-Quentin

> On Nov 17, 2017, at 8:17 AM, Oliver Stannard <oliver.stannard at arm.com> wrote:
> 
> Hi Quentin,
>  
> One more reproducer, this time with small (<64bit) values being passed on the stack:
>  
>   int foo(int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7,
>           int stack1) {
>     return stack1;
>   }
>  
>   int main() {
>     int ret = foo(0,1,2,3,4,5,6,7,8);
>     printf("%d\n", ret);
>   }
>  
> Global isel thinks that the incoming value of stack1 is stored in bytes [0,4) above SP, but for big-endian targets this should be in bytes [4,8):
>  
> // /work/llvm/build/bin/clang --target=aarch64-arm-none-eabi -march=armv8-a -c callees.cpp -O0 -Wall -std=c++11 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S
> _Z3fooiiiiiiiii:                        // @_Z3fooiiiiiiiii
> // BB#0:                                // %entry
>         sub     sp, sp, #48             // =48
>         ldr     w8, [sp, #48]   // <= Should be [sp, #52]
>         str     w0, [sp, #44]
>         str     w1, [sp, #40]
>         str     w2, [sp, #36]
>         str     w3, [sp, #32]
>         str     w4, [sp, #28]
>         str     w5, [sp, #24]
>         str     w6, [sp, #20]
>         str     w7, [sp, #16]
>         str     w8, [sp, #12]
>         ldr     w0, [sp, #12]
>         add     sp, sp, #48             // =48
>         ret
>  
> Oliver
>  
> From: Oliver Stannard 
> Sent: 17 November 2017 14:57
> To: 'qcolombet at apple.com <mailto:qcolombet at apple.com>'
> Cc: 'llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>'; nd; Kristof Beyls
> Subject: RE: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
>  
> Hi Quentin,
>  
> It seems that we also get the calling convention wrong for vector types on big-endian:
>   #include <arm_neon.h>
>   int32x2_t load_vector(int32x2_t *p) {
>     return *p;
>   }
>  
> Global-isel generates this:
>   // armclang --target=aarch64-arm-none-eabi -march=armv8-a -c callees.cpp -O0 -Wall -std=c++11 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S
>   _Z11load_vectorP11__Int32x2_t:          // @_Z11load_vectorP11__Int32x2_t
>   // BB#0:                                // %entry
>           sub     sp, sp, #16             // =16
>           str     x0, [sp, #8]
>           ldr     x0, [sp, #8]
>           ld1     { v0.2s }, [x0]
>           add     sp, sp, #16             // =16
>           ret
>  
> With global-isel off, there is a rev64 instruction between the ld1 and the add, which fixes up the endianness of the vector.
>  
> Oliver
>  
> From: Oliver Stannard 
> Sent: 17 November 2017 13:32
> To: 'qcolombet at apple.com <mailto:qcolombet at apple.com>'
> Cc: llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>; nd; Kristof Beyls
> Subject: RE: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
>  
> Hi Quentin,
>  
> At Kristof’s suggestion, I tried running our ABI test suite for a big-endian AArch64 target, and this found an ABI mismatch between global-isel and regular -O0. Here’s a reproducer for the first one I’ve investigated:
>  
>   struct foo {
>     float first;
>     float second;
>   };
>   float get_first(foo p) {
>     return p.first;
>   }
>  
> This is the code that global-isel currently generates:
>   // /work/llvm/build/bin/clang --target=aarch64--none-eabi -march=armv8-a -c callees.cpp -O0 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S
>  
>   _Z9get_first3foo:                       // @_Z9get_first3foo
>   // BB#0:                                // %entry
>           sub     sp, sp, #16             // =16
>                                           // implicit-def: %X8
>           fmov    w9, s0
>           mov     w10, w9
>           bfxil   x8, x10, #0, #32
>           fmov    w9, s1
>           mov     w10, w9
>           bfi     x8, x10, #32, #32
>           add     x10, sp, #8             // =8
>           str     x8, [sp, #8]
>           ldr     w9, [x10]
>           fmov    s0, w9
>           add     sp, sp, #16             // =16
>           ret
>  
> When run on a big-endian target, this incorrectly returns the second member of the struct, instead of the first.
>  
> Oliver
>  
> From: qcolombet at apple.com <mailto:qcolombet at apple.com> [mailto:qcolombet at apple.com <mailto:qcolombet at apple.com>] 
> Sent: 14 November 2017 23:11
> To: Quentin Colombet
> Cc: Oliver Stannard; llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>; Justin Bogner; Ahmed Bougacha; Aditya Nandakumar; nd
> Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
>  
> To give an update here, we actually are not missing a mapping. The code complains because we are copying around a fp16 into a gpr32 and that shouldn’t be done with a copy (default mapping).
> I extended the repairing code to issue G_ANYEXT in those cases instead of asserting.
>  
> However, now, I have to teach instruction select about those ANYEXT otherwise we’ll fallback in that case. But that’s a different story.
>  
> I’ll try to commit today or tomorrow (I have to strengthen the tests).
>  
> On Nov 14, 2017, at 9:29 AM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>  
> Thanks Oliver.
> I’ll have a look. This typically means that we miss a mapping for this type/instruction, which is not surprising given how little code we have we fp16.
>  
> 
> On Nov 14, 2017, at 2:27 AM, Oliver Stannard <oliver.stannard at arm.com <mailto:oliver.stannard at arm.com>> wrote:
>  
> Hi Quentin,
>  
> I’ve started running an ABI test suite with global isel on AArch64, and while it hasn’t found any ABI issues it has hit an assertion in clang when using the __fp16 type. Here’s a reproducer:
>  
>   __fp16 pass_f16(__fp16 p) {
>     return p;
>   }
>  
>   $ /work/llvm/build/bin/clang --target=aarch64-arm-none-eabi -march=armv8-a -c test.c -O0 -mllvm -global-isel -mllvm -global-isel-abort=0
>   clang-6.0: /work/llvm/llvm/lib/CodeGen/GlobalISel/RegisterBankInfo.cpp:446: static void llvm::RegisterBankInfo::applyDefaultMapping(const llvm::RegisterBankInfo::OperandsMapper &): Assertion `OrigTy.getSizeInBits() == NewTy.getSizeInBits() && "Types with difference size cannot be handled by the default " "mapping"' failed.
>   #0 0x000000000362a764 PrintStackTraceSignalHandler(void*) (/work/llvm/build/bin/clang-6.0+0x362a764)
>   #1 0x000000000362aac6 SignalHandler(int) (/work/llvm/build/bin/clang-6.0+0x362aac6)
>   #2 0x00007f9193b78330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
>   #3 0x00007f919276bc37 gsignal /build/eglibc-oGUzwX/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
>   #4 0x00007f919276f028 abort /build/eglibc-oGUzwX/eglibc-2.19/stdlib/abort.c:91:0
>   #5 0x00007f9192764bf6 __assert_fail_base /build/eglibc-oGUzwX/eglibc-2.19/assert/assert.c:92:0
>   #6 0x00007f9192764ca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
>   #7 0x0000000003d70eb9 (/work/llvm/build/bin/clang-6.0+0x3d70eb9)
>   #8 0x0000000003d6b00c llvm::RegBankSelect::applyMapping(llvm::MachineInstr&, llvm::RegisterBankInfo::InstructionMapping const&, llvm::SmallVectorImpl<llvm::RegBankSelect::RepairingPlacement>&) (/work/llvm/build/bin/clang-6.0+0x3d6b00c)
>   #9 0x0000000003d6b366 llvm::RegBankSelect::assignInstr(llvm::MachineInstr&) (/work/llvm/build/bin/clang-6.0+0x3d6b366)
>   #10 0x0000000003d6b7f1 llvm::RegBankSelect::runOnMachineFunction(llvm::MachineFunction&) (/work/llvm/build/bin/clang-6.0+0x3d6b7f1)
>   #11 0x0000000002d934c8 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/work/llvm/build/bin/clang-6.0+0x2d934c8)
>   #12 0x00000000030c998f llvm::FPPassManager::runOnFunction(llvm::Function&) (/work/llvm/build/bin/clang-6.0+0x30c998f)
>   #13 0x00000000030c9c53 llvm::FPPassManager::runOnModule(llvm::Module&) (/work/llvm/build/bin/clang-6.0+0x30c9c53)
>   #14 0x00000000030ca136 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/llvm/build/bin/clang-6.0+0x30ca136)
>   #15 0x00000000037c3dcf clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/work/llvm/build/bin/clang-6.0+0x37c3dcf)
>   #16 0x0000000003d421a0 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/work/llvm/build/bin/clang-6.0+0x3d421a0)
>   #17 0x0000000004457376 clang::ParseAST(clang::Sema&, bool, bool) (/work/llvm/build/bin/clang-6.0+0x4457376)
>   #18 0x0000000003ca6ea0 clang::FrontendAction::Execute() (/work/llvm/build/bin/clang-6.0+0x3ca6ea0)
>   #19 0x0000000003c1fa31 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/work/llvm/build/bin/clang-6.0+0x3c1fa31)
>   #20 0x0000000003d3bf4b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/work/llvm/build/bin/clang-6.0+0x3d3bf4b)
>   #21 0x0000000001f85629 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/work/llvm/build/bin/clang-6.0+0x1f85629)
>   #22 0x0000000001f83096 main (/work/llvm/build/bin/clang-6.0+0x1f83096)
>   #23 0x00007f9192756f45 __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:321:0
>   #24 0x0000000001f80029 _start (/work/llvm/build/bin/clang-6.0+0x1f80029)
>   Stack dump:
>   0.      Program arguments: /work/llvm/build/bin/clang-6.0 -cc1 -triple aarch64-arm-none-eabi -emit-obj -mrelax-all -disable-free -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -target-feature +neon -target-abi aapcs -fallow-half-arguments-and-returns -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /work/innovation/cctest/test.gcno -resource-dir /work/llvm/build/lib/clang/6.0.0 -O0 -fdebug-compilation-dir /work/innovation/cctest -ferror-limit 19 -fmessage-length 226 -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -mllvm -global-isel -mllvm -global-isel-abort=0 -o test.o -x c test.c 
>   1.      <eof> parser at end of file
>   2.      Code generation
>   3.      Running pass 'Function Pass Manager' on module 'test.c'.
>   4.      Running pass 'RegBankSelect' on function '@pass_f16'
>   clang-6.0: error: unable to execute command: Aborted (core dumped)
>   clang-6.0: error: clang frontend command failed due to signal (use -v to see invocation)
>   clang version 6.0.0 (ssh://olista01@ds-gerrit.euhpc.arm.com:29418/armcompiler/clang <ssh://olista01@ds-gerrit.euhpc.arm.com:29418/armcompiler/clang> aa2b9952ef98a5fe2d47384ef17106855b8bae51) (ssh://olista01@ds-gerrit.euhpc.arm.com:29418/armcompiler/llvm <ssh://olista01@ds-gerrit.euhpc.arm.com:29418/armcompiler/llvm> 29f89772107a79b5f2a816d4748ed9c19416c1b6)
>   Target: aarch64-arm-none-eabi
>   Thread model: posix
>   InstalledDir: /work/llvm/build/bin
>   clang-6.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ <http://llvm.org/bugs/> and include the crash backtrace, preprocessed source, and associated run script.
>   clang-6.0: note: diagnostic msg:
>   ********************
>  
>   PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
>   Preprocessed source(s) and associated run script(s) are located at:
>   clang-6.0: note: diagnostic msg: /tmp/test-e06964.c
>   clang-6.0: note: diagnostic msg: /tmp/test-e06964.sh
>   clang-6.0: note: diagnostic msg:
>  
>   ********************
>  
> Oliver
>  
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org <mailto:llvm-dev-bounces at lists.llvm.org>] On Behalf Of Quentin Colombet via llvm-dev
> Sent: 13 November 2017 18:27
> To: Kristof Beyls
> Cc: llvm-dev; nd; Ahmed Bougacha; Justin Bogner; Aditya Nandakumar
> Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
>  
> Hi Kristof,
>  
>  
> On Nov 13, 2017, at 9:10 AM, Kristof Beyls <Kristof.Beyls at arm.com <mailto:Kristof.Beyls at arm.com>> wrote:
>  
> Hi Quentin, 
>  
> My only remaining concern is around ABI compatibility.
> The following commit seems to indicate that in the previous round of evaluation, we didn’t find an existing ABI compatibility issue:
> http://llvm.org/viewvc/llvm-project?view=revision&revision=311388 <http://llvm.org/viewvc/llvm-project?view=revision&revision=311388>.
> I haven’t looked into the details of this issue - so maybe I’m worried over nothing?
>  
> No, you’re right. The problem with ABI is if you are consistently wrong, then you won’t notice :).
>  
> 
>  
> I’m wondering if since then on your side you did any testing around ABI compatibility?
> E.g. building software where you semi-randomly build some functions through GlobalISel and some functions through DAGISel?
>  
> Justin will look into that. Clang has utility script for that utils/ABITest.
>  
> Given we will only be able to check iOS ABI, you may want to follow the same kind of validation on your side.
>  
> I let you sync up with Justin for the method.
>  
> Cheers,
> -Quentin
>  
> 
>  
> Thanks,
>  
> Kristof
>  
> On 8 Nov 2017, at 00:42, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>  
> Hi all, 
>  
> I’d like to resurrect this thread and ask if people are on board for enabling this by default for AArch64 O0.
>  
>  
> *** What Changed Since June? ***
>  
> - We added a way to describe the legalization actions for non-power-of-2 
> - We gave a tutorial that covers the best practices to target GlobalISel
> - We improved the TableGen backend to reuse existing SDISel patterns
> - We built and ran huge internal software with GISel
> - We evaluated the performance of GISel and are confident things are in a good shape (with https://reviews.llvm.org/D39034 <https://reviews.llvm.org/D39034>) and moving forward would look even better (see the last LLVM Dev talk: GlobalISel: Present, Past, and Future when it is available)
>  
>  
> *** So What’s he Plan? ***
>  
> - Switch the default instruction selector to GISel for AArch64 at O0
> - Enable the fallback path by default for AArch64 (with warnings enabled when that path is hit)
> - Provide a clang option to turn GISel off
>  
> What do you think?
>  
> Thanks,
> -Quentin
>  
> 
> On Jun 16, 2017, at 4:43 PM, Quentin Colombet <qcolombet at apple.com <mailto:qcolombet at apple.com>> wrote:
>  
> Hi all,
>  
> We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.
>  
>  
> *** Why Is That? ***
>  
> We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
> In particular,
> 1. The APIs are still evolving and can still possibly change significantly
> 2. The TableGen backend to reuse the existing SD patterns is still at its early stage
> 3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)
>  
> The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.
>  
> We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!
>  
>  
> *** Short-Term Proposal ***
>  
> What we would like to do instead short-term is:
> A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)
> B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems
> C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic
>  
> What do people think?
>  
>  
> *** Your Help Is Needed ***
>  
> - Please share your experience in using the GISel APIs and how we can make them better. Moving forward we’ll have those conversations on open source instead of internally/with a narrower audience.
> - Report any performance problem you identify
> - Propose patches!
>  
> Cheers,
> -Quentin
>  
>  
>  
> On Jun 16, 2017, at 3:06 PM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>  
> 
> On Jun 14, 2017, at 7:27 AM, Diana Picus <diana.picus at linaro.org <mailto:diana.picus at linaro.org>> wrote:
>  
> On 12 June 2017 at 18:54, Diana Picus <diana.picus at linaro.org <mailto:diana.picus at linaro.org>> wrote:
> Hi all,
>  
> I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.
>  
>  
> I did some more investigations on a machine similar to the one running the buildbot. For paq8p and scimark2, I get these results for O0:
>  
> PAQ8p:
> Fast isel: 666.344
> Global isel: 731.384
>  
> SciMark2-C:
> Fast isel: 463.908
> Global isel: 496.22
>  
> The current timeout is 500s (so in this particular case we didn't hit it for scimark2, and it ran successfully to completion). I don't think the difference between FastISel and GlobalISel is too atrocious, so I would propose increasing the timeout for these 2 benchmarks. I'm not sure if we can do this on a per-bot basis, but I see some precedent for setting custom timeout thresholds for various benchmarks on different architectures (sometimes with comments that it's done so we can run O0 on that particular benchmark). 
>  
> Something along these lines works:
> https://reviews.llvm.org/differential/diff/102547/ <https://reviews.llvm.org/differential/diff/102547/>
>  
> What do you guys think about this approach?
>  
> Looks reasonable to me.
>  
> 
>  
> Thanks,
> Diana
>  
> PS: The buildbot is using the Makefiles because that's what our other AArch64 test-suite bots use. Moving all of them to CMake is a transition for another time.
>  
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>  
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171117/0a257399/attachment-0001.html>


More information about the llvm-dev mailing list