[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Oliver Stannard via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 17 08:17:08 PST 2017


Hi Quentin,

One more reproducer, this time with small (<64bit) values being passed on the stack:

  int foo(int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7,
          int stack1) {
    return stack1;
  }

  int main() {
    int ret = foo(0,1,2,3,4,5,6,7,8);
    printf("%d\n", ret);
  }

Global isel thinks that the incoming value of stack1 is stored in bytes [0,4) above SP, but for big-endian targets this should be in bytes [4,8):

// /work/llvm/build/bin/clang --target=aarch64-arm-none-eabi -march=armv8-a -c callees.cpp -O0 -Wall -std=c++11 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S
_Z3fooiiiiiiiii:                        // @_Z3fooiiiiiiiii
// BB#0:                                // %entry
        sub     sp, sp, #48             // =48
        ldr     w8, [sp, #48]   // <= Should be [sp, #52]
        str     w0, [sp, #44]
        str     w1, [sp, #40]
        str     w2, [sp, #36]
        str     w3, [sp, #32]
        str     w4, [sp, #28]
        str     w5, [sp, #24]
        str     w6, [sp, #20]
        str     w7, [sp, #16]
        str     w8, [sp, #12]
        ldr     w0, [sp, #12]
        add     sp, sp, #48             // =48
        ret

Oliver

From: Oliver Stannard
Sent: 17 November 2017 14:57
To: 'qcolombet at apple.com'
Cc: 'llvm-dev at lists.llvm.org'; nd; Kristof Beyls
Subject: RE: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Hi Quentin,

It seems that we also get the calling convention wrong for vector types on big-endian:
  #include <arm_neon.h>
  int32x2_t load_vector(int32x2_t *p) {
    return *p;
  }

Global-isel generates this:
  // armclang --target=aarch64-arm-none-eabi -march=armv8-a -c callees.cpp -O0 -Wall -std=c++11 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S
  _Z11load_vectorP11__Int32x2_t:          // @_Z11load_vectorP11__Int32x2_t
  // BB#0:                                // %entry
          sub     sp, sp, #16             // =16
          str     x0, [sp, #8]
          ldr     x0, [sp, #8]
          ld1     { v0.2s }, [x0]
          add     sp, sp, #16             // =16
          ret

With global-isel off, there is a rev64 instruction between the ld1 and the add, which fixes up the endianness of the vector.

Oliver

From: Oliver Stannard
Sent: 17 November 2017 13:32
To: 'qcolombet at apple.com'
Cc: llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>; nd; Kristof Beyls
Subject: RE: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Hi Quentin,

At Kristof’s suggestion, I tried running our ABI test suite for a big-endian AArch64 target, and this found an ABI mismatch between global-isel and regular -O0. Here’s a reproducer for the first one I’ve investigated:

  struct foo {
    float first;
    float second;
  };
  float get_first(foo p) {
    return p.first;
  }

This is the code that global-isel currently generates:
  // /work/llvm/build/bin/clang --target=aarch64--none-eabi -march=armv8-a -c callees.cpp -O0 -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mbig-endian -o - -S

  _Z9get_first3foo:                       // @_Z9get_first3foo
  // BB#0:                                // %entry
          sub     sp, sp, #16             // =16
                                          // implicit-def: %X8
          fmov    w9, s0
          mov     w10, w9
          bfxil   x8, x10, #0, #32
          fmov    w9, s1
          mov     w10, w9
          bfi     x8, x10, #32, #32
          add     x10, sp, #8             // =8
          str     x8, [sp, #8]
          ldr     w9, [x10]
          fmov    s0, w9
          add     sp, sp, #16             // =16
          ret

When run on a big-endian target, this incorrectly returns the second member of the struct, instead of the first.

Oliver

From: qcolombet at apple.com<mailto:qcolombet at apple.com> [mailto:qcolombet at apple.com]
Sent: 14 November 2017 23:11
To: Quentin Colombet
Cc: Oliver Stannard; llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>; Justin Bogner; Ahmed Bougacha; Aditya Nandakumar; nd
Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

To give an update here, we actually are not missing a mapping. The code complains because we are copying around a fp16 into a gpr32 and that shouldn’t be done with a copy (default mapping).
I extended the repairing code to issue G_ANYEXT in those cases instead of asserting.

However, now, I have to teach instruction select about those ANYEXT otherwise we’ll fallback in that case. But that’s a different story.

I’ll try to commit today or tomorrow (I have to strengthen the tests).

On Nov 14, 2017, at 9:29 AM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Thanks Oliver.
I’ll have a look. This typically means that we miss a mapping for this type/instruction, which is not surprising given how little code we have we fp16.

On Nov 14, 2017, at 2:27 AM, Oliver Stannard <oliver.stannard at arm.com<mailto:oliver.stannard at arm.com>> wrote:

Hi Quentin,

I’ve started running an ABI test suite with global isel on AArch64, and while it hasn’t found any ABI issues it has hit an assertion in clang when using the __fp16 type. Here’s a reproducer:

  __fp16 pass_f16(__fp16 p) {
    return p;
  }

  $ /work/llvm/build/bin/clang --target=aarch64-arm-none-eabi -march=armv8-a -c test.c -O0 -mllvm -global-isel -mllvm -global-isel-abort=0
  clang-6.0: /work/llvm/llvm/lib/CodeGen/GlobalISel/RegisterBankInfo.cpp:446: static void llvm::RegisterBankInfo::applyDefaultMapping(const llvm::RegisterBankInfo::OperandsMapper &): Assertion `OrigTy.getSizeInBits() == NewTy.getSizeInBits() && "Types with difference size cannot be handled by the default " "mapping"' failed.
  #0 0x000000000362a764 PrintStackTraceSignalHandler(void*) (/work/llvm/build/bin/clang-6.0+0x362a764)
  #1 0x000000000362aac6 SignalHandler(int) (/work/llvm/build/bin/clang-6.0+0x362aac6)
  #2 0x00007f9193b78330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
  #3 0x00007f919276bc37 gsignal /build/eglibc-oGUzwX/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
  #4 0x00007f919276f028 abort /build/eglibc-oGUzwX/eglibc-2.19/stdlib/abort.c:91:0
  #5 0x00007f9192764bf6 __assert_fail_base /build/eglibc-oGUzwX/eglibc-2.19/assert/assert.c:92:0
  #6 0x00007f9192764ca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
  #7 0x0000000003d70eb9 (/work/llvm/build/bin/clang-6.0+0x3d70eb9)
  #8 0x0000000003d6b00c llvm::RegBankSelect::applyMapping(llvm::MachineInstr&, llvm::RegisterBankInfo::InstructionMapping const&, llvm::SmallVectorImpl<llvm::RegBankSelect::RepairingPlacement>&) (/work/llvm/build/bin/clang-6.0+0x3d6b00c)
  #9 0x0000000003d6b366 llvm::RegBankSelect::assignInstr(llvm::MachineInstr&) (/work/llvm/build/bin/clang-6.0+0x3d6b366)
  #10 0x0000000003d6b7f1 llvm::RegBankSelect::runOnMachineFunction(llvm::MachineFunction&) (/work/llvm/build/bin/clang-6.0+0x3d6b7f1)
  #11 0x0000000002d934c8 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/work/llvm/build/bin/clang-6.0+0x2d934c8)
  #12 0x00000000030c998f llvm::FPPassManager::runOnFunction(llvm::Function&) (/work/llvm/build/bin/clang-6.0+0x30c998f)
  #13 0x00000000030c9c53 llvm::FPPassManager::runOnModule(llvm::Module&) (/work/llvm/build/bin/clang-6.0+0x30c9c53)
  #14 0x00000000030ca136 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/llvm/build/bin/clang-6.0+0x30ca136)
  #15 0x00000000037c3dcf clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/work/llvm/build/bin/clang-6.0+0x37c3dcf)
  #16 0x0000000003d421a0 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/work/llvm/build/bin/clang-6.0+0x3d421a0)
  #17 0x0000000004457376 clang::ParseAST(clang::Sema&, bool, bool) (/work/llvm/build/bin/clang-6.0+0x4457376)
  #18 0x0000000003ca6ea0 clang::FrontendAction::Execute() (/work/llvm/build/bin/clang-6.0+0x3ca6ea0)
  #19 0x0000000003c1fa31 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/work/llvm/build/bin/clang-6.0+0x3c1fa31)
  #20 0x0000000003d3bf4b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/work/llvm/build/bin/clang-6.0+0x3d3bf4b)
  #21 0x0000000001f85629 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/work/llvm/build/bin/clang-6.0+0x1f85629)
  #22 0x0000000001f83096 main (/work/llvm/build/bin/clang-6.0+0x1f83096)
  #23 0x00007f9192756f45 __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:321:0
  #24 0x0000000001f80029 _start (/work/llvm/build/bin/clang-6.0+0x1f80029)
  Stack dump:
  0.      Program arguments: /work/llvm/build/bin/clang-6.0 -cc1 -triple aarch64-arm-none-eabi -emit-obj -mrelax-all -disable-free -main-file-name test.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -target-feature +neon -target-abi aapcs -fallow-half-arguments-and-returns -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /work/innovation/cctest/test.gcno -resource-dir /work/llvm/build/lib/clang/6.0.0 -O0 -fdebug-compilation-dir /work/innovation/cctest -ferror-limit 19 -fmessage-length 226 -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -mllvm -global-isel -mllvm -global-isel-abort=0 -o test.o -x c test.c
  1.      <eof> parser at end of file
  2.      Code generation
  3.      Running pass 'Function Pass Manager' on module 'test.c'.
  4.      Running pass 'RegBankSelect' on function '@pass_f16'
  clang-6.0: error: unable to execute command: Aborted (core dumped)
  clang-6.0: error: clang frontend command failed due to signal (use -v to see invocation)
  clang version 6.0.0 (ssh://olista01@ds-gerrit.euhpc.arm.com:29418/armcompiler/clang aa2b9952ef98a5fe2d47384ef17106855b8bae51) (ssh://olista01@ds-gerrit.euhpc.arm.com:29418/armcompiler/llvm 29f89772107a79b5f2a816d4748ed9c19416c1b6)
  Target: aarch64-arm-none-eabi
  Thread model: posix
  InstalledDir: /work/llvm/build/bin
  clang-6.0: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
  clang-6.0: note: diagnostic msg:
  ********************

  PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
  Preprocessed source(s) and associated run script(s) are located at:
  clang-6.0: note: diagnostic msg: /tmp/test-e06964.c
  clang-6.0: note: diagnostic msg: /tmp/test-e06964.sh
  clang-6.0: note: diagnostic msg:

  ********************

Oliver

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Quentin Colombet via llvm-dev
Sent: 13 November 2017 18:27
To: Kristof Beyls
Cc: llvm-dev; nd; Ahmed Bougacha; Justin Bogner; Aditya Nandakumar
Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Hi Kristof,


On Nov 13, 2017, at 9:10 AM, Kristof Beyls <Kristof.Beyls at arm.com<mailto:Kristof.Beyls at arm.com>> wrote:

Hi Quentin,

My only remaining concern is around ABI compatibility.
The following commit seems to indicate that in the previous round of evaluation, we didn’t find an existing ABI compatibility issue:
http://llvm.org/viewvc/llvm-project?view=revision&revision=311388.
I haven’t looked into the details of this issue - so maybe I’m worried over nothing?

No, you’re right. The problem with ABI is if you are consistently wrong, then you won’t notice :).


I’m wondering if since then on your side you did any testing around ABI compatibility?
E.g. building software where you semi-randomly build some functions through GlobalISel and some functions through DAGISel?

Justin will look into that. Clang has utility script for that utils/ABITest.

Given we will only be able to check iOS ABI, you may want to follow the same kind of validation on your side.

I let you sync up with Justin for the method.

Cheers,
-Quentin


Thanks,

Kristof

On 8 Nov 2017, at 00:42, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Hi all,

I’d like to resurrect this thread and ask if people are on board for enabling this by default for AArch64 O0.


*** What Changed Since June? ***

- We added a way to describe the legalization actions for non-power-of-2
- We gave a tutorial that covers the best practices to target GlobalISel
- We improved the TableGen backend to reuse existing SDISel patterns
- We built and ran huge internal software with GISel
- We evaluated the performance of GISel and are confident things are in a good shape (with https://reviews.llvm.org/D39034) and moving forward would look even better (see the last LLVM Dev talk: GlobalISel: Present, Past, and Future when it is available)


*** So What’s he Plan? ***

- Switch the default instruction selector to GISel for AArch64 at O0
- Enable the fallback path by default for AArch64 (with warnings enabled when that path is hit)
- Provide a clang option to turn GISel off

What do you think?

Thanks,
-Quentin

On Jun 16, 2017, at 4:43 PM, Quentin Colombet <qcolombet at apple.com<mailto:qcolombet at apple.com>> wrote:

Hi all,

We had some internal discussions about flipping the default for O0 and we concluded that we wanted to postpone it.


*** Why Is That? ***

We don’t want to send the wrong message that GlobalISel’s design is set in stone and ready for broader adoption.
In particular,
1. The APIs are still evolving and can still possibly change significantly
2. The TableGen backend to reuse the existing SD patterns is still at its early stage
3. We want to investigate closely the performance of global-isel (compile-time, runtime, code size, fallbacks)

The rationale behind those items is that we want to minimize the pain of moving forward for everybody. We also want the out-of-the-box experience to be pleasant (like all/most of the tablegen patterns just work, we have documentation on how to target a new backend, etc.) Finally, we want to gain confidence we are going to be able to address the performance issues we have with the current design and if not, derive a plan for that.

We purposely left out of the conversation what will be the right time and requirements to flip the switch. We want to gather more data first. Your help would be appreciated!


*** Short-Term Proposal ***

What we would like to do instead short-term is:
A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to enable GISel with fallbacks and warnings enables (i.e., equivalent of -global-isel -global-isel-abort=2)
B. Advertise this option in the next open source release to allow compiler enthusiastic to try it and report problems
C. Have GISel always built so we can push thing in the right place, MachineVerifier in mind, and stop doing some weird gymnastic

What do people think?


*** Your Help Is Needed ***

- Please share your experience in using the GISel APIs and how we can make them better. Moving forward we’ll have those conversations on open source instead of internally/with a narrower audience.
- Report any performance problem you identify
- Propose patches!

Cheers,
-Quentin



On Jun 16, 2017, at 3:06 PM, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:


On Jun 14, 2017, at 7:27 AM, Diana Picus <diana.picus at linaro.org<mailto:diana.picus at linaro.org>> wrote:

On 12 June 2017 at 18:54, Diana Picus <diana.picus at linaro.org<mailto:diana.picus at linaro.org>> wrote:
Hi all,

I added a buildbot [1] running the test-suite with -O0 -global-isel. It runs into the same 2 timeouts that I reported previously on this thread (paq8p and scimark2). It would be nice to make it green before flipping the switch.


I did some more investigations on a machine similar to the one running the buildbot. For paq8p and scimark2, I get these results for O0:

PAQ8p:
Fast isel: 666.344
Global isel: 731.384

SciMark2-C:
Fast isel: 463.908
Global isel: 496.22

The current timeout is 500s (so in this particular case we didn't hit it for scimark2, and it ran successfully to completion). I don't think the difference between FastISel and GlobalISel is too atrocious, so I would propose increasing the timeout for these 2 benchmarks. I'm not sure if we can do this on a per-bot basis, but I see some precedent for setting custom timeout thresholds for various benchmarks on different architectures (sometimes with comments that it's done so we can run O0 on that particular benchmark).

Something along these lines works:
https://reviews.llvm.org/differential/diff/102547/

What do you guys think about this approach?

Looks reasonable to me.


Thanks,
Diana

PS: The buildbot is using the Makefiles because that's what our other AArch64 test-suite bots use. Moving all of them to CMake is a transition for another time.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171117/783ed617/attachment-0001.html>


More information about the llvm-dev mailing list