[llvm-dev] GlobalISel design update and goals

Thu Aug 2 19:46:42 PDT 2018

Hi Amara,

Thanks for your great work!

MIPS, RISCV and other targets have refactoring requirements:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120098.html

Please give us some suggestions for supporting custom CCState and
CCAssignFn in D41700, and also RegisterBank in D41653, because it needs
to consider how to support the variable-sized register classes concept
implemented in D24631.

I am building the Linux Kernel and OpenJDK8 with the LLVM toolchain for mips64el:

http://lists.llvm.org/pipermail/llvm-dev/2018-July/124620.html

http://lists.llvm.org/pipermail/llvm-dev/2018-July/124717.html

I am also migrating to GlobalISel and the Machine Scheduler for LoongISA:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123608.html

My sincere thanks go to the LLVM, Linux Kernel and OpenJDK developers,
who have taught me a lot!

On 2018-07-30 22:01, Amara Emerson via llvm-dev wrote:
> Hi all,
>
> Over the past few months we’ve been doing work on the foundations for 
> the next stages of GlobalISel development. In terms of changes from 
> this time last year, the IR translator, the legalizer, and instruction 
> selector have seen moderate to major changes. The most significant of 
> these was the change to the legalizer API, allowing targets to use 
> predicates to express legality, which gives more precise control over 
> what forms of instructions are legal, and how to legalize them. This 
> was necessary to implement support for the new extending loads and 
> truncating stores, but also results in more concise and elegant 
> expressions of legality for each target. For example, you can now 
> apply a single definition to multiple opcodes (G_ADD, G_SUB, G_MUL,
> etc.).
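
To make the predicate idea concrete for myself, here is a minimal
self-contained sketch. It is a toy model and not the LLVM LegalizerInfo
API: ToyLegalizerInfo, Query, addRule, getAction and makeToyRules are
hypothetical names, and the 32/64-bit target is imaginary. It only shows
one predicate-based rule being shared by several opcodes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the real LLVM types.
enum class Opcode { G_ADD, G_SUB, G_MUL, G_LOAD };
enum class Action { Legal, WidenScalar, NarrowScalar, Unsupported };

struct Query {
  Opcode Op;
  unsigned ScalarBits; // width of the result scalar
};

using Predicate = std::function<bool(const Query &)>;

class ToyLegalizerInfo {
  // Rules are tried in insertion order; the first matching predicate wins.
  std::map<Opcode, std::vector<std::pair<Predicate, Action>>> Rules;

public:
  // Attach one (predicate, action) rule to several opcodes at once.
  void addRule(const std::vector<Opcode> &Ops, Predicate P, Action A) {
    assert(P && "a rule needs a callable predicate");
    for (Opcode Op : Ops)
      Rules[Op].emplace_back(P, A);
  }

  Action getAction(const Query &Q) const {
    auto It = Rules.find(Q.Op);
    if (It == Rules.end())
      return Action::Unsupported; // no rules registered for this opcode
    for (const auto &Rule : It->second)
      if (Rule.first(Q))
        return Rule.second;
    return Action::Unsupported;
  }
};

// Rule set for an imaginary target with native 32- and 64-bit arithmetic.
inline ToyLegalizerInfo makeToyRules() {
  ToyLegalizerInfo LI;
  const std::vector<Opcode> Arith = {Opcode::G_ADD, Opcode::G_SUB,
                                     Opcode::G_MUL};
  LI.addRule(Arith,
             [](const Query &Q) {
               return Q.ScalarBits == 32 || Q.ScalarBits == 64;
             },
             Action::Legal);
  LI.addRule(Arith, [](const Query &Q) { return Q.ScalarBits < 32; },
             Action::WidenScalar);
  LI.addRule(Arith, [](const Query &Q) { return Q.ScalarBits > 64; },
             Action::NarrowScalar);
  return LI;
}
```

With these rules, a 64-bit G_MUL is reported as legal, an 8-bit G_SUB
is widened and a 128-bit G_ADD is narrowed, all from a single definition.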
>
> The IR translator has been modified to split aggregates rather than 
> handling them as one single large scalar. This change fixed some bugs 
> and was necessary in order to handle big endian code correctly in future.
>
> The tablegen instruction selector also saw significant improvements in 
> performance, helping to keep the overall compile time regression vs
> FastISel below 5% geomean on CTMark. There are still a few outliers
> like sqlite3 which has a significant regression compared to FastISel, 
> but most of the other benchmarks show little difference or even 
> improvement.
>
> The tablegen importer has had improvements made to it, so that we can 
> import more SelectionDAG selection rules. For example, currently on 
> AArch64 we have about 40% of the rules being successfully imported.
>
> New additions from last year include the beginnings of a new combiner, 
> although there’s still significant work to be done here in terms of 
> the final design. The combiner will become a critical part of the 
> pipeline in order to begin improving runtime performance.
>
> *High level goals*
>
> Going forward, we plan to improve GlobalISel in a number of key areas 
> to achieve the following targets:
>  * Keeping compile time under control, ideally within 5% of FastISel,
> and, when optimizations are enabled, maintaining a compile time
> advantage over SelectionDAG.
>  * Begin improving runtime performance by adding the most important 
> optimizations required to be competitive at -Os. We will be targeting 
> and measuring AArch64 for this goal but will endeavor to implement as 
> many optimizations as possible in generic code to benefit other targets.
>  * Improving overall stability and test coverage. Maintaining a high 
> level of code quality and minimizing regressions in correctness and 
> performance will be a significant challenge.
>  * Ensure that the overall design meets the needs of general targets, 
> not being overly tuned to a specific implementation.
>
> *Design work planned*
>
> These are some design changes coming in the near to medium term future:
>
>  * The G_MERGE and G_UNMERGE opcodes will be split into separate 
> opcodes to handle different use cases. At the moment the opcode is too 
> powerful, resulting in overly complex handling in places like the 
> legalizer. G_MERGE will be split so that it only handles merging of 
> scalars into one larger scalar. For other cases like merging scalars 
> into a vector we will create a new G_BUILD_VECTOR opcode, with a new 
> counterpart opcode for doing the opposite. For the current vector + 
> vector case a new G_CONCAT_VECTOR will be introduced. These changes
> should simplify implementations for all targets.
>
>  * Constant representation at the MI level needs some investigation. 
> We currently represent constants as generic instructions, with each 
> instance of a constant being largely independent of each other, being 
> stored in the entry block except for a few places in IR translation 
> where we emit at the point of use. As a result we run a localizer pass 
> in an effort to reduce the live ranges of the constants (and the 
> consequent spilling), using some heuristics to decide where to sink 
> the constant definitions to.
>
> Since we don’t do any real caching of MI constants, multiple 
> G_CONSTANT definitions can exist for the same constant. This can also 
> result in a lot of redundant constants being created, especially for 
> things like address computation. Reducing the number of constants can 
> help reduce compile time and memory usage. Given this situation, one 
> possible approach is to encode constants into the operands of the 
> users, rather than have dedicated machine instructions. At instruction 
> selection time the constant can then be materialized into a register 
> or encoded as an immediate. Further investigation is needed to find 
> the right way forward here.
>
>  * For optimizations to be supported, the combiner will become a 
> crucial part of the GISel pipeline. We have already done some 
> preliminary work in a generic combiner, which will be used to 
> eventually support combines of extloads/truncstores. We’ve had 
> discussions on and off list about what we need from the new combiner. 
> The summary is that we want the combiner to be flexible for each 
> target to select from a library of combines, being as efficient as 
> possible. The combines are currently written in C++,
> but one piece of investigation work we might do is to prototype using 
> the same tablegen driven instruction selector code to match 
> declarative combine patterns written in tablegen. Regardless, we will 
> need to support the custom C++ use case.
>
>  * CSE throughout the pipeline. From a theoretical perspective, having 
> a self contained CSE pass that operates as a single phase in the 
> pipeline is attractive for the simplicity and elegance. However, we 
> know empirically that this is expensive in compile time. Not only does 
> the CSE pass itself take a non-negligible time to run, but having it 
> as a late pass can result in the non-CSE’d code from the IRTranslator 
> onwards surviving for a long time, taking up time in analysis at each 
> stage of compilation. We believe running a light weight CSE early is a 
> win. SelectionDAG currently does CSE by default when building the DAG, 
> and this is something we could explore as part of a custom IRBuilder.
>
>  * Known bits computation. Some optimizations require the knowledge of 
> which bits in a value are known to be 1 or 0, and do this by using the 
> computeKnownBits() capability for SelectionDAG nodes. We will need 
> some way of getting the same information. In an ideal scenario the 
> replacement infrastructure for this will be more efficient, as this 
> part of the codebase seems to be disproportionately responsible for 
> pathological compile time regressions.
>
>  * Load/store ordering needs some thought, as we currently don’t have 
> a way to easily check at the MI level what the ordering requirements 
> are on a set of memory operations. SelectionDAG uses the chains to 
> ensure that they’re scheduled to respect the orderings. How to achieve 
> the same thing remains an open question for GlobalISel.
>
>  * More extensive tests that exercise multiple stages of the pipeline. 
> One advantage of using MIR with GISel is that individual passes can be 
> easily tested by feeding the exact input expected for a particular 
> pass, and checking the immediate output of the pass. However this 
> approach can leave holes in the test coverage. To help mitigate this, 
> we will be exploring writing/generating whole pipeline tests, tracking 
> some IR through each pass and checking how the MIR is mutated. We 
> currently also have a proposed change to allow usage of FileCheck as a 
> library, not just as a stand-alone tool. This would allow us to use 
> FileCheck style checks and improve testing of currently unused code paths.
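
On the lightweight CSE item and the duplicate G_CONSTANT problem above:
the idea of doing CSE while building can be sketched as a builder that
looks up each instruction in a table before creating it. This is only a
toy under strong assumptions (one straight-line block, side-effect-free
operations, no canonicalization of commutative operands), and
ToyCSEBuilder and its methods are hypothetical names, not an existing
LLVM class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical toy IR: a value id is an index into the instruction list.
enum class Opcode { G_CONSTANT, G_ADD, G_MUL };

struct Inst {
  Opcode Op;
  int64_t Imm;  // only meaningful for G_CONSTANT
  int Lhs, Rhs; // operand value ids, -1 when unused
};

class ToyCSEBuilder {
  using Key = std::tuple<Opcode, int64_t, int, int>;
  std::vector<Inst> Insts;
  std::map<Key, int> Seen; // structural key -> existing value id

  int getOrCreate(Opcode Op, int64_t Imm, int Lhs, int Rhs) {
    Key K{Op, Imm, Lhs, Rhs};
    auto It = Seen.find(K);
    if (It != Seen.end())
      return It->second; // an identical instruction exists, reuse it
    Insts.push_back({Op, Imm, Lhs, Rhs});
    int Id = static_cast<int>(Insts.size()) - 1;
    Seen.emplace(K, Id);
    return Id;
  }

  bool isValid(int Id) const {
    return Id >= 0 && Id < static_cast<int>(Insts.size());
  }

public:
  int buildConstant(int64_t V) {
    return getOrCreate(Opcode::G_CONSTANT, V, -1, -1);
  }
  int buildAdd(int A, int B) {
    assert(isValid(A) && isValid(B) && "operands must already exist");
    return getOrCreate(Opcode::G_ADD, 0, A, B);
  }
  int buildMul(int A, int B) {
    assert(isValid(A) && isValid(B) && "operands must already exist");
    return getOrCreate(Opcode::G_MUL, 0, A, B);
  }
  std::size_t numInsts() const { return Insts.size(); }
};
```

Requesting the constant 8 twice returns the same value id, so later
stages never see the duplicate. A real implementation would also have to
respect basic block boundaries and instructions with side effects, which
this sketch ignores.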
>
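
For the known bits item above, the underlying bookkeeping is small: each
value carries one mask of bits known to be 0 and one mask of bits known
to be 1, and each operation has a transfer rule. A minimal sketch on
64-bit values follows; the struct and helper names are my own for
illustration and do not describe how a GlobalISel version would be
wired in.

```cpp
#include <cassert>
#include <cstdint>

// Toy known-bits record for a 64-bit value. A bit set in neither mask is
// unknown; a bit must never be set in both masks.
struct KnownBits {
  uint64_t Zero; // bits known to be 0
  uint64_t One;  // bits known to be 1
};

inline KnownBits knownUnknown() { return {0, 0}; }
inline KnownBits knownConstant(uint64_t V) { return {~V, V}; }

// AND: a result bit is 0 if either input bit is 0, 1 only if both are 1.
inline KnownBits knownAnd(KnownBits A, KnownBits B) {
  return {A.Zero | B.Zero, A.One & B.One};
}

// OR: a result bit is 1 if either input bit is 1, 0 only if both are 0.
inline KnownBits knownOr(KnownBits A, KnownBits B) {
  return {A.Zero & B.Zero, A.One | B.One};
}

// Left shift by a constant amount: known bits move up and the vacated
// low bits become known zeros.
inline KnownBits knownShl(KnownBits A, unsigned N) {
  assert(N < 64 && "shift amount must be smaller than the bit width");
  uint64_t LowZeros = (uint64_t(1) << N) - 1;
  return {(A.Zero << N) | LowZeros, A.One << N};
}
```

For example, for an unknown x, (x & 0xFF) << 4 is known to be zero
everywhere outside bits 4 to 11, which is the kind of fact a combine can
use to drop a redundant mask.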
>
> *Roadmap for enabling optimizations*
>
> I’ve filed a few PRs that people can follow or comment on to track the 
> progress towards enabling the -Os optimization level. The rough 
> outline is:
>
> PR 38365 - [AArch64][GlobalISel] Never fall back on CTMark or 
> benchmarks (Darwin)
> PR 38366 - GlobalISel: Lightweight CSE
> PR 32561 - GlobalISel: placement of constants in the entry-block and 
> fast regalloc result in lots of reloaded constant
> PR 38367 - GlobalISel: Implement support for obtaining known bits 
> information
> PR 38368 - GlobalISel: Investigate an efficient way to ensure 
> load/store orderings
>
> These, along with general design and implementation work on the 
> combiner, will then lead onto a long road of performance analysis, 
> inevitable bug fixing, and implementing more optimizations.
>
> If anyone is interested in discussing in more detail, feel free to 
> reach out on the list, or to any of the GlobalISel developers. We’d 
> especially like to hear about any issues or concerns about porting 
> targets to GlobalISel.
>
> Thanks,
> Amara

-- 
Regards,
Leslie Zhai