[llvm-dev] GlobalISel design update and goals
Leslie Zhai via llvm-dev
llvm-dev at lists.llvm.org
Thu Aug 2 19:46:42 PDT 2018
Hi Amara,
Thanks for your great work!
MIPS, RISC-V and other targets have refactoring requirements:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120098.html
Please give us some suggestions for supporting custom CCState and
CCAssignFn in D41700, and also RegisterBank in D41653, because it needs
to consider how to support the variable-sized register class concept
implemented in D24631.
I am building the Linux Kernel and OpenJDK8 with the LLVM toolchain for mips64el:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124620.html
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124717.html
I am also migrating to GlobalISel and the Machine Scheduler for LoongISA:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123608.html
My sincere thanks go to the LLVM, Linux Kernel and OpenJDK developers,
who have taught me a lot!
On 2018-07-30 22:01, Amara Emerson via llvm-dev wrote:
> Hi all,
>
> Over the past few months we’ve been doing work on the foundations for
> the next stages of GlobalISel development. In terms of changes from
> this time last year, the IR translator, the legalizer, and instruction
> selector have seen moderate to major changes. The most significant of
> these was the change to the legalizer API, allowing targets to use
> predicates to express legality, which gives more precise control over
> what forms of instructions are legal, and how to legalize them. This
> was necessary to implement support for the new extending loads and
> truncating stores, but also results in more concise and elegant
> expressions of legality for each target. For example, you can now
> apply a single definition to multiple opcodes (G_ADD, G_SUB,
> G_MUL etc).
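To make the predicate idea concrete, here is a minimal sketch in Python. It is a toy model only: `Query`, `RuleSet`, `LegalityTable` and the action names are hypothetical and are not the LLVM C++ legalizer API. It only illustrates how one rule set with predicates can be shared by several opcodes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Toy model only: every name here is hypothetical, not the LLVM C++ API.


@dataclass(frozen=True)
class Query:
    opcode: str  # e.g. "G_ADD"
    bits: int    # scalar width of the result type


Predicate = Callable[[Query], bool]


@dataclass
class RuleSet:
    rules: List[Tuple[Predicate, str]] = field(default_factory=list)

    def legal_if(self, pred):
        self.rules.append((pred, "Legal"))
        return self

    def widen_if(self, pred):
        self.rules.append((pred, "WidenScalar"))
        return self

    def action_for(self, query):
        # The first matching predicate wins; unmatched queries are unsupported.
        for pred, action in self.rules:
            if pred(query):
                return action
        return "Unsupported"


class LegalityTable:
    def __init__(self):
        self._by_opcode = {}

    def define(self, opcodes):
        # One shared RuleSet is registered under every listed opcode.
        rules = RuleSet()
        for op in opcodes:
            self._by_opcode[op] = rules
        return rules

    def action_for(self, query):
        rules = self._by_opcode.get(query.opcode)
        return rules.action_for(query) if rules else "Unsupported"


table = LegalityTable()
(table.define(["G_ADD", "G_SUB", "G_MUL"])
      .legal_if(lambda q: q.bits in (32, 64))
      .widen_if(lambda q: q.bits < 32))

print(table.action_for(Query("G_ADD", 32)))   # Legal
print(table.action_for(Query("G_MUL", 8)))    # WidenScalar
print(table.action_for(Query("G_SUB", 128)))  # Unsupported
```

The single `define` call covers all three opcodes, which is the conciseness benefit described above; a real target would of course express this through the legalizer's own interfaces.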
>
> The IR translator has been modified to split aggregates rather than
> handling them as one single large scalar. This change fixed some bugs
> and was necessary in order to handle big-endian code correctly in the future.
>
> The tablegen instruction selector also saw significant improvements in
> performance, helping to keep the overall compile-time regression vs
> FastISel below 5% geomean on CTMark. There are still a few outliers,
> such as sqlite3, which has a significant regression compared to
> FastISel, but most of the other benchmarks show little difference or
> even an improvement.
>
> The tablegen importer has had improvements made to it, so that we can
> import more SelectionDAG selection rules. For example, currently on
> AArch64 we have about 40% of the rules being successfully imported.
>
> New additions from last year include the beginnings of a new combiner,
> although there’s still significant work to be done here in terms of
> the final design. The combiner will become a critical part of the
> pipeline in order to begin improving runtime performance.
>
> *High-level goals*
>
> Going forward, we plan to improve GlobalISel in a number of key areas
> to achieve the following targets:
> * Keeping compile time under control, ideally within 5% of FastISel,
> and, when optimizations are enabled, maintaining a compile-time
> advantage over SelectionDAG.
> * Begin improving runtime performance by adding the most important
> optimizations required to be competitive at -Os. We will be targeting
> and measuring AArch64 for this goal but will endeavor to implement as
> many optimizations as possible in generic code to benefit other targets.
> * Improving overall stability and test coverage. Maintaining a high
> level of code quality and minimizing regressions in correctness and
> performance will be a significant challenge.
> * Ensure that the overall design meets the needs of general targets,
> not being overly tuned to a specific implementation.
>
> *Design work planned*
>
> These are some design changes coming in the near to medium term future:
>
> * The G_MERGE and G_UNMERGE opcodes will be split into separate
> opcodes to handle different use cases. At the moment the opcode is too
> powerful, resulting in overly complex handling in places like the
> legalizer. G_MERGE will be split so that it only handles merging of
> scalars into one larger scalar. For other cases like merging scalars
> into a vector we will create a new G_BUILD_VECTOR opcode, with a new
> counterpart opcode for doing the opposite. For the current vector +
> vector case a new G_CONCAT_VECTOR will be introduced. These
> changes should simplify implementations for all targets.
>
> * Constant representation at the MI level needs some investigation.
> We currently represent constants as generic instructions, with each
> instance of a constant being largely independent of the others and
> stored in the entry block, except for a few places in IR translation
> where we emit at the point of use. As a result we run a localizer pass
> in an effort to reduce the live ranges of the constants (and the
> consequent spilling), using some heuristics to decide where to sink
> the constant definitions to.
>
> Since we don’t do any real caching of MI constants, multiple
> G_CONSTANT definitions can exist for the same constant. This can also
> result in a lot of redundant constants being created, especially for
> things like address computation. Reducing the number of constants can
> help reduce compile time and memory usage. Given this situation, one
> possible approach is to encode constants into the operands of the
> users, rather than have dedicated machine instructions. At instruction
> selection time the constant can then be materialized into a register
> or encoded as an immediate. Further investigation is needed to find
> the right way forward here.
>
> * For optimizations to be supported, the combiner will become a
> crucial part of the GISel pipeline. We have already done some
> preliminary work in a generic combiner, which will be used to
> eventually support combines of extloads/truncstores. We’ve had
> discussions on and off list about what we need from the new combiner.
> The summary is that we want the combiner to be flexible enough for
> each target to select from a library of combines, while being as
> efficient as possible. The combines are currently expressed in C++,
> but one piece of investigation work we might do is to prototype using
> the same tablegen driven instruction selector code to match
> declarative combine patterns written in tablegen. Regardless, we will
> need to support the custom C++ use case.
>
> * CSE throughout the pipeline. From a theoretical perspective, having
> a self contained CSE pass that operates as a single phase in the
> pipeline is attractive for the simplicity and elegance. However, we
> know empirically that this is expensive in compile time. Not only does
> the CSE pass itself take a non-negligible time to run, but having it
> as a late pass can result in the non-CSE’d code from the IRTranslator
> onwards surviving for a long time, taking up time in analysis at each
> stage of compilation. We believe running a lightweight CSE early is a
> win. SelectionDAG currently does CSE by default when building the DAG,
> and this is something we could explore as part of a custom IRBuilder.
>
> * Known bits computation. Some optimizations require the knowledge of
> which bits in a value are known to be 1 or 0, and do this by using the
> computeKnownBits() capability for SelectionDAG nodes. We will need
> some way of getting the same information. In an ideal scenario the
> replacement infrastructure for this will be more efficient, as this
> part of the codebase seems to be disproportionately responsible for
> pathological compile time regressions.
>
> * Load/store ordering needs some thought, as we currently don’t have
> a way to easily check at the MI level what the ordering requirements
> are on a set of memory operations. SelectionDAG uses the chains to
> ensure that they’re scheduled to respect the orderings. How to achieve
> the same thing remains an open question for GlobalISel.
>
> * More extensive tests that exercise multiple stages of the pipeline.
> One advantage of using MIR with GISel is that individual passes can be
> easily tested by feeding the exact input expected for a particular
> pass, and checking the immediate output of the pass. However this
> approach can leave holes in the test coverage. To help mitigate this,
> we will be exploring writing/generating whole pipeline tests, tracking
> some IR through each pass and checking how the MIR is mutated. We
> currently also have a proposed change to allow usage of FileCheck as a
> library, not just as a stand-alone tool. This would allow us to use
> FileCheck-style checks and improve testing of currently unused code paths.
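The constant-duplication and early-CSE items above can be sketched as a builder that reuses an identical instruction instead of emitting it again. This is a toy Python model, not GlobalISel code: `CSEBuilder`, the `PURE_OPCODES` set and the register naming are assumptions made for illustration.

```python
# Toy model only: CSEBuilder and PURE_OPCODES are hypothetical names.

# Opcodes assumed to be side-effect free and therefore safe to reuse.
PURE_OPCODES = {"G_CONSTANT", "G_ADD", "G_MUL"}


class CSEBuilder:
    """Instruction builder that reuses identical pure instructions."""

    def __init__(self):
        self.instructions = []  # emitted (dest, opcode, operands) tuples
        self._seen = {}         # (opcode, operands) -> dest register
        self._next_reg = 0

    def build(self, opcode, *operands):
        key = (opcode, operands)
        # Only pure opcodes are looked up; anything else is always emitted.
        if opcode in PURE_OPCODES and key in self._seen:
            return self._seen[key]
        dest = f"%{self._next_reg}"
        self._next_reg += 1
        self.instructions.append((dest, opcode, operands))
        if opcode in PURE_OPCODES:
            self._seen[key] = dest
        return dest


b = CSEBuilder()
c1 = b.build("G_CONSTANT", 8)
c2 = b.build("G_CONSTANT", 8)    # same constant: the first definition is reused
base = b.build("COPY", "$x0")    # not in PURE_OPCODES: always emitted
a1 = b.build("G_ADD", base, c1)
a2 = b.build("G_ADD", base, c2)  # same opcode and operands: reused

for inst in b.instructions:
    print(inst)
```

Only three instructions are emitted for the five `build` calls. The sketch ignores basic blocks and dominance, which a real implementation would have to respect before reusing a definition.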
>
>
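For the known-bits item in the design list above, a minimal sketch of the underlying idea: track, per value, a mask of bits known to be 0 and a mask of bits known to be 1, and propagate them through each operation. This is a toy Python model under stated assumptions; the function names are hypothetical, shift amounts are assumed to be smaller than the width, and it is not the SelectionDAG computeKnownBits() implementation.

```python
from dataclasses import dataclass

# Toy model only: all names below are hypothetical.


@dataclass(frozen=True)
class KnownBits:
    width: int
    zero: int  # mask of bits known to be 0
    one: int   # mask of bits known to be 1


def _mask(width):
    return (1 << width) - 1


def unknown(width):
    return KnownBits(width, 0, 0)


def known_constant(value, width):
    m = _mask(width)
    return KnownBits(width, ~value & m, value & m)


def known_and(a, b):
    # A result bit is 0 if either input bit is 0, and 1 only if both are 1.
    return KnownBits(a.width, a.zero | b.zero, a.one & b.one)


def known_or(a, b):
    # A result bit is 1 if either input bit is 1, and 0 only if both are 0.
    return KnownBits(a.width, a.zero & b.zero, a.one | b.one)


def known_shl(a, amount):
    # Assumes amount < a.width. The shifted-in low bits are known to be 0.
    m = _mask(a.width)
    zero = ((a.zero << amount) | _mask(amount)) & m
    return KnownBits(a.width, zero, (a.one << amount) & m)


def and_is_redundant(value, mask_const):
    # An AND with mask_const is a no-op if every bit it would clear
    # is already known to be 0.
    cleared = ~mask_const & _mask(value.width)
    return (cleared & value.zero) == cleared


x = unknown(32)
masked = known_and(x, known_constant(0xFF, 32))
shifted = known_shl(masked, 4)

print(hex(shifted.zero))                  # 0xfffff00f
print(and_is_redundant(shifted, 0xFF0))   # True
print(and_is_redundant(shifted, 0x0F0))   # False
```

The `and_is_redundant` check shows the kind of question an optimization would ask of such an analysis: here a later AND with 0xFF0 could be dropped because the bits it clears are already known to be zero.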
> *Roadmap for enabling optimizations*
>
> I’ve filed a few PRs that people can follow or comment on to track the
> progress towards enabling the -Os optimization level. The rough
> outline is:
>
> PR 38365 - [AArch64][GlobalISel] Never fall back on CTMark or
> benchmarks (Darwin)
> PR 38366 - GlobalISel: Lightweight CSE
> PR 32561 - GlobalISel: placement of constants in the entry-block and
> fast regalloc result in lots of reloaded constant
> PR 38367 - GlobalISel: Implement support for obtaining known bits
> information
> PR 38368 - GlobalISel: Investigate an efficient way to ensure
> load/store orderings
>
> These, along with general design and implementation work on the
> combiner, will then lead onto a long road of performance analysis,
> inevitable bug fixing, and implementing more optimizations.
>
> If anyone is interested in discussing in more detail, feel free to
> reach out on the list, or to any of the GlobalISel developers. We’d
> especially like to hear about any issues or concerns about porting
> targets to GlobalISel.
>
> Thanks,
> Amara
--
Regards,
Leslie Zhai