[llvm-dev] GlobalISel design update and goals
Leslie Zhai via llvm-dev
llvm-dev at lists.llvm.org
Thu Aug 2 20:56:20 PDT 2018
Hi LLVM and HotSpot developers,
I have just finished 4+ months of porting HotSpot C1 to MIPS, and my sincere
thanks go to the HotSpot developers who taught me a lot!
As an apprentice in the compiler world, I have some questions:
* Is there no instruction selection "concept" in HotSpot C1 equivalent to
LLVM's SelectionDAG and GlobalISel? I ask because I manually wrote
assembly[1] when lowering HIR to LIR in HotSpot C1. So which approach is
better: LLVM's automatic selection, or HotSpot's selection by hand?
* Why not use a greedy allocator, like LLVM's RegAllocGreedy, in place of
Linear Scan[2] for HotSpot C1's register allocation?
Please teach me, thanks a lot!
1. http://hg.loongnix.org/jdk8-mips64-public/hotspot/file/tip/src/cpu/mips/vm/c1_LIRAssembler_mips.cpp#l1542
2. http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-March/028545.html
On 2018-08-03 10:46, Leslie Zhai wrote:
> Hi Amara,
>
> Thanks for your great work!
>
> MIPS, RISCV and other targets have refactoring requirements:
> http://lists.llvm.org/pipermail/llvm-dev/2018-January/120098.html
>
> Please give us some suggestions for supporting custom CCState and
> CCAssignFn in D41700, and also RegisterBank in D41653, because we
> need to consider how to support the variable-sized register classes
> concept implemented in D24631.
>
> I am building the Linux kernel and OpenJDK8 with the LLVM toolchain for mips64el:
>
> http://lists.llvm.org/pipermail/llvm-dev/2018-July/124620.html
>
> http://lists.llvm.org/pipermail/llvm-dev/2018-July/124717.html
>
> I am also migrating to GlobalISel and the Machine Scheduler for LoongISA:
> http://lists.llvm.org/pipermail/llvm-dev/2018-May/123608.html
>
> My sincere thanks go to the LLVM, Linux kernel and OpenJDK developers
> who have taught me a lot!
>
>
> On 2018-07-30 22:01, Amara Emerson via llvm-dev wrote:
>> Hi all,
>>
>> Over the past few months we’ve been doing work on the foundations for
>> the next stages of GlobalISel development. In terms of changes from
>> this time last year, the IR translator, the legalizer, and the
>> instruction selector have seen moderate to major changes. The most
>> significant of these was the change to the legalizer API, allowing
>> targets to use predicates to express legality, which gives more
>> precise control over what forms of instructions are legal, and how to
>> legalize them. This was necessary to implement support for the new
>> extending loads and truncating stores, but also results in more
>> concise and elegant expressions of legality for each target. For
>> example, you can now apply a single definition to multiple
>> opcodes (G_ADD, G_SUB, G_MUL, etc.).
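A minimal, hypothetical sketch of the predicate-based idea, using simplified stand-in types: `Opcode`, `Type`, `RuleSet` and `LegalityTable` are illustrative names, not LLVM's. It shows one ordered list of predicate/action rules shared by G_ADD, G_SUB and G_MUL, where the first matching rule decides the action. LLVM's own entry point for this is `LegalizerInfo::getActionDefinitionsBuilder`, whose real interface is much richer than this toy.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <map>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT LLVM's LegalizerInfo types.
enum class Opcode { G_ADD, G_SUB, G_MUL, G_LOAD };
enum class Action { Legal, WidenScalar, NarrowScalar, Unsupported };

struct Type {
  unsigned SizeInBits;  // scalar width only, e.g. 32 for s32
};

using Predicate = std::function<bool(const Type &)>;

struct Rule {
  Predicate Matches;
  Action Act;
};

// An ordered list of (predicate, action) rules; the first match wins.
class RuleSet {
 public:
  RuleSet &legalFor(std::set<unsigned> Sizes) {
    return addRule([Sizes](const Type &T) { return Sizes.count(T.SizeInBits) != 0; },
                   Action::Legal);
  }
  RuleSet &widenScalarIf(Predicate P) { return addRule(std::move(P), Action::WidenScalar); }
  RuleSet &narrowScalarIf(Predicate P) { return addRule(std::move(P), Action::NarrowScalar); }

  Action query(const Type &T) const {
    for (const Rule &R : Rules)
      if (R.Matches(T)) return R.Act;
    return Action::Unsupported;  // no rule matched
  }

 private:
  RuleSet &addRule(Predicate P, Action A) {
    Rules.push_back({std::move(P), A});
    return *this;
  }
  std::vector<Rule> Rules;
};

// Maps opcodes to rule sets; a single definition can be shared by many opcodes.
class LegalityTable {
 public:
  RuleSet &define(std::initializer_list<Opcode> Ops) {
    assert(Ops.size() != 0 && "a rule set needs at least one opcode");
    Owned.push_back(std::make_unique<RuleSet>());
    RuleSet *Set = Owned.back().get();
    for (Opcode Op : Ops) Sets[Op] = Set;
    return *Set;
  }

  Action query(Opcode Op, const Type &T) const {
    auto It = Sets.find(Op);
    return It == Sets.end() ? Action::Unsupported : It->second->query(T);
  }

 private:
  std::vector<std::unique_ptr<RuleSet>> Owned;  // keeps rule sets alive
  std::map<Opcode, RuleSet *> Sets;
};
```

Because the first matching rule wins, the specific `legalFor` rule is listed before broader fallbacks such as `widenScalarIf([](const Type &T) { return T.SizeInBits < 32; })`, so an s32 G_SUB is legal while an s8 G_MUL is widened.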
>>
>> The IR translator has been modified to split aggregates rather than
>> handling them as one single large scalar. This change fixed some bugs
>> and was necessary in order to handle big-endian code correctly in the future.
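To make "splitting aggregates" concrete, here is a hypothetical sketch that flattens a nested aggregate into scalar pieces with bit offsets. It assumes a packed layout with no padding, which real targets do not guarantee; an actual translator would take offsets from the data layout. `AggType`, `ScalarPiece`, `scalar`, `aggregate` and `splitAggregate` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of an IR type: a scalar leaf, or an aggregate of members.
struct AggType {
  unsigned ScalarBits = 0;       // meaningful only for leaves
  std::vector<AggType> Members;  // struct fields / array elements
  bool isScalar() const { return Members.empty(); }
};

static AggType scalar(unsigned Bits) {
  AggType T;
  T.ScalarBits = Bits;
  return T;
}

static AggType aggregate(std::vector<AggType> Members) {
  AggType T;
  T.Members = std::move(Members);
  return T;
}

struct ScalarPiece {
  uint64_t OffsetBits;  // offset from the start of the aggregate
  unsigned SizeBits;
};

// Total size, assuming a packed layout (no padding or alignment).
static uint64_t sizeInBits(const AggType &T) {
  if (T.isScalar()) return T.ScalarBits;
  uint64_t Total = 0;
  for (const AggType &M : T.Members) Total += sizeInBits(M);
  return Total;
}

// Recursively emits one piece per scalar leaf instead of one large scalar.
static void splitAggregate(const AggType &T, uint64_t Offset,
                           std::vector<ScalarPiece> &Out) {
  if (T.isScalar()) {
    assert(T.ScalarBits != 0 && "scalar leaf must have a size");
    Out.push_back({Offset, T.ScalarBits});
    return;
  }
  for (const AggType &M : T.Members) {
    splitAggregate(M, Offset, Out);
    Offset += sizeInBits(M);
  }
}
```

For `{i32, [2 x i16]}` this yields three pieces at bit offsets 0, 32 and 48, each of which can be given its own virtual register rather than being packed into one 64-bit scalar.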
>>
>> The tablegen instruction selector also saw significant improvements
>> in performance, helping to keep the overall compile time regression vs
>> FastISel below 5% geomean on CTMark. There are still a few outliers,
>> like sqlite3, which has a significant regression compared to FastISel,
>> but most of the other benchmarks show little difference or even
>> improvement.
>>
>> The tablegen importer has had improvements made to it, so that we can
>> import more SelectionDAG selection rules. For example, on AArch64
>> about 40% of the rules are currently imported successfully.
>>
>> New additions from last year include the beginnings of a new
>> combiner, although there’s still significant work to be done here in
>> terms of the final design. The combiner will become a critical part
>> of the pipeline in order to begin improving runtime performance.
>>
>> *High level goals*
>>
>> Going forward, we plan to improve GlobalISel in a number of key areas
>> to achieve the following targets:
>> * Keeping compile time under control, ideally within 5% of FastISel,
>> and, when optimizations are enabled, maintaining a compile time
>> advantage over SelectionDAG.
>> * Beginning to improve runtime performance by adding the most important
>> optimizations required to be competitive at -Os. We will be targeting
>> and measuring AArch64 for this goal but will endeavor to implement as
>> many optimizations as possible in generic code to benefit other targets.
>> * Improving overall stability and test coverage. Maintaining a high
>> level of code quality and minimizing regressions in correctness and
>> performance will be a significant challenge.
>> * Ensuring that the overall design meets the needs of general targets,
>> not being overly tuned to a specific implementation.
>>
>> *Design work planned*
>>
>> These are some design changes coming in the near to medium term future:
>>
>> * The G_MERGE and G_UNMERGE opcodes will be split into separate
>> opcodes to handle different use cases. At the moment the opcode is
>> too powerful, resulting in overly complex handling in places like the
>> legalizer. G_MERGE will be split so that it only handles merging of
>> scalars into one larger scalar. For other cases like merging scalars
>> into a vector we will create a new G_BUILD_VECTOR opcode, with a new
>> counterpart opcode for doing the opposite. For the current vector +
>> vector case a new G_CONCAT_VECTOR will be introduced. These changes
>> should simplify implementations for all targets.
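The following hypothetical sketch shows how the proposed split divides the cases by operand types. The enumerators reuse the opcode names from the paragraph above, but `LowLevelType` and `classifyMerge` are illustrative assumptions, not LLVM code.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a low-level type: a scalar (IsVector == false,
// NumElts == 1) or a vector of NumElts elements of EltBits each.
struct LowLevelType {
  bool IsVector;
  unsigned NumElts;
  unsigned EltBits;
  unsigned sizeInBits() const { return NumElts * EltBits; }
};

// Names follow the proposal above; this is a sketch, not LLVM code.
enum class MergeKind { G_MERGE, G_BUILD_VECTOR, G_CONCAT_VECTOR, Invalid };

// Picks which of the split opcodes would merge Srcs into Dst.
MergeKind classifyMerge(const LowLevelType &Dst, const std::vector<LowLevelType> &Srcs) {
  assert(Dst.NumElts != 0 && "types have at least one element");
  if (Srcs.size() < 2) return MergeKind::Invalid;

  unsigned Total = 0;
  bool AllScalar = true, AllVector = true, SameElt = true;
  for (const LowLevelType &S : Srcs) {
    Total += S.sizeInBits();
    AllScalar = AllScalar && !S.IsVector;
    AllVector = AllVector && S.IsVector;
    SameElt = SameElt && S.EltBits == Dst.EltBits;
  }
  if (Total != Dst.sizeInBits()) return MergeKind::Invalid;

  // Scalars into one wider scalar.
  if (!Dst.IsVector && AllScalar) return MergeKind::G_MERGE;
  // Scalars into a vector: one source per element, same element width.
  if (Dst.IsVector && AllScalar && SameElt) return MergeKind::G_BUILD_VECTOR;
  // Vectors into a wider vector with the same element type.
  if (Dst.IsVector && AllVector && SameElt) return MergeKind::G_CONCAT_VECTOR;
  return MergeKind::Invalid;
}
```

Each case now has a narrow, checkable shape, which is what should let a legalizer handle them separately instead of inspecting one overloaded opcode.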
>>
>> * Constant representation at the MI level needs some investigation.
>> We currently represent constants as generic instructions, with each
>> instance of a constant largely independent of the others, and
>> placed in the entry block except for a few places in IR translation
>> where we emit at the point of use. As a result we run a localizer
>> pass in an effort to reduce the live ranges of the constants (and the
>> consequent spilling), using some heuristics to decide where to sink
>> the constant definitions to.
>>
>> Since we don’t do any real caching of MI constants, multiple
>> G_CONSTANT definitions can exist for the same constant. This can also
>> result in a lot of redundant constants being created, especially for
>> things like address computation. Reducing the number of constants can
>> help reduce compile time and memory usage. Given this situation, one
>> possible approach is to encode constants into the operands of the
>> users, rather than have dedicated machine instructions. At
>> instruction selection time the constant can then be materialized into
>> a register or encoded as an immediate. Further investigation is
>> needed to find the right way forward here.
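One possible form of the "caching of MI constants" mentioned above, sketched under the assumption of a toy IR where virtual registers are plain indices: constants are uniqued per (block, value, width), so each block that needs a constant gets exactly one locally placed definition. `ConstantCache`, `ConstantDef` and their members are hypothetical names, and this is only one of the options still under investigation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical toy IR: a virtual register is just an index into Defs.
using VReg = unsigned;

struct ConstantDef {
  unsigned Block;  // block where the constant is materialized
  int64_t Value;
  unsigned Bits;
};

// Materializes each (block, value, width) constant at most once, in the
// block that uses it, instead of one G_CONSTANT per use in the entry block.
class ConstantCache {
 public:
  VReg getOrCreate(unsigned Block, int64_t Value, unsigned Bits) {
    auto Key = std::make_tuple(Block, Value, Bits);
    auto It = Cache.find(Key);
    if (It != Cache.end()) return It->second;  // reuse the existing definition
    VReg R = static_cast<VReg>(Defs.size());
    Defs.push_back({Block, Value, Bits});
    Cache.emplace(Key, R);
    return R;
  }

  std::size_t numDefs() const { return Defs.size(); }

  const ConstantDef &def(VReg R) const {
    assert(R < Defs.size() && "unknown virtual register");
    return Defs[R];
  }

 private:
  std::map<std::tuple<unsigned, int64_t, unsigned>, VReg> Cache;
  std::vector<ConstantDef> Defs;
};
```

Keying on the block trades some duplication across blocks for short live ranges, the same tension the localizer heuristics try to balance.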
>>
>> * For optimizations to be supported, the combiner will become a
>> crucial part of the GISel pipeline. We have already done some
>> preliminary work in a generic combiner, which will be used to
>> eventually support combines of extloads/truncstores. We’ve had
>> discussions on and off list about what we need from the new combiner.
>> The summary is that we want the combiner to be flexible for each
>> target to select from a library of combines, being as efficient as
>> possible. The combines are currently written in
>> C++, but one piece of investigation work we might do is to prototype
>> using the same tablegen driven instruction selector code to match
>> declarative combine patterns written in tablegen. Regardless, we will
>> need to support the custom C++ use case.
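A hypothetical sketch of "a library of combines" that each target selects from: combines are registered by name in a shared library, and a target-specific combiner is built from the names it enables. The toy `Inst` format (`Dst = Op(Src, Imm)`) and all names here are assumptions for illustration, written in C++ like the current combines.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: Dst = Opcode(Src, Imm).
enum class Op { AddImm, MulImm, Copy };
struct Inst {
  Op Opcode;
  unsigned Dst, Src;
  int64_t Imm;
};

// A combine inspects one instruction and rewrites it in place if it matches.
using Combine = std::function<bool(Inst &)>;

// Generic combines, each registered under a name.
std::map<std::string, Combine> makeCombineLibrary() {
  std::map<std::string, Combine> Lib;
  Lib["add-zero"] = [](Inst &I) {
    if (I.Opcode != Op::AddImm || I.Imm != 0) return false;
    I.Opcode = Op::Copy;  // x + 0 -> x
    return true;
  };
  Lib["mul-one"] = [](Inst &I) {
    if (I.Opcode != Op::MulImm || I.Imm != 1) return false;
    I.Opcode = Op::Copy;  // x * 1 -> x
    return true;
  };
  return Lib;
}

// A target's combiner, built only from the combines it opts into.
class Combiner {
 public:
  Combiner(const std::map<std::string, Combine> &Lib,
           const std::vector<std::string> &Enabled) {
    for (const std::string &Name : Enabled) {
      auto It = Lib.find(Name);
      assert(It != Lib.end() && "unknown combine name");
      if (It != Lib.end()) Selected.push_back(It->second);
    }
  }

  // Tries every selected combine on every instruction; returns the rewrite count.
  unsigned run(std::vector<Inst> &Insts) const {
    unsigned Changed = 0;
    for (Inst &I : Insts)
      for (const Combine &C : Selected)
        if (C(I)) ++Changed;
    return Changed;
  }

 private:
  std::vector<Combine> Selected;
};
```

A target that enables only `"add-zero"` leaves multiplications by one untouched, while one that also enables `"mul-one"` folds them; the library itself stays shared.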
>>
>> * CSE throughout the pipeline. From a theoretical perspective,
>> having a self-contained CSE pass that operates as a single phase in
>> the pipeline is attractive for the simplicity and elegance. However,
>> we know empirically that this is expensive in compile time. Not only
>> does the CSE pass itself take a non-negligible time to run, but
>> having it as a late pass can result in the non-CSE’d code from the
>> IRTranslator onwards surviving for a long time, taking up time in
>> analysis at each stage of compilation. We believe running a
>> lightweight CSE early is a win. SelectionDAG currently does CSE by default
>> when building the DAG, and this is something we could explore as part
>> of a custom IRBuilder.
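As a rough illustration of CSE performed while building, in the spirit of SelectionDAG's CSE on node creation, here is a hypothetical uniquing builder. It assumes a single block of side-effect-free operations and uses a `std::map` keyed on (opcode, operands) in place of a hash table; `CSEBuilder`, `Opc` and `Node` are illustrative names, not the GlobalISel `MachineIRBuilder` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical toy operations; Const carries its value in operand A.
enum class Opc { Add, Mul, Const };

struct Node {
  Opc Op;
  uint64_t A, B;  // operand vregs, or (value, 0) for Const
};

// Returns an existing vreg for an identical (opcode, operands) request
// instead of emitting a duplicate instruction.
class CSEBuilder {
 public:
  unsigned build(Opc Op, uint64_t A, uint64_t B) {
    assert((Op != Opc::Const || B == 0) && "Const uses only operand A");
    // Canonicalize commutative operands so add(x, y) and add(y, x) match.
    if ((Op == Opc::Add || Op == Opc::Mul) && B < A) std::swap(A, B);
    auto Key = std::make_tuple(Op, A, B);
    auto It = Map.find(Key);
    if (It != Map.end()) return It->second;
    unsigned VReg = static_cast<unsigned>(Nodes.size());
    Nodes.push_back({Op, A, B});
    Map.emplace(Key, VReg);
    return VReg;
  }

  unsigned constant(uint64_t Value) { return build(Opc::Const, Value, 0); }
  std::size_t size() const { return Nodes.size(); }

 private:
  std::map<std::tuple<Opc, uint64_t, uint64_t>, unsigned> Map;
  std::vector<Node> Nodes;
};
```

Building `add(x, y)` and then `add(y, x)` returns the same virtual register, so the duplicate never exists for later passes to analyze, which is the compile-time argument made above.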
>>
>> * Known bits computation. Some optimizations require knowledge
>> of which bits in a value are known to be 1 or 0, and currently obtain
>> it using the computeKnownBits() capability for SelectionDAG nodes. We will
>> need some way of getting the same information. In an ideal scenario
>> the replacement infrastructure for this will be more efficient, as
>> this part of the codebase seems to be disproportionately responsible
>> for pathological compile time regressions.
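The sketch below shows the kind of per-opcode transfer functions such an analysis needs. It mirrors the Zero/One mask representation of LLVM's `KnownBits` struct but uses plain `uint64_t` instead of `APInt`; the helper functions are hypothetical and cover only constants, AND, OR and left shift by a constant.

```cpp
#include <cassert>
#include <cstdint>

// A bit set in Zero is known to be 0; a bit set in One is known to be 1;
// a bit set in neither mask is unknown.
struct KnownBits {
  uint64_t Zero;
  uint64_t One;
  bool isConsistent() const { return (Zero & One) == 0; }
};

KnownBits knownConstant(uint64_t C) { return {~C, C}; }

// AND: 0 if either input is 0; 1 only if both inputs are 1.
KnownBits knownAnd(const KnownBits &L, const KnownBits &R) {
  return {L.Zero | R.Zero, L.One & R.One};
}

// OR: 1 if either input is 1; 0 only if both inputs are 0.
KnownBits knownOr(const KnownBits &L, const KnownBits &R) {
  return {L.Zero & R.Zero, L.One | R.One};
}

// Shift left by a constant: known bits move up, shifted-in low bits are 0.
KnownBits knownShl(const KnownBits &V, unsigned Amt) {
  assert(Amt < 64 && "shift amount out of range");
  uint64_t LowMask = Amt == 0 ? 0 : (~uint64_t(0) >> (64 - Amt));
  return {(V.Zero << Amt) | LowMask, V.One << Amt};
}
```

For example, `(x & 0xF0) << 4` has its low 8 bits known to be zero, which a combine could use to drop a redundant mask.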
>>
>> * Load/store ordering needs some thought, as we currently don’t have
>> a way to easily check at the MI level what the ordering requirements
>> are on a set of memory operations. SelectionDAG uses the chains to
>> ensure that they’re scheduled to respect the orderings. How to
>> achieve the same thing remains an open question for GlobalISel.
>>
>> * More extensive tests that exercise multiple stages of the
>> pipeline. One advantage of using MIR with GISel is that individual
>> passes can be easily tested by feeding the exact input expected for a
>> particular pass, and checking the immediate output of the pass.
>> However, this approach can leave holes in the test coverage. To help
>> mitigate this, we will be exploring writing/generating whole pipeline
>> tests, tracking some IR through each pass and checking how the MIR is
>> mutated. We currently also have a proposed change to allow usage of
>> FileCheck as a library, not just as a stand-alone tool. This would
>> allow us to use FileCheck-style checks and improve testing of
>> currently unused code paths.
>>
>>
>> *Roadmap for enabling optimizations*
>>
>> I’ve filed a few PRs that people can follow or comment on to track
>> the progress towards enabling the -Os optimization level. The rough
>> outline is:
>>
>> PR 38365 - [AArch64][GlobalISel] Never fall back on CTMark or
>> benchmarks (Darwin)
>> PR 38366 - GlobalISel: Lightweight CSE
>> PR 32561 - GlobalISel: placement of constants in the entry-block and
>> fast regalloc result in lots of reloaded constant
>> PR 38367 - GlobalISel: Implement support for obtaining known bits
>> information
>> PR 38368 - GlobalISel: Investigate an efficient way to ensure
>> load/store orderings
>>
>> These, along with general design and implementation work on the
>> combiner, will then lead onto a long road of performance analysis,
>> inevitable bug fixing, and implementing more optimizations.
>>
>> If anyone is interested in discussing in more detail, feel free to
>> reach out on the list, or to any of the GlobalISel developers. We’d
>> especially like to hear about any issues or concerns about porting
>> targets to GlobalISel.
>>
>> Thanks,
>> Amara
>
--
Regards,
Leslie Zhai