[llvm-dev] Multi-Threading Compilers

Sat Feb 29 18:38:24 PST 2020

On Sat, Feb 29, 2020 at 5:14 PM Nicholas Krause via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On 2/29/20 7:23 PM, River Riddle wrote:
>
>
>
> On Sat, Feb 29, 2020 at 4:00 PM Nicholas Krause <xerofoify at gmail.com>
> wrote:
>
>>
>>
>> On 2/29/20 6:17 PM, River Riddle via llvm-dev wrote:
>>
>>
>>
>> On Sat, Feb 29, 2020 at 2:25 PM David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Sat, Feb 29, 2020 at 2:19 PM Chris Lattner <clattner at nondot.org>
>>> wrote:
>>>
>>>> On Feb 29, 2020, at 2:08 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>
>>>> I've
>>>>> curious as
>>>>> to how MLIR deals with IPO as that's the problem I was running into.
>>>>>
>>>>
>>>> FWIW I believe LLVM's new pass manager (NPM) was designed with
>>>> parallelism and the ability to support this situation (that MLIR doesn't?
>>>> Or doesn't to the degree/way in which the NPM does). I'll leave it to folks
>>>> (Chandler probably has the most context here) to provide some more detail
>>>> there if they can/have time.
>>>>
>>>>
>>>> Historically speaking, all of the LLVM pass managers have been designed
>>>> to support multithreaded compilation (check out the ancient history of the
>>>> WritingAnLLVMPass <http://llvm.org/docs/WritingAnLLVMPass.html> doc if
>>>> curious).
>>>>
>>>
>>> I think the specific thing that might'v been a bit different in the NPM
>>> was to do with analysis invalidation in a way that's more parallelism
>>> friendly than the previous one - but I may be
>>> misrepresenting/misundrstanding some of it.
>>>
>>>
>>>> The problem is that LLVM has global use-def chains on constants,
>>>> functions and globals, etc, so it is impractical to do this.  Every
>>>> “inst->setOperand” would have to be able to take locks or use something
>>>> like software transactional memory techniques in their implementation.
>>>> This would be very complicated and very slow.
>>>>
>>>
>>> Oh, yeah - I recall that particular limitation being discussed/not
>>> addressed as yet.
>>>
>>>
>>>> MLIR defines this away from the beginning.  This is a result of the
>>>> core IR design, not the pass manager design itself.
>>>>
>>>
>>> What does MLIR do differently here/how does it define that issue away?
>>> (doesn't have use-lists built-in?)
>>>
>>
>> The major thing is that constants and global-like objects don't produce
>> SSA values and thus don't have use-lists.
>> https://mlir.llvm.org/docs/Rationale/#multithreading-the-compiler discusses
>> this a bit.
>>
>> For constants, the data is stored as an Attribute(context uniqued
>> metadata, have no use-list, not SSA). This attribute can either placed in
>> the attribute list(if the operand is always constant, like for the value of
>> a switch case), otherwise it must be explicitly materialized via some
>> operation. For example, the `std.constant
>> <https://mlir.llvm.org/docs/Dialects/Standard/#constant-operation>`
>> operation will materialize an SSA value from some attribute data.
>>
>> For references to functions and other global-like objects, we have a
>> non-SSA mechanism built around `symbols`. This is essentially using a
>> special attribute to reference the function by-name, instead of by ssa
>> value. You can find more information on MLIR symbols here
>> <https://mlir.llvm.org/docs/SymbolsAndSymbolTables/>.
>>
>> Along with the above, there is a trait that can be attached to operations
>> called `IsolatedFromAbove
>> <https://mlir.llvm.org/docs/Traits/#isolatedfromabove>`. This
>> essentially means that no SSA values defined above a region can be
>> referenced from within that region. The pass manager only allows schedule
>> passes on operations that have this property, meaning that all pipelines
>> are implicitly multi-threaded.
>>
>> The pass manager in MLIR was heavily inspired by the work on the new pass
>> manager in LLVM, but with specific constraints/requirements that are unique
>> to the design of MLIR. That being said, there are some usability features
>> added that would also make great additions to LLVM: instance specific pass
>> options and statistics, pipeline crash reproducer generation, etc.
>>
>> Not sure if any of the above helps clarify, but happy to chat more if you
>> are interested.
>>
>> -- River
>>
>>
>>> - Dave
>>>
>> River,
>> The big thing from my reading of the Pass Manager in MLIR is that it
>> allows us to iterate through
>> a pass per function or module as a group allowing it to run in async.
>> I've proposed this
>> on the GCC side:
>> https://gcc.gnu.org/ml/gcc/2020-02/msg00247.html
>>
>> Its to walk through the IPA passes which are similar to analyze passes on
>> the LLVM side.
>>
>
> Hi Nicholas,
>
> I can't say anything about the GCC side, but this isn't a particularly
> novel aspect of the MLIR pass manager. In many ways, the pass manager is
> the easiest/simplest part of the multi-threading problem. The bigger
> problem is making sure that the rest of the compiler infrastructure is
> structured in a way that is thread-safe, or can be made thread-safe. This
> is why most of the discussion is based around how to model things like
> constants, global values, etc. When I made MLIR multi-threaded a year ago,
> a large majority of my time was spent outside of the pass manager. For a
> real example, I spent much more time just on multi-threaded pass timing
> <https://mlir.llvm.org/docs/WritingAPass/#multi-threaded-pass-timing>
> than making the pass manager itself multi-threaded.
>
> -- River
>
> Actually in my experience, the biggest problem is if we can detect IPO and
> run async guarantees on that. MLIR runs operations but only for a module or
> set of functions
> without this. One of my dreams would be to run passes in parallel
> including IPO detection and stop if it cannot continue pass a IPO pass or
> set of passes due to changes.
>
> Maybe MLIR does do that but its the one bottleneck that is really hard to
> fix,
>

What MLIR does (that would require quite some work in LLVM) is making sure
that you can process and transform functions in isolation, allowing to run
*local* optimizations in parallel. This does not solve the IPO problem
you're after. As I understand it, this is a difficult thing to design, and
it requires consideration about how you think the passes and the
pass-pipeline entirely.

Running function-passes and "local" optimizations in parallel in LLVM isn't
possible because the structures in the LLVMContext aren't thread-safe, and
because the IR itself isn't thread-safe. Something like just DCE or CSE a
function call requires to modify the callee (through its use-list).

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200229/a7633d2b/attachment.html>