[llvm-dev] Multi-Threading Compilers

Nicholas Krause via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 29 22:36:00 PST 2020



On 2/29/20 9:38 PM, Mehdi AMINI wrote:
>
>
> On Sat, Feb 29, 2020 at 5:14 PM Nicholas Krause via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>
>
>     On 2/29/20 7:23 PM, River Riddle wrote:
>>
>>
>>     On Sat, Feb 29, 2020 at 4:00 PM Nicholas Krause
>>     <xerofoify at gmail.com <mailto:xerofoify at gmail.com>> wrote:
>>
>>
>>
>>         On 2/29/20 6:17 PM, River Riddle via llvm-dev wrote:
>>>
>>>
>>>         On Sat, Feb 29, 2020 at 2:25 PM David Blaikie via llvm-dev
>>>         <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
>>>         wrote:
>>>
>>>
>>>
>>>             On Sat, Feb 29, 2020 at 2:19 PM Chris Lattner
>>>             <clattner at nondot.org <mailto:clattner at nondot.org>> wrote:
>>>
>>>                 On Feb 29, 2020, at 2:08 PM, David Blaikie
>>>                 <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>
>>>>                     I'm curious as
>>>>                     to how MLIR deals with IPO as that's the
>>>>                     problem I was running into.
>>>>
>>>>
>>>>                 FWIW I believe LLVM's new pass manager (NPM) was
>>>>                 designed with parallelism in mind and the ability
>>>>                 to support this situation (that MLIR doesn't? Or
>>>>                 doesn't to the degree/way in which the NPM does).
>>>>                 I'll leave it to folks (Chandler probably has the
>>>>                 most context here) to provide some more detail
>>>>                 there if they can/have time.
>>>
>>>                 Historically speaking, all of the LLVM pass managers
>>>                 have been designed to support multithreaded
>>>                 compilation (check out the ancient history of the
>>>                 WritingAnLLVMPass
>>>                 <http://llvm.org/docs/WritingAnLLVMPass.html> doc if
>>>                 curious).
>>>
>>>
>>>             I think the specific thing that might've been a bit
>>>             different in the NPM was to do with analysis
>>>             invalidation in a way that's more parallelism-friendly
>>>             than the previous one - but I may be
>>>             misrepresenting/misunderstanding some of it.
>>>
>>>                 The problem is that LLVM has global use-def chains
>>>                 on constants, functions and globals, etc, so it is
>>>                 impractical to do this. Every “inst->setOperand”
>>>                 would have to be able to take locks or use something
>>>                 like software transactional memory techniques in
>>>                 their implementation.  This would be very
>>>                 complicated and very slow.
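>>>
>>>                 To make that concrete, here is a minimal C++ sketch
>>>                 (illustrative types, not LLVM's actual classes) of
>>>                 why intrusive use-lists make concurrent operand
>>>                 updates racy: every setOperand-style update splices
>>>                 a node into a list owned by the *used* value, so
>>>                 two threads transforming different functions that
>>>                 both reference one global contend on the same list:
>>>
>>>                 struct Value;
>>>
>>>                 // Each operand slot is a Use node threaded onto
>>>                 // the used value's intrusive list.
>>>                 struct Use {
>>>                   Value *usedValue = nullptr;
>>>                   Use *nextUse = nullptr;
>>>                   void set(Value *v);
>>>                 };
>>>
>>>                 // Every Value owns the head of its use-list.
>>>                 struct Value {
>>>                   Use *firstUse = nullptr;
>>>                 };
>>>
>>>                 void Use::set(Value *v) {
>>>                   // (Unlinking from the old value's list is
>>>                   // omitted for brevity.) The push below writes to
>>>                   // state shared by *all* users of v: a data race
>>>                   // without a lock or an STM-style scheme.
>>>                   usedValue = v;
>>>                   nextUse = v->firstUse;
>>>                   v->firstUse = this;
>>>                 }
>>>
>>>                 int main() {
>>>                   Value g;   // stands in for a global
>>>                   Use a, b;  // operand slots in two functions
>>>                   a.set(&g); // these two calls would race if run
>>>                   b.set(&g); // on separate threads
>>>                 }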
>>>
>>>
>>>             Oh, yeah - I recall that particular limitation being
>>>             discussed/not addressed as yet.
>>>
>>>                 MLIR defines this away from the beginning.  This is
>>>                 a result of the core IR design, not the pass manager
>>>                 design itself.
>>>
>>>
>>>             What does MLIR do differently here/how does it define
>>>             that issue away? (doesn't have use-lists built-in?)
>>>
>>>
>>>         The major thing is that constants and global-like objects
>>>         don't produce SSA values and thus don't have use-lists.
>>>         https://mlir.llvm.org/docs/Rationale/#multithreading-the-compiler discusses
>>>         this a bit.
>>>
>>>         For constants, the data is stored as an Attribute
>>>         (context-uniqued metadata: no use-list, not SSA). This
>>>         attribute can either be placed in the attribute list (if
>>>         the operand is always constant, like the value of a switch
>>>         case), or it must be explicitly materialized via some
>>>         operation. For example, the `std.constant
>>>         <https://mlir.llvm.org/docs/Dialects/Standard/#constant-operation>`
>>>         operation will materialize an SSA value from some attribute
>>>         data.
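>>>
>>>         As a rough sketch of what "context uniqued" buys (names
>>>         here are made up for illustration, not MLIR's API): the
>>>         context owns an immutable, deduplicated pool of constant
>>>         data, so creation is the only synchronized step, and users
>>>         never mutate, or even track, who refers to it:
>>>
>>>             #include <map>
>>>             #include <memory>
>>>             #include <mutex>
>>>
>>>             // Immutable once created; safe to share across threads.
>>>             struct IntAttr {
>>>               const long value;
>>>             };
>>>
>>>             class Context {
>>>               std::mutex m;
>>>               std::map<long, std::unique_ptr<IntAttr>> pool;
>>>
>>>             public:
>>>               // Return the unique IntAttr for `v`, creating it on
>>>               // first request. No use-list is kept anywhere.
>>>               const IntAttr *getIntAttr(long v) {
>>>                 std::lock_guard<std::mutex> lock(m);
>>>                 auto &slot = pool[v];
>>>                 if (!slot)
>>>                   slot.reset(new IntAttr{v});
>>>                 return slot.get();
>>>               }
>>>             };
>>>
>>>         Two requests for the same value get the same immutable
>>>         object, so there is no per-user bookkeeping to guard.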
>>>
>>>         For references to functions and other global-like objects,
>>>         we have a non-SSA mechanism built around `symbols`. This is
>>>         essentially using a special attribute to reference the
>>>         function by-name, instead of by ssa value. You can find more
>>>         information on MLIR symbols here
>>>         <https://mlir.llvm.org/docs/SymbolsAndSymbolTables/>.
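>>>
>>>         A hedged sketch of the by-name idea (again, illustrative
>>>         names rather than MLIR's classes): a call site stores the
>>>         callee's name and resolves it through the enclosing
>>>         module's symbol table on demand, so no use-list on the
>>>         callee is ever touched:
>>>
>>>             #include <string>
>>>             #include <unordered_map>
>>>
>>>             struct Function { /* body elided */ };
>>>
>>>             struct Module {
>>>               std::unordered_map<std::string, Function> symbolTable;
>>>             };
>>>
>>>             struct CallSite {
>>>               std::string callee; // symbol reference, not a pointer
>>>
>>>               // Resolve on demand; nullptr if undefined.
>>>               Function *resolve(Module &m) const {
>>>                 auto it = m.symbolTable.find(callee);
>>>                 return it == m.symbolTable.end() ? nullptr
>>>                                                  : &it->second;
>>>               }
>>>             };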
>>>
>>>         Along with the above, there is a trait that can be attached
>>>         to operations called `IsolatedFromAbove
>>>         <https://mlir.llvm.org/docs/Traits/#isolatedfromabove>`.
>>>         This essentially means that no SSA values defined above a
>>>         region can be referenced from within that region. The pass
>>>         manager only allows scheduling passes on operations that
>>>         have this property, meaning that all pipelines are
>>>         implicitly multi-threaded.
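>>>
>>>         The payoff can be sketched with a toy driver (assuming
>>>         `runPass` touches only the function it is handed, which is
>>>         exactly what IsolatedFromAbove guarantees):
>>>
>>>             #include <future>
>>>             #include <vector>
>>>
>>>             // Isolated: cannot reference SSA values defined
>>>             // outside its own region.
>>>             struct Function {};
>>>
>>>             // Any function-local transformation; touches only `f`.
>>>             void runPass(Function &f) { (void)f; }
>>>
>>>             // Isolation means no shared mutable IR state, so the
>>>             // scheduler can fan out with no locking at all.
>>>             void runOnAll(std::vector<Function> &funcs) {
>>>               std::vector<std::future<void>> jobs;
>>>               jobs.reserve(funcs.size());
>>>               for (Function &f : funcs)
>>>                 jobs.push_back(std::async(
>>>                     std::launch::async, [&f] { runPass(f); }));
>>>               for (auto &j : jobs)
>>>                 j.get();
>>>             }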
>>>
>>>         The pass manager in MLIR was heavily inspired by the work on
>>>         the new pass manager in LLVM, but with specific
>>>         constraints/requirements that are unique to the design of
>>>         MLIR. That being said, there are some usability features
>>>         added that would also make great additions to LLVM:
>>>         instance-specific pass options and statistics, pipeline
>>>         crash reproducer generation, etc.
>>>
>>>         Not sure if any of the above helps clarify, but happy to
>>>         chat more if you are interested.
>>>
>>>         -- River
>>>
>>>             - Dave
>>>
>>         River,
>>         The big thing from my reading of the pass manager in MLIR is
>>         that it allows us to run a pass across functions or modules
>>         as a group, asynchronously. I've proposed this
>>         on the GCC side:
>>         https://gcc.gnu.org/ml/gcc/2020-02/msg00247.html
>>
>>         The idea is to walk through the IPA passes, which are
>>         similar to analysis passes on the LLVM side.
>>
>>
>>     Hi Nicholas,
>>
>>     I can't say anything about the GCC side, but this isn't a
>>     particularly novel aspect of the MLIR pass manager. In many ways,
>>     the pass manager is the easiest/simplest part of the
>>     multi-threading problem. The bigger problem is making sure that
>>     the rest of the compiler infrastructure is structured in a way
>>     that is thread-safe, or can be made thread-safe. This is why most
>>     of the discussion is based around how to model things like
>>     constants, global values, etc. When I made MLIR multi-threaded a
>>     year ago, a large majority of my time was spent outside of the
>>     pass manager. For a real example, I spent much more time just on
>>     multi-threaded pass timing
>>     <https://mlir.llvm.org/docs/WritingAPass/#multi-threaded-pass-timing>
>>     than making the pass manager itself multi-threaded.
>>
>>     -- River
>     Actually, in my experience the biggest problem is whether we can
>     detect IPO and make async guarantees around it. MLIR runs passes
>     on operations, but only for a module or set of functions,
>     without this. One of my dreams would be to run passes in
>     parallel, including IPO detection, and stop if execution cannot
>     continue past an IPO pass or set of passes due to changes.
>
>     Maybe MLIR does do that, but it's the one bottleneck that is
>     really hard to fix.
>
>
> What MLIR does (that would require quite some work in LLVM) is making 
> sure that you can process and transform functions in isolation, 
> allowing *local* optimizations to run in parallel. This does not solve 
> the IPO problem you're after. As I understand it, this is a difficult 
> thing to design, and it requires rethinking how you structure the 
> passes and the pass pipeline entirely.
I've been thinking about it on the GCC side for the last few months.
It's non-trivial, though something similar to this may be possible; I
will need to research both pass managers better:
https://elixir.bootlin.com/linux/latest/source/include/linux/workqueue.h#L102
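
For concreteness, here is a minimal C++ analogue of that workqueue
pattern (purely illustrative, not a proposed GCC or LLVM API): tasks,
e.g. per-function pass runs, are enqueued and drained by a fixed pool
of workers:

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class WorkQueue {
      std::mutex m;
      std::condition_variable cv;
      std::queue<std::function<void()>> tasks;
      std::vector<std::thread> workers;
      bool done = false;

    public:
      explicit WorkQueue(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
          workers.emplace_back([this] {
            for (;;) {
              std::function<void()> task;
              {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock,
                        [this] { return done || !tasks.empty(); });
                if (tasks.empty())
                  return; // shut down once drained
                task = std::move(tasks.front());
                tasks.pop();
              }
              task(); // run outside the lock
            }
          });
      }

      void enqueue(std::function<void()> task) {
        {
          std::lock_guard<std::mutex> lock(m);
          tasks.push(std::move(task));
        }
        cv.notify_one();
      }

      ~WorkQueue() {
        {
          std::lock_guard<std::mutex> lock(m);
          done = true;
        }
        cv.notify_all();
        for (auto &t : workers)
          t.join();
      }
    };

Enqueueing one closure per function would parallelize a function-pass
pipeline this way, but only if the underlying IR is thread-safe, which
is the hard part, as Mehdi notes below.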

I'm looking at some RFCs for the pass manager and SSA iterators on the
GCC side. One easy thing to do: in a lot of core classes in both GCC
and LLVM there are loops that run a lot; LRA, the register allocator
on the GCC side, is one example. We may want to figure out how, and
whether, it's a good idea to run these loops in small worker functions
that are only used by the one function that requires them. The only
real question is that some loops are only performance-intensive in
corner cases, so we may require heuristics to decide when to launch,
e.g. based on the number of elements being iterated through, etc.; a
sketch of that follows below. Maybe that's a bad idea, but after
looking through the GCC and LLVM sides a little, this seems to be a
later tedious but trivial fix, mostly a matter of figuring out which
loops are expensive enough to do this.
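
As a sketch of that heuristic (the threshold and the per-element work
are placeholders I am making up, not measured values): stay serial
below a trip-count cutoff and split across hardware threads above it:

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    // Assumption: below this trip count, threading overhead wins.
    constexpr std::size_t kParallelThreshold = 4096;

    // Stand-in for the real per-element work inside the hot loop.
    void processElement(int &x) { x *= 2; }

    void processAll(std::vector<int> &data) {
      if (data.size() < kParallelThreshold) {
        for (int &x : data) // cheap/corner case: stay serial
          processElement(x);
        return;
      }
      unsigned n =
          std::max(1u, std::thread::hardware_concurrency());
      std::size_t chunk = (data.size() + n - 1) / n;
      std::vector<std::thread> workers;
      for (unsigned i = 0; i < n; ++i) {
        std::size_t begin = i * chunk;
        std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, begin, end] {
          for (std::size_t j = begin; j < end; ++j)
            processElement(data[j]);
        });
      }
      for (auto &t : workers)
        t.join();
    }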

I believe Johannes was going to reach out next week about the Module
Classes and we will go from there,

Nick
>
> Running function-passes and "local" optimizations in parallel in LLVM 
> isn't possible because the structures in the LLVMContext aren't 
> thread-safe, and because the IR itself isn't thread-safe. Something 
> as simple as DCE'ing or CSE'ing a function call requires modifying 
> the callee (through its use-list).
>
> -- 
> Mehdi
