<div dir="auto">On Sat, 11 Jan 2020 at 07:43, Renato Golin <<a href="mailto:rengolin@gmail.com" target="_blank" rel="noreferrer">rengolin@gmail.com</a>> wrote:<br>

> On Sat, 11 Jan 2020 at 00:34, Michael Kruse <<a href="mailto:llvmdev@meinersbur.de" target="_blank" rel="noreferrer">llvmdev@meinersbur.de</a>> wrote:<br>

> > Yes, as mentioned in the Q&A. Unfortunately VPlan is neither able to<br>

> > represent arbitrary code nor has cheap copies.<br>

><br>

> Orthogonal, but we should also be looking into implementing the cheap<br>

> copies in VPlan if we want to search for composable plans.<br>

<br>

VPlan structures have many references to neighboring structures such as parents and use-def chains. This makes adding cheap copies as an afterthought really hard.<br>
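
To illustrate why back-references get in the way, here is a minimal C++ sketch of the alternative: nodes that are immutable and only point to their children can be shared between tree versions, so a modified copy allocates only the nodes on the path to the change. `Node`, `makeNode` and `replaceChild` are hypothetical names made up for this illustration; they are not VPlan or LLVM API.<br>

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable tree node: it references its children only.
// There is no parent pointer and no use-def back-reference, so the same
// node object can appear in any number of trees at once.
struct Node {
  std::string Name;
  std::vector<std::shared_ptr<const Node>> Children;
};
using NodeRef = std::shared_ptr<const Node>;

NodeRef makeNode(std::string Name, std::vector<NodeRef> Children = {}) {
  return std::make_shared<const Node>(
      Node{std::move(Name), std::move(Children)});
}

// Build a new parent in which child Idx is replaced. Only one node is
// allocated; every other child is shared with the original tree, which
// stays valid and unchanged.
NodeRef replaceChild(const NodeRef &Parent, std::size_t Idx, NodeRef NewChild) {
  std::vector<NodeRef> Children = Parent->Children; // copies pointers only
  Children.at(Idx) = std::move(NewChild);
  return makeNode(Parent->Name, std::move(Children));
}
```

With a parent pointer in `Node`, the untouched children could not be shared, because each of them would have to point at exactly one parent; the copy would then have to duplicate the whole subtree.<br>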

<br>

<br>

> > This conversion is a possibility and certainly not the main motivation<br>

> > for a loop hierarchy.<br>

><br>

> I know. There are many things that can be done with what you propose,<br>

> but we should focus on what's the main motivation.<br>

><br>

> From what I can tell, the tree representation is a concrete proposal<br>

> for the many year discussion about parallel IR.<br>

<br>

As I recall, the Parallel IR approaches were trying to add parallel constructs to the existing LLVM-IR. This added the issue that the current infrastructure suddenly needs to handle those as well, which became a major problem for adoption.<br>

<br>

<br>

> The short paper doesn't mention that, nor does it discuss other<br>

> opportunities to fix pipeline complexity (that is inherent to any<br>

> compiler).<br>

><br>

> I still believe that many of the techniques you propose are meaningful<br>

> ways to solve them, but creating another IR will invariably create<br>

> some adoption barriers.<br>

<br>

I see it as an advantage in respect of adoption: It can be switched on and off without affecting other parts.<br>

<br>

<br>

> > Are you arguing against code versioning? It is already done today by<br>

> > multiple passes such as LoopVersioningLICM, LoopDistribute,<br>

> > LoopUnrollAndJam and LoopVectorize. The proposal explicitly tries to<br>

> > avoid code bloat by having just one fallback copy. Runtime conditions<br>

> > can be chosen more or less optimistically, but I don't see how this<br>

> > should be an argument for all kinds of versioning.<br>

><br>

> No. I'm cautious about the combination of heuristics search and<br>

> versioning, especially when the conditions are runtime based. It may<br>

> be hard to CSE them later.<br>

><br>

> The paths found may not be the most optimal in terms of intermediate states.<br>

<br>

Versioning is always a trade-off between how likely the preconditions apply and code size (and maybe how expensive the runtime checks are). IMHO this concern is separate from how code versioning is implemented.<br>
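
As a rough sketch of what versioning with a single fallback copy looks like at the source level (the function names `noOverlap` and `addOne` and the unroll-by-four are assumptions chosen for illustration, not output of any LLVM pass):<br>

```cpp
#include <cstddef>
#include <cstdint>

// Runtime precondition: the ranges [A, A+N) and [B, B+N) do not overlap.
// This is the check whose cost and likelihood have to be weighed.
bool noOverlap(const float *A, const float *B, std::size_t N) {
  auto PA = reinterpret_cast<std::uintptr_t>(A);
  auto PB = reinterpret_cast<std::uintptr_t>(B);
  std::size_t Bytes = N * sizeof(float);
  return PA + Bytes <= PB || PB + Bytes <= PA;
}

void addOne(float *Dst, const float *Src, std::size_t N) {
  if (noOverlap(Dst, Src, N)) {
    // Optimized version: only valid under the precondition, because all
    // four loads happen before any of the four stores.
    std::size_t I = 0;
    for (; I + 4 <= N; I += 4) {
      float V0 = Src[I], V1 = Src[I + 1], V2 = Src[I + 2], V3 = Src[I + 3];
      Dst[I] = V0 + 1.0f;
      Dst[I + 1] = V1 + 1.0f;
      Dst[I + 2] = V2 + 1.0f;
      Dst[I + 3] = V3 + 1.0f;
    }
    for (; I < N; ++I) // remainder iterations
      Dst[I] = Src[I] + 1.0f;
    return;
  }
  // The single fallback copy: the original loop with original semantics.
  for (std::size_t I = 0; I < N; ++I)
    Dst[I] = Src[I] + 1.0f;
}
```

However many transformations are stacked onto the optimized version, the code-size cost stays at one extra copy of the loop; what varies is how optimistic the combined precondition is.<br>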

<br>

<br>

> > > Don't get me wrong, I like the idea, it's a cool experiment using some<br>

> > > cool data structures and algorithms. But previous experiences with the<br>

> > > pass manager have, well, not gone smooth in any shape or form.<br>

> ><br>

> > What experiments? I don't see a problem if the pass manager has to<br>

> > invalidate analyses and re-run canonicalization passes. This happens<br>

> > many times in the default pass pipelines. In addition, this<br>

> > invalidation is only necessary if the loop optimization pass optimizes<br>

> > something, in which case the additional cost should be justified.<br>

><br>

> My point goes back to doing that in VPlan, then tree. The more<br>

> back-and-forth IR transformations we add to the pipeline, the more<br>

> brittle it will be.<br>

<br>

Agreed, but IMHO this is the price to pay for better loop optimizations.<br>

<br>

<br>

> The original email also proposes, for the future, to do all sorts of<br>

> analyses and transformations in the tree representation, and that will<br>

> likely be incompatible with (or at least not propagated through) the<br>

> conversions.<br>

<br>

Correct, but I'd argue these are different kinds of analyses, not necessarily even useful for different representations. MLIR also has its own set of analyses, separate from those on LLVM-IR.<br>

<br>

<br>

> > In a previous RFC [8] I tried to NOT introduce a data structure but to<br>

> > re-use LLVM-IR. The only discussion about that RFC was about<br>

> > not 'abusing' LLVM-IR.<br>

> ><br>

> > <a href="https://lists.llvm.org/pipermail/llvm-dev/2017-October/118169.html" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2017-October/118169.html</a><br>

> > <a href="https://lists.llvm.org/pipermail/llvm-dev/2017-October/118258.html" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/pipermail/llvm-dev/2017-October/118258.html</a><br>

> ><br>

> > I definitely see the merits of using fewer data structures, but it is<br>

> > also hard to re-use something existing for a different purpose (in<br>

> > this case: VPlan) without making both more complex.<br>

><br>

> My point about avoiding more structures and IRs was related to VPlan<br>

> and MLIR, not LLVM-IR.<br>

><br>

> I agree there should be an abstraction layer to do parallelisation<br>

> analysis, but we already have two, and I'd rather add many of your<br>

> good proposals on those than create a third.<br>

><br>

> Perhaps it's not clear how we could do that now, but we should at<br>

> least try to weigh the options.<br>

><br>

> I'd seriously look at adding a tree-like annotation as an MLIR<br>

> dialect, and use it for lean copies.<br>

<br>

Like VPlan, MLIR is a representation with many references between objects from different levels. I do not see how to add cheap copies as an afterthought.<br>

<br>

<br>

<br>

> > For the foreseeable future, Clang will generate LLVM-IR, but our<br>

> > motivation is to (also) optimize C/C++ code. That is, I do not see a<br>

> > way to not (also) handle LLVM-IR until Clang is changed to generate<br>

> > MLIR (which then again will be another data structure in the system).<br>

><br>

> Even if/when Clang generates MLIR, there's no guarantee the high-level<br>

> dialects will be preserved until the vectorisation pass.<br>

<br>

I'd put loop optimizations earlier in the pipeline than vectorization; where exactly is a phase-ordering problem. I'd want to at least preserve multi-dimensional subscripts. Fortunately, MemRef is a core MLIR construct and unlikely to be lowered before the lowering to another representation (likely LLVM-IR).<br>

<br>

<br>

> And other<br>

> front-ends may not generate the same quality of annotations.<br>

> We may have to re-generate what we need anyway, so no point in waiting<br>

> for all the front-ends to do what we need as well as all the previous<br>

> passes to guarantee to keep it.<br>

<br>

I don't see how this is relevant for a Clang-based pipeline. Other languages likely need a different pipeline than one intended for C/C++ code.<br>

<br>

Not many high-level semantics need to be preserved to build a loop hierarchy.<br>

<br>

Thanks for the productive discussion,<br>

Michael<br></div>