[PATCH] D94973: [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Joel E. Denny via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Feb 26 15:21:20 PST 2021
jdenny added a comment.
In D94973#2591210 <https://reviews.llvm.org/D94973#2591210>, @Meinersbur wrote:
> In D94973#2590867 <https://reviews.llvm.org/D94973#2590867>, @jdenny wrote:
>
>> One property of this patch that has bothered me is that OMPCanonicalLoop is not a loop. Instead, it's an AST node that is sandwiched between a directive and a loop to contain extra information about the loop. The TreeTransform issues we've been discussing highlight how that extra node can be confusing.
>
> There are many other AST nodes that are sandwiched carrying semantic information. Examples: AttributedStmt, ImplicitCast, CapturedStmt of any OMPExecutableStmt, ExprWithCleanups, CXXRewrittenBinaryOperator, CXXBindTemporaryExpr.
Thanks for the examples. I never thought of those cases in that way probably because they're easier to imagine as distinct constructs or operations. But I'm probably splitting hairs.
> I even think that representing semantic information alongside of syntax is one of the principles of Clang's AST.
Interesting. Is that documented somewhere?
>> Now I remember that there's D95496 <https://reviews.llvm.org/D95496>, which seems not to have that property. I haven't reviewed that patch in any detail, and I haven't tried to locate any prior discussions. Can you please summarize why you prefer the current patch over that one?
>
> D95496 <https://reviews.llvm.org/D95496> is a lot more invasive to code that does not even use OpenMP.
Fair enough.
================
Comment at: clang/lib/Sema/TreeTransform.h:8321
+TreeTransform<Derived>::TransformOMPCanonicalLoop(OMPCanonicalLoop *L) {
+ // The OMPCanonicalLoop will be recreated when transforming the loop-associted
+ // directive.
----------------
Meinersbur wrote:
> jdenny wrote:
> > Meinersbur wrote:
> > > jdenny wrote:
> > > > Meinersbur wrote:
> > > > > jdenny wrote:
> > > > > > Meinersbur wrote:
> > > > > > > jdenny wrote:
> > > > > > > > Meinersbur wrote:
> > > > > > > > > jdenny wrote:
> > > > > > > > > > I'm used to seeing `TransformX` call `RebuildX` call `ActOnX`. Why not do that for `X=OMPCanonicalLoop`? Does `TransformOMPExecutableDirective` really need a special case for `OMPCanonicalLoop`?
> > > > > > > > > The intended behaviour is:
> > > > > > > > >
> > > > > > > > > 1. Transform the child loop
> > > > > > > > > 2. Return the child loop as representing the OMPCanonicalLoop (i.e. OMPCanonicalLoop wrapper is removed).
> > > > > > > > > 3. Parent loop-associated directive (e.g. workshare) is processed.
> > > > > > > > > 4. `ActOnOpenMPCanonicalLoop` adds new OMPCanonicalLoop wrapper for loop nest.
> > > > > > > > >
> > > > > > > > > This guarantees maximum flexibility on what of the loop can be changed, such as
> > > > > > > > > * Change lower bound, upper bound, step
> > > > > > > > > * Convert between CXXForRangeStmt and ForStmt
> > > > > > > > > * Change the associated depth (e.g. different value for `collapse` clause)
> > > > > > > > > * Remove the directive and no OMPCanonicalLoop remain
> > > > > > > > >
> > > > > > > > > This also saves adding many lines of code handling transforming each member of OMPCanonicalLoop separately.
> > > > > > > > > The intended behaviour is:
> > > > > > > > >
> > > > > > > > > 1. Transform the child loop
> > > > > > > >
> > > > > > > > For my suggestion, that call would remain within `TransformOMPCanonicalLoop` where it is now.
> > > > > > > >
> > > > > > > > > 2. Return the child loop as representing the OMPCanonicalLoop (i.e. OMPCanonicalLoop wrapper is removed).
> > > > > > > >
> > > > > > > > For my suggestion, this would not happen. I think it's normal for a `TransformX` to return the transformed `X` not the transformed child of `X`. If a caller wants to transform the child, then it should transform the child directly instead.
> > > > > > > >
> > > > > > > > > 3. Parent loop-associated directive (e.g. workshare) is processed.
> > > > > > > >
> > > > > > > > It looks to me like step 3 is currently within `TransformOMPExecutableDirective` and starts before the call to `TranformOMPCanonicalLoop` and thus before step 1. It completes after step 4. Or am I misunderstanding what you're describing as step 3?
> > > > > > > >
> > > > > > > > > 4. `ActOnOpenMPCanonicalLoop` adds new OMPCanonicalLoop wrapper for loop nest.
> > > > > > > >
> > > > > > > > For my suggestion, this would still happen. However, instead of step 2: within `TransformOMPCanonicalLoop`, you would call `RebuildOMPCanonicalLoop`, which would call `ActOnOpenMPCanonicalLoop` as step 4. The effect is you moved `ActOnOpenMPCanonicalLoop` from the caller (`TransformOMPExecutableDirective`) to the callee's callee, but the behavior should remain the same.
> > > > > > > >
> > > > > > > > > This guarantees maximum flexibility on what of the loop can be changed, such as
> > > > > > > > > * Change lower bound, upper bound, step
> > > > > > > > > * Convert between CXXForRangeStmt and ForStmt
> > > > > > > > > * Change the associated depth (e.g. different value for `collapse` clause)
> > > > > > > > > * Remove the directive and no OMPCanonicalLoop remain
> > > > > > > >
> > > > > > > > Flexibility for whom?
> > > > > > > >
> > > > > > > > A class extending `TreeTransform`? With my suggestion, it can override `TransformOMPCanonicalLoop` or `RebuildOMPCanonicalLoop`, depending on how much it wants to alter the transformation.
> > > > > > > >
> > > > > > > > Or a caller of `TransformOMPCanonicalLoop` within `TreeTransform`? I only see one right now, `TransformOMPExecutableDirective`, and I don't see how it needs the flexibility. Are there other callers I missed?
> > > > > > > >
> > > > > > > > Are you trying to create flexibility without requiring deriving from `TreeTransform`? But, as far as I can tell, you're doing so at the expense of normal `TreeTransform` semantics. Doesn't seem worth it.
> > > > > > > >
> > > > > > > > If you see a precedent for your approach elsewhere in `TreeTransform`, please point it out.
> > > > > > > >
> > > > > > > > > This also saves adding many lines of code handling transforming each member of OMPCanonicalLoop separately.
> > > > > > > >
> > > > > > > > Why would you need to? In the other `TransformX` functions I looked at, the arguments to the `RebuildX` function are transformed, and those are typically just the arguments to the `ActOnX` function. In other words, you would just transform the loop within your `OMPCanonicalLoop` as you're doing now.
> > > > > > > I could not find where you outlined your solution, I tried to infer it from your comments.
> > > > > > >
> > > > > > > > I'm used to seeing TransformX call RebuildX call ActOnX.
> > > > > > >
> > > > > > > Introduced new `RebuildOMPCanonicalLoop` that does nothing else then calling `ActOnOMPCanonicalLoop` from the Sema object, like e.g. `RebuildOMPExecutableDirective` does. This allows overriding `RebuildOMPCanonicalLoop` in a derived transform.
> > > > > > >
> > > > > > >
> > > > > > > > Does TransformOMPExecutableDirective really need a special case for OMPCanonicalLoop?
> > > > > > >
> > > > > > > As it does for atomic, critical, section and master.
> > > > > > >
> > > > > > >
> > > > > > > > Flexibility for whom? A class extending TreeTransform?
> > > > > > >
> > > > > > > Yes
> > > > > > >
> > > > > > >
> > > > > > > > If you see a precedent for your approach elsewhere in TreeTransform, please point it out.
> > > > > > >
> > > > > > > The precedence is `TransformOMPExecutableDirective` it unwraps the CapturedStmt to get its body code. The following `TransformStmt` of the unwrapped Stmt never calls `TransfomCapturedStmt` since it was already unwrapped. The re-wrapping is done by the surrounding `ActOnOpenMPRegionStart`/`ActOnOpenMPRegionEnd`.
> > > > > > >
> > > > > > > To reproduce this, I changed `TransformOMPExecutableDirective` to also unwrap the `OMPCanonicalLoop` instead of doing so implicitly when `TransformStmt` calls `TransformOMPCanonicalLoop`. As a result, `TransformOMPCanonicalLoop` (like `TransfomCapturedStmt` for OpenMP directives; maybe `TransfomCapturedStmt` is called for other purposes) should never be called and I put an assertion there instead.
> > > > > > >
> > > > > > > Note that it is not required for a `TransformXYZ` method to exactly return the type in its name, for instance `TransformUnresolvedLookupExpr` may return whatever the template instantiation resolves to.
> > > > > > > I could not find where you outlined your solution, I tried to infer it from your comments.
> > > > > >
> > > > > > Sorry for the confusion.
> > > > > >
> > > > > > I've added suggested edits to show how to tweak the patch into what I'm suggesting. I applied the result to my checkout, and check-clang-openmp still passes.
> > > > > >
> > > > > > I think my suggestion is more consistent with existing TreeTransform design. If it won't work for your use cases, please help me to understand why.
> > > > > >
> > > > > > > > I'm used to seeing TransformX call RebuildX call ActOnX.
> > > > > > >
> > > > > > > Introduced new `RebuildOMPCanonicalLoop` that does nothing else then calling `ActOnOMPCanonicalLoop` from the Sema object, like e.g. `RebuildOMPExecutableDirective` does. This allows overriding `RebuildOMPCanonicalLoop` in a derived transform.
> > > > > >
> > > > > > Thanks. That part looks good.
> > > > > >
> > > > > > > > Does TransformOMPExecutableDirective really need a special case for OMPCanonicalLoop?
> > > > > > >
> > > > > > > As it does for atomic, critical, section and master.
> > > > > >
> > > > > > With my suggestion, the special case for OMPCanonicalLoop doesn't seem necessary.
> > > > > >
> > > > > > > > Flexibility for whom? A class extending TreeTransform?
> > > > > > >
> > > > > > > Yes
> > > > > >
> > > > > > After applying my suggestion, a class extending TreeTransform can override TransformOMPCanonicalLoop or RebuildOMPCanonicalLoop. If that's not sufficient, please help me to understand the use cases that require more flexibility.
> > > > > >
> > > > > > > > If you see a precedent for your approach elsewhere in TreeTransform, please point it out.
> > > > > > >
> > > > > > > The precedence is `TransformOMPExecutableDirective` it unwraps the CapturedStmt to get its body code. The following `TransformStmt` of the unwrapped Stmt never calls `TransfomCapturedStmt` since it was already unwrapped. The re-wrapping is done by the surrounding `ActOnOpenMPRegionStart`/`ActOnOpenMPRegionEnd`.
> > > > > > >
> > > > > > > To reproduce this, I changed `TransformOMPExecutableDirective` to also unwrap the `OMPCanonicalLoop` instead of doing so implicitly when `TransformStmt` calls `TransformOMPCanonicalLoop`. As a result, `TransformOMPCanonicalLoop` (like `TransfomCapturedStmt` for OpenMP directives; maybe `TransfomCapturedStmt` is called for other purposes) should never be called and I put an assertion there instead.
> > > > > >
> > > > > > Yes, I'm assuming TransformCapturedStmt is needed for other purposes and its implementation doesn't satisfy TransformOMPExecutableDirective's needs. In contrast, I think the TransformOMPCanonicalLoop I've suggested seems fine for TransformOMPExecutableDirective's needs, and it follows the usual behavior of any TransformXYZ.
> > > > > >
> > > > > > > Note that it is not required for a `TransformXYZ` method to exactly return the type in its name, for instance `TransformUnresolvedLookupExpr` may return whatever the template instantiation resolves to.
> > > > > >
> > > > > > Understood, but I'm not aware of a case where TransformXYZ returns the transformation of the child and leaves it up to the caller to finish the transformation of XYZ itself. The type that results from that transformation is a different issue.
> > > > > Thanks for you proposed edits.
> > > > >
> > > > > I still think think we should we should handle this like TransformOMPExecutableDirective handles CapturedStmt, because (1) OMPExecutableDirective is in control of how its children is structured, (2) TransformOMPExecutableDirective already does this this way (3) derived TreeTransform can change more aspects of a loop/loop-associated directive without having to override TransformOMPCanonicalLoop/TransformOMPExecutableDirective (4) fewer risk of derived TreeTransform to leave the AST in an inconsistent state and (5) is will not work when the loop depth is dependent on a template.
> > > > >
> > > > > To elaborate on (5):
> > > > > ```
> > > > > template<int n>
> > > > > void func() {
> > > > > #pragma omp for collapse(n)
> > > > > for (...
> > > > > for (...
> > > > > }
> > > > > func<1>();
> > > > > func<2>();
> > > > > ```
> > > > >
> > > > > When parsing the function body, the depth of the loop nest is unknown, hence we cannot OMPCanonicalLoops. They have to be inserted when Sema's `TemplateInstantiator` knows the value of `n`, but with you proposal, there are no OMPCanonicalLoops in the templated AST that `TransformOMPCanonicalLoops` would be called on.
> > > > >
> > > > > I will am still willing to go with your proposal if you think this case should not be supported or that TemplateInstantiator` should override `TransformOMPExecutableDirective` to handle this case.
> > > > >
> > > > >
> > > > > > Understood, but I'm not aware of a case where TransformXYZ returns the transformation of the child and leaves it up to the caller to finish the transformation of XYZ itself.
> > > > >
> > > > > This refers to an outdated version of this patch.
> > > > > I still think think we should we should handle this like TransformOMPExecutableDirective handles CapturedStmt, because (1) OMPExecutableDirective is in control of how its children is structured, (2) TransformOMPExecutableDirective already does this this way
> > > >
> > > > I agree that CapturedStmt is a precedent for the current patch's approach. My struggle is to understand why that approach is relevant for OMPCanonicalLoop, as discussed below.
> > > >
> > > > > (3) derived TreeTransform can change more aspects of a loop/loop-associated directive without having to override TransformOMPCanonicalLoop/TransformOMPExecutableDirective
> > > >
> > > > If not those functions, what would the derived TreeTransform override to produce such changes?
> > > >
> > > > > (4) fewer risk of derived TreeTransform to leave the AST in an inconsistent state and
> > > >
> > > > I'm trying to imagine how this would happen. I suppose there could be a transformation that removes an OMPExecutableDirective from its associated statement, on which it then calls TransformStmt, failing to notice that an OMPCanonicalLoop then needs to be removed too. With my suggestion, TransformStmt would call TransformOMPCanonicalLoop and accidentally rebuild the OMPCanonicalLoop. Without my suggestion, it would fail an assertion, indicating the transformation is broken. Either way, it needs a special case for OMPCanonicalLoop.
> > > >
> > > > OK, that seems like a benefit of the current patch over my suggestion. I'm not sure whether it's worth it (TransformCapturedStmt doesn't have that benefit), but I'm fine with it. If you prefer the current patch, could you please add a comment to TransformOMPCanonicalLoop to document why it's special? The llvm_unreachable text alone doesn't make it clear to me. Here's a long-winded suggestion:
> > > >
> > > > "An OMPCanonicalLoop must appear only as a child of an OMPExecutableDirective that represents a loop directive. To help maintain this invariant, TransformOMPCanonicalLoop always fails an assertion in order to force any transformation to handle OMPCanonicalLoop as a special case. For example, the default TransformOMPExecutableDirective checks for an OMPCanonicalLoop and rebuilds it within the OMPExecutableDirective. On the other hand, a transformation that removes an OMPExecutableDirective from its associated statement should also check for any OMPCanonicalLoop and remove it as well."
> > > >
> > > > > (5) is will not work when the loop depth is dependent on a template.
> > > > >
> > > > > To elaborate on (5):
> > > > > ```
> > > > > template<int n>
> > > > > void func() {
> > > > > #pragma omp for collapse(n)
> > > > > for (...
> > > > > for (...
> > > > > }
> > > > > func<1>();
> > > > > func<2>();
> > > > > ```
> > > > >
> > > > > When parsing the function body, the depth of the loop nest is unknown, hence we cannot OMPCanonicalLoops. They have to be inserted when Sema's `TemplateInstantiator` knows the value of `n`, but with you proposal, there are no OMPCanonicalLoops in the templated AST that `TransformOMPCanonicalLoops` would be called on.
> > > >
> > > > In the current patch, OMPCanonicalLoop is created and TransformOMPExecutableDirective assumes it's present regardless of template parameters. I guess you're saying that will change once you handle depth>1.
> > > > If not those functions, what would the derived TreeTransform override to produce such changes?
> > >
> > > `TransformOMPExecutableDirective`, any`TransformOMPxyzDirective` where xyz is loop-associated, `TransformForStmt` or `TransformCXXForRangeStmt`.
> > >
> > > > Without my suggestion, it would fail an assertion, indicating the transformation is broken.
> > >
> > > The option triggering an assertion seems superior to me.
> > >
> > > `TransformOMPExecutableDirective` is removing `OMPCanonicalLoop` associated to it such that these do not need to be handled individually. `TransformOMPExecutableDirective` knows how many loops it is associated to and has to remove that many `OMPCanonicalLoop`. On the other side, `TransformOMPCanonicalLoop` has no access to its parents and therefore does not know what it is associated to, whether it is still needed, and cannot insert new ones if necessary (such as in my template example).
> > >
> > > > I suppose there could be a transformation that removes an OMPExecutableDirective from its associated statement, on which it then calls TransformStmt, failing to notice that an OMPCanonicalLoop then needs to be removed too.
> > > > Either way, it needs a special case for OMPCanonicalLoop.
> > >
> > > It does with the current patch, in the previous patch `TransformOMPCanonicalLoop` would just have removed all of them. However, the TreeTransform that removes an `OMPExecutableDirective` will already have to remove the CapturedStmts/Decls, so also having to remove `OMPCanonicalLoop` seems consistent.
> > >
> > > I will add your suggested source comment in the next differential update.
> > >
> > > > In the current patch, OMPCanonicalLoop is created and TransformOMPExecutableDirective assumes it's present regardless of template parameters.
> > >
> > > Yes, templates are explicitly not handled in this patch yet (see summary).
> > >
> > > > I guess you're saying that will change once you handle depth>1.
> > >
> > > This makes me remind of that removing the `OMPCanonicalLoop` in `TransformOMPExecutableDirective` with depth>1 is more difficult in `TransformOMPExecutableDirective`. Since the `OMPCanonicalLoop` and the loop statements are interleaved, and the loop statements have to be preserved, it is not sufficient to just select a subtree root. However, `TransformOMPCanonicalLoop` just returning its child is trivial.
> > >
> > > Is the additional complexity worth avoiding the unusual behavior of a TransformXYZ just returning its child?
> > >
> > > > I guess you're saying that will change once you handle depth>1.
> > >
> > > This makes me remind of that removing the `OMPCanonicalLoop` in `TransformOMPExecutableDirective` with depth>1 is more difficult in `TransformOMPExecutableDirective`. Since the `OMPCanonicalLoop` and the loop statements are interleaved, and the loop statements have to be preserved,
> >
> > So, when depth>1 is handled, OMPCanonicalLoop won't always be the child of OMPExecutableDirective?
> >
> > > it is not sufficient to just select a subtree root. However, `TransformOMPCanonicalLoop` just returning its child is trivial.
> >
> > Won't TransformOMPExecutableDirective have to iterate through the subtree and build every new interleaved OMPCanonicalLoop either way? Can't it remove the originals as it goes?
> >
> > > > I guess you're saying that will change once you handle depth>1.
> > So, when depth>1 is handled, OMPCanonicalLoop won't always be the child of OMPExecutableDirective?
>
> If you mean direct child, this already the case now because of the CapturedStmt/CapturedDecl put between the OMPExectableDirective and its associated loops.
>
> If you mean decendent, then no. An OMPCanonicalLoop should always be associated to some OpenMP directive,which is to be found as one of its ancestors, otherwise it would be structurally inconsistent.
>
> Note that there already can be implicit AST nodes sandwiched between the loops of an OpenMP loop nest:
> ```
> #pragma omp for collapse(2)
> for (int i = 0; i < 20; ++i)
> [[likely]]
> for (int j = 0; j < 20; ++j)
> Body(i,j);
> ```
> ```
> `-OMPForDirective 0x202c7d35a20 <line:6:1, col:28>
> `-CapturedStmt 0x202c7d07f00 <line:7:1, line:10:15>
> `-CapturedDecl 0x202c7d078e0 <<invalid sloc>> <invalid sloc>
> |-ForStmt 0x202c7d07ec8 <line:7:1, line:10:15>
> | `-AttributedStmt 0x202c7d07eb0 <line:8:5, line:10:15>
> | |-LikelyAttr 0x202c7d07e88 <line:8:7>
> | `-ForStmt 0x202c7d07e50 <line:9:5, line:10:15>
> ```
>
>
> > > it is not sufficient to just select a subtree root. However, `TransformOMPCanonicalLoop` just returning its child is trivial.
> >
> > Won't TransformOMPExecutableDirective have to iterate through the subtree and build every new interleaved OMPCanonicalLoop either way? Can't it remove the originals as it goes?
>
> What do you mean by "Can't it remove the originals as it goes?". This is what `TransformOMPCanonicalLoop` would have done returning its child.
>
> Btw I found precendence which does exactly that:
> ```
> template<typename Derived>
> ExprResult
> TreeTransform<Derived>::TransformImplicitCastExpr(ImplicitCastExpr *E) {
> // Implicit casts are eliminated during transformation, since they
> // will be recomputed by semantic analysis after transformation.
> return getDerived().TransformExpr(E->getSubExprAsWritten());
> }
> ```
> Btw I found precendence which does exactly that:
> ```
> template<typename Derived>
> ExprResult
> TreeTransform<Derived>::TransformImplicitCastExpr(ImplicitCastExpr *E) {
> // Implicit casts are eliminated during transformation, since they
> // will be recomputed by semantic analysis after transformation.
> return getDerived().TransformExpr(E->getSubExprAsWritten());
> }
> ```
Thanks. That looks like a reasonable precedent to me.
In your comment for TransformOMPCanonicalLoop, please point out the invariant that OMPCanonicalLoop must be a descendant of OMPExecutableDirective.
I would ask that your comment mention that TransformOMPExecutableDirective is what rebuilds the OMPCanonicalLoop nodes that are removed here. However, for depth>1, I don't know if that's where that always happens given the interleaving of OMPCanonicalLoop and loop statements. I'll find out when you write that patch.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D94973/new/
https://reviews.llvm.org/D94973
More information about the cfe-commits
mailing list