[llvm-dev] !llvm.loop ID metadata clarification

Fri Feb 19 09:44:04 PST 2021

On Thu, Feb 18, 2021 at 10:57 PM Michael Kruse <llvmdev at meinersbur.de>
wrote:

> I wrote the lines in the LangRef and can answer the questions.
>
> When LoopIDs were first conceived, the idea was that each loop has its
> own identifier (so you could, e.g. build a DenseMap to information for
> a loop). To avoid that the MDNode of two different would be collapsed
> into one, the design made the first item of the MDNode reference
> itself. This worked because the uniquifier did not do a GVN or
> similar, just compared the content (IMHO still a bad idea, the
> uniquing algorithm could be improved later). Later, 'distinct' MDNodes
> were introduced, but by then code exists that depend on this. Either
> because they check for valid LoopID MDNodes eg. in an assert, skip the
> first element in a LoopID MDNode, or, in case of llvm.access.group
> (which allows either be a single LoopID or a list of LoopIDs) only
> handle the list of LoopIDs case because if it;s only a single LoopID,
> the MDNode again contains the self-reference LoopID.
>
>
> There is a problem with using a LoopID as "identifier":
>
> 1. The MDNode changes when changing the loop properties, e.g.
> LoopVectorize adds llvm.loop.isvectorized to loops it vectorize and
> loops that it determined to be not vectorizable to avoid processing it
> again. Hence, LoopIDs are not persistent.
>
> 2. There is some code that makes copies of code, such as LoopUnroll,
> inlining or LoopVersioning. These copy all the instructions including
> the MDNodes and you get multiple loops with the same LoopID. There is
> even a LoopUnroll test case that checks that. Hence, a LoopID is not
> unique.
>
> Therefore, a LoopID does not fulfill the basic requirements to
> identify a loop. Instead, we went to interpret the MDNode as a bag of
> loop properties which can be changed.
>
>
> With this reinterpretation, I did not change the name 'LoopID' nor
> removed the self-reference to avoid breaking things.
>
>
>
> Am Do., 18. Feb. 2021 um 19:12 Uhr schrieb Mehdi AMINI via llvm-dev
> <llvm-dev at lists.llvm.org>:
> > The last sentence isn't clear to me, I don't see the relationship
> between "the loop-metadata node can be attached to multiple nodes" and "the
> ‘distinct’ keyword has become unnecessary"?
>
> See above. The distinct/self-reference was meant to stop the same
> LoopID to be assigned to multiple loops by the MDNode uniquifier.

However, there are other mechanisms that assign the same LoopID to two
> different loops.
>

OK thanks, I get the sentence now: it isn't only the distinct keyword that
is unnecessary but also the self-reference. Basically the metadata does not
need to be distinct, I think that is what wasn't clear in the sentence
unclear for me.

> > Also It isn't clear why we don't use a distinct metadata instead of a
> self-referencing one?
>
> To not break existing code that assumes the self-reference.
>

So this is just tech debt that can be cleaned up then? We could sweep
through the code base and make it work with unique MDs?

> >
> > Then LangRef mentions:
> >
> > > Loop metadata nodes cannot be used as unique identifiers. They are
> neither persistent for the same loop through transformations nor
> necessarily unique to just one loop.
> >
> > So the metadata itself can't be used as a unique identifier for the
> loop, however in a loop all latches must have the same MDNode for LoopInfo
> to recognize it as valid.
>
> A single loop can have multiple latches. These should be consistent,
> otherwise we don't know which one is the correct information for this
> loop. We could instead choose the first one encountered, but I don't
> think its a solution for robustness. In contrast, metadata is defined
> to always be able to be dropped without losing semantic correctes.
>
>
> > It isn't clear to me why it prevents using unique metadata here without
> the self-referencing trick, and rely on content-based uniquing for equality?
>
> To not break existing code that assumes the self-reference.

>
> > Loop info could still rely on pointer-equality to ensure that the
> metadata is consistent on every backedge, two loops with the same metadata
> would share the same MDNode, but it shouldn't be an issue since these
> aren't used as unique identifiers per LangRef.
>
> Correct.
>
> However, it requires work to fix all the places that assumes the
> self-reference, makes existing IR incompatible and may cause trouble
> downstream. Hal Finkel and me found that this such a change is not
> worth the risk.
>

I'm happy to give it a try to clean-up the codebase and remove all this!

Thanks for the clarification :)

-- 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210219/09fad631/attachment.html>