[llvm-dev] RFC: LoopIDs are not identifiers (and better loop-parallel metadata)

Tue Dec 4 16:50:10 PST 2018

Dear LLVM community,

LLVM IR has a concept of 'LoopID' [1] which is a misnomer:

(a) LoopIDs are not unique: Any pass that duplicates IR will do it
including its metadata (e.g. LoopVersioning) such that thereafter
multiple loops are linked with the same LoopID. There is even a test
case (Transforms/LoopUnroll/unroll-pragmas-disabled.ll) for multiple
loops with the same LoopID.

(b) LoopIDs are not persistent: Adding or removing an item from a
LoopID can only be done by creating a new MDNode and assigning it to
the loop's branch(es). Passes such as LoopUnroll
(llvm.loop.unroll.disable) and LoopVectorize (llvm.loop.isvectorized)
use this to mark loops to be not transformed multiple times or to
avoid that a LoopVersioning original loop is transformed.

The only mechanism in LLVM that uses LoopIDs as identifiers is
llvm.mem.parallel_loop_access [2]. It is added to memory-accessing
instructions and points to one or multiple LoopIDs it is parallel to.
Because of (b), transformation passes can change a loop's llvm.loop
metadata such that llvm.mem.parallel_loop_access may point to an
orphaned LoopID.

Patch [3] fixes this issue. Instead of instructions referencing the
loops they are parallel to, it adds a new loop attribute
"llvm.loop.parallel_accesses" that references the instructions. Any
loop transformation that changes the loop metadata will by default
carry-over the llvm.loop.parallel_accesses property and the
information retained.

Since global metadata cannot reference instructions, [3] also
introduces a new distinct metadata type that represents a group of
instructions. An instruction will reference the a group using the
llvm.access.group metadata, and llvm.loop.parallel_accesses uses the
access group MDNode to reference to the instructions.

Currently, the metadata for parallel memory accesses looks like:

loop:
  load ... !llvm.mem.parallel_loop_access !0
  br label %loop, !llvm.loop !0

!0 = distinct !{!0}                                      ; 'LoopID'

My suggestion uses this pattern instead:

loop:
  load ... !llvm.access.group !2
  br label %loop, !llvm.loop !0

!0 = distinct !{!0, !1}                                  ; 'LoopID'
!1 = !{!"llvm.loop.parallel_accesses", !2}
!2 = distinct !{}                                        ; access group

Patch [4] modifies Clang to generate this kind of metadata. Accesses
in the same loop nest level are assigned to the same access group.

Fixing (a) and (b) instead would require much more work. For (a), any
code that copies entire basic blocks (this includes the inliner since
the same function can be inlined multiple times) would need to make a
copy of the LoopID and update any llvm.mem.parallel_loop_access in the
copy. For (b), in addition to MDNode::setOperand, we need a method to
change the MDNode's size without changing its address. Currently,
MDNode is optimized for an unchangeable number of operands.

I suggest to stop seeing 'LoopID' as an identifier entirely, but to
use it as a bag of loop attributes. Patch [5] updates the reference
manual. We may also think about renaming LoopID in the source code,
and dropping the 'distinct' and 'first item is self-reference'. The
latter two were intended to avoid separate LoopIDs into into one
because their content is identical, i.e. uphold (a). As we have seen,
there are other reasons for loops to have identical LoopIDs. With
patches [3,4], llvm.loop metadata can be collapsed (unlike access
groups), thus the 'distinct' is not necessary anymore. Unfortunately,
there is code in LLVM (and maybe elsewhere) that depends on LoopIDs'
first item, i.e. we cannot get rid of it that easily.

Regards,
Michael

[1] https://llvm.org/docs/LangRef.html#llvm-loop
[2] https://llvm.org/docs/LangRef.html#llvm-mem-parallel-loop-access-metadata
[3] https://reviews.llvm.org/D52116
[4] https://reviews.llvm.org/D52117
[5] https://reviews.llvm.org/D55290