[PATCH] D115244: [LICM] Promote conditional, loop-invariant memory accesses to scalars

Thu Mar 31 00:56:52 PDT 2022

dmilosevic141 updated this revision to Diff 419349.
dmilosevic141 marked 2 inline comments as done.
dmilosevic141 added a comment.
Herald added a project: All.

Thanks @reames! Sorry for the delayed response, I haven't been able to fully focus on this topic in the past couple of months.

> My high level concern here is profitability. It's unclear to me that inserting a flag iv, and the conditional store outside the loop is generally profitable. It's clearly profitable in many cases, but I'm a bit hesitant on whether it makes sense to do as a canonicalization. I'd love to see some data on the impact of this.

Definetly, hopefully I'll be able to provide the data soon.

> Structure wise, I strongly encourage you to use predicated stores not CFG manipulation. LICM does not generally modify the CFG; we can, but the analysis updates are complicated. (See the diamond hoisting code.) I generally think that a predicate store instruction is the "right" construct here as it leads to more obvious optimization outcomes.

I haven't come across the predicated store instructions yet. Based on the first look, they definetly fit the needs better than the //CFG// manipulation, thanks for the directions. 
Here's my vision for using them, please let me know your thoughts:

- Hoisting load instructions into the preheader would still stay the same (using regular load instructions).
- An additional flag would still be needed. It'd be used (e.g. initialized in the preheader basic block, its value propagated through the basic blocks) the same way.
- Sinking the conditional store instructions into the exit block(s) with //CFG// manipulation would be replaced with an appropriate predicated store instructions sunk into the exit block(s).

The summary example would then look like this:

  ...
  entry:
    u.promoted = load u
    br for.cond
  for.cond:
    u.flag = phi [ false, entry ], [ u.flag.next, for.inc ]
    inc = phi [ u.promoted, entry ], [ inc.next, for.inc ]
    ...
  for.body: 
  ...
  if.then:
    inc.if.then = add inc, 1
    br for.inc
  for.inc:
    u.flag.next = phi [ true, if.then ], [ u.flag, for.body ]
    inc.next = phi [ inc.if.then, if.then ], [ inc, for.body ]
    ...
  for.cond.cleanup: ; This is the only exit block.
    u.flag.lcssa = phi [ u.flag, for.cond ] ; Get the flag value.
    inc.lcssa = phi [ inc, for.cond ]
    call @llvm.masked.store(<1x<u_type>> inc.lcssa, <1x<u_type>> &u, i32 <u_alignment>, <1xi1> u.flag.lcssa)
    ret void
  }
  ...

I'll also take some more time diving deeper into the predicated instructions.

> However, our masked.load handling for scalar (e.g. single value vectors) leaves a bit to be desired. I'd started improving it a bit (with this goal in mind), but if you're serious about this, you'd need to spend some time working through related issues there first.

I'm guessing the same goes for the //masked.store.*// intrinsics, so

  call @llvm.masked.store(<1x<u_type>> inc.lcssa, <1x<u_type>>* &u, i32 <u_alignment>, <1xi1> u.flag.lcssa)

should be swapped with something more appropriate for scalar values?
To summarize, I've rebased and fixed a few things @reames pointed out.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115244/new/

https://reviews.llvm.org/D115244

Files:
  llvm/lib/Transforms/Scalar/LICM.cpp
  llvm/test/Transforms/LICM/conditional-access-promotion.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D115244.419349.patch
Type: text/x-patch
Size: 13105 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220331/a9eb53e2/attachment-0001.bin>