[PATCH] D57504: RFC: Prototype & Roadmap for vector predication in LLVM

Wed Feb 12 08:05:28 PST 2020

rkruppe added a comment.

In D57504#1854330 <https://reviews.llvm.org/D57504#1854330>, @simoll wrote:

> Exactly. The VE target strictly requires `VL <= MVL` or you'll get a hardware exception. Enforcing strict UB here means VP-users have to explicitly drop instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend when most scalar optimizations are already done.

Ok, I didn't realize VE's SETVL works like that. In that case we don't have much of a choice, unfortunately.

> Besides, this still allows you to speculate as long as MVL (as in the UB-causing bound for VL) does not go below VL... could you explain under which circumstance MVL would go below VL by hoisting? This is definitely not the case for static VL targets (x86) and also not for VE.

Of course, for lots of IR that we care about in practice, it will be quite simple to see that hoisting is safe, e.g. because:

- %evl it is a constant -1
- %evl is computed in a way that can be recognized to produce a small enough value (typical strip-mined loops)
- there are earlier unconditional VP operations with the same EVL value (most vectorized functions)

But you need //some// such analysis, and must not hoist when those tricks all fail, because there's no general guarantee that the condition you're hoisting out of is independent from "%evl > element count?". A trivial (if pathological) example of this is when the condition never true in any execution and the EVL value is larger than W. A more real-world example, if you insist, comes from one proposed way to port hand-crafted fixed-width SIMD algorithms to RVV: check at runtime whether vector registers are at least as large as required by the SIMD algorithm, if so set the VL register to a constant and execute vector code, otherwise fall back to another implementation. This might mean having `vp.foo(..., i32 4)` instructions guarded by a runtime check that effectively determines whether that `4` is a legal value, and hoisting the computation out of the condition introduces UB in the executions where it isn't.

Whether this would lead to any end-to-end miscompilations is another question, but that's not a good excuse to implement known-incorrect optimizations.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57504/new/

https://reviews.llvm.org/D57504