[PATCH] D86022: [ARM] Allow tail predication of VLDn

Tue Aug 18 07:56:31 PDT 2020

dmgreen added a comment.

In D86022#2223660 <https://reviews.llvm.org/D86022#2223660>, @samparker wrote:

> This has made me think a bit.. but it seems to make sense and from looking again at the existing checks, they look enough for correctness - to my pleasant surprise!
> It feels odd though, that these instructions can't be used in a VPT block but we can use them here... I guess because they actually use mechanisms. But have we got confirmation that both/all registers are always guaranteed to be updated properly in a tail-predicated loop? I don't think there's anything in the reference manual about that limitation but I also haven't found it very clear anyway!

Yeah. The architecture manual's pseudocode for VLD2/4 doesn't use the predicate, so they can't be predicated and always load the whole vector width of values. This is OK so long as the original code said it was safe to load from that many memory locations, and as this comes from intrinsics (not autovec) that seems valid. I even tested it. For normal loads it's OK to load fewer values if they are not going to be used.
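
To illustrate the difference described above, here is a rough scalar model in C. It is only a sketch, not the architectural pseudocode: the names `model_predicated_load` and `model_vld2` are made up for this example, and the lane count, element width, and zeroing of inactive lanes are assumptions of the model. It shows that a predicated load touches only the active lanes, while a VLD2-style load reads the full 2 x vector width whatever the tail predicate says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumed: 128-bit vector of 32-bit elements */

/* Hypothetical model of a tail-predicated contiguous load: only the first
 * `active` lanes read memory; inactive lanes are zeroed in this model.
 * Returns the number of elements read from memory. */
static size_t model_predicated_load(const int32_t *src, int active,
                                    int32_t out[LANES]) {
    assert(active >= 0 && active <= LANES);
    size_t reads = 0;
    for (int i = 0; i < LANES; ++i) {
        if (i < active) {
            out[i] = src[i];
            ++reads;
        } else {
            out[i] = 0;
        }
    }
    return reads;
}

/* Hypothetical model of a VLD2-style deinterleaving load: there is no
 * predicate input, so it always reads 2 * LANES elements from memory. */
static size_t model_vld2(const int32_t *src, int32_t even[LANES],
                         int32_t odd[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        even[i] = src[2 * i];
        odd[i] = src[2 * i + 1];
    }
    return 2 * LANES;
}
```

In this model the VLD2 case is only safe when the caller guarantees that all 2 * LANES elements are readable, which is the condition the intrinsic-based source code is relied on to provide.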

> One case which we'd have to avoid is if, in a horrible circumstance, LR is used as an address register which is post-indexed and predication could become UNKNOWN, but that isn't specific to VLD2/4.

Do you mean:

  The behavior of a beat-wise capable instruction that modifies LR and is within a tail predicated low overhead loop
  is CONSTRAINED UNPREDICTABLE, the permitted behaviors are either of

That's a new one to me.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86022/new/

https://reviews.llvm.org/D86022