[PATCH] D78206: [Target][ARM] Make Low Overhead Loops coexist with VPT blocks

Pierre van Houtryve via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 7 05:38:01 PDT 2020


Pierre-vh marked 2 inline comments as done.
Pierre-vh added inline comments.


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:263
+          if (Def->getParent() == MI->getParent())
+            if (getVPTInstrPredicate(*Def) == ARMVCC::None)
+              return false;
----------------
samparker wrote:
> I feel like this will only be safe if we're still guaranteeing that any predicated instruction is predicated upon the VCTP. I'm concerned about some code that may look llike this:
> 
> 
> ```
> loop.ph:
>   a = ...
>   b = ...
>   c = ...
>   n = ...
>   z = ...
>   DLS
> 
> loop:
>   VCTP
>   VPTT z, n
>   VSTRT
>   VPTTTT a, b
>   VLDRT d
>   VLDRT e
>   VADDT z, e, d
>   LE loop
> ```
> 
> Is it possible now to have a VCTP in the loop with nothing actually predicated upon it?
I did this with the assumption that VPT blocks that aren't related to the VCTP would be rejected.

> Is it possible now to have a VCTP in the loop with nothing actually predicated upon it?

I think it's possible if something clears VPR.P0, or overwrites it, right after the VCTP, like so:
```
VCTP
VMSR VPR.P0 R0
...
```
(Of course I don't know if that ever happens)
Other instructions, such VCMP/VPT, "and" their result with VPR.P0 so those should be fine I believe.

What do you think I should do here ? Should I check that the Def is inside a VPT block that we know? (e.g. by saving all predicated Defs in a set and checking whether Def is in the set).


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:859
 
-  // Start a new vpt block when we discover a vpt.
-  if (MI->getOpcode() == ARM::MVE_VPST) {
     VPTBlocks.emplace_back(MI, CurrentPredicate);
     CurrentBlock = &VPTBlocks.back();
----------------
samparker wrote:
> I doubt CurrentPredicate is correct here as a VPT would be generating its own, maybe it should be cleared..?
According to the [[ https://static.docs.arm.com/ddi0553/bi/DDI0553B_i_armv8m_arm.pdf?_ga=2.35567010.2014913012.1585728100-518845949.1585040266 | Armv8-M Architecture Reference Manual ]], VPT is similar to VCMP, it `and`'s the result of the comparison with the current element mask (see page 1174) , it doesn't overwrite VPR.P0 completely, so I don't think this needs to be cleared.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78206/new/

https://reviews.llvm.org/D78206





More information about the llvm-commits mailing list