[PATCH] D78206: [Target][ARM] Make Low Overhead Loops coexist with VPT blocks
Pierre van Houtryve via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 7 05:38:01 PDT 2020
Pierre-vh marked 2 inline comments as done.
Pierre-vh added inline comments.
================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:263
+ if (Def->getParent() == MI->getParent())
+ if (getVPTInstrPredicate(*Def) == ARMVCC::None)
+ return false;
----------------
samparker wrote:
> I feel like this will only be safe if we're still guaranteeing that any predicated instruction is predicated upon the VCTP. I'm concerned about some code that may look llike this:
>
>
> ```
> loop.ph:
> a = ...
> b = ...
> c = ...
> n = ...
> z = ...
> DLS
>
> loop:
> VCTP
> VPTT z, n
> VSTRT
> VPTTTT a, b
> VLDRT d
> VLDRT e
> VADDT z, e, d
> LE loop
> ```
>
> Is it possible now to have a VCTP in the loop with nothing actually predicated upon it?
I did this with the assumption that VPT blocks that aren't related to the VCTP would be rejected.
> Is it possible now to have a VCTP in the loop with nothing actually predicated upon it?
I think it's possible if something clears VPR.P0, or overwrites it, right after the VCTP, like so:
```
VCTP
VMSR VPR.P0 R0
...
```
(Of course I don't know if that ever happens)
Other instructions, such VCMP/VPT, "and" their result with VPR.P0 so those should be fine I believe.
What do you think I should do here ? Should I check that the Def is inside a VPT block that we know? (e.g. by saving all predicated Defs in a set and checking whether Def is in the set).
================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:859
- // Start a new vpt block when we discover a vpt.
- if (MI->getOpcode() == ARM::MVE_VPST) {
VPTBlocks.emplace_back(MI, CurrentPredicate);
CurrentBlock = &VPTBlocks.back();
----------------
samparker wrote:
> I doubt CurrentPredicate is correct here as a VPT would be generating its own, maybe it should be cleared..?
According to the [[ https://static.docs.arm.com/ddi0553/bi/DDI0553B_i_armv8m_arm.pdf?_ga=2.35567010.2014913012.1585728100-518845949.1585040266 | Armv8-M Architecture Reference Manual ]], VPT is similar to VCMP, it `and`'s the result of the comparison with the current element mask (see page 1174) , it doesn't overwrite VPR.P0 completely, so I don't think this needs to be cleared.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D78206/new/
https://reviews.llvm.org/D78206
More information about the llvm-commits
mailing list