[PATCH] D76709: [Target][ARM] Adding MVE VPT Optimisation Pass

Wed Mar 25 01:03:12 PDT 2020

Pierre-vh marked 5 inline comments as done.
Pierre-vh added a comment.

In D76709#1940297 <https://reviews.llvm.org/D76709#1940297>, @dmgreen wrote:

> Is it possible to split this into two patches? The pass and "replaces VCMPs with VPNOTs when possible" part, then the second part to replace the re-use with the not.  I think that would make each part easier to review, more manageable.
>
> > The first optimisation (VCMPs into VPNOTs) could technically be done in the MVE VPT Block insertion pass, but I think it's better to do it in this new pass instead of overloading the block Insertion pass too much. 
> >  The second optimisation (VPNOT Insertion for "spill prevention") can't be done in the Block Insertion pass as it needs to be done before register allocation (= before spill/reload instructions are emitted).
> >  Overall, since both optimisations deal with VPNOTs and aren't directly related to VPT Block insertion/creation, I felt that it was a good idea to put them in a separate pass.
>
> Plus (I think) in order to do #2, it's easier if you have already done #1.

Unfortunately, I've written this pass in a single commit, so there is no easy way for me to split this patch in 2.
I can do it if you want, it's not impossible, but it's going to take me a while to get right.  Also, the second optimisation is a relatively small part of this patch, so patch #1 would still be a large patch.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vcmpf.ll:700
 ; CHECK-MVEFP-NEXT:    vpt.f32 le, q1, q0
-; CHECK-MVEFP-NEXT:    vcmpt.f32 lt, q0, q1
+; CHECK-MVEFP-NEXT:    vpnott
 ; CHECK-MVEFP-NEXT:    vpnot
----------------
dmgreen wrote:
> What's going on here? I think it needs to be checking q1 <= q0 && q0 < q1. ord means ordered, which means no NaN's. (Floating point compares can be tricky like that).
I didn't know that. In which case is it not "safe" to replace a float VCMP even when the conditions are met?
What would be the correct behaviour in this case?
Should I disable this optimisation for all float VCMPs?

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir:1
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -run-pass arm-mve-vpt-opts %s -o - | FileCheck %s
----------------
dmgreen wrote:
> I think this test would be simpler if there was a number of test functions inside it, each testing one thing (or maybe a small collection of things). Otherwise they run into each other a bit, and it will be hard to tell what's wrong if one of them starts to fail.
Would putting them all in different basic blocks be enough, or do you prefer functions?
The optimisation is done per basic-block, so it'd behave the same as function (but the test would be shorter).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76709/new/

https://reviews.llvm.org/D76709