[PATCH] D76709: [Target][ARM] Adding MVE VPT Optimisation Pass

Wed Mar 25 04:17:56 PDT 2020

dmgreen added a comment.

> Unfortunately, I've written this pass in a single commit, so there is no easy way for me to split this patch in 2.
>  I can do it if you want, it's not impossible, but it's going to take me a while to get right.  Also, the second optimisation is a relatively small part of this patch, so patch #1 would still be a large patch.

Sure. If it's relatively small it should hopefully be simple enough to pull out into a separate commit.

Also can you make sure that "register" and "zero" variants of the VCMP's are tested. The inverting login around those might be different to a normal compare.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vcmpf.ll:700
 ; CHECK-MVEFP-NEXT:    vpt.f32 le, q1, q0
-; CHECK-MVEFP-NEXT:    vcmpt.f32 lt, q0, q1
+; CHECK-MVEFP-NEXT:    vpnott
 ; CHECK-MVEFP-NEXT:    vpnot
----------------
Pierre-vh wrote:
> dmgreen wrote:
> > What's going on here? I think it needs to be checking q1 <= q0 && q0 < q1. ord means ordered, which means no NaN's. (Floating point compares can be tricky like that).
> I didn't know that. In which case is it not "safe" to replace a float VCMP even when the conditions are met?
> What would be the correct behaviour in this case?
> Should I disable this optimisation for all float VCMPs?
Umm. I think for floats you cannot swap the operands. That only works for integers. You can still use the opposite condition code (lt <> ge, for example). Have a look at getOppositeCondition and isValidMVECond in ISelLowering for how it's done there.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir:1
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -run-pass arm-mve-vpt-opts %s -o - | FileCheck %s
----------------
Pierre-vh wrote:
> dmgreen wrote:
> > I think this test would be simpler if there was a number of test functions inside it, each testing one thing (or maybe a small collection of things). Otherwise they run into each other a bit, and it will be hard to tell what's wrong if one of them starts to fail.
> Would putting them all in different basic blocks be enough, or do you prefer functions?
> The optimisation is done per basic-block, so it'd behave the same as function (but the test would be shorter).
Separate functions would be more canonical. You can hopefully simplify the test quite a bit to make it so that's not too verbose.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir:9-15
+  entry:
+    %conv.i = zext i16 %p to i32
+    %0 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> undef, <4 x float> %a, <4 x float> %b, i32 %conv.i) #2
+    %1 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> undef, <4 x float> %0, <4 x float> %0, i32 %conv.i) #2
+    %2 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> %inactive1, <4 x float> %1, <4 x float> %b, i32 %conv.i) #2
+    %3 = tail call nnan ninf nsz <4 x float> @llvm.arm.mve.vminnm.m.v4f32.v4f32.v4f32.v4f32.i32(<4 x float> %inactive2, <4 x float> %2, <4 x float> %b, i32 %conv.i) #2
+    ret <4 x float> %3
----------------
Functions don't actually need bodies. They can just contain unreachable. And if the test below doesn't bare any resemblance to the code here, that is probably for the better.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir:20-22
+  attributes #0 = { nounwind readnone "correctly-rounded-divide-sqrt-fp-math"="false" "denormal-fp-math"="preserve-sign" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="128" "frame-pointer"="none" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
+  attributes #1 = { nounwind readnone }
+  attributes #2 = { nounwind }
----------------
I think a lot of this can probably be removed, if you replace +mve.fp it with command line options.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir:27-64
+alignment:       4
+exposesReturnsTwice: false
+legalized:       false
+regBankSelected: false
+selected:        false
+failedISel:      false
+tracksRegLiveness: true
----------------
I think you can remove a lot of this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76709/new/

https://reviews.llvm.org/D76709