[PATCH] D63938: [ARM] Stop using scalar FP instructions in integer-only MVE mode.

Mon Jul 1 09:32:00 PDT 2019

simon_tatham marked 3 inline comments as done.
simon_tatham added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:605
     addRegisterClass(MVT::f32, &ARM::SPRRegClass);
     addRegisterClass(MVT::f64, &ARM::DPRRegClass);
+    if (!Subtarget->hasVFP2Base()) {
----------------
dmgreen wrote:
> Should this be behind hasFPRegs64 instead?
I'm not sure it should, actually. `FPRegs64` is a confusing feature, and quite likely should have been named something different (which is my fault, if so).

As far as I can see the only architectural instruction actually conditional on that feature is the scalar FP //move// instruction that copies a d-register (e.g. `vmov.f64 d0,d1`). It doesn't affect loads and stores of d-regs. So I think it shouldn't affect this decision.

On the other hand, I'm pretty sure you're right that I should be separately deciding whether to `setAllExpand` f32 and f64 based on different feature queries.

================
Comment at: llvm/lib/Target/ARM/ARMISelLowering.cpp:617

   if (Subtarget->hasFullFP16()) {
     addRegisterClass(MVT::f16, &ARM::HPRRegClass);
----------------
dmgreen wrote:
> How come we don't have to do the same for fullfp16 to work?
I think because the fp16 legality boundary happens in a different place.

With the MVE-integer-only level of support for f32 or f64, you have registers of the right size which you can load and store, but you can't do arithmetic on them. But with the not-full fp16 support, you don't //even// have a set of registers the right size – there are no loads and stores of HPRs.

================
Comment at: llvm/test/CodeGen/Thumb2/float-ops.ll:268
+; ONLYREGS: lsls    r2, r2, #31
+; ONLYREGS: vmovne.f32      s2, s0
 ; HARD: lsls    r0, r0, #31
----------------
dmgreen wrote:
> This is worse than before by the looks of it? We move things into fp registers just to move them out again.
Perhaps that's true, but I'd rather fix the correctness first and get a set of regression tests passing, and //then// we can worry about recovering the performance with those tests in place to prevent breaking correctness again.

Especially since in the general case it's not really clear how you //should// choose which kind of register to use for a value. Keeping it in an FP register is obviously wasteful in this case, but in another case where register pressure is high, might it save a spill or two? I think getting the right answers in cases larger than this simple one might not be trivial.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63938/new/

https://reviews.llvm.org/D63938