[PATCH] D157279: [CodeGen] Disable FP LD1RX instructions generation for Neoverse-V1

Wed Aug 9 23:29:52 PDT 2023

dmgreen added a comment.

This sounds like an interesting one. We have certainly seen cases before where an instruction is worth splitting into multiple parts, but that often helps in one case and hurts in others. It seems like the idea here is that the throughput of SVE loads is limited to 2 per cycle, whereas scalar FP loads can go up to 3? So in load-throughput-limited situations the expanded nodes win out (especially if they can use ldp). Wouldn't the opposite be true too, though? If the code is vector-instruction limited or frontend limited, then the extra instructions will make it worse. You could imagine this being done in the load/store optimizer instead, if it could detect the cases where it could use ldp.

If this is better (and I imagine it might be in many situations), then it could equally apply to the integer forms too. The scalar load would just need to be changed to an FP load, so that the value doesn't pay the cost of crossing between register banks.
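
To make the trade-off above concrete, here is a rough bottleneck model in Python. It is only a sketch: the load throughputs of 2 and 3 are the figures guessed at above, while the vector throughput of 4, the frontend width of 8, and the assumption that the expanded form costs exactly one scalar FP load plus one vector op per broadcast (and that an LD1R uses no vector pipe) are placeholders chosen for illustration, not measured Neoverse-V1 numbers.

```python
from fractions import Fraction

# All figures are illustrative assumptions, not measured Neoverse-V1 data.
SVE_LOADS_PER_CYCLE = 2        # figure guessed at in the comment above
SCALAR_FP_LOADS_PER_CYCLE = 3  # figure guessed at in the comment above
VECTOR_OPS_PER_CYCLE = 4       # hypothetical placeholder
FRONTEND_WIDTH = 8             # hypothetical instructions per cycle


def cycles_per_iteration(broadcast_loads, other_vector_ops, expanded):
    """Estimate cycles per loop iteration as the slowest of three limits."""
    if expanded:
        # Assumed expansion: one scalar FP load plus one vector op per broadcast.
        load_rate = SCALAR_FP_LOADS_PER_CYCLE
        vector_ops = other_vector_ops + broadcast_loads
        instructions = other_vector_ops + 2 * broadcast_loads
    else:
        # Assumed: an LD1R is a single SVE load that uses no vector pipe.
        load_rate = SVE_LOADS_PER_CYCLE
        vector_ops = other_vector_ops
        instructions = other_vector_ops + broadcast_loads
    return max(
        Fraction(broadcast_loads, load_rate),            # load-throughput limit
        Fraction(vector_ops, VECTOR_OPS_PER_CYCLE),      # vector-pipe limit
        Fraction(instructions, FRONTEND_WIDTH),          # frontend limit
    )
```

With these placeholder numbers, a loop with six broadcast loads and nothing else goes from 3 to 2 cycles per iteration when expanded, while a loop with one broadcast load and eight other vector ops gets slower, from 2 to 2.25.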

================
Comment at: llvm/test/CodeGen/AArch64/sve-ld1r.ll:1254-1255
+; CHECK-NO-LD1R-NEXT:    ldr h1, [x0]
+; CHECK-NO-LD1R-NEXT:    mov z0.h, #0 // =0x0
+; CHECK-NO-LD1R-NEXT:    mov z0.h, p0/m, h1
+; CHECK-NO-LD1R-NEXT:    ret
----------------
These cases, with multiple extra instructions, look quite a bit worse. The expansion might not be worth applying to predicated instructions that zero the inactive lanes.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157279/new/

https://reviews.llvm.org/D157279