[PATCH] D157279: [CodeGen] Disable FP LD1RX instructions generation for Neoverse-V1
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 9 23:29:52 PDT 2023
dmgreen added a comment.
This sounds like an interesting one. We have certainly seen cases before where it is worth splitting an instruction into multiple parts, but it often helps in one case and hurts in others. It seems like the idea here is that the throughput of SVE loads is limited to 2 per cycle, but scalar FP loads can go up to 3? So in load-throughput-limited situations the expanded nodes win out (especially if they can use ldp). Wouldn't the opposite be true too, though? If the code were vector-instruction limited or frontend limited, then multiple instructions would be worse. You could imagine this being done in the load/store optimizer instead, if it could detect cases where it could use ldp.
If this is better (and I imagine it might be in many situations), then it could equally apply to integer loads too. The integer load would just need to be changed to an FP load, to make sure it doesn't pay the cost of crossing between register banks.
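To make the trade-off above concrete, here is a rough sketch of a bottleneck model in Python. It is not a real scheduling model of any core. The two load rates are the ones suggested in the comment (and stated there as a question); the vector-pipe count and frontend width are placeholder assumptions chosen only for illustration, and the names `Core`, `Mix`, `with_ld1r` and `expanded` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    # Load rates as suggested in the review comment (unconfirmed there).
    sve_loads_per_cycle: float = 2.0
    scalar_fp_loads_per_cycle: float = 3.0
    # Placeholder assumptions for illustration, not real Neoverse-V1 figures.
    vector_ops_per_cycle: float = 4.0
    frontend_width: float = 8.0


@dataclass(frozen=True)
class Mix:
    """Instruction counts for one loop iteration."""
    sve_loads: int = 0
    scalar_fp_loads: int = 0
    vector_ops: int = 0


def cycles_per_iteration(mix, core=Core()):
    """Cycles per iteration, taken as the most heavily used resource."""
    # Both kinds of load occupy the load pipes, so their costs add up.
    load = (mix.sve_loads / core.sve_loads_per_cycle
            + mix.scalar_fp_loads / core.scalar_fp_loads_per_cycle)
    vector = mix.vector_ops / core.vector_ops_per_cycle
    total_instrs = mix.sve_loads + mix.scalar_fp_loads + mix.vector_ops
    frontend = total_instrs / core.frontend_width
    return max(load, vector, frontend)


def with_ld1r(broadcasts, other_vector_ops):
    # Each broadcast is a single SVE load-and-replicate instruction.
    return Mix(sve_loads=broadcasts, vector_ops=other_vector_ops)


def expanded(broadcasts, other_vector_ops):
    # Each broadcast becomes a scalar FP load plus one vector dup/mov.
    return Mix(scalar_fp_loads=broadcasts,
               vector_ops=other_vector_ops + broadcasts)


if __name__ == "__main__":
    for name, n, v in [("load-bound", 6, 0), ("vector-bound", 2, 8)]:
        print(name,
              cycles_per_iteration(with_ld1r(n, v)),
              cycles_per_iteration(expanded(n, v)))
```

Under these assumed numbers the expanded form wins in the load-bound mix and loses in the vector-bound mix, which is the concern raised above. Different placeholder values will move the crossover point, so the sketch says nothing about where it actually lies on real hardware.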
================
Comment at: llvm/test/CodeGen/AArch64/sve-ld1r.ll:1254-1255
+; CHECK-NO-LD1R-NEXT: ldr h1, [x0]
+; CHECK-NO-LD1R-NEXT: mov z0.h, #0 // =0x0
+; CHECK-NO-LD1R-NEXT: mov z0.h, p0/m, h1
+; CHECK-NO-LD1R-NEXT: ret
----------------
These cases, with multiple extra instructions, look quite a bit worse. The benefit might not apply to predicated instructions with zeroing.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157279/new/
https://reviews.llvm.org/D157279