[PATCH] D132559: [AArch64] Add support for 128-bit non temporal loads.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 24 10:37:14 PDT 2022


fhahn added a comment.

In D132559#3746096 <https://reviews.llvm.org/D132559#3746096>, @dmgreen wrote:

> My understanding is that the !nontemporal metadata is a hint to the backend that the load data will not be reused so caching it is unlikely to be useful for performance. It isn't a mandate. Similarly the ldnp pairwise loads are a hint to the microarchitecture that the loaded data is likely not going to be reused so doesn't need caching. The micro-architecture is free to ignore the hint again.
> With that, is it beneficial to force uses of 128-bit non-temporal loads if it leads to more instructions overall? For i256 it was almost certainly a good thing to do, but for i128 it sounds like it might end up slowing things down more than it helps. I'm not sure where the balance point lies.

I agree that this seems more uarch specific. If the selected uarch just ignores the hints, we shouldn't try too hard to generate `LDNP`. But on some uarchs, avoiding cache pollution can outweigh the drawback of having to issue more load instructions (and the overhead of a few extra movs should be negligible on most beefier uarchs). I think codegen could also be improved for the cases where we have input types that need further legalization.

In most cases the hint comes directly from the user, who hopefully knows how to use it.
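
For illustration, this is roughly how such a user-provided hint reaches the backend from C. It is a minimal sketch, assuming Clang's `__builtin_nontemporal_load`, which produces a load tagged with `!nontemporal`. `NT_LOAD` and `sum_streamed` are hypothetical names used only for this example, and whether the backend then selects `LDNP` depends on the type being loaded and on the target.

```c
#include <stddef.h>

/* Portable fallback: not every compiler defines __has_builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_nontemporal_load)
/* Clang: the load is emitted with !nontemporal metadata attached. */
#define NT_LOAD(p) __builtin_nontemporal_load(p)
#else
/* Other compilers: a plain load, i.e. the hint is simply dropped. */
#define NT_LOAD(p) (*(p))
#endif

/* Hypothetical helper: sums a buffer that is read exactly once,
 * so keeping its contents in the cache is unlikely to pay off. */
float sum_streamed(const float *src, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += NT_LOAD(&src[i]);
    return acc;
}
```

The fallback macro mirrors the point above: the hint is not a mandate, so dropping it changes only performance, never the result.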



================
Comment at: llvm/test/CodeGen/AArch64/nontemporal-load.ll:213
 ; CHECK:       ; %bb.0:
-; CHECK-NEXT:    ldp q1, q2, [x0, #32]
-; CHECK-NEXT:    ldp q3, q4, [x0]
-; CHECK-NEXT:    ldr s0, [x0, #64]
-; CHECK-NEXT:    stp q3, q4, [x8]
-; CHECK-NEXT:    stp q1, q2, [x8, #32]
-; CHECK-NEXT:    str s0, [x8, #64]
+; CHECK-NEXT:    ldnp d1, d0, [x0, #16]
+; CHECK-NEXT:    ldnp d3, d2, [x0, #48]
----------------
I guess we would ideally also use the Q-register variant of `LDNP` here. I assume the reason we don't is that `<17 x float>` gets broken down into `<4 x float>` pieces during legalization.

@dmgreen do you by any chance have any ideas on where best to improve this?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132559/new/

https://reviews.llvm.org/D132559


