[PATCH] D123638: [SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 19 23:40:19 PDT 2022


dmgreen added a comment.

Thanks for looking into this! I was hoping to take a look at some point too, this has saved me a job :)

Do you have any performance results to suggest it is a good idea, past the obvious that it sounds like it should be better? Some very quick runs didn't look amazing, but perhaps they are in some ways unlucky.



================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2701
+
+    // Check for broadcast loads.
+    if (Kind == TTI::SK_Broadcast) {
----------------
This code can be before the table. The table and the CostTableLookup should be together.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2703
+    if (Kind == TTI::SK_Broadcast) {
+      bool IsLoad = !Args.empty() && llvm::all_of(Args, [](const Value *V) {
+        return isa<LoadInst>(V);
----------------
I would expect it to have one Arg for a splat.

I guess this would not match the canonical representation for a splat load too, except from where it is being created from scalar code. Not sure what to do about that, but it seems unfortunate that the cost will go up after it has been vectorized.
```
  %l = load i8, i8 *%p
  %i = insertelement <16 x i8> poison, i8 %l, i32 0
  %s = shufflevector <16 x i8> %i, <16 x i8> poison, <16 x i32> zeroinitializer
```


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:283
 
+  bool isLegalBroadcastLoad(Type *ElementTy, unsigned NumElements) const {
+    // ld1r
----------------
I'm not a fan of these "isLegal" methods for things that should just have a cost.
It should either be passed a VectorType or the NumElements should be an ElementCount, to allow for scalable vector types.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:285
+    // ld1r
+    return ST->hasNEON();
+  }
----------------
The "legal" types for NEON would be something where the legalized type is one of
{v8i8, v16i8, v4i16, v8i16, v2i32, v4i32, v1i64, v2i64}.
But also the floating point equivalents of them. I guess I don't really know what "legal" means here. The X86 version of this method seems to account for NumElements, but if a v4i64 can just be legalized to two v2i64's, it can still generate a splatload efficiently and copy that to another value.

Hmm. For now lets say that the ElementCount size needs to be one of {8, 16, 32, 64}, and the size of the vector would be at least 64bits. That should rule out types we don't have at least, and larger vectors can still be treated as cheap.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123638/new/

https://reviews.llvm.org/D123638



More information about the llvm-commits mailing list