[PATCH] D64805: [AArch64] Don't call bcmp for small byte comparisons

Brian Rzycki via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 17 08:27:32 PDT 2019


brzycki added a comment.

Hello @SjoerdMeijer , thank you for the review.



================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:625
+  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
+  Options.NumLoadsPerBlock = 2;
+  Options.AllowOverlappingLoads = true;
----------------
SjoerdMeijer wrote:
> why 2?
@evandro selected these based on the x8_64 values set in `llvm/lib/Target/X86/X86TargetTransformInfo.cpp`:
```
  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
  Options.NumLoadsPerBlock = 2;
  // ...
  if (ST->is64Bit()) {
    Options.LoadSizes.push_back(8);
  }
  Options.LoadSizes.push_back(4);
  Options.LoadSizes.push_back(2);
  Options.LoadSizes.push_back(1);
```

I do not know why x86_64 selected these values.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:626
+  Options.NumLoadsPerBlock = 2;
+  Options.AllowOverlappingLoads = true;
+  Options.LoadSizes.push_back(8);
----------------
SjoerdMeijer wrote:
> How about vector loads and alignment? Can we generate them for this, and do we then also need 16 byte loads to loadsizes?
Interesting question, I'll defer to @evandro when he can comment on the ticket. There is a bit more logic in x86_64 he stripped out when building this patch:
```
  if (IsZeroCmp) {
    // Only enable vector loads for equality comparison. Right now the vector
    // version is not as fast for three way compare (see #33329).
    // TODO: enable AVX512 when the DAG is ready.
    // if (ST->hasAVX512()) Options.LoadSizes.push_back(64);
    const unsigned PreferredWidth = ST->getPreferVectorWidth();
    if (PreferredWidth >= 256 && ST->hasAVX2()) Options.LoadSizes.push_back(32);
    if (PreferredWidth >= 128 && ST->hasSSE2()) Options.LoadSizes.push_back(16);
    // All GPR and vector loads can be unaligned. SIMD compare requires integer
    // vectors (SSE2/AVX2).
    Options.AllowOverlappingLoads = true;
  }
```


================
Comment at: llvm/test/CodeGen/AArch64/bcmp-inline-small.ll:2
+; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; CHECK-LABEL: bcmp_b2:
----------------
SjoerdMeijer wrote:
> I think tests are missing for:
> - code-size
> - edge cases for NumLoadsPerBlock and MaxNumLoads
Understood. I'll try to build cases to accommodate these checks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64805/new/

https://reviews.llvm.org/D64805





More information about the llvm-commits mailing list