[PATCH] D64805: [AArch64] Don't call bcmp for small byte comparisons
Brian Rzycki via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 17 08:27:32 PDT 2019
brzycki added a comment.
Hello @SjoerdMeijer , thank you for the review.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:625
+ Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
+ Options.NumLoadsPerBlock = 2;
+ Options.AllowOverlappingLoads = true;
----------------
SjoerdMeijer wrote:
> why 2?
@evandro selected these based on the x8_64 values set in `llvm/lib/Target/X86/X86TargetTransformInfo.cpp`:
```
Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
Options.NumLoadsPerBlock = 2;
// ...
if (ST->is64Bit()) {
Options.LoadSizes.push_back(8);
}
Options.LoadSizes.push_back(4);
Options.LoadSizes.push_back(2);
Options.LoadSizes.push_back(1);
```
I do not know why x86_64 selected these values.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:626
+ Options.NumLoadsPerBlock = 2;
+ Options.AllowOverlappingLoads = true;
+ Options.LoadSizes.push_back(8);
----------------
SjoerdMeijer wrote:
> How about vector loads and alignment? Can we generate them for this, and do we then also need 16 byte loads to loadsizes?
Interesting question, I'll defer to @evandro when he can comment on the ticket. There is a bit more logic in x86_64 he stripped out when building this patch:
```
if (IsZeroCmp) {
// Only enable vector loads for equality comparison. Right now the vector
// version is not as fast for three way compare (see #33329).
// TODO: enable AVX512 when the DAG is ready.
// if (ST->hasAVX512()) Options.LoadSizes.push_back(64);
const unsigned PreferredWidth = ST->getPreferVectorWidth();
if (PreferredWidth >= 256 && ST->hasAVX2()) Options.LoadSizes.push_back(32);
if (PreferredWidth >= 128 && ST->hasSSE2()) Options.LoadSizes.push_back(16);
// All GPR and vector loads can be unaligned. SIMD compare requires integer
// vectors (SSE2/AVX2).
Options.AllowOverlappingLoads = true;
}
```
================
Comment at: llvm/test/CodeGen/AArch64/bcmp-inline-small.ll:2
+; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; CHECK-LABEL: bcmp_b2:
----------------
SjoerdMeijer wrote:
> I think tests are missing for:
> - code-size
> - edge cases for NumLoadsPerBlock and MaxNumLoads
Understood. I'll try to build cases to accommodate these checks.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64805/new/
https://reviews.llvm.org/D64805
More information about the llvm-commits
mailing list