[PATCH] D64805: [AArch64] Don't call bcmp for small byte comparisons
Clement Courbet via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 18 06:19:44 PDT 2019
courbet added a comment.
FYI there are some benchmarks in https://reviews.llvm.org/D64082 that are not submitted but could be used as basis for making decisions for this patch.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:625
+ Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
+ Options.NumLoadsPerBlock = 2;
+ Options.AllowOverlappingLoads = true;
----------------
brzycki wrote:
> SjoerdMeijer wrote:
> > why 2?
> @evandro selected these based on the x8_64 values set in `llvm/lib/Target/X86/X86TargetTransformInfo.cpp`:
> ```
> Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
> Options.NumLoadsPerBlock = 2;
> // ...
> if (ST->is64Bit()) {
> Options.LoadSizes.push_back(8);
> }
> Options.LoadSizes.push_back(4);
> Options.LoadSizes.push_back(2);
> Options.LoadSizes.push_back(1);
> ```
>
> I do not know why x86_64 selected these values.
This does trades loads for jumps: with this setting, each block will be twice as big (more work per block) but there will be half as many blocks.
This was found to be more efficient in general on x86, and might not be true for arm. This will have to be benchmarked.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:626
+ Options.NumLoadsPerBlock = 2;
+ Options.AllowOverlappingLoads = true;
+ Options.LoadSizes.push_back(8);
----------------
brzycki wrote:
> SjoerdMeijer wrote:
> > How about vector loads and alignment? Can we generate them for this, and do we then also need 16 byte loads to loadsizes?
> Interesting question, I'll defer to @evandro when he can comment on the ticket. There is a bit more logic in x86_64 he stripped out when building this patch:
> ```
> if (IsZeroCmp) {
> // Only enable vector loads for equality comparison. Right now the vector
> // version is not as fast for three way compare (see #33329).
> // TODO: enable AVX512 when the DAG is ready.
> // if (ST->hasAVX512()) Options.LoadSizes.push_back(64);
> const unsigned PreferredWidth = ST->getPreferVectorWidth();
> if (PreferredWidth >= 256 && ST->hasAVX2()) Options.LoadSizes.push_back(32);
> if (PreferredWidth >= 128 && ST->hasSSE2()) Options.LoadSizes.push_back(16);
> // All GPR and vector loads can be unaligned. SIMD compare requires integer
> // vectors (SSE2/AVX2).
> Options.AllowOverlappingLoads = true;
> }
> ```
For x86, vector loads are enabled only for equality comparison because there is no efficient vector-sized bswap.
I don't know much about arm, but:
* if vector loads and equality comparisons are efficient there is no reason not to use them if IsZeroCmp
* if there is an efficient vector-sized bswap vector sizes can be used in both cases
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64805/new/
https://reviews.llvm.org/D64805
More information about the llvm-commits
mailing list