[PATCH] D28637: [PPC] Inline expansion of memcmp
Friedman, Eli via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 9 12:25:04 PST 2017
On 3/9/2017 11:06 AM, Zaara Syeda via Phabricator wrote:
> syzaara added inline comments.
>
>
> ================
> Comment at: lib/CodeGen/CodeGenPrepare.cpp:2233
> + Builder.Insert(NewBr);
> + return;
> + }
> ----------------
> efriedma wrote:
>> syzaara wrote:
>>> efriedma wrote:
>>>> Hmm... in isOnlyUsedInZeroEqualityComparison mode, it probably makes sense to perform more loads per block, to reduce the number of branches.
>>> I'm not sure what you mean here. We have the branches for doing an early exit so we don't need to continue loading more bytes when we find a difference.
>> A couple loads hitting the cache are cheaper than a branch in cases where the branch isn't predictable... but maybe not a big deal either way.
> I'm still not sure how this reduces the number of branches. We would still need to compare each of the loads to see which one had the difference.
Branchless 16-byte memcmp isOnlyUsedInZeroEqualityComparison:
_Bool memcmp16(__uint128_t *Src1, __uint128_t *Src2) {
return *Src1 == *Src2;
}
Resulting code on x86-64:
movq (%rdi), %rax
movq 8(%rdi), %rcx
xorq 8(%rsi), %rcx
xorq (%rsi), %rax
orq %rcx, %rax
sete %al
retq
> ================
> Comment at: lib/CodeGen/CodeGenPrepare.cpp:2269
> + Value *Subtract = Builder.CreateSub(And1, And2);
> + Value *Res = Builder.CreateSExtOrTrunc(Subtract, Builder.getInt32Ty());
> +
> ----------------
> efriedma wrote:
>> Would `bswap(Src1) > bswap(Src2) ? 1 : -1` be shorter?
> On power, having a load followed by a bswap will get converted to a load byte reversed hardware instruction. But I'm not sure if other targets will have a good way to handle bswap within gprs. It could introduce a longer sequence. Also this way, we are adding more branches with the comparison.
Just worry about which lowering is best one target at a time; we can
always add a target hook later if we need a different lowering for other
targets. And the "? 1 : -1" should get lowered to some branchless sequence.
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
More information about the llvm-commits
mailing list