[PATCH] D28637: [PPC] Inline expansion of memcmp

Zaara Syeda via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 16 12:00:47 PST 2017

syzaara added a comment.

In https://reviews.llvm.org/D28637#647220, @nemanjai wrote:

> In https://reviews.llvm.org/D28637#647200, @uweigand wrote:
> > On SystemZ, we use the existing callback EmitTargetCodeForMemcmp to expand memcmp calls.    Can you explain how the new hook is different and why we'd need two hooks?
> I think the primary difference is that SystemZ has instructions that can perform these comparisons directly (i.e. with an instruction that compares memory from two addresses). On Power, we have to do the loads, comparison and branch out of the load sequence as soon as a mismatch is found. And the SDAG isn't well suited for adding new control flow.
>  So I'd say the old hook is useful for targets that have instructions that can terminate the comparison on a mismatch without an explicit branch, whereas the new hook is useful for targets that don't.
>  But I'll let Zaara chime in for the full answer.

In https://reviews.llvm.org/D28637#645519, @efriedma wrote:

> How does this compare to the existing SelectionDAGBuilder::visitMemCmpCall/EmitTargetCodeForMemcmp/etc.?

SelectionDAGBuilder::visitMemCmpCall optimizes specific cases of memcmp such as when size=0 or when we are using the return value of memcmp for comparing !=0 for sizes 2,4,8. These sizes wouldn't need multiple basic blocks since we can just use one load and a branch. This expansion is for the more general case when the size is a constant less than a specific threshold and expands into multiple basic blocks with early exits. Both visitMemCmpCall and EmitTargetCodeForMemcmp work on SelectionDAG where introducing control flow and working with multiple basic blocks isn't preferred.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1943
+  if (SizeVal > TLI->getMaxLoadsPerMemcmp(0)) {
+    NumMemCmpGreaterThanMax++;
nemanjai wrote:
> Is this how we want to use `getMaxLoadsPerMemcmp()`? The name seems to suggest to me that this is a threshold for how many actual load instructions can be emitted. But this appears to be used as the maximum number of bytes that can be loaded. Or am I misreading what `SizeVal` is?
Ya, this was the same way it was used for memcpy/memset etc. It seems like number of stores but actually returns number of bytes. Just followed it for consistency.


More information about the llvm-commits mailing list