[PATCH] D28637: [PPC] Inline expansion of memcmp

Nemanja Ivanovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 16 03:56:48 PST 2017

nemanjai requested changes to this revision.
nemanjai added a comment.
This revision now requires changes to proceed.

A couple of general comments:

- Make sure you add Hal Finkel, Craig Topper, Renato Golin, Tim Northover, Ulrich Weigand, etc. to the reviewers list to cover other targets
- You may also want to add those that responded to the original RFC
- I think the special treatment of 8-bytes is unnecessary in a target-independent pass (for things like statistics, decisions, etc.)
- Add a bit more testing (multiple blocks, multiple types, negative tests for various reasons such as non-const comparison, max exceeded, etc.)

Comment at: include/llvm/Analysis/TargetTransformInfoImpl.h:255
+  bool expandMemCmp() { return false; }
It would probably be better to leave this optimization much more flexible. As such, you should probably pass some information about a specific call to `memcmp` that would allow the target to decide not only if **ALL** `memcmp` calls should be expanded but if **THIS** `memcmp` call should be expanded.
I'm not sure if the best option is to pass the actual `CallInst` or the important details (perhaps in a `MemcmpInfo` struct - if people would like such a data structure created). I personally prefer the former.
Finally, I think the type for the load should be an output parameter of this function as not every target would necessarily like an `i64*` load on every `memcmp` call.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:85
+STATISTIC(NumMemCmpNotConstant, "Number of memcmp calls without constant size");
+STATISTIC(NumMemCmpNot8ByteMultiples, "Number of memcmp calls without 8 byte multiples");
+STATISTIC(NumMemCmpGreaterThanMax, "Number of memcmp calls with size greater than max size");
This is one of those uses of 8-bytes that don't necessarily mean anything for some targets.
I think I'd opt for a statistic such as `NumMemCmpNotMultipleOfLDSize`.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1875
   BasicBlock *BB = CI->getParent();
   // Lower inline assembly if we can.
Unrelated formatting change. Please refrain from this.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1914
+///   ret i32 %17
+static bool memcmpExpansion(CallInst *CI, const TargetLowering *TLI,
+                            const DataLayout *DL) {
I would re-factor this function into things that are common and things that may change depending on endianness, etc.
For example, deciding whether you'll inline the call, calculating the number of blocks and creating the basic blocks is common code once you know the load types. However, the code to load/compare/calculate the return value will be specific. That can be done either in one separate function with the endianness as a parameter or in two separate functions for LE/BE.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1917
+  if (!DL->isLittleEndian() || DL->getPointerSizeInBits() != 64) {
+    return false;
There isn't anything special about Little Endian targets. We bail out here because the Big Endian code gen was not implemented here. Please add a FIXME comment here to complete the implementation for BE systems as well.

The check for 64-bit target is also an artifact of the current implementation and not a fundamental requirement here.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1924
+  IRBuilder<> Builder(C);
+  Type *Int64Ty = Type::getInt64Ty(C);
+  Type *Int64PtrTy = Type::getInt64PtrTy(C);
These should come from the TTI.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1943
+  if (SizeVal > TLI->getMaxLoadsPerMemcmp(0)) {
+    NumMemCmpGreaterThanMax++;
Is this how we want to use `getMaxLoadsPerMemcmp()`? The name seems to suggest to me that this is a threshold for how many actual load instructions can be emitted. But this appears to be used as the maximum number of bytes that can be loaded. Or am I misreading what `SizeVal` is?

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1951
+  std::vector<BasicBlock *> BBList;
+  std::vector<Value *> XorList;
Any reason to use `std::vector`'s here rather than `SmallVector`?

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1970
+  // remove the previous terminator and add a branch to the first load, compare block
+  StartBlock->getTerminator()->eraseFromParent();
Full sentences please. And the line is too long anyway, please clang-format this patch.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1981
+    // Cast the source pointers to i64* for generating 8 byte loads
+    if (i == 0) {
+      Source1Cast = Builder.CreateBitCast(Source1, Int64PtrTy);
It seems kind of weird to me to have this "iteration zero" check here. I think it's more natural to do this outside the loop and put the loads ahead of the increment GEP's.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:1992
+    LoadSrc1 = Builder.CreateLoad(Int64Ty, Source1Cast);
+    LoadSrc2 = Builder.CreateLoad(Int64Ty, Source2Cast);
+    Src1List.push_back(LoadSrc1);
efriedma wrote:
> You need to explicitly set the alignment to 1 here.
Do we want to set it to 1 or to whatever the alignment the arguments to the CallInst had?
But I agree we need to set it explicitly.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:2230
+  bool expanded = false;
+  if (!CI->isNoBuiltin() && !F->hasLocalLinkage() && F->hasName() &&
+      TLInfo->getLibFunc(F->getName(), Func) &&
Two comments here:
1. I think a complex conditional expression like this is better suited for an early exit (i.e. if any one of the required conditions aren't met, return false).
2. The condition needs a comment explaining why all of these are required.

Comment at: lib/CodeGen/CodeGenPrepare.cpp:2238
+      if (TTI->expandMemCmp()) {
+        expanded = memcmpExpansion(CI, TLI, DL);
+        if (expanded) {
Uppercase variable names.


More information about the llvm-commits mailing list