[PATCH] D36059: [memops] Add a new pass to inject fast-path code for specific library function calls.

Fri Aug 4 02:45:40 PDT 2017

aemerson added a comment.

In https://reviews.llvm.org/D36059#828874, @chandlerc wrote:

> IMO, there is no need for doing this in this place. If we're just leaving a marker here for the target to expand, we don't need to do anything. We already get a chance to custom expand the libcall in the target. Adding the versioning doesn't make that any simpler given that it still needs to introduce a loop.

The difference is that at the target codegen level we can't as easily do the predicate analysis as we can at the IR level.

> If, for a particular target, it is worth emitting a versioned, carefully target-crafted loop or instruction sequence, I would expect them to not use this pass but to custom lower the calls in the backend much like x86 does for constant-size calls.

But in the patch description you say that one of the challenges is constructing *just* the right IR to get efficient codegen from the backend. I understand this is for x86 right now, but if you don't have plans to allow other targets to work well with it, why not put it into the Target/X86 directory and make it a backend-specific IR pass to avoid confusion?

To clear up one last thing for me, are you saying that there are no performance impacts across SPEC at all? Even astar and dealII? Any impacts across the test-suite benchmarks?

================
Comment at: lib/Transforms/Scalar/FastPathLibCalls.cpp:101
+    // on a test against zero where the non-zero edge dominates the location.
+    int Depth = 0;
+    for (DomTreeNode *N = DT.getNode(I.getParent()); N && Depth < 10;
----------------
AFAICT this depth doesn't seem to be modified anywhere.

https://reviews.llvm.org/D36059