[PATCH] D36059: [memops] Add a new pass to inject fast-path code for specific library function calls.

Fri Aug 4 10:25:16 PDT 2017

hfinkel added a comment.

In https://reviews.llvm.org/D36059#831659, @aemerson wrote:

> In https://reviews.llvm.org/D36059#828874, @chandlerc wrote:
>
> > IMO, there is no need for doing this in this place. If we're just leaving a marker here for the target to expand, we don't need to do anything. We already get a chance to custom expand the libcall in the target. Adding the versioning doesn't make that any simpler given that it still needs to introduce a loop.
>
>
> The difference is that at the target codegen level we can't as easily do the predicate analysis as we can at the IR level.
>
> > If, for a particular target, it is worth emitting a versioned, carefully target-crafted loop or instruction sequence, I would expect them to not use this pass but to custom lower the calls in the backend much like x86 does for constant-size calls.
>
> But in the patch description you say that one of the challenges is constructing *just* the right IR to get efficient codegen from the backend. I understand this is for x86 right now, but if you don't have plans to allow other targets to work well with it, why not put it into the Target/X86 directory and make it a backend-specific IR pass to avoid confusion?

What this pass is doing is very generic, as is the IR produced, and the IR seems likely to me to make sense on many targets. No target is obligated to use this, but I'd like this to remain target independent. Target independent passes don't belong in the targets, even if they've only been tuned on one target so far.

> To clear up one last thing for me, are you saying that there are no performance impacts across SPEC at all? Even astar and dealII? Any impacts across the test-suite benchmarks?

================
Comment at: lib/Transforms/Scalar/FastPathLibCalls.cpp:92
+    // If value tracking knows enough, we're done.
+    if (llvm::isKnownNonZero(V, DL, /*Depth*/ 0, &AC, &I, &DT))
+      return true;
----------------
Depth = 0 means no limit, right? Is there a reason this can't go quadratic? If so, we should comment here. Otherwise, maybe we need a depth limit.

https://reviews.llvm.org/D36059