[PATCH] D36059: [memops] Add a new pass to inject fast-path code for specific library function calls.

Chandler Carruth via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jul 30 12:39:54 PDT 2017


chandlerc marked an inline comment as done.
chandlerc added a comment.

In https://reviews.llvm.org/D36059#825488, @mehdi_amini wrote:

> It seems to me to conceptually belong to the backend. Why isn't this part of CodeGenPrepare (or injected by the target as part of its pre-ISel IR passes)?


It definitely is similar to CGP.

The reason I didn't put it there after talking to folks (mostly Hal, I think) was because I generally operate under the principle of "if it doesn't need to be in CGP, it should be separate" for maintenance, testing, etc. The usual case that necessitates a transform being in CGP is needing to participate in its iterative process, but that isn't true here. Another common practical reason is that the logic is too small or isolated to really make sense as its own pass, but that doesn't seem to be true here either.

As for where in the pipeline to put it, I'm open to suggestions, but putting it here has some advantages.

This code is forming a loop with array accesses within it. There is a lot of code (from LSR to CGP) that tries to help massage these patterns into the optimal form for the target. I didn't really want to have target-specific IR generation, and so having this pass run before LSR and CGP seems useful.
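To make the shape of the injected code concrete, here is a hypothetical C-level sketch (not the pass's actual output; the helper name and the 16-byte threshold are made up for illustration). The point is that the fast path is an ordinary small loop with array accesses, exactly the kind of pattern LSR and CGP know how to massage:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the transform: a call to memcpy(dst, src, n)
 * is rewritten so that small sizes take an inline byte-copy loop and only
 * large sizes pay for the library call. The threshold (16) is invented. */
static void memcpy_with_fast_path(char *dst, const char *src, size_t n) {
    if (n <= 16) {
        /* Fast path: a small, predictable loop. This is the loop that
         * later passes (LSR, CGP) can shape into good addressing modes. */
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i];
    } else {
        /* Slow path: fall back to the full library call. */
        memcpy(dst, src, n);
    }
}
```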

Similarly, LoopSink may also want to sink computations into this code if we start putting branch weights from profiling into it and some of these regions end up marked cold. So putting this before LoopSink seemed to make sense.

It is actively harmful for this code to be before any of the vectorization or unrolling passes though: we'll try to vectorize and unroll this loop when the whole point was to keep it small! ;] We could use loop metadata to prevent this, but scheduling it afterward seems easier (and honestly, those passes should use the trip count upper bound predicate and avoid the transformations, but that is an issue for another day).
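For reference, the loop-metadata route mentioned above corresponds, at the source level, to Clang's loop pragmas; the pass would attach the equivalent `llvm.loop` metadata directly. A minimal sketch (the function and its shape are hypothetical):

```c
/* Illustrative only: marking the injected loop as off-limits to the
 * vectorizer and unroller. A source-level loop with these pragmas gets
 * the same llvm.loop.vectorize.enable / llvm.loop.unroll.disable
 * metadata the pass would have to emit on its injected loop. */
void copy_small(char *dst, const char *src, unsigned n) {
#pragma clang loop vectorize(disable) unroll(disable)
    for (unsigned i = 0; i < n; ++i)
        dst[i] = src[i];
}
```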

Last but not least, this pass is fairly sensitive to alignment, and so running it after we re-compute alignment from assumption information seemed like a good idea.

This narrows the position to a single point: between the alignment synthesis and LoopSink.

Still, while the above hopefully explains my thought process, it doesn't mean this is the *right* place. Very open to suggestions for other positions in the pipeline, along with what issues they would address or just why they would be more natural. Certainly, the closer to the target the better, as this is clearly a very target-specific transformation.


https://reviews.llvm.org/D36059




