[PATCH] D75145: [PassManager] adjust VectorCombine placement
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 6 12:09:00 PST 2020
dmgreen added a comment.
My benchmarks were still running. D75757 <https://reviews.llvm.org/D75757> wasn't in review long enough for them to complete before it went in (they seem to be running a bit slowly, and Phab seems to be sending emails through in chunks).
It looks like it's made things (a lot) worse, not better. For "normal" code this time, not vectorised. The issue here with vector loops might be improved. It's hard to tell. There are so many other regressions that I can't really give you a quick answer. I mean, there are some improvements mixed in, but the total is definitely down. Not sure if this is an ARM issue again, or something more general. It doesn't affect (-Oz) codesize at all, or 6m, which might suggest that it's not just as simple as it disabling some analyses. I will see what I can find out, but we are going in the wrong direction here.
Adding some phase ordering tests for some of this sounds very useful. I'll see what I can add. With unrolling and vectorisation and the rest, they might get quite verbose. I'll see.
And to answer your question: the part of the assembly that was important for performance in this first case was this vector body:
vldrh.u16 q0, [r0], #16
subs.w r12, r12, #8
vqabs.s16 q0, q0
vstrb.8 q0, [r1], #16
bne .LBB0_4
Which could instead be using an LE low-overhead loop instruction:
vldrh.u16 q0, [r0], #16
vqabs.s16 q0, q0
vstrb.8 q0, [r1], #16
le lr, .LBB0_4
There is a pass in the IR part of the backend that looks for loops, finds the backedge-taken count (BETC) and adds hardware loop intrinsics for it. It's essentially a hardware loop, so you don't need to execute the subs or the bne on each iteration.
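For illustration only, a scalar loop of roughly the following shape is the kind of source that could vectorise to a vqabs.s16 body like the one above. The original test case isn't shown in this thread, so this is a sketch under that assumption, and the function name sat_abs_i16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not the original test case: saturating absolute
 * value over 16-bit elements. INT16_MIN has no positive counterpart in
 * int16_t, so it is clamped to INT16_MAX instead of wrapping. */
void sat_abs_i16(const int16_t *src, int16_t *dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int16_t x = src[i];
        if (x == INT16_MIN)
            dst[i] = INT16_MAX;               /* saturate */
        else
            dst[i] = (int16_t)(x < 0 ? -x : x);
    }
}
```

The trip count of a loop like this is known on entry, which is what lets the count be set up once before the loop rather than being decremented and tested with subs/bne on every iteration.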
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75145/new/
https://reviews.llvm.org/D75145