[PATCH] D46283: [AArch64] Set vectorizer-maximize-bandwidth as default true

Fri Jun 1 06:53:51 PDT 2018

rengolin added a comment.

In https://reviews.llvm.org/D46283#1118926, @zatrazz wrote:

> | 401.bzip2 | -2.04 |
>
> I will check whether the slight drop in 401.bzip2 is just noise or something related to this patch, but regardless I do think this change should yield better performance in most scenarios.

We have to be careful, though. In the past we have accepted this kind of scenario as "obviously good" because geomean is better. But that's a fallacy that has brought some pain over the past few years.

While a positive geomean is good, a -2% on bzip will mean someone will be trying to fix that later on, and might just as well undo the "good" changes this patch brings.

This is a never-ending scenario where the overall win is zero (plus or minus something).

I'm not saying we should stop any improvement if we have one bad result, but your worst result is larger in magnitude than any single improvement. This is still worrying, and may mean that the change destabilises the passes that run after the vectoriser (including back-end ones).
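
To make the geomean point concrete, here is a minimal Python sketch. Only the 401.bzip2 figure comes from this thread; the other benchmark names and numbers are made up purely for illustration, and the -1% threshold is an arbitrary assumption. It shows how a positive geomean can coexist with a regression that is bigger than any single win.

```python
import math


def geomean_change(pct_changes):
    """Geometric mean of per-benchmark changes, returned as a percentage."""
    ratios = [1.0 + p / 100.0 for p in pct_changes.values()]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


def regressions(pct_changes, threshold=-1.0):
    """Benchmarks whose change is at or below the (assumed) threshold."""
    return {name: p for name, p in pct_changes.items() if p <= threshold}


# Only 401.bzip2 is a real number from the review; the rest are hypothetical.
results = {
    "401.bzip2": -2.04,
    "bench_a (hypothetical)": 1.5,
    "bench_b (hypothetical)": 0.8,
    "bench_c (hypothetical)": 1.2,
}

if __name__ == "__main__":
    print(f"geomean change: {geomean_change(results):+.2f}%")
    for name, pct in regressions(results).items():
        print(f"regression hidden by the geomean: {name} {pct:+.2f}%")
```

With these made-up inputs the geomean comes out positive, yet the per-benchmark check still flags bzip2, which is the case I think we should look at before merging.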

> I can only get the maximum throughput when autovectorization tries large vectorization factors. I did try to optimize the trunc 4 x i32 with a custom LowerTruncStore, but afaiu, without either an extra transformation or pass, the aarch64 backend can't really fuse the high vector instruction (xtn2 in this case) to reach the maximum throughput.  Something I am investigating is whether selecting the largest VF for 'MaximizeVectorBandwidth' is the best strategy or if we should add an architecture hook to enable/disable it.

I believe the fact that LLVM gets this right with the extra flag is a side-effect of how the vectoriser used to work: keep trying until whatever. Very soon this will no longer be the case and the approach for this would probably end up in a VPlan or something.

While I understand the need to get it "faster" on benchmarks, I think we need to look at the bigger picture here.

> On the geekbench side I will investigate PDFRendering, but I really think it is missing vectorization tuning and I am not sure if we should consider it a blocker.

Until we know more about the reason bzip and PDF are so much worse, there is no way to know if this is a blocker or not.

https://reviews.llvm.org/D46283