[PATCH] D142359: [TTI][AArch64] Cost model vector INS instructions
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 16 05:10:11 PST 2023
dmgreen added a comment.
> I have tried SPEC2017 INT and SPEC FP, and the LLVM test-suite:
>
> - As we already knew, there's only 1 change in SPEC INT, and that is x264, which is an uplift;
> - In SPEC FP, there is 1 change, which is a minor regression in 510.parest. It's small, but it's definitely there;
> - Nothing stands out in the LLVM test-suite, and I don't see a regression in salsa.
>
> For SPEC INT and FP overall it is a tiny win, but I wouldn't object to calling it neutral: the uplift is 2.5% and the regression 1.5%.
> So the way I look at this at the moment is that this patch is more of an enabler.
>
> About the regression: I looked at the 2 hottest functions in 510.parest, which together account for 55% of the runtime, and I see this pattern repeated in different places in both functions:
>
> Before:
>
> 54c16c: 2f00e400 movi d0, #0x0
> 54c170: 2f00e401 movi d1, #0x0
> 54c18c: 6d7f8c02 ldp d2, d3, [x0, #-8]
> 54c19c: fc637964 ldr d4, [x11, x3, lsl #3]
> 54c1a0: fc647965 ldr d5, [x11, x4, lsl #3]
> 54c1a4: 1f420080 fmadd d0, d4, d2, d0
> 54c1a8: 1f4304a1 fmadd d1, d5, d3, d1
>
> After:
>
> 54e3c8: 6f00e400 movi v0.2d, #0x0
> 54e3e4: 3cc10601 ldr q1, [x16], #16
> 54e3f0: fc627962 ldr d2, [x11, x2, lsl #3]
> 54e3f4: 8b030d63 add x3, x11, x3, lsl #3
> 54e3f8: 4d408462 ld1 {v2.d}[1], [x3]
> 54e3fc: 4e61cc40 fmla v0.2d, v2.2d, v1.2d
>
> I think this must be responsible for the minor regression. It is not terribly wrong, but it seems to be this high-latency LD1 lane-insert variant that keeps this SLP-vectorised code from being faster. Funnily enough, we looked at the cost modelling for this LD1 recently in D141602 <https://reviews.llvm.org/D141602>, but I had not spotted this LD1 here in 510.parest. I am curious why the SLP vectoriser thinks this is beneficial, so I will look into that.
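>
> To give a feel for the source shape, this is roughly the kind of code that produces the pattern above: an indexed-load reduction, as in a sparse matrix-vector product. This is a sketch only, not the actual 510.parest source, and the names are illustrative:
>
>   // val[] is loaded contiguously (the ldp d2, d3); x[col[...]] are the
>   // indexed loads that become the ldr + ld1 lane insert once SLP turns
>   // the two fmadds into one fmla.
>   double dot(const double *val, const unsigned *col, const double *x,
>              int n) {
>     double sum0 = 0.0, sum1 = 0.0; // the two movi-zeroed accumulators
>     for (int i = 0; i < n; i += 2) {
>       sum0 += val[i] * x[col[i]];
>       sum1 += val[i + 1] * x[col[i + 1]];
>     }
>     return sum0 + sum1;
>   }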
>
> In the meantime, I was curious if you had additional thoughts on this.
I was expecting that to be similar to the other fmul+fadd vs fma issue we have seen elsewhere, but I'm not sure it is. Does it reduce the value to a single element again?
As for examples: this case is a little worse. I'm not sure whether it is using bad costs, but the SLP vectorisation seems to reach through a phi, and the result, when put through llc, is mostly just more instructions: https://godbolt.org/z/z6hnEcPG1.
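For reference, the shape I mean by "reaching through a phi" is roughly the following. This is a minimal sketch with made-up names; I have not reduced the godbolt case to confirm it is exactly this:

  // x and y are loop-carried phis; the SLP vectoriser can treat them as a
  // single <2 x double> phi and vectorise the two fmas in the body, at the
  // cost of lane inserts before the loop and extracts after it.
  double rec2(const double *a, int n) {
    double x = 1.0, y = 2.0;
    for (int i = 0; i < n; ++i) {
      x = x * a[i] + 1.0;
      y = y * a[i] + 2.0;
    }
    return x + y;
  }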
Another case, which is perhaps simpler, is this "distance" one from cmsisdsp: https://godbolt.org/z/1xz7GP3fM. It looks like something might be getting scalarised, but again I've not looked into the details.
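The cmsisdsp "distance" kernels are essentially sum-of-squared-differences reductions, so the shape should be roughly this (a sketch; assumed to match the godbolt case, not verified against it):

  // Euclidean distance, minus the final sqrtf: a subtract, a square and a
  // reduction per element. Partial scalarisation here means paying for
  // lane inserts/extracts around the vector accumulator.
  float dist2(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
      float d = a[i] - b[i];
      acc += d * d;
    }
    return acc;
  }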
There are some nice improvements too, if we can hopefully get the regressions fixed.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D142359/new/
https://reviews.llvm.org/D142359