[PATCH] D131028: [AArch64] Fix cost model for FADD vector reduction
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 5 14:14:15 PDT 2022
fhahn added a comment.
In D131028#3702132 <https://reviews.llvm.org/D131028#3702132>, @dmgreen wrote:
>> Yeah this is an unfortunate potential impact on the SLP vectorizer :(
>>
>> I doubt the improved costs here should make things *much* worse in practice and we already have the same issue with integer `add` reduction and `mla` IIUC. Should any negative impact materialize, I think we should work around an SLP issue in the SLPVectorizer directly, rather than through artificially inflating costs in TTI.
>>
>> It might also increase the incentive to properly address the issue :)
>>
>> The motivating use case for those improvements is using more accurate costs in other passes, like D131125 <https://reviews.llvm.org/D131125>
>
> Yeah - I worry that this might come up quite a lot. Adding floats together is pretty common, and multiplying them beforehand seems just as prevalent. I have this example, although it's maybe a little odd due to the extra shuffling in the loop: https://godbolt.org/z/3oqT1b58f.
Thanks for sharing the example! In this particular case, with the patch we use a vector fmul feeding an fadd reduction, but at first glance this doesn't seem worse, and it may even be slightly better overall. Here's the diff between the example without and with the patch (generated by `diff base.s patch.s`):
diff a.s b.s
17,21c17,19
< ldp s0, s1, [x10]
< ldp s2, s3, [x10, #8]
< ldr s4, [x10, #16]
< ldp s6, s18, [x11]
< ldp s5, s17, [x11, #8]
---
> ldr s1, [x10]
> ldur q2, [x10, #4]
> ldr q5, [x11]
25c23
< fmov s7, s6
---
> mov.16b v0, v5
27,34c25,32
< ldr s6, [x1, x13]
< fmov s16, s5
< fmul s5, s7, s1
< fmadd s5, s18, s2, s5
< fmadd s5, s16, s3, s5
< fmadd s5, s17, s4, s5
< fmadd s5, s6, s0, s5
< str s5, [x2, x13]
---
> ldr s4, [x1, x13]
> fmul.4s v3, v5, v2
> faddp.4s v3, v3, v3
> faddp.2s s3, v3
> fmadd s3, s4, s1, s3
> str s3, [x2, x13]
> trn1.4s v5, v4, v5
> mov.s v5[2], v3[0]
36,37d33
< fmov s17, s16
< fmov s18, s7
43,46c39,41
< str s6, [x11]
< str s7, [x11, #4]
< str s5, [x11, #8]
< str s16, [x11, #12]
---
> stp s4, s0, [x11]
> add x12, x11, #12
> str s3, [x11, #8]
47a43
> st1.s { v0 }[2], [x12]
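
To make the two arithmetic shapes in the diff easier to compare, here is a minimal C++ sketch. It assumes the loop body computes a five-term dot product, which is how the `fmul` plus four `fmadd` sequence on the base side reads. The function names and the lane order are hypothetical and do not reproduce the exact register mapping of the generated code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Base shape: one scalar fmul followed by a serial chain of four fmadds.
// Hypothetical helper, not taken from the patch.
float dot5_scalar_chain(const std::array<float, 5> &a,
                        const std::array<float, 5> &b) {
  float acc = a[0] * b[0];              // fmul
  for (int i = 1; i < 5; ++i)
    acc = std::fma(a[i], b[i], acc);    // fmadd, each depends on the previous
  return acc;
}

// Patched shape: a 4-lane multiply, a pairwise add reduction, then a single
// scalar fmadd for the fifth term. Hypothetical helper, not taken from the patch.
float dot5_vector_reduce(const std::array<float, 5> &a,
                         const std::array<float, 5> &b) {
  std::array<float, 4> p{};
  for (int i = 0; i < 4; ++i)
    p[i] = a[i] * b[i];                 // fmul.4s
  float lo = p[0] + p[1];               // faddp.4s (first pair)
  float hi = p[2] + p[3];               // faddp.4s (second pair)
  float sum = lo + hi;                  // faddp.2s
  return std::fma(a[4], b[4], sum);     // fmadd
}

// Sanity check with small integer inputs, where every intermediate value is
// exactly representable, so both shapes must agree bit for bit.
void self_check() {
  const std::array<float, 5> a{1, 2, 3, 4, 5};
  const std::array<float, 5> b{6, 7, 8, 9, 10};
  assert(dot5_scalar_chain(a, b) == 130.0f);
  assert(dot5_vector_reduce(a, b) == 130.0f);
}
```

For general inputs the two shapes can round differently, because the reduction reassociates the additions and the vector multiply is not fused with the adds. In this sketch the serial dependency chain drops from five arithmetic operations to four, which fits the impression that the patched code is not worse. The extra `trn1` and `mov.s` lane shuffles in the patched loop are not modelled here.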
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D131028/new/
https://reviews.llvm.org/D131028